Recent changes to this wiki:
reporting an annoying registerurl issue not registering a URL
diff --git a/doc/bugs/registerurl_does_not_register_if_external_remote.mdwn b/doc/bugs/registerurl_does_not_register_if_external_remote.mdwn
new file mode 100644
index 0000000000..88c60d050c
--- /dev/null
+++ b/doc/bugs/registerurl_does_not_register_if_external_remote.mdwn
@@ -0,0 +1,104 @@
+### Please describe the problem.
+
+Reference: issue/discovery in [repronim/containers while adding neurodesk images](https://github.com/ReproNim/containers/issues/64#issuecomment-1492256561)
+
+- apparently we had no URLs made registered with images despite running `registerurl KEY ANNEX`
+- some images do have urls
+
+took awhile to grasp what is going on and then I found an unfinished reproducer from `Mar 15 2021 annex-claimurl.sh` without recollection why I have not finished it, but it seems that it might be "operator error" somehow? but seems unlikely... might be datalad special remote bug?
+
+Summary of the problem: if there is an external git-annex-remote which CLAIMURL - git-annex registerurl does **not** associate that URL with any (that external or web) remote and thus does not make that key available to the user despite knowing the url.
+
+Should it btw default to `web` if no remote is associated with it?
+
+Filed complimentary [registerurl --remote REMOTE](https://git-annex.branchable.com/todo/registerurl_--remote_REMOTE/) TODO since in this case I would have preferred to just register against web remote.
+
+### What steps will reproduce the problem?
+
+Here is a new "quick" reproducer but you need datalad being installed to get `git-annex-remote-datalad`.
+
+```
+#!/bin/bash
+
+export PS4='> '
+
+set -eu
+set -x
+
+cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
+
+git init
+git annex init
+
+# It works fine if we do not enable datalad special remote!
+# so it is something about interaction there
+git annex initremote datalad externaltype=datalad type=external encryption=none autoenable=true uuid=65b6c36b-debd-4a23-8fa3-675cbd200496
+git annex enableremote datalad
+
+git annex info
+
+# so it seems that addurl does it right
+git annex addurl --debug --file 123.dat http://www.oneukrainian.com/tmp/123.dat
+
+# but if I do via registerurl -- not quite so
+echo 124 > 124.dat
+git annex add 124.dat
+key=$(readlink -f 124.dat | xargs basename)
+git annex registerurl --debug "$key" http://www.oneukrainian.com/tmp/124.dat
+
+git commit -m 'added those two files with urls'
+
+git annex whereis --debug 123.dat
+git annex whereis --debug 124.dat
+
+git checkout git-annex
+: # URLs are known for both
+git grep oneukrainian
+: # but only 123.dat would be associated with datalad remote
+git grep 65b6c36b-debd-4a23-8fa3-675cbd200496
+```
+
+With [full log here](http://www.oneukrainian.com/tmp/annex-claimurl-2023.sh.log) and without `--debug` ending up like
+
+```
+❯ grep -v '^\[' annex-claimurl-2023.sh.log | tail -n 29
+(recording state in git...)
+> git commit -m 'added those two files with urls'
+ 2 files changed, 2 insertions(+)
+ create mode 120000 123.dat
+ create mode 120000 124.dat
+> git annex whereis --debug 123.dat
+whereis 123.dat [2023-03-31 18:29:27.56573965] (Utility.Process) process [1429290] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
+(2 copies)
+ 62c53770-5274-40d4-a45a-de308c234ea9 -- yoh@bilena:~/.tmp/dl-FbOrptq [here]
+ 65b6c36b-debd-4a23-8fa3-675cbd200496 -- [datalad]
+
+ datalad: http://www.oneukrainian.com/tmp/123.dat
+ok
+> git annex whereis --debug 124.dat
+whereis 124.dat [2023-03-31 18:29:27.857735575] (Utility.Process) process [1429322] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
+(1 copy)
+ 62c53770-5274-40d4-a45a-de308c234ea9 -- yoh@bilena:~/.tmp/dl-FbOrptq [here]
+ok
+> git checkout git-annex
+Switched to branch 'git-annex'
+> :
+> git grep oneukrainian
+060/68b/SHA256E-s4--ca2ebdf97d7469496b1f4b78958f9dc8447efdcb623953fee7b6996b762f6fff.dat.log.web:1680301767.477711756s 1 :http://www.oneukrainian.com/tmp/124.dat
+ae1/21c/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat.log.web:1680301767.037966322s 1 :http://www.oneukrainian.com/tmp/123.dat
+> :
+> git grep 65b6c36b-debd-4a23-8fa3-675cbd200496
+ae1/21c/SHA256E-s4--181210f8f9c779c26da1d9b2075bde0127302ee0e3fca38c9a83f5b1dd8e5d3b.dat.log:1680301767.038748415s 1 65b6c36b-debd-4a23-8fa3-675cbd200496
+remote.log:65b6c36b-debd-4a23-8fa3-675cbd200496 autoenable=true encryption=none externaltype=datalad name=datalad type=external timestamp=1680301766.517251391s
+uuid.log:65b6c36b-debd-4a23-8fa3-675cbd200496 datalad timestamp=1680301765.789226249s
+```
+
+so - both keys have urls, but only 123.dat one is associated with datalad special remote, and only it has url reported by whereis
+
+### What version of git-annex are you using? On what operating system?
+
+10.20230126 but tried with older 8.20210803 since thought it must be regression -- the same result
+
+
+[[!meta author=yoh]]
+[[!tag projects/repronim]]
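The last two `git grep` calls in the reproducer amount to one question: which keys have a URL recorded but no location-log entry for the remote? A minimal sketch of that check follows, assuming input in the `path:content` form that `git grep` prints on the `git-annex` branch. The function name `keys_with_url_but_no_remote` is made up for illustration and is not part of git-annex. It also simplifies by treating any status `1` line as current rather than picking the newest entry per remote.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex.
# Reads "path:content" lines (the shape of `git grep` output on the git-annex
# branch) and prints each key that has a URL recorded in <key>.log.web but no
# "present" entry for the given remote UUID in <key>.log.
keys_with_url_but_no_remote() {
    awk -v uuid="$1" '
    {
        path = $0
        sub(/:.*/, "", path)                 # text before the first colon
        content = substr($0, length(path) + 2)
        key = path
        sub(/^.*\//, "", key)                # drop the hash directories
        if (key ~ /\.log\.web$/) {
            sub(/\.log\.web$/, "", key)
            has_url[key] = 1
        } else if (key ~ /\.log$/) {
            sub(/\.log$/, "", key)
            n = split(content, field, " ")
            # location log line: <timestamp> <status> <uuid>; status 1 = present
            if (n >= 3 && field[2] == "1" && field[3] == uuid)
                present[key] = 1
        }
    }
    END {
        for (k in has_url)
            if (!(k in present))
                print k
    }'
}
```

With the `git-annex` branch checked out as in the reproducer, something like `{ git grep oneukrainian; git grep "$uuid"; } | keys_with_url_but_no_remote "$uuid"` should list the key of `124.dat` for the log shown in the report.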
initial todo for adding --remote to registerurl
diff --git a/doc/todo/registerurl_--remote_REMOTE.mdwn b/doc/todo/registerurl_--remote_REMOTE.mdwn
new file mode 100644
index 0000000000..45219a7394
--- /dev/null
+++ b/doc/todo/registerurl_--remote_REMOTE.mdwn
@@ -0,0 +1,5 @@
+ATM `registerurl` would consider external special remotes via CLAIMURL and might then associate that URL with a specific remote.
+If a remote can handle some urls (e.g. regular http) which annex can handle as well, but user wants to not associate url with that special remote, having this dedicated option would be great to have.
+
+[[!meta author=yoh]]
+[[!tag projects/repronim]]
Added a comment
diff --git a/doc/forum/How_to_recover_from_failed_branch_updates__63__/comment_2_0edc7239488f78345f9e624ef210ebac._comment b/doc/forum/How_to_recover_from_failed_branch_updates__63__/comment_2_0edc7239488f78345f9e624ef210ebac._comment
new file mode 100644
index 0000000000..252e2f7c2e
--- /dev/null
+++ b/doc/forum/How_to_recover_from_failed_branch_updates__63__/comment_2_0edc7239488f78345f9e624ef210ebac._comment
@@ -0,0 +1,25 @@
+[[!comment format=mdwn
+ username="gioele@678b7c03f524f2669b179b603f65352fcc16774e"
+ nickname="gioele"
+ avatar="http://cdn.libravatar.org/avatar/366dbda84e78aff8a8a070622aeb63ce"
+ subject="comment 2"
+ date="2023-03-31T20:16:04Z"
+ content="""
+Thanks!
+
+For the record, this is what git will say after `git checkout \"adjusted/master(unlockpresent)\"`
+
+```
+Warning: you are leaving 1 commit behind, not connected to
+any of your branches:
+
+ 9d92415fb git-annex adjusted branch
+
+If you want to keep it by creating a new branch, this may be a good time
+to do so with:
+
+ git branch <new-branch-name> 9d92415fb
+
+Switched to branch 'adjusted/master(unlockpresent)'
+```
+"""]]
Sped up sqlite inserts 2x when built with persistent 2.14.5.0
https://github.com/yesodweb/persistent/issues/1457
Sponsored-by: Dartmouth College's DANDI project
diff --git a/CHANGELOG b/CHANGELOG
index 699331bfc3..ff8ffdeb2a 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -2,6 +2,7 @@ git-annex (10.20230330) UNRELEASED; urgency=medium
 
   * git-annex.cabal: Prevent building with unix-compat 0.7 which
     removed System.PosixCompat.User.
+  * Sped up sqlite inserts 2x when built with persistent 2.14.5.0
 
  -- Joey Hess <id@joeyh.name>  Fri, 31 Mar 2023 12:48:54 -0400
 
diff --git a/Database/ContentIdentifier.hs b/Database/ContentIdentifier.hs
index aa595a98ce..e304dca58f 100644
--- a/Database/ContentIdentifier.hs
+++ b/Database/ContentIdentifier.hs
@@ -36,6 +36,7 @@ module Database.ContentIdentifier (
 import Database.Types
 import qualified Database.Queue as H
 import Database.Init
+import Database.Utility
 import Annex.Locations
 import Annex.Common hiding (delete)
 import qualified Annex.Branch
@@ -109,7 +110,7 @@ flushDbQueue (ContentIdentifierHandle h) = H.flushDbQueue h
 -- Be sure to also update the git-annex branch when using this.
 recordContentIdentifier :: ContentIdentifierHandle -> RemoteStateHandle -> ContentIdentifier -> Key -> IO ()
 recordContentIdentifier h (RemoteStateHandle u) cid k = queueDb h $ do
-	void $ insertUnique $ ContentIdentifiers u cid k
+	void $ insertUniqueFast $ ContentIdentifiers u cid k
 
 getContentIdentifiers :: ContentIdentifierHandle -> RemoteStateHandle -> Key -> IO [ContentIdentifier]
 getContentIdentifiers (ContentIdentifierHandle h) (RemoteStateHandle u) k =
@@ -132,7 +133,7 @@ getContentIdentifierKeys (ContentIdentifierHandle h) (RemoteStateHandle u) cid =
 recordAnnexBranchTree :: ContentIdentifierHandle -> Sha -> IO ()
 recordAnnexBranchTree h s = queueDb h $ do
 	deleteWhere ([] :: [Filter AnnexBranch])
-	void $ insertUnique $ AnnexBranch $ toSSha s
+	void $ insertUniqueFast $ AnnexBranch $ toSSha s
 
 getAnnexBranchTree :: ContentIdentifierHandle -> IO Sha
 getAnnexBranchTree (ContentIdentifierHandle h) = H.queryDbQueue h $ do
diff --git a/Database/Export.hs b/Database/Export.hs
index b5c58afd0b..4e01752d9b 100644
--- a/Database/Export.hs
+++ b/Database/Export.hs
@@ -49,6 +49,7 @@ module Database.Export (
 import Database.Types
 import qualified Database.Queue as H
 import Database.Init
+import Database.Utility
 import Annex.Locations
 import Annex.Common hiding (delete)
 import Types.Export
@@ -124,7 +125,7 @@ flushDbQueue (ExportHandle h _) = H.flushDbQueue h
 recordExportTreeCurrent :: ExportHandle -> Sha -> IO ()
 recordExportTreeCurrent h s = queueDb h $ do
 	deleteWhere ([] :: [Filter ExportTreeCurrent])
-	void $ insertUnique $ ExportTreeCurrent $ toSSha s
+	void $ insertUniqueFast $ ExportTreeCurrent $ toSSha s
 
 getExportTreeCurrent :: ExportHandle -> IO (Maybe Sha)
 getExportTreeCurrent (ExportHandle h _) = H.queryDbQueue h $ do
@@ -136,7 +137,7 @@ getExportTreeCurrent (ExportHandle h _) = H.queryDbQueue h $ do
 
 addExportedLocation :: ExportHandle -> Key -> ExportLocation -> IO ()
 addExportedLocation h k el = queueDb h $ do
-	void $ insertUnique $ Exported k ef
+	void $ insertUniqueFast $ Exported k ef
 	let edirs = map
 		(\ed -> ExportedDirectory (SFilePath (fromExportDirectory ed)) ef)
 		(exportDirectories el)
@@ -186,7 +187,7 @@ getExportTreeKey (ExportHandle h _) el = H.queryDbQueue h $ do
 
 addExportTree :: ExportHandle -> Key -> ExportLocation -> IO ()
 addExportTree h k loc = queueDb h $
-	void $ insertUnique $ ExportTree k ef
+	void $ insertUniqueFast $ ExportTree k ef
   where
 	ef = SFilePath (fromExportLocation loc)
 
diff --git a/Database/Fsck.hs b/Database/Fsck.hs
index 61e932e3da..cccefefeda 100644
--- a/Database/Fsck.hs
+++ b/Database/Fsck.hs
@@ -29,6 +29,7 @@ module Database.Fsck (
 
 import Database.Types
 import qualified Database.Queue as H
+import Database.Utility
 import Database.Init
 import Annex.Locations
 import Utility.Exception
@@ -88,7 +89,7 @@ closeDb (FsckHandle h u) = do
 
 addDb :: FsckHandle -> Key -> IO ()
 addDb (FsckHandle h _) k = H.queueDb h checkcommit $
-	void $ insertUnique $ Fscked k
+	void $ insertUniqueFast $ Fscked k
   where
 	-- Commit queue after 1000 changes or 5 minutes, whichever comes first.
 	-- The time based commit allows for an incremental fsck to be
diff --git a/Database/Handle.hs b/Database/Handle.hs
index c960375877..da7a0e173a 100644
--- a/Database/Handle.hs
+++ b/Database/Handle.hs
@@ -5,7 +5,7 @@
  - Licensed under the GNU AGPL version 3 or higher.
  -}
 
-{-# LANGUAGE TypeFamilies, FlexibleContexts, OverloadedStrings #-}
+{-# LANGUAGE TypeFamilies, FlexibleContexts, OverloadedStrings, CPP #-}
 
 module Database.Handle (
 	DbHandle,
@@ -329,4 +329,3 @@ isDatabaseModified (DatabaseInodeCache a1 b1) (DatabaseInodeCache a2 b2) =
 
 takeMVarSafe :: MVar a -> IO (Either BlockedIndefinitelyOnMVar a)
 takeMVarSafe = try . takeMVar
-
diff --git a/Database/Keys/SQL.hs b/Database/Keys/SQL.hs
index c97a4280b9..e190c90ab0 100644
--- a/Database/Keys/SQL.hs
+++ b/Database/Keys/SQL.hs
@@ -21,6 +21,7 @@ module Database.Keys.SQL where
 
 import Database.Types
 import Database.Handle
+import Database.Utility
 import qualified Database.Queue as H
 import Utility.InodeCache
 import Git.FilePath
@@ -121,7 +122,7 @@ removeAssociatedFile k f = queueDb $
 
 addInodeCaches :: Key -> [InodeCache] -> WriteHandle -> IO ()
 addInodeCaches k is = queueDb $
-	forM_ is $ \i -> insertUnique $ Content k i
+	forM_ is $ \i -> insertUniqueFast $ Content k i
 		(inodeCacheToFileSize i)
 		(inodeCacheToEpochTime i)
 
diff --git a/Database/Utility.hs b/Database/Utility.hs
new file mode 100644
index 0000000000..55943fcc89
--- /dev/null
+++ b/Database/Utility.hs
@@ -0,0 +1,27 @@
+{- Persistent sqlite database utilities.
+ -
+ - Copyright 2023 Joey Hess <id@joeyh.name>
+ -
+ - Licensed under the GNU AGPL version 3 or higher.
+ -}
+
+{-# LANGUAGE TypeFamilies, CPP #-}
+{-# OPTIONS_GHC -fno-warn-missing-signatures #-}
+
+module Database.Utility (
+	insertUniqueFast,
+) where
+
+import Control.Monad
+import Database.Persist.Class
+
+{- insertUnique_ is 2x as fast as insertUnique, so use when available.
+ -
+ - It would be difficult to write the type signature here, since older
+ - versions of persistent have different constraints on insertUnique.
+ -}
+#if MIN_VERSION_persistent(2,14,5)
+insertUniqueFast x = void (insertUnique_ x)
+#else
+insertUniqueFast x = void (insertUnique x)
+#endif
diff --git a/doc/bugs/performance_regression__63___init_takes_times_more/comment_18_13ce5ec87207d553388ec23663d9abcb._comment b/doc/bugs/performance_regression__63___init_takes_times_more/comment_18_13ce5ec87207d553388ec23663d9abcb._comment
new file mode 100644
index 0000000000..0ce3682574
--- /dev/null
+++ b/doc/bugs/performance_regression__63___init_takes_times_more/comment_18_13ce5ec87207d553388ec23663d9abcb._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 18"""
+ date="2023-03-31T18:36:56Z"
+ content="""
+Implemented support for
+<https://github.com/yesodweb/persistent/issues/1457> in git-annex,
+which does speed up sqlite inserts 2x. That will affect the scan in
+question, since that inserts to the keys database. It also will speed up
+some unrelated parts of git-annex.
+"""]]
diff --git a/git-annex.cabal b/git-annex.cabal
index 365d19ce42..1fb739cf26 100644
--- a/git-annex.cabal
+++ b/git-annex.cabal
@@ -836,6 +836,7 @@ Executable git-annex
     Database.Keys.SQL
     Database.Queue
     Database.Types
+    Database.Utility
(Diff truncated)
Added a comment
diff --git a/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7/comment_2_6b15d11b6e25689e2430663a4aa90168._comment b/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7/comment_2_6b15d11b6e25689e2430663a4aa90168._comment
new file mode 100644
index 0000000000..2048e072f1
--- /dev/null
+++ b/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7/comment_2_6b15d11b6e25689e2430663a4aa90168._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="zhongruoyu@80ae9772857666624009364c29f07c70beed46ac"
+ nickname="zhongruoyu"
+ avatar="http://cdn.libravatar.org/avatar/b1c7aaba6e8b09ef40ca161135d3b14b"
+ subject="comment 2"
+ date="2023-03-31T18:32:29Z"
+ content="""
+Thanks for doing that. I cannot comment on the `unix-compat` change, because I'm not familiar with it. But let me use the fix you've just committed and keep an eye on the updates, if any.
+
+"""]]
clarify
diff --git a/doc/bugs/init_fails_in_a_folder_with_newline_in_its_name/comment_1_2690ed9441685068c291a182d39c2616._comment b/doc/bugs/init_fails_in_a_folder_with_newline_in_its_name/comment_1_2690ed9441685068c291a182d39c2616._comment
index 949bee65a5..284eea755d 100644
--- a/doc/bugs/init_fails_in_a_folder_with_newline_in_its_name/comment_1_2690ed9441685068c291a182d39c2616._comment
+++ b/doc/bugs/init_fails_in_a_folder_with_newline_in_its_name/comment_1_2690ed9441685068c291a182d39c2616._comment
@@ -14,4 +14,7 @@ written to uuid.log contains a newline, which prevents parsing that line
 of the log correctly. This can also be seen by passing a value with a
 newline to `git-annex describe`. It would also happen in the case with the
 newline directory if it didn't fail earlier.
+
+Also fixed this, though, with a one-way escaping,
+see [[!commit 38e9ea8497bb2ab058e5bd46a666857789c0a84d]].
 """]]
fix link
diff --git a/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_1_c4cfa1d0f90193b127722711285e1210._comment b/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_1_c4cfa1d0f90193b127722711285e1210._comment
index 2b2521f555..b1ca8a5171 100644
--- a/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_1_c4cfa1d0f90193b127722711285e1210._comment
+++ b/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_1_c4cfa1d0f90193b127722711285e1210._comment
@@ -8,5 +8,5 @@ I tried this on windows, and the second command succeeds now. The first
 command still fails as shown.
 
 At this point, what's left of this bug seems to be the same as
-[[bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows]].
+<https://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/>
 """]]
idea
diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_6_e38b776c73affdbb06f3debdddb07e59._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_6_e38b776c73affdbb06f3debdddb07e59._comment
new file mode 100644
index 0000000000..de7cd4053d
--- /dev/null
+++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_6_e38b776c73affdbb06f3debdddb07e59._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 6"""
+ date="2023-03-31T16:55:49Z"
+ content="""
+What if I made git-annex send "EXTENSIONS PROTOCOLVERSION2"? Then
+you could reply to EXPORTSUPPORTED with EXPORTSUPPORTED-FAILURE
+when used by a buggy git-annex.
+"""]]
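For readers following the thread, the remote side of this idea might look roughly like the sketch below. It is only an illustration: `PROTOCOLVERSION2` is the extension name floated in the comment, not something a released git-annex is known to send, and `serve_handshake` is a made-up function name. A real remote would also have to answer `INITREMOTE`, `PREPARE` and the transfer requests, which this fragment leaves to the `UNSUPPORTED-REQUEST` fallback.

```shell
#!/bin/sh
# Sketch only. PROTOCOLVERSION2 is the extension name proposed in the comment
# above and is an assumption here; serve_handshake is a hypothetical name.
serve_handshake() {
    fixed_annex=no
    echo "VERSION 1"
    while read -r cmd args; do
        case "$cmd" in
            EXTENSIONS)
                # Remember whether git-annex advertised the proposed extension.
                case " $args " in
                    *" PROTOCOLVERSION2 "*) fixed_annex=yes ;;
                esac
                echo "EXTENSIONS"
                ;;
            EXPORTSUPPORTED)
                # Refuse exporttree unless talking to a fixed git-annex.
                if [ "$fixed_annex" = yes ]; then
                    echo "EXPORTSUPPORTED-SUCCESS"
                else
                    echo "EXPORTSUPPORTED-FAILURE"
                fi
                ;;
            *)
                echo "UNSUPPORTED-REQUEST"
                ;;
        esac
    done
}
```

The appeal of this design is that the remote keeps sending `VERSION 1`, so existing non-exporttree users of an older git-annex are unaffected; only the export feature is withheld from versions that do not advertise the extension.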
git-annex.cabal: Prevent building with unix-compat 0.7
Which removed System.PosixCompat.User.
See https://github.com/haskell-pkg-janitors/unix-compat/issues/3
Sponsored-by: Noam Kremen on Patreon
diff --git a/CHANGELOG b/CHANGELOG
index ff28f59095..699331bfc3 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,10 @@
+git-annex (10.20230330) UNRELEASED; urgency=medium
+
+  * git-annex.cabal: Prevent building with unix-compat 0.7 which
+    removed System.PosixCompat.User.
+
+ -- Joey Hess <id@joeyh.name>  Fri, 31 Mar 2023 12:48:54 -0400
+
 git-annex (10.20230329) upstream; urgency=medium
 
   * sync: Fix parsing of gcrypt::rsync:// urls that use a relative path.
diff --git a/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7.mdwn b/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7.mdwn
index 3aa6182037..542be432e0 100644
--- a/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7.mdwn
+++ b/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7.mdwn
@@ -56,3 +56,5 @@ Sorry, I'm not a git-annex user. I'm a maintainer of the Homebrew package
 manager, and I help to make the newest git-annex available to our users.
 
 Thanks for all your work maintaining git-annex!
+
+> [[fixed|done]] by avoiding the broken version --[[Joey]]
diff --git a/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7/comment_1_cd573b785b7d7feec72387cb0dafdcab._comment b/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7/comment_1_cd573b785b7d7feec72387cb0dafdcab._comment
new file mode 100644
index 0000000000..9894d24ea6
--- /dev/null
+++ b/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7/comment_1_cd573b785b7d7feec72387cb0dafdcab._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2023-03-31T16:50:28Z"
+ content="""
+Unfortunately pinning to 0.6 is the only solution, I cannot work around
+this ill-considered change in git-annex. I have opened an issue and hope
+the maintainers reconsider.
+<https://github.com/haskell-pkg-janitors/unix-compat/issues/3>
+"""]]
diff --git a/git-annex.cabal b/git-annex.cabal
index 65f10351cd..365d19ce42 100644
--- a/git-annex.cabal
+++ b/git-annex.cabal
@@ -294,7 +294,7 @@ source-repository head
   location: git://git-annex.branchable.com/
 
 custom-setup
-  Setup-Depends: base (>= 4.11.1.0 && < 5.0), split, unix-compat,
+  Setup-Depends: base (>= 4.11.1.0 && < 5.0), split, unix-compat (< 0.7),
     filepath, exceptions, bytestring, IfElse, data-default,
     filepath-bytestring (>= 1.4.2.1.4),
     process (>= 1.6.3),
@@ -318,7 +318,7 @@ Executable git-annex
    case-insensitive,
    random,
    dlist,
-   unix-compat (>= 0.5),
+   unix-compat (>= 0.5 && < 0.7),
    SafeSemaphore,
    async,
    directory (>= 1.2.7.0),
Added a comment
diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_5_48656a34262fce77ac50836d1d38b9fe._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_5_48656a34262fce77ac50836d1d38b9fe._comment
new file mode 100644
index 0000000000..e2d96b7ecf
--- /dev/null
+++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_5_48656a34262fce77ac50836d1d38b9fe._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="wolf480@8ad1ccdd08efc303a88f7e88c4e629be6637a44e"
+ nickname="wolf480"
+ avatar="http://cdn.libravatar.org/avatar/816b19ee786208f3216fe146d7733086"
+ subject="comment 5"
+ date="2023-03-30T17:33:58Z"
+ content="""
+Oh, also many thanks for quickly fixing this!
+"""]]
Added a comment
diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_4_db89603335995755d145141705ac4584._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_4_db89603335995755d145141705ac4584._comment
new file mode 100644
index 0000000000..fa9eff74b1
--- /dev/null
+++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_4_db89603335995755d145141705ac4584._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="wolf480@8ad1ccdd08efc303a88f7e88c4e629be6637a44e"
+ nickname="wolf480"
+ avatar="http://cdn.libravatar.org/avatar/816b19ee786208f3216fe146d7733086"
+ subject="comment 4"
+ date="2023-03-30T17:33:10Z"
+ content="""
+Hmm but a remote only learns that it's being used with exporttree=yes *after* it has sent a `VERSION`?
+
+I'm afraid I can't bump the version in case of git-annex-remote-rclone, it's already widely used for non-exporttree scenarios and requiring a git-annex that supports `VERSION 2` for these existing usecases (which aren't affected by this bug) would be a regression...
+
+I'll ask git-annex-remote-rclone maintainers, but it seems to me that defensive coding around the `EXPORT` command is gonna be a better solution in this remote's case.
+"""]]
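One possible reading of "defensive coding around the `EXPORT` command" is sketched below, under the assumption that git-annex sends `EXPORT Name` immediately before each export request. `handle_exports` and `do_transfer` are hypothetical names, and `do_transfer` stands in for the remote's real upload and download code. The guard only catches a `TRANSFEREXPORT` that arrives with no pending name. It cannot notice a name that reached this process but belonged to a request routed to another process, so it narrows the bug's impact rather than removing it.

```shell
#!/bin/sh
# Sketch only: do_transfer is a hypothetical stand-in for the remote's real
# store/retrieve code; here it always reports success.
do_transfer() {  # arguments: direction key file export-name
    return 0
}

handle_exports() {
    export_name=
    while read -r cmd args; do
        case "$cmd" in
            EXPORT)
                export_name=$args
                ;;
            TRANSFEREXPORT)
                # args has the form: STORE|RETRIEVE Key File
                direction=${args%% *}
                rest=${args#* }
                key=${rest%% *}
                file=${rest#* }
                name=$export_name
                export_name=   # a name is only valid for the request that follows it
                if [ -z "$name" ]; then
                    echo "TRANSFER-FAILURE $direction $key no EXPORT received before TRANSFEREXPORT"
                elif do_transfer "$direction" "$key" "$file" "$name"; then
                    echo "TRANSFER-SUCCESS $direction $key"
                else
                    echo "TRANSFER-FAILURE $direction $key transfer failed"
                fi
                ;;
            *)
                echo "UNSUPPORTED-REQUEST"
                ;;
        esac
    done
}
```

Clearing the stored name after every use is the important part: without it, a stale name from an earlier request could silently be reused.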
diff --git a/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7.mdwn b/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7.mdwn
new file mode 100644
index 0000000000..3aa6182037
--- /dev/null
+++ b/doc/bugs/System.PosixCompat.User_removed_in_unix-compat-0.7.mdwn
@@ -0,0 +1,58 @@
+### Please describe the problem.
+
+Module `System.PosixCompat.User` has been removed in `unix-compat-0.7` (see
+[changelog](https://hackage.haskell.org/package/unix-compat-0.7/changelog)). As
+a result, git-annex failed to build, with the following error:
+
+```
+Starting git-annex-10.20230329 (all, legacy fallback)
+Error: cabal: Failed to build git-annex-10.20230329. The failure occurred
+during the configure step. The exception was:
+/private/tmp/git-annex-20230329-55610-12n1hf4/git-annex-10.20230329/.brew_home/.cabal/logs/ghc-9.4.4/gt-nnx-10.20230329-579147b2.log:
+withFile: user error (Error: cabal: '/opt/homebrew/opt/ghc/bin/ghc' exited
+with an error:
+
+/private/tmp/cabal-install.-55709/dist-newstyle/tmp/src-55709/git-annex-10.20230329/Utility/UserInfo.hs:24:1:
+error:
+Could not find module ‘System.PosixCompat.User’
+Perhaps you meant
+System.PosixCompat.Temp (from unix-compat-0.7)
+System.PosixCompat.Time (from unix-compat-0.7)
+System.PosixCompat.Files (from unix-compat-0.7)
+Use -v (or `:set -v` in ghci) to see a list of the files searched for.
+|
+24 | import System.PosixCompat.User
+| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+)
+```
+
+### What steps will reproduce the problem?
+
+```
+cabal v2-update
+cabal v2-install --jobs=8 --max-backjumps=100000 --install-method=copy --installdir=/opt/homebrew/Cellar/git-annex/10.20230329/bin --flags=+S3
+```
+
+(Note: I omitted some workarounds used to build with GHC >= 9.2. The full
+package description for building git-annex can be found
+[here](https://github.com/Homebrew/homebrew-core/blob/83f9beeb6ce6d44cd06856f4e9fc513e80cd237d/Formula/git-annex.rb).)
+
+### What version of git-annex are you using? On what operating system?
+
+git-annex: 10.20230329 (But it failed with 10.20230321, too.)
+
+OS: macOS 11, 12, 13 (x86_64 and arm64), Ubuntu 22.04 (x86_64)
+
+### Please provide any additional information below.
+
+The error was observed while packaging git-annex for Homebrew
+[here](https://github.com/Homebrew/homebrew-core/pull/127002). Currently, that's
+being worked around by restricting `unix-compat` version to `>= 0.5 && < 0.7` in
+`git-annex.cabal`.
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+Sorry, I'm not a git-annex user. I'm a maintainer of the Homebrew package
+manager, and I help to make the newest git-annex available to our users.
+
+Thanks for all your work maintaining git-annex!
diff --git a/doc/users/ptilopteri.mdwn b/doc/users/ptilopteri.mdwn
new file mode 100644
index 0000000000..a88cab4f23
--- /dev/null
+++ b/doc/users/ptilopteri.mdwn
@@ -0,0 +1 @@
+darktable user
add news item for git-annex 10.20230329
diff --git a/doc/news/version_10.20221212.mdwn b/doc/news/version_10.20221212.mdwn
deleted file mode 100644
index 0862b49a5d..0000000000
--- a/doc/news/version_10.20221212.mdwn
+++ /dev/null
@@ -1,10 +0,0 @@
-git-annex 10.20221212 released with [[!toggle text="these changes"]]
-[[!toggleable text="""  * Fix a hang that occasionally occurred during commands such as move,
-    when operating on unlocked files. (A bug introduced in 10.20220927)
-  * When youtube-dl is not available in PATH, use yt-dlp instead.
-  * Support parsing yt-dpl output to display download progress.
-  * init: Avoid scanning for annexed files, which can be lengthy in a
-    large repository. Instead that scan is done on demand.
-  * Sped up the initial scan for annexed files by 21%.
-  * test: Add --test-debug option.
-  * Support quettabyte and yottabyte."""]]
\ No newline at end of file
diff --git a/doc/news/version_10.20230329.mdwn b/doc/news/version_10.20230329.mdwn
new file mode 100644
index 0000000000..701373a7c0
--- /dev/null
+++ b/doc/news/version_10.20230329.mdwn
@@ -0,0 +1,18 @@
+git-annex 10.20230329 released with [[!toggle text="these changes"]]
+[[!toggleable text="""  * sync: Fix parsing of gcrypt::rsync:// urls that use a relative path.
+  * Avoid failure to update adjusted branch --unlock-present after git-annex
+    drop when annex.adjustedbranchrefresh=1
+  * Avoid leaving repo with a detached head when there is a failure
+    checking out an updated adjusted branch.
+  * view: Support annex.maxextensionlength when generating filenames for
+    the view branch.
+  * Windows: Support urls like "file:///c:/path"
+  * addurl, importfeed: Fix failure when annex.securehashesonly is set.
+  * Copy with a reflink when exporting a tree to a directory special remote.
+  * Fix bug that caused broken protocol to be used with external remotes
+    that use exporttree=yes. In some cases this could result in the wrong
+    content being exported to, or retrieved from the remote.
+  * Support VERSION 2 in the external special remote protocol, which is
+    identical to VERSION 1, but avoids external remote programs neededing
+    to work around the above bug. External remote program that support
+    exporttree=yes are recommended to be updated to send VERSION 2."""]]
\ No newline at end of file
diff --git a/doc/bugs/Enabling_useConfigOnly_not_honored.mdwn b/doc/bugs/Enabling_useConfigOnly_not_honored.mdwn
new file mode 100644
index 0000000000..40e54976aa
--- /dev/null
+++ b/doc/bugs/Enabling_useConfigOnly_not_honored.mdwn
@@ -0,0 +1,65 @@
+### Please describe the problem.
+
+Git's `user.useConfigOnly` is not honored by git-annex commands that create commits (e.g., `init` or `sync`).
+
+### What steps will reproduce the problem?
+
+Having not globally set `user.{name,email}`, the following will cause git-annex to ignore said setting and proceed with `$USER` as identity:
+
+    $ git config --global user.useConfigOnly true
+    $ git init annex && cd $_
+    Initialized empty Git repository in /tmp/annex/.git/
+    $ git annex init
+    init Author identity unknown
+
+    *** Please tell me who you are.
+
+    Run
+
+      git config --global user.email "you@example.com"
+      git config --global user.name "Your Name"
+
+    to set your account's default identity.
+    Omit --global to set the identity only in this repository.
+
+    fatal: no email was given and auto-detection is disabled
+    ok
+    (recording state in git...)
+
+Looking at the git-annex branch confirms this:
+
+    $ git log git-annex
+    commit 8eb575828cf52a3c150780d89c22672a51291f46
+    Author: $USER <$USER>
+    Date: 2023-03-29 13:22:44 +0200
+
+        update
+
+    commit 0b247e1da51e742bcf7b027aa25fc0d61520270d
+    Author: $USER <$USER>
+    Date: 2023-03-29 13:22:44 +0200
+
+        branch created
+
+Furthermore, the identity is now even set in the local repository:
+
+    $ git config user.name
+    $USER
+
+Enabling `user.useConfigOnly` should prevent this, making it easier to work with several different identities.
+
+### What version of git-annex are you using? On what operating system?
+
+    git-annex version: 10.20230321-gb624394c7
+    build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
+    dependency versions: aws-0.24 bloomfilter-2.0.1.0 cryptonite-0.30 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
+    key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
+    remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
+    operating system: linux x86_64
+    supported repository versions: 8 9 10
+    upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
+    local repository version: 10
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+Yes, it's great!
Added a comment
diff --git a/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_7_ff7f26601e375146a08b454a2e448f1b._comment b/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_7_ff7f26601e375146a08b454a2e448f1b._comment
new file mode 100644
index 0000000000..2926820d3e
--- /dev/null
+++ b/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_7_ff7f26601e375146a08b454a2e448f1b._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="dpifke"
+ avatar="http://cdn.libravatar.org/avatar/7b17ce0661a1b1cd708c5c5150eb2c33"
+ subject="comment 7"
+ date="2023-03-29T03:13:50Z"
+ content="""
+Thanks for tracking this down, and for the pointer to the `annex.crippledfilesystem` option. I agree the latter sounds sub-optimal but I'll keep it in mind if staying on the 1.8 release becomes untenable before the Gocryptfs bug is fixed. I'll follow the linked Github issue for updates on that.
+
+"""]]
fix whitespace
diff --git a/doc/design/external_special_remote_protocol.mdwn b/doc/design/external_special_remote_protocol.mdwn
index 665687f5be..d4bf6bcf3d 100644
--- a/doc/design/external_special_remote_protocol.mdwn
+++ b/doc/design/external_special_remote_protocol.mdwn
@@ -435,7 +435,7 @@ The two protocol versions are actually identical.
 
 Old versions of git-annex that supported only `VERSION 1`
 had a bug in their implementation of the
-part of the protocol documented in the[[export_and_import_appendix]].
+part of the protocol documented in the [[export_and_import_appendix]].
 The bug could result in ontent being exported to the wrong file.
 External special remotes that implement that should use `VERSION 2` to
 avoid talking to the buggy old version of git-annex.
close
diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn b/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn index 15ce927e63..193f5c00f7 100644 --- a/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn +++ b/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn @@ -84,3 +84,4 @@ But I first needed to be able to exporttree over FTP. (I have a cheap NAS is terribly slow at encryption (SSH, etc) so I'm using FTP to get reasonable speeds with it.) So I thought I'd implement it and then hit this bug. +> [[fixed|done]] --[[Joey]]
external protocol VERSION 2
Support VERSION 2 in the external special remote protocol, which is
identical to VERSION 1, but avoids external remote programs needing to
work around the above bug. External remote programs that support
exporttree=yes are recommended to be updated to send VERSION 2.
Sponsored-by: Kevin Mueller on Patreon
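
As a rough illustration of what "send VERSION 2" could look like on the external remote's side, here is a minimal shell sketch. The function name `run_remote` and the wording of the remote's own ERROR message are assumptions made for this example, not taken from git-annex or from any real remote. It relies only on behaviour recorded in this log: the remote speaks first with its version, an old git-annex answers `ERROR unsupported VERSION`, and a remote that receives an ERROR may exit with its own ERROR.

```sh
# Hypothetical main loop of an external special remote (sketch only).
# Reads git-annex's messages on stdin and writes replies to stdout.
run_remote() {
    # The remote speaks first, announcing the protocol version it uses.
    echo "VERSION 2"
    while IFS= read -r line; do
        case "$line" in
            ERROR*)
                # An old git-annex that only knows VERSION 1 rejects
                # VERSION 2 with an ERROR. Give up instead of talking
                # to a version that may have the export bug.
                echo "ERROR git-annex does not support VERSION 2"
                return 1
                ;;
            *)
                # Every other request is left unimplemented in this sketch.
                echo "UNSUPPORTED-REQUEST"
                ;;
        esac
    done
}
```

A real remote would end the script with a call to `run_remote` and would implement the requests it supports instead of answering `UNSUPPORTED-REQUEST` to all of them.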
diff --git a/CHANGELOG b/CHANGELOG index 29f20cad5c..6ba5266332 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -13,6 +13,10 @@ git-annex (10.20230322) UNRELEASED; urgency=medium * Fix bug that caused broken protocol to be used with external remotes that use exporttree=yes. In some cases this could result in the wrong content being exported to, or retrieved from the remote. + * Support VERSION 2 in the external special remote protocol, which is + identical to VERSION 1, but avoids external remote programs neededing + to work around the above bug. External remote program that support + exporttree=yes are recommended to be updated to send VERSION 2. -- Joey Hess <id@joeyh.name> Thu, 23 Mar 2023 15:04:41 -0400 diff --git a/Remote/External/Types.hs b/Remote/External/Types.hs index 894f09dc30..633dc641bd 100644 --- a/Remote/External/Types.hs +++ b/Remote/External/Types.hs @@ -415,7 +415,7 @@ newtype JobId = JobId Integer deriving (Eq, Ord, Show) supportedProtocolVersions :: [ProtocolVersion] -supportedProtocolVersions = [1] +supportedProtocolVersions = [1, 2] instance Proto.Serializable JobId where serialize (JobId n) = show n diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment index 036484a63b..f4c5189082 100644 --- a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment +++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment @@ -18,17 +18,7 @@ but am confident that I fixed it in ---- -I'm looking forward to export support in rclone! - -I'm wondering about the data loss part of this. 
You said: - -> As a result, one process receives two `EXPORT`s in a row, the first of which it ignores, -> while some other process receives a `TRANSFEREXPORT` without a prior `EXPORT`, -> reusing whatever filename was set in the previous transaction, -> and either ovewriting the last accessed remote file with the wrong content, -> or retrieving the content of the last accessed remote file instead of the one git-annex wanted. - -So in your implementation, you're keeping EXPORT set after handling TRANSFEREXPORT +In your implementation, you're keeping EXPORT set after handling TRANSFEREXPORT and similar commands, and so if you receive a TRANSFEREXPORT not prefixed by an EXPORT, it can be bad. A natural way to write things, indeed `doc/special_remotes/external/example.sh` does the same. @@ -37,26 +27,20 @@ An alternative implementation would be to clear the EXPORT after handling a TRANSFEREXPORT, CHECKPRESENTEXPORT, REMOVEEXPORT, or RENAMEEXPORT. And have those error out if no EXPORT was received before them. -So that's one way we can avoid the data loss problem if your external remote -is used with an old git-annex that has this bug. You might want to do that now. +But, that does not fully avoid the data loss problem. Because this might +happen: ----- + EXPORT foo + EXPORT bar + TRANSFEREXPORT STORE $foo_key $foo_file + +So even ignoring the first EXPORT can result in the external special remote +doing the wrong thing. To fully guard against it, you'd have to error out +if multiple EXPORT were received before a request that uses one, rather +than ignoring the first EXPORT. -But, that requires defensive coding in every external remote... Maybe it -would be worth changing the protocol in some way, that avoids the problem. -Then if external remotes were updated to the new protocol, they'd not work -with the buggy git-annex versions, and avoid data loss. I'm leaving this bug -open for now to consider such a protocol change.. 
- -Weighing the costs and benefits of such a change, the extent of the data -loss is fairly limited. exporttree=yes remotes are always untrusted, so -if a file on one is overwritten with the wrong content, git-annex will be preserving -the right content elsewhere. And if the wrong file is retrieved from one, -git-annex will notice it has the wrong content (so long as its key uses a checksum). -Still, whatever that export is used for would unexpectedly have the wrong file -content, which could still be a bad day for somebody. - -(If I decide against the protocol change, I should fix -`doc/special_remotes/external/example.sh` to defend against the bug, -and tell others who have externals that support exporttree too..) +So, fixing this in your remote and others would need significant defensive +coding. Too much to make sense, IMHO. I think git-annex needs to chnge the +protocol in some way instead, to make it easy for you to avoid speaking to +a buggy git-annex. """]] diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_2_8ff0638891c38f423d792f5c3cbfb64f._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_2_8ff0638891c38f423d792f5c3cbfb64f._comment index 67e37d0fe9..ce766cb387 100644 --- a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_2_8ff0638891c38f423d792f5c3cbfb64f._comment +++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_2_8ff0638891c38f423d792f5c3cbfb64f._comment @@ -12,40 +12,5 @@ git-annex could either continue to also support VERSION 1, or it could refuse to work with externals that don't use VERSION 2. The latter would force externals to get updated. But then those externals would have no way to work with old git-annex even if they wanted to. I think forcing an update is not called -for. 
- -Note that git-annex's handling of an external that sends VERSION 2 is not -stellar currently: - - joey@darkstar:~/tmp/b>git-annex initremote t type=external externaltype=test encryption=none - initremote t - external special remote protocol error, unexpectedly received "CONFIG directory store data here" (command not allowed at this time) - - git-annex: unable to use special remote due to protocol error - - joey@darkstar:~/tmp/b>git-annex copy --to t - - external special remote protocol error, unexpectedly received "UNSUPPORTED-REQUEST" (command not allowed at this time) - - external special remote protocol error, unexpectedly received "UNSUPPORTED-REQUEST" (command not allowed at this time) - copy x - external special remote protocol error, unexpectedly received "UNSUPPORTED-REQUEST" (command not allowed at this time) - (unable to use special remote due to protocol error) failed - copy: 1 failed - -Protocol debug shows what's happening: - - [2023-03-28 16:00:09.361602472] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] --> VERSION 2 - [2023-03-28 16:00:09.361764519] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] <-- ERROR unsupported VERSION - [2023-03-28 16:00:09.361897595] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] <-- EXTENSIONS INFO GETGITREMOTENAME ASYNC - [2023-03-28 16:00:09.362112912] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] --> UNSUPPORTED-REQUEST - [2023-03-28 16:00:09.362212948] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] <-- PREPARE - [2023-03-28 16:00:09.362332494] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] --> UNSUPPORTED-REQUEST - -Not really ideal behavior for an old version of git-annex when used with -an external that wants to send VERSION 2 to ensure it does not need to deal with -this bug. But better than nothing I suppose. 
- -(That output really ought to be fixed going forward, but old versions of git-annex -of course can't be fixed now.) +for. So git-annex will keep supporting both versions. """]] diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_3_d774a7319bd1a1778adb2004229f057e._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_3_d774a7319bd1a1778adb2004229f057e._comment new file mode 100644 index 0000000000..5f8d9eef55 --- /dev/null +++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_3_d774a7319bd1a1778adb2004229f057e._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2023-03-28T20:51:40Z" + content=""" +I've implemented support for `VERSION 2`. Recommend any external special +remotes that support exporttree=yes use it to avoid talking with a buggy +git-annex version. + +(See [[todo/better_message_for_external_special_remote_protocol_mismatch]] +for related todo.) +"""]] diff --git a/doc/design/external_special_remote_protocol.mdwn b/doc/design/external_special_remote_protocol.mdwn index 3d2f0588ae..665687f5be 100644 --- a/doc/design/external_special_remote_protocol.mdwn +++ b/doc/design/external_special_remote_protocol.mdwn @@ -39,7 +39,7 @@ empty, but the separating spaces are still required in that case. The special remote is responsible for sending the first message, indicating the version of the protocol it is using. - VERSION 1 + VERSION 2 Recent versions of git-annex respond with a message indicating protocol extensions that it supports. Older versions of @@ -271,7 +271,7 @@ These messages may be sent by the special remote at any time that it's handling a request. * `VERSION Int` - Supported protocol version. Current version is 1. Must be sent first + Supported protocol version. Current version is 2. Must be sent first thing at startup, as until it sees this git-annex does not know how to talk with the special remote program! 
(git-annex does not send a reply to this message, but may give up if it @@ -428,6 +428,18 @@ remote. git-annex will not talk to it any further. If the program receives an ERROR from git-annex, it can exit with its own ERROR. +## protocol versions + +Currently git-annex supports `VERSION 1` and `VERSION 2`. +The two protocol versions are actually identical. + +Old versions of git-annex that supported only `VERSION 1` +had a bug in their implementation of the +part of the protocol documented in the[[export_and_import_appendix]]. +The bug could result in ontent being exported to the wrong file. +External special remotes that implement that should use `VERSION 2` to +avoid talking to the buggy old version of git-annex. + ## extensions These protocol extensions are currently supported. diff --git a/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn b/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn index 3d728eb8cf..d6823e7941 100644 --- a/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn (Diff truncated)
comment
diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_2_8ff0638891c38f423d792f5c3cbfb64f._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_2_8ff0638891c38f423d792f5c3cbfb64f._comment new file mode 100644 index 0000000000..67e37d0fe9 --- /dev/null +++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_2_8ff0638891c38f423d792f5c3cbfb64f._comment @@ -0,0 +1,51 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-03-28T19:29:01Z" + content=""" +What about simply increasing the protocol version number? If VERSION 2 is +the same as VERSION 1, but only supported by the fixed git-annex, then an +external can just be updated to send VERSION 2, and it does not need to +worry about talking with a buggy version of git-annex. + +git-annex could either continue to also support VERSION 1, or it could +refuse to work with externals that don't use VERSION 2. The latter would +force externals to get updated. But then those externals would have no way to work +with old git-annex even if they wanted to. I think forcing an update is not called +for. 
+ +Note that git-annex's handling of an external that sends VERSION 2 is not +stellar currently: + + joey@darkstar:~/tmp/b>git-annex initremote t type=external externaltype=test encryption=none + initremote t + external special remote protocol error, unexpectedly received "CONFIG directory store data here" (command not allowed at this time) + + git-annex: unable to use special remote due to protocol error + + joey@darkstar:~/tmp/b>git-annex copy --to t + + external special remote protocol error, unexpectedly received "UNSUPPORTED-REQUEST" (command not allowed at this time) + + external special remote protocol error, unexpectedly received "UNSUPPORTED-REQUEST" (command not allowed at this time) + copy x + external special remote protocol error, unexpectedly received "UNSUPPORTED-REQUEST" (command not allowed at this time) + (unable to use special remote due to protocol error) failed + copy: 1 failed + +Protocol debug shows what's happening: + + [2023-03-28 16:00:09.361602472] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] --> VERSION 2 + [2023-03-28 16:00:09.361764519] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] <-- ERROR unsupported VERSION + [2023-03-28 16:00:09.361897595] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] <-- EXTENSIONS INFO GETGITREMOTENAME ASYNC + [2023-03-28 16:00:09.362112912] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] --> UNSUPPORTED-REQUEST + [2023-03-28 16:00:09.362212948] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] <-- PREPARE + [2023-03-28 16:00:09.362332494] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-remote-test[3] --> UNSUPPORTED-REQUEST + +Not really ideal behavior for an old version of git-annex when used with +an external that wants to send VERSION 2 to ensure it does not need to deal with +this bug. But better than nothing I suppose. 
+ +(That output really ought to be fixed going forward, but old versions of git-annex +of course can't be fixed now.) +"""]]
fixed
diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment new file mode 100644 index 0000000000..036484a63b --- /dev/null +++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment @@ -0,0 +1,62 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-28T18:35:23Z" + content=""" +Looking at the code, handleRequestExport first uses +withExternalState to send EXPORT, and then it calls +handleRequestKey to send the following command. That uses +withExternalState a second time. + +withExternalState operates on a pool of processes, so in a race +(when using -J presumably, as your test case does), the two +calls to it can use different externals. And so the bug. + +Was easy to fix based on that analysis. I have not tried to reproduce it, +but am confident that I fixed it in +[[!commit 02662f52920e84cd9464641ada84f6c3bbe3f86a]] + +---- + +I'm looking forward to export support in rclone! + +I'm wondering about the data loss part of this. You said: + +> As a result, one process receives two `EXPORT`s in a row, the first of which it ignores, +> while some other process receives a `TRANSFEREXPORT` without a prior `EXPORT`, +> reusing whatever filename was set in the previous transaction, +> and either ovewriting the last accessed remote file with the wrong content, +> or retrieving the content of the last accessed remote file instead of the one git-annex wanted. + +So in your implementation, you're keeping EXPORT set after handling TRANSFEREXPORT +and similar commands, and so if you receive a TRANSFEREXPORT not prefixed by an +EXPORT, it can be bad. A natural way to write things, indeed +`doc/special_remotes/external/example.sh` does the same. 
+ +An alternative implementation would be to clear the EXPORT after handling a +TRANSFEREXPORT, CHECKPRESENTEXPORT, REMOVEEXPORT, or RENAMEEXPORT. +And have those error out if no EXPORT was received before them. + +So that's one way we can avoid the data loss problem if your external remote +is used with an old git-annex that has this bug. You might want to do that now. + +---- + +But, that requires defensive coding in every external remote... Maybe it +would be worth changing the protocol in some way, that avoids the problem. +Then if external remotes were updated to the new protocol, they'd not work +with the buggy git-annex versions, and avoid data loss. I'm leaving this bug +open for now to consider such a protocol change.. + +Weighing the costs and benefits of such a change, the extent of the data +loss is fairly limited. exporttree=yes remotes are always untrusted, so +if a file on one is overwritten with the wrong content, git-annex will be preserving +the right content elsewhere. And if the wrong file is retrieved from one, +git-annex will notice it has the wrong content (so long as its key uses a checksum). +Still, whatever that export is used for would unexpectedly have the wrong file +content, which could still be a bad day for somebody. + +(If I decide against the protocol change, I should fix +`doc/special_remotes/external/example.sh` to defend against the bug, +and tell others who have externals that support exporttree too..) +"""]]
clarify EXPORT
diff --git a/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn b/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn index b2f07bad0b..3d728eb8cf 100644 --- a/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn +++ b/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn @@ -33,14 +33,14 @@ a request, it can reply with `UNSUPPORTED-REQUEST`. * `EXPORTSUPPORTED-FAILURE` Indicates that it does not make sense to export to this special remote. * `EXPORT Name` - Comes immediately before each of the following requests, - specifying the name of the exported file. It will be in the form - of a relative path, and may contain path separators, whitespace, - and other special characters. + Comes immediately before each of the following requests (except for + `REMOVEEXPORTDIRECTORY`), specifying the name of the exported file. It + will be in the form of a relative path, and may contain path separators, + whitespace, and other special characters. No response is made to this message. * `TRANSFEREXPORT STORE|RETRIEVE Key File` Requests the transfer of a File on local disk to or from the previously - provided Name on the special remote. + provided `EXPORT` Name on the special remote. Note that it's important that, while a file is being stored, `CHECKPRESENTEXPORT` not indicate it's present until all the data has been transferred. @@ -52,7 +52,7 @@ a request, it can reply with `UNSUPPORTED-REQUEST`. * `TRANSFER-FAILURE STORE|RETRIEVE Key ErrorMsg` Indicates the transfer failed. * `CHECKPRESENTEXPORT Key` - Requests the remote to check if the previously provided Name is present + Requests the remote to check if the previously provided `EXPORT` Name is present in it. * `CHECKPRESENT-SUCCESS Key` Indicates that a content has been positively verified to be present in the @@ -65,7 +65,7 @@ a request, it can reply with `UNSUPPORTED-REQUEST`. present in the remote. 
(Perhaps the remote cannot be contacted.) * `REMOVEEXPORT Key` Requests the remote to remove content stored by `TRANSFEREXPORT` - with the previously provided Name. + with the previously provided `EXPORT` Name. * `REMOVE-SUCCESS Key` Indicates the content has been removed from the remote. May be returned when the content was already not present. @@ -87,7 +87,7 @@ a request, it can reply with `UNSUPPORTED-REQUEST`. Should not be returned if the directory did not exist. * `RENAMEEXPORT Key NewName` Requests the remote rename a file stored on it from the previously - provided Name to the NewName. Remotes that support exports but not + provided `EXPORT` Name to the NewName. Remotes that support exports but not renaming do not need to implement this. * `RENAMEEXPORT-SUCCESS Key` Indicates that a `RENAMEEXPORT` was done successfully.
comment
diff --git a/doc/bugs/Always_identical_UUIDs/comment_3_cdb2eec514bcf125b7f79c7be5ed810a._comment b/doc/bugs/Always_identical_UUIDs/comment_3_cdb2eec514bcf125b7f79c7be5ed810a._comment new file mode 100644 index 0000000000..40c2a1d16d --- /dev/null +++ b/doc/bugs/Always_identical_UUIDs/comment_3_cdb2eec514bcf125b7f79c7be5ed810a._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2023-03-28T18:26:52Z" + content=""" +If you're able to run ghci, here's how to generate a UUID the same way +git-annex does: + +ghci -e 'import Data.UUID.V4' -e nextRandom + +And here's how to get entropy the same way that does: + +ghci -e 'System.Entropy.getEntropy 16' +"""]]
comment
diff --git a/doc/bugs/Always_identical_UUIDs/comment_1_1223fe5e635d771848d7054f0492920f._comment b/doc/bugs/Always_identical_UUIDs/comment_1_1223fe5e635d771848d7054f0492920f._comment new file mode 100644 index 0000000000..6371df9f78 --- /dev/null +++ b/doc/bugs/Always_identical_UUIDs/comment_1_1223fe5e635d771848d7054f0492920f._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-28T18:09:26Z" + content=""" +What CPU is this happening with? + +git-annex generates uuids using <https://hackage.haskell.org/package/uuid> +and the V4 uuid uses <https://hackage.haskell.org/package/entropy>, +which in turn uses RDRAND when available. + +So if your CPU supports RDRAND, it seems the CPU must have a broken +random number generator! Or there's a bug in that software stack +somewhere. +"""]] diff --git a/doc/bugs/Always_identical_UUIDs/comment_2_6c44232cac847f6160f5bec7c8f1434b._comment b/doc/bugs/Always_identical_UUIDs/comment_2_6c44232cac847f6160f5bec7c8f1434b._comment new file mode 100644 index 0000000000..32839257cf --- /dev/null +++ b/doc/bugs/Always_identical_UUIDs/comment_2_6c44232cac847f6160f5bec7c8f1434b._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-03-28T18:25:30Z" + content=""" +> I also don't get anything from `git config --local annex.uuid` + +Do you mean you don't get anything different? Surely that outputs +something. Or exits with a nonzero exit code. +"""]]
comment
diff --git a/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_7_375b8234fc79d75f3cb39004cbe4d9e3._comment b/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_7_375b8234fc79d75f3cb39004cbe4d9e3._comment new file mode 100644 index 0000000000..c471d3286b --- /dev/null +++ b/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_7_375b8234fc79d75f3cb39004cbe4d9e3._comment @@ -0,0 +1,25 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2023-03-28T17:19:46Z" + content=""" +This is a filesystem bug. I reported it here: +<https://github.com/rfjakob/gocryptfs/issues/724> + +`git-annex add` makes a hard link to the file and then stats the hard link +before and after hashing. Due to this filesystem bug, it gets a different +size. + +One workaround is: + + git config annex.crippledfilesystem true + +Although setting annex.crippledfilesystem has other effects, including +git-annex not locking down permissions of annexed files. So I don't know if +I'd really recommend that workaround. + +It would be possible for git-annex to work around this by statting the file +before making the hard link, rather than statting the hardlink after +creation. But I don't think I want to work around filesystem breakage like +that. +"""]]
diff --git a/doc/bugs/Always_identical_UUIDs.mdwn b/doc/bugs/Always_identical_UUIDs.mdwn index 4380e1f21d..e742616583 100644 --- a/doc/bugs/Always_identical_UUIDs.mdwn +++ b/doc/bugs/Always_identical_UUIDs.mdwn @@ -40,3 +40,48 @@ I also don't get anything from ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) Git annex is great, I use it to manage 3TB+ of research data, and it's useful for manually keeping data in track and versioned + + + +### Please describe the problem. +If I do `git annex init`, I consistently get the same UUID's which messes up lots of remote transfers + +### What steps will reproduce the problem? +If I do `git annex init` on my system and then `git annex info`, it always shows +``` +trusted repositories: 0 +semitrusted repositories: 3 + 00000000-0000-0000-0000-000000000001 -- web + 00000000-0000-0000-0000-000000000002 -- bittorrent + 5df76b66-a137-4b11-bc65-316ee27d52b7 -- :CURRENTPATH [here] +``` +(where currentpath is the path of the git annex repo). It's always 5df + +### What version of git-annex are you using? On what operating system? 
+I used the downloaded version, as well as recompiled this morning +``` +git-annex version: 10.20230322-ga91f8070e7 +build flags: Assistant Webapp Pairing Inotify TorrentParser Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.0 ghc-8.10.7 http-client-0.7.9 persistent-sqlite-2.13.0.3 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.1.2 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 9 +``` + +### Please provide any additional information below. +I also don't get anything from +`git config --local annex.uuid` so tried to add a manual UUID, but it starts to feel overly hacky when dealing with critical data + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) +Git annex is great, I use it to manage 3TB+ of research data, and it's useful for manually keeping data in track and versioned
diff --git a/doc/bugs/Always_identical_UUIDs.mdwn b/doc/bugs/Always_identical_UUIDs.mdwn new file mode 100644 index 0000000000..4380e1f21d --- /dev/null +++ b/doc/bugs/Always_identical_UUIDs.mdwn @@ -0,0 +1,42 @@ +### Please describe the problem. +If I do `git annex init`, I consistently get the same UUID's which messes up lots of remote transfers + +### What steps will reproduce the problem? +If I do `git annex init` on my system and then `git annex info`, it always shows +``` +trusted repositories: 0 +semitrusted repositories: 3 + 00000000-0000-0000-0000-000000000001 -- web + 00000000-0000-0000-0000-000000000002 -- bittorrent + 5df76b66-a137-4b11-bc65-316ee27d52b7 -- :CURRENTPATH [here] +``` +(where currentpath is the path of the git annex repo). It's always 5df + +### What version of git-annex are you using? On what operating system? +I used the downloaded version, as well as recompiled this morning +``` +git-annex version: 10.20230322-ga91f8070e7 +build flags: Assistant Webapp Pairing Inotify TorrentParser Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.0 ghc-8.10.7 http-client-0.7.9 persistent-sqlite-2.13.0.3 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.1.2 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 
6 7 8 9 10 +local repository version: 9 +``` + +### Please provide any additional information below. +I also don't get anything from +`git config --local annex.uuid` so tried to add a manual UUID, but it starts to feel overly hacky when dealing with critical data + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) +Git annex is great, I use it to manage 3TB+ of research data, and it's useful for manually keeping data in track and versioned
comment
diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_6_f9ef22fcfabc6445060b964f08080c76._comment b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_6_f9ef22fcfabc6445060b964f08080c76._comment new file mode 100644 index 0000000000..b5155705f4 --- /dev/null +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_6_f9ef22fcfabc6445060b964f08080c76._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2023-03-28T17:11:53Z" + content=""" +Weird, I don't see how a network failure could lead in that direction. + +Anyway, [[!commit 038a2600f4cf71294976280c5c29f6710359375f]] +avoids it with a detached head. + +git-annex fsck can't do anything about this because the user may have their +own reasons to check out a detached head. +"""]]
Copy with a reflink when exporting a tree to a directory special remote
Remote.Directory makes a temp file, then calls tryCopyCoW, and since the temp
file exists, that prevented probing whether CoW works.
Note that deleting the empty file does mean there's a small window for a
race. If another process is also exporting to the remote, that could let it
make the same temp file. However, the temp filename actually has the
process's pid in it, which avoids that being a problem.
This may have been a regression caused by commits around
63d508e8855b2e61a725c906ea17d1c7f4a2e125, but I haven't gone back and
tested to be sure. The directory special remote had supposedly supported
CoW for this going back to about half a year before that.
Sponsored-by: Graham Spencer on Patreon
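The check described in this commit message can be sketched in shell. This is only an illustration of the logic in `destfilealreadypopulated` from the diff below, not the actual implementation; the function name `dest_already_populated` is made up for the example.

```shell
# Hypothetical helper mirroring the logic described above.
# Returns 0 (true) when dest already holds data, so CoW probing is skipped.
# An empty dest is removed first, so that a CoW copy can then be attempted.
dest_already_populated() {
    dest=$1
    [ -e "$dest" ] || return 1        # nothing there: safe to probe CoW
    [ -s "$dest" ] && return 0        # non-empty: keep it, do not probe
    rm -f -- "$dest" && return 1      # empty: remove it, then probe
    return 0                          # could not remove: treat as populated
}
```

A caller would attempt the reflink copy only when this function returns non-zero, and fall back to an ordinary copy otherwise.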
diff --git a/Annex/CopyFile.hs b/Annex/CopyFile.hs index 9fc8eafc53..0be9debd5f 100644 --- a/Annex/CopyFile.hs +++ b/Annex/CopyFile.hs @@ -31,17 +31,15 @@ newCopyCoWTried = CopyCoWTried <$> newEmptyMVar {- Copies a file is copy-on-write is supported. Otherwise, returns False. - - - The destination file must not exist yet, or it will fail to make a CoW copy, - - and will return false. + - The destination file must not exist yet (or may exist but be empty), + - or it will fail to make a CoW copy, and will return false. -} tryCopyCoW :: CopyCoWTried -> FilePath -> FilePath -> MeterUpdate -> IO Bool tryCopyCoW (CopyCoWTried copycowtried) src dest meterupdate = -- If multiple threads reach this at the same time, they -- will both try CoW, which is acceptable. ifM (isEmptyMVar copycowtried) - -- If dest exists, don't try CoW, since it would - -- have to be deleted first. - ( ifM (doesFileExist dest) + ( ifM destfilealreadypopulated ( return False , do ok <- docopycow @@ -61,6 +59,22 @@ tryCopyCoW (CopyCoWTried copycowtried) src dest meterupdate = where docopycow = watchFileSize dest meterupdate $ copyCoW CopyTimeStamps src dest + + dest' = toRawFilePath dest + + -- Check if the dest file already exists, which would prevent + -- probing CoW. If the file exists but is empty, there's no benefit + -- to resuming from it when CoW does not work, so remove it. + destfilealreadypopulated = + tryIO (R.getFileStatus dest') >>= \case + Left _ -> return False + Right st -> do + sz <- getFileSize' dest' st + if sz == 0 + then tryIO (removeFile dest) >>= \case + Right () -> return False + Left _ -> return True + else return True data CopyMethod = CopiedCoW | Copied diff --git a/CHANGELOG b/CHANGELOG index bc5022724c..afa1699777 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -9,6 +9,7 @@ git-annex (10.20230322) UNRELEASED; urgency=medium the view branch. * Windows: Support urls like "file:///c:/path" * addurl, importfeed: Fix failure when annex.securehashesonly is set. 
+ * Copy with a reflink when exporting a tree to a directory special remote. -- Joey Hess <id@joeyh.name> Thu, 23 Mar 2023 15:04:41 -0400 diff --git a/doc/bugs/use_reflink_to_export_to_directory_special_remote.mdwn b/doc/bugs/use_reflink_to_export_to_directory_special_remote.mdwn index 759c751235..e716fceb22 100644 --- a/doc/bugs/use_reflink_to_export_to_directory_special_remote.mdwn +++ b/doc/bugs/use_reflink_to_export_to_directory_special_remote.mdwn @@ -40,3 +40,5 @@ After running, a `btrfs filesystem du ...` tells me that the `repo` and `import` --- In general: Thank you so much for this wonderful piece of software, I'm using it for years now and manage virtually everything with it (audio, video, pictures, important documents, papers, …). + +> [[fixed|done]] --[[Joey]]
comment
diff --git a/doc/todo/wishlist__58___Option_to_specify_max_transfer_rate/comment_5_428d638e1f41d84df7e6c51675fe65da._comment b/doc/todo/wishlist__58___Option_to_specify_max_transfer_rate/comment_5_428d638e1f41d84df7e6c51675fe65da._comment new file mode 100644 index 0000000000..232eda99e0 --- /dev/null +++ b/doc/todo/wishlist__58___Option_to_specify_max_transfer_rate/comment_5_428d638e1f41d84df7e6c51675fe65da._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2023-03-28T16:10:43Z" + content=""" +annex.bwlimit is the limit for each transfer, it doesn't take into account +how many transfers might be running whether by --jobs or multiple git-annex +processes. +"""]]
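Since annex.bwlimit applies to each transfer separately, a rough upper bound on total bandwidth is the limit multiplied by the number of concurrent transfers. A small sketch of that arithmetic, using the numbers from the question this comment answers (30 MiB/s, `--jobs 3`); the function name is made up, and it assumes every transfer saturates its own limit:

```shell
# Hypothetical helper: worst-case aggregate rate when the limit is per transfer.
# $1: per-transfer limit in MiB/s, $2: number of concurrent transfers
aggregate_limit() {
    echo $(( $1 * $2 ))
}

aggregate_limit 30 3   # prints 90
```

Transfers started by other git-annex processes would add to this total in the same way.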
Added a comment: annex.bwlimit and jobs
diff --git a/doc/todo/wishlist__58___Option_to_specify_max_transfer_rate/comment_4_53c43e480a37b20bc8fed9d38527651b._comment b/doc/todo/wishlist__58___Option_to_specify_max_transfer_rate/comment_4_53c43e480a37b20bc8fed9d38527651b._comment new file mode 100644 index 0000000000..9b2cb64bf7 --- /dev/null +++ b/doc/todo/wishlist__58___Option_to_specify_max_transfer_rate/comment_4_53c43e480a37b20bc8fed9d38527651b._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="gerzoyayde@85d55f6dec266753698f42d1c8e06917ef6674a3" + nickname="gerzoyayde" + avatar="http://cdn.libravatar.org/avatar/0866019bf4d6741b2f4a2c288356513f" + subject="annex.bwlimit and jobs" + date="2023-03-28T00:00:43Z" + content=""" +Does annex.bwlimit limit across multiple jobs? E.g. I have a annex.bwlimit of \"30MiB\" and run a copy with --jobs 3, will the maximum possible bandwidth be 30MB/s or 90MB/s? +"""]]
addurl, importfeed: Fix failure when annex.securehashesonly is set
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.
Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.
Sponsored-by: unqueued on Patreon
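The decision described in this commit message can be modelled roughly as follows. This is a simplified sketch, not the Haskell code from the diff: both function names are invented, and the list of key varieties treated as cryptographically secure is an assumption made for the example (SHA2, SHA3, SKEIN and BLAKE2 families counted as secure; SHA1, MD5, WORM and URL not).

```shell
# Assumed classification, for illustration only.
is_secure_variety() {
    case "$1" in
        SHA2*|SHA3*|SHA5*|SKEIN*|BLAKE2*) return 0 ;;
        *) return 1 ;;
    esac
}

# $1: variety of the key being transferred (e.g. the temporary URL key)
# $2: variety of the backend that will produce the final key, or empty
# $3: value of annex.securehashesonly ("true" or "false")
transfer_allowed() {
    checked=${2:-$1}                  # prefer the eventual backend when given
    if is_secure_variety "$checked"; then
        return 0
    fi
    [ "$3" != "true" ]                # insecure: allowed only if the setting is off
}
```

With the eventual backend passed in, `transfer_allowed URL SHA256E true` succeeds, whereas `transfer_allowed URL "" true` fails, which corresponds to the failure reported in the bug.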
diff --git a/Annex/Ingest.hs b/Annex/Ingest.hs index 1dcbb2f6a6..e7e5ec3f32 100644 --- a/Annex/Ingest.hs +++ b/Annex/Ingest.hs @@ -178,7 +178,7 @@ ingest' preferredbackend meterupdate (Just (LockedDown cfg source)) mk restage = Nothing -> do backend <- maybe (chooseBackend $ keyFilename source) - (return . Just) + return preferredbackend fst <$> genKey source meterupdate backend Just k -> return k diff --git a/Annex/Transfer.hs b/Annex/Transfer.hs index 5ead81b655..22801a9619 100644 --- a/Annex/Transfer.hs +++ b/Annex/Transfer.hs @@ -1,6 +1,6 @@ {- git-annex transfers - - - Copyright 2012-2021 Joey Hess <id@joeyh.name> + - Copyright 2012-2023 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -34,6 +34,7 @@ import Utility.ThreadScheduler import Annex.LockPool import Types.Key import qualified Types.Remote as Remote +import qualified Types.Backend import Types.Concurrency import Annex.Concurrent import Types.WorkerPool @@ -64,11 +65,11 @@ upload r key f d witness = -- Upload, not supporting canceling detected stalls upload' :: Observable v => UUID -> Key -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> NotifyWitness -> Annex v upload' u key f sd d a _witness = guardHaveUUID u $ - runTransfer (Transfer Upload u (fromKey id key)) f sd d a + runTransfer (Transfer Upload u (fromKey id key)) Nothing f sd d a alwaysUpload :: Observable v => UUID -> Key -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> NotifyWitness -> Annex v alwaysUpload u key f sd d a _witness = guardHaveUUID u $ - alwaysRunTransfer (Transfer Upload u (fromKey id key)) f sd d a + alwaysRunTransfer (Transfer Upload u (fromKey id key)) Nothing f sd d a -- Download, supporting canceling detected stalls. download :: Remote -> Key -> AssociatedFile -> RetryDecider -> NotifyWitness -> Annex Bool @@ -87,7 +88,7 @@ download r key f d witness = -- Download, not supporting canceling detected stalls. 
download' :: Observable v => UUID -> Key -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> NotifyWitness -> Annex v download' u key f sd d a _witness = guardHaveUUID u $ - runTransfer (Transfer Download u (fromKey id key)) f sd d a + runTransfer (Transfer Download u (fromKey id key)) Nothing f sd d a guardHaveUUID :: Observable v => UUID -> Annex v -> Annex v guardHaveUUID u a @@ -109,20 +110,20 @@ guardHaveUUID u a - Cannot cancel stalls, but when a likely stall is detected, - suggests to the user that they enable stall detection handling. -} -runTransfer :: Observable v => Transfer -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> Annex v +runTransfer :: Observable v => Transfer -> Maybe Backend -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> Annex v runTransfer = runTransfer' False {- Like runTransfer, but ignores any existing transfer lock file for the - transfer, allowing re-running a transfer that is already in progress. 
-} -alwaysRunTransfer :: Observable v => Transfer -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> Annex v +alwaysRunTransfer :: Observable v => Transfer -> Maybe Backend -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> Annex v alwaysRunTransfer = runTransfer' True -runTransfer' :: Observable v => Bool -> Transfer -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> Annex v -runTransfer' ignorelock t afile stalldetection retrydecider transferaction = +runTransfer' :: Observable v => Bool -> Transfer -> Maybe Backend -> AssociatedFile -> Maybe StallDetection -> RetryDecider -> (MeterUpdate -> Annex v) -> Annex v +runTransfer' ignorelock t eventualbackend afile stalldetection retrydecider transferaction = enteringStage (TransferStage (transferDirection t)) $ debugLocks $ - preCheckSecureHashes (transferKey t) go + preCheckSecureHashes (transferKey t) eventualbackend go where go = do info <- liftIO $ startTransferInfo afile @@ -244,7 +245,7 @@ runTransferrer -> NotifyWitness -> Annex Bool runTransferrer sd r k afile retrydecider direction _witness = - enteringStage (TransferStage direction) $ preCheckSecureHashes k $ do + enteringStage (TransferStage direction) $ preCheckSecureHashes k Nothing $ do info <- liftIO $ startTransferInfo afile go 0 info where @@ -271,18 +272,25 @@ runTransferrer sd r k afile retrydecider direction _witness = - still contains content using an insecure hash, remotes will likewise - tend to be configured to reject it, so Upload is also prevented. 
-} -preCheckSecureHashes :: Observable v => Key -> Annex v -> Annex v -preCheckSecureHashes k a = ifM (isCryptographicallySecure k) - ( a - , ifM (annexSecureHashesOnly <$> Annex.getGitConfig) - ( do - warning $ "annex.securehashesonly blocked transfer of " ++ decodeBS (formatKeyVariety variety) ++ " key" - return observeFailure - , a - ) - ) +preCheckSecureHashes :: Observable v => Key -> Maybe Backend -> Annex v -> Annex v +preCheckSecureHashes k meventualbackend a = case meventualbackend of + Just eventualbackend -> go + (pure (Types.Backend.isCryptographicallySecure eventualbackend)) + (Types.Backend.backendVariety eventualbackend) + Nothing -> go + (isCryptographicallySecure k) + (fromKey keyVariety k) where - variety = fromKey keyVariety k + go checksecure variety = ifM checksecure + ( a + , ifM (annexSecureHashesOnly <$> Annex.getGitConfig) + ( blocked variety + , a + ) + ) + blocked variety = do + warning $ "annex.securehashesonly blocked transfer of " ++ decodeBS (formatKeyVariety variety) ++ " key" + return observeFailure type NumRetries = Integer diff --git a/Backend.hs b/Backend.hs index d3eb9414dd..ba5c6f3650 100644 --- a/Backend.hs +++ b/Backend.hs @@ -54,15 +54,13 @@ defaultBackend = maybe cache return =<< Annex.getState Annex.backend lookupname = lookupBackendVariety . parseKeyVariety . encodeBS {- Generates a key for a file. 
-} -genKey :: KeySource -> MeterUpdate -> Maybe Backend -> Annex (Key, Backend) -genKey source meterupdate preferredbackend = do - b <- maybe defaultBackend return preferredbackend - case B.genKey b of - Just a -> do - k <- a source meterupdate - return (k, b) - Nothing -> giveup $ "Cannot generate a key for backend " ++ - decodeBS (formatKeyVariety (B.backendVariety b)) +genKey :: KeySource -> MeterUpdate -> Backend -> Annex (Key, Backend) +genKey source meterupdate b = case B.genKey b of + Just a -> do + k <- a source meterupdate + return (k, b) + Nothing -> giveup $ "Cannot generate a key for backend " ++ + decodeBS (formatKeyVariety (B.backendVariety b)) getBackend :: FilePath -> Key -> Annex (Maybe Backend) getBackend file k = maybeLookupBackendVariety (fromKey keyVariety k) >>= \case @@ -78,12 +76,16 @@ unknownBackendVarietyMessage v = {- Looks up the backend that should be used for a file. - That can be configured on a per-file basis in the gitattributes file, - or forced with --backend. -} -chooseBackend :: RawFilePath -> Annex (Maybe Backend) +chooseBackend :: RawFilePath -> Annex Backend chooseBackend f = Annex.getRead Annex.forcebackend >>= go where - go Nothing = maybeLookupBackendVariety . parseKeyVariety . encodeBS - =<< checkAttr "annex.backend" f - go (Just _) = Just <$> defaultBackend + go Nothing = do + mb <- maybeLookupBackendVariety . parseKeyVariety . encodeBS + =<< checkAttr "annex.backend" f + case mb of + Just b -> return b + Nothing -> defaultBackend + go (Just _) = defaultBackend {- Looks up a backend by variety. May fail if unsupported or disabled. 
-} lookupBackendVariety :: KeyVariety -> Annex Backend @@ -111,5 +113,5 @@ isStableKey k = maybe False (`B.isStableKey` k) <$> maybeLookupBackendVariety (fromKey keyVariety k) isCryptographicallySecure :: Key -> Annex Bool -isCryptographicallySecure k = maybe False (`B.isCryptographicallySecure` k) +isCryptographicallySecure k = maybe False B.isCryptographicallySecure <$> maybeLookupBackendVariety (fromKey keyVariety k) diff --git a/CHANGELOG b/CHANGELOG index a38d297c5c..bc5022724c 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -8,6 +8,7 @@ git-annex (10.20230322) UNRELEASED; urgency=medium * view: Support annex.maxextensionlength when generating filenames for the view branch. * Windows: Support urls like "file:///c:/path" + * addurl, importfeed: Fix failure when annex.securehashesonly is set. -- Joey Hess <id@joeyh.name> Thu, 23 Mar 2023 15:04:41 -0400 diff --git a/Command/AddUrl.hs b/Command/AddUrl.hs index 836854d1b3..a65522d1ac 100644 --- a/Command/AddUrl.hs +++ b/Command/AddUrl.hs @@ -323,28 +323,28 @@ addUrlFile addunlockedmatcher o url urlinfo file = (Diff truncated)
promote comment to bug
diff --git a/doc/bugs/securehashesonly_conflicts_with_addurl.mdwn b/doc/bugs/securehashesonly_conflicts_with_addurl.mdwn new file mode 100644 index 0000000000..d98d545bae --- /dev/null +++ b/doc/bugs/securehashesonly_conflicts_with_addurl.mdwn @@ -0,0 +1,15 @@ +Turning on `securehashesonly` seems to disable the `addurl` command: + +```console +% git config --get annex.securehashesonly +true +% git annex addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html +addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html + annex.securehashesonly blocked transfer of URL key +failed +addurl: 1 failed +% git annex addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html --relaxed +addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html (to www.gutenberg.org_cache_epub_2591_pg2591-images.html) ok +(recording state in git...) +% ls -l www.gutenberg.org_cache_epub_2591_pg2591-images.html +www.gutenberg.org_cache_epub_2591_pg2591-images.html -> .git/annex/objects/gg/kG/URL--https&c%%www.gutenberg.org%cache%epub%2591%pg2591-images.html/URL--https&c%%www.gutenberg.org%cache%epub%2591%pg2591-images.html diff --git a/doc/bugs/securehashesonly_conflicts_with_addurl/comment_1_808e276381f98b4ebb7d61628db703d9._comment b/doc/bugs/securehashesonly_conflicts_with_addurl/comment_1_808e276381f98b4ebb7d61628db703d9._comment new file mode 100644 index 0000000000..bf5ae10626 --- /dev/null +++ b/doc/bugs/securehashesonly_conflicts_with_addurl/comment_1_808e276381f98b4ebb7d61628db703d9._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-27T18:03:11Z" + content=""" +I think that git-annex addurl without --fast or --relaxed should +work with annex.securehashesonly set. (Unless annex.backend is set +to something it prevents adding.) It is currently failing because +of a temporary URL key that never reaches the repository. So the check +should be disabled for that. 
+ +Arguably with --relaxed (or --fast) it should also fail. However, +that does not actually download the content, so it does not technically +add content to the repository that is not using a cryptographically +signed hash. That's how it manages to skate by without failing. +Of course git-annex get will later fail, git-annex fsck +will complain, etc. +"""]] diff --git a/doc/git-annex-addurl/comment_16_c6e1743647bb4d45b5a1b237f53d77a6._comment b/doc/git-annex-addurl/comment_16_c6e1743647bb4d45b5a1b237f53d77a6._comment new file mode 100644 index 0000000000..78b1fb9b8d --- /dev/null +++ b/doc/git-annex-addurl/comment_16_c6e1743647bb4d45b5a1b237f53d77a6._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: securehashesonly conflicts with addurl""" + date="2023-03-27T17:59:35Z" + content=""" +Opened a bug report: [[bugs/securehashesonly_conflicts_with_addurl]] +"""]]
verified fixed
diff --git a/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows.mdwn b/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows.mdwn index 7451f055a8..f499338e27 100644 --- a/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows.mdwn +++ b/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows.mdwn @@ -64,3 +64,5 @@ Awhile back we [had related discussion](https://git-annex.branchable.com/bugs/gi [[!meta author=yoh]] [[!tag projects/repronim]] + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_2_ff71a541b3df8bb272e576c856b3aa9d._comment b/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_2_ff71a541b3df8bb272e576c856b3aa9d._comment new file mode 100644 index 0000000000..2c216e7e40 --- /dev/null +++ b/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_2_ff71a541b3df8bb272e576c856b3aa9d._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-03-27T17:57:12Z" + content=""" +Ok, put in an ugly hack to fix this. +"""]] diff --git a/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows.mdwn b/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows.mdwn index ada6621f9b..6f46c6f6ea 100644 --- a/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows.mdwn +++ b/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows.mdwn @@ -27,3 +27,5 @@ windows # End of transcript or log. 
"""]] + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_5_e00d70ba6b88d9cf60fcb183f7ea4980._comment b/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_5_e00d70ba6b88d9cf60fcb183f7ea4980._comment new file mode 100644 index 0000000000..96e0489914 --- /dev/null +++ b/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_5_e00d70ba6b88d9cf60fcb183f7ea4980._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2023-03-27T17:57:52Z" + content=""" +Ok, put in that ugly fix. +"""]]
comment
diff --git a/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_1_c4cfa1d0f90193b127722711285e1210._comment b/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_1_c4cfa1d0f90193b127722711285e1210._comment new file mode 100644 index 0000000000..2b2521f555 --- /dev/null +++ b/doc/bugs/Unable_to_addurl_file__58____47____47____47___on_Windows/comment_1_c4cfa1d0f90193b127722711285e1210._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-27T16:27:00Z" + content=""" +I tried this on windows, and the second command succeeds now. + +The first command still fails as shown. + +At this point, what's left of this bug seems to be the same as +[[bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows]]. +"""]]
diff --git a/doc/bugs/use_reflink_to_export_to_directory_special_remote.mdwn b/doc/bugs/use_reflink_to_export_to_directory_special_remote.mdwn new file mode 100644 index 0000000000..759c751235 --- /dev/null +++ b/doc/bugs/use_reflink_to_export_to_directory_special_remote.mdwn @@ -0,0 +1,42 @@ +I'm using version `10.20230321-gb624394c7` on Arch Linux and BTRFS. +I'd love to have the directory special remote using `cp --reflink=auto` to export to a directory on the same file system. + +My use case is using syncthing as a way to export (parts of my repo) to my phone. +I'm deliberately not using `adb` as this needs my phone to be connected to the computer somehow. +I basically want to do what's described on [syncthing special remote](/todo/syncthing_special_remote/#index2h2) under “Copying objects”. + +The main drawback, as also noted in the linked document, is that all files are duplicated. +As I'm using BTRFS it would really help if the `export` were to use `cp --reflink=auto` instead of its own copying mechanism. +I've read [here](/forum/Use_reflinks_on_BTRFS_instead_of_symlinks___63__/) that you suggested (albeit 7 years ago) using a shared clone instead. +However, this does not work for me as syncthing does not synchronize symlinks to Android (they are ignored even if they point to something). +(Trying with a shared repo and `adjust --unlock-present` gives merge problems when dropping files from the directory.) + +Nearly two years ago, you wrote [here](/todo/import_from_directory_does_not_use_cp_--reflink__63___/) that you “Implemented CoW for directory special remote, comprehensively”. +It seems to me that this is only true for importing files which, as I checked, actually uses cow. + +Since I want to manage relatively many files of decent size (music, audio books, pictures, videos / movies) it would be much better if the tree export would also use reflink. 
+Maybe having this as an option, for the cost of not having a nice progress bar, would be something worth considering. + +A test script to show that an export does indeed use cow is the following: + +```bash +mkdir export import +git init repo +cd repo +git annex init +git annex initremote import type=directory directory=../import encryption=none importtree=yes +git annex initremote export type=directory directory=../export encryption=none exporttree=yes +git config remote.import.annex-tracking-branch main +git config remote.export.annex-tracking-branch main + +for i in {1..100}; do + dd if=/dev/urandom bs=1M count=10 of=../import/file$i +done + +git annex sync --content +``` +After running, a `btrfs filesystem du ...` tells me that the `repo` and `import` files point to the same file, whereas `export` does not. + +--- + +In general: Thank you so much for this wonderful piece of software, I'm using it for years now and manage virtually everything with it (audio, video, pictures, important documents, papers, …).
diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn b/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn index 666e63f0a7..15ce927e63 100644 --- a/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn +++ b/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn @@ -24,7 +24,7 @@ Here's an example test run when it fails [on github](https://github.com/Wolf480pl/git-annex-remote-rclone/actions/runs/4525558194/jobs/7970140554) and the test code - [on github](https://github.com/Wolf480pl/git-annex-remote-rclone/blob/bd1a497fca8614286ec290bb557c83442c0e23c9/tests/all-in-one.sh#L224) -and as [[an attachment|all-in-one.sh]] +(I'd post it as an attachment but I'm getting an error that only admin can upload attachments). ### What version of git-annex are you using? On what operating system? @@ -33,9 +33,8 @@ OS: macOS 12.6.3 21G419 ### Please provide any additional information below. -Full log is quite long so I put it in [[an attachment|annex-rclone.log]] -and also on github gists in case anyone prefers that: -https://gist.github.com/Wolf480pl/4bdc83e23154827aad46e84bad631419 +Full log is quite long so I put it in a [github gist](https://gist.github.com/Wolf480pl/4bdc83e23154827aad46e84bad631419). +I wanted to upload it as an attachment but apparently I'm not allowed. Here are the interesting parts:
diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn b/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn new file mode 100644 index 0000000000..666e63f0a7 --- /dev/null +++ b/doc/bugs/external_remote_export_sent_to_wrong_process.mdwn @@ -0,0 +1,87 @@ +### Please describe the problem. + +When getting a lot of objects from an external special remote of the exporttree kind, +sometimes the `EXPORT` command is sent to a different process of the external remote +than the following `TRANSFEREXPORT` command. + +As a result, one process receives two `EXPORT`s in a row, the first of which it ignores, +while some other process receives a `TRANSFEREXPORT` without a prior `EXPORT`, +reusing whatever filename was set in the previous transaction, +and either ovewriting the last accessed remote file with the wrong content, +or retrieving the content of the last accessed remote file instead of the one git-annex wanted. + +Or I misunderstood the protocol - please tell me if that's the case. + +### What steps will reproduce the problem? + +I don't have a minimal reproducing example. + +I was working on adding tree export support to git-annex-remote-rclone, +and its the testsuite on my branch can reproduce the problem around half of the time, but only on MacOS. +It seems like it's a race condition and the interleaving that triggers it doesn't happen on Linux + +Here's an example test run when it fails +[on github](https://github.com/Wolf480pl/git-annex-remote-rclone/actions/runs/4525558194/jobs/7970140554) + +and the test code - [on github](https://github.com/Wolf480pl/git-annex-remote-rclone/blob/bd1a497fca8614286ec290bb557c83442c0e23c9/tests/all-in-one.sh#L224) +and as [[an attachment|all-in-one.sh]] + +### What version of git-annex are you using? On what operating system? + +git-annex version: 10.20230227 +OS: macOS 12.6.3 21G419 + +### Please provide any additional information below. 
+ +Full log is quite long so I put it in [[an attachment|annex-rclone.log]] +and also on github gists in case anyone prefers that: +https://gist.github.com/Wolf480pl/4bdc83e23154827aad46e84bad631419 + +Here are the interesting parts: + +First, during the tree export, we see an example of correct upload of the `test 9.dat` and `test 10.dat` files: +[[!format txt """ +2023-03-26T16:34:52.2551390Z [2023-03-26 16:34:52.085377] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[3] <-- EXPORT test 9.dat +2023-03-26T16:34:52.2653060Z [2023-03-26 16:34:52.086085] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[3] <-- TRANSFEREXPORT STORE SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat .git/annex/objects/zZ/kw/SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat/SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat +[...] 
+2023-03-26T16:34:49.9382650Z [2023-03-26 16:34:49.629684] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[3] <-- EXPORT test 10.dat +2023-03-26T16:34:49.9384770Z [2023-03-26 16:34:49.629761] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[3] <-- TRANSFEREXPORT STORE SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat .git/annex/objects/6k/X3/SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat/SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat +"""]] + +so we know that key of `test 9.dat` is `a30e...973.dat`, and for `test 10.dat` it's `d56...0b0.dat` + +Then, during the get, we see the bug happen: +[[!format txt """ +2023-03-26T16:34:55.4392510Z [2023-03-26 16:34:54.534833] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[5] --> TRANSFER-SUCCESS RETRIEVE SHA256E-s8--fbb0a327e1528cdb3214abb2ec3fb1dd97cc39dfc12751df102eca019f602e73.dat +2023-03-26T16:34:55.4394480Z [2023-03-26 16:34:54.53532] (Annex.Perms) freezing content .git/annex/objects/45/Gp/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat +2023-03-26T16:34:55.4395900Z [2023-03-26 16:34:54.545524] (Annex.Branch) read b51/a3b/SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat.log +2023-03-26T16:34:55.4397280Z [2023-03-26 16:34:54.545836] (Annex.Perms) freezing content directory .git/annex/objects/45/Gp/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat +2023-03-26T16:34:55.4398630Z [2023-03-26 16:34:54.54609] (Annex.Branch) read 253/975/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat.log 
+2023-03-26T16:34:55.4399950Z [2023-03-26 16:34:54.5477] (Annex.Branch) set 253/975/SHA256E-s8--91d9183c1a8f61526ed68c5357d52a719481ccbba5039f815cf8f71ae4edbf24.dat.log +2023-03-26T16:34:55.4401900Z [2023-03-26 16:34:54.558828] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[2] --> TRANSFER-SUCCESS RETRIEVE SHA256E-s7--b163189df1a08d63ba85e1566799cf07ca2de3865ac97a85b2b2d0cbfbd9a2f3.dat +2023-03-26T16:34:55.4403360Z [2023-03-26 16:34:54.559391] (Annex.Branch) read c9a/21e/SHA256E-s8--128888bb8975fc51a7fd410322b088593e158b37a24973483da2ad17fb6d7ff5.dat.log +2023-03-26T16:34:55.4405510Z [2023-03-26 16:34:54.561866] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[2] <-- EXPORT test 9.dat +2023-03-26T16:34:55.4407810Z [2023-03-26 16:34:54.561942] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[2] <-- EXPORT test 10.dat +2023-03-26T16:34:55.4409480Z [2023-03-26 16:34:54.561946] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[5] <-- TRANSFEREXPORT RETRIEVE SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat .git/annex/tmp/SHA256E-s7--a30e9d26ce633b40da3ba8cd8806b9b349cb6c3de6816c6d64f85a571012a973.dat +2023-03-26T16:34:55.4411390Z [2023-03-26 16:34:54.562032] (Annex.ExternalAddonProcess) /Users/runner/work/git-annex-remote-rclone/git-annex-remote-rclone/git-annex-remote-rclone[2] <-- TRANSFEREXPORT RETRIEVE SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat .git/annex/tmp/SHA256E-s8--d5624694cf1515bdf8c3a648ae35d64f4bdff934800a22a070c6d0baddc120b0.dat +"""]] + +As you can see in the last 4 lines: + + * process 2 got `EXPORT test 9.dat` and then immediately `EXPORT test 10.dat` + * then process 5 got the `TRANSFEREXPORT` command 
for the `test 9.dat` file (`a30e...973.dat`) + * process 5 did not, prior to this transfer, receive any `EXPORT` command since its last transfer of an unrelated file (see `TRANSFER-SUCCESS` in the first quoted line). + * so it was still holding the old filename from that unrelated transfer + * then, process 2 receives a (correct) `TRANSFEREXPORT` for the `test 10.dat` file (`d56...0b0.dat`). + +It appears that the `EXPORT test 9.dat` should've been sent to process 5 instead of process 2. +But since it wasn't, process 5 retrieved the wrong file into what git-annex expected to be `test 9.dat`. + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I really wanted to use git annex for my media storage, it seems like the project I've been looking for for years. + +But I first needed to be able to exporttree over FTP. +(I have a cheap NAS that is terribly slow at encryption (SSH, etc) so I'm using FTP to get reasonable speeds with it.) +So I thought I'd implement it and then hit this bug. +
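The mismatch described in this report can be looked for mechanically in a `--debug` log. The following is only a rough sketch, assuming the log has one message per line in the `name[N] <-- COMMAND` shape quoted above; `check_export_pairing` is a hypothetical helper name, not part of git-annex, and it only looks at `TRANSFEREXPORT` (other export-related requests are ignored).

```shell
#!/bin/sh
# Hypothetical helper: read a git-annex --debug log on stdin and report every
# TRANSFEREXPORT sent to an external remote process that was not preceded by
# an EXPORT sent to that same process since its previous TRANSFEREXPORT.
check_export_pairing() {
    awk '
        # "[N] <-- EXPORT name": remember that process N now has a name set.
        match($0, /\[[0-9]+\] <-- EXPORT /) {
            pid = substr($0, RSTART + 1); sub(/\].*/, "", pid)
            pending[pid] = 1
            next
        }
        # "[N] <-- TRANSFEREXPORT ...": complain if process N has no fresh name.
        match($0, /\[[0-9]+\] <-- TRANSFEREXPORT /) {
            pid = substr($0, RSTART + 1); sub(/\].*/, "", pid)
            if (!pending[pid])
                print "process " pid ": TRANSFEREXPORT without preceding EXPORT"
            pending[pid] = 0
        }
    '
}
```

Usage would be along the lines of `check_export_pairing < annex-debug.log` (the log file name is made up). On the four lines quoted in the report it should flag process 5 and stay silent about process 2.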
Added a comment
diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_5_7b989f777b211f21ca0e9ca681869bff._comment b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_5_7b989f777b211f21ca0e9ca681869bff._comment new file mode 100644 index 0000000000..9e7aba1e28 --- /dev/null +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_5_7b989f777b211f21ca0e9ca681869bff._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="gioele@678b7c03f524f2669b179b603f65352fcc16774e" + nickname="gioele" + avatar="http://cdn.libravatar.org/avatar/366dbda84e78aff8a8a070622aeb63ce" + subject="comment 5" + date="2023-03-25T08:47:01Z" + content=""" +> I wonder if there are other situations where modifications can prevent checkout of the updated adjusted branch? + +I clearly remember seeing something similar when a `sync` failed due to a non-recoverable network failure. Unfortunately I have no logs from that day. + +> It seems wise for git-annex to defend against it in depth, by making sure no crash can leave it in detached head state. + +Perhaps also extend `fsck` to automatically recover from such a state? Or at least suggest a couple of possible solutions to the user? +"""]]
Added a comment: securehashesonly conflicts with addurl
diff --git a/doc/git-annex-addurl/comment_15_460d474cb8ef32d41eae71ee070de0b3._comment b/doc/git-annex-addurl/comment_15_460d474cb8ef32d41eae71ee070de0b3._comment new file mode 100644 index 0000000000..1c8f523eb5 --- /dev/null +++ b/doc/git-annex-addurl/comment_15_460d474cb8ef32d41eae71ee070de0b3._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="jt" + avatar="http://cdn.libravatar.org/avatar/920cb5f2bc783736e679d046aa7cf987" + subject="securehashesonly conflicts with addurl" + date="2023-03-25T03:22:50Z" + content=""" +Turning on `securehashesonly` seems to disable the `addurl` command: + +```console +% git config --get annex.securehashesonly +true +% git annex addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html +addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html + annex.securehashesonly blocked transfer of URL key +failed +addurl: 1 failed +% git annex addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html --relaxed +addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html (to www.gutenberg.org_cache_epub_2591_pg2591-images.html) ok +(recording state in git...) +% ls -l www.gutenberg.org_cache_epub_2591_pg2591-images.html +www.gutenberg.org_cache_epub_2591_pg2591-images.html -> .git/annex/objects/gg/kG/URL--https&c%%www.gutenberg.org%cache%epub%2591%pg2591-images.html/URL--https&c%%www.gutenberg.org%cache%epub%2591%pg2591-images.html +``` + +Does this have something to do with the URL prefix that the annex object has? + + +"""]]
annex.maxextensionlength for view
view: Support annex.maxextensionlength when generating filenames for the
view branch.
Note that refining an existing view will reuse the extension length that was
configured when initially constructing the view. This is necessarily the case
because it reuses the filenames.
Also view files used to have all extensions at the end, no matter how
many there were. Since annex.maxextensionlength's documentation includes
that it's limited to 2 extensions, I made it consistent with that.
Sponsored-by: k0ld on Patreon
diff --git a/Annex/View.hs b/Annex/View.hs index 65db159710..b47e34564b 100644 --- a/Annex/View.hs +++ b/Annex/View.hs @@ -387,7 +387,7 @@ prop_view_roundtrips (AssociatedFile Nothing) _ _ = True prop_view_roundtrips (AssociatedFile (Just f)) metadata visible = or [ B.null (P.takeFileName f) && B.null (P.takeDirectory f) , viewTooLarge view - , all hasfields (viewedFiles view viewedFileFromReference (fromRawFilePath f) metadata) + , all hasfields (viewedFiles view (viewedFileFromReference' Nothing) (fromRawFilePath f) metadata) ] where view = View (Git.Ref "foo") $ @@ -421,7 +421,9 @@ getViewedFileMetaData = getDirMetaData . dirFromViewedFile . takeFileName - branch for the view. -} applyView :: View -> Maybe Adjustment -> Annex Git.Branch -applyView = applyView' viewedFileFromReference getWorkTreeMetaData +applyView v ma = do + gc <- Annex.getGitConfig + applyView' (viewedFileFromReference gc) getWorkTreeMetaData v ma {- Generates a new branch for a View, which must be a more narrow - version of the View originally used to generate the currently @@ -553,7 +555,8 @@ updateView view madj = do Git.LsTree.LsTreeRecursive (Git.LsTree.LsTreeLong True) (viewParentBranch view) - applyView'' viewedFileFromReference getWorkTreeMetaData view madj l clean $ + gc <- Annex.getGitConfig + applyView'' (viewedFileFromReference gc) getWorkTreeMetaData view madj l clean $ \ti -> do let ref = Git.Ref.branchFileRef (viewParentBranch view) (getTopFilePath (Git.LsTree.file ti)) diff --git a/Annex/View/ViewedFile.hs b/Annex/View/ViewedFile.hs index c804a50c0b..6aa992babb 100644 --- a/Annex/View/ViewedFile.hs +++ b/Annex/View/ViewedFile.hs @@ -1,6 +1,6 @@ {- filenames (not paths) used in views - - - Copyright 2014 Joey Hess <id@joeyh.name> + - Copyright 2014-2023 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. 
-} @@ -11,6 +11,7 @@ module Annex.View.ViewedFile ( ViewedFile, MkViewedFile, viewedFileFromReference, + viewedFileFromReference', viewedFileReuse, dirFromViewedFile, prop_viewedFile_roundtrips, @@ -35,17 +36,27 @@ type MkViewedFile = FilePath -> ViewedFile - - So, from dir/subdir/file.foo, generate file_%dir%subdir%.foo -} -viewedFileFromReference :: MkViewedFile -viewedFileFromReference f = concat $ - [ escape (fromRawFilePath base) +viewedFileFromReference :: GitConfig -> MkViewedFile +viewedFileFromReference g = viewedFileFromReference' (annexMaxExtensionLength g) + +viewedFileFromReference' :: Maybe Int -> MkViewedFile +viewedFileFromReference' maxextlen f = concat $ + [ escape (fromRawFilePath base') , if null dirs then "" else "_%" ++ intercalate "%" (map escape dirs) ++ "%" - , escape $ fromRawFilePath $ S.concat extensions + , escape $ fromRawFilePath $ S.concat extensions' ] where (path, basefile) = splitFileName f dirs = filter (/= ".") $ map dropTrailingPathSeparator (splitPath path) - (base, extensions) = splitShortExtensions (toRawFilePath basefile') - + (base, extensions) = case maxextlen of + Nothing -> splitShortExtensions (toRawFilePath basefile') + Just n -> splitShortExtensions' (n+1) (toRawFilePath basefile') + {- Limit to two extensions maximum. -} + (base', extensions') + | length extensions <= 2 = (base, extensions) + | otherwise = + let (es,more) = splitAt 2 (reverse extensions) + in (base <> mconcat (reverse more), reverse es) {- On Windows, if the filename looked like "dir/c:foo" then - basefile would look like it contains a drive letter, which will - not work. There cannot really be a filename like that, probably, @@ -90,7 +101,7 @@ prop_viewedFile_roundtrips tf -- Relative filenames wanted, not directories. 
| any (isPathSeparator) (end f ++ beginning f) = True | isAbsolute f || isDrive f = True - | otherwise = dir == dirFromViewedFile (viewedFileFromReference f) + | otherwise = dir == dirFromViewedFile (viewedFileFromReference' Nothing f) where f = fromTestableFilePath tf dir = joinPath $ beginning $ splitDirectories f diff --git a/CHANGELOG b/CHANGELOG index 1a2f35bad3..3b7f40eb04 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -5,6 +5,8 @@ git-annex (10.20230322) UNRELEASED; urgency=medium drop when annex.adjustedbranchrefresh=1 * Avoid leaving repo with a detached head when there is a failure checking out an updated adjusted branch. + * view: Support annex.maxextensionlength when generating filenames for + the view branch. -- Joey Hess <id@joeyh.name> Thu, 23 Mar 2023 15:04:41 -0400 diff --git a/Utility/Path.hs b/Utility/Path.hs index dcb21400ea..64ef076ff9 100644 --- a/Utility/Path.hs +++ b/Utility/Path.hs @@ -20,6 +20,7 @@ module Utility.Path ( runSegmentPaths', dotfile, splitShortExtensions, + splitShortExtensions', relPathDirToFileAbs, inSearchPath, searchPath, diff --git a/doc/git-annex-view.mdwn b/doc/git-annex-view.mdwn index b8e126403b..f2677019a7 100644 --- a/doc/git-annex-view.mdwn +++ b/doc/git-annex-view.mdwn @@ -44,6 +44,12 @@ into the `_` directory and committing will unset the metadata. The name of the `_` directory can be changed using the annex.viewunsetdirectory git config. +Filenames in the view branch include their path within the original branch, to +ensure that they are unique. The path comes after the main filename, and +before any extensions. For example, "foo/bar.baz" will have a name +like "bar_%foo%.baz". annex.maxextensionlength can be used to configure +what is treated as an extension. + # OPTIONS * The [[git-annex-common-options]](1) can be used. 
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 5e63631be7..aac408d3ff 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -825,9 +825,11 @@ repository, using [[git-annex-config]]. See its man page for a list.) * `annex.maxextensionlength` - Maximum length, in bytes, of what is considered a filename extension when - adding a file to a backend that preserves filename extensions. The - default length is 4, which allows extensions like "jpeg". The dot before + Maximum length, in bytes, of what is considered a filename extension. + This is used when adding a file to a backend that preserves filename extensions, + and also when generating a view branch. + + The default length is 4, which allows extensions like "jpeg". The dot before the extension is not counted part of its length. At most two extensions at the end of a filename will be preserved, e.g. .gz or .tar.gz . diff --git a/doc/todo/Configuring_metadata_view_filenames/comment_8_2bcfc677da72637f34904b84fdd95c10._comment b/doc/todo/Configuring_metadata_view_filenames/comment_8_2bcfc677da72637f34904b84fdd95c10._comment new file mode 100644 index 0000000000..b379529f5c --- /dev/null +++ b/doc/todo/Configuring_metadata_view_filenames/comment_8_2bcfc677da72637f34904b84fdd95c10._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2023-03-24T17:48:43Z" + content=""" +I've made git-annex view use `annex.maxextensionlength`. Note that refining +an existing view will reuse the extension length that was configured when +initially constructing the view. +"""]]
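The naming rule documented in this change ("foo/bar.baz" becomes "bar_%foo%.baz", with at most two extensions no longer than annex.maxextensionlength) can be illustrated with a small shell sketch. This is a simplified approximation, not git-annex's implementation: `view_name` is a hypothetical function, its second argument stands in for annex.maxextensionlength (default 4), and the escaping of special characters and the Windows drive-letter handling done by the real code are left out.

```shell
#!/bin/sh
# Hypothetical sketch of view filename generation.
# $1: path relative to the top of the repository
# $2: maximum extension length, not counting the dot (default 4)
view_name() {
    path=$1
    maxlen=${2:-4}
    file=${path##*/}
    case $path in
        */*) dir=${path%/*} ;;
        *)   dir= ;;
    esac
    base=$file
    exts=
    count=0
    # Peel off at most two trailing extensions that are short enough.
    while [ "$count" -lt 2 ]; do
        case $base in
            ?*.*) ext=${base##*.} ;;
            *)    break ;;
        esac
        [ -n "$ext" ] && [ "${#ext}" -le "$maxlen" ] || break
        exts=".$ext$exts"
        base=${base%.*}
        count=$((count + 1))
    done
    if [ -n "$dir" ]; then
        # The directory part goes between the base name and the extensions.
        dirs=$(printf '%s' "$dir" | tr '/' '%')
        printf '%s_%%%s%%%s\n' "$base" "$dirs" "$exts"
    else
        printf '%s%s\n' "$base" "$exts"
    fi
}
```

With the default length, `view_name model/weights.safetensors` keeps `.safetensors` as part of the base name, while `view_name model/weights.safetensors 11` treats it as an extension and places it after the directory part.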
Added a comment
diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_4_2a4de8981ced7894148ca9a68e5ef60b._comment b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_4_2a4de8981ced7894148ca9a68e5ef60b._comment new file mode 100644 index 0000000000..76b2779873 --- /dev/null +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_4_2a4de8981ced7894148ca9a68e5ef60b._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="gioele@678b7c03f524f2669b179b603f65352fcc16774e" + nickname="gioele" + avatar="http://cdn.libravatar.org/avatar/366dbda84e78aff8a8a070622aeb63ce" + subject="comment 4" + date="2023-03-24T08:13:00Z" + content=""" +Thanks for investigating this! + +These are the local settings: + + thin = true + adjustedbranchrefresh = 1 + +"""]]
Added a comment
diff --git a/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_5_a1f8d718be941f4dd5e7b06fd6431c10._comment b/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_5_a1f8d718be941f4dd5e7b06fd6431c10._comment new file mode 100644 index 0000000000..a82ac5e3f6 --- /dev/null +++ b/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_5_a1f8d718be941f4dd5e7b06fd6431c10._comment @@ -0,0 +1,73 @@ +[[!comment format=mdwn + username="dpifke" + avatar="http://cdn.libravatar.org/avatar/7b17ce0661a1b1cd708c5c5150eb2c33" + subject="comment 5" + date="2023-03-23T23:43:17Z" + content=""" +Some additional experiments: + +``` +$ stat Invoice.pdf + File: Invoice.pdf + Size: 54623 Blocks: 112 IO Block: 4096 regular file +Device: bfh/191d Inode: 294625 Links: 1 +Access: (0444/-r--r--r--) Uid: ( 1000/ dave) Gid: ( 1000/ dave) +Access: 2023-03-23 17:28:30.299963443 -0600 +Modify: 2023-03-23 17:19:56.731965402 -0600 +Change: 2023-03-23 17:28:30.299963443 -0600 + Birth: - +$ sha256sum Invoice.pdf +868cb65310f5ef46fcd4f7d85fb364347ea9047766cadcf3d184a8c704164b90 Invoice.pdf +$ stat Invoice.pdf + File: Invoice.pdf + Size: 54623 Blocks: 112 IO Block: 4096 regular file +Device: bfh/191d Inode: 294625 Links: 1 +Access: (0444/-r--r--r--) Uid: ( 1000/ dave) Gid: ( 1000/ dave) +Access: 2023-03-23 17:35:38.303961810 -0600 +Modify: 2023-03-23 17:19:56.731965402 -0600 +Change: 2023-03-23 17:28:30.299963443 -0600 + Birth: - +``` + +From the above, I can see that `-ko noatime` doesn't do anything, but the ctime change is something unique to git-annex, not just from opening and reading file contents. (I've used git-annex on filesystems without noatime before, so I didn't think noatime support was mandatory?) + +But wait... why is the file 0444? 
+ +``` +$ chmod 644 Invoice.pdf +$ stat Invoice.pdf + File: Invoice.pdf + Size: 54623 Blocks: 112 IO Block: 4096 regular file +Device: bfh/191d Inode: 294625 Links: 1 +Access: (0644/-rw-r--r--) Uid: ( 1000/ dave) Gid: ( 1000/ dave) +Access: 2023-03-23 17:35:38.303961810 -0600 +Modify: 2023-03-23 17:19:56.731965402 -0600 +Change: 2023-03-23 17:38:04.539961252 -0600 + Birth: - +``` + +OK, so setting it back to 0644 modifies ctime. + +``` +$ git annex add Invoice.pdf +add Invoice.pdf + + Invoice.pdf changed while it was being added +failed +add: 1 failed +$ stat Invoice.pdf + File: Invoice.pdf + Size: 54623 Blocks: 112 IO Block: 4096 regular file +Device: bfh/191d Inode: 294625 Links: 1 +Access: (0444/-r--r--r--) Uid: ( 1000/ dave) Gid: ( 1000/ dave) +Access: 2023-03-23 17:38:28.999961159 -0600 +Modify: 2023-03-23 17:19:56.731965402 -0600 +Change: 2023-03-23 17:38:28.999961159 -0600 + Birth: - +``` + +And running `git annex add` changes it back to 0444, *and* updates ctime. So that's why that change is being observed. + +So I don't think gocryptfs is doing anything unexpected w.r.t. modifying the file out from under git-annex. Is there a way to get more detailed information about what git-annex thinks has changed? + +"""]]
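The observation in this comment, that a permission change alone moves ctime while mtime stays put, can be reproduced on an ordinary filesystem without git-annex. A minimal sketch, assuming GNU coreutils `stat` (the `-c` format option) and a filesystem with at least one-second timestamp resolution; `ctime_demo` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical demo: chmod updates ctime but leaves mtime untouched.
ctime_demo() {
    f=$(mktemp)
    echo data > "$f"
    m1=$(stat -c %Y "$f")   # mtime, seconds since the epoch
    c1=$(stat -c %Z "$f")   # ctime, seconds since the epoch
    sleep 1                 # make sure a later change lands in a new second
    chmod 444 "$f"          # metadata-only change, as when content is frozen
    m2=$(stat -c %Y "$f")
    c2=$(stat -c %Z "$f")
    rm -f "$f"
    if [ "$m1" = "$m2" ]; then echo "mtime unchanged"; else echo "mtime changed"; fi
    if [ "$c1" != "$c2" ]; then echo "ctime changed"; else echo "ctime unchanged"; fi
}
```

Running `ctime_demo` inside and outside the gocryptfs mount might help narrow down whether the filesystem reports anything different from a plain one for this kind of metadata-only change.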
Added a comment
diff --git a/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_4_2689a87a68dc459615d9542ad908b7a3._comment b/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_4_2689a87a68dc459615d9542ad908b7a3._comment new file mode 100644 index 0000000000..3100df7fd5 --- /dev/null +++ b/doc/forum/Can_you_use_git_annex_on_gocryptfs__63__/comment_4_2689a87a68dc459615d9542ad908b7a3._comment @@ -0,0 +1,75 @@ +[[!comment format=mdwn + username="dpifke" + avatar="http://cdn.libravatar.org/avatar/7b17ce0661a1b1cd708c5c5150eb2c33" + subject="comment 4" + date="2023-03-23T23:32:45Z" + content=""" +After successfully using git-annex + gocryptfs for several years now, I came across this issue because I also am no longer able to add files (\"... changed while it was being added\") after upgrading from bullseye to bookworm. + +The issue appears with gocryptfs 2.3.1+b3, but if I downgrade to gocryptfs 1.8.0-1+b6 it works again. + +It doesn't seem to be an inode number issue, but I do see the atime and ctime of the file change before and after running `git annex add`: + +``` +$ sha256sum Invoice.pdf +868cb65310f5ef46fcd4f7d85fb364347ea9047766cadcf3d184a8c704164b90 Invoice.pdf +$ stat Invoice.pdf + File: Invoice.pdf + Size: 54623 Blocks: 112 IO Block: 4096 regular file +Device: bfh/191d Inode: 294625 Links: 1 +Access: (0444/-r--r--r--) Uid: ( 1000/ dave) Gid: ( 1000/ dave) +Access: 2023-03-23 17:19:59.707965391 -0600 +Modify: 2023-03-23 17:19:56.731965402 -0600 +Change: 2023-03-23 17:19:56.731965402 -0600 + Birth: - +$ git annex --debug add Invoice.pdf +[2023-03-23 17:20:23.680446749] (Utility.Process) process [2900635] read: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"git-annex\"] +[2023-03-23 17:20:23.684813572] (Utility.Process) process [2900635] done ExitSuccess +[2023-03-23 17:20:23.684979722] (Utility.Process) process [2900637] read: git 
[\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"] +[2023-03-23 17:20:23.688066503] (Utility.Process) process [2900637] done ExitSuccess +[2023-03-23 17:20:23.688331712] (Utility.Process) process [2900638] read: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"log\",\"refs/heads/git-annex..3b3eebc02f68101a6e2aeecc1733f2c214847361\",\"--pretty=%H\",\"-n1\"] +[2023-03-23 17:20:23.690117996] (Utility.Process) process [2900638] done ExitSuccess +[2023-03-23 17:20:23.690338783] (Utility.Process) process [2900639] read: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"log\",\"refs/heads/git-annex..60c9383872bd6750ab3d91730de902e2aa330d54\",\"--pretty=%H\",\"-n1\"] +[2023-03-23 17:20:23.692089872] (Utility.Process) process [2900639] done ExitSuccess +[2023-03-23 17:20:23.692439951] (Utility.Process) process [2900640] chat: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch\"] +[2023-03-23 17:20:23.694521134] (Utility.Process) process [2900641] read: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"symbolic-ref\",\"-q\",\"HEAD\"] +[2023-03-23 17:20:23.695417786] (Utility.Process) process [2900641] done ExitSuccess +[2023-03-23 17:20:23.695579032] (Utility.Process) process [2900642] read: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"refs/heads/master\"] +[2023-03-23 17:20:23.697616186] (Utility.Process) process [2900642] done ExitSuccess +[2023-03-23 17:20:23.69780391] (Utility.Process) process [2900643] read: git 
[\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"ls-files\",\"-z\",\"--others\",\"--exclude-standard\",\"--\",\"Invoice.pdf\"] +[2023-03-23 17:20:23.699829084] (Utility.Process) process [2900644] chat: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"check-attr\",\"-z\",\"--stdin\",\"annex.backend\",\"annex.largefiles\",\"annex.numcopies\",\"annex.mincopies\",\"--\"] +[2023-03-23 17:20:23.701005154] (Utility.Process) process [2900645] chat: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch-check=%(objectname) %(objecttype) %(objectsize)\"] +add Invoice.pdf [2023-03-23 17:20:23.703386207] (Annex.Perms) freezing content Invoice.pdf + + + Invoice.pdf changed while it was being added +failed +[2023-03-23 17:20:23.704163913] (Utility.Process) process [2900643] done ExitSuccess +[2023-03-23 17:20:23.704269644] (Utility.Process) process [2900646] read: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"ls-files\",\"-z\",\"--modified\",\"--\",\"Invoice.pdf\"] +[2023-03-23 17:20:23.706229644] (Utility.Process) process [2900646] done ExitSuccess +[2023-03-23 17:20:23.706353485] (Utility.Process) process [2900647] read: git [\"--git-dir=../../.git\",\"--work-tree=../..\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"diff\",\"--name-only\",\"--diff-filter=T\",\"-z\",\"--cached\",\"--\",\"Invoice.pdf\"] +[2023-03-23 17:20:23.708586251] (Utility.Process) process [2900647] done ExitSuccess +[2023-03-23 17:20:23.709078263] (Utility.Process) process [2900640] done ExitSuccess +[2023-03-23 17:20:23.709198563] (Utility.Process) process [2900645] done ExitSuccess +[2023-03-23 17:20:23.709484216] (Utility.Process) process [2900644] done ExitSuccess +add: 1 failed +$ stat Invoice.pdf + File: Invoice.pdf + Size: 54623 Blocks: 112 IO Block: 
4096 regular file +Device: bfh/191d Inode: 294625 Links: 1 +Access: (0444/-r--r--r--) Uid: ( 1000/ dave) Gid: ( 1000/ dave) +Access: 2023-03-23 17:20:23.699965299 -0600 +Modify: 2023-03-23 17:19:56.731965402 -0600 +Change: 2023-03-23 17:20:23.699965299 -0600 + Birth: - +$ sha256sum Invoice.pdf +868cb65310f5ef46fcd4f7d85fb364347ea9047766cadcf3d184a8c704164b90 Invoice.pdf +``` + +The filesystem was mounted as `/home/dave/.encrypted/Finances on /home/dave/Finances type fuse.gocryptfs (rw,nosuid,nodev,noatime,user_id=1000,group_id=1000,max_read=131072)`. + +This is with git-annex 10.20230126-2. + +I am happy to debug further if someone wants to point me in a useful direction. + +"""]]
Added a comment
diff --git a/doc/todo/Configuring_metadata_view_filenames/comment_7_ae259f68ab5b366b6fd29e5df1a05469._comment b/doc/todo/Configuring_metadata_view_filenames/comment_7_ae259f68ab5b366b6fd29e5df1a05469._comment new file mode 100644 index 0000000000..4a005a956a --- /dev/null +++ b/doc/todo/Configuring_metadata_view_filenames/comment_7_ae259f68ab5b366b6fd29e5df1a05469._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="Xyem" + avatar="http://cdn.libravatar.org/avatar/dea2d98057721c21166c3721f6d55c06" + subject="comment 7" + date="2023-03-23T22:26:24Z" + content=""" +So if my understanding is correct, the file paths generated for this view should be something like `sd/v1.5_%model%.safetensors` but as `annex.maxextensionlength` isn't being considered during this, it doesn't realise `safetensors` is the extension? + +Unfortunately, the software will only regard certain extensions as being usable files, so I will be unable to use metadata views for now. I've set up separate branches and will copy symlinks between branches in the meantime. +"""]]
comment
diff --git a/doc/todo/Configuring_metadata_view_filenames/comment_6_1eb8c2d70ff9e3e5b3dbebf88270eb93._comment b/doc/todo/Configuring_metadata_view_filenames/comment_6_1eb8c2d70ff9e3e5b3dbebf88270eb93._comment new file mode 100644 index 0000000000..7a0ddb9053 --- /dev/null +++ b/doc/todo/Configuring_metadata_view_filenames/comment_6_1eb8c2d70ff9e3e5b3dbebf88270eb93._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2023-03-23T20:39:33Z" + content=""" +@Xyem no, it's unchanged. But annex.maxextensionlength does not configure +the extension length here currently. I think it would be a good thing for +it to do, probably. +"""]]
Avoid leaving repo with a detached head when there is a failure checking out an updated adjusted branch
I don't know of scenarios where that can happen (besides the bug
fixed by the parent commit), but there probably are some.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
diff --git a/Annex/AdjustedBranch.hs b/Annex/AdjustedBranch.hs index 7ea5b234aa..5cef6ec29c 100644 --- a/Annex/AdjustedBranch.hs +++ b/Annex/AdjustedBranch.hs @@ -248,26 +248,42 @@ checkoutAdjustedBranch (AdjBranch b) quietcheckout = do updateAdjustedBranch :: Adjustment -> AdjBranch -> OrigBranch -> Annex Bool updateAdjustedBranch adj (AdjBranch currbranch) origbranch | not (adjustmentIsStable adj) = do - b <- preventCommits $ \commitlck -> do + (b, origheadfile, newheadfile) <- preventCommits $ \commitlck -> do -- Avoid losing any commits that the adjusted branch -- has that have not yet been propigated back to the -- origbranch. _ <- propigateAdjustedCommits' origbranch adj commitlck + + origheadfile <- inRepo $ readFile . Git.Ref.headFile -- Git normally won't do anything when asked to check -- out the currently checked out branch, even when its -- ref has changed. Work around this by writing a raw -- sha to .git/HEAD. - inRepo (Git.Ref.sha currbranch) >>= \case - Just headsha -> inRepo $ \r -> - writeFile (Git.Ref.headFile r) (fromRef headsha) - _ -> noop + newheadfile <- inRepo (Git.Ref.sha currbranch) >>= \case + Just headsha -> do + inRepo $ \r -> do + let newheadfile = fromRef headsha + writeFile (Git.Ref.headFile r) newheadfile + return (Just newheadfile) + _ -> return Nothing - adjustBranch adj origbranch + b <- adjustBranch adj origbranch + return (b, origheadfile, newheadfile) -- Make git checkout quiet to avoid warnings about -- disconnected branch tips being lost. - checkoutAdjustedBranch b True + ok <- checkoutAdjustedBranch b True + + -- Avoid leaving repo with detached head. + unless ok $ case newheadfile of + Nothing -> noop + Just v -> preventCommits $ \_commitlck -> inRepo $ \r -> do + v' <- readFile (Git.Ref.headFile r) + when (v == v') $ + writeFile (Git.Ref.headFile r) origheadfile + + return ok | otherwise = preventCommits $ \commitlck -> do -- Done for consistency. 
_ <- propigateAdjustedCommits' origbranch adj commitlck diff --git a/CHANGELOG b/CHANGELOG index 33a27d6a02..1a2f35bad3 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -3,6 +3,8 @@ git-annex (10.20230322) UNRELEASED; urgency=medium * sync: Fix parsing of gcrypt::rsync:// urls that use a relative path. * Avoid failure to update adjusted branch --unlock-present after git-annex drop when annex.adjustedbranchrefresh=1 + * Avoid leaving repo with a detached head when there is a failure + checking out an updated adjusted branch. -- Joey Hess <id@joeyh.name> Thu, 23 Mar 2023 15:04:41 -0400 diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop.mdwn b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop.mdwn index 7e4f38e9cd..a190b262e7 100644 --- a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop.mdwn +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop.mdwn @@ -53,4 +53,4 @@ local repository version: 8 git-annex is too good. It so rarely causes problems that one does not develop the "git-annex troubleshooting muscle". :) - +> [[fixed|done]] --[[Joey]]
run restagePointerFiles in adjustedBranchRefreshFull
Avoid failure to update adjusted branch --unlock-present after git-annex
drop when annex.adjustedbranchrefresh=1
At higher values, it did flush the queue, which ran restagePointerFiles.
But at 1, adjustedBranchRefreshFull gets added to the queue, and while
restagePointerFiles is also in the queue, it runs after that.
Sponsored-by: Brock Spratlen on Patreon
diff --git a/Annex/AdjustedBranch.hs b/Annex/AdjustedBranch.hs index 1907157e72..7ea5b234aa 100644 --- a/Annex/AdjustedBranch.hs +++ b/Annex/AdjustedBranch.hs @@ -45,7 +45,6 @@ import Annex.Common import Types.AdjustedBranch import Annex.AdjustedBranch.Name import qualified Annex -import qualified Annex.Queue import Git import Git.Types import qualified Git.Branch @@ -312,22 +311,20 @@ adjustedBranchRefresh _af a = do !s' = s { Annex.adjustedbranchrefreshcounter = c' } in pure (s', enough) - update adj origbranch = do - -- Flush the queue, to make any pending changes be written - -- out to disk. But mostly so any pointer files - -- restagePointerFile was called on get updated so git - -- checkout won't fall over. - Annex.Queue.flush - -- This is slow, it would be better to incrementally - -- adjust the AssociatedFile, and only call this once - -- at shutdown to handle cases where not all - -- AssociatedFiles are known. + -- This is slow, it would be better to incrementally + -- adjust the AssociatedFile, and only call this once + -- at shutdown to handle cases where not all + -- AssociatedFiles are known. + update adj origbranch = adjustedBranchRefreshFull adj origbranch {- Slow, but more dependable version of adjustedBranchRefresh that - does not rely on all AssociatedFiles being known. -} adjustedBranchRefreshFull :: Adjustment -> OrigBranch -> Annex () adjustedBranchRefreshFull adj origbranch = do + -- Restage pointer files so modifications to them due to get/drop + -- do not prevent checking out the updated adjusted branch. + restagePointerFiles =<< Annex.gitRepo let adjbranch = originalToAdjusted origbranch adj unlessM (updateAdjustedBranch adj adjbranch origbranch) $ warning $ unwords [ "Updating adjusted branch failed." ] diff --git a/CHANGELOG b/CHANGELOG index c901ffa1a7..33a27d6a02 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,6 +1,8 @@ git-annex (10.20230322) UNRELEASED; urgency=medium * sync: Fix parsing of gcrypt::rsync:// urls that use a relative path. 
+ * Avoid failure to update adjusted branch --unlock-present after git-annex + drop when annex.adjustedbranchrefresh=1 -- Joey Hess <id@joeyh.name> Thu, 23 Mar 2023 15:04:41 -0400 diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment index 3d43aeff32..1380567474 100644 --- a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment @@ -6,4 +6,6 @@ I assume this is a --hide-missing adjusted branch? Update: Oh, I see from a forum post that it's --unlock-present actually. + +What is the annex.adjustedbranchrefresh git config set to? """]] diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_2_a947f8afde3d7f63fd33b0b7e5998e43._comment b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_2_a947f8afde3d7f63fd33b0b7e5998e43._comment new file mode 100644 index 0000000000..f5ce8f9aea --- /dev/null +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_2_a947f8afde3d7f63fd33b0b7e5998e43._comment @@ -0,0 +1,42 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-03-23T19:35:33Z" + content=""" +I was able to reproduce something that looks similar to this without +needing to interrupt any command: + + joey@darkstar:~/tmp/bench>git clone a b + Cloning into 'b'... + done. 
+ joey@darkstar:~/tmp/bench>cd b + joey@darkstar:~/tmp/bench/b>git config annex.adjustedbranchrefresh 1 + joey@darkstar:~/tmp/bench/b>git annex adjust --unlock-present + adjust + Switched to branch 'adjusted/master(unlockpresent)' + ok + joey@darkstar:~/tmp/bench/b#master(unlockpresent)>ls + foo@ + joey@darkstar:~/tmp/bench/b#master(unlockpresent)>git-annex get + get foo (from origin...) + ok + (recording state in git...) + joey@darkstar:~/tmp/bench/b#master(unlockpresent)>ls + foo + joey@darkstar:~/tmp/bench/b#master(unlockpresent)>git-annex drop + drop foo ok + error: Your local changes to the following files would be overwritten by checkout: + foo + Please commit your changes or stash them before you switch branches. + Aborting + + Updating adjusted branch failed. + (recording state in git...) + +And it was left in a similar detached head status: + + HEAD detached at 2aab85d + nothing to commit, working tree clean + +This seems be be a bug with the implementation of annex.adjustedbranchrefresh +"""]] diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_3_c8ce6c9fc35fa6ad5165ecf9a3592c9d._comment b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_3_c8ce6c9fc35fa6ad5165ecf9a3592c9d._comment new file mode 100644 index 0000000000..28215d4c79 --- /dev/null +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_3_c8ce6c9fc35fa6ad5165ecf9a3592c9d._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2023-03-23T19:50:45Z" + content=""" +git-annex is what writes that detached HEAD, +in updateAdjustedBranch. And I've verified that when it fails like this, +git status shows the file is modified. But of course the modification is +only that the annex content has been replaced by an annex pointer. Running +`git add` on the file makes the modification status go away. + +And this only happens when annex.adjustedbranchrefresh=1. 
+At higher values, it calls Annex.Queue.flush, but at 1 it does not, and so +restagePointerFiles does not get called before adjustedBranchRefreshFull. +(Or at least may not, they're both running as cleanup actions and order is +not really defined.) + +I wonder if there are other situations where modifications can prevent +checkout of the updated adjusted branch? Eg, what if the user has made some +other modification to an annexed file? It seems wise for git-annex to +defend against it in depth, by making sure no crash can leave it in +detached head state. +"""]] diff --git a/doc/forum/How_to_recover_from_failed_branch_updates__63__/comment_1_f296d4870a6fdef5c57bc8bb1a1e0474._comment b/doc/forum/How_to_recover_from_failed_branch_updates__63__/comment_1_f296d4870a6fdef5c57bc8bb1a1e0474._comment new file mode 100644 index 0000000000..ed6f496d2d --- /dev/null +++ b/doc/forum/How_to_recover_from_failed_branch_updates__63__/comment_1_f296d4870a6fdef5c57bc8bb1a1e0474._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-23T19:30:12Z" + content=""" +You can just run "git checkout adjusted/master(unlockpresent)" +to switch back to that branch. + +It's fine to switch between adjusted and other branches, though the way git-annex +failed and left your head detached is certainly a bug. +"""]]
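The recovery advice in the forum reply above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `return_to_branch` is hypothetical, and it assumes the working tree is clean, as in the reported `git status` output. It uses only plain git commands, nothing specific to git-annex.

```shell
#!/bin/sh
# Hypothetical helper (not part of git-annex): switch back to a named
# branch when HEAD has been left detached.
return_to_branch() {
    branch=$1
    # git symbolic-ref succeeds only when HEAD points at a branch.
    if git symbolic-ref -q HEAD >/dev/null 2>&1; then
        echo "HEAD is already on $(git symbolic-ref --short HEAD)"
        return 0
    fi
    # Keep the name quoted: the parentheses in adjusted branch names
    # would otherwise be interpreted by the shell.
    git checkout "$branch"
}

# Usage, inside the affected repository:
#   return_to_branch 'adjusted/master(unlockpresent)'
```

The check for a detached HEAD is there so the helper is harmless to run when nothing went wrong.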
add comment
diff --git a/doc/bugs/gcrypt_remotes_using_relative_paths/comment_2_45281c947992c2ab124efd0f109255d6._comment b/doc/bugs/gcrypt_remotes_using_relative_paths/comment_2_45281c947992c2ab124efd0f109255d6._comment new file mode 100644 index 0000000000..4175db0e42 --- /dev/null +++ b/doc/bugs/gcrypt_remotes_using_relative_paths/comment_2_45281c947992c2ab124efd0f109255d6._comment @@ -0,0 +1,36 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-03-23T18:27:36Z" + content=""" +Analysis: When parsing the relative url, git-annex ends up constructing a +Repo with location `LocalUnknown "./rsync://user@user.rsync.net:relative/path/to/repo.git"` + +So, it thinks it's a local git repository, which of course does not exist, +so it skips syncing with it. + +Why does it do that? Well: + + ghci> import Network.URI + ghci> parseURI "rsync://user@user.rsync.net:relative/path/to/repo.git" + Nothing + +And Git.GCrypt.encryptedRemote strips off the "gcrypt::", +leaving that. + + ghci> r + Repo {location = LocalUnknown "/home/joey/tmp/bb", config = fromList [], fullconfig = fromList [], remoteName = Nothing, gitEnv = Nothing, gitEnvOverridesGitDir = False, gitGlobalOpts = [], gitDirSpecifiedExplicitly = False} + ghci> rr + Repo {location = Url gcrypt::rsync://user@user.rsync.net:relative/path/to/repo, config = fromList [], fullconfig = fromList [], remoteName = Just "test1", gitEnv = Nothing, gitEnvOverridesGitDir = False, gitGlobalOpts = [], gitDirSpecifiedExplicitly = False} + ghci> encryptedRemote r rr + Repo {location = LocalUnknown "/home/joey/tmp/bb/rsync://user@user.rsync.net:relative/path/to/repo.git", config = fromList [], fullconfig = fromList [], remoteName = Nothing, gitEnv = Nothing, gitEnvOverridesGitDir = False, gitGlobalOpts = [], gitDirSpecifiedExplicitly = False} + ghci> parseRemoteLocation "rsync://user@user.rsync.net:relative/path/to/repo" r + RemotePath "rsync://user@user.rsync.net:relative/path/to/repo" + +parseRemoteLocation uses 
isURI and that does not parse as a valid URI. + +So Git.GCrypt.encryptedRemote will need to force it to parse as an url in this case. + +(I remember that I fixed git-remote-gcrypt to support absolute urls for related +reasons. Those relative nonstandard urls are not a good idea.) +"""]]
update
diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment index b4050b6918..3d43aeff32 100644 --- a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment @@ -4,4 +4,6 @@ date="2023-03-23T19:26:12Z" content=""" I assume this is a --hide-missing adjusted branch? + +Update: Oh, I see from a forum post that it's --unlock-present actually. """]]
comment
diff --git a/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_9_192c219b32c95954ec6400367474dd78._comment b/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_9_192c219b32c95954ec6400367474dd78._comment new file mode 100644 index 0000000000..bfa4e2dc4a --- /dev/null +++ b/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_9_192c219b32c95954ec6400367474dd78._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2023-03-23T19:27:33Z" + content=""" +I sent in a pull request today to persistent-sqlite +that will let git-annex fix this once accepted. +"""]]
comment
diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment new file mode 100644 index 0000000000..b4050b6918 --- /dev/null +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop/comment_1_08b3eafdabe5f60ec2206584dff5d230._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-23T19:26:12Z" + content=""" +I assume this is a --hide-missing adjusted branch? +"""]]
response
diff --git a/doc/special_remotes/comment_52_02172d2b81de8c5e212e5ed093a01aa1._comment b/doc/special_remotes/comment_52_02172d2b81de8c5e212e5ed093a01aa1._comment new file mode 100644 index 0000000000..f15bf43f34 --- /dev/null +++ b/doc/special_remotes/comment_52_02172d2b81de8c5e212e5ed093a01aa1._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 52""" + date="2023-03-23T19:23:29Z" + content=""" +@gaknuyardi that is expected, they are hash directories. You can see the +same effect in .git/annex/object/ hash directories when there are enough +objects. +"""]]
sync: Fix parsing of gcrypt::rsync:// urls that use a relative path
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.
(Note that there is no good reason for the user to use such an url. An
absolute url is valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)
Sponsored-by: Jack Hill on Patreon
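To make the distinction in this commit message concrete, here is a minimal shell sketch that classifies a remote location as the relative or the absolute rsync form. The function name `classify_gcrypt_rsync` is hypothetical, and the pattern matching is an assumption based only on the two example urls in the bug report; it is not how git-annex or git-remote-gcrypt decide.

```shell
#!/bin/sh
# Hypothetical helper, not part of git-annex or git-remote-gcrypt.
# Prints "relative", "absolute", or "other" for a remote location.
classify_gcrypt_rsync() {
    loc=${1#gcrypt::}            # strip the gcrypt:: prefix, if present
    case "$loc" in
        rsync://*) ;;
        *) echo other; return 0 ;;
    esac
    rest=${loc#rsync://}         # user@host:rel/path or user@host/abs/path
    authority=${rest%%/*}        # everything before the first slash
    case "$authority" in
        *:*)
            after=${authority##*:}
            case "$after" in
                # Non-digits after the colon: treat as a relative path.
                *[!0-9]*) echo relative ;;
                # Only digits (or nothing): assume host:port, absolute.
                # A relative path starting with an all-digit component
                # would be misclassified by this simple check.
                *) echo absolute ;;
            esac ;;
        *) echo absolute ;;
    esac
}

classify_gcrypt_rsync 'gcrypt::rsync://user@user.rsync.net:relative/path/to/repo'
classify_gcrypt_rsync 'gcrypt::rsync://user@user.rsync.net/full/path/to/repo'
```

A check like this could be used to spot remotes worth converting to the absolute form, which parses as a valid url.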
diff --git a/Assistant/WebApp/Configurators/Edit.hs b/Assistant/WebApp/Configurators/Edit.hs index c430697097..3e09818936 100644 --- a/Assistant/WebApp/Configurators/Edit.hs +++ b/Assistant/WebApp/Configurators/Edit.hs @@ -235,7 +235,7 @@ editForm _new r@(RepoName _) = page "Edit repository" (Just Configuration) $ do Nothing -> getRepoInfo Nothing mempty g <- liftAnnex gitRepo mrepo <- liftAnnex $ maybe (pure Nothing) (Just <$$> Remote.getRepo) mr - let sshrepo = maybe False (remoteLocationIsSshUrl . flip parseRemoteLocation g . Git.repoLocation) mrepo + let sshrepo = maybe False (\repo -> remoteLocationIsSshUrl (parseRemoteLocation (Git.repoLocation repo) False g)) mrepo $(widgetFile "configurators/edit/nonannexremote") {- Makes any directory associated with the repository. -} diff --git a/Assistant/WebApp/Gpg.hs b/Assistant/WebApp/Gpg.hs index 20cd1504da..3723d03907 100644 --- a/Assistant/WebApp/Gpg.hs +++ b/Assistant/WebApp/Gpg.hs @@ -110,5 +110,5 @@ checkGCryptRepoEncryption location notencrypted notinstalled encrypted = - Only works if the gcrypt repo was created as a git-annex remote. -} probeGCryptRemoteUUID :: String -> Annex (Maybe UUID) probeGCryptRemoteUUID repolocation = do - r <- inRepo $ Git.Construct.fromRemoteLocation repolocation + r <- inRepo $ Git.Construct.fromRemoteLocation repolocation False GCrypt.getGCryptUUID False r diff --git a/Assistant/WebApp/RepoList.hs b/Assistant/WebApp/RepoList.hs index e89a782b63..980bd67ff2 100644 --- a/Assistant/WebApp/RepoList.hs +++ b/Assistant/WebApp/RepoList.hs @@ -186,12 +186,12 @@ repoList reposelector -- Skip gcrypt repos on removable drives; -- handled separately. 
case fromProposedAccepted <$> getconfig (Accepted "gitrepo") of - Just rr | remoteLocationIsUrl (parseRemoteLocation rr g) -> + Just rr | remoteLocationIsUrl (parseRemoteLocation rr False g) -> val True EnableSshGCryptR _ -> Nothing Just "git" -> case fromProposedAccepted <$> getconfig (Accepted "location") of - Just loc | remoteLocationIsSshUrl (parseRemoteLocation loc g) -> + Just loc | remoteLocationIsSshUrl (parseRemoteLocation loc False g) -> val True EnableSshGitRemoteR _ -> Nothing _ -> Nothing diff --git a/CHANGELOG b/CHANGELOG index 59890c2cb3..c901ffa1a7 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,9 @@ +git-annex (10.20230322) UNRELEASED; urgency=medium + + * sync: Fix parsing of gcrypt::rsync:// urls that use a relative path. + + -- Joey Hess <id@joeyh.name> Thu, 23 Mar 2023 15:04:41 -0400 + git-annex (10.20230321) upstream; urgency=medium * Using git-annex view in an adjusted branch, or git-annex adjust in a diff --git a/Command/Sync.hs b/Command/Sync.hs index 21f0df45ba..97c31502d6 100644 --- a/Command/Sync.hs +++ b/Command/Sync.hs @@ -220,7 +220,7 @@ seek' o = do let withbranch a = a =<< getCurrentBranch remotes <- syncRemotes (syncWith o) - -- Remotes that are git repositories, not special remotes. + -- Remotes that are git repositories, not (necesarily) special remotes. let gitremotes = filter (Remote.gitSyncableRemoteType . Remote.remotetype) remotes -- Remotes that contain annex object content. 
contentremotes <- filter (\r -> Remote.uuid r /= NoUUID) diff --git a/Git/Construct.hs b/Git/Construct.hs index f82a3e91a1..d0b4f95582 100644 --- a/Git/Construct.hs +++ b/Git/Construct.hs @@ -140,7 +140,7 @@ fromRemotes repo = catMaybes <$> mapM construct remotepairs filterkeys f = filterconfig (\(k,_) -> f k) remotepairs = filterkeys isRemoteUrlKey construct (k,v) = remoteNamedFromKey k $ - fromRemoteLocation (fromConfigValue v) repo + fromRemoteLocation (fromConfigValue v) False repo {- Sets the name of a remote when constructing the Repo to represent it. -} remoteNamed :: String -> IO Repo -> IO Repo @@ -156,9 +156,15 @@ remoteNamedFromKey k r = case remoteKeyToRemoteName k of Just n -> Just <$> remoteNamed n r {- Constructs a new Repo for one of a Repo's remotes using a given - - location (ie, an url). -} -fromRemoteLocation :: String -> Repo -> IO Repo -fromRemoteLocation s repo = gen $ parseRemoteLocation s repo + - location (ie, an url). + - + - knownurl can be true if the location is known to be an url. This allows + - urls that don't parse as urls to be used, returning UnparseableUrl. + - If knownurl is false, the location may still be an url, if it parses as + - one. + -} +fromRemoteLocation :: String -> Bool -> Repo -> IO Repo +fromRemoteLocation s knownurl repo = gen $ parseRemoteLocation s knownurl repo where gen (RemotePath p) = fromRemotePath p repo gen (RemoteUrl u) = fromUrl u diff --git a/Git/GCrypt.hs b/Git/GCrypt.hs index 072598b755..07db13c1c0 100644 --- a/Git/GCrypt.hs +++ b/Git/GCrypt.hs @@ -55,7 +55,15 @@ encryptedRemote baserepo = go -- allows them); need to de-escape any such -- to get back the path to the repository. l' = Network.URI.unEscapeString l - in fromRemoteLocation l' baserepo + -- gcrypt supports relative urls for rsync + -- like "rsync://host:relative/path" + -- but that does not parse as a valid url + -- (while the absolute urls it supports are + -- valid). + -- In order to support it, force treating it as + -- an url. 
+ knownurl = "rsync://" `isPrefixOf` l' + in fromRemoteLocation l' knownurl baserepo | otherwise = notencrypted notencrypted = giveup "not a gcrypt encrypted repository" diff --git a/Git/Remote.hs b/Git/Remote.hs index e6036a7b2c..9cdaad61ca 100644 --- a/Git/Remote.hs +++ b/Git/Remote.hs @@ -63,7 +63,7 @@ makeLegalName s = case filter legal $ replace "/" "_" s of legal c = isAlphaNum c data RemoteLocation = RemoteUrl String | RemotePath FilePath - deriving (Eq) + deriving (Eq, Show) remoteLocationIsUrl :: RemoteLocation -> Bool remoteLocationIsUrl (RemoteUrl _) = True @@ -75,16 +75,18 @@ remoteLocationIsSshUrl _ = False {- Determines if a given remote location is an url, or a local - path. Takes the repository's insteadOf configuration into account. -} -parseRemoteLocation :: String -> Repo -> RemoteLocation -parseRemoteLocation s repo = ret $ calcloc s +parseRemoteLocation :: String -> Bool -> Repo -> RemoteLocation +parseRemoteLocation s knownurl repo = go where - ret v + s' = calcloc s + go #ifdef mingw32_HOST_OS - | dosstyle v = RemotePath (dospath v) + | dosstyle s' = RemotePath (dospath s') #endif - | scpstyle v = RemoteUrl (scptourl v) - | urlstyle v = RemoteUrl v - | otherwise = RemotePath v + | scpstyle s' = RemoteUrl (scptourl s') + | urlstyle s' = RemoteUrl s' + | knownurl && s' == s = RemoteUrl s' + | otherwise = RemotePath s' -- insteadof config can rewrite remote location calcloc l | null insteadofs = l diff --git a/Remote/GCrypt.hs b/Remote/GCrypt.hs index fb5b5aafbc..57e675e43f 100644 --- a/Remote/GCrypt.hs +++ b/Remote/GCrypt.hs @@ -266,7 +266,7 @@ gCryptSetup _ mu _ c gc = go $ fromProposedAccepted <$> M.lookup gitRepoField c let u = genUUIDInNameSpace gCryptNameSpace gcryptid if Just u == mu || isNothing mu then do - method <- setupRepo gcryptid =<< inRepo (Git.Construct.fromRemoteLocation gitrepo) + method <- setupRepo gcryptid =<< inRepo (Git.Construct.fromRemoteLocation gitrepo False) gitConfigSpecialRemote u c' [("gcrypt", fromAccessMethod 
method)] return (c', u) else giveup $ "uuid mismatch; expected " ++ show mu ++ " but remote gitrepo has " ++ show u ++ " (" ++ show gcryptid ++ ")" diff --git a/Remote/Git.hs b/Remote/Git.hs index 2fc5867058..afe73cb25e 100644 --- a/Remote/Git.hs +++ b/Remote/Git.hs @@ -102,7 +102,7 @@ list autoinit = do Nothing -> return r Just url -> inRepo $ \g -> Git.Construct.remoteNamed n $ - Git.Construct.fromRemoteLocation (Git.fromConfigValue url) g + Git.Construct.fromRemoteLocation (Git.fromConfigValue url) False g {- Git remotes are normally set up using standard git commands, not - git-annex initremote and enableremote. @@ -118,7 +118,7 @@ gitSetup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> Remote gitSetup Init mu _ c _ = do let location = maybe (giveup "Specify location=url") fromProposedAccepted $ M.lookup locationField c - r <- inRepo $ Git.Construct.fromRemoteLocation location + r <- inRepo $ Git.Construct.fromRemoteLocation location False r' <- tryGitConfigRead False r False let u = getUncachedUUID r' if u == NoUUID diff --git a/Remote/GitLFS.hs b/Remote/GitLFS.hs index 621089f203..6e2607e321 100644 --- a/Remote/GitLFS.hs (Diff truncated)
comment
diff --git a/doc/bugs/gcrypt_remotes_using_relative_paths/comment_1_b4523baf0cdfc2ad70cded93b1629eca._comment b/doc/bugs/gcrypt_remotes_using_relative_paths/comment_1_b4523baf0cdfc2ad70cded93b1629eca._comment new file mode 100644 index 0000000000..7f589384a2 --- /dev/null +++ b/doc/bugs/gcrypt_remotes_using_relative_paths/comment_1_b4523baf0cdfc2ad70cded93b1629eca._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-23T17:53:22Z" + content=""" +The lack of a transcript made this bug hard to understand. After about +half an hour of staring at it, I realized that you mean that `git-annex sync` +does not even attempt to sync with the remote when it has the relative url: + + joey@darkstar:~/tmp/bb>git remote add test gcrypt::rsync://user@user.rsync.net:relative/path/to/repo + joey@darkstar:~/tmp/bb>git-annex sync + commit + On branch master + nothing to commit, working tree clean + ok + +It does sync with it when asked to sync with explicitly that remote: + + joey@darkstar:~/tmp/bb>git-annex sync test + commit + On branch master + nothing to commit, working tree clean + ok + pull test + gcrypt: Repository not found: rsync://user@user.rsync.net:relative/path/to/repo + ok + +With the absolute url, it does sync with it when no remote +is explicitly specified. +"""]]
Added a comment
diff --git a/doc/todo/Configuring_metadata_view_filenames/comment_5_b222634d9f97e2ef604b476df357d54b._comment b/doc/todo/Configuring_metadata_view_filenames/comment_5_b222634d9f97e2ef604b476df357d54b._comment new file mode 100644 index 0000000000..4eb0f26f99 --- /dev/null +++ b/doc/todo/Configuring_metadata_view_filenames/comment_5_b222634d9f97e2ef604b476df357d54b._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="Xyem" + avatar="http://cdn.libravatar.org/avatar/dea2d98057721c21166c3721f6d55c06" + subject="comment 5" + date="2023-03-23T17:05:24Z" + content=""" +Has the format been changed since this was previously asked? I am currently trying to leverage git-annex and its metadata views with AI tooling, but the format seems to be filename_%path%, resulting in the extension being in the middle of the path. I have set `annex.maxextensionlength` to `12` so the extensions are present on the files in the backend. + + $ git annex view type=model model/=* + $ ls -lr + .: + sd + + ./sd: + v1.4.safetensors_%model%sd% v1.5.safetensors_%model%sd% + +whereas I would expect (or rather, I am trying to achieve): + + $ ls -lr + .: + sd + + ./sd: + v1.4.safetensors v1.5.safetensors + +"""]]
Added a comment
diff --git a/doc/bugs/One_Client_Not_Syncing_Content/comment_3_d8a05e086ee41ef9bcc096d29ac8e8f8._comment b/doc/bugs/One_Client_Not_Syncing_Content/comment_3_d8a05e086ee41ef9bcc096d29ac8e8f8._comment new file mode 100644 index 0000000000..0f006d0c3d --- /dev/null +++ b/doc/bugs/One_Client_Not_Syncing_Content/comment_3_d8a05e086ee41ef9bcc096d29ac8e8f8._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="john" + avatar="http://cdn.libravatar.org/avatar/dae0abf1394490f543ce2b67b395ea28" + subject="comment 3" + date="2023-03-23T11:37:52Z" + content=""" +One of the best game <a href=\"https://fundohub.com/word-scramble/\">word scramble</a> which helps in mind. +"""]]
Added a comment: Understanding encrypted special remote folder structure
diff --git a/doc/special_remotes/comment_51_1a8a217dc31b0a36dfa9975f72d40b3a._comment b/doc/special_remotes/comment_51_1a8a217dc31b0a36dfa9975f72d40b3a._comment new file mode 100644 index 0000000000..59bb3cf8d3 --- /dev/null +++ b/doc/special_remotes/comment_51_1a8a217dc31b0a36dfa9975f72d40b3a._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="gaknuyardi@f7280525ccd44eafd8d1485ec087f27532efd2e9" + nickname="gaknuyardi" + avatar="http://cdn.libravatar.org/avatar/2cfaf95a0836c7a00ef3efb16c3315ae" + subject="Understanding encrypted special remote folder structure" + date="2023-03-22T19:26:55Z" + content=""" +@joey While inspecting my special remote (rsync, encryption=hybrid) I noticed that a couple files ended up in the same folder tree, is that normal or is something wrong? Obfuscated tree output below. + +``` +├── [ 0] xxx +│ └── [ 0] yyy +│ ├── [ 0] GPGHMACSHA512--1234 +│ │ └── [29M] GPGHMACSHA512--1234 +│ └── [ 0] GPGHMACSHA512--5678 +│ └── [ 2G] GPGHMACSHA512--5678 +``` +"""]]
remove appveyor badge, which does not work
diff --git a/doc/builds.mdwn b/doc/builds.mdwn index ce7b60861c..1f1df94037 100644 --- a/doc/builds.mdwn +++ b/doc/builds.mdwn @@ -62,6 +62,4 @@ <img src="https://github.com/datalad/git-annex/workflows/Build%20git-annex%20on%20Ubuntu/badge.svg"> </a> <h2>Appveyor</h2> -<a href="https://ci.appveyor.com/project/mih/git-annex"> -<img src="https://ci.appveyor.com/api/projects/status/mih/git-annex?retina=true"> -</a> +<a href="https://ci.appveyor.com/project/mih/git-annex">here</a>
diff --git a/doc/forum/How_to_recover_from_failed_branch_updates__63__.mdwn b/doc/forum/How_to_recover_from_failed_branch_updates__63__.mdwn new file mode 100644 index 0000000000..6a227a99d3 --- /dev/null +++ b/doc/forum/How_to_recover_from_failed_branch_updates__63__.mdwn @@ -0,0 +1,28 @@ +Due to the issue reported in [[bugs/Failed_adjusted_branch_update_after_error_in_drop]], my git-annex repo is now in a limbo state: + + +``` +$ git branch --show-current +adjusted/master(unlockpresent) + +$ git annex drop dirF/ +drop dirF/a/a1.txt ok +drop dirF/a/a2.txt ok +[...] +error: Your local changes to the following files would be overwritten by checkout: + dirF/a/a1.txt + dirF/a/a2.txt + [...] + +Aborting + + Updating adjusted branch failed. +(recording state in git...) +$ git status +HEAD detached at 9d92415fb +nothing to commit, working tree clean +``` + +What is the recommended course of action to (safely) return to the `adjusted/master(unlockpresent)` branch? + +Is it safe to just use `git annex adjust --unlock-present`?
diff --git a/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop.mdwn b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop.mdwn new file mode 100644 index 0000000000..7e4f38e9cd --- /dev/null +++ b/doc/bugs/Failed_adjusted_branch_update_after_error_in_drop.mdwn @@ -0,0 +1,56 @@ +### Please describe the problem. + +git-annex failed to update the local adjusted branch after `git annex drop` stopped abruptly. In turn, `git annex drop` stopped abruptly because a `git annex copy --from X --to Y` had to be stopped (CTRL-C) because of lack of space. + +(Perhaps there are two issues here: `drop` being unable to cope with the unexpected situation, and the adjustment code being unable to cope with the weird state left by `drop`'s abrupt interruption.) + +### What steps will reproduce the problem? + +``` +$ git annex copy -J4 --from X --to Y +[notice that git-annex starts saying "not enough free space, need 27.87 MB more"] +[kill process with CTRL-C] +$ git annex drop dirF/ +drop dirF/a/a1.txt ok +drop dirF/a/a2.txt ok +[...] +error: Your local changes to the following files would be overwritten by checkout: + dirF/a/a1.txt + dirF/a/a2.txt + [...] + +Aborting + + Updating adjusted branch failed. +(recording state in git...) +$ git status +HEAD detached at 9d92415fb +nothing to commit, working tree clean +$ git annex status +$ git branch +* (HEAD detached at 9d92415fb) + adjusted/master(unlocked) + adjusted/master(unlockpresent) + git-annex + master + synced/master +``` + +`dirF/` is the directory whose files were being copied when the process has been stopped with `CTRL-C`. + +### What version of git-annex are you using? On what operating system? + +``` +git-annex version: 10.20230215-gd24914f2a +[...] +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 8 +``` + +### Have you had any luck using git-annex before? 
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +git-annex is too good. It so rarely causes problems that one does not develop the "git-annex troubleshooting muscle". :) + +
diff --git a/doc/forum/Clarifications_about_how_to_work_with_git-annex.mdwn b/doc/forum/Clarifications_about_how_to_work_with_git-annex.mdwn new file mode 100644 index 0000000000..9f8943f00c --- /dev/null +++ b/doc/forum/Clarifications_about_how_to_work_with_git-annex.mdwn @@ -0,0 +1,11 @@ +Hello. I am a kinda-new git-annex user. I think git-annex is very difficult to understand. Thus, in this topic, I would like to write out my explanation of how git-annex works and how I can work with it, so that you can tell me if I am misunderstanding something. + +I am a machine learning engineer. I use git-annex to store code together with data (large images) and artefacts (large images and model weights). Basically, I am trying to use git-annex as normal git, but with the ability to store large binary files. There is a central bare repository and a bunch of non-bare repositories cloned from it on different computers. Usually, the non-bare repositories connect only to the central bare repository, but sometimes I push/fetch between non-bare repositories directly. + +I can fetch/push/merge/pull the normal branches I work in just as I do with normal git (without annex). However, the git-annexed files aren't fetched/pushed this way because the branch actually contains only symlinks, not the files themselves. I have disabled all the automatic merging, pulling, etc. functionality of git because I like to have total control. So, I merge or rebase everything manually when there is a need for it. And I never use pull. + +In each repository, there is a branch called git-annex. It contains some metadata that git-annex uses. While in a repository repo1, I can do `git annex sync --only-annex --no-content --no-commit --no-pull --no-push --no-resolvemerge repo2` and git-annex will use some magic (consisting of pushing, pulling, and some kind of automatic merging) to sync the git-annex branch of repo1 and repo2, i.e. it will make them contain the same metadata. 
The options `--no-commit --no-pull --no-push --no-resolvemerge` are needed to disable the dark magic that is useful for casual users but not for software developers who use git-annex as a git addon. The option `--only-annex` prevents git-annex from creating "synced" branches which are, as far as I understand, another piece of dark magic useful for casual users but not for software developers. If I want, I can remove the `--no-content` flag and git-annex will also download and/or upload the annexed data (does it affect only the data available in the current branch? or is it all data? i'm not sure). This is the only command I need to know to sync the git-annex branch. Supposedly, it's possible to do via normal git fetches, pushes, merges, and maybe pulls, but I don't know how to do that. + +The actual annexed data is stored somewhere in the .git directory. I don't need to worry where. What I need to know is that I can use `git annex copy`, `git annex get`, and `git annex sync --only-annex --no-commit --no-pull --no-push --no-resolvemerge otherrepo` with appropriate paths to copy the annexed data between repositories. + +Ok, so I can use `git annex sync` with a bunch of flags to sync the git-annex branch, I can use `git annex sync`, `git annex get`, `git annex copy` to copy the data around, and I don't need the synced branches. Is my understanding correct?
Added a comment
diff --git a/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_8_1501223d25a03c27a7ceffc6b0ea32a3._comment b/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_8_1501223d25a03c27a7ceffc6b0ea32a3._comment new file mode 100644 index 0000000000..c9fb1de251 --- /dev/null +++ b/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_8_1501223d25a03c27a7ceffc6b0ea32a3._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="hurlebouc" + avatar="http://cdn.libravatar.org/avatar/bda734a6d937c1fe0c9778a6eaefffbc" + subject="comment 8" + date="2023-03-22T05:47:16Z" + content=""" +Thank you very much for the analysis! This reinforces my conviction that git-annex is a very high quality project. +"""]]
Added a comment
diff --git a/doc/forum/git_annex_copy_to_google_drive_is_very_slow/comment_2_539dbce8852a9b6ae20ff0e55e74fc2a._comment b/doc/forum/git_annex_copy_to_google_drive_is_very_slow/comment_2_539dbce8852a9b6ae20ff0e55e74fc2a._comment new file mode 100644 index 0000000000..c87249890a --- /dev/null +++ b/doc/forum/git_annex_copy_to_google_drive_is_very_slow/comment_2_539dbce8852a9b6ae20ff0e55e74fc2a._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="carlos@4c213b52601d57b650b22d9a246c59aea2c8f859" + nickname="carlos" + avatar="http://cdn.libravatar.org/avatar/ef0152bec3818b24d9318f6e7013e104" + subject="comment 2" + date="2023-03-21T21:34:01Z" + content=""" +Hi Joey, + +This is my current effort to copy ~6.7 GB using Datalad (git annex with rclone+Gdrive in the background) + +``` +Total: 53%|█████████████████████████████████████████████████████████████████▏ | 3.49G/6.53G [8:25:38<7:21:03, 115k Bytes/s] +``` + +There are many small files in this repository for sure, but I haven't been able to get specific advice as to whether these very small speeds (even copying to an external drive took many hours) are expected given the bottlenecks you mention. Any feedback is welcome. +"""]]
diff --git a/doc/bugs/gcrypt_remotes_using_relative_paths.mdwn b/doc/bugs/gcrypt_remotes_using_relative_paths.mdwn new file mode 100644 index 0000000000..68edcc74da --- /dev/null +++ b/doc/bugs/gcrypt_remotes_using_relative_paths.mdwn @@ -0,0 +1,38 @@ +### Please describe the problem. +`git annex sync` is not automatically run for gcrypt remotes using rsync with a relative path + +### What steps will reproduce the problem? +Flow 1 (relative path, broken) + +* `git remote add test gcrypt::rsync://user@user.rsync.net:relative/path/to/repo` +* `git annex sync` -> DOES NOT SYNC to test remote +* Nothing has been synced so I CANNOT successfully clone from the test remote with `git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo` +* `git push test git-annex master` +* I can successfully clone from the test remote with `git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo` + +Flow 2 (absolute path, working) + +* `git remote add test gcrypt::rsync://user@user.rsync.net/full/path/to/repo` +* `git annex sync` -> DOES SYNC to test remote +* I can successfully clone from the test remote with `git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo` + + +### What version of git-annex are you using? On what operating system? +* Debian 11 +* git-annex version: 10.20230227 +* git-remote-gcrypt version 1.5 + + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) +I am VERY happy with git annex and am using it successfully with a gcrypt remote using an absolute path :) +
add news item for git-annex 10.20230321
diff --git a/doc/news/version_10.20221103.mdwn b/doc/news/version_10.20221103.mdwn deleted file mode 100644 index e7374b7c48..0000000000 --- a/doc/news/version_10.20221103.mdwn +++ /dev/null @@ -1,22 +0,0 @@ -git-annex 10.20221103 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Doubled the speed of git-annex drop when operating on many files, - and of git-annex get when operating on many tiny files. - * trust, untrust, semitrust, dead: Fix behavior when provided with - multiple repositories to operate on. - * trust, untrust, semitrust, dead: When provided with no parameters, - do not operate on a repository that has an empty name. - * move: Fix openFile crash with -J - (Fixes a reversion in 8.20201103) - * S3: Speed up importing from a large bucket when fileprefix= is set, - by only asking for files under the prefix. - * When importing from versioned remotes, fix tracking of the content - of deleted files. - * More robust handling of ErrorBusy when writing to sqlite databases. - * Avoid hanging when a suspended git-annex process is keeping a sqlite - database locked. - * Make --batch mode handle unstaged annexed files consistently - whether the file is unlocked or not. Note that this changes the - behavior of --batch when it is provided with locked files that are - in the process of being added to the repository, but have not yet been - staged in git. - * Make git-annex enable-tor work when using the linux standalone build."""]] \ No newline at end of file diff --git a/doc/news/version_10.20230321.mdwn b/doc/news/version_10.20230321.mdwn new file mode 100644 index 0000000000..3348f054e1 --- /dev/null +++ b/doc/news/version_10.20230321.mdwn @@ -0,0 +1,19 @@ +git-annex 10.20230321 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * Using git-annex view in an adjusted branch, or git-annex adjust in a + view branch, will enter an adjusted view branch. 
+ * sync: Fix a reversion that prevented sending files to exporttree=yes + remotes when annex-tracking-branch was configured to branch:subdir + (Introduced in version 10.20230214) + * status: This command is deprecated because it was only needed in direct + mode; git status --short is very similar. + * Windows: Support long filenames in more (possibly all) of the code. + * Added arm64 build for ancient kernels, needed to support Android phones + whose kernels are too old to support kernels used by the current arm64 + build. + * importfeed: Display feed title. + * init: Support being ran in a repository that has a newline in its path. + * copy: When --from and --to are combined and the content is already + present on the destination remote, update location tracking as + necessary. + * Fixed spelling of some messages and added a .codespellrc + Thanks, Yaroslav Halchenko"""]] \ No newline at end of file
comment
diff --git a/doc/forum/git_annex_copy_to_google_drive_is_very_slow/comment_1_f3085ef9ddff51ee4110cc1000d26da2._comment b/doc/forum/git_annex_copy_to_google_drive_is_very_slow/comment_1_f3085ef9ddff51ee4110cc1000d26da2._comment new file mode 100644 index 0000000000..42f02aa943 --- /dev/null +++ b/doc/forum/git_annex_copy_to_google_drive_is_very_slow/comment_1_f3085ef9ddff51ee4110cc1000d26da2._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-21T18:03:40Z" + content=""" +How large are the individual files? + +With small enough files, non-bandwidth related overhead and/or TCP slow +start can take more time than copying the file does, preventing saturating +a connection. + +It sometimes helps to use -J10 or so. + +The --fast option can also speed up `git-annex copy`, +see the [[git-annex-copy]] man page for details about it. +"""]]
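Editorial sketch of the reasoning in the comment above: when each file carries a fixed per-file cost, small files cap throughput far below the link speed, and running several transfers at once (for example `git annex copy --to=gdrive2 -J10`) can help. The model below is a toy, not a measurement. The function name `effective_kbps` and every number fed to it (50 KB files, 500 ms overhead per file, a 125000 KB/s link) are assumptions chosen for illustration, and it assumes the overhead of concurrent jobs overlaps perfectly.

```shell
#!/bin/sh
# Toy model only: all inputs are assumed values, not measurements.
# effective_kbps FILE_KB OVERHEAD_MS LINK_KB_PER_S JOBS
# Prints the approximate aggregate throughput in KB/s (integer arithmetic).
effective_kbps() {
    file_kb=$1
    overhead_ms=$2
    link_kb_per_s=$3
    jobs=$4
    # Time spent actually moving one file's bytes, in microseconds.
    transfer_us=$(( file_kb * 1000000 / link_kb_per_s ))
    # Total wall time per file: fixed overhead plus transfer time.
    total_us=$(( overhead_ms * 1000 + transfer_us ))
    # Aggregate rate, assuming JOBS transfers overlap perfectly.
    echo $(( file_kb * 1000000 * jobs / total_us ))
}

effective_kbps 50 500 125000 1     # prints: 99
effective_kbps 50 500 125000 10    # prints: 999
```

Under these assumed numbers a single job lands near the 50-100 KB/s range reported in the forum post even on a gigabit link, which is why the file sizes matter for diagnosing the slowdown.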
comment
diff --git a/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_3_0cdd9683709d55bb7340fb41fef265fa._comment b/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_3_0cdd9683709d55bb7340fb41fef265fa._comment new file mode 100644 index 0000000000..793aba7921 --- /dev/null +++ b/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_3_0cdd9683709d55bb7340fb41fef265fa._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2023-03-21T17:57:11Z" + content=""" +@jwiegley if you're still around, I'm curious what the answer was. + +I mean, I know these are unlocked annexed files. +But I'm not clear about what was going on here, which seems like it might +be some kind of bug. +"""]]
comment
diff --git a/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__/comment_2_0dbf6fcd15007ba66323ecf2b154ad2e._comment b/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__/comment_2_0dbf6fcd15007ba66323ecf2b154ad2e._comment new file mode 100644 index 0000000000..9c14f9185b --- /dev/null +++ b/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__/comment_2_0dbf6fcd15007ba66323ecf2b154ad2e._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-03-21T17:54:02Z" + content=""" +@Lukey is probably right. The one exception would be if the +S3 remote was set up with encryption=pubkey. Then the files +stored in the bucket would be simply encrypted to your gpg public key, and +could be decrypted without needing any other information that you've lost. + +(The filenames would still not be able to be recovered.) +"""]]
not git-annex specific
diff --git a/doc/forum/git_clone_over_ssh_hangs/comment_1_07ffe356ef5c441118cc9329060812f2._comment b/doc/forum/git_clone_over_ssh_hangs/comment_1_07ffe356ef5c441118cc9329060812f2._comment new file mode 100644 index 0000000000..49cb60afe8 --- /dev/null +++ b/doc/forum/git_clone_over_ssh_hangs/comment_1_07ffe356ef5c441118cc9329060812f2._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-21T17:49:31Z" + content=""" +Well, this is a git question, not a git-annex question. +There is no git-annex involved in what you showed. +You might have better luck on a forum specific to git. + +(Especially if it really involves some problem with sideband packets, +which is deeper on git protocol than I can go.) +"""]]
this comment section is not a BTS, part N+1
diff --git a/doc/sync/comment_31_c85edac65571caff70e87dff2317a4e5._comment b/doc/sync/comment_31_c85edac65571caff70e87dff2317a4e5._comment new file mode 100644 index 0000000000..7c524394fa --- /dev/null +++ b/doc/sync/comment_31_c85edac65571caff70e87dff2317a4e5._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 31""" + date="2023-03-21T17:47:45Z" + content=""" +@talmukoydu you need to file a bug report and include things like the +version of git-annex you are using. +<https://git-annex.branchable.com/bugs/> +"""]]
diff --git a/doc/forum/git_clone_over_ssh_hangs.mdwn b/doc/forum/git_clone_over_ssh_hangs.mdwn new file mode 100644 index 0000000000..f385937d2e --- /dev/null +++ b/doc/forum/git_clone_over_ssh_hangs.mdwn @@ -0,0 +1,17 @@ +I am trying to use git-annex to sync a folder with a backup drive over the network. +I have followed the walkthrough and run a git annex init in a TrueNAS jail and now I am trying to use git clone through ssh to link the drive and the jail together. +The problem is that the clone in the backup drive just hangs. It will enumerate the files and count them and then it just seems to do nothing. + + +Command: git clone ssh://user@ip/mnt/dir ./driveBackup +Output: +Cloning into './repo'... +remote: Enumerating objects: 1543, done. +remote: Counting objects: 100% (1543/1543), done. + +...Hangs here nothing obvious happens... + +Running it in verbose mode reveals nothing. When you stop the clone it will complain about some sideband packet. + +I don't know if it helps, but I have set up passwordless ssh login and I have verified that it works by running ssh user@ip successfully from the backup system. +Can someone help me please?
diff --git a/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn b/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn index d15cf46bd5..90bdea14d7 100644 --- a/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn +++ b/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn @@ -6,4 +6,6 @@ I'm focusing on `git annex` because I sidestep datalad by using `git annex copy Using an external drive instead of Google Drive was a little better, but it still took hours to copy the 10 GB. Not sure what's going on. +I'm running this on a M1 MacBook Pro. + Any ideas on how to troubleshoot this?
diff --git a/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn b/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn index 9db7a7e65f..d15cf46bd5 100644 --- a/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn +++ b/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn @@ -1,6 +1,6 @@ I'm trying to use [datalad](http://www.datalad.org) to manage some scientific data repositories. Datalad uses git annex. -I've set up an annex for my datalad/git repository using `git-annex-remote-rclone` [website](https://github.com/git-annex-remote-rclone/git-annex-remote-rclone). The setup went fine, but the transfers with a Gigabit connection are of the order of 50-100 kbs. I'm trying to troubleshoot the issue. I'm a new user of git annex. The repository has about 10 GB of stuff. +I've set up an annex for my datalad/git repository using [`git-annex-remote-rclone`](https://github.com/git-annex-remote-rclone/git-annex-remote-rclone). The setup went fine, but the transfers with a Gigabit connection are of the order of 50-100 kbs. I'm trying to troubleshoot the issue. I'm a new user of git annex. The repository has about 10 GB of stuff. I'm focusing on `git annex` because I sidestep datalad by using `git annex copy --to=gdrive2`, this is as slow as using `datalad push --to=gdrive2`, which makes sense as the latter is a thin wrapper around `git-annex-copy`.
diff --git a/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn b/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn index c67176c7fd..9db7a7e65f 100644 --- a/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn +++ b/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn @@ -1,4 +1,4 @@ -I'm trying to use datalad (http://www.datalad.org) to manage some scientific data repositories. Datalad uses git annex. +I'm trying to use [datalad](http://www.datalad.org) to manage some scientific data repositories. Datalad uses git annex. I've set up an annex for my datalad/git repository using `git-annex-remote-rclone` [website](https://github.com/git-annex-remote-rclone/git-annex-remote-rclone). The setup went fine, but the transfers with a Gigabit connection are of the order of 50-100 kbs. I'm trying to troubleshoot the issue. I'm a new user of git annex. The repository has about 10 GB of stuff.
diff --git a/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn b/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn new file mode 100644 index 0000000000..c67176c7fd --- /dev/null +++ b/doc/forum/git_annex_copy_to_google_drive_is_very_slow.mdwn @@ -0,0 +1,9 @@ +I'm trying to use datalad (http://www.datalad.org) to manage some scientific data repositories. Datalad uses git annex. + +I've set up an annex for my datalad/git repository using `git-annex-remote-rclone` [website](https://github.com/git-annex-remote-rclone/git-annex-remote-rclone). The setup went fine, but the transfers with a Gigabit connection are of the order of 50-100 kbs. I'm trying to troubleshoot the issue. I'm a new user of git annex. The repository has about 10 GB of stuff. + +I'm focusing on `git annex` because I sidestep datalad by using `git annex copy --to=gdrive2`, this is as slow as using `datalad push --to=gdrive2`, which makes sense as the latter is a thin wrapper around `git-annex-copy`. + +Using an external drive instead of Google Drive was a little better, but it still took hours to copy the 10 GB. Not sure what's going on. + +Any ideas on how to troubleshoot this?
Added a comment
diff --git a/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__/comment_1_a9028f00461ec1eddbeb699dadcbab72._comment b/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__/comment_1_a9028f00461ec1eddbeb699dadcbab72._comment new file mode 100644 index 0000000000..6dae8e786d --- /dev/null +++ b/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__/comment_1_a9028f00461ec1eddbeb699dadcbab72._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="Lukey" + avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b" + subject="comment 1" + date="2023-03-20T17:59:11Z" + content=""" +Not a chance. git-annex uses symmetric encryption, storing the key in the git-annex branch. See [[design/encryption/]]. + +Are you sure you don't have other clones of the repo? + +How exactly did you lose the repo? Maybe it's possible to recover it? +"""]]
Added a comment: RE: `git annex sync` not automatically syncing gcrypt remotes using relative paths
diff --git a/doc/sync/comment_30_f75f5957dbd0f6fd7b2d7291f06e7489._comment b/doc/sync/comment_30_f75f5957dbd0f6fd7b2d7291f06e7489._comment new file mode 100644 index 0000000000..41a455d79d --- /dev/null +++ b/doc/sync/comment_30_f75f5957dbd0f6fd7b2d7291f06e7489._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="talmukoydu@ab15521191b4d02584d003f3f211d90f575d5ebb" + nickname="talmukoydu" + avatar="http://cdn.libravatar.org/avatar/965ffeb49ec136674054e50928ddb2ed" + subject="RE: `git annex sync` not automatically syncing gcrypt remotes using relative paths" + date="2023-03-19T19:27:46Z" + content=""" +@joey definitely seems like a bug. I am able to easily verify by changing the remote url back and forth in the .git/config and then running git annex sync. If the relative url is used git annex sync does not sync to that remote. +"""]]
Added a comment: `git annex sync` not automatically syncing gcrypt remotes using relative paths
diff --git a/doc/sync/comment_29_161c5d3f693de45070e037d27ee7e8aa._comment b/doc/sync/comment_29_161c5d3f693de45070e037d27ee7e8aa._comment new file mode 100644 index 0000000000..c234527fac --- /dev/null +++ b/doc/sync/comment_29_161c5d3f693de45070e037d27ee7e8aa._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="talmukoydu@ab15521191b4d02584d003f3f211d90f575d5ebb" + nickname="talmukoydu" + avatar="http://cdn.libravatar.org/avatar/965ffeb49ec136674054e50928ddb2ed" + subject="`git annex sync` not automatically syncing gcrypt remotes using relative paths" + date="2023-03-19T19:20:44Z" + content=""" +@joey Is this a bug or am I missing something? + +Notes: + +* I am using the latest git-remote-gcrypt, version 1.5 + +Flow 1 + +* `git remote add test gcrypt::rsync://user@user.rsync.net:relative/path/to/repo` +* `git annex sync` -> DOES NOT SYNC to test remote +* Nothing has been synced so I CANNOT successfully clone from the test remote with `git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo` +* `git push test git-annex master` +* I can successfully clone from the test remote with `git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo` + +Flow 2 + +* `git remote add test gcrypt::rsync://user@user.rsync.net/full/path/to/repo` +* `git annex sync` -> DOES SYNC to test remote +* I can successfully clone from the test remote with `git clone gcrypt::rsync://user@user.rsync.net:relative/path/to/repo` +"""]]
removed
diff --git a/doc/sync/comment_29_ea7ae356fecea4b0cc846146312cc48d._comment b/doc/sync/comment_29_ea7ae356fecea4b0cc846146312cc48d._comment deleted file mode 100644 index 0419882289..0000000000 --- a/doc/sync/comment_29_ea7ae356fecea4b0cc846146312cc48d._comment +++ /dev/null @@ -1,20 +0,0 @@ -[[!comment format=mdwn - username="talmukoydu@ab15521191b4d02584d003f3f211d90f575d5ebb" - nickname="talmukoydu" - avatar="http://cdn.libravatar.org/avatar/965ffeb49ec136674054e50928ddb2ed" - subject="`git annex sync` not syncing automatically with gcrypt remotes" - date="2023-03-19T19:08:13Z" - content=""" -@joey Is this a bug or am I missing something? - -Notes: -* I am using the latest `git-remote-gcrypt`, version 1.5 - -* `ssh user@user.rsync.net \"git init --bare repo\"` -* `git remote add test gcrypt::rsync://xxxx@xxxx.rsync.net:repo` -* `git annex sync -> DOES NOT SYNC to test remote` - -* `ssh user@user.rsync.net \"git init --bare repo\"` -* `git remote add test gcrypt::xxxx@xxxx.rsync.net/full/path/to/repo` -* `git annex sync -> DOES SYNC to test remote` -"""]]
Added a comment: `git annex sync` not syncing automatically with gcrypt remotes
diff --git a/doc/sync/comment_29_ea7ae356fecea4b0cc846146312cc48d._comment b/doc/sync/comment_29_ea7ae356fecea4b0cc846146312cc48d._comment new file mode 100644 index 0000000000..0419882289 --- /dev/null +++ b/doc/sync/comment_29_ea7ae356fecea4b0cc846146312cc48d._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="talmukoydu@ab15521191b4d02584d003f3f211d90f575d5ebb" + nickname="talmukoydu" + avatar="http://cdn.libravatar.org/avatar/965ffeb49ec136674054e50928ddb2ed" + subject="`git annex sync` not syncing automatically with gcrypt remotes" + date="2023-03-19T19:08:13Z" + content=""" +@joey Is this a bug or am I missing something? + +Notes: +* I am using the latest `git-remote-gcrypt`, version 1.5 + +* `ssh user@user.rsync.net \"git init --bare repo\"` +* `git remote add test gcrypt::rsync://xxxx@xxxx.rsync.net:repo` +* `git annex sync -> DOES NOT SYNC to test remote` + +* `ssh user@user.rsync.net \"git init --bare repo\"` +* `git remote add test gcrypt::xxxx@xxxx.rsync.net/full/path/to/repo` +* `git annex sync -> DOES SYNC to test remote` +"""]]
add appveyor build badge
diff --git a/doc/builds.mdwn b/doc/builds.mdwn index 4d6ec9c7ac..ce7b60861c 100644 --- a/doc/builds.mdwn +++ b/doc/builds.mdwn @@ -61,3 +61,7 @@ <a href="https://github.com/datalad/git-annex/actions?query=workflow%3A%22Build+git-annex+on+Ubuntu%22"> <img src="https://github.com/datalad/git-annex/workflows/Build%20git-annex%20on%20Ubuntu/badge.svg"> </a> +<h2>Appveyor</h2> +<a href="https://ci.appveyor.com/project/mih/git-annex"> +<img src="https://ci.appveyor.com/api/projects/status/mih/git-annex?retina=true"> +</a>
rename an old closed bug to avoid filename too long on windows checkout
diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__.mdwn b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted.mdwn similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__.mdwn rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted.mdwn diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_10_61622483d4f1962f191fb6a791c6817d._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_10_61622483d4f1962f191fb6a791c6817d._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_10_61622483d4f1962f191fb6a791c6817d._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_10_61622483d4f1962f191fb6a791c6817d._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_11_ccc6f5f1ac5743b0857f68cf21eaa6ea._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_11_ccc6f5f1ac5743b0857f68cf21eaa6ea._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_11_ccc6f5f1ac5743b0857f68cf21eaa6ea._comment 
rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_11_ccc6f5f1ac5743b0857f68cf21eaa6ea._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_1_5f9e9600d65c1270b479cc8910d507d0._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_1_5f9e9600d65c1270b479cc8910d507d0._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_1_5f9e9600d65c1270b479cc8910d507d0._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_1_5f9e9600d65c1270b479cc8910d507d0._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_2_54cf519a09cfaa0c85d60734daf58d72._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_2_54cf519a09cfaa0c85d60734daf58d72._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_2_54cf519a09cfaa0c85d60734daf58d72._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_2_54cf519a09cfaa0c85d60734daf58d72._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_3_132ea480f90af577e3cad67f2f01c73d._comment 
b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_3_132ea480f90af577e3cad67f2f01c73d._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_3_132ea480f90af577e3cad67f2f01c73d._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_3_132ea480f90af577e3cad67f2f01c73d._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_4_06e99c29eaa770ab96ea5fd832ee04b8._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_4_06e99c29eaa770ab96ea5fd832ee04b8._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_4_06e99c29eaa770ab96ea5fd832ee04b8._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_4_06e99c29eaa770ab96ea5fd832ee04b8._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_5_0e7b492da14e067c34693b7be02e6864._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_5_0e7b492da14e067c34693b7be02e6864._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_5_0e7b492da14e067c34693b7be02e6864._comment rename to 
doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_5_0e7b492da14e067c34693b7be02e6864._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_5_db1ae584fc803c0dbb48c7e31a540c4e._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_5_db1ae584fc803c0dbb48c7e31a540c4e._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_5_db1ae584fc803c0dbb48c7e31a540c4e._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_5_db1ae584fc803c0dbb48c7e31a540c4e._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_7_f584dc15789b371569e1925c4ee5ae36._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_7_f584dc15789b371569e1925c4ee5ae36._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_7_f584dc15789b371569e1925c4ee5ae36._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_7_f584dc15789b371569e1925c4ee5ae36._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_8_83eccac19016c971f1b799d935a4c0bb._comment 
b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_8_83eccac19016c971f1b799d935a4c0bb._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_8_83eccac19016c971f1b799d935a4c0bb._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_8_83eccac19016c971f1b799d935a4c0bb._comment diff --git a/doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_9_5db907d77c5f5490c1bdc8e51387a9e9._comment b/doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_9_5db907d77c5f5490c1bdc8e51387a9e9._comment similarity index 100% rename from doc/projects/datalad/bugs-done/get_-J8_on_OSX_leads_to_git-annex__58___git__58___createProcess__58___runInteractiveProcess__58___pipe__58___resource_exhausted___40__Too_many_open_files__41__/comment_9_5db907d77c5f5490c1bdc8e51387a9e9._comment rename to doc/projects/datalad/bugs-done/get_-J8_resource_exhausted/comment_9_5db907d77c5f5490c1bdc8e51387a9e9._comment
diff --git a/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__.mdwn b/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__.mdwn new file mode 100644 index 0000000000..6cd392a600 --- /dev/null +++ b/doc/forum/Lost_git_repository._Recovery_from_S3_remote__63__.mdwn @@ -0,0 +1,14 @@ +Greetings friends. + +In an unfortunate event, I lost my git repository for my annex. 😬 + +The only thing left is the annex files in an S3 bucket. + +Is it possible to recover the repository? + +Is it possible to recover the files? + +All the files in S3 bucket are in this format: +GPGHMACSHA1--000b36e47d7ebd51f5ca2b8d294dea2d530c79c8 + +I'd appreciate any help! 🙏
Added a comment
diff --git a/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_2_176a92a6ce5d8f356dafc29021242b2d._comment b/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_2_176a92a6ce5d8f356dafc29021242b2d._comment new file mode 100644 index 0000000000..373e52d7e5 --- /dev/null +++ b/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_2_176a92a6ce5d8f356dafc29021242b2d._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jwiegley" + avatar="http://cdn.libravatar.org/avatar/910fdec093deffebb92d7db019b5996a" + subject="comment 2" + date="2023-03-16T12:52:53Z" + content=""" +Never mind, I found the answer on https://git-annex.branchable.com/git-annex-smudge/ +"""]]
Added a comment: An example of what I see
diff --git a/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_1_68ed73d8898bf2ce0a14e0e5909cc64b._comment b/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_1_68ed73d8898bf2ce0a14e0e5909cc64b._comment new file mode 100644 index 0000000000..5d852eea0b --- /dev/null +++ b/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__/comment_1_68ed73d8898bf2ce0a14e0e5909cc64b._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="jwiegley" + avatar="http://cdn.libravatar.org/avatar/910fdec093deffebb92d7db019b5996a" + subject="An example of what I see" + date="2023-03-16T12:24:45Z" + content=""" +Here's an example of what I see: +``` +Hermes ~/kadena/docs $ w +## main +?? archive/ +Hermes ~/kadena/docs $ git add archive +Hermes ~/kadena/docs $ git diff HEAD +diff --git c/archive/archive.org w/archive/archive.org +new file mode 100644 +index 0000000..6cd5281 +--- /dev/null ++++ w/archive/archive.org +@@ -0,0 +1 @@ ++/annex/objects/BLAKE2B512E-s239446--66c363a36b5d1919344b41d947d6ee9c3db879e547fb7cfc5e94d09c16bdec> +``` + +Note that I discovered that I had both a `kadena.org` and an `archive/kadena.org`. Once I renamed the latter to `archive/archive.org`, then I was able to `git add` the first file without it being placed in the Annex, but I still cannot add `archive/archive.org` without it being put into the Annex as above. +"""]]
diff --git a/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__.mdwn b/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__.mdwn new file mode 100644 index 0000000000..f88f683469 --- /dev/null +++ b/doc/forum/How_stop_annex_from_annexing_a_file_with___34__git_add__34__.mdwn @@ -0,0 +1,14 @@ +At one point, in two repositories, I added the `largefiles` options to my `.gitattributes` file: +``` +* annex.backend=BLAKE2B512E annex.numcopies=2 annex.largefiles=largerthan=32kb +``` + +There were already Org-mode files in those repositories, kept under git. I used `git annex init` on those repositories long after those files had been under version control. + +Since some these files were larger than 32k, it appears that after editing them, git-annex decided to alter the files so that the content in Git HEAD for each file is a pathname into the `objects` directory. That is, when I use `ls -l` the file is never a symlink, and yet if I use `git show HEAD:todo.org`, I see a pathname. Using `git annex unlock` on the file does nothing. + +I decided to remove the `annex.largefiles` setting, since I don't want this behavior to be "automatic" anymore. So, in one of the two repositories, I ran `git annex unannex todo.org`, and then `git add`, and now I have a regular file back under version control again. + +In the other repository, however, the file goes back to being a file path in the Git tree when I use "git add". Nothing that I do will add the file contents to Git rather than to the Annex, in this strange mode where it's never a symlink on disk, but it's definitely an Annex object file path in the Git tree. + +How do I tell Annex to stop managing this particular file?
reproduced
diff --git a/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_7_3d4a7d330d80060275c34d0a13416fb7._comment b/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_7_3d4a7d330d80060275c34d0a13416fb7._comment new file mode 100644 index 0000000000..6b6344a5b0 --- /dev/null +++ b/doc/bugs/SQlite_failed_when_copying_to_remote_repository/comment_7_3d4a7d330d80060275c34d0a13416fb7._comment @@ -0,0 +1,78 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2023-03-14T17:00:06Z" + content=""" +Oho, I reproduced it! + +First I made a git-annex repo named "fooé", and added a file "foo" to it, +and that is unlocked. + + joey@darkstar:/tmp>git clone fooé bar + Cloning into 'bar'... + done. + joey@darkstar:/tmp>cd bar + joey@darkstar:/tmp/bar>git-annex move --from origin + move foo (from origin...) + (recording state in git...) + ok + (recording state in git...) + joey@darkstar:/tmp/bar>git-annex copy --to origin + copy foo (to origin...) + (recording state in git...) + ok + (recording state in git...) + joey@darkstar:/tmp/bar>git-annex move --from origin + move foo (from origin...) (recording state in git...) + ok + (recording state in git...) + joey@darkstar:/tmp/bar>LANG=C git-annex copy --to origin + copy foo (to origin...) + + sqlite worker thread crashed: SQLite3 returned ErrorCan'tOpen while attempting to perform open "/tmp/foo\65533\65533/.git/annex/keysdb/db". + CallStack (from HasCallStack): + error, called at ./Database/Handle.hs:87:25 in main:Database.Handle + + ok + +Note that using git-annex in repo fooé with LANG=C works. The problem +seems limited to a remote with a unicode character in its name when not in a +unicode locale. + +In strace I see this: + + 3451696 openat(AT_FDCWD, "/tmp/foo\357\277\275\357\277\275/.git/annex/keysdb/db", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644 <unfinished ...> + 3451694 <... close resumed>) = 0 + 3451696 <... 
openat resumed>) = -1 ENOENT (No such file or directory) + +That doesn't look like the correct encoding for "é", does it? +`"/tmp/foo\303\251"` would be correct and is what git-annex otherwise uses +when accessing that repo. + +I think the reason for this is simply that persistent-sqlite uses Text +for the location of the database. And Text is unicode encoded. So when the non +unicode locale results in a FilePath that is encoded using the filesystem encoding, +with surrogate characters, and that gets fed to Data.Text.pack, it replaces the +"invalid scalar values" with "\65533". + +The same thing would happen in a unicode locale if the remote's path was not +valid unicode. + +Filed an issue to get persistent-sqlite to not use Text for the FilePath +<https://github.com/yesodweb/persistent/issues/1481> + +I don't think that non-unicode FilePaths can be generally squeezed into a Text, +so we may need to wait for persistent to get fixed. Although it should be possible, +in a non-unicode locale, to convert a non-unicode FilePath like "fooé" +to a Text. + +---- + +Also notice that git-annex succeeds despite this error. Which is reasonable +since it was only unable to update the remote's keys db, and the remote +can and will just update it itself next time git-annex is used over there. +Which will work since that git-annex will be running in the directory +and will use relative paths. + +So perhaps the error message could just be suppressed? +"""]]
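The replacement described in comment 7 can be reproduced in isolation. What follows is a minimal sketch, not git-annex or persistent-sqlite code. It assumes that GHC's filesystem encoding escapes each undecodable byte `b` as the lone surrogate U+DC00+`b`, so the bytes `\303\251` of "é" become U+DCC3 and U+DCA9. The literal `escapedName` is a hypothetical stand-in for such a FilePath.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Assumed stand-in for the FilePath "fooé" as decoded in a non-unicode
-- locale: the two UTF-8 bytes 0xC3 0xA9 escaped as lone surrogates.
escapedName :: String
escapedName = "foo\xDCC3\xDCA9"

-- Data.Text.pack replaces invalid scalar values such as surrogates
-- with U+FFFD (decimal 65533).
packedName :: T.Text
packedName = T.pack escapedName

-- The bytes that end up in the path once the Text is UTF-8 encoded.
packedBytes :: [Int]
packedBytes = map fromIntegral (B.unpack (TE.encodeUtf8 packedName))

main :: IO ()
main = do
    -- Both escaped bytes have become U+FFFD.
    print (T.unpack packedName == "foo\xFFFD\xFFFD")
    -- 239 191 189 is \357\277\275 in octal, as seen in the strace.
    print packedBytes
```

Once `pack` has run, the original bytes are gone, which would explain why the `openat` call names a directory that does not exist.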
todo that I decided not to do, recorded for posterity
diff --git a/doc/todo/fsverify.mdwn b/doc/todo/fsverify.mdwn new file mode 100644 index 0000000000..4c16044172 --- /dev/null +++ b/doc/todo/fsverify.mdwn @@ -0,0 +1,53 @@ +git-annex could use linux's [fsverify](https://www.kernel.org/doc/html/latest/filesystems/fsverity.html) +feature as an alternative to hashing and verifying hashes of files itself. + +Benefits would include: + +* Any read of an annexed file that uses fsverify would check the blocks + that are read, and the read would fail if the file had gotten corrupted. +* Avoiding any theoretical cases where `git-annex add` is hashing a file + and something modifies it, causing the file to be added with the wrong + hash (which `git-annex fsck` will later detect). The + `FS_IOC_ENABLE_VERITY` ioctl prevents anything else from possibly + modifying the file while it's hashing it. +* Slightly faster git-annex fsck, because it would not need to hash + verified files. It would suffice to read the file, and if it all read + successfully, it's valid! + +Since fsverify uses a merkle tree, its hashes are not the same as simply +using SHA on the whole file. So for git-annex to use the fsverify hash as +the key for the file, it would need to be a separate type of key. That's a +bit problematic because then git-annex would need a way to verify that +merkle hash itself on systems that do not support fsverify. Also, for large +files, the merkle tree can get relatively large (1/127th the size of the +file, the docs say). So with a terabyte of annexed files, that's gigabytes +of merkle hashes, which seems too large to want to store them in git. + +Alternatively, git-annex could hash as usual for the key. This would mean +that `git-annex add` would hash a file twice, once for the git-annex key +and the second time calling the `FS_IOC_ENABLE_VERITY` ioctl. Slower, but +perhaps these could parallelize and only use 2x the CPU or so. + +Since fsverified files are readonly, this would only be useful for locked +files. 
Unlocking a file would need to either remove the fsverify from it +(if possible?) or copy it. + +Using fsverify in this way would not work if the sysctl +`fs.verity.require_signatures` is set, because the annexed files would +not have signatures. + +--- + +Putting all this together, fsverify is not too compelling for use by +git-annex. A user who wants the verification on all reads of a file can +just call `FS_IOC_ENABLE_VERITY` on it themselves after git-annex add. +The annex.freezecontent-command hook could be used to do that. + +Then the only benefit of supporting it in git-annex is that perhaps `git-annex +add` could parallelize enabling verification with checksumming, or avoid its +own checksumming, and so run faster than if a hook were used to enable +fsverify. And fsck would use less CPU. Is that worth complicating git-annex for? +--[[Joey]] + +> After investigating that, I currently don't think it's compelling, so I'm +> gonna close this. [[done]] --[[Joey]]
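The "1/127th" figure and the "gigabytes of merkle hashes" estimate in this todo can be checked with a little arithmetic. The sketch below is a rough model, not kernel or git-annex code. It assumes 4096-byte blocks and 32-byte (SHA-256) hashes, and it assumes that each tree level is padded to whole blocks. All names in it are hypothetical.

```haskell
-- Assumed parameters: 4096-byte blocks and 32-byte (SHA-256) hashes.
blockSize, hashSize :: Integer
blockSize = 4096
hashSize = 32

-- Integer division, rounding up.
ceilDiv :: Integer -> Integer -> Integer
ceilDiv a b = (a + b - 1) `div` b

-- Total bytes of Merkle tree blocks for a file of the given size.
-- Each level stores one hash per block of the level below, packed into
-- whole blocks, until a level fits in a single block.
merkleTreeBytes :: Integer -> Integer
merkleTreeBytes fileSize = blockSize * go (fileSize `ceilDiv` blockSize)
  where
    hashesPerBlock = blockSize `div` hashSize
    go nblocks
        | nblocks <= 1 = 0
        | otherwise =
            let level = nblocks `ceilDiv` hashesPerBlock
            in level + go level

main :: IO ()
main = do
    let tib = 2 ^ (40 :: Int) :: Integer
        tree = merkleTreeBytes tib
    putStrLn $ "tree bytes for 1 TiB: " ++ show tree
    putStrLn $ "file size / tree size: " ++ show (tib `div` tree)
```

Under these assumptions each block holds 128 hashes, so the levels form a geometric series summing to roughly 1/127 of the file size, which is several gigabytes for a terabyte of content.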
Apply codespell -w throughout
diff --git a/Annex.hs b/Annex.hs index 482c8455d4..5a9eac3c32 100644 --- a/Annex.hs +++ b/Annex.hs @@ -400,7 +400,7 @@ addGitConfigOverride v = do r { Git.gitGlobalOpts = go (Git.gitGlobalOpts r) } changeState $ \st -> st { gitconfigoverride = v : gitconfigoverride st } where - -- Remove any prior occurrance of the setting to avoid + -- Remove any prior occurrence of the setting to avoid -- building up many of them when the adjustment is run repeatedly, -- and add the setting to the end. go [] = [Param "-c", Param v] diff --git a/Annex/AdjustedBranch.hs b/Annex/AdjustedBranch.hs index 9b35b8f71b..1907157e72 100644 --- a/Annex/AdjustedBranch.hs +++ b/Annex/AdjustedBranch.hs @@ -460,7 +460,7 @@ findAdjustingCommit (AdjBranch b) = go =<< catCommit b _ -> return Nothing {- Check for any commits present on the adjusted branch that have not yet - - been propigated to the basis branch, and propigate them to the basis + - been propigated to the basis branch, and propagate them to the basis - branch and from there on to the orig branch. - - After propigating the commits back to the basis branch, @@ -536,7 +536,7 @@ rebaseOnTopMsg = "rebasing adjusted branch on top of updated original branch" reverseAdjustedCommit :: Sha -> Adjustment -> (Sha, Commit) -> OrigBranch -> Annex (Either String Sha) reverseAdjustedCommit commitparent adj (csha, basiscommit) origbranch | length (commitParent basiscommit) > 1 = return $ - Left $ "unable to propigate merge commit " ++ show csha ++ " back to " ++ show origbranch + Left $ "unable to propagate merge commit " ++ show csha ++ " back to " ++ show origbranch | otherwise = do cmode <- annexCommitMode <$> Annex.getGitConfig treesha <- reverseAdjustedTree commitparent adj csha diff --git a/Annex/Branch.hs b/Annex/Branch.hs index 7f03f6bece..37bf6e3a6b 100644 --- a/Annex/Branch.hs +++ b/Annex/Branch.hs @@ -396,7 +396,7 @@ getRef ref file = withIndex $ catFile ref file {- Applies a function to modify the content of a file. 
- - Note that this does not cause the branch to be merged, it only - - modifes the current content of the file on the branch. + - modifies the current content of the file on the branch. -} change :: Journalable content => RegardingUUID -> RawFilePath -> (L.ByteString -> content) -> Annex () change ru file f = lockJournal $ \jl -> f <$> getToChange ru file >>= set jl ru file @@ -422,7 +422,7 @@ data ChangeOrAppend t = Change t | Append t - value it provides is always appended to the journal file. That avoids - reading the journal file, and so can be faster when many lines are being - written to it. The information that is recorded will be effectively the - - same, only obsolate log lines will not get compacted. + - same, only obsolete log lines will not get compacted. - - Currently, only appends when annex.alwayscompact=false. That is to - avoid appending when an older version of git-annex is also in use in the @@ -494,7 +494,7 @@ append jl f appendable toappend = do invalidateCache {- Commit message used when making a commit of whatever data has changed - - to the git-annex brach. -} + - to the git-annex branch. -} commitMessage :: Annex String commitMessage = fromMaybe "update" . annexCommitMessage <$> Annex.getGitConfig @@ -624,7 +624,7 @@ branchFiles' = Git.Command.pipeNullSplit' $ {- Populates the branch's index file with the current branch contents. - - This is only done when the index doesn't yet exist, and the index - - is used to build up changes to be commited to the branch, and merge + - is used to build up changes to be committed to the branch, and merge - in changes from other branches. 
-} genIndex :: Git.Repo -> IO () diff --git a/Annex/ChangedRefs.hs b/Annex/ChangedRefs.hs index 83aa5561a7..7a9ce8a34f 100644 --- a/Annex/ChangedRefs.hs +++ b/Annex/ChangedRefs.hs @@ -106,6 +106,6 @@ notifyHook chan reffile _ sha <- catchDefaultIO Nothing $ extractSha <$> S.readFile reffile -- When the channel is full, there is probably no reader - -- running, or ref changes have been occuring very fast, + -- running, or ref changes have been occurring very fast, -- so it's ok to not write the change to it. maybe noop (void . atomically . tryWriteTBMChan chan) sha diff --git a/Annex/Content.hs b/Annex/Content.hs index 0090703047..568077cabc 100644 --- a/Annex/Content.hs +++ b/Annex/Content.hs @@ -392,9 +392,9 @@ withTmp key action = do - with colliding files it's their own fault and B) adding such a check - would not catch all cases of colliding keys. For example, perhaps - a remote has a key; if it's then added again with different content then - - the overall system now has two different peices of content for that + - the overall system now has two different pieces of content for that - key, and one of them will probably get deleted later. So, adding the - - check here would only raise expectations that git-annex cannot truely + - check here would only raise expectations that git-annex cannot truly - meet. - - May return false, when a particular variety of key is not being @@ -555,7 +555,7 @@ sendAnnex key rollback sendobject = go =<< prepSendAnnex' key {- Returns a file that contains an object's content, - and a check to run after the transfer is complete. - - - When a file is unlocked, it's possble for its content to + - When a file is unlocked, it's possible for its content to - change as it's being sent. The check detects this case - and returns False. 
- diff --git a/Annex/Content/Presence.hs b/Annex/Content/Presence.hs index 52020a9902..d3aea87151 100644 --- a/Annex/Content/Presence.hs +++ b/Annex/Content/Presence.hs @@ -164,7 +164,7 @@ contentLockFile :: Key -> Maybe RepoVersion -> Annex (Maybe RawFilePath) #ifndef mingw32_HOST_OS {- Older versions of git-annex locked content files themselves, but newer - versions use a separate lock file, to better support repos shared - - amoung users in eg a group. -} + - among users in eg a group. -} contentLockFile key v | versionNeedsWritableContentFiles v = pure Nothing | otherwise = Just <$> calcRepo (gitAnnexContentLock key) diff --git a/Annex/CopyFile.hs b/Annex/CopyFile.hs index 8fa84bddcb..9fc8eafc53 100644 --- a/Annex/CopyFile.hs +++ b/Annex/CopyFile.hs @@ -66,7 +66,7 @@ data CopyMethod = CopiedCoW | Copied {- Copies from src to dest, updating a meter. Preserves mode and mtime. - Uses copy-on-write if it is supported. If the the destination already - - exists, an interruped copy will resume where it left off. + - exists, an interrupted copy will resume where it left off. - - The IncrementalVerifier is updated with the content of the file as it's - being copied. But it is not finalized at the end. diff --git a/Annex/Import.hs b/Annex/Import.hs index c16eb18213..6f398564c2 100644 --- a/Annex/Import.hs +++ b/Annex/Import.hs @@ -492,7 +492,7 @@ importKeys remote importtreeconfig importcontent thirdpartypopulated importablec startimport cidmap importing db i@(loc, (cid, _sz)) oldversion largematcher = getcidkey cidmap db cid >>= \case (k:ks) -> -- If the same content was imported before - -- yeilding multiple different keys, it's not clear + -- yielding multiple different keys, it's not clear -- which is best to use this time, so pick the -- first in the list. 
But, if any of them is a -- git sha, use it, because the content must diff --git a/Annex/Init.hs b/Annex/Init.hs index b152f46aa5..1220c2e58e 100644 --- a/Annex/Init.hs +++ b/Annex/Init.hs @@ -421,7 +421,7 @@ initSharedClone True = do trustSet u UnTrusted setConfig (annexConfig "hardlink") (Git.Config.boolConfig True) -{- Propigate annex.securehashesonly from then global config to local +{- Propagate annex.securehashesonly from then global config to local - config. This makes a clone inherit a parent's setting, but once - a repository has a local setting, changes to the global config won't - affect it. -} diff --git a/Annex/Journal.hs b/Annex/Journal.hs index f2d3738f73..ce0f26e63e 100644 --- a/Annex/Journal.hs +++ b/Annex/Journal.hs @@ -73,7 +73,7 @@ privateUUIDsKnown' = not . S.null . annexPrivateRepos . Annex.gitconfig {- Records content for a file in the branch to the journal. - - - Using the journal, rather than immediatly staging content to the index + - Using the journal, rather than immediately staging content to the index - avoids git needing to rewrite the index after every change. - - The file in the journal is updated atomically. This avoids an diff --git a/Annex/Locations.hs b/Annex/Locations.hs index e8361b377e..c8ddc5cc96 100644 --- a/Annex/Locations.hs +++ b/Annex/Locations.hs @@ -580,7 +580,7 @@ gitAnnexAssistantDefaultDir = "annex" - dealing with characters that cause problems. - - This is used when a new Key is initially being generated, eg by genKey. - - Unlike keyFile and fileKey, it does not need to be a reversable + - Unlike keyFile and fileKey, it does not need to be a reversible - escaping. Also, it's ok to change this to add more problematic - characters later. Unlike changing keyFile, which could result in the - filenames used for existing keys changing and contents getting lost. 
@@ -666,7 +666,7 @@ keyPath key hasher = hasher key P.</> f P.</> f where f = keyFile key -{- All possibile locations to store a key in a special remote +{- All possible locations to store a key in a special remote - using different directory hashes. - - This is compatible with the annexLocationsNonBare and annexLocationsBare, diff --git a/Annex/NumCopies.hs b/Annex/NumCopies.hs (Diff truncated)
Typo: sansative -> sensitive
diff --git a/Annex/MetaData.hs b/Annex/MetaData.hs index 8379a6df8d..fdc539a324 100644 --- a/Annex/MetaData.hs +++ b/Annex/MetaData.hs @@ -111,7 +111,7 @@ parseMetaDataMatcher p = (,) ('>':v) -> checkcmp (>) (>) v _ -> checkglob "" checkglob v = - let cglob = compileGlob v CaseInsensative (GlobFilePath False) + let cglob = compileGlob v CaseInsensitive (GlobFilePath False) in matchGlob cglob . decodeBS . fromMetaValue checkcmp cmp cmp' v mv' = let v' = decodeBS (fromMetaValue mv') diff --git a/Annex/View.hs b/Annex/View.hs index 43de231c7a..65db159710 100644 --- a/Annex/View.hs +++ b/Annex/View.hs @@ -176,7 +176,7 @@ combineViewFilter old@(ExcludeValues olds) (ExcludeValues news) combineViewFilter (FilterValues _) newglob@(FilterGlob _) = (newglob, Widening) combineViewFilter (FilterGlob oldglob) new@(FilterValues s) - | all (matchGlob (compileGlob oldglob CaseInsensative (GlobFilePath False)) . decodeBS . fromMetaValue) (S.toList s) = (new, Narrowing) + | all (matchGlob (compileGlob oldglob CaseInsensitive (GlobFilePath False)) . decodeBS . fromMetaValue) (S.toList s) = (new, Narrowing) | otherwise = (new, Widening) {- With two globs, the old one is discarded, and the new one is used. - We can tell if that's a narrowing change by checking if the old @@ -185,7 +185,7 @@ combineViewFilter (FilterGlob oldglob) new@(FilterValues s) - widening. 
-} combineViewFilter (FilterGlob old) newglob@(FilterGlob new) | old == new = (newglob, Unchanged) - | matchGlob (compileGlob old CaseInsensative (GlobFilePath False)) new = (newglob, Narrowing) + | matchGlob (compileGlob old CaseInsensitive (GlobFilePath False)) new = (newglob, Narrowing) | otherwise = (newglob, Widening) {- Combining FilterValuesOrUnset and FilterGlobOrUnset with FilterValues - and FilterGlob maintains the OrUnset if the second parameter has it, @@ -220,7 +220,7 @@ combineViewFilter (FilterGlobOrUnset oldglob _) new@(FilterValuesOrUnset _ _) = in (new, viewchange) combineViewFilter (FilterGlobOrUnset old _) newglob@(FilterGlobOrUnset new _) | old == new = (newglob, Unchanged) - | matchGlob (compileGlob old CaseInsensative (GlobFilePath False)) new = (newglob, Narrowing) + | matchGlob (compileGlob old CaseInsensitive (GlobFilePath False)) new = (newglob, Narrowing) | otherwise = (newglob, Widening) combineViewFilter (FilterGlob _) newglob@(FilterGlobOrUnset _ _) = (newglob, Widening) @@ -285,7 +285,7 @@ viewComponentMatcher viewcomponent = \metadata -> matcher matchunset (FilterValues s) = \values -> setmatches matchunset $ S.intersection s values matcher matchunset (FilterGlob glob) = - let cglob = compileGlob glob CaseInsensative (GlobFilePath False) + let cglob = compileGlob glob CaseInsensitive (GlobFilePath False) in \values -> setmatches matchunset $ S.filter (matchGlob cglob . decodeBS . fromMetaValue) values matcher _ (ExcludeValues excludes) = diff --git a/CHANGELOG b/CHANGELOG index d61faa42e1..10b4dd5b3c 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -5117,14 +5117,14 @@ git-annex (5.20140227) unstable; urgency=medium * metadata: Field names limited to alphanumerics and a few whitelisted punctuation characters to avoid issues with views, etc. - * metadata: Field names are now case insensative. + * metadata: Field names are now case insensitive. 
* When constructing views, metadata is available about the location of the file in the view's reference branch. Allows incorporating parts of the directory hierarchy in a view. For example `git annex view tag=* podcasts/=*` makes a view in the form tag/showname. * --metadata field=value can now use globs to match, and matches - case insensatively, the same as git annex view field=value does. + case insensitively, the same as git annex view field=value does. * annex.genmetadata can be set to make git-annex automatically set metadata (year and month) when adding files. * Make annex.web-options be used in several places that call curl. diff --git a/Limit.hs b/Limit.hs index 9afaada438..9cb9ac2feb 100644 --- a/Limit.hs +++ b/Limit.hs @@ -122,7 +122,7 @@ limitExclude glob = Right $ MatchFiles matchGlobFile :: String -> MatchInfo -> Annex Bool matchGlobFile glob = go where - cglob = compileGlob glob CaseSensative (GlobFilePath True) -- memoized + cglob = compileGlob glob CaseSensitive (GlobFilePath True) -- memoized go (MatchingFile fi) = pure $ matchGlob cglob (fromRawFilePath (matchFile fi)) go (MatchingInfo p) = pure $ case providedFilePath p of Just f -> matchGlob cglob (fromRawFilePath f) @@ -168,7 +168,7 @@ matchSameContentGlob glob mi = checkKey (go mi) mi check k . 
toRawFilePath =<< getUserInfo (userProvidedFilePath p) - cglob = compileGlob glob CaseSensative (GlobFilePath True) -- memoized + cglob = compileGlob glob CaseSensitive (GlobFilePath True) -- memoized matchesglob f = matchGlob cglob (fromRawFilePath f) #ifdef mingw32_HOST_OS @@ -232,7 +232,7 @@ matchMagic _limitname querymagic selectprovidedinfo selectuserprovidedinfo (Just , matchNeedsLocationLog = False } where - cglob = compileGlob glob CaseSensative (GlobFilePath False) -- memoized + cglob = compileGlob glob CaseSensitive (GlobFilePath False) -- memoized go (MatchingFile fi) = catchBoolIO $ maybe False (matchGlob cglob) <$> querymagic magic (fromRawFilePath (contentFile fi)) diff --git a/Remote/Web.hs b/Remote/Web.hs index ad8a3050cf..4a1b7a61c3 100644 --- a/Remote/Web.hs +++ b/Remote/Web.hs @@ -202,7 +202,7 @@ mkUrlIncludeExclude = go fallback getglob f pc = do glob <- getRemoteConfigValue f pc - Just $ compileGlob glob CaseInsensative (GlobFilePath False) + Just $ compileGlob glob CaseInsensitive (GlobFilePath False) mk minclude mexclude = pure $ UrlIncludeExclude { checkUrlIncludeExclude = \u -> and diff --git a/Types/RefSpec.hs b/Types/RefSpec.hs index 0567622319..86925634d7 100644 --- a/Types/RefSpec.hs +++ b/Types/RefSpec.hs @@ -22,7 +22,7 @@ data RefSpecPart | RemoveMatching Glob allRefSpec :: RefSpec -allRefSpec = [AddMatching $ compileGlob "*" CaseSensative (GlobFilePath False)] +allRefSpec = [AddMatching $ compileGlob "*" CaseSensitive (GlobFilePath False)] parseRefSpec :: String -> Either String RefSpec parseRefSpec v = case partitionEithers (map mk $ splitc ':' v) of @@ -31,9 +31,9 @@ parseRefSpec v = case partitionEithers (map mk $ splitc ':' v) of where mk ('+':s) | any (`elem` s) "*?" 
= - Right $ AddMatching $ compileGlob s CaseSensative (GlobFilePath False) + Right $ AddMatching $ compileGlob s CaseSensitive (GlobFilePath False) | otherwise = Right $ AddRef $ Ref $ encodeBS s - mk ('-':s) = Right $ RemoveMatching $ compileGlob s CaseSensative (GlobFilePath False) + mk ('-':s) = Right $ RemoveMatching $ compileGlob s CaseSensitive (GlobFilePath False) mk "reflog" = Right AddRefLog mk s = Left $ "bad refspec item \"" ++ s ++ "\" (expected + or - prefix)" diff --git a/Utility/Glob.hs b/Utility/Glob.hs index 9f2d147b4c..f09763f973 100644 --- a/Utility/Glob.hs +++ b/Utility/Glob.hs @@ -24,7 +24,7 @@ import Data.Char newtype Glob = Glob Regex -data GlobCase = CaseSensative | CaseInsensative +data GlobCase = CaseSensitive | CaseInsensitive -- Is the glob being used to match filenames? -- @@ -44,8 +44,8 @@ compileGlob glob globcase globfilepath = Glob $ where regex = '^' : wildToRegex globfilepath glob ++ "$" casesentitive = case globcase of - CaseSensative -> True - CaseInsensative -> False + CaseSensitive -> True + CaseInsensitive -> False wildToRegex :: GlobFilePath -> String -> String wildToRegex (GlobFilePath globfile) = concat . go diff --git a/doc/bugs/OSX_case_insensitive_filesystem/comment_1_2e81165ac03e1d0566c81016e7728ee6._comment b/doc/bugs/OSX_case_insensitive_filesystem/comment_1_2e81165ac03e1d0566c81016e7728ee6._comment index 01f355ec1d..73070f777e 100644 --- a/doc/bugs/OSX_case_insensitive_filesystem/comment_1_2e81165ac03e1d0566c81016e7728ee6._comment +++ b/doc/bugs/OSX_case_insensitive_filesystem/comment_1_2e81165ac03e1d0566c81016e7728ee6._comment @@ -17,8 +17,8 @@ It's actually possible to make brand-new git-annex repos use all lower case hash directories today, by setting `git config annex.tune.objecthashlower true` before you run `git annex init` for the first time. -If you know you will need to move a repository between case-insensative and -case-sensative filesystems, you could use that configuration. 
But that +If you know you will need to move a repository between case-insensitive and +case-sensitive filesystems, you could use that configuration. But that would be very forward looking, and instead users are just going to stumble over the mixed case directories from time to time. diff --git a/doc/design/new_repo_versions.mdwn b/doc/design/new_repo_versions.mdwn index 0ea9045966..df5004e70a 100644 --- a/doc/design/new_repo_versions.mdwn +++ b/doc/design/new_repo_versions.mdwn @@ -41,7 +41,7 @@ Possible reasons to make changes: git-annex checks both locations (eg, a bare repo defaults to xxx/yyy but really old ones might use xX/yY for some keys). - The mixed case hash directories have caused trouble on case-insensative + The mixed case hash directories have caused trouble on case-insensitive filesystems, although that has mostly been papered over to avoid problems. One remaining problem users can stuble on occurs when [[moving a repository from OSX to Linux|bugs/OSX_case_insensitive_filesystem]].
comment
diff --git a/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_4_f2bd555ca36b42dac6a213e7c947f1f9._comment b/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_4_f2bd555ca36b42dac6a213e7c947f1f9._comment new file mode 100644 index 0000000000..df17624c7b --- /dev/null +++ b/doc/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_4_f2bd555ca36b42dac6a213e7c947f1f9._comment @@ -0,0 +1,32 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2023-03-13T18:54:33Z" + content=""" +Wikipedia seems to be wrong about that. +I took a quick look at <https://datatracker.ietf.org/doc/html/rfc8089#appendix-E.2> +and it says that "file:c:/path/to/file" is a valid URI on Windows. And it will be +parsed ok by git-annex. So you could just use those. + +The RFC does say that "file:///c:/path/to/file" should also be supported. +(Though I don't understand its reference to the "path-absolute" rule.) + +I don't know if network-uri could be made to support that, it seems that +it would have to handle windows and non-windows differently. Because on linux, +"file:///c:/path/to/file" should parse to a path "/c:/path/to/file", +which is after all a valid path if you choose to have a `/c:` directory! + +But network-uri is a pure uri parser and it does not seem right for it to parse +the same uri two different ways depending on the OS it's running on. And it doesn't +special-case handling of file urls at all, it only implements RFC3986. +We could try opening an issue at <https://github.com/haskell/network-uri/issues> +and find out what its maintainer thinks. + +I suppose that git-annex, when running on windows, every place after it parses an +url could: + +1. Check if it's a file: url +2. If the path starts with "/DRIVE:", remove the leading "/" + +Yugh. +"""]]
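The two steps listed at the end of comment 4 could look roughly like the following. This is only a sketch using plain string handling rather than network-uri, and none of these function names exist in git-annex. It assumes a lower-case `file:` scheme and does no percent-decoding. It would only be appropriate when running on Windows, since on Linux `/c:/path/to/file` is itself a valid path.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- Step 1: check if it's a file: url, returning the part after the scheme
-- with an empty "//" authority removed. Hypothetical helper; it does no
-- percent-decoding and does not handle a non-empty host.
fileUrlPath :: String -> Maybe String
fileUrlPath url = do
    rest <- stripPrefix "file:" url
    return (fromMaybe rest (stripPrefix "//" rest))

-- Step 2: if the path starts with "/DRIVE:", remove the leading "/".
dropSlashBeforeDrive :: String -> String
dropSlashBeforeDrive ('/' : d : ':' : rest)
    | isAsciiUpper d || isAsciiLower d = d : ':' : rest
dropSlashBeforeDrive path = path

-- Both steps combined; Nothing means the url is not a file: url.
windowsFileUrlPath :: String -> Maybe FilePath
windowsFileUrlPath = fmap dropSlashBeforeDrive . fileUrlPath

main :: IO ()
main = mapM_ (print . windowsFileUrlPath)
    [ "file:///c:/path/to/file"
    , "file:c:/path/to/file"
    , "file:///tmp/file"
    , "https://example.com/"
    ]
```

With this sketch both `file:///c:/path/to/file` and `file:c:/path/to/file` yield `c:/path/to/file`, while a path without a drive letter is left alone.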
copy --from --to location tracking update
copy: When --from and --to are combined and the content is already present
on the destination remote, update location tracking as necessary.
Sponsored-by: Dartmouth College's DANDI project
diff --git a/CHANGELOG b/CHANGELOG index 649e54f877..d61faa42e1 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -13,6 +13,9 @@ git-annex (10.20230228) UNRELEASED; urgency=medium build. * importfeed: Display feed title. * init: Support being ran in a repository that has a newline in its path. + * copy: When --from and --to are combined and the content is already + present on the destination remote, update location tracking as + necessary. -- Joey Hess <id@joeyh.name> Mon, 27 Feb 2023 12:31:14 -0400 diff --git a/Command/Move.hs b/Command/Move.hs index 528e6ce22c..925b4e45d9 100644 --- a/Command/Move.hs +++ b/Command/Move.hs @@ -399,6 +399,9 @@ fromToPerform src dest removewhen key afile = do Right True -> do showAction $ "from " ++ Remote.name src showAction $ "to " ++ Remote.name dest + -- The log may not indicate dest's copy + -- yet, so make sure it does. + logChange key (Remote.uuid dest) InfoPresent -- Drop from src, checking copies including -- the one already in dest. dropfromsrc id diff --git a/doc/bugs/copy_--from_--to_does_not_adjust_avail_info.mdwn b/doc/bugs/copy_--from_--to_does_not_adjust_avail_info.mdwn index 1d34ec7cef..40f6d995b7 100644 --- a/doc/bugs/copy_--from_--to_does_not_adjust_avail_info.mdwn +++ b/doc/bugs/copy_--from_--to_does_not_adjust_avail_info.mdwn @@ -51,3 +51,5 @@ I would expect `copy` to make a record locally that now the content is also on d [[!meta author=yoh]] [[!tag projects/dandi]] + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/copy_--from_--to_does_not_adjust_avail_info/comment_1_d6a1dd2e039dcb6dafcf4675b9244617._comment b/doc/bugs/copy_--from_--to_does_not_adjust_avail_info/comment_1_d6a1dd2e039dcb6dafcf4675b9244617._comment new file mode 100644 index 0000000000..b5698f0fe1 --- /dev/null +++ b/doc/bugs/copy_--from_--to_does_not_adjust_avail_info/comment_1_d6a1dd2e039dcb6dafcf4675b9244617._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-13T18:34:36Z" + content=""" 
+Aah, I see, this is when the content is present on the --to +remote, but git-annex is not locally aware of that yet. + +And `git-annex copy --to remote` does +update location tracking in such a case, so --from --to should also. +"""]]