Recent changes to this wiki:
FTBFS on Windows
diff --git a/doc/bugs/windows_FTBFS__44___advise_needed.mdwn b/doc/bugs/windows_FTBFS__44___advise_needed.mdwn new file mode 100644 index 0000000000..cccf814793 --- /dev/null +++ b/doc/bugs/windows_FTBFS__44___advise_needed.mdwn @@ -0,0 +1,46 @@ +### Please describe the problem. + + +Our windows build was failing for a while + +[here is the recent log](https://github.com/datalad/git-annex/actions/runs/17753795690/job/50453300001) +which shows + +``` +[336 of 754] Compiling Database.Init +[337 of 754] Compiling Database.Benchmark +D:\a\git-annex\git-annex\Annex\Multicast.hs:36:15: error: [GHC-88464] +Error: Variable not in scope: + fdToHandle + :: ghc-internal-9.1002.0:GHC.Internal.System.Posix.Internals.FD + -> IO Handle + | +36 | rh <- fdToHandle rfd + | ^^^^^^^^^^ + +D:\a\git-annex\git-annex\Annex\Multicast.hs:48:22: error: [GHC-88464] +Error: Variable not in scope: fdToHandle :: t0 -> IO Handle + | +48 | h <- fdToHandle fd + | ^^^^^^^^^^ + +[338 of 754] Compiling Creds +... +[752 of 754] Compiling Command.Assistant + +Error: [S-7282] + Stack failed to execute the build plan. + + While executing the build plan, Stack encountered the error: + + [S-7011] + While building package git-annex-10.20250828 (scroll up to its section to see the error) + using: + D:\a\git-annex\git-annex\.stack-work\dist\56bb250d\setup\setup --verbose=1 --builddir=.stack-work\dist\56bb250d build exe:git-annex --ghc-options "" + Process exited with code: ExitFailure 1 +``` + +### What steps will reproduce the problem? + + CI action with all the steps is [here](https://github.com/datalad/git-annex/blob/master/.github/workflows/build-windows.yaml#L112) +
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_24_2181f4b0acc9d01c85d7263cfa2d0cc1._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_24_2181f4b0acc9d01c85d7263cfa2d0cc1._comment new file mode 100644 index 0000000000..862ebf8647 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_24_2181f4b0acc9d01c85d7263cfa2d0cc1._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 24" + date="2025-09-16T19:34:27Z" + content=""" +well, may be [at least Michael's pypi whl builds](https://github.com/psychoinformatics-de/git-annex-wheel/blob/main/.github/workflows/build-linux.yaml) which directly install stack (I do not think they use system pkgs) could be tuned as needed? +"""]]
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_23_8e104885ed1e89c1b24bda54c7ba2bb4._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_23_8e104885ed1e89c1b24bda54c7ba2bb4._comment new file mode 100644 index 0000000000..c02320110f --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_23_8e104885ed1e89c1b24bda54c7ba2bb4._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 23" + date="2025-09-16T19:33:40Z" + content=""" +well, may be [at least Michael's pypi whl builds](https://github.com/psychoinformatics-de/git-annex-wheel/blob/main/.github/workflows/build-linux.yaml) which directly install stack (I do not think they use system pkgs) could be tuned as needed? +"""]]
annex.assistant.allowunlocked
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
diff --git a/Assistant/Threads/Committer.hs b/Assistant/Threads/Committer.hs index 6ffc9eb0e1..53663e8b5b 100644 --- a/Assistant/Threads/Committer.hs +++ b/Assistant/Threads/Committer.hs @@ -62,6 +62,11 @@ commitThread = namedThread "Committer" $ do fmap Seconds . annexDelayAdd <$> Annex.getGitConfig largefilematcher <- liftAnnex largeFilesMatcher annexdotfiles <- liftAnnex $ getGitConfigVal annexDotFiles + addunlockedmatcher <- liftAnnex $ + ifM (annexSupportUnlocked <$> Annex.getGitConfig) + ( Just <$> addUnlockedMatcher + , return Nothing + ) msg <- liftAnnex Command.Sync.commitMsg lockdowndir <- liftAnnex $ fromRepo gitAnnexTmpWatcherDir liftAnnex $ do @@ -70,7 +75,7 @@ commitThread = namedThread "Committer" $ do void $ liftIO $ tryIO $ removeDirectoryRecursive lockdowndir void $ createAnnexDirectory lockdowndir waitChangeTime $ \(changes, time) -> do - readychanges <- handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd $ + readychanges <- handleAdds lockdowndir havelsof largefilematcher annexdotfiles addunlockedmatcher delayadd $ simplifyChanges changes if shouldCommit False time (length readychanges) readychanges then do @@ -275,8 +280,8 @@ commitStaged msg = do - Any pending adds that are not ready yet are put back into the ChangeChan, - where they will be retried later. 
-} -handleAdds :: OsPath -> Bool -> GetFileMatcher -> Bool -> Maybe Seconds -> [Change] -> Assistant [Change] -handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = returnWhen (null incomplete) $ do +handleAdds :: OsPath -> Bool -> GetFileMatcher -> Bool -> Maybe AddUnlockedMatcher -> Maybe Seconds -> [Change] -> Assistant [Change] +handleAdds lockdowndir havelsof largefilematcher annexdotfiles addunlockedmatcher delayadd cs = returnWhen (null incomplete) $ do let (pending, inprocess) = partition isPendingAddChange incomplete let lockdownconfig = LockDownConfig { lockingFile = False @@ -340,9 +345,9 @@ handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = ret Command.Add.addFile Command.Add.Small f =<< liftIO (R.getSymbolicLinkStatus (fromOsPath f)) - {- Avoid overhead of re-injesting a renamed unlocked file, by - - examining the other Changes to see if a removed file has the - - same InodeCache as the new file. If so, we can just update + {- When adding the file unlocked, avoid overhead of re-injesting a renamed + - unlocked file, by examining the other Changes to see if a removed + - file has the same InodeCache as the new file. If so, we can just update - bookkeeping, and stage the file in git. 
-} addannexed :: [Change] -> Assistant [Maybe Change] @@ -357,18 +362,36 @@ handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = ret , checkWritePerms = True } if M.null m - then forM toadd (addannexed' cfg) + then forM toadd $ \c -> do + mcache <- liftIO $ genInodeCache (changeFile c) delta + addunlocked <- checkaddunlocked c + addannexed' cfg c addunlocked mcache else forM toadd $ \c -> do mcache <- liftIO $ genInodeCache (changeFile c) delta - case mcache of - Nothing -> addannexed' cfg c - Just cache -> - case M.lookup (inodeCacheToKey ct cache) m of - Nothing -> addannexed' cfg c - Just k -> fastadd c k - - addannexed' :: LockDownConfig -> Change -> Assistant (Maybe Change) - addannexed' lockdownconfig change@(InProcessAddChange { lockedDown = ld }) = + ifM (checkaddunlocked c) + ( case mcache of + Nothing -> addannexed' cfg c True Nothing + Just cache -> + case M.lookup (inodeCacheToKey ct cache) m of + Nothing -> addannexed' cfg c True Nothing + Just k -> fastadd c k + , addannexed' cfg c False mcache + ) + + checkaddunlocked (InProcessAddChange { lockedDown = ld }) = + case addunlockedmatcher of + Just addunlockedmatcher' -> do + let mi = MatchingFile $ FileInfo + { contentFile = contentLocation (keySource ld) + , matchFile = keyFilename (keySource ld) + , matchKey = Nothing + } + liftAnnex $ addUnlocked addunlockedmatcher' mi True + Nothing -> return True + checkaddunlocked _ = return True + + addannexed' :: LockDownConfig -> Change -> Bool -> Maybe InodeCache -> Assistant (Maybe Change) + addannexed' lockdownconfig change@(InProcessAddChange { lockedDown = ld }) addunlocked mcache = catchDefaultIO Nothing <~> doadd where ks = keySource ld @@ -376,14 +399,14 @@ handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = ret (mkey, _mcache) <- liftAnnex $ do showStartMessage (StartMessage "add" (ActionItemOther (Just (QuotedPath (keyFilename ks)))) (SeekInput [])) ingest nullMeterUpdate (Just $ LockedDown 
lockdownconfig ks) Nothing - maybe (failedingest change) (done change $ keyFilename ks) mkey - addannexed' _ _ = return Nothing + maybe (failedingest change) (done change addunlocked mcache $ keyFilename ks) mkey + addannexed' _ _ _ _ = return Nothing fastadd :: Change -> Key -> Assistant (Maybe Change) fastadd change key = do let source = keySource $ lockedDown change liftAnnex $ finishIngestUnlocked key source - done change (keyFilename source) key + done change True Nothing (keyFilename source) key removedKeysMap :: InodeComparisonType -> [Change] -> Annex (M.Map InodeCacheKey Key) removedKeysMap ct l = do @@ -399,11 +422,14 @@ handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = ret liftAnnex showEndFail return Nothing - done change file key = liftAnnex $ do + done change addunlocked mcache file key = liftAnnex $ do logStatus NoLiveUpdate key InfoPresent - mode <- liftIO $ catchMaybeIO $ - fileMode <$> R.getFileStatus (fromOsPath file) - stagePointerFile file mode =<< hashPointerFile key + if addunlocked + then do + mode <- liftIO $ catchMaybeIO $ + fileMode <$> R.getFileStatus (fromOsPath file) + stagePointerFile file mode =<< hashPointerFile key + else addSymlink file key mcache showEndOk return $ Just $ finishedChange change key diff --git a/CHANGELOG b/CHANGELOG index 740253853a..72e4a419f5 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -17,6 +17,7 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * Removed support for building with cryptonite, use crypton. * p2phttp: Fix a hang that could occur when used with --directory, and a repository in the repository got removed. + * Added annex.assistant.allowunlocked config. 
-- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs index 35b07a50a3..156b88c32c 100644 --- a/Types/GitConfig.hs +++ b/Types/GitConfig.hs @@ -157,6 +157,7 @@ data GitConfig = GitConfig , annexSkipUnknown :: Bool , annexAdjustedBranchRefresh :: Integer , annexSupportUnlocked :: Bool + , annexAssistantAllowUnlocked :: Bool , coreSymlinks :: Bool , coreSharedRepository :: SharedRepository , coreQuotePath :: QuotePath @@ -281,6 +282,7 @@ extractGitConfig configsource r = GitConfig (if getbool "adjustedbranchrefresh" False then 1 else 0) (getmayberead (annexConfig "adjustedbranchrefresh")) , annexSupportUnlocked = getbool (annexConfig "supportunlocked") True + , annexAssistantAllowUnlocked = getbool (annexConfig "assistant.allowunlocked") False , coreSymlinks = getbool "core.symlinks" True , coreSharedRepository = getSharedRepository r , coreQuotePath = QuotePath (getbool "core.quotepath" True) diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 6b668f69b5..4f633d7f0e 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1044,13 +1044,23 @@ repository, using [[git-annex-config]]. See its man page for a list.) To configure a default annex.addunlocked for all clones of the repository, this can be set in [[git-annex-config]](1). - (Using `git add` always adds files in unlocked form and it is not - affected by this setting.) + Using `git add` always adds files in unlocked form and it is not + affected by this setting. The assistant defaults to adding all files + unlocked, unless `annex.assistant.allowunlocked` is set. When a repository has core.symlinks set to false, or has an adjusted unlocked branch checked out, this setting is ignored, and files are always added to the repository in unlocked form. +* `annex.assistant.allowunlocked` + + The `git-annex assistant` defaults to adding all files unlocked, so that + files can be modified without the user needing to do anything to unlock + them. 
+ + If this is set to `true` then it will instead use the `annex.addunlocked` + configuration to decide which files to add unlocked. + * `annex.numcopies` This is a deprecated setting. You should instead use the diff --git a/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn b/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn index ed0a60d5a4..2232354158 100644 --- a/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn +++ b/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn @@ -4,11 +4,10 @@ configured to add them locked. (Diff truncated)
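The behaviour documented above for `annex.assistant.allowunlocked` can be modelled with a small pure sketch. It follows the documentation text in the diff, not the assistant's real code path: `AddForm`, `UnlockedMatcher`, `assistantMatcher` and `chooseForm` are hypothetical names, and a plain predicate on the file name stands in for the matcher built from `annex.addunlocked`.

```haskell
-- Simplified model of how the assistant could pick a form for a new file.
-- Every name here is a hypothetical stand-in, not one of git-annex's types.

-- | The form an annexed file is added in.
data AddForm = AddLocked | AddUnlocked
    deriving (Eq, Show)

-- | Stand-in for a matcher built from the annex.addunlocked configuration.
type UnlockedMatcher = FilePath -> Bool

-- | Assumption taken from the documentation: the matcher is only consulted
-- when annex.assistant.allowunlocked is set to true.
assistantMatcher :: Bool -> UnlockedMatcher -> Maybe UnlockedMatcher
assistantMatcher allowunlocked matcher
    | allowunlocked = Just matcher
    | otherwise = Nothing

-- | With no matcher every file is added unlocked (the assistant's default);
-- otherwise the matcher decides between unlocked and locked.
chooseForm :: Maybe UnlockedMatcher -> FilePath -> AddForm
chooseForm Nothing _ = AddUnlocked
chooseForm (Just matcher) file
    | matcher file = AddUnlocked
    | otherwise = AddLocked
```

In this model, `chooseForm (assistantMatcher False m) f` is `AddUnlocked` for every file, which mirrors the assistant's long-standing default.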
tag repronim based on https://git-annex.branchable.com/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/#comment-096bedb2d22d5aae6a51a53179372d4f
diff --git a/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn b/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn index 00c67762ea..ed0a60d5a4 100644 --- a/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn +++ b/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn @@ -10,3 +10,5 @@ Or perhaps a better name would be annex.assistant.allowaddlocked. See here for some motivating use cases <https://git-annex.branchable.com/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/> + +[[!tag projects/repronim]] diff --git a/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_2_62adc0910dcf29c74690d9da4a054048._comment b/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_2_62adc0910dcf29c74690d9da4a054048._comment new file mode 100644 index 0000000000..2361480ecf --- /dev/null +++ b/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_2_62adc0910dcf29c74690d9da4a054048._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-09-16T17:41:10Z" + content=""" +This looks like it would be a relatively simple feature to add, +eg an hour or two, and I see in the forum that @yarik thinks ReproNim +can use it. So I'll go ahead... +"""]]
improve example
diff --git a/doc/git-annex-unlock.mdwn b/doc/git-annex-unlock.mdwn index 1a2bd32596..6d0ad22a7a 100644 --- a/doc/git-annex-unlock.mdwn +++ b/doc/git-annex-unlock.mdwn @@ -39,9 +39,10 @@ repository. So, enable annex.thin with care. # git annex unlock photo.jpg # gimp photo.jpg - # git annex add photo.jpg - # git annex lock photo.jpg - # git commit -m "redeye removal" + # git commit photo.jpg -m "redeye removal" + # gimp photo.jpg + # git commit photo.jpg -m "fix oversaturation" + # git annex lock photo.jpg # OPTIONS diff --git a/doc/git-annex-unlock/comment_13_db3d6eb5f238edbd505b6909863167df._comment b/doc/git-annex-unlock/comment_13_db3d6eb5f238edbd505b6909863167df._comment new file mode 100644 index 0000000000..777bcdf187 --- /dev/null +++ b/doc/git-annex-unlock/comment_13_db3d6eb5f238edbd505b6909863167df._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: We don’t need a 'git annex lock' after a 'git annex add', right?""" + date="2025-09-16T17:28:49Z" + content=""" +Well spotted. `git-annex add` defaults to adding files locked, even when +adding what was an unlocked file before. + +I've improved the example. +"""]]
close
diff --git a/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn b/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn index b46105c19d..57a0af5a38 100644 --- a/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn +++ b/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn @@ -47,3 +47,5 @@ Then I commented out certain lines for each location. E.g. only try ignoring `a` Regardless of import or ignore, only `b` and `f` were ignored pertaining to the root `.gitignore` matching these files in the tree, even when the tree was imported to subtree `rel-ignore` or `root-ignore`. </details> + +> Closing as user error. [[done]] --[[Joey]]
improve error message when SETCREDS overwrites git-annex config
That is not allowed, so it's not a bug in git-annex when it happens.
Instead, tell the special remote developer what they got wrong.
Note that currently only Remote.External can overwrite the parsed remote
config with a PassedThrough value. PassedThrough values are otherwise
only generated for configs that are not parsed by the remote config
parser.
Sponsored-by: Joshua Antonishen
diff --git a/Annex/SpecialRemote/Config.hs b/Annex/SpecialRemote/Config.hs index 5f9d6db831..925b7e837c 100644 --- a/Annex/SpecialRemote/Config.hs +++ b/Annex/SpecialRemote/Config.hs @@ -206,13 +206,23 @@ getRemoteConfigValue :: HasCallStack => Typeable v => RemoteConfigField -> Parse getRemoteConfigValue f (ParsedRemoteConfig m _) = case M.lookup f m of Just (RemoteConfigValue v) -> case cast v of Just v' -> Just v' - Nothing -> error $ unwords - [ "getRemoteConfigValue" - , fromProposedAccepted f - , "found value of unexpected type" - , show (typeOf v) ++ "." - , "This is a bug in git-annex!" - ] + Nothing -> case cast v :: Maybe PassedThrough of + -- Handle the case where an external special remote + -- tries to SETCONFIG a value belonging to git-annex, + -- resulting in a PassedThrough type being stored. + Just _ -> error $ unwords + [ "Special remote config " + , fromProposedAccepted f + , "has been overwritten by SETCONFIG." + , "This is not supported." + ] + Nothing -> error $ unwords + [ "getRemoteConfigValue" + , fromProposedAccepted f + , "found value of unexpected type" + , show (typeOf v) ++ "." + , "This is a bug in git-annex!" + ] Nothing -> Nothing {- Gets all fields that remoteConfigRestPassthrough matched. -} diff --git a/doc/todo/encrypt_only_the_credentials/comment_8_7ec2d0de7c0b90f05475b86148982197._comment b/doc/todo/encrypt_only_the_credentials/comment_8_7ec2d0de7c0b90f05475b86148982197._comment new file mode 100644 index 0000000000..65c6774e52 --- /dev/null +++ b/doc/todo/encrypt_only_the_credentials/comment_8_7ec2d0de7c0b90f05475b86148982197._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2025-09-16T16:57:11Z" + content=""" +SETCONFIG is limited to setting the external program's configuration, +not to reaching inside git-annex and setting its own configuration. +The docs say that, but could perhaps be more clear. + +I have improved the error message. 
+ +git-annex sets up encryption for the remote based on the encryption= and +encryptonlycreds= settings before it ever starts up the external program. +That would need to change in order to support this. + +But I'm also doubtful it would be a good idea to support SETCONFIG +of any of the things git-annex uses for encryption, chunking, etc. +It's essentially monkey-patching git-annex from the external program. +Some changes to git-annex's configs could lead to very unexpected behavior. + +If you really need the ability to turn on onlyencryptcreds by default +with your special remote, there will need to be some other way implemented +to do it. Please open a new todo about that. +"""]]
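The nested `cast` in `getRemoteConfigValue` above can be tried out in isolation. This is a minimal sketch under stated assumptions: `ConfigValue` and this `PassedThrough` are hypothetical stand-ins for git-annex's own types, and `getTyped` returns an `Either` instead of calling `error` so that both failure messages can be inspected.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.Typeable (Typeable, cast, typeOf)

-- Hypothetical stand-in for git-annex's RemoteConfigValue: a value of any
-- Typeable type, with the concrete type hidden.
data ConfigValue = forall v. Typeable v => ConfigValue v

-- Hypothetical stand-in for the PassedThrough type named in the diff.
newtype PassedThrough = PassedThrough String

-- | Recover a value at the type the caller expects. On a mismatch, first
-- check whether the stored value is a PassedThrough, so that case gets its
-- own message instead of the generic one.
getTyped :: Typeable v => String -> ConfigValue -> Either String v
getTyped field (ConfigValue v) = case cast v of
    Just v' -> Right v'
    Nothing -> case cast v :: Maybe PassedThrough of
        Just _ -> Left $ unwords
            [ "Special remote config"
            , field
            , "has been overwritten by SETCONFIG."
            , "This is not supported."
            ]
        Nothing -> Left $ unwords
            [ field
            , "found value of unexpected type"
            , show (typeOf v) ++ "."
            ]
```

Asking for an `Int` from `ConfigValue (PassedThrough "none")` yields the SETCONFIG message, while any other mismatched type falls through to the generic message.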
fixed
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn index 50e3f1ba51..187e725f20 100644 --- a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn @@ -50,3 +50,4 @@ I'm running Arch Linux (kernel 6.15.1-arch1-2). The repo I'm running the command git-annex has been brilliant for managing my large media collection across several removable drives, and I'm confident it will continue to scale. This is the first issue I've run into with it. +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_4_68a3c0b736e1ba3d44177a0fbd18b257._comment b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_4_68a3c0b736e1ba3d44177a0fbd18b257._comment new file mode 100644 index 0000000000..f4ae0fef16 --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_4_68a3c0b736e1ba3d44177a0fbd18b257._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-09-16T16:44:16Z" + content=""" +There was another bug filed about the same problem, +[[bugs/git-annex_add__47__unlock_fails_for_some_names]]. + +Cause is a filename that is 21 bytes long and begins with a utf-8 +character. Which AFAICS all the filenames mentioned here are. + +[[!commit 67f00027d1b326c979db8b81c973a61234c406d7]] fixes this. +"""]]
close
diff --git a/doc/bugs/35_failed_tests_on_beegfs.mdwn b/doc/bugs/35_failed_tests_on_beegfs.mdwn index 28e1babc48..b6edddf4fd 100644 --- a/doc/bugs/35_failed_tests_on_beegfs.mdwn +++ b/doc/bugs/35_failed_tests_on_beegfs.mdwn @@ -81,3 +81,5 @@ upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 [[!meta author=yoh]] [[!tag projects/repronim]] + +> [[fixed|done]] when built with OsPath. --[[Joey]]
comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_23_4c28579ce8bb003f0eca155184b0bdfc._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_23_4c28579ce8bb003f0eca155184b0bdfc._comment new file mode 100644 index 0000000000..5b67de98a2 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_23_4c28579ce8bb003f0eca155184b0bdfc._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 23""" + date="2025-09-16T14:36:24Z" + content=""" +Yay! + +OsPath needs the os-string and file-io haskell packages. Which are not +currently in Debian. So either work will need to be done to package those, +or when Debian upgrades ghc to 9.12.2, it will include those libraries +automatically since they are bundled with ghc since that version. + +Maybe you know more than I do about the state of Debian's haskell support. + +The transition is being tracked at [[todo/RawFilePath_conversion]] but I +don't know yet what the solution is to getting the dependencies broadly +available. + +(Or I could implement the same fixes when not built with that flag of +course. It is doable. Just annoying especially since that code will have to +be carefully gotten just right, only to be thrown away later.) +"""]]
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_21_ea61c9101b9779e75b49f898ebd1e91a._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_21_ea61c9101b9779e75b49f898ebd1e91a._comment new file mode 100644 index 0000000000..c57f30a299 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_21_ea61c9101b9779e75b49f898ebd1e91a._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 21" + date="2025-09-16T00:40:21Z" + content=""" +that version had no errors: `All tests succeeded. (Ran 50 test groups in 11m54s)` + +So, this will be default option? any specific dependency requirements we need to add/constrain? +"""]]
boot libs
diff --git a/doc/todo/RawFilePath_conversion.mdwn b/doc/todo/RawFilePath_conversion.mdwn index ec61f59b0a..cd0dec9b1a 100644 --- a/doc/todo/RawFilePath_conversion.mdwn +++ b/doc/todo/RawFilePath_conversion.mdwn @@ -50,6 +50,8 @@ of the status. The `require_OsPath` branch removes the OsPath build flag, and merging it would resolve this. That will need packagers to do some work -to package the libraries though. --[[Joey]] +to package the libraries though. Or to upgrade ghc, since file-io and +os-string are boot libraries since ghc 9.12.2 and 9.10.1 respectively. +--[[Joey]] [[!tag confirmed]]
work around file-io not setting locale encoding when opening a Handle
Works around this bug https://github.com/haskell/file-io/issues/45
The fix is in Utility.FileIO.CloseOnExec because all use of file-io is
already wrapped through that module. Although perhaps that ought to be
refactored at this point.
I'd hope that file-io will eventually fix this bug, and also provide
CloseOnExec variants of its functions. That would allow depending on the
fixed version, and removing this ugly code.
Note that functions like readFile, which don't care about the encoding
because they read or write a ByteString, were kept optimally fast by not
setting the encoding. This avoids an IORef read and write per open.
Sponsored-by: Graham Spencer
diff --git a/Utility/FileIO.hs b/Utility/FileIO.hs index 3624f940d2..a775dca6c6 100644 --- a/Utility/FileIO.hs +++ b/Utility/FileIO.hs @@ -2,7 +2,8 @@ - readFileString, writeFileString, and appendFileString. - - When building with file-io, all exported functions set the close-on-exec - - flag. + - flag. Also, some other issues are handled that file-io does not handle + - correctly. - - When not building with file-io, this provides equvilant - RawFilePath versions. Note that those versions do not currently diff --git a/Utility/FileIO/CloseOnExec.hs b/Utility/FileIO/CloseOnExec.hs index 29e7c4b08a..3d1bb739f7 100644 --- a/Utility/FileIO/CloseOnExec.hs +++ b/Utility/FileIO/CloseOnExec.hs @@ -1,7 +1,12 @@ {- This is a subset of the functions provided by file-io. + - - All functions have been modified to set the close-on-exec - flag to True. - + - Also, functions that return a Handle have been modified to + - use the locale encoding, working around this bug: + - https://github.com/haskell/file-io/issues/45 + - - Copyright 2025 Joey Hess <id@joeyh.name> - Copyright 2024 Julian Ospald - @@ -34,7 +39,8 @@ module Utility.FileIO.CloseOnExec import System.File.OsPath.Internal (withOpenFile', augmentError) import qualified System.File.OsPath.Internal as I -import System.IO (IO, Handle, IOMode(..)) +import System.IO (IO, Handle, IOMode(..), hSetEncoding) +import GHC.IO.Encoding (getLocaleEncoding) import System.OsPath (OsPath, OsString) import Prelude (Bool(..), pure, either, (.), (>>=), ($)) import Control.Exception @@ -50,48 +56,47 @@ closeOnExec = True withFile :: OsPath -> IOMode -> (Handle -> IO r) -> IO r withFile osfp iomode act = (augmentError "withFile" osfp - $ withOpenFile' osfp iomode False False closeOnExec (try . act) True) + $ withOpenFileEncoding osfp iomode False False closeOnExec (try . 
act) True) >>= either ioError pure -withFile' - :: OsPath -> IOMode -> (Handle -> IO r) -> IO r +withFile' :: OsPath -> IOMode -> (Handle -> IO r) -> IO r withFile' osfp iomode act = (augmentError "withFile'" osfp - $ withOpenFile' osfp iomode False False closeOnExec (try . act) False) + $ withOpenFileEncoding osfp iomode False False closeOnExec (try . act) False) >>= either ioError pure openFile :: OsPath -> IOMode -> IO Handle openFile osfp iomode = augmentError "openFile" osfp $ - withOpenFile' osfp iomode False False closeOnExec pure False + withOpenFileEncoding osfp iomode False False closeOnExec pure False withBinaryFile :: OsPath -> IOMode -> (Handle -> IO r) -> IO r withBinaryFile osfp iomode act = (augmentError "withBinaryFile" osfp - $ withOpenFile' osfp iomode True False closeOnExec (try . act) True) + $ withOpenFileEncoding osfp iomode True False closeOnExec (try . act) True) >>= either ioError pure openBinaryFile :: OsPath -> IOMode -> IO Handle openBinaryFile osfp iomode = augmentError "openBinaryFile" osfp $ - withOpenFile' osfp iomode True False closeOnExec pure False + withOpenFileEncoding osfp iomode True False closeOnExec pure False readFile :: OsPath -> IO BSL.ByteString -readFile fp = withFile' fp ReadMode BSL.hGetContents +readFile fp = withFileNoEncoding' fp ReadMode BSL.hGetContents readFile' :: OsPath -> IO BS.ByteString -readFile' fp = withFile fp ReadMode BS.hGetContents +readFile' fp = withFileNoEncoding fp ReadMode BS.hGetContents writeFile :: OsPath -> BSL.ByteString -> IO () -writeFile fp contents = withFile fp WriteMode (`BSL.hPut` contents) +writeFile fp contents = withFileNoEncoding fp WriteMode (`BSL.hPut` contents) writeFile' :: OsPath -> BS.ByteString -> IO () -writeFile' fp contents = withFile fp WriteMode (`BS.hPut` contents) +writeFile' fp contents = withFileNoEncoding fp WriteMode (`BS.hPut` contents) appendFile :: OsPath -> BSL.ByteString -> IO () -appendFile fp contents = withFile fp AppendMode (`BSL.hPut` contents) 
+appendFile fp contents = withFileNoEncoding fp AppendMode (`BSL.hPut` contents) appendFile' :: OsPath -> BS.ByteString -> IO () -appendFile' fp contents = withFile fp AppendMode (`BS.hPut` contents) +appendFile' fp contents = withFileNoEncoding fp AppendMode (`BS.hPut` contents) {- Re-implementing openTempFile is difficult due to the current - structure of file-io. See this issue for discussion about improving @@ -99,16 +104,45 @@ appendFile' fp contents = withFile fp AppendMode (`BS.hPut` contents) - So, instead this uses noCreateProcessWhile. - -} openTempFile :: OsPath -> OsString -> IO (OsPath, Handle) -openTempFile tmp_dir template = +openTempFile tmp_dir template = do #ifdef mingw32_HOST_OS - I.openTempFile tmp_dir template + (p, h) <- I.openTempFile tmp_dir template + getLocaleEncoding >>= hSetEncoding h + pure (p, h) #else noCreateProcessWhile $ do (p, h) <- I.openTempFile tmp_dir template fd <- handleToFd h setFdOption fd CloseOnExec True h' <- fdToHandle fd + getLocaleEncoding >>= hSetEncoding h' pure (p, h') #endif +{- Wrapper around withOpenFile' that sets the locale encoding on the + - Handle. -} +withOpenFileEncoding :: OsPath -> IOMode -> Bool -> Bool -> Bool -> (Handle -> IO r) -> Bool -> IO r +withOpenFileEncoding fp iomode binary existing cloExec action close_finally = + withOpenFile' fp iomode binary existing cloExec action' close_finally + where + action' h = do + getLocaleEncoding >>= hSetEncoding h + action h + +{- Variant of withFile above that does not have the overhead of setting the + - locale encoding. Faster to use when the Handle is not used in a way that + - needs any encoding. -} +withFileNoEncoding :: OsPath -> IOMode -> (Handle -> IO r) -> IO r +withFileNoEncoding osfp iomode act = (augmentError "withFile" osfp + $ withOpenFile' osfp iomode False False closeOnExec (try . act) True) + >>= either ioError pure + +{- Variant of withFile' above that does not have the overhead of setting the + - locale encoding. 
Faster to use when the Handle is not used in a way that + - needs any encoding. -} +withFileNoEncoding' :: OsPath -> IOMode -> (Handle -> IO r) -> IO r +withFileNoEncoding' osfp iomode act = (augmentError "withFile'" osfp + $ withOpenFile' osfp iomode False False closeOnExec (try . act) False) + >>= either ioError pure + #endif diff --git a/doc/bugs/yt-dlp_mojibake.mdwn b/doc/bugs/yt-dlp_mojibake.mdwn index e4133f4fdc..ed7f8ac8b6 100644 --- a/doc/bugs/yt-dlp_mojibake.mdwn +++ b/doc/bugs/yt-dlp_mojibake.mdwn @@ -20,3 +20,5 @@ Unfortunatly, it is a bug in file-io: To fix it, git-annex will need to wrap file-io and call `getLocaleEncoding >>= hSetEncoding h` on each opened Handle. Or depend on a fixed version. --[[Joey]] + +> [[done]] --[[Joey]]
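The core of the workaround above, calling `getLocaleEncoding >>= hSetEncoding h` on each opened Handle, can be sketched using only `base`. The assumptions are loud ones: this uses `FilePath` and `openBinaryFile` rather than file-io's `OsPath` functions, simply to obtain a Handle that has no text encoding set, and `withLocaleEncoding`, `withFileLocale` and `readFirstLine` are hypothetical names.

```haskell
import Control.Exception (bracket)
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO
    (Handle, IOMode(..), hClose, hGetLine, hSetEncoding, openBinaryFile)

-- | Set the current locale encoding on a Handle, then run the action.
-- This is the same step the diff adds in withOpenFileEncoding.
withLocaleEncoding :: (Handle -> IO r) -> Handle -> IO r
withLocaleEncoding action h = do
    getLocaleEncoding >>= hSetEncoding h
    action h

-- | Hypothetical helper: open a file with no text encoding, apply the
-- locale encoding, and close the Handle when the action finishes.
withFileLocale :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFileLocale path mode action =
    bracket (openBinaryFile path mode) hClose (withLocaleEncoding action)

-- | Usage example: read the first line of a file as locale-encoded text.
readFirstLine :: FilePath -> IO String
readFirstLine path = withFileLocale path ReadMode hGetLine
```

As in the commit, code that only moves ByteStrings through the Handle could skip `withLocaleEncoding` entirely and avoid the extra work per open.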
bug
diff --git a/doc/bugs/yt-dlp_mojibake.mdwn b/doc/bugs/yt-dlp_mojibake.mdwn new file mode 100644 index 0000000000..e4133f4fdc --- /dev/null +++ b/doc/bugs/yt-dlp_mojibake.mdwn @@ -0,0 +1,22 @@ +git-annex importfeed from an AvE video failed: + + renamePath:rename '/home/joey/lib/big/.git/annex/tmp/work.URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8/ï¼ÂCorrectionï¼ hydraulic spool motor [HPajFNxnuN8].webm' to '../.git/annex/tmp/URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8': does not exist (No such file or directory) + +Here's the file list: + + joey@darkstar:~/lib/big>cat .git/annex/tmp/work.URL--yt\&chttps\&c%%www.youtube.com%watch\,63v\,61HPajFNxnuN8/git-annex-file-list-file + /home/joey/lib/big/.git/annex/tmp/work.URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8/*Correction* hydraulic spool motor [HPajFNxnuN8].webm + /home/joey/lib/big/.git/annex/tmp/work.URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8/*Correction* hydraulic spool motor [HPajFNxnuN8].webm + +And the video was written to the file: + ".git/annex/tmp/work.URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8/*Correction* hydraulic spool motor [HPajFNxnuN8].webm" + +This only affects a git-annex built with OsPath, and only recently +(not a released version). + +Unfortunatly, it is a bug in file-io: +<https://github.com/haskell/file-io/issues/45> + +To fix it, git-annex will need to wrap file-io and call +`getLocaleEncoding >>= hSetEncoding h` on each opened Handle. Or depend on +a fixed version. --[[Joey]]
require_OsPath branch
diff --git a/doc/todo/RawFilePath_conversion.mdwn b/doc/todo/RawFilePath_conversion.mdwn index b488353a60..ec61f59b0a 100644 --- a/doc/todo/RawFilePath_conversion.mdwn +++ b/doc/todo/RawFilePath_conversion.mdwn @@ -48,4 +48,8 @@ of the status. there use of FilePath remains in odd corners. These are unlikely to cause any noticiable performance impact. +The `require_OsPath` branch removes the OsPath build flag, +and merging it would resolve this. That will need packagers to do some work +to package the libraries though. --[[Joey]] + [[!tag confirmed]]
comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_21_54d9c66876137c549caafa469140904e._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_21_54d9c66876137c549caafa469140904e._comment
new file mode 100644
index 0000000000..204656bf99
--- /dev/null
+++ b/doc/bugs/35_failed_tests_on_beegfs/comment_21_54d9c66876137c549caafa469140904e._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 21"""
+ date="2025-09-15T20:12:04Z"
+ content="""
+I have enabled OsPath in the build at
+<https://downloads.kitenet.net/git-annex/autobuild/amd64/git-annex-standalone-amd64.tar.gz>
+"""]]
update
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment index 035a891794..5992e82d7f 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment @@ -3,11 +3,10 @@ subject="""comment 19""" date="2025-09-15T18:20:21Z" content=""" -If your git-annex is not built with the OsPath build flag, -it will still not be using `O_CLOEXEC`. +Confirmed in your log that git-annex is not built with the +OsPath build flag, so it will still not be using `O_CLOEXEC`. -I'll bet it's not, since Debian doesn't have the necessary library packaged -yet.. - -Check for output from: `git-annex version | grep OsPath` +It would be good to get a build with OsPath and test it to see if my fixes +actually did work. Debian doesn't include the necessary library yet, so a +build using stack is needed. """]]
comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment
new file mode 100644
index 0000000000..035a891794
--- /dev/null
+++ b/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 19"""
+ date="2025-09-15T18:20:21Z"
+ content="""
+If your git-annex is not built with the OsPath build flag,
+it will still not be using `O_CLOEXEC`.
+
+I'll bet it's not, since Debian doesn't have the necessary library packaged
+yet..
+
+Check for output from: `git-annex version | grep OsPath`
+"""]]
drop problem end characters from filename operating on String not RawFilePath
Fix bug that could cause an invalid utf-8 sequence to be used in a
temporary filename when the input filename was valid utf-8.
Sponsored-by: k0ld
diff --git a/CHANGELOG b/CHANGELOG index 371f53f30f..740253853a 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -3,6 +3,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * drop: --fast support when dropping from a remote. * Fix crash operating on filenames that are exactly 21 bytes long and begin with a utf-8 character. + * Fix bug that could cause an invalid utf-8 sequence to be used in a + temporary filename when the input filename was valid utf-8. * git-annex.cabal: Turn on the OsPath build flag by default. * Add build warnings when git-annex is built without the OsPath build flag. diff --git a/Utility/Tmp.hs b/Utility/Tmp.hs index 582f6849fc..df6673eadd 100644 --- a/Utility/Tmp.hs +++ b/Utility/Tmp.hs @@ -116,20 +116,29 @@ relatedTemplate' :: RawFilePath -> RawFilePath #ifndef mingw32_HOST_OS relatedTemplate' f | len > templateAddedLength = - {- Some filesystems like FAT have issues with filenames - - ending in ".", and others like VFAT don't allow a - - filename to end with trailing whitespace, so avoid - - truncating a filename to end that way. -} - let p = B.dropWhileEnd disallowed $ - truncateFilePath (len - templateAddedLength) f + let p = fixend $ truncateFilePath (len - templateAddedLength) f in if B.null p then "t" else p | otherwise = f where len = B.length f - disallowed c = c == dot || isSpace (chr (fromIntegral c)) + {- Some filesystems like FAT have issues with filenames + - ending in ".", and others like VFAT don't allow a + - filename to end with trailing whitespace, so avoid + - truncating a filename to end that way. -} + fixend p = + {- B.dropWhileEnd doesn't take wide characters + - into account, but is fast, so use it to check + - the common case. -} + let p' = B.dropWhileEnd disallowed p + in if p' == p + then p + else toRawFilePath $ reverse $ + dropWhile (disallowed . fromIntegral . 
ord) $ + reverse $ fromRawFilePath p dot = fromIntegral (ord '.') + disallowed c = c == dot || isSpace (chr (fromIntegral c)) #else -- Avoids a test suite failure on windows, reason unknown, but -- best to keep paths short on windows anyway. diff --git a/doc/bugs/multibyte_characters_broken.mdwn b/doc/bugs/multibyte_characters_broken.mdwn index ff868a0419..4b10d6419f 100644 --- a/doc/bugs/multibyte_characters_broken.mdwn +++ b/doc/bugs/multibyte_characters_broken.mdwn @@ -31,3 +31,5 @@ The original file obviously has a correct encoding, but it seems that git annex ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) I use git annex to manage my whole music collection successfully. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment b/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment new file mode 100644 index 0000000000..8660462f46 --- /dev/null +++ b/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-15T16:59:59Z" + content=""" +git-annex actually attempts to truncate the filename taking unicode +character width into account. 
+ +Here is the truncation on the wrong byte though: + + ghci> :t x + x :: String + ghci> x + "ingest-01-06 \19977\30707\29748\20035\12539\23500\27810\32654\26234\24693\12539\20037\24029\32190\12539\31712\21407\24693\32654\12539\28145\35211 \26792\21152 - Tuxedo Mirage.flac" + ghci> toRawFilePath x + "ingest-01-06 \228\184\137\231\159\179\231\144\180\228\185\131\227\131\187\229\175\140\230\178\162\231\190\142\230\153\186\230\129\181\227\131\187\228\185\133\229\183\157\231\182\190\227\131\187\231\175\160\229\142\159\230\129\181\231\190\142\227\131\187\230\183\177\232\166\139 \230\162\168\229\138\160 - Tuxedo Mirage.flac" + ghci> relatedTemplate (toRawFilePath x) + "ingest-01-06 \228\184\137\231\159\179\231\144\180\228\185\131\227\131\187\229\175\140\230\178\162\231\190\142\230\153\186\230\129\181\227\131\187\228\185\133\229\183\157\231\182\190\227\131\187\231\175\160\229\142\159\230\129\181\231\190\142\227\131\187\230\183\177\232\166\139 \230\162\168\229\138" + +What is going on is that '\160` is a space character, and filesystems like +FAT do not allow a filename to end with a space. So relatedTemplate trims +off trailing spaces, and accidentially trimmed off this byte, despite it +being part of a multibyte sequence. + +Aren't filesystems with arbitrary limitations on what valid filenames are fun? + +Fixed this. +"""]]
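The failure mode described in the comment above (a byte-wise trim removing `\160` even though it is the last byte of a multibyte character) can be reproduced in a small sketch. This assumes the `bytestring` and `text` packages and decodes with `Data.Text.Encoding` rather than git-annex's `fromRawFilePath`/`toRawFilePath`; the function names `trimBytes`, `trimChars` and `fixEnd` are hypothetical and only mirror the shape of the fix in `Utility/Tmp.hs`.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Char (chr, isSpace, ord)
import Data.Word (Word8)

-- A byte counts as disallowed at the end of a name if, read as a
-- Latin-1 character, it is a dot or whitespace. Byte 160 (U+00A0)
-- qualifies, which is the source of the bug.
disallowedByte :: Word8 -> Bool
disallowedByte c = c == fromIntegral (ord '.') || isSpace (chr (fromIntegral c))

disallowedChar :: Char -> Bool
disallowedChar c = c == '.' || isSpace c

-- Fast but not character-aware: may cut into a multibyte sequence.
trimBytes :: B.ByteString -> B.ByteString
trimBytes = B.dropWhileEnd disallowedByte

-- Character-aware: decode, trim whole characters, re-encode.
-- Input that is not valid UTF-8 is returned unchanged in this sketch.
trimChars :: B.ByteString -> B.ByteString
trimChars b = case TE.decodeUtf8' b of
    Right t -> TE.encodeUtf8 (T.dropWhileEnd disallowedChar t)
    Left _ -> b

-- Use the fast check for the common case, and fall back to the
-- character-aware trim only when the fast path would change something.
fixEnd :: B.ByteString -> B.ByteString
fixEnd p
    | trimBytes p == p = p
    | otherwise = trimChars p
```

For the two characters `"\26792\21152"` from the example filename, whose UTF-8 encoding ends in the bytes 229 138 160, `trimBytes` drops the final byte and leaves an incomplete sequence, while `fixEnd` returns the input unchanged.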
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_18_3cddb2113a962827b495ae71f686453e._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_18_3cddb2113a962827b495ae71f686453e._comment
new file mode 100644
index 0000000000..742a6d5403
--- /dev/null
+++ b/doc/bugs/35_failed_tests_on_beegfs/comment_18_3cddb2113a962827b495ae71f686453e._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="yarikoptic"
+ avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
+ subject="comment 18"
+ date="2025-09-15T17:11:51Z"
+ content="""
+I think so -- I posted a [full log](http://www.oneukrainian.com/tmp/2025.09.11T11.15.27-2500297_stdout) now to check
+"""]]
comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_17_bebe0ee51ba6a6c23ee3e4e5999d575b._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_17_bebe0ee51ba6a6c23ee3e4e5999d575b._comment
new file mode 100644
index 0000000000..61164d930a
--- /dev/null
+++ b/doc/bugs/35_failed_tests_on_beegfs/comment_17_bebe0ee51ba6a6c23ee3e4e5999d575b._comment
@@ -0,0 +1,15 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 17"""
+ date="2025-09-15T16:07:35Z"
+ content="""
+Drat. Reopened bug.
+
+Is the error for these still "export.ex [...] Device or resource busy"?
+
+If so, the problem must not be beegfs not liking an open file to be
+renamed, but something else.
+
+I have verified that the temp file that gets renamed to the "export.ex"
+log file is now opened with `O_CLOEXEC`.
+"""]]
reopen
diff --git a/doc/bugs/35_failed_tests_on_beegfs.mdwn b/doc/bugs/35_failed_tests_on_beegfs.mdwn index 557ecc8d53..28e1babc48 100644 --- a/doc/bugs/35_failed_tests_on_beegfs.mdwn +++ b/doc/bugs/35_failed_tests_on_beegfs.mdwn @@ -81,5 +81,3 @@ upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 [[!meta author=yoh]] [[!tag projects/repronim]] - -> [[fixed|done]] --[[Joey]]
fix p2phttp worker thread leak with deleted repository LOCKCONTENT
p2phttp: Fix a hang that could occur when used with --directory, and a
repository in the directory got removed.
It could leak up to -J number of worker threads, but this only affected a
client trying to access the deleted repository.
It may be that this could also affect a non-deleted repository, and also
leak a worker thread, if invalid p2p protocol is sent.
diff --git a/CHANGELOG b/CHANGELOG index cb3a0e0c15..371f53f30f 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -13,6 +13,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * Improve performance when used with a local git remote that has a large working tree. * Removed support for building with cryptonite, use crypton. + * p2phttp: Fix a hang that could occur when used with --directory, + and a repository in the repository got removed. -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs index 6e3d530303..88e6fa3367 100644 --- a/P2P/Http/Server.hs +++ b/P2P/Http/Server.hs @@ -477,14 +477,19 @@ serveLockContent mst su apiver (B64Key k) cu bypass sec auth = do let lock = do lockresv <- newEmptyTMVarIO unlockv <- newEmptyTMVarIO + -- A single worker thread takes the lock, and keeps running +- -- until unlock in order to keep the lock held. annexworker <- async $ inAnnexWorker st $ do lockres <- runFullProto (clientRunState conn) (clientP2PConnection conn) $ do net $ sendMessage (LOCKCONTENT k) checkSuccess liftIO $ atomically $ putTMVar lockresv lockres - liftIO $ atomically $ takeTMVar unlockv - void $ runFullProto (clientRunState conn) (clientP2PConnection conn) $ do - net $ sendMessage UNLOCKCONTENT + case lockres of + Right True -> do + liftIO $ atomically $ takeTMVar unlockv + void $ runFullProto (clientRunState conn) (clientP2PConnection conn) $ do + net $ sendMessage UNLOCKCONTENT + _ -> return () atomically (takeTMVar lockresv) >>= \case Right True -> return (Just (annexworker, unlockv)) _ -> return Nothing diff --git a/doc/todo/p2phttp_serve_multiple_repositories.mdwn b/doc/todo/p2phttp_serve_multiple_repositories.mdwn index f2ad9c752e..47cf8ad8fd 100644 --- a/doc/todo/p2phttp_serve_multiple_repositories.mdwn +++ b/doc/todo/p2phttp_serve_multiple_repositories.mdwn @@ -19,3 +19,5 @@ I asked matrss if this would be useful for forgejo-aneksajo and he said very useful, although I think I can work with 
the limitation [of only 1]." [[!tag projects/INM7]] + +> [[done]] --[[Joey]] diff --git a/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment b/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment new file mode 100644 index 0000000000..c3dee29010 --- /dev/null +++ b/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-09-15T15:18:15Z" + content=""" +Seems the bug is specific to LOCKCONTENT. When doing other operations, +like CHECKPRESENT after the repo is deleted, the server returns +FAILURE and continues being able to serve more requests for that repo. + +Ah, the problem is that serveLockContent is running a block of actions in +a single inAnnexWorker call, which first sends on the LOCKCONTENT, then +blocks waiting for the unlock to arrive. Which never happens, so it remains +blocked there forever, consuming a worker thread. + +Fixed that, finally. +"""]]
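The shape of the LOCKCONTENT fix (only block waiting for the unlock when the lock was actually obtained) can be sketched with plain `MVar`s from `base`. This is an illustrative stand-in, not the code in `P2P/Http/Server.hs`: `startLockWorker`, its arguments, and the extra `finished` MVar are all hypothetical, and it uses `forkIO` instead of `async` and `inAnnexWorker`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- | Hypothetical stand-in for the worker started by serveLockContent.
-- Returns the unlock trigger when the lock was taken, plus an MVar
-- that is filled once the worker thread has finished.
startLockWorker :: IO Bool -> IO () -> IO (Maybe (MVar ()), MVar ())
startLockWorker takeLock releaseLock = do
    lockResult <- newEmptyMVar
    unlockRequest <- newEmptyMVar
    finished <- newEmptyMVar
    _ <- forkIO $ do
        ok <- takeLock
        putMVar lockResult ok
        -- Only wait for the unlock request when the lock was taken.
        -- Waiting unconditionally would leave the worker blocked
        -- forever after a failed lock, since no unlock ever arrives.
        if ok
            then takeMVar unlockRequest >> releaseLock
            else pure ()
        putMVar finished ()
    ok <- takeMVar lockResult
    pure (if ok then Just unlockRequest else Nothing, finished)
```

With a failing `takeLock`, the worker exits right away and the caller gets `Nothing`; with a successful one, the worker stays alive until something is put into the returned unlock MVar. Exception handling is left out to keep the sketch short.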
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_16_db166b7303911b63ec458dfb5309862a._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_16_db166b7303911b63ec458dfb5309862a._comment new file mode 100644 index 0000000000..02a8f992bd --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_16_db166b7303911b63ec458dfb5309862a._comment @@ -0,0 +1,36 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 16" + date="2025-09-11T15:46:52Z" + content=""" +I have reran with freshish build 10.20250828+git58-g38786a4e5e-1~ndall+1 and still observe those FAILs as before IIRC + +```shell +$> show-paths -e FAIL -f full-lines .duct/logs/2025.09.11T11.15.27-2500297_stdout +1016 Tests +1017 Repo Tests v10 locked +1025: git-remote-annex exporttree: FAIL (3.60s) +1206 Tests +1207 Repo Tests v10 locked +1215: export and import of subdir: FAIL (7.19s) +1225 Tests +1226 Repo Tests v10 locked +1234: export and import: FAIL (4.91s) +1268 Tests +1269 Repo Tests v10 adjusted unlocked branch +1277: git-remote-annex exporttree: FAIL (4.68s) +1299 Tests +1300 Repo Tests v10 unlocked +1308: export and import of subdir: FAIL (10.38s) +1330 Tests +1331 Repo Tests v10 unlocked +1339: export and import: FAIL (10.19s) +1373 Tests +1374 Repo Tests v10 adjusted unlocked branch +1382: export and import: FAIL (6.96s) +1428 Tests +1429 Repo Tests v10 adjusted unlocked branch +1437: export and import of subdir: FAIL (10.63s) +``` +"""]]
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_15_86bd1e45651c6153128775af2c4eab57._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_15_86bd1e45651c6153128775af2c4eab57._comment
new file mode 100644
index 0000000000..e0cbf3b0e8
--- /dev/null
+++ b/doc/bugs/35_failed_tests_on_beegfs/comment_15_86bd1e45651c6153128775af2c4eab57._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="yarikoptic"
+ avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
+ subject="comment 15"
+ date="2025-09-11T12:30:49Z"
+ content="""
+FTR: fixed in [10.20250828-58-g38786a4e5e](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=38786a4e5ec2dd697d2abf1ee93a927a9e9fcf41)
+"""]]
diff --git a/doc/bugs/multibyte_characters_broken.mdwn b/doc/bugs/multibyte_characters_broken.mdwn index 9eedac0c7e..ff868a0419 100644 --- a/doc/bugs/multibyte_characters_broken.mdwn +++ b/doc/bugs/multibyte_characters_broken.mdwn @@ -2,20 +2,23 @@ git annex add is not fully compatible with multibyte-characters in filenames and may generate filenames with invalid character sequences. ### What steps will reproduce the problem? +``` $ git init test; cd test $ git annex init test $ echo bla > 01-06\ 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨加\ -\ Tuxedo\ Mirage.flac $ git annex add 01* - +``` The last command generates an invalid character sequence as filename which, depending on the filesystem, may cause an error: Example output: + +``` add "01-06 \344\270\211\347\237\263\347\220\264\344\271\203\343\203\273\345\257\214\346\262\242\347\276\216\346\231\272\346\201\265\343\203\273\344\271\205\345\267\235\347\266\276\343\203\273\347\257\240\345\216\237\346\201\265\347\276\216\343\203\273\346\267\261\350\246\213\346\242\250\345\212\240 - Tuxedo Mirage.flac" .git/annex/othertmp/: openTempFile template ingest-01-06 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨�: invalid argument (Invalid or incomplete multibyte or wide character) failed add: 1 failed - +``` ### What version of git-annex are you using? On what operating system? git annex 10.20250630
diff --git a/doc/bugs/multibyte_characters_broken.mdwn b/doc/bugs/multibyte_characters_broken.mdwn new file mode 100644 index 0000000000..9eedac0c7e --- /dev/null +++ b/doc/bugs/multibyte_characters_broken.mdwn @@ -0,0 +1,30 @@ +### Please describe the problem. +git annex add is not fully compatible with multibyte-characters in filenames and may generate filenames with invalid character sequences. + +### What steps will reproduce the problem? +$ git init test; cd test +$ git annex init test +$ echo bla > 01-06\ 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨加\ -\ Tuxedo\ Mirage.flac +$ git annex add 01* + +The last command generates an invalid character sequence as filename which, depending on the filesystem, may cause an error: + +Example output: +add "01-06 \344\270\211\347\237\263\347\220\264\344\271\203\343\203\273\345\257\214\346\262\242\347\276\216\346\231\272\346\201\265\343\203\273\344\271\205\345\267\235\347\266\276\343\203\273\347\257\240\345\216\237\346\201\265\347\276\216\343\203\273\346\267\261\350\246\213\346\242\250\345\212\240 - Tuxedo Mirage.flac" + .git/annex/othertmp/: openTempFile template ingest-01-06 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨�: invalid argument (Invalid or incomplete multibyte or wide character) + +failed +add: 1 failed + + +### What version of git-annex are you using? On what operating system? +git annex 10.20250630 +NixOS 25.11pre851350.3b9f00d7a7bf + +### Please provide any additional information below. + +Creation of the file fails due to zfs being set to only accept valid utf-8 filenames (utf8only=on, normalization=formD), which greatly helps me detecting encoding issues in filenames. +The original file obviously has a correct encoding, but it seems that git annex generates a new filename by just cutting of the filename after a specific byte, instead of taking character lengths into account. + +### Have you had any luck using git-annex before? 
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) +I use git annex to manage my whole music collection successfully.
Added a comment: git annex bundle - questions
diff --git a/doc/forum/Equivalent_to_git_bundle__63__/comment_3_2935498815a3de295b7d573f28e12fdc._comment b/doc/forum/Equivalent_to_git_bundle__63__/comment_3_2935498815a3de295b7d573f28e12fdc._comment new file mode 100644 index 0000000000..c2d1dd3414 --- /dev/null +++ b/doc/forum/Equivalent_to_git_bundle__63__/comment_3_2935498815a3de295b7d573f28e12fdc._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="git annex bundle - questions" + date="2025-09-11T08:02:08Z" + content=""" +I'm also interested in this feature, because I'd got git annex repo corrupted a couple of times due to a power loss. + +But it's not clear enough yet how it's supposed to work. Should it create an archive containing a git-bundle + annexed files containing ONLY files in this bundle commit range? + +It might be possible to write a script that does exactly that, but having something integrated into git-annex itself could be a bonus with cross-platform support (available on both windows and linux), standardized and ready for archival (e.g. bundles can be written periodically onto m-discs). + +I'm also wondering what if a current repository get corrupted (at least partially), will git annex be able to \"restore\" it's state after git-bundle-restore? +"""]]
Added a comment: resolved
diff --git a/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__/comment_1_570c154278cd2e94bcaf03eefeb9126e._comment b/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__/comment_1_570c154278cd2e94bcaf03eefeb9126e._comment
new file mode 100644
index 0000000000..e4b8f2209c
--- /dev/null
+++ b/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__/comment_1_570c154278cd2e94bcaf03eefeb9126e._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="psxvoid"
+ avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b"
+ subject="resolved"
+ date="2025-09-11T05:12:29Z"
+ content="""
+Seems like it's fine now, thanks.
+"""]]
noCreateProcessWhile to fix close-on-exec races
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
diff --git a/Annex/Multicast.hs b/Annex/Multicast.hs index 0af2d888db..a559c76c23 100644 --- a/Annex/Multicast.hs +++ b/Annex/Multicast.hs @@ -1,18 +1,23 @@ {- git-annex multicast receive callback - - - Copyright 2017 Joey Hess <id@joeyh.name> + - Copyright 2017-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} +{-# LANGUAGE CPP #-} + module Annex.Multicast where import Common import Annex.Path import Utility.Env -import Utility.Process -import GHC.IO.Handle.FD +#ifndef mingw32_HOST_OS +import System.Posix.IO +#else +import System.Process (createPipeFd) +#endif multicastReceiveEnv :: String multicastReceiveEnv = "GIT_ANNEX_MULTICAST_RECEIVE" @@ -20,8 +25,14 @@ multicastReceiveEnv = "GIT_ANNEX_MULTICAST_RECEIVE" multicastCallbackEnv :: IO (OsPath, [(String, String)], Handle) multicastCallbackEnv = do gitannex <- programPath - -- This will even work on Windows +#ifndef mingw32_HOST_OS + (rfd, wfd) <- noCreateProcessWhile $ do + (rfd, wfd) <- createPipe + setFdOption rfd CloseOnExec True + return (rfd, wfd) +#else (rfd, wfd) <- createPipeFd +#endif rh <- fdToHandle rfd environ <- addEntry multicastReceiveEnv (show wfd) <$> getEnvironment return (gitannex, environ, rh) diff --git a/Remote/Directory.hs b/Remote/Directory.hs index 75ec9b09cd..f204d50bf4 100644 --- a/Remote/Directory.hs +++ b/Remote/Directory.hs @@ -470,7 +470,7 @@ retrieveExportWithContentIdentifierM ii dir cow loc cids dest gk p = docopynoncow iv = do #ifndef mingw32_HOST_OS - let open = do + let open = noCreateProcessWhile $ do fd <- openFdWithMode f' ReadOnly Nothing defaultFileFlags (CloseOnExecFlag True) -- Need a duplicate fd for the post check. 
diff --git a/Utility/FileIO/CloseOnExec.hs b/Utility/FileIO/CloseOnExec.hs index a638ea2d9b..29e7c4b08a 100644 --- a/Utility/FileIO/CloseOnExec.hs +++ b/Utility/FileIO/CloseOnExec.hs @@ -42,6 +42,7 @@ import qualified Data.ByteString as BS import qualified Data.ByteString.Lazy as BSL #ifndef mingw32_HOST_OS import System.Posix.IO +import Utility.Process #endif closeOnExec :: Bool @@ -92,24 +93,22 @@ appendFile' :: OsPath -> BS.ByteString -> IO () appendFile' fp contents = withFile fp AppendMode (`BS.hPut` contents) -{- Unlike all other functions in this module, this only sets the - - close-on-exec flag after opening the file. Thus, it is vulnerable to - - races. - - - - Re-implementing openTempFile is difficult due to the current +{- Re-implementing openTempFile is difficult due to the current - structure of file-io. See this issue for discussion about improving - that: https://github.com/haskell/file-io/issues/44 + - So, instead this uses noCreateProcessWhile. - -} openTempFile :: OsPath -> OsString -> IO (OsPath, Handle) -openTempFile tmp_dir template = do - (p, h) <- I.openTempFile tmp_dir template -#ifndef mingw32_HOST_OS - fd <- handleToFd h - setFdOption fd CloseOnExec True - h' <- fdToHandle fd - pure (p, h') +openTempFile tmp_dir template = +#ifdef mingw32_HOST_OS + I.openTempFile tmp_dir template #else - pure (p, h) + noCreateProcessWhile $ do + (p, h) <- I.openTempFile tmp_dir template + fd <- handleToFd h + setFdOption fd CloseOnExec True + h' <- fdToHandle fd + pure (p, h') #endif #endif diff --git a/Utility/Gpg.hs b/Utility/Gpg.hs index 6c13392032..2566bfdf85 100644 --- a/Utility/Gpg.hs +++ b/Utility/Gpg.hs @@ -162,8 +162,10 @@ feedRead cmd params passphrase feeder reader = do #ifndef mingw32_HOST_OS let setup = liftIO $ do -- pipe the passphrase into gpg on a fd - (frompipe, topipe) <- System.Posix.IO.createPipe - setFdOption topipe CloseOnExec True + (frompipe, topipe) <- noCreateProcessWhile $ do + (frompipe, topipe) <- System.Posix.IO.createPipe + 
setFdOption topipe CloseOnExec True + return (frompipe, topipe) toh <- fdToHandle topipe t <- async $ do B.hPutStr toh (passphrase <> "\n") diff --git a/Utility/Process.hs b/Utility/Process.hs index 81fbef30bd..6052c7186b 100644 --- a/Utility/Process.hs +++ b/Utility/Process.hs @@ -1,5 +1,6 @@ {- System.Process enhancements, including additional ways of running - - processes, and logging. + - processes, logging, and amelorations for cases where FDs are not able to + - be opened with close-on-exec. - - Copyright 2012-2025 Joey Hess <id@joeyh.name> - @@ -21,6 +22,7 @@ module Utility.Process ( forceSuccessProcess', checkSuccessProcess, withNullHandle, + noCreateProcessWhile, createProcess, withCreateProcess, waitForProcess, @@ -46,7 +48,9 @@ import System.Exit import System.IO import Control.Monad.IO.Class import Control.Concurrent.Async +import Control.Concurrent import qualified Data.ByteString as S +import System.IO.Unsafe (unsafePerformIO) data StdHandle = StdinHandle | StdoutHandle | StderrHandle deriving (Eq) @@ -173,9 +177,34 @@ startInteractiveProcess cmd args environ = do (Just from, Just to, _, pid) <- createProcess p return (pid, to, from) --- | Wrapper around 'System.Process.createProcess' that does debug logging. +-- | Runs an action, preventing any new processes from being started +-- until it is finished. +-- +-- Unfortunately, Haskell has a pervasive problem with the close-on-exec +-- flag not being set when opening files. It's also difficult to portably +-- dup or pipe a FD with the close-on-exec flag set. So, this can be used +-- to run an action that opens a FD, and then calls setFdOption to set the +-- close-on-exec flag, without risking a race with a process being forked +-- at the same time. +-- +-- Note that only one of these actions can run at a time, and long-duration +-- actions are not advisable. +noCreateProcessWhile :: (MonadIO m, MonadMask m) => (m a) -> m a +noCreateProcessWhile = bracket setup cleanup . 
const + where + setup = liftIO $ takeMVar createProcessSem + cleanup () = liftIO $ putMVar createProcessSem () + +-- | A shared global MVar. Processes are not created while it is empty. +{-# NOINLINE createProcessSem #-} +createProcessSem :: MVar () +createProcessSem = unsafePerformIO $ newMVar () + +-- | Wrapper around 'System.Process.createProcess'. +-- This adds debug logging, and avoids starting a process when in a +-- noCreateProcessWhile block. createProcess :: CreateProcess -> IO (Maybe Handle, Maybe Handle, Maybe Handle, ProcessHandle) -createProcess p = do +createProcess p = noCreateProcessWhile $ do r@(_, _, _, h) <- Utility.Process.Shim.createProcess p debugProcess p h return r diff --git a/Utility/Process/Transcript.hs b/Utility/Process/Transcript.hs index 7bf94ffa05..cb71e30b91 100644 --- a/Utility/Process/Transcript.hs +++ b/Utility/Process/Transcript.hs @@ -45,7 +45,7 @@ processTranscript'' cp input = do #ifndef mingw32_HOST_OS {- This implementation interleves stdout and stderr in exactly the order - the process writes them. -} (Diff truncated)
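The idea behind `noCreateProcessWhile` (a single global MVar that both the guarded action and process creation must hold) can be sketched using only `base`. This is a simplified illustration specialised to `IO`, not the code in `Utility/Process.hs`: the names `spawnLock`, `noSpawnWhile` and `guardedSpawn` are hypothetical, and `guardedSpawn` takes an arbitrary action in place of a real `createProcess` call.

```haskell
import Control.Concurrent.MVar
import Control.Exception (bracket_)
import System.IO.Unsafe (unsafePerformIO)

-- | A shared global lock. While it is empty, some thread is either
-- inside a guarded action or in the middle of spawning a process.
{-# NOINLINE spawnLock #-}
spawnLock :: MVar ()
spawnLock = unsafePerformIO (newMVar ())

-- | Run an action while holding the lock, releasing it even if the
-- action throws. Intended for short actions such as opening a FD and
-- then setting close-on-exec on it.
noSpawnWhile :: IO a -> IO a
noSpawnWhile = bracket_ (takeMVar spawnLock) (putMVar spawnLock ())

-- | Hypothetical process-spawning wrapper: takes the same lock, so it
-- cannot run concurrently with a guarded action in another thread.
guardedSpawn :: IO a -> IO a
guardedSpawn spawn = noSpawnWhile spawn
```

One property worth keeping in mind with this design is that an `MVar` is not re-entrant: an action running under the guard that itself tries to take the same lock in the same thread would block, so guarded actions are best kept short and free of process creation.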
Improve performance when used with a local git remote that has a large working tree
git write-tree was being run once per file git-annex acts on when, e.g.,
getting files, which is slow when the remote repository has a large
tree.
onLocal calls quiesce after each action, and quiesce closes the keys db
since [[!commit ba7ecbc6a9c]], which has a relevant comment about
performance. I have not addressed that; the keys db still gets closed and
reopened after each file.
It turns out that, since git write-tree was run by each call to
reconcileStaged, the .git/annex/keysdb.cache value was never the
same as the git index's inode, because git write-tree updates the index's
mtime even when no changes have been made.
And so, when the database got closed and reopened, reconcileStaged would
see a changed index, and run git write-tree again. Over and over.
I considered writing the index's new inodecache after write-tree to the
keysdb.cache, but that would be vulnerable to a race, if the index was
changed just after write-tree.
The fix was to stop using keysdb.cache at all. When the database is closed
and later reopened by the same process, avoid re-doing reconcileStaged.
.git/annex/keysdb.cache is no longer used. It could be removed,
but the time overhead of removing it would be more than the space overhead
of keeping it. Deferred removal to the v11 upgrade.
Sponsored-by: unqueued
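The approach in this commit message (remember, within one process, that the database was already open, and skip the expensive reconcile step on reopen) can be sketched as a tiny state machine. This is an assumption-laden illustration rather than the code in `Database/Keys.hs`: `DbState` here has only two constructors, `closeDb` and `countReconciles` are hypothetical helpers, and the `reconcile` argument stands in for the `reconcileStaged` and `git write-tree` work.

```haskell
import Data.IORef

-- | Whether the database was previously open in this process.
newtype DbWasOpen = DbWasOpen Bool

-- | Simplified state: closed (remembering earlier use) or open.
data DbState
    = DbClosed DbWasOpen
    | DbOpen

-- | Open the database, running the expensive reconcile action only
-- the first time it is opened by this process.
openDb :: IO () -> DbState -> IO DbState
openDb _ DbOpen = pure DbOpen
openDb reconcile (DbClosed (DbWasOpen wasOpen)) = do
    if wasOpen then pure () else reconcile
    pure DbOpen

-- | Close the database, recording that it had been open.
closeDb :: DbState -> DbState
closeDb DbOpen = DbClosed (DbWasOpen True)
closeDb st = st

-- | Open and close the database n times, and report how many times
-- the reconcile action ran.
countReconciles :: Int -> IO Int
countReconciles n = do
    counter <- newIORef 0
    let reconcile = modifyIORef' counter (+ 1)
        cycleOnce st = closeDb <$> openDb reconcile st
        loop 0 st = pure st
        loop k st = cycleOnce st >>= loop (k - 1)
    _ <- loop n (DbClosed (DbWasOpen False))
    readIORef counter
```

In this sketch the reconcile action runs once no matter how many open and close cycles follow, which is the behaviour the commit message describes for a database reopened by the same process.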
diff --git a/Annex/Locations.hs b/Annex/Locations.hs index 6d1d8804cc..9ce7a70a94 100644 --- a/Annex/Locations.hs +++ b/Annex/Locations.hs @@ -47,7 +47,6 @@ module Annex.Locations ( gitAnnexUnusedLog, gitAnnexKeysDbDir, gitAnnexKeysDbLock, - gitAnnexKeysDbIndexCache, gitAnnexFsckState, gitAnnexFsckDbDir, gitAnnexFsckDbDirOld, @@ -411,11 +410,6 @@ gitAnnexKeysDbDir r c = gitAnnexKeysDbLock :: Git.Repo -> GitConfig -> OsPath gitAnnexKeysDbLock r c = gitAnnexKeysDbDir r c <> literalOsPath ".lck" -{- Contains the stat of the last index file that was - - reconciled with the keys database. -} -gitAnnexKeysDbIndexCache :: Git.Repo -> GitConfig -> OsPath -gitAnnexKeysDbIndexCache r c = gitAnnexKeysDbDir r c <> literalOsPath ".cache" - {- .git/annex/fsck/uuid/ is used to store information about incremental - fscks. -} gitAnnexFsckDir :: UUID -> Git.Repo -> Maybe GitConfig -> OsPath diff --git a/CHANGELOG b/CHANGELOG index 9549c28264..be8cdacf1a 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -10,6 +10,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * Avoid leaking file descriptors to child processes started by git-annex in some situations. Note that when not built with the OsPath build flag, these leaks can still happen. + * Improve performance when used with a local git remote that has a + large working tree. 
-- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/Database/Keys.hs b/Database/Keys.hs index 93e659f445..f17d07dbe7 100644 --- a/Database/Keys.hs +++ b/Database/Keys.hs @@ -47,7 +47,6 @@ import Git import Git.FilePath import Git.Command import Git.Types -import Git.Index import Git.Sha import Git.CatFile import Git.Branch (writeTreeQuiet, update') @@ -81,8 +80,8 @@ runReader t a = do else return tableschanged v <- a (SQL.ReadHandle qh) return (v, DbOpen (qh, tableschanged')) - go DbClosed = do - st <- openDb False DbClosed + go startst@(DbClosed _) = do + st <- openDb False startst v <- case st of (DbOpen (qh, _)) -> a (SQL.ReadHandle qh) _ -> return mempty @@ -124,7 +123,11 @@ runWriterIO t a = runWriter t (liftIO . a) openDb :: Bool -> DbState -> Annex DbState openDb _ st@(DbOpen _) = return st openDb False DbUnavailable = return DbUnavailable -openDb forwrite _ = do +openDb forwrite (DbClosed wasopen) = openDb' forwrite wasopen +openDb forwrite DbUnavailable = openDb' forwrite (DbWasOpen False) + +openDb' :: Bool -> DbWasOpen -> Annex DbState +openDb' forwrite wasopen = do lck <- calcRepo' gitAnnexKeysDbLock catchPermissionDenied permerr $ withExclusiveLock lck $ do dbdir <- calcRepo' gitAnnexKeysDbDir @@ -144,7 +147,7 @@ openDb forwrite _ = do open db dbisnew = do qh <- liftIO $ H.openDbQueue db SQL.containedTable - tc <- reconcileStaged dbisnew qh + tc <- reconcileStaged dbisnew qh wasopen return $ DbOpen (qh, tc) {- Closes the database if it was open. Any writes will be flushed to it. @@ -238,8 +241,8 @@ isInodeKnown i s = or <$> runReaderIO ContentTable - This is run with a lock held, so only one process can be running this at - a time. - - - To avoid unnecessary work, the index file is statted, and if it's not - - changed since last time this was run, nothing is done. + - If the database gets closed and then reopened by the same process, this + - will avoid doing any repeated work. 
- - A tree is generated from the index, and the diff between that tree - and the last processed tree is examined for changes. @@ -259,30 +262,19 @@ isInodeKnown i s = or <$> runReaderIO ContentTable - So when using getAssociatedFiles, have to make sure the file still - is an associated file. -} -reconcileStaged :: Bool -> H.DbQueue -> Annex DbTablesChanged -reconcileStaged dbisnew qh = ifM isBareRepo +reconcileStaged :: Bool -> H.DbQueue -> DbWasOpen -> Annex DbTablesChanged +reconcileStaged _ _ (DbWasOpen True) = + return (DbTablesChanged False False) +reconcileStaged dbisnew qh _ = ifM isBareRepo ( return mempty - , do - gitindex <- inRepo currentIndexFile - indexcache <- calcRepo' gitAnnexKeysDbIndexCache - withTSDelta (liftIO . genInodeCache gitindex) >>= \case - Just cur -> readindexcache indexcache >>= \case - Nothing -> go cur indexcache =<< getindextree - Just prev -> ifM (compareInodeCaches prev cur) - ( return mempty - , go cur indexcache =<< getindextree - ) - Nothing -> return mempty + , go =<< getindextree ) where lastindexref = Ref "refs/annex/last-index" - readindexcache indexcache = liftIO $ maybe Nothing readInodeCache - <$> catchMaybeIO (readFileString indexcache) - getoldtree = fromMaybe emptyTree <$> inRepo (Git.Ref.sha lastindexref) - go cur indexcache (Just newtree) = do + go (Just newtree) = do oldtree <- getoldtree when (oldtree /= newtree) $ do fastDebug "Database.Keys" "reconcileStaged start" @@ -292,7 +284,6 @@ reconcileStaged dbisnew qh = ifM isBareRepo (Just (fromRef oldtree)) (fromRef newtree) (procdiff mdfeeder) - liftIO $ writeFileString indexcache $ showInodeCache cur -- Storing the tree in a ref makes sure it does not -- get garbage collected, and is available to diff -- against next time. @@ -309,7 +300,7 @@ reconcileStaged dbisnew qh = ifM isBareRepo -- When there is a merge conflict, that will not see the new local -- version of the files that are conflicted. So a second diff -- is done, with --staged but no old tree. 
- go _ _ Nothing = do + go Nothing = do fastDebug "Database.Keys" "reconcileStaged start (in conflict)" oldtree <- getoldtree g <- Annex.gitRepo diff --git a/Database/Keys/Handle.hs b/Database/Keys/Handle.hs index 1e4a85427b..70e28ab441 100644 --- a/Database/Keys/Handle.hs +++ b/Database/Keys/Handle.hs @@ -9,6 +9,7 @@ module Database.Keys.Handle ( DbHandle, newDbHandle, DbState(..), + DbWasOpen(..), withDbState, flushDbQueue, closeDbHandle, @@ -30,10 +31,16 @@ newtype DbHandle = DbHandle (MVar DbState) -- The database can be closed or open, but it also may have been -- tried to open (for read) and didn't exist yet or is not readable. -data DbState = DbClosed | DbOpen (H.DbQueue, DbTablesChanged) | DbUnavailable +data DbState + = DbClosed DbWasOpen + | DbOpen (H.DbQueue, DbTablesChanged) + | DbUnavailable + +-- Was the database previously opened by this process? +data DbWasOpen = DbWasOpen Bool newDbHandle :: IO DbHandle -newDbHandle = DbHandle <$> newMVar DbClosed +newDbHandle = DbHandle <$> newMVar (DbClosed (DbWasOpen False)) -- Runs an action on the state of the handle, which can change its state. -- The MVar is empty while the action runs, which blocks other users @@ -65,5 +72,5 @@ closeDbHandle h = withDbState h go where go (DbOpen (qh, _)) = do H.closeDbQueue qh - return ((), DbClosed) + return ((), DbClosed (DbWasOpen True)) go st = return ((), st) diff --git a/Test.hs b/Test.hs index 43a15f8952..6de37709dc 100644 --- a/Test.hs +++ b/Test.hs @@ -896,8 +896,6 @@ test_lock_force = intmpclonerepo $ do Just k <- Annex.WorkTree.lookupKey (toOsPath annexedfile) Database.Keys.removeInodeCaches k Database.Keys.closeDb - liftIO . 
removeWhenExistsWith removeFile - =<< Annex.calcRepo' Annex.Locations.gitAnnexKeysDbIndexCache writecontent annexedfile "test_lock_force content" git_annex_shouldfail "lock" [annexedfile] "lock of modified file should not be allowed" git_annex "lock" ["--force", annexedfile] "lock --force of modified file" diff --git a/doc/bugs/get_from_local_git_remote_slow_because_reconcileStaged_runs_for_each_file.mdwn b/doc/bugs/get_from_local_git_remote_slow_because_reconcileStaged_runs_for_each_file.mdwn index efdd32bd62..1d49b66bdf 100644 (Diff truncated)
bug report
diff --git a/doc/bugs/get_from_local_git_remote_slow_because_reconcileStaged_runs_for_each_file.mdwn b/doc/bugs/get_from_local_git_remote_slow_because_reconcileStaged_runs_for_each_file.mdwn new file mode 100644 index 0000000000..efdd32bd62 --- /dev/null +++ b/doc/bugs/get_from_local_git_remote_slow_because_reconcileStaged_runs_for_each_file.mdwn @@ -0,0 +1,5 @@ +reconcileStaged runs `git write-tree` to see if anything has changed, which +can be slow in large repos. + +It should be possible for Remote.Git to cache the state so this doesn't +happen once per file. --[[Joey]]
initial report on problem with # in the path
diff --git a/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_.mdwn b/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_.mdwn new file mode 100644 index 0000000000..bd5130b64f --- /dev/null +++ b/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_.mdwn @@ -0,0 +1,74 @@ +### Please describe the problem. + +if remote path has some folder name starting with `#` (may be also anywhere, didn't check) -- annex fails to discover UUID + +<details> +<summary>full reproducer which you can use to investigate more</summary> + +```shell +#!/bin/bash +set -eux + +cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)" + +pwd=$(pwd) +( +mkdir repo +cd repo +git init +git annex init + +echo text1 > text +git annex add text +git commit -m 'initial small' +) + +for d in 'clone1' '#clone2'; do + ( + mkdir "$d" + cd "$d" + git init + git annex init + ) + + r="${d//#/}" + ( + cd repo + git remote add "$r" localhost:"$pwd/$d" + git annex sync + echo -n "!!!! Remote $d. annex uuid: " + git config "remote.$r.annex-uuid" + ) +done + +``` +</details> + +which shows + +``` ++ git remote add clone2 localhost:/home/yoh/.tmp/dl-nD3k2cN/#clone2 ++ git annex sync + + Unable to parse git config from clone2 + + Remote clone2 does not have git-annex installed; setting annex-ignore + + This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote clone2 +... +!!!! Remote #clone2. annex uuid: + git config remote.clone2.annex-uuid +``` + +### What version of git-annex are you using? On what operating system? + +Originally was a bit older, now tried with bleeding edge + +``` +❯ git annex version +git-annex version: 10.20250828+git47-gab9bbeabd5-1~ndall+1 +``` + +FTR: I was trying to backup some old behavioral videos (octopus) from the laptop under `#video` folder which was reproduced on remote end as well. 
+ +[[!meta author=yoh]] +[[!tag projects/dandi]]
Added a comment
diff --git a/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_1_c51b5a17100ce1a1a9853cfc441cca7e._comment b/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_1_c51b5a17100ce1a1a9853cfc441cca7e._comment new file mode 100644 index 0000000000..aff9f24639 --- /dev/null +++ b/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_1_c51b5a17100ce1a1a9853cfc441cca7e._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 1" + date="2025-09-09T12:47:25Z" + content=""" +what do you think would be amount of effort needed for this feature? asking since coming into cases of mixed locked/unlocked files since some times use \"manual\" invocations for git annex and some times assistant , and that mixes two types of files annoyingly, and \"unlocked\" is not really needed/desired in most of my scenarios. +"""]]
diff --git a/doc/users/jkrebian.mdwn b/doc/users/jkrebian.mdwn new file mode 100644 index 0000000000..ebfd6fec4b --- /dev/null +++ b/doc/users/jkrebian.mdwn @@ -0,0 +1,3 @@ +Hi, I'am from Europe. +Starting to learn about web natures. +Please, kindly help me if I have...☺️☺️
Added a comment
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_3_0d21d29d2f90bf8c0c105b9b1c737fbc._comment b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_3_0d21d29d2f90bf8c0c105b9b1c737fbc._comment new file mode 100644 index 0000000000..bfa627fb4e --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_3_0d21d29d2f90bf8c0c105b9b1c737fbc._comment @@ -0,0 +1,81 @@ +[[!comment format=mdwn + username="babudarabu@232a9694ce5401143f6210561371f887dd15cd61" + nickname="babudarabu" + avatar="http://cdn.libravatar.org/avatar/b1563172cc335380f1582d960c44c7a4" + subject="comment 3" + date="2025-09-08T17:13:02Z" + content=""" +I'm experiencing this as well; mostly with filenames that have CJK characters in them, but also a couple using other non-ASCII symbols. I think nobodyinperson already confirmed this, but it seems like the contents of the file don't matter, just the filename. It also doesn't seem to matter whether or not it's in a subdirectory. + +```sh +$ touch '♭5 01-010 Drive.mp3' +$ git annex add '♭5 01-010 Drive.mp3' +add \"\342\231\2555 01-010 Drive.mp3\" + +git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) +failed +add: 1 failed + +$ mkdir flat\ 5 +$ touch 'flat 5/♭5 01-010 Drive.mp3' +$ git annex add 'flat 5/♭5 01-010 Drive.mp3' +add \"flat 5/\342\231\2555 01-010 Drive.mp3\" + +git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) +failed +add: 1 failed +``` + +There's something to do with filename length, too? 
Changing the extension but keeping the character count the same doesn't fix the issue, but using a shorter extension does: + +```sh +$ touch '♭5 01-010 Drive.mp4' +$ git annex add '♭5 01-010 Drive.mp4' +add \"\342\231\2555 01-010 Drive.mp4\" + +git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) +failed +add: 1 failed + +$ touch '♭5 01-010 Drive.mp5' +$ git annex add '♭5 01-010 Drive.mp5' +add \"\342\231\2555 01-010 Drive.mp5\" + +git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) +failed +add: 1 failed + +$ touch '♭5 01-010 Drive.mp' +$ git annex add '♭5 01-010 Drive.mp' +add \"\342\231\2555 01-010 Drive.mp\" +ok +(recording state in git...) + +$ touch '♭5 01-010 Drive.png' +$ git annex add '♭5 01-010 Drive.png' +add \"\342\231\2555 01-010 Drive.png\" + +git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) +failed +add: 1 failed +``` + +Here are all the problematic filenames I've found so far. They all seem to *start* with a non-ASCII character, not sure if that's relevant. + +``` +⊿ 01-012 願い.mp3 +♭5 01-001 Stars.mp3 +♭5 01-003 Olive.mp3 +♭5 01-006 手紙.mp3 +♭5 01-009 追憶.mp3 +♭5 01-010 Drive.mp3 +何者 01-003 FREE.mp3 +何者 01-009 JET.mp3 +何者 01-014 POSE.mp3 +彩 01-001 Change.mp3 +彩 01-003 予言.mp3 +旅 01-003 微熱.mp3 +日常 01-007 Zzz.mp3 +落花 01-004 NITE.mp3 +``` +"""]]
rename forum/notes_and_enhancements_git_annex_on_android.mdwn to forum/notes_and_enhancements_for_git_annex_on_android.mdwn
diff --git a/doc/forum/notes_and_enhancements_git_annex_on_android.mdwn b/doc/forum/notes_and_enhancements_for_git_annex_on_android.mdwn similarity index 100% rename from doc/forum/notes_and_enhancements_git_annex_on_android.mdwn rename to doc/forum/notes_and_enhancements_for_git_annex_on_android.mdwn
diff --git a/doc/forum/notes_and_enhancements_git_annex_on_android.mdwn b/doc/forum/notes_and_enhancements_git_annex_on_android.mdwn new file mode 100644 index 0000000000..de6533fe7b --- /dev/null +++ b/doc/forum/notes_and_enhancements_git_annex_on_android.mdwn @@ -0,0 +1,31 @@ +I have the following setup: + +Git Annex on my laptop, Git Annex on my mobile phone via Termux and SSH. + +When I add my mobile phone using the web app ‘Adding a remote server using SSH’, it is not recognised. The reasons are: + +1: which is not installed. Git Annex apparently tries to check whether remote Git Annex is installed via ‘which git-annex’. I read this somewhere on the web, but I can't find the source right now. This problem can be easily solved: + + apt install which + +2: The Git Annex installation extends the PATH variable in the .profile file. The problem is, that this file is not evaluated when the following is executed: + + ssh -p 8022 user@host 'which git-annex' + +If we look at the openSSH documentation, it says: + +>When the user's identity has been accepted by the server, the server either executes the given command in a non-interactive session or, if no command has been specified, logs into the machine and gives the user a normal shell as an interactive session... + +Since no shell is started, .profile is not evaluated (my interpretation). I have now worked around this as follows: + + ln -s /data/data/com.termux/files/home/git-annex.linux/git-annex $PREFIX/bin/git-annex + ln -s /data/data/com.termux/files/home/git-annex.linux/git-annex-shell $PREFIX/bin/git-annex-shell + ln -s /data/data/com.termux/files/home/git-annex.linux/git-annex-webapp $PREFIX/bin/git-annex-webapp + +I don't know if this is a mistake, but I think it could help some people. Perhaps the git-annex-install script could be improved? + +Another suggestion from me: If the test carried out as follows: + + ssh -p 8022 hostname 'bash -l -c '\'which git-annex\''' + +Everything would works perfectly.
Added a comment
diff --git a/doc/todo/remove_webapp/comment_3_1ab23b587f60546a815070503528bb64._comment b/doc/todo/remove_webapp/comment_3_1ab23b587f60546a815070503528bb64._comment new file mode 100644 index 0000000000..aabb25f636 --- /dev/null +++ b/doc/todo/remove_webapp/comment_3_1ab23b587f60546a815070503528bb64._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="waldi5001" + avatar="http://cdn.libravatar.org/avatar/128c0f882560337aad72a15ab7ee3766" + subject="comment 3" + date="2025-09-08T08:47:56Z" + content=""" +Hello Joey, + +I agree completely with nobodyinperson and I can also understand why you want to retire the webapp. In my opinion there is one big advantage for the webapp: If you are a beginner, its so easy and a big help to create and configure your repositories. You do not know that much to start with git annex. No reading of the man pages, no understanding of git remotes and so on. + +I think some convenient scripts can help to create repositories, like + +git annex init client_repo +git annex init backup_repo +git annex init transfer_repo +"""]]
comments
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_12_39d94818a0e2f989da37953c1585161e._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_12_39d94818a0e2f989da37953c1585161e._comment new file mode 100644 index 0000000000..d2a9f896b9 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_12_39d94818a0e2f989da37953c1585161e._comment @@ -0,0 +1,35 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 12""" + date="2025-09-06T17:11:55Z" + content=""" +There are a few other races where a FD can leak to a child process still, +where `setFdOption` is used after dup()/pipe(). + +These don't involve files, so won't affect beegfs. They should still be +fixed. + +---- + +While dup3() and pipe2() allow setting the close-on-exec flag, +I don't think they're portable enough to rely on. + +It may be possible to rewrite these handful of things to avoid the problem: + +* processTranscript should be able to read chunks from stdout and stderr + concurrently and interleave, not needing a pipe +* Remote.Directory should be able to reuse the same handle passed to fileContentCopier + to get the FD for the postchecknoncow. +* gpg feedRead may be able to use stdin as the passphrase-fd. If gpg reads + one line for that, and then continues to read the rest of stdin for the + content to encrypt/decrypt. I have not checked if gpg behaves that way. +* Similar for StatelessOpenPGP + +Alternatively, it would be possible to solve all of these issues, as well +as the openTempFile race, by making the wrappers in Utility.Process prevent +starting a process when a global MVar is empty. And have a function that +runs an action with the MVar emptied. Then the call to dup/pipe could run +effectively atomically with the `setFdOption`. There would be a small +overhead in checking the MVar on each exec, but probably too small to be +noticable. 
+"""]] diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_13_e7346cc5c2946bf0e7bbea8001ebaf2f._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_13_e7346cc5c2946bf0e7bbea8001ebaf2f._comment new file mode 100644 index 0000000000..de6ff9ba1a --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_13_e7346cc5c2946bf0e7bbea8001ebaf2f._comment @@ -0,0 +1,21 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 13""" + date="2025-09-06T17:28:57Z" + content=""" +There is some potential for any of these FD leaks to compromise security, +in cases where a child process ends up running untrusted code in some kind +of sandbox, that is sufficiently leaky that the leaked FD is accessible +inside the sandbox. + +I don't think any existing special remote or other git-annex addons behave +that way, so don't think this is an exploitable security hole. Arguably, if +sandboxing untrusted code, it's on you to avoid exposing open Fds to it. + +However, since security is involved, it does need to be fixed comprehensively +in git-annex, including the remaining races. + +And, I have decided that this fix can't be tied to the OsPath flag being +set. It needs to be fixed when git-annex is built without that flag, or the +flag needs to go away. +"""]]
update
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment index c343275796..fb92f27cea 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment @@ -6,6 +6,7 @@ I've now converted *all* functions listed above to ones that set the close-on-exec flag. (When building with the OsPath flag.) -All that remains is the openTempFile race and checking the libraries in -comment #8. +And checked all the libraries in comment #8. + +All that remains is the openTempFile race. """]] diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment index 997300e64e..8e2bcc5d3a 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment @@ -20,9 +20,16 @@ with a file in the git-annex repo. Most dependencies of git-annex clearly don't open files there, and most open no files at all. Ones I need to check: -* persistent-sqlite -* feed (parseFeedFromFile uses openBinaryFile, updated git-annex to open +* persistent-sqlite (looks ok; no direct uses of problem haskell + functions. And in sqlite itself, `robust_open` sets the close-on-exec + flag) +* feed (update: parseFeedFromFile uses openBinaryFile, updated git-annex to open the file itself instead) -* concurrent-output (addOutputBuffer uses openTempFile; emitOutputBuffer uses T.readFile) +* concurrent-output (addOutputBuffer uses openTempFile; emitOutputBuffer + uses T.readFile. Probably neither actually triggers as git-annex uses the + library though. I have noted it in the todo for that library though.) 
* magic (update: checked it, it sets close-on-exec) + +At this point, I'm reasonably satisfied about libraries not causing +the problem. """]]
open feed file with close-on-exec bit set
parseFeedFromFile does not set the bit, so open and read the file
ourselves.
Versioned dependency on utf8-string should not cause any issues,
since that version is available in all versions of Debian that package it.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
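As a rough illustration of "open and read the file ourselves", here is a minimal sketch of opening a file with the close-on-exec flag set at open time and decoding its content as UTF-8. It is not the git-annex implementation: it assumes a POSIX system and unix >= 2.8 (where `OpenFileFlags` has a `cloexec` field), the names `openReadCloseOnExec` and `readUtf8File` are hypothetical, and it swaps in the text library's lenient decoder where the commit uses `decodeString` from utf8-string.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)
import System.Environment (getArgs)
import System.IO (Handle)
import System.Posix.IO

-- Hypothetical helper: open a file read-only with O_CLOEXEC requested in
-- the open call itself, so there is no window between opening the file and
-- setting the flag in which a child process could inherit the descriptor.
-- Assumes unix >= 2.8.
openReadCloseOnExec :: FilePath -> IO Handle
openReadCloseOnExec path = do
    fd <- openFd path ReadOnly defaultFileFlags { cloexec = True }
    fdToHandle fd

-- Hypothetical helper: read the whole file strictly and decode it as UTF-8,
-- replacing invalid byte sequences rather than throwing an exception.
-- The strict hGetContents closes the handle once the content is read.
readUtf8File :: FilePath -> IO T.Text
readUtf8File path = do
    h <- openReadCloseOnExec path
    TE.decodeUtf8With lenientDecode <$> B.hGetContents h

main :: IO ()
main = do
    args <- getArgs
    case args of
        [path] -> readUtf8File path >>= print . T.length
        _ -> putStrLn "usage: pass the path of one file to read"
```

Setting the flag later with `setFdOption fd CloseOnExec True` would also work, but leaves the kind of narrow race that the bug discussion mentions for openTempFile.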
diff --git a/Command/ImportFeed.hs b/Command/ImportFeed.hs index 7b66a2b507..e36e723702 100644 --- a/Command/ImportFeed.hs +++ b/Command/ImportFeed.hs @@ -1,6 +1,6 @@ {- git-annex command - - - Copyright 2013-2024 Joey Hess <id@joeyh.name> + - Copyright 2013-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -22,6 +22,7 @@ import Data.Time.Format import Data.Time.Calendar import Data.Time.LocalTime import Control.Concurrent.STM +import Codec.Binary.UTF8.String (decodeString) import qualified Data.Text as T import qualified Data.Text.Encoding as TE import qualified Data.ByteString as B @@ -48,6 +49,7 @@ import Logs.MetaData import Annex.MetaData import Annex.FileMatcher import Annex.UntrustedFilePath +import qualified Utility.FileIO as F import qualified Utility.RawFilePath as R import qualified Database.ImportFeed as Db @@ -158,19 +160,16 @@ getFeed o url st = | otherwise = get get = withTmpFile (literalOsPath "feed") $ \tmpf h -> do - let tmpf' = fromRawFilePath $ fromOsPath tmpf liftIO $ hClose h - ifM (downloadFeed url tmpf') - ( parse tmpf' + ifM (downloadFeed url tmpf) + ( parse tmpf , do recordfail next $ feedProblem url "downloading the feed failed" ) - -- Use parseFeedFromFile rather than reading the file - -- ourselves because it goes out of its way to handle encodings. - parse tmpf = liftIO (parseFeedFromFile tmpf) >>= \case + parse tmpf = liftIO (parseFeedFromFile' tmpf) >>= \case Nothing -> debugfeedcontent tmpf "parsing the feed failed" Just f -> do case decodeBS $ fromFeedText $ getFeedTitle f of @@ -183,7 +182,7 @@ getFeed o url st = next $ return True debugfeedcontent tmpf msg = do - feedcontent <- liftIO $ readFileString (toOsPath tmpf) + feedcontent <- liftIO $ readFileString tmpf fastDebug "Command.ImportFeed" $ unlines [ "start of feed content" , feedcontent @@ -265,11 +264,11 @@ findDownloads u f = catMaybes $ map mk (feedItems f) } {- Feeds change, so a feed download cannot be resumed. 
-} -downloadFeed :: URLString -> FilePath -> Annex Bool +downloadFeed :: URLString -> OsPath -> Annex Bool downloadFeed url f | Url.parseURIRelaxed url == Nothing = giveup "invalid feed url" | otherwise = Url.withUrlOptions Nothing $ - Url.download nullMeterUpdate Nothing url (toOsPath f) + Url.download nullMeterUpdate Nothing url f startDownload :: AddUnlockedMatcher -> ImportFeedOptions -> Cache -> TMVar Bool -> ToDownload -> CommandStart startDownload addunlockedmatcher opts cache cv todownload = case location todownload of @@ -645,3 +644,11 @@ feedState url = fromRepo $ gitAnnexFeedState $ fromUrl url Nothing False -} fromFeedText :: T.Text -> B.ByteString fromFeedText = TE.encodeUtf8 + +{- Like Test.Feed.parseFeedFromFile, but ensures the close-on-exec bit is + - set when opening the file. -} +parseFeedFromFile' :: OsPath -> IO (Maybe Feed) +parseFeedFromFile' fp = parseFeedString <$> utf8readfile fp + where + utf8readfile :: OsPath -> IO String + utf8readfile f = fmap decodeString (hGetContents =<< F.openBinaryFile f ReadMode) diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment index 8b8010e8c4..997300e64e 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment @@ -21,7 +21,8 @@ Most dependencies of git-annex clearly don't open files there, and most open no files at all. 
Ones I need to check: * persistent-sqlite -* feed (parseFeedFromFile) +* feed (parseFeedFromFile uses openBinaryFile, updated git-annex to open + the file itself instead) * concurrent-output (addOutputBuffer uses openTempFile; emitOutputBuffer uses T.readFile) -* magic +* magic (update: checked it, it sets close-on-exec) """]] diff --git a/git-annex.cabal b/git-annex.cabal index 8591e54d9f..af69ee14dc 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -206,7 +206,7 @@ custom-setup time (>= 1.9.1), directory (>= 1.2.7.0), async, - utf8-string, + utf8-string (>= 1.0.0), Cabal (< 4.0) Executable git-annex @@ -234,7 +234,7 @@ Executable git-annex IfElse, monad-logger (>= 0.3.10), free, - utf8-string, + utf8-string (>= 1.0.0), bytestring, text, sandi,
clarify
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment index f1e5d3895e..c343275796 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment @@ -4,7 +4,7 @@ date="2025-09-05T19:46:25Z" content=""" I've now converted *all* functions listed above to ones that set the -close-on-exec flag. +close-on-exec flag. (When building with the OsPath flag.) All that remains is the openTempFile race and checking the libraries in comment #8.
update
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment new file mode 100644 index 0000000000..f1e5d3895e --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_11_3fd6288ceff0ef378559a9a784510004._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 11""" + date="2025-09-05T19:46:25Z" + content=""" +I've now converted *all* functions listed above to ones that set the +close-on-exec flag. + +All that remains is the openTempFile race and checking the libraries in +comment #8. +"""]]
update
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment index ab02c3b85b..f70670a24c 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment @@ -3,8 +3,8 @@ subject="""comment 9""" date="2025-09-05T15:15:03Z" content=""" -copyFile also leaks a FD. git-annex only uses it on Windows though, so -it does not need to be addressed for Beegfs. +copyFile also leaks a FD. git-annex only uses it on Windows though, so not +a problem. As far as general correctness goes, I've reported a bug upstream about it <https://github.com/haskell/directory/issues/203>, which I suspect they'll
update
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_10_74f2d24f21bdeb6a05dfbef5d558d950._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_10_74f2d24f21bdeb6a05dfbef5d558d950._comment new file mode 100644 index 0000000000..5144dce82a --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_10_74f2d24f21bdeb6a05dfbef5d558d950._comment @@ -0,0 +1,32 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 10""" + date="2025-09-05T17:42:02Z" + content=""" +I've made all OsPath operations in git-annex set the close-on-exec flag. + +This will largely fix the problem when git-annex is built with the OsPath build +flag. Which has now become default, but it's still possible for git-annex +to be built without that build flag. Check `git-annex version` for "OsPath" +to know if your git-annex was built with this. +I think it will probably pass the test suite on beegfs at this point. + +(However, the openTempFile implementation sets the flag only after opening +the file, so is still vulnerable to a race where the fd can leak to an +execed process. That could use more work, but at least is a quite +narrow race window.) + +Also the few ByteString readFile/writeFile calls were converted to the OsPath +operations. + +What remains to be done to fully fix this: + +* Checking haskell libraries as listed in comment #8 +* All the FilePath versions of readFile, writeFile, etc, still + don't set the close-on-exec flag. There are a few hundred such calls + still in of git-annex. Those will all have to be wrapped with versions + that do set the close-on-exec flag, or converted to the OsPath versions. + +So, this bug is now tied to [[OsPath conversion|todo/RawFilePath_conversion]] +in 2 ways. +"""]]
turn on OsPath build flag by default
It was already default in stack builds, now it is default in cabal
builds as well.
Add build warnings when git-annex is built without the OsPath build flag.
git-annex version: Report on whether it was built with the OsPath build flag.
Having the flag on by default was always the plan, and this is a good time to
make the change. A bit of added urgency comes from the close-on-exec leak
issue. Fixing that is going to need reimplementation of things like openFile.
Needing to reimplement it twice is not very appealing, especially since the
FilePath version of it has an implementation that cannot be easily copied and
tweaked. If OsPath is on by default, I can start with only implementing
openFile for it, and fix the bug in that build. And perhaps avoid doing the
extra work that will later get thrown away when this transition finishes.
Note that at this point, Debian still needs to package file-io. Hopefully, they
will package it, rather than turning off the OsPath build flag.
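How a build flag can be reported by `git-annex version` can be sketched with CPP, along the lines of the BuildFlags.hs change in the diff. This is a cut-down illustration: the list is shortened, and it assumes the cabal flag defines the `WITH_OSPATH` macro when it is enabled (for example through `-DWITH_OSPATH` in the cpp options).

```haskell
{-# LANGUAGE CPP #-}
module Main where

-- A shortened stand-in for the list in BuildFlags.hs. "OsPath" is only
-- included when the WITH_OSPATH macro was defined at compile time.
buildFlags :: [String]
buildFlags = filter (not . null)
    [ "Testsuite"
    , "S3"
#ifdef WITH_OSPATH
    , "OsPath"
#endif
    ]

main :: IO ()
main = putStrLn ("build flags: " ++ unwords buildFlags)
```

Compiled without the macro, the output lists only the unconditional flags, which is what makes it possible to check `git-annex version` for "OsPath".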
diff --git a/BuildFlags.hs b/BuildFlags.hs index d273c216e4..8724b94e45 100644 --- a/BuildFlags.hs +++ b/BuildFlags.hs @@ -69,6 +69,11 @@ buildFlags = filter (not . null) , "Testsuite" , "S3" , "WebDAV" +#ifdef WITH_OSPATH + , "OsPath" +#else +#warning Building without the OsPath build flag set results in slower filename manipulation and is not recommended. +#endif ] -- Not a complete list, let alone a listing transitive deps, but only diff --git a/CHANGELOG b/CHANGELOG index 379d28a732..2297edecee 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -3,6 +3,10 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * drop: --fast support when dropping from a remote. * Fix crash operating on filenames that are exactly 21 bytes long and begin with a utf-8 character. + * git-annex.cabal: Turn on the OsPath build flag by default. + * Add build warnings when git-annex is built without the OsPath + build flag. + * version: Report on whether it was built with the OsPath build flag. -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/doc/todo/RawFilePath_conversion.mdwn b/doc/todo/RawFilePath_conversion.mdwn index c10b62cf2e..b488353a60 100644 --- a/doc/todo/RawFilePath_conversion.mdwn +++ b/doc/todo/RawFilePath_conversion.mdwn @@ -12,10 +12,11 @@ But this conversion is not yet complete. This is a todo to keep track of the status. * The OsPath build flag makes git-annex build with OsPath. Otherwise, - it builds with RawFilePath. The plan is to make that build flag the - default where it is not already as time goes on. And then eventually - remove the build flag and simplify code in git-annex to not need to - support two different build methods. + it builds with RawFilePath. That build flag is now on by default, + and there is a build warning when it is not set. + +* The plan is to eventually remove the build flag and simplify code in + git-annex to not need to support two different build methods. 
* unix has modules that operate on RawFilePath but no OSPath versions yet. See https://github.com/haskell/unix/issues/240 diff --git a/git-annex.cabal b/git-annex.cabal index dda13f7071..484e94abfd 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -177,6 +177,7 @@ Flag Servant Flag OsPath Description: Use the os-string library and related libraries, for faster filename manipulation + Default: True Flag Benchmark Description: Enable benchmarking
update
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment index 61efde1467..ab02c3b85b 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment @@ -9,4 +9,7 @@ it does not need to be addressed for Beegfs. As far as general correctness goes, I've reported a bug upstream about it <https://github.com/haskell/directory/issues/203>, which I suspect they'll fix eventually since they have fixed similar problems before. + +Also filed <https://github.com/haskell/file-io/issues/44> which would avoid +git-annex needing to reimplement a lot of this stuff. """]]
copyFile
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment new file mode 100644 index 0000000000..61efde1467 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_9_2e36fbacbeb3347c941b108605b8bb59._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2025-09-05T15:15:03Z" + content=""" +copyFile also leaks a FD. git-annex only uses it on Windows though, so +it does not need to be addressed for Beegfs. + +As far as general correctness goes, I've reported a bug upstream about it +<https://github.com/haskell/directory/issues/203>, which I suspect they'll +fix eventually since they have fixed similar problems before. +"""]]
on libraries
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment index 1e95bdeab6..681ec6add1 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment @@ -19,8 +19,4 @@ inherit the close-on-exec flag. So it should be safe to just write new versions of all of those. Also there are a few uses of `openFd` that don't set CloseOnExec. - -There is also the problem that any haskell library that does anything -with a file might use any of the above internally without setting -close-on-exec. """]] diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment new file mode 100644 index 0000000000..8b8010e8c4 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_8_853bf59715ac755d046b54b282eaac7c._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2025-09-04T20:56:54Z" + content=""" +There is also the problem that any haskell library that does anything +with a file might use any of the above internally without setting +close-on-exec. + +For example, opening a https connection can result in readFile opening a +handle to a file in /etc/ssl/certs/, which will not be closed on exec. And +which can leak out via another thread doing an exec at just the right time. + +But inheriting a single FD like that is not going to cause problems for beegfs +or anything else. + +The ones I'd worry about is if a haskell library is doing something +with a file in the git-annex repo. + +Most dependencies of git-annex clearly don't open files there, and most open no +files at all. 
Ones I need to check: + +* persistent-sqlite +* feed (parseFeedFromFile) +* concurrent-output (addOutputBuffer uses openTempFile; emitOutputBuffer uses T.readFile) +* magic +"""]]
format
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment index 8ba69239de..1e95bdeab6 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment @@ -5,7 +5,7 @@ content=""" `openFile` is not the only one that would need to be dealt with. Also `withFile`, `openBinaryFile`, `withBinaryFile`, `appendFile`, -and `openTempFile`, `readFile, and `writeFile` (including `L.` versions). +and `openTempFile`, `readFile`, and `writeFile` (including `L.` versions). Since none of those provide a way to set CloseOnExec, they would have to be changed to be implemented using `openFd` with CloseOnExec, and
fix format
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment index c2353a4f68..5e64d78dc2 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment @@ -65,4 +65,4 @@ I don't care a great deal about supporting Beegfs; it would be nice to support it in some of its less crazy configurations if possible. But not leaking FDs while running child processes seems like something that ought to be fixed for other reasons. -"""] +"""]]
Revert "try to fix format issue on website"
This reverts commit 764b47d7d49ee13460565f59774c1aded665790f.
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment index c76456bf20..c2353a4f68 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment @@ -14,7 +14,7 @@ In that report I hypothesized that Beegfs might not like an open file to be renamed. I'm not sure if we ever verified my fixes in that one fixed a problem with Beegfs, but it still seems like a good hypothesis. -`export.ex/<uuid>` is a log file that git-annex uses to keep track of files +"export.ex/<uuid>" is a log file that git-annex uses to keep track of files that were part of a tree exported to a special remote, but that were excluded from the export by its preferred content settings. To populate that file, git-annex opens a temp file, writes to it as the export runs, then closes it @@ -25,17 +25,37 @@ and renames it. close(14) = 0 rename(".git/annex/othertmp/cfd9e482-a5cc-42.0/cfd9e482-a5cc-42", ".git/annex/export.ex/cfd9e482-a5cc-4277-8ec1-954c5e95060f") = 0 -If the rename() fails, it falls back to trying mv, which is why -there are also mv errors in the transcript above. Anyway, I've verified +If the rename() fails, it falls back to trying "mv", which is why +there are also "mv" errors in the transcript above. Anyway, I've verified the FD is closed before that point. But, the "..." includes some fork and exec. And this FD is never set close-on-exec! And the processes started while it's open include -`git cat-file --batch`, which is a long-running process that will +"git cat-file --batch", which is a long-running process that will still be left running when the rename happens. This was pretty surprising to me, I did not realize git-annex was generally -leaking FDs to child processes in this way. 
+leaking FDs to child processes in this way. It's easy to demonstrate with +a simpler program: + + joey@darkstar:~>cat >foo.hs <<EOF + import System.Process + import System.IO + + main = do + h <- openFile "foo.x" WriteMode + hPutStrLn h "hello" + callProcess "sh" ["-c", "ls -l /proc/self/fd"] + hClose h + EOF + joey@darkstar:~>runghc foo.hs + total 0 + lrwx------ 1 joey joey 64 Sep 4 14:11 0 -> /dev/pts/8 + lrwx------ 1 joey joey 64 Sep 4 14:11 1 -> /dev/pts/8 + l-wx------ 1 joey joey 64 Sep 4 14:11 11 -> /dev/tty + l-wx------ 1 joey joey 64 Sep 4 14:11 12 -> /home/joey/foo.x + lrwx------ 1 joey joey 64 Sep 4 14:11 2 -> /dev/pts/8 + lr-x------ 1 joey joey 64 Sep 4 14:11 3 -> /proc/516659/fd So, really supporting this would mean auditing every file git-annex opens with openFile to see if the handle is ever passed to a child process,
try to fix format issue on website
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment index c2353a4f68..c76456bf20 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment @@ -14,7 +14,7 @@ In that report I hypothesized that Beegfs might not like an open file to be renamed. I'm not sure if we ever verified my fixes in that one fixed a problem with Beegfs, but it still seems like a good hypothesis. -"export.ex/<uuid>" is a log file that git-annex uses to keep track of files +`export.ex/<uuid>` is a log file that git-annex uses to keep track of files that were part of a tree exported to a special remote, but that were excluded from the export by its preferred content settings. To populate that file, git-annex opens a temp file, writes to it as the export runs, then closes it @@ -25,37 +25,17 @@ and renames it. close(14) = 0 rename(".git/annex/othertmp/cfd9e482-a5cc-42.0/cfd9e482-a5cc-42", ".git/annex/export.ex/cfd9e482-a5cc-4277-8ec1-954c5e95060f") = 0 -If the rename() fails, it falls back to trying "mv", which is why -there are also "mv" errors in the transcript above. Anyway, I've verified +If the rename() fails, it falls back to trying mv, which is why +there are also mv errors in the transcript above. Anyway, I've verified the FD is closed before that point. But, the "..." includes some fork and exec. And this FD is never set close-on-exec! And the processes started while it's open include -"git cat-file --batch", which is a long-running process that will +`git cat-file --batch`, which is a long-running process that will still be left running when the rename happens. This was pretty surprising to me, I did not realize git-annex was generally -leaking FDs to child processes in this way. 
It's easy to demonstrate with -a simpler program: - - joey@darkstar:~>cat >foo.hs <<EOF - import System.Process - import System.IO - - main = do - h <- openFile "foo.x" WriteMode - hPutStrLn h "hello" - callProcess "sh" ["-c", "ls -l /proc/self/fd"] - hClose h - EOF - joey@darkstar:~>runghc foo.hs - total 0 - lrwx------ 1 joey joey 64 Sep 4 14:11 0 -> /dev/pts/8 - lrwx------ 1 joey joey 64 Sep 4 14:11 1 -> /dev/pts/8 - l-wx------ 1 joey joey 64 Sep 4 14:11 11 -> /dev/tty - l-wx------ 1 joey joey 64 Sep 4 14:11 12 -> /home/joey/foo.x - lrwx------ 1 joey joey 64 Sep 4 14:11 2 -> /dev/pts/8 - lr-x------ 1 joey joey 64 Sep 4 14:11 3 -> /proc/516659/fd +leaking FDs to child processes in this way. So, really supporting this would mean auditing every file git-annex opens with openFile to see if the handle is ever passed to a child process,
more
diff --git a/Annex/Link.hs b/Annex/Link.hs index 6cef911a11..480a00ce25 100644 --- a/Annex/Link.hs +++ b/Annex/Link.hs @@ -36,6 +36,7 @@ import Utility.FileMode import Utility.InodeCache import Utility.Tmp.Dir import Utility.CopyFile +import Utility.OpenFd import qualified Database.Keys.Handle import qualified Utility.RawFilePath as R import qualified Utility.FileIO as F @@ -447,8 +448,9 @@ isPointerFile f = catchDefaultIO Nothing $ #else #if MIN_VERSION_unix(2,8,0) let open = do - fd <- openFd (fromOsPath f) ReadOnly - (defaultFileFlags { nofollow = True, cloexec = True }) + fd <- openFdWithMode (fromOsPath f) ReadOnly Nothing + (defaultFileFlags { nofollow = True }) + (CloseOnExecFlag True) fdToHandle fd in bracket open hClose readhandle #else diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment index 4d33caa11d..8ba69239de 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment @@ -3,11 +3,14 @@ subject="""comment 6""" date="2025-09-04T18:24:33Z" content=""" -Note that `openFile` is not the only one that would need to be dealt with. -Also `withFile`, `openBinaryFile`, and `withBinaryFile`. +`openFile` is not the only one that would need to be dealt with. +Also `withFile`, `openBinaryFile`, `withBinaryFile`, `appendFile`, +and `openTempFile`, `readFile, and `writeFile` (including `L.` versions). -And, since none of those provide a way to set CloseOnExec, they would have -to be changed to use `openFd` with CloseOnExec, and then mkHandleFromFD. +Since none of those provide a way to set CloseOnExec, they would have +to be changed to be implemented using `openFd` with CloseOnExec, and +then mkHandleFromFD. 
Or rewritten to use +<https://hackage.haskell.org/package/file-io-0.1.5/docs/src/System.File.OsPath.Internal.html#openFileWithCloseOnExec> I have checked and none of those are ever used to create a handle that is intentionally passed to a child process. The only uses of `handleToFd` @@ -17,5 +20,7 @@ versions of all of those. Also there are a few uses of `openFd` that don't set CloseOnExec. -And possibly also some libraries might open files, I don't know. +There is also the problem that any haskell library that does anything +with a file might use any of the above internally without setting +close-on-exec. """]]
added project
diff --git a/doc/bugs/35_failed_tests_on_beegfs.mdwn b/doc/bugs/35_failed_tests_on_beegfs.mdwn index 4a443e258a..28e1babc48 100644 --- a/doc/bugs/35_failed_tests_on_beegfs.mdwn +++ b/doc/bugs/35_failed_tests_on_beegfs.mdwn @@ -2,7 +2,7 @@ links: [prior report/fix of testing on beegfs 4 years ago; different site/version](https://git-annex.branchable.com/projects/dandi/bugs-done/beegfs__58___init_tests_FAIL_resource_busy/) -Currently I observed 35 tests failing +Currently I observed 35 tests failing (now at a potential repronim reprocenter location) ``` yarick@ducky:/data/mri_dicom/tmp/test-git-annex @@ -79,3 +79,5 @@ upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) +[[!meta author=yoh]] +[[!tag projects/repronim]]
audit all openFd and dupping for close-on-exec
Made all uses of openFd and dup set the close-on-exec flag, with a few
exceptions when starting a git-annex daemon.
Made openFdWithMode be used everywhere, rather than openFd.
Adding a new parameter to it ensures I checked everything.
And will help to make sure this gets considered in the future when
opening fds.
In lockPidFile, the only thing that keeps the pid file locked, once
daemonize re-runs the command in a new session, is that the fd is
inherited.
In Utility.LogFile.redir, the new fd it dups to does not have the
close-on-exec flag set, because this is used to set up the stdout and
stderr fds, which need to be inherited by child processes.
Same in Assistant.startDaemon where the browser gets started with the
original stdout and stderr.
This does nothing about uses of openFile and similar!
Sponsored-By: mycroft
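The diff for Utility/OpenFd.hs is truncated in this listing, so the following is only a minimal sketch of the idea described in the commit message, not the actual git-annex code. It assumes unix >= 2.8.0 (where `OpenFileFlags` has `creat` and `cloexec` fields) and uses `FilePath` rather than `RawFilePath`. The names `openFdWithMode'` and `dupCloseOnExec` are hypothetical, chosen for illustration.

```haskell
import System.Posix.IO
import System.Posix.Types (Fd, FileMode)

-- Wrapping the Bool in a newtype means every call site has to spell out
-- its close-on-exec choice explicitly.
newtype CloseOnExecFlag = CloseOnExecFlag Bool

-- Hypothetical stand-in for the wrapper described above (unix >= 2.8.0 only).
openFdWithMode' :: FilePath -> OpenMode -> Maybe FileMode -> OpenFileFlags -> CloseOnExecFlag -> IO Fd
openFdWithMode' f openmode filemode flags (CloseOnExecFlag closeonexec) =
    openFd f openmode (flags { creat = filemode, cloexec = closeonexec })

-- Hypothetical helper: the fd returned by dup does not carry the
-- close-on-exec flag, so it is set afterwards. There is a small window
-- between the two calls in which another thread could exec.
dupCloseOnExec :: Fd -> IO Fd
dupCloseOnExec fd = do
    newfd <- dup fd
    setFdOption newfd CloseOnExec True
    return newfd

main :: IO ()
main = do
    fd <- openFdWithMode' "/dev/null" ReadOnly Nothing defaultFileFlags (CloseOnExecFlag True)
    fd' <- dupCloseOnExec fd
    -- Report whether the flag is set on each fd.
    print =<< queryFdOption fd CloseOnExec
    print =<< queryFdOption fd' CloseOnExec
    closeFd fd'
    closeFd fd
```

Making the flag a required parameter, rather than a field with a default, is what lets the compiler point out every call site that has not yet been reviewed.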
diff --git a/Annex/Link.hs b/Annex/Link.hs index 55cfc354e5..6cef911a11 100644 --- a/Annex/Link.hs +++ b/Annex/Link.hs @@ -448,7 +448,7 @@ isPointerFile f = catchDefaultIO Nothing $ #if MIN_VERSION_unix(2,8,0) let open = do fd <- openFd (fromOsPath f) ReadOnly - (defaultFileFlags { nofollow = True }) + (defaultFileFlags { nofollow = True, cloexec = True }) fdToHandle fd in bracket open hClose readhandle #else diff --git a/Command/RemoteDaemon.hs b/Command/RemoteDaemon.hs index 8c3226d05e..e41681ab7d 100644 --- a/Command/RemoteDaemon.hs +++ b/Command/RemoteDaemon.hs @@ -31,7 +31,9 @@ run o #ifndef mingw32_HOST_OS git_annex <- fromOsPath <$> liftIO programPath ps <- gitAnnexDaemonizeParams - let logfd = openFdWithMode (toRawFilePath "/dev/null") ReadOnly Nothing defaultFileFlags + let logfd = openFdWithMode (toRawFilePath "/dev/null") ReadOnly Nothing + defaultFileFlags + (CloseOnExecFlag True) liftIO $ daemonize git_annex ps logfd Nothing False runNonInteractive #else liftIO $ foreground Nothing runNonInteractive diff --git a/Git/LockFile.hs b/Git/LockFile.hs index 9d9042f453..946cf6d05c 100644 --- a/Git/LockFile.hs +++ b/Git/LockFile.hs @@ -53,12 +53,7 @@ openLock' lck = do #ifndef mingw32_HOST_OS -- On unix, git simply uses O_EXCL h <- openFdWithMode (fromOsPath lck) ReadWrite (Just 0O666) -#if MIN_VERSION_unix(2,8,0) - (defaultFileFlags { exclusive = True, cloexec = True }) -#else - (defaultFileFlags { exclusive = True }) - setFdOption h CloseOnExec True -#endif + (defaultFileFlags { exclusive = True }) (CloseOnExecFlag True) #else -- It's not entirely clear how git manages locking on Windows, -- since it's buried in the portability layer, and different diff --git a/Remote/Directory.hs b/Remote/Directory.hs index da97d06c03..75ec9b09cd 100644 --- a/Remote/Directory.hs +++ b/Remote/Directory.hs @@ -471,9 +471,11 @@ retrieveExportWithContentIdentifierM ii dir cow loc cids dest gk p = docopynoncow iv = do #ifndef mingw32_HOST_OS let open = do + fd <- openFdWithMode 
f' ReadOnly Nothing + defaultFileFlags (CloseOnExecFlag True) -- Need a duplicate fd for the post check. - fd <- openFdWithMode f' ReadOnly Nothing defaultFileFlags dupfd <- dup fd + setFdOption dupfd CloseOnExec True h <- fdToHandle fd return (h, dupfd) let close (h, dupfd) = do diff --git a/Utility/Daemon.hs b/Utility/Daemon.hs index 6d5ea6c0bf..a95503f0d2 100644 --- a/Utility/Daemon.hs +++ b/Utility/Daemon.hs @@ -52,7 +52,8 @@ daemonize cmd params openlogfd pidfile changedirectory a = do maybe noop lockPidFile pidfile a _ -> do - nullfd <- openFdWithMode (toRawFilePath "/dev/null") ReadOnly Nothing defaultFileFlags + nullfd <- openFdWithMode (toRawFilePath "/dev/null") ReadOnly Nothing defaultFileFlags + (CloseOnExecFlag True) redir nullfd stdInput redirLog =<< openlogfd environ <- getEnvironment @@ -91,7 +92,8 @@ foreground pidfile a = do #endif {- Locks the pid file, with an exclusive, non-blocking lock, - - and leaves it locked on return. + - and leaves it locked on return. The lock file is not closed on exec, so + - when daemonize runs the process again, it inherits it. - - Writes the pid to the file, fully atomically. - Fails if the pid file is already locked by another process. 
-} @@ -99,9 +101,11 @@ lockPidFile :: OsPath -> IO () lockPidFile pidfile = do #ifndef mingw32_HOST_OS fd <- openFdWithMode (fromOsPath pidfile) ReadWrite (Just stdFileMode) defaultFileFlags + (CloseOnExecFlag False) locked <- catchMaybeIO $ setLock fd (WriteLock, AbsoluteSeek, 0, 0) - fd' <- openFdWithMode (fromOsPath newfile) ReadWrite (Just stdFileMode) defaultFileFlags - { trunc = True } + fd' <- openFdWithMode (fromOsPath newfile) ReadWrite (Just stdFileMode) + (defaultFileFlags { trunc = True }) + (CloseOnExecFlag True) locked' <- catchMaybeIO $ setLock fd' (WriteLock, AbsoluteSeek, 0, 0) case (locked, locked') of (Nothing, _) -> alreadyRunning @@ -135,7 +139,10 @@ checkDaemon :: OsPath -> IO (Maybe PID) checkDaemon pidfile = bracket setup cleanup go where setup = catchMaybeIO $ - openFdWithMode (fromOsPath pidfile) ReadOnly (Just stdFileMode) defaultFileFlags + openFdWithMode (fromOsPath pidfile) ReadOnly + (Just stdFileMode) + defaultFileFlags + (CloseOnExecFlag True) cleanup (Just fd) = closeFd fd cleanup Nothing = return () go (Just fd) = catchDefaultIO Nothing $ do diff --git a/Utility/DirWatcher/Kqueue.hs b/Utility/DirWatcher/Kqueue.hs index b793eee58b..eb57b09334 100644 --- a/Utility/DirWatcher/Kqueue.hs +++ b/Utility/DirWatcher/Kqueue.hs @@ -111,7 +111,9 @@ scanRecursive topdir prune = M.fromList <$> walk [] [topdir] Nothing -> walk c rest Just info -> do mfd <- catchMaybeIO $ - openFdWithMode (toRawFilePath dir) Posix.ReadOnly Nothing Posix.defaultFileFlags + openFdWithMode (toRawFilePath dir) Posix.ReadOnly Nothing + Posix.defaultFileFlags + (CloseOnExecFlag True) case mfd of Nothing -> walk c rest Just fd -> do diff --git a/Utility/LockFile/PidLock.hs b/Utility/LockFile/PidLock.hs index 7a08f67c58..2c480a354d 100644 --- a/Utility/LockFile/PidLock.hs +++ b/Utility/LockFile/PidLock.hs @@ -210,7 +210,8 @@ linkToLock (Just _) src dest = do let setup = do fd <- openFdWithMode dest' WriteOnly (Just $ combineModes readModes) - (defaultFileFlags {exclusive 
= True}) + (defaultFileFlags { exclusive = True }) + (CloseOnExecFlag True) fdToHandle fd let cleanup = hClose let go h = readFile (fromOsPath src) >>= hPutStr h diff --git a/Utility/LockFile/Posix.hs b/Utility/LockFile/Posix.hs index 3ad3554a8b..5249acca9d 100644 --- a/Utility/LockFile/Posix.hs +++ b/Utility/LockFile/Posix.hs @@ -5,8 +5,6 @@ - License: BSD-2-clause -} -{-# LANGUAGE CPP #-} - module Utility.LockFile.Posix ( LockHandle, lockShared, @@ -76,16 +74,10 @@ tryLock lockreq mode lockfile = uninterruptibleMask_ $ do -- Close on exec flag is set so child processes do not inherit the lock. openLockFile :: LockRequest -> Maybe ModeSetter -> LockFile -> IO Fd -openLockFile lockreq filemode lockfile = do - l <- applyModeSetter filemode lockfile $ \filemode' -> - openFdWithMode (fromOsPath lockfile) openfor filemode' $ -#if MIN_VERSION_unix(2,8,0) - defaultFileFlags { cloexec = True } -#else - defaultFileFlags - setFdOption l CloseOnExec True -#endif - return l +openLockFile lockreq filemode lockfile = + applyModeSetter filemode lockfile $ \filemode' -> + openFdWithMode (fromOsPath lockfile) openfor filemode' + defaultFileFlags (CloseOnExecFlag True) where openfor = case lockreq of ReadLock -> ReadOnly diff --git a/Utility/OpenFd.hs b/Utility/OpenFd.hs index 17be54e016..95f18085a6 100644 --- a/Utility/OpenFd.hs +++ b/Utility/OpenFd.hs @@ -1,6 +1,6 @@ {- openFd wrapper to support old versions of unix package. 
- - - Copyright 2023 Joey Hess <id@joeyh.name> + - Copyright 2023-2025 Joey Hess <id@joeyh.name> - - License: BSD-2-clause -} @@ -17,12 +17,17 @@ import System.Posix.Types import Utility.RawFilePath -openFdWithMode :: RawFilePath -> OpenMode -> Maybe FileMode -> OpenFileFlags -> IO Fd +newtype CloseOnExecFlag = CloseOnExecFlag Bool + +openFdWithMode :: RawFilePath -> OpenMode -> Maybe FileMode -> OpenFileFlags -> CloseOnExecFlag -> IO Fd +openFdWithMode f openmode filemode flags (CloseOnExecFlag closeonexec) = do #if MIN_VERSION_unix(2,8,0) -openFdWithMode f openmode filemode flags = - openFd f openmode (flags { creat = filemode }) (Diff truncated)
analysis
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment index 47ad7d8ff9..c2353a4f68 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment @@ -59,9 +59,7 @@ a simpler program: So, really supporting this would mean auditing every file git-annex opens with openFile to see if the handle is ever passed to a child process, -and otherwise making it use CloseOnExec. Probably openFile is never actually used to -send a handle to a child process, so a version that just sets CloseOnExec could be -written and switched to. +and otherwise making it use CloseOnExec. I don't care a great deal about supporting Beegfs; it would be nice to support it in some of its less crazy configurations if possible. But not leaking FDs diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment new file mode 100644 index 0000000000..4d33caa11d --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_6_172a92bf49be25355dda3f88b377a6f4._comment @@ -0,0 +1,21 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2025-09-04T18:24:33Z" + content=""" +Note that `openFile` is not the only one that would need to be dealt with. +Also `withFile`, `openBinaryFile`, and `withBinaryFile`. + +And, since none of those provide a way to set CloseOnExec, they would have +to be changed to use `openFd` with CloseOnExec, and then mkHandleFromFD. + +I have checked and none of those are ever used to create a handle that is +intentionally passed to a child process. 
The only uses of `handleToFd` +result in a FD that gets dupped to another FD number, and dup() does not +inherit the close-on-exec flag. So it should be safe to just write new +versions of all of those. + +Also there are a few uses of `openFd` that don't set CloseOnExec. + +And possibly also some libraries might open files, I don't know. +"""]]
analysis
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment new file mode 100644 index 0000000000..47ad7d8ff9 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_5_ea2368e228753931099084e634c7bbca._comment @@ -0,0 +1,70 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-09-04T17:37:56Z" + content=""" +All the fails now look like this: + + mv: cannot move '.git/annex/othertmp/e3d80b70-6bfe-47.0/e3d80b70-6bfe-47' to '.git/annex/export.ex/e3d80b70-6bfe-47a1-830396-0-22d0f933': Device or resource busy + mv: cannot move '.git/annex/othertmp/e3d80b70-6bfe-47.0/e3d80b70-6bfe-47' to '.git/annex/export.ex/e3d80b70-6bfe-47a1-830396-1-22dea399': Device or resource busy + git-annex: renamePath:rename '.git/annex/othertmp/e3d80b70-6bfe-47.0/e3d80b70-6bfe-47' to '.git/annex/export.ex/e3d80b70-6bfe-47a1-8288-cf07f7e8bd7d': resource busy (Device or resource busy) + +This is the same kind of EBUSY problem as on the previous Beegfs bug report. +In that report I hypothesized that Beegfs might not like an open file to be +renamed. I'm not sure if we ever verified my fixes in that one fixed a +problem with Beegfs, but it still seems like a good hypothesis. + +"export.ex/<uuid>" is a log file that git-annex uses to keep track of files +that were part of a tree exported to a special remote, but that were excluded +from the export by its preferred content settings. To populate that file, +git-annex opens a temp file, writes to it as the export runs, then closes it +and renames it. + + openat(AT_FDCWD, ".git/annex/othertmp/cfd9e482-a5cc-42.0/cfd9e482-a5cc-42", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 14 + ... 
+ close(14) = 0 + rename(".git/annex/othertmp/cfd9e482-a5cc-42.0/cfd9e482-a5cc-42", ".git/annex/export.ex/cfd9e482-a5cc-4277-8ec1-954c5e95060f") = 0 + +If the rename() fails, it falls back to trying "mv", which is why +there are also "mv" errors in the transcript above. Anyway, I've verified +the FD is closed before that point. + +But, the "..." includes some fork and exec. And this FD is never set +close-on-exec! And the processes started while it's open include +"git cat-file --batch", which is a long-running process that will +still be left running when the rename happens. + +This was pretty surprising to me, I did not realize git-annex was generally +leaking FDs to child processes in this way. It's easy to demonstrate with +a simpler program: + + joey@darkstar:~>cat >foo.hs <<EOF + import System.Process + import System.IO + + main = do + h <- openFile "foo.x" WriteMode + hPutStrLn h "hello" + callProcess "sh" ["-c", "ls -l /proc/self/fd"] + hClose h + EOF + joey@darkstar:~>runghc foo.hs + total 0 + lrwx------ 1 joey joey 64 Sep 4 14:11 0 -> /dev/pts/8 + lrwx------ 1 joey joey 64 Sep 4 14:11 1 -> /dev/pts/8 + l-wx------ 1 joey joey 64 Sep 4 14:11 11 -> /dev/tty + l-wx------ 1 joey joey 64 Sep 4 14:11 12 -> /home/joey/foo.x + lrwx------ 1 joey joey 64 Sep 4 14:11 2 -> /dev/pts/8 + lr-x------ 1 joey joey 64 Sep 4 14:11 3 -> /proc/516659/fd + +So, really supporting this would mean auditing every file git-annex +opens with openFile to see if the handle is ever passed to a child process, +and otherwise making it use CloseOnExec. Probably openFile is never actually used to +send a handle to a child process, so a version that just sets CloseOnExec could be +written and switched to. + +I don't care a great deal about supporting Beegfs; it would be nice to support +it in some of its less crazy configurations if possible. But not leaking FDs +while running child processes seems like something that ought to be fixed for +other reasons. +"""]
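For comparison with the demo program in comment 5, here is a sketch of the same program with the file opened via `openFd` and the close-on-exec flag set. It assumes unix >= 2.8.0 and the process package; the file mode `0o644` is an arbitrary choice for the example. This is not how git-annex implements it, only an illustration of the approach discussed in the comment.

```haskell
import System.IO
import System.Posix.IO
import System.Process

main :: IO ()
main = do
    -- Open with O_CLOEXEC so the fd is closed when a child process execs.
    fd <- openFd "foo.x" WriteOnly
        (defaultFileFlags { creat = Just 0o644, trunc = True, cloexec = True })
    h <- fdToHandle fd
    hPutStrLn h "hello"
    hFlush h
    -- List the fds that the child process inherited.
    callProcess "sh" ["-c", "ls -l /proc/self/fd"]
    hClose h
```

On Linux, the listing printed by the child would not be expected to include an entry pointing at foo.x, unlike the output shown in the comment.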
comment
diff --git a/doc/bugs/exporttree_exports_plain_git_files/comment_1_c6fdf8e60054367d12aad1d1fb131ce4._comment b/doc/bugs/exporttree_exports_plain_git_files/comment_1_c6fdf8e60054367d12aad1d1fb131ce4._comment new file mode 100644 index 0000000000..d68a25f229 --- /dev/null +++ b/doc/bugs/exporttree_exports_plain_git_files/comment_1_c6fdf8e60054367d12aad1d1fb131ce4._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-04T17:14:38Z" + content=""" +If I recall correctly, export initially only exported annexed files, but +that was very surprising behavior to some people who use a mixture of +files. + +And this is documented behavior: + +> Any files in the treeish that are stored on git will also be exported to +> the special remote. + +The docs also say that the preferred content expression allows excluding +annexed files, not that it does anything for non-annexed files. + +So I don't see how this is a bug. At most, it might be a feature request +to have a way to exclude non-annexed files from the export. + +(I was curious if a preferred content of "not inbackend=GIT" happened to work, +since internally these files are treated as a "GIT" backend type. It happens +not to work as currently implemented. It could be made to work pretty easily, +but "GIT" is currently an undocumented implementation detail and I don't know +if I want to expose it as part of the interface.) + +Also, consider that import works symmetrically to export, and so if such a +feature were implemented, it would make import skip files that annex.largefiles +does not match. +"""]]
comment
diff --git a/doc/install/rpm_standalone/comment_7_9f06e6cafec15957d2561f2b5305a0f2._comment b/doc/install/rpm_standalone/comment_7_9f06e6cafec15957d2561f2b5305a0f2._comment new file mode 100644 index 0000000000..9cea389762 --- /dev/null +++ b/doc/install/rpm_standalone/comment_7_9f06e6cafec15957d2561f2b5305a0f2._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2025-09-04T17:13:14Z" + content=""" +Not sure when that got fixed, probably the most recent release, but it is +populated now. +"""]]
avoid relatedTemplate ever returning ""
add: Fix crash adding filenames that are exactly 21 bytes long and begin
with a utf-8 character.
Also longer filenames that start with "....." would cause the same crash.
I also audited for other calls to truncateFilePath that could truncate it
to "". Most use pathmax so are not a problem. Backend.Utilities.genKeyName
could possibly truncate it like that, but appends the md5 so would not be a
problem either.
Sponsored-by: Kevin Mueller
diff --git a/CHANGELOG b/CHANGELOG index dfd780321a..978e86ec55 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,6 +1,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * drop: --fast support when dropping from a remote. + * add: Fix crash adding filenames that are exactly 21 bytes long and + begin with a utf-8 character. -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/Utility/FileSystemEncoding.hs b/Utility/FileSystemEncoding.hs index d66d8a008c..2fb726f1fc 100644 --- a/Utility/FileSystemEncoding.hs +++ b/Utility/FileSystemEncoding.hs @@ -119,11 +119,14 @@ fromRawFilePath = decodeBS toRawFilePath :: FilePath -> RawFilePath toRawFilePath = encodeBS -{- Truncates a FilePath to the given number of bytes (or less), +{- Truncates a path to the given number of bytes (or less), - as represented on disk. - - Avoids returning an invalid part of a unicode byte sequence, at the - cost of efficiency when running on a large FilePath. + - + - Note that this may return ""! That can happen if it is asked to truncate + - to eg 1 byte, but the input path starts with a unicode byte sequence. -} truncateFilePath :: Int -> RawFilePath -> RawFilePath #ifndef mingw32_HOST_OS diff --git a/Utility/Tmp.hs b/Utility/Tmp.hs index f373ca6c1c..c47cdfcb0b 100644 --- a/Utility/Tmp.hs +++ b/Utility/Tmp.hs @@ -120,8 +120,11 @@ relatedTemplate' f - ending in ".", and others like VFAT don't allow a - filename to end with trailing whitespace, so avoid - truncating a filename to end that way. 
-} - B.dropWhileEnd disallowed $ + let p = B.dropWhileEnd disallowed $ truncateFilePath (len - templateAddedLength) f + in if B.null p + then "t" + else p | otherwise = f where len = B.length f diff --git a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn index 0dce8ad00c..59823cfbbd 100644 --- a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn +++ b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn @@ -93,3 +93,5 @@ Yes, git-annex has been fantastic for managing large datasets across multiple ma --- This issue appears to affect **all Cyrillic filenames**, not just the initially identified patterns, making the current version of git-annex barely usable for repositories containing non-Latin filenames. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names/comment_1_af33dfae3ccbc24f84c84612337b98bc._comment b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names/comment_1_af33dfae3ccbc24f84c84612337b98bc._comment new file mode 100644 index 0000000000..cd36ba1158 --- /dev/null +++ b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names/comment_1_af33dfae3ccbc24f84c84612337b98bc._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-04T16:00:49Z" + content=""" +Reproduced. Thank you for an excellent bug report. + +And it is the temp filename generation causing the problem. + + mkdir(".git/annex/othertmp/.0", 0777) = 0 + unlink(".git/annex/othertmp/.0") = -1 EISDIR (Is a directory) + symlink(".git/annex/objects/k8/wf/SHA256E-s3--98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4.md/SHA256E-s3--98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4.md", ".git/annex/othertmp/.0") = -1 EEXIST (File exists) + +The cause is that relatedTemplate is returning "", which is not something the code +is prepared for. 
That results in the ".0" directory name, and `".0" </> "" == ".0"` +so it uses the same path for the temp file as for the subdirectory. + +Not all Cyrillic names are affected though. Only ones that are exactly 21 +bytes long. Longer or shorter are both ok. + +The reason is that relatedTemplate wants to reserve 20 bytes for the random +part of the temp filename. With a 21 byte filename, that means it wants to +truncate it to 1 byte. But that lands in the middle of the first unicode +character, which is not allowed, so it truncates it to 0 bytes instead. + +I've fixed this bug. +"""]]
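The 21-byte condition explained above can be checked against filenames from the bug report. This is a minimal sketch, assuming the names are UTF-8 encoded (two bytes per Cyrillic letter). `byte_len` is a hypothetical helper for illustration, not part of git-annex.

```sh
#!/bin/sh
# Hypothetical helper for illustration; not part of git-annex.

# Number of bytes in a filename as stored on disk (UTF-8 assumed).
# tr strips the padding that some wc implementations add.
byte_len() {
    printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# Two names reported as failing, then three reported as working.
for f in "пожелания.md" "ЦППП_2022.06.xlsx" \
         "ИА_2222.07.xlsx" "ЦППП_202206.xlsx" "IOIO_2222.07.xlsx"; do
    printf '%s: %s bytes\n' "$f" "$(byte_len "$f")"
done
```

The two failing names come out at exactly 21 bytes, while the working ones are 17, 20 and 17 bytes. With 20 bytes reserved for the random part, a 21-byte name leaves a 1-byte prefix, which is too short to hold a 2-byte Cyrillic character.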
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_4_56455e4a138b68fac3b21cc791ab87e7._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_4_56455e4a138b68fac3b21cc791ab87e7._comment new file mode 100644 index 0000000000..61e9faf841 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_4_56455e4a138b68fac3b21cc791ab87e7._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 4" + date="2025-09-04T02:06:22Z" + content=""" +admins tamed the beast down (disabled some \"optimization\" options which were enabled) and the beast started to behave better -- now just + +``` +yarick@ducky:/data/mri_dicom/tmp/test-git-annex +*$> grep FAIL .duct/logs/2025.09.03T16.49.11-104855_* | nl + 1 .duct/logs/2025.09.03T16.49.11-104855_stdout: git-remote-annex exporttree: FAIL (23.37s) + 2 .duct/logs/2025.09.03T16.49.11-104855_stdout: git-remote-annex exporttree: FAIL (14.14s) + 3 .duct/logs/2025.09.03T16.49.11-104855_stdout: export and import of subdir: FAIL (39.09s) + 4 .duct/logs/2025.09.03T16.49.11-104855_stdout: export and import of subdir: FAIL (22.57s) + 5 .duct/logs/2025.09.03T16.49.11-104855_stdout: export and import: FAIL (11.14s) + 6 .duct/logs/2025.09.03T16.49.11-104855_stdout: export and import: FAIL (28.20s) + 7 .duct/logs/2025.09.03T16.49.11-104855_stdout: export and import: FAIL (17.36s) + 8 .duct/logs/2025.09.03T16.49.11-104855_stdout: export and import of subdir: FAIL (24.52s) +``` + +fails. [Full log link](http://www.oneukrainian.com/tmp/2025.09.03T16.49.11-104855_stdout). That is still with the same 10.20250721-g8867e7590a3a70afa8a93d2fefab94adc9a176d0 +"""]]
diff --git a/doc/forum/How_to_export_subfolders_and_their_contents.mdwn b/doc/forum/How_to_export_subfolders_and_their_contents.mdwn new file mode 100644 index 0000000000..e85cd3df5e --- /dev/null +++ b/doc/forum/How_to_export_subfolders_and_their_contents.mdwn @@ -0,0 +1,11 @@ +Following the process explained in [[special_remotes/adb/]], I tried exporting a subfolder to the sdcard of my phone. + + git annex initremote android type=adb androiddirectory=/sdcard/Music encryption=none exporttree=yes + +The annex repository I am exporting from contains several subfolders with music albums. Using + + git annex export master:album --to=android + +results in the contents of the album folder going to the root Music folder on the remote, instead of in the Music/album folder. + +How do I create a tree that contains the subfolder itself too?
Added a comment: odd odd filesystem
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_3_4b2a63e6bb5bb4e35a2858361ffa917d._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_3_4b2a63e6bb5bb4e35a2858361ffa917d._comment new file mode 100644 index 0000000000..30a8d21122 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_3_4b2a63e6bb5bb4e35a2858361ffa917d._comment @@ -0,0 +1,67 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="odd odd filesystem" + date="2025-09-02T15:06:43Z" + content=""" +eh, it is indeed quite a f...un filesystem: even `chmod` might endup unhappy + +``` +chmod +w -R test-repo +chmod: changing permissions of 'test-repo/.git/annex/objects/91/9x/SHA256E-s4--b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c/SHA256E-s4--b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c': No such file or directory +chmod: changing permissions of 'test-repo/.git/annex/objects/g7/9v/SHA256E-s4--7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730/SHA256 +E-s4--7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730': No such file or directory +``` + +and it might indeed take minute(s) for filesytem to become \"consistent\" with expected state. And it seems that even a minute might not be enough!!! + +``` +ok +(recording state in git...) +(test_script.sh:14): +sleep 1m +(test_script.sh:14): +cat bar +cat: bar: No such file or directory +``` + +<details> +<summary>with this full tune up script</summary> + +``` +#!/bin/bash + +if [ -e test-repo ]; then + chmod +w -R test-repo; + rm -rf test-repo; +fi +mkdir test-repo; cd test-repo; git init; git annex init; + +echo foo > foo +git-annex add foo; cat foo +echo bar > bar +# git-annex add bar; sleep 1m; cat bar +git-annex add bar +start_time=$(date +%s) +while ! cat bar 2>/dev/null; do + sleep 1 + echo -n . 
+done +end_time=$(date +%s) +wait_time=$((end_time - start_time)) +cat bar +echo \"Waited ${wait_time} seconds for bar to appear\" +``` + +</details> + +it `Waited 450 seconds for bar to appear`... and on this funny system bash would report that file is available so symlink would not be broken to those tests: + +``` +$> test -e bar || echo fail +$> cat bar +cat: bar: No such file or directory +``` + +I will report to sysadmins -- may be they have ideas/feedback ;-) +"""]]
diff --git a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn index 58ea00edff..0dce8ad00c 100644 --- a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn +++ b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn @@ -56,6 +56,13 @@ In git-annex version 10.20250721, certain non-Latin filenames, specifically thos * "ВУП Авто .pptx" (Cyrillic with spaces) — **fails** * "Ачох\_кейс.dat" (Cyrillic with underscore and special characters) — **fails** +* ** Even Simple Non-Latin names**: + * пожелания.md — **fails** + * обучение.xlsx — **fails** + * Протокол.xlsx — **fails** + * Согласие.docx — **fails** + * Грейдинг.pptx — **fails** + * **Working Examples**: * "ИА\_2222.07.xlsx" (2-char Cyrillic prefix)
diff --git a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn index 166a72937b..58ea00edff 100644 --- a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn +++ b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn @@ -1,124 +1,88 @@ -# What version of git-annex are you using? On what operating system? -``` - git-annex version: 10.20250721 (broken) - OS: Manjaro Linux, ext4 filesystem - git config: core.quotepath=false -``` -Note: Same files work perfectly in git-annex 10.20220121 (tested on WSL Ubuntu). - -[[!format sh """ -Complete test showing the pattern: - -$ git init && git annex init -init ok -(recording state in git...) -Create test files - working examples: - -$ echo "test" > "ИА_2222.07.xlsx" # 2-char Cyrillic prefix - WORKS -$ echo "test" > "ЦППП_202206.xlsx" # no dot in date - WORKS -$ echo "test" > "ААА_55.22.xlsx" # different date format - WORKS -$ echo "test" > "IOIO_2222.07.xlsx" # Latin letters - WORKS -Create test files - failing examples: - -$ echo "test" > "ЦППП_2022.06.xlsx" # 4-char prefix + YYYY.MM - FAILS -$ echo "test" > "ИАИА_2222.07.xlsx" # 4-char prefix + YYYY.MM - FAILS - -$ git annex add *.xlsx -add ААА_55.22.xlsx ok -add IOIO_2222.07.xlsx ok -add ИА_2222.07.xlsx ok -add ЦППП_202206.xlsx ok -add ЦППП_2022.06.xlsx -git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) -failed -add ИАИА_2222.07.xlsx -git-annex: .git/annex/othertmp/.1: createSymbolicLink: already exists (File exists) -failed -add: 2 failed - -$ git annex status -A ./ААА_55.22.xlsx -A ./IOIO_2222.07.xlsx -A ./ИА_2222.07.xlsx -A ./ЦППП_202206.xlsx -? ./ИАИА_2222.07.xlsx -? ./ЦППП_2022.06.xlsx -Debug output shows escaped Cyrillic conversion: - -$ git annex --debug whereis "ЦППП_2022.06.xlsx" 2>&1 | grep ls-files -[...] git [...] ls-files [...] 
"\1062\1055\1055\1055_2022.06.xlsx" -For files that were added successfully, unlock also fails: - -$ git annex unlock "ЦППП_2022.06.xlsx" # if we force-add it first -mv: cannot overwrite non-directory './ЦП72447-0' with directory '../.git/annex/othertmp/.22' -git-annex: ../.git/annex/othertmp/.22/SHA256E-s...: removeDirectoryRecursive: permission denied (Permission denied) -failed -Workaround - add special character: - -$ mv "ЦППП_2022.06.xlsx" "ЦППП_2022.06—.xlsx" # em-dash -$ git annex add "ЦППП_2022.06—.xlsx" -add ЦППП_2022.06—.xlsx ok -End of transcript. - -"""]] - -Root cause: The temp filename generation algorithm appears to create conflicts when processing escaped Cyrillic sequences (\1062\1055\1055\1055) for filenames with 4+ character prefixes followed by YYYY.MM date patterns. It tries to create temp names like ЦП{PID}-{counter} which conflict with existing operations. - -# Workarounds found: - Shorten Cyrillic prefix to 2-3 characters - Remove dots from dates (ЦППП_202206.xlsx) - Add special characters (ЦППП_2022.06—.xlsx) - Use different date separators (ЦППП_2022-06.xlsx) - -# Have you had any luck using git-annex before? - -Absolutely! git-annex has been fantastic for managing large datasets across multiple machines. The same repository works perfectly with the older version (10.20220121) on Ubuntu WSL, and I've been using git-annex successfully for years. This appears to be a regression in the newer version, but the tool itself remains incredibly valuable for distributed file management. Thanks for all the great work on this project! - -# UPDATE: Problem scope is much wider than initially reported - -After comprehensive testing across a large repository, the issue affects ALL Cyrillic filenames, not just the specific 4-character prefix + YYYY.MM pattern initially reported. 
-Expanded problem scope - -ALL of these Cyrillic filename patterns fail: - -Simple Cyrillic names: -``` - пожелания.md - обучение.xlsx - Протокол.xlsx - Согласие.docx - Грейдинг.pptx -``` - -Names with numbers/dashes: -``` - ДПК_2021.06-2.xlsx - Скрипты_3.xlsx - РТ МВНП v1.docx - РТ МВНП v2.docx -``` -Names with spaces: -``` - ВУП Авто .pptx - Ваш юрист.pdf -``` - -Names with underscores/special chars: -``` - ВУП_видео.mp4 - Ачох_кейс.dat -``` - -Various file extensions affected: -``` - .docx, .pptx, .xlsx (originally reported) - .md, .pdf, .mp4, .dat (newly discovered) -``` - -Originally reported YYYY.MM pattern (confirmed): -``` - ЦППП_2022.01.xlsx, ЦППП_2022.02.xlsx, etc. -``` - -Working pattern: Latin-only filenames work fine. Some non-latin works some not. -This regression can affects ANY non latin filename, making git-annex 10.20250721 essentially barely usable for repositories containing non-latin filenames. +### Please describe the problem. + +In git-annex version 10.20250721, certain non-Latin filenames, specifically those with Cyrillic characters, fail to be added, unlocked, or adjusted in repositories. The issue affects a range of filename patterns, including simple Cyrillic names, names with numbers, dashes, spaces, or special characters, and files with various extensions. This problem appears to be a regression in this version, as the same repository works perfectly with git-annex version 10.20220121. + +### What steps will reproduce the problem? + +1. Create a new git repository and initialize git-annex: + + ```sh + git init + git annex init + ``` + +2. 
Create test files with different Cyrillic filename patterns (both working and failing examples): + + ```sh + echo "test" > "ИА_2222.07.xlsx" # 2-char Cyrillic prefix - WORKS + echo "test" > "ЦППП_202206.xlsx" # no dot in date - WORKS + echo "test" > "ААА_55.22.xlsx" # different date format - WORKS + echo "test" > "ЦППП_2022.06.xlsx" # 4-char prefix + YYYY.MM - FAILS + echo "test" > "ИАИА_2222.07.xlsx" # 4-char prefix + YYYY.MM - FAILS + ``` + +3. Add the files: + + ```sh + git annex add * + ``` + +4. You will see that some files are successfully added, while others fail with the error: + + ``` + git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) failed + ``` + +5. Additionally, in existing repos, attempts to unlock or adjust in failed files will show errors like: + + ```sh + git-annex: ../.git/annex/othertmp/.22/SHA256E-s...: removeDirectoryRecursive: permission denied (Permission denied) failed + ``` + +### What version of git-annex are you using? On what operating system? + +* **git-annex version**: 10.20250721 (broken) +* **OS**: Manjaro Linux (ext4 filesystem) +* **git config**: `core.quotepath=false` +* **Note**: The issue does not occur in git-annex version 10.20220121 (tested on WSL Ubuntu). + +### Please provide any additional information below. + +* **Problematic Filename Examples**: + + * "ЦППП\_2022.06.xlsx" (4-char Cyrillic prefix with YYYY.MM date format) — **fails** + * "ИАИА\_2222.07.xlsx" (4-char Cyrillic prefix with YYYY.MM date format) — **fails** + * "ДПК\_2021.06-2.xlsx" (Cyrillic prefix with number and dash) — **fails** + * "ВУП Авто .pptx" (Cyrillic with spaces) — **fails** + * "Ачох\_кейс.dat" (Cyrillic with underscore and special characters) — **fails** + +* **Working Examples**: + + * "ИА\_2222.07.xlsx" (2-char Cyrillic prefix) + * "ЦППП\_202206.xlsx" (no dot in date) + * "ААА\_55.22.xlsx" (different date format) + * Latin-only filenames such as "IOIO\_2222.07.xlsx" also work fine. 
+ +* **Debug Output** shows escaped Cyrillic sequences: + + ```sh + git annex --debug whereis "ЦППП_2022.06.xlsx" 2>&1 | grep ls-files + git [...] ls-files [...] "\1062\1055\1055\1055_2022.06.xlsx" + ``` (Diff truncated)
diff --git a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn index bc80ad004e..166a72937b 100644 --- a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn +++ b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn @@ -120,9 +120,5 @@ Originally reported YYYY.MM pattern (confirmed): ЦППП_2022.01.xlsx, ЦППП_2022.02.xlsx, etc. ``` -Revised pattern analysis - -Failing pattern: [ANY_CYRILLIC_CHARACTERS].[ANY_EXTENSION] - Working pattern: Latin-only filenames work fine. Some non-latin works some not. -This regression affects ANY non latin filename, making git-annex 10.20250721 essentially barely usable for repositories containing non-latin filenames. +This regression can affects ANY non latin filename, making git-annex 10.20250721 essentially barely usable for repositories containing non-latin filenames.
diff --git a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn index 9118362cb5..bc80ad004e 100644 --- a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn +++ b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn @@ -82,37 +82,47 @@ Expanded problem scope ALL of these Cyrillic filename patterns fail: Simple Cyrillic names: +``` пожелания.md обучение.xlsx Протокол.xlsx Согласие.docx Грейдинг.pptx +``` Names with numbers/dashes: +``` ДПК_2021.06-2.xlsx Скрипты_3.xlsx РТ МВНП v1.docx РТ МВНП v2.docx - +``` Names with spaces: +``` ВУП Авто .pptx Ваш юрист.pdf -Names with underscores/special chars: +``` +Names with underscores/special chars: +``` ВУП_видео.mp4 Ачох_кейс.dat +``` Various file extensions affected: - +``` .docx, .pptx, .xlsx (originally reported) .md, .pdf, .mp4, .dat (newly discovered) +``` Originally reported YYYY.MM pattern (confirmed): +``` ЦППП_2022.01.xlsx, ЦППП_2022.02.xlsx, etc. +``` Revised pattern analysis Failing pattern: [ANY_CYRILLIC_CHARACTERS].[ANY_EXTENSION] -Working pattern: Latin-only filenames work fine +Working pattern: Latin-only filenames work fine. Some non-latin works some not. This regression affects ANY non latin filename, making git-annex 10.20250721 essentially barely usable for repositories containing non-latin filenames.
diff --git a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn index c6f819c371..9118362cb5 100644 --- a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn +++ b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn @@ -73,3 +73,46 @@ Root cause: The temp filename generation algorithm appears to create conflicts w # Have you had any luck using git-annex before? Absolutely! git-annex has been fantastic for managing large datasets across multiple machines. The same repository works perfectly with the older version (10.20220121) on Ubuntu WSL, and I've been using git-annex successfully for years. This appears to be a regression in the newer version, but the tool itself remains incredibly valuable for distributed file management. Thanks for all the great work on this project! + +# UPDATE: Problem scope is much wider than initially reported + +After comprehensive testing across a large repository, the issue affects ALL Cyrillic filenames, not just the specific 4-character prefix + YYYY.MM pattern initially reported. +Expanded problem scope + +ALL of these Cyrillic filename patterns fail: + +Simple Cyrillic names: + пожелания.md + обучение.xlsx + Протокол.xlsx + Согласие.docx + Грейдинг.pptx + +Names with numbers/dashes: + ДПК_2021.06-2.xlsx + Скрипты_3.xlsx + РТ МВНП v1.docx + РТ МВНП v2.docx + +Names with spaces: + ВУП Авто .pptx + Ваш юрист.pdf +Names with underscores/special chars: + + ВУП_видео.mp4 + Ачох_кейс.dat + +Various file extensions affected: + + .docx, .pptx, .xlsx (originally reported) + .md, .pdf, .mp4, .dat (newly discovered) + +Originally reported YYYY.MM pattern (confirmed): + ЦППП_2022.01.xlsx, ЦППП_2022.02.xlsx, etc. 
+ +Revised pattern analysis + +Failing pattern: [ANY_CYRILLIC_CHARACTERS].[ANY_EXTENSION] + +Working pattern: Latin-only filenames work fine +This regression affects ANY non latin filename, making git-annex 10.20250721 essentially barely usable for repositories containing non-latin filenames.
diff --git a/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn new file mode 100644 index 0000000000..c6f819c371 --- /dev/null +++ b/doc/bugs/git-annex_add__47__unlock_fails_for_some_names.mdwn @@ -0,0 +1,75 @@ +# What version of git-annex are you using? On what operating system? +``` + git-annex version: 10.20250721 (broken) + OS: Manjaro Linux, ext4 filesystem + git config: core.quotepath=false +``` +Note: Same files work perfectly in git-annex 10.20220121 (tested on WSL Ubuntu). + +[[!format sh """ +Complete test showing the pattern: + +$ git init && git annex init +init ok +(recording state in git...) +Create test files - working examples: + +$ echo "test" > "ИА_2222.07.xlsx" # 2-char Cyrillic prefix - WORKS +$ echo "test" > "ЦППП_202206.xlsx" # no dot in date - WORKS +$ echo "test" > "ААА_55.22.xlsx" # different date format - WORKS +$ echo "test" > "IOIO_2222.07.xlsx" # Latin letters - WORKS +Create test files - failing examples: + +$ echo "test" > "ЦППП_2022.06.xlsx" # 4-char prefix + YYYY.MM - FAILS +$ echo "test" > "ИАИА_2222.07.xlsx" # 4-char prefix + YYYY.MM - FAILS + +$ git annex add *.xlsx +add ААА_55.22.xlsx ok +add IOIO_2222.07.xlsx ok +add ИА_2222.07.xlsx ok +add ЦППП_202206.xlsx ok +add ЦППП_2022.06.xlsx +git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) +failed +add ИАИА_2222.07.xlsx +git-annex: .git/annex/othertmp/.1: createSymbolicLink: already exists (File exists) +failed +add: 2 failed + +$ git annex status +A ./ААА_55.22.xlsx +A ./IOIO_2222.07.xlsx +A ./ИА_2222.07.xlsx +A ./ЦППП_202206.xlsx +? ./ИАИА_2222.07.xlsx +? ./ЦППП_2022.06.xlsx +Debug output shows escaped Cyrillic conversion: + +$ git annex --debug whereis "ЦППП_2022.06.xlsx" 2>&1 | grep ls-files +[...] git [...] ls-files [...] 
"\1062\1055\1055\1055_2022.06.xlsx" +For files that were added successfully, unlock also fails: + +$ git annex unlock "ЦППП_2022.06.xlsx" # if we force-add it first +mv: cannot overwrite non-directory './ЦП72447-0' with directory '../.git/annex/othertmp/.22' +git-annex: ../.git/annex/othertmp/.22/SHA256E-s...: removeDirectoryRecursive: permission denied (Permission denied) +failed +Workaround - add special character: + +$ mv "ЦППП_2022.06.xlsx" "ЦППП_2022.06—.xlsx" # em-dash +$ git annex add "ЦППП_2022.06—.xlsx" +add ЦППП_2022.06—.xlsx ok +End of transcript. + +"""]] + +Root cause: The temp filename generation algorithm appears to create conflicts when processing escaped Cyrillic sequences (\1062\1055\1055\1055) for filenames with 4+ character prefixes followed by YYYY.MM date patterns. It tries to create temp names like ЦП{PID}-{counter} which conflict with existing operations. + +# Workarounds found: + Shorten Cyrillic prefix to 2-3 characters + Remove dots from dates (ЦППП_202206.xlsx) + Add special characters (ЦППП_2022.06—.xlsx) + Use different date separators (ЦППП_2022-06.xlsx) + +# Have you had any luck using git-annex before? + +Absolutely! git-annex has been fantastic for managing large datasets across multiple machines. The same repository works perfectly with the older version (10.20220121) on Ubuntu WSL, and I've been using git-annex successfully for years. This appears to be a regression in the newer version, but the tool itself remains incredibly valuable for distributed file management. Thanks for all the great work on this project!
drop: --fast support when dropping from a remote
This is the same as --in $remote, but easier to type. And the
documentation of --fast helps also document that drop can do extra work
when used without --fast.
Sponsored-by: Nicholas Golder-Manning
diff --git a/CHANGELOG b/CHANGELOG index 34d9e65c91..dfd780321a 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,9 @@ +git-annex (10.20250829) UNRELEASED; urgency=medium + + * drop: --fast support when dropping from a remote. + + -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 + git-annex (10.20250828) upstream; urgency=medium * p2p: Added --enable option, which can be used to enable P2P networks diff --git a/Command/Drop.hs b/Command/Drop.hs index 94720a6ae4..2f27da92dc 100644 --- a/Command/Drop.hs +++ b/Command/Drop.hs @@ -1,6 +1,6 @@ {- git-annex command - - - Copyright 2010-2021 Joey Hess <id@joeyh.name> + - Copyright 2010-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -109,8 +109,17 @@ startLocal lu pcc afile ai si numcopies mincopies key preverified ud = performLocal lu pcc key afile numcopies mincopies preverified ud startRemote :: LiveUpdate -> PreferredContentChecked -> AssociatedFile -> ActionItem -> SeekInput -> NumCopies -> MinCopies -> Key -> DroppingUnused -> Remote -> CommandStart -startRemote lu pcc afile ai si numcopies mincopies key ud remote = - starting "drop" (OnlyActionOn key ai) si $ do +startRemote lu pcc afile ai si numcopies mincopies key ud remote = do + fast <- Annex.getRead Annex.fast + if fast + then do + remotes <- Remote.keyPossibilities (Remote.IncludeIgnored True) key + if remote `elem` remotes + then go + else stop + else go + where + go = starting "drop" (OnlyActionOn key ai) si $ do showAction $ UnquotedString $ "from " ++ Remote.name remote performRemote lu pcc key afile numcopies mincopies remote ud diff --git a/doc/git-annex-drop.mdwn b/doc/git-annex-drop.mdwn index aa66696958..3d1307700a 100644 --- a/doc/git-annex-drop.mdwn +++ b/doc/git-annex-drop.mdwn @@ -42,6 +42,13 @@ Paths of files or directories to drop can be specified. this option can specify a remote from which the files' contents should be removed. 
+* `--fast` + + When dropping from a remote, avoid doing anything when the remote is not + believed to contain a file. Usually each file is attempted to be dropped + from the remote. This can be faster, but might leave some files on the + remote in some cases. + * `--auto` Rather than trying to drop all specified files, drop only those that diff --git a/doc/todo/drop_--from_unnecessary_locking_of_files_not_in_remote.mdwn b/doc/todo/drop_--from_unnecessary_locking_of_files_not_in_remote.mdwn index ac4b455a75..cedb23572c 100644 --- a/doc/todo/drop_--from_unnecessary_locking_of_files_not_in_remote.mdwn +++ b/doc/todo/drop_--from_unnecessary_locking_of_files_not_in_remote.mdwn @@ -8,11 +8,15 @@ Using `git-annex drop --from foo --in foo` avoided the problem. The reason drop behaves this way is that it's intended to remove content from a remote even when the local repository's location log -is out of sync with it. Still, it's somewhat surprising and annoying that -it can need to do so much extra work. +is out of sync with it. In order to avoid the surprising behavior of +`git-annex drop foo` saying it succeeded, in a case where it turns out the +remote just recently got the file. Still, it's somewhat surprising too, and +annoying, that it can need to do so much extra work. Note that checking if the remote actually has the content would be about as slow as locking files on the other remote(s) (assuming a small numcopies). `--fast` could be made to deal with this, making it check the location log. --[[Joey]] + +> [[done]]
add news item for git-annex 10.20250828
diff --git a/doc/news/version_10.20250416.mdwn b/doc/news/version_10.20250416.mdwn deleted file mode 100644 index 6768a0b469..0000000000 --- a/doc/news/version_10.20250416.mdwn +++ /dev/null @@ -1,19 +0,0 @@ -git-annex 10.20250416 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Added the mask special remote. - * updatecluster, updateproxy: When a remote that has no annex-uuid is - configured as annex-cluster-node, warn and avoid writing bad data to - the git-annex branch. - * Fix build without the assistant. - * fsck: Avoid complaining about required content of dead repositories. - * drop: Avoid redundant object directory thawing. - * httpalso: Windows url fix. - * Added remote.name.annex-web-options config, which is a per-remote - version of the annex.web-options config. - * migrate: Fix --remove-size to work when a file is not present. - Fixes reversion introduced in version 10.20231129. - * Support git remotes that use a IPV6 link-local address with a zone ID. - * Support git remotes that use an url with a user name that is URL - encoded, or in the case of an "scp-style" url, a user name that must be - encoded to be legal in an URL. - * Fix git-lfs special remote ssh endpoint discovery when the repository - path is URL encoded."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250828.mdwn b/doc/news/version_10.20250828.mdwn new file mode 100644 index 0000000000..9dead20771 --- /dev/null +++ b/doc/news/version_10.20250828.mdwn @@ -0,0 +1,38 @@ +git-annex 10.20250828 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * p2p: Added --enable option, which can be used to enable P2P networks + provided by external commands git-annex-p2p-<netname> + * Added git-remote-p2p-annex, which allows git pull and push to + P2P networks provided by commands git-annex-p2p-<netname> + * S3: Default to signature=v4 when using an AWS endpoint, since some + AWS regions need v4 and all support it. 
When host= is used to specify + a different S3 host, the default remains signature=v2. + * webapp: Support setting up S3 buckets in regions that need v4 + signatures. + * S3: When initremote is given the name of a bucket that already exists, + automatically set datacenter to the right value, rather than needing it + to be explicitly set. + * info: Added --show option to pick which parts of the info to calculate + and display. + * Improve behavior when there are special remotes configured with + autoenable=yes with names that conflict with other remotes. + * adjust: When another branch has been manually merged into the adjusted + branch, re-adjusting errors out, rather than losing that merge commit. + * sync: When another branch has been manually merged into an adjusted + branch, error out rather than only displaying a warning. + * initremote: New onlyencryptcreds=yes which can be used along with + embedcreds=yes, to only encrypt the embedded creds, without encrypting + the content of the special remote. Useful for exporttree/importtree + remotes. + * Don't allow the type of encryption of an existing special remote to be + changed. Fixes reversion introduced in version 7.20191230. + * tahoe: Support tahoe-lafs command versions newer than 1.16. + * tahoe: Fix bug that made initremote require an encryption= parameter, + despite git-annex encryption not being used with this special remote. + Fixes reversion introduced in version 7.20191230. + * Improved error message when yt-dlp is not installed and is needed to + get a file from the web. + * The annex.youtube-dl-command git config is no longer used, git-annex + always runs the yt-dlp command, rather than the old youtube-dl command. + * Removed support for git versions older than 2.22. + * Bump aws build dependency to 0.24.1. + * stack.yaml: Update to lts-24.2."""]] \ No newline at end of file
comments
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_1_07b18e92f2abc25c5666f4ca83bbe1cd._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_1_07b18e92f2abc25c5666f4ca83bbe1cd._comment new file mode 100644 index 0000000000..8f27e557d0 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_1_07b18e92f2abc25c5666f4ca83bbe1cd._comment @@ -0,0 +1,41 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-08-29T15:16:00Z" + content=""" +The failures are mostly of two varieties. + +type A: + + add: FAIL (2.20s) + ./Test/Framework.hs:395: + checkcontent foo + expected: "annexed file content" + but got: "could not read file" + +type B: + + init: OK (1.98s) + add: FAIL (5.90s) + ./Test/Framework.hs:92: + unlock failed with unexpected exit code (transcript follows) + unlock sha1foo cp: cannot open '.git/annex/objects/3j/xV/SHA1-s25--ee80d2cec57a3810db83b80e1b320df3a3721ffa/SHA1-s25--ee80d2cec57a3810db83b80e1b320df3a3721ffa' for reading: No such file or directory + +In both cases, a `git-annex add` is succeeding, but the annex objects +directory is somehow not getting populated. Or at least, a subsequent read +of a file in it has the filesystem not knowing the file that the add put +there is there. + +It seems quite likely a lot of other tests would also fail, but they are being +skipped because the add tests fail. 
+ +In one case, the add tests are succeeding (on an adjusted unlocked branch), +but then a subsequent test fails: + + git-remote-annex exporttree: FAIL (8.45s) + ./Test/Framework.hs:92: + export failed with unexpected exit code (transcript follows) + mv: cannot move '.git/annex/othertmp/89ddefa4-a04c-11.0/89ddefa4-a04c-11' to '.git/annex/export.ex/89ddefa4-a04c-11ef-818d8a-1-c6223d6': Device or resource busy + mv: cannot move '.git/annex/othertmp/89ddefa4-a04c-11.0/89ddefa4-a04c-11' to '.git/annex/export.ex/89ddefa4-a04c-11ef-818d8a-2-c718026': Device or resource busy + git-annex: renamePath:rename '.git/annex/othertmp/89ddefa4-a04c-11.0/89ddefa4-a04c-11' to '.git/annex/export.ex/89ddefa4-a04c-11ef-87b5-e880882a4f98': resource busy (Device or resource busy) +"""]] diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_2_0d3117fc63c4d360423fbf834a0e7d8d._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_2_0d3117fc63c4d360423fbf834a0e7d8d._comment new file mode 100644 index 0000000000..cb093848d8 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_2_0d3117fc63c4d360423fbf834a0e7d8d._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-08-29T15:31:42Z" + content=""" +I'm not familiar with beegfs, but its documentation such as this +<https://doc.beegfs.io/latest/architecture/overview.html> makes me wonder if it +manages to behave consistently as we would expect a filesystem to behave. + +In particular, we have a file being moved from one directory to another +directory. Beegfs's docs says it will pick a random metadata node for each +directory. So there can be two metadata nodes that have to be updated for a +move. If one node somehow lags seeing the update, could that result in the file +not appearing as present in the destination directory after the move? 
+ +I'm only speculating about how beegfs might work, but it seems unlikely that +git-annex has a bug that causes it to lose an annexed file when all it's done is +move it to the objects directory, that only manifests on this one filesystem. + +A good next step might be to try manually adding an annexed file and see +if there is some lag between `git-annex add` and being able to read the content +of the symlink. Eg, compare: + + echo foo > foo + git-annex add foo; cat foo + echo bar > bar + git-annex add bar; sleep 1m; cat bar +"""]]
todo
diff --git a/doc/todo/drop_--from_unnecessary_locking_of_files_not_in_remote.mdwn b/doc/todo/drop_--from_unnecessary_locking_of_files_not_in_remote.mdwn new file mode 100644 index 0000000000..ac4b455a75 --- /dev/null +++ b/doc/todo/drop_--from_unnecessary_locking_of_files_not_in_remote.mdwn @@ -0,0 +1,18 @@ +Doing `git-annex drop --from foo`, I noticed it was first locking files on +another remote (bar) before proceeding to do nothing, as it turned out the files +were not on remote foo. Since the bar remote was accessed over a slow +ssh, that took a lot of time. The foo remote had only a few files, but it +would have needed to lock thousands of files. + +Using `git-annex drop --from foo --in foo` avoided the problem. + +The reason drop behaves this way is that it's intended to +remove content from a remote even when the local repository's location log +is out of sync with it. Still, it's somewhat surprising and annoying that +it can need to do so much extra work. + +Note that checking if the remote actually has the content would be about as +slow as locking files on the other remote(s) (assuming a small numcopies). + +`--fast` could be made to deal with this, making it check the location log. +--[[Joey]]
initial report from ducky
diff --git a/doc/bugs/35_failed_tests_on_beegfs.mdwn b/doc/bugs/35_failed_tests_on_beegfs.mdwn new file mode 100644 index 0000000000..4a443e258a --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs.mdwn @@ -0,0 +1,81 @@ +### Please describe the problem. + +links: [prior report/fix of testing on beegfs 4 years ago; different site/version](https://git-annex.branchable.com/projects/dandi/bugs-done/beegfs__58___init_tests_FAIL_resource_busy/) + +Currently I observed 35 tests failing + +``` +yarick@ducky:/data/mri_dicom/tmp/test-git-annex +*$> grep FAIL .duct/logs/2025.08.28T21.04.10-56898_stdout | nl | tail + 26 add: FAIL (2.30s) + 27 add: FAIL (2.34s) + 28 add: FAIL (2.80s) + 29 add: FAIL (2.21s) + 30 add: FAIL (1.98s) + 31 add: FAIL (3.16s) + 32 add: FAIL (4.27s) + 33 git-remote-annex exporttree: FAIL (8.45s) + 34 export and import: FAIL (10.40s) + 35 export and import of subdir: FAIL (15.99s) +``` + +full log: [http://www.oneukrainian.com/tmp/2025.08.28T21.04.10-56898_stdout](http://www.oneukrainian.com/tmp/2025.08.28T21.04.10-56898_stdout) + +some info: + +``` +$> modinfo beegfs +filename: /lib/modules/5.15.0-122-generic/updates/fs/beegfs_autobuild/beegfs.ko +version: 7.4.6 +alias: fs-beegfs +author: ThinkParQ GmbH +description: BeeGFS parallel file system client (https://www.beegfs.io) +license: GPL v2 +srcversion: 9F666198EABF0EB756ED3AC +depends: ib_core,rdma_cm +retpoline: Y +name: beegfs +vermagic: 5.15.0-122-generic SMP mod_unload modversions + +$> mount | grep data/mri_dicom +beegfs_nodev on /data/mri_dicom type beegfs (rw,nodev,relatime,cfgFile=/etc/beegfs/beegfs-client.conf) + + +``` + +### What steps will reproduce the problem? + +Run tests on beegfs? + +### What version of git-annex are you using? On what operating system? 
+ +``` +*$> git annex version +git-annex version: 10.20250721-g8867e7590a3a70afa8a93d2fefab94adc9a176d0 +build flags: Assistant Webapp Pairing Inotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.1 http-client-0.7.19 persistent +-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E S +HA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2 +B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S +224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hoo +k external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +``` + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +
remove para about conflicts
That was only ever relevant for the v1 upgrade!
diff --git a/doc/upgrades.mdwn b/doc/upgrades.mdwn index 7d48547d82..01e0054dfa 100644 --- a/doc/upgrades.mdwn +++ b/doc/upgrades.mdwn @@ -38,11 +38,6 @@ refuse to do anything. To upgrade, use the "git annex upgrade" command. To prevent automatic upgrades in a repository, run: `git config annex.autoupgraderepository false` -The upgrade process is guaranteed to be conflict-free. Unless you -already have git conflicts in your repository or between repositories. -Upgrading a repository with conflicts is not recommended; resolve the -conflicts first before upgrading git-annex. - The upgrade process needs to write to the repository. If the original repository cannot be written to (due to eg being on readonly media), the upgrade would need to be run in a copy of the repository.
fix test suite breakage
640bc43c38e37f0acbc5d83d072af82e4e8cc5fa broke a test. Change that test
to not use encryption=shared, which required some refactoring.
Sponsored-by: Joshua Antonishen
diff --git a/Test.hs b/Test.hs index bf09dd7d55..62cb88c10f 100644 --- a/Test.hs +++ b/Test.hs @@ -90,7 +90,6 @@ import qualified Utility.MoveFile import qualified Utility.StatelessOpenPGP import qualified Types.Remote #ifndef mingw32_HOST_OS -import qualified Utility.OsString as OS import qualified Remote.Helper.Encryptable import qualified Types.Crypto import qualified Utility.Gpg @@ -1917,64 +1916,44 @@ test_gpg_crypto = do testscheme "hybrid" testscheme "pubkey" where - gpgcmd = Utility.Gpg.mkGpgCmd Nothing - testscheme scheme = Utility.Tmp.Dir.withTmpDir (literalOsPath "gpgtmp") $ \gpgtmp -> do - -- Use the system temp directory as gpg temp directory because - -- it needs to be able to store the agent socket there, - -- which can be problematic when testing some filesystems. - absgpgtmp <- absPath gpgtmp - res <- testscheme' scheme absgpgtmp - -- gpg may still be running and would prevent - -- removeDirectoryRecursive from succeeding, so - -- force removal of the temp directory. - liftIO $ removeDirectoryForCleanup (fromOsPath gpgtmp) - return res - testscheme' scheme absgpgtmp = intmpclonerepo $ do - -- Since gpg uses a unix socket, which is limited to a - -- short path, use whichever is shorter of absolute - -- or relative path. 
- relgpgtmp <- relPathCwdToFile absgpgtmp - let gpgtmp = if OS.length relgpgtmp < OS.length absgpgtmp - then relgpgtmp - else absgpgtmp - void $ Utility.Gpg.testHarness (fromOsPath gpgtmp) gpgcmd $ \environ -> do - createDirectory (literalOsPath "dir") - let initps = - [ "foo" - , "type=directory" - , "encryption=" ++ scheme - , "directory=dir" - , "highRandomQuality=false" - ] ++ if scheme `elem` ["hybrid","pubkey"] - then ["keyid=" ++ Utility.Gpg.testKeyId] - else [] - git_annex' "initremote" initps (Just environ) "initremote" - git_annex_shouldfail' "initremote" initps (Just environ) "initremote should not work when run twice in a row" - git_annex' "enableremote" initps (Just environ) "enableremote" - git_annex' "enableremote" initps (Just environ) "enableremote when run twice in a row" - git_annex' "get" [annexedfile] (Just environ) "get of file" - annexed_present annexedfile - git_annex' "copy" [annexedfile, "--to", "foo"] (Just environ) "copy --to encrypted remote" - (c,k) <- annexeval $ do - uuid <- Remote.nameToUUID "foo" - rs <- Logs.Remote.readRemoteLog - Just k <- Annex.WorkTree.lookupKey (toOsPath annexedfile) - return (fromJust $ M.lookup uuid rs, k) - let key = if scheme `elem` ["hybrid","pubkey"] - then Just $ Utility.Gpg.KeyIds [Utility.Gpg.testKeyId] - else Nothing - testEncryptedRemote environ scheme key c [k] @? 
"invalid crypto setup" + testscheme scheme = intmpclonerepo $ test_with_gpg $ \gpgcmd environ -> do + createDirectory (literalOsPath "dir") + let initps = + [ "foo" + , "type=directory" + , "encryption=" ++ scheme + , "directory=dir" + , "highRandomQuality=false" + ] ++ if scheme `elem` ["hybrid","pubkey"] + then ["keyid=" ++ Utility.Gpg.testKeyId] + else [] + git_annex' "initremote" initps (Just environ) "initremote" + git_annex_shouldfail' "initremote" initps (Just environ) "initremote should not work when run twice in a row" + git_annex' "enableremote" initps (Just environ) "enableremote" + git_annex' "enableremote" initps (Just environ) "enableremote when run twice in a row" + git_annex' "get" [annexedfile] (Just environ) "get of file" + annexed_present annexedfile + git_annex' "copy" [annexedfile, "--to", "foo"] (Just environ) "copy --to encrypted remote" + (c,k) <- annexeval $ do + uuid <- Remote.nameToUUID "foo" + rs <- Logs.Remote.readRemoteLog + Just k <- Annex.WorkTree.lookupKey (toOsPath annexedfile) + return (fromJust $ M.lookup uuid rs, k) + let key = if scheme `elem` ["hybrid","pubkey"] + then Just $ Utility.Gpg.KeyIds [Utility.Gpg.testKeyId] + else Nothing + testEncryptedRemote gpgcmd environ scheme key c [k] @? 
"invalid crypto setup" - annexed_present annexedfile - git_annex' "drop" [annexedfile, "--numcopies=2"] (Just environ) "drop" - annexed_notpresent annexedfile - git_annex' "move" [annexedfile, "--from", "foo"] (Just environ) "move --from encrypted remote" - annexed_present annexedfile - git_annex_shouldfail' "drop" [annexedfile, "--numcopies=2"] (Just environ) "drop should not be allowed with numcopies=2" - annexed_present annexedfile + annexed_present annexedfile + git_annex' "drop" [annexedfile, "--numcopies=2"] (Just environ) "drop" + annexed_notpresent annexedfile + git_annex' "move" [annexedfile, "--from", "foo"] (Just environ) "move --from encrypted remote" + annexed_present annexedfile + git_annex_shouldfail' "drop" [annexedfile, "--numcopies=2"] (Just environ) "drop should not be allowed with numcopies=2" + annexed_present annexedfile {- Ensure the configuration complies with the encryption scheme, and - that all keys are encrypted properly for the given directory remote. -} - testEncryptedRemote environ scheme ks c keys = case Remote.Helper.Encryptable.extractCipher pc of + testEncryptedRemote gpgcmd environ scheme ks c keys = case Remote.Helper.Encryptable.extractCipher pc of Just cip@Crypto.SharedCipher{} | scheme == "shared" && isNothing ks -> checkKeys cip Nothing Just cip@(Crypto.EncryptedCipher encipher v ks') @@ -2210,9 +2189,26 @@ test_enableremote_encryption_changes = intmpclonerepo $ do "enableremote disabling encryption" git_annex_shouldfail "enableremote" ["bar", "onlyencryptcreds=yes", dirparam] "enableremote with onlyencryptcreds" - git_annex "initremote" ["baz", "type=directory", "encryption=shared", "onlyencryptcreds=yes", dirparam] - "initremote" - git_annex_shouldfail "enableremote" ["baz", "onlyencryptcreds=no", dirparam] - "enableremote disabling onlyencryptcreds" - git_annex "enableremote" ["baz", "onlyencryptcreds=yes", dirparam] - "enableremote enabling already enabled onlyencryptcreds" + git_annex_shouldfail "initremote" ["baz", 
"type=directory", "encryption=shared", "onlyencryptcreds=yes", dirparam] + "initremote with onlyencryptcreds not allowed with shared encryption" + git_annex_shouldfail "initremote" ["baz", "type=directory", "encryption=none", "onlyencryptcreds=yes", dirparam] + "initremote with onlyencryptcreds not allowed with no encryption" +#ifndef mingw32_HOST_OS + test_with_gpg $ \_gpgcmd environ -> do + git_annex' "initremote" + ["baz" + , "type=directory" + , "encryption=hybrid" + , "onlyencryptcreds=yes" + , "highRandomQuality=false" + , "keyid=" ++ Utility.Gpg.testKeyId + , dirparam] + (Just environ) + "initremote with onlyencryptcreds and hybrid encryption" + git_annex_shouldfail' "enableremote" ["baz", "onlyencryptcreds=no", dirparam] + (Just environ) + "enableremote disabling onlyencryptcreds" + git_annex' "enableremote" ["baz", "onlyencryptcreds=yes", dirparam] + (Just environ) + "enableremote enabling already enabled onlyencryptcreds" +#endif diff --git a/Test/Framework.hs b/Test/Framework.hs index 09ffe0a26d..3d0a96fa2f 100644 --- a/Test/Framework.hs +++ b/Test/Framework.hs @@ -1,11 +1,11 @@ {- git-annex test suite framework - - - Copyright 2010-2023 Joey Hess <id@joeyh.name> + - Copyright 2010-2024 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} -{-# LANGUAGE OverloadedStrings #-} +{-# LANGUAGE OverloadedStrings, CPP #-} module Test.Framework where @@ -67,6 +67,9 @@ import qualified Utility.Metered import qualified Utility.HumanTime import qualified Command.Uninit import qualified Utility.OsString as OS +#ifndef mingw32_HOST_OS +import qualified Utility.Gpg +#endif -- Run a process. The output and stderr is captured, and is only -- displayed if the process does not return the expected value. 
@@ -517,6 +520,33 @@ add_annex f faildesc = ifM (unlockedFiles <$> getTestMode) , git_annex "add" [f] faildesc ) +#ifndef mingw32_HOST_OS +test_with_gpg :: (Utility.Gpg.GpgCmd -> [(String, String)] -> Assertion) -> Assertion +test_with_gpg a = Utility.Tmp.Dir.withTmpDir (literalOsPath "gpgtmp") $ \gpgtmp -> do + -- Use the system temp directory as gpg temp directory because + -- it needs to be able to store the agent socket there, + -- which can be problematic when testing some filesystems. + absgpgtmp <- absPath gpgtmp + res <- go absgpgtmp + -- gpg may still be running and would prevent + -- removeDirectoryRecursive from succeeding, so + -- force removal of the temp directory. + liftIO $ removeDirectoryForCleanup (fromOsPath gpgtmp) + return res + where + gpgcmd = Utility.Gpg.mkGpgCmd Nothing + go absgpgtmp = do + -- Since gpg uses a unix socket, which is limited to a + -- short path, use whichever is shorter of absolute + -- or relative path. + relgpgtmp <- relPathCwdToFile absgpgtmp + let gpgtmp = if OS.length relgpgtmp < OS.length absgpgtmp + then relgpgtmp + else absgpgtmp (Diff truncated)
improve docs of annex.youtube-dl-options
The options are used whenever yt-dlp is run, not only when finding the
url to download.
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index db8a88fa3b..6b668f69b5 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -2110,8 +2110,8 @@ Remotes are configured using these settings in `.git/config`. * `annex.youtube-dl-options` - Options to pass to yt-dlp when using it to find the url to download - for a video. + Options to pass to yt-dlp. For example if youtube is requiring cookies + to download from it, use something like "--cookies-from-browser firefox" Some options may break git-annex's integration with yt-dlp. For example, the --output option could cause it to store files somewhere
remove youtube-dl support, always use yt-dlp
The annex.youtube-dl-command git config is no longer used, git-annex always
runs the yt-dlp command, rather than the old youtube-dl command.
Sponsored-by: Leon Schuermann
diff --git a/Annex/YoutubeDl.hs b/Annex/YoutubeDl.hs index 9015928078..dbfa315738 100644 --- a/Annex/YoutubeDl.hs +++ b/Annex/YoutubeDl.hs @@ -1,6 +1,6 @@ -{- yt-dlp (and deprecated youtube-dl) integration for git-annex +{- yt-dlp integration for git-annex - - - Copyright 2017-2024 Joey Hess <id@joeyh.name> + - Copyright 2017-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -41,7 +41,7 @@ import qualified Data.Aeson as Aeson import GHC.Generics import qualified Data.ByteString.Char8 as B8 --- youtube-dl can follow redirects to anywhere, including potentially +-- yt-dlp can follow redirects to anywhere, including potentially -- localhost or a private address. So, it's only allowed to download -- content if the user has allowed access to all addresses. youtubeDlAllowed :: Annex Bool @@ -52,25 +52,21 @@ youtubeDlNotAllowedMessage = unwords [ "This url is supported by yt-dlp, but" , "yt-dlp could potentially access any address, and the" , "configuration of annex.security.allowed-ip-addresses" - , "does not allow that. Not using yt-dlp (or youtube-dl)." + , "does not allow that. Not using yt-dlp." ] --- Runs youtube-dl in a work directory, to download a single media file +-- Runs yt-dlp in a work directory, to download a single media file -- from the url. Returns the path to the media file in the work directory. -- --- Displays a progress meter as youtube-dl downloads. +-- Displays a progress meter as yt-dlp downloads. -- -- If no file is downloaded, returns Right Nothing. -- --- youtube-dl can write to multiple files, either temporary files, or +-- yt-dlp can write to multiple files, either temporary files, or -- multiple videos found at the url, and git-annex needs only one file. -- So we need to find the destination file, and make sure there is not -- more than one. With yt-dlp use --print-to-file to make it record the --- file(s) it downloads. 
With youtube-dl, the best that can be done is --- to require that the work directory end up with only 1 file in it. --- (This can fail, but youtube-dl is deprecated, and they closed my --- issue requesting something like --print-to-file; --- <https://github.com/rg3/youtube-dl/issues/14864>) +-- file(s) it downloads. youtubeDl :: URLString -> OsPath -> MeterUpdate -> Annex (Either String (Maybe OsPath)) youtubeDl url workdir p = ifM ipAddressesUnlimited ( withUrlOptions Nothing $ youtubeDl' url workdir p @@ -79,52 +75,48 @@ youtubeDl url workdir p = ifM ipAddressesUnlimited youtubeDl' :: URLString -> OsPath -> MeterUpdate -> UrlOptions -> Annex (Either String (Maybe OsPath)) youtubeDl' url workdir p uo - | supportedScheme uo url = do - cmd <- youtubeDlCommand - ifM (liftIO $ inSearchPath cmd) - ( runcmd cmd >>= \case - Right True -> downloadedfiles cmd >>= \case + | supportedScheme uo url = + ifM (liftIO $ inSearchPath youtubeDlCommand) + ( runcmd >>= \case + Right True -> downloadedfiles >>= \case (f:[]) -> return $ Right (Just (toOsPath f)) - [] -> return (nofiles cmd) - fs -> return (toomanyfiles cmd fs) + [] -> return nofiles + fs -> return (toomanyfiles fs) Right False -> workdirfiles >>= \case [] -> return (Right Nothing) - _ -> return (Left $ cmd ++ " download is incomplete. Run the command again to resume.") + _ -> return (Left $ youtubeDlCommand ++ " download is incomplete. Run the command again to resume.") Left msg -> return (Left msg) - , return (Left $ cmd ++ " is not installed.") + , return (Left $ youtubeDlCommand ++ " is not installed.") ) | otherwise = return (Right Nothing) where - nofiles cmd = Left $ cmd ++ " did not put any media in its work directory, perhaps it's been configured to store files somewhere else?" - toomanyfiles cmd fs = Left $ cmd ++ " downloaded multiple media files; git-annex is only able to deal with one per url: " ++ show fs - downloadedfiles cmd - | isytdlp cmd = liftIO $ - (nub . 
lines <$> readFile (fromOsPath filelistfile)) - `catchIO` (pure . const []) - | otherwise = map fromOsPath <$> workdirfiles + nofiles = Left $ youtubeDlCommand ++ " did not put any media in its work directory, perhaps it's been configured to store files somewhere else?" + toomanyfiles fs = Left $ youtubeDlCommand ++ " downloaded multiple media files; git-annex is only able to deal with one per url: " ++ show fs + downloadedfiles = liftIO $ + (nub . lines <$> readFile (fromOsPath filelistfile)) + `catchIO` (pure . const []) workdirfiles = liftIO $ filter (/= filelistfile) <$> (filterM doesFileExist =<< dirContents workdir) filelistfile = workdir </> filelistfilebase filelistfilebase = literalOsPath "git-annex-file-list-file" - isytdlp cmd = cmd == "yt-dlp" - runcmd cmd = youtubeDlMaxSize workdir >>= \case + runcmd = youtubeDlMaxSize workdir >>= \case Left msg -> return (Left msg) Right maxsize -> do - opts <- youtubeDlOpts (dlopts cmd ++ maxsize) + opts <- youtubeDlOpts (dlopts ++ maxsize) oh <- mkOutputHandlerQuiet - -- The size is unknown to start. Once youtube-dl + -- The size is unknown to start. Once yt-dlp -- outputs some progress, the meter will be updated -- with the size, which is why it's important the -- meter is passed into commandMeter' let unknownsize = Nothing :: Maybe FileSize ok <- metered (Just p) unknownsize Nothing $ \meter meterupdate -> liftIO $ commandMeter' - (if isytdlp cmd then parseYtdlpProgress else parseYoutubeDlProgress) - oh (Just meter) meterupdate cmd opts + parseYtdlpProgress + oh (Just meter) meterupdate youtubeDlCommand opts (\pr -> pr { cwd = Just (fromOsPath workdir) }) return (Right ok) - dlopts cmd = + dlopts = [ Param url -- To make it only download one file when given a -- page with a video and a playlist, download only the video. @@ -134,22 +126,17 @@ youtubeDl' url workdir p uo -- somewhat stable, but this is the only way to prevent -- it from downloading the whole playlist.) 
, Param "--playlist-items", Param "0" - ] ++ - if isytdlp cmd - then - -- Avoid warnings, which go to - -- stderr and may mess up - -- git-annex's display. - [ Param "--no-warnings" - , Param "--progress-template" - , Param progressTemplate - , Param "--print-to-file" - , Param "after_move:filepath" - , Param (fromOsPath filelistfilebase) - ] - else [] + -- Avoid warnings, which go to stderr and may + -- mess up git-annex's display. + , Param "--no-warnings" + , Param "--progress-template" + , Param progressTemplate + , Param "--print-to-file" + , Param "after_move:filepath" + , Param (fromOsPath filelistfilebase) + ] --- To honor annex.diskreserve, ask youtube-dl to not download too +-- To honor annex.diskreserve, ask yt-dlp to not download too -- large a media file. Factors in other downloads that are in progress, -- and any files in the workdir that it may have partially downloaded -- before. @@ -188,22 +175,22 @@ youtubeDlTo key url dest p = do return Nothing return (fromMaybe False res) --- youtube-dl supports downloading urls that are not html pages, +-- yt-dlp supports downloading urls that are not html pages, -- but we don't want to use it for such urls, since they can be downloaded -- without it. So, this first downloads part of the content and checks --- if it's a html page; only then is youtube-dl used. +-- if it's a html page; only then is yt-dlp used. htmlOnly :: URLString -> a -> Annex a -> Annex a htmlOnly url fallback a = withUrlOptions Nothing $ \uo -> liftIO (downloadPartial url uo htmlPrefixLength) >>= \case Just bs | isHtmlBs bs -> a _ -> return fallback --- Check if youtube-dl supports downloading content from an url. +-- Check if yt-dlp supports downloading content from an url. youtubeDlSupported :: URLString -> Annex Bool youtubeDlSupported url = either (const False) id <$> withUrlOptions Nothing (youtubeDlCheck' url) --- Check if youtube-dl can find media in an url. +-- Check if yt-dlp can find media in an url. 
-- -- While this does not download anything, it checks youtubeDlAllowed -- for symmetry with youtubeDl; the check should not succeed if the @@ -218,11 +205,10 @@ youtubeDlCheck' :: URLString -> UrlOptions -> Annex (Either String Bool) youtubeDlCheck' url uo | supportedScheme uo url = catchMsgIO $ htmlOnly url False $ do opts <- youtubeDlOpts [ Param url, Param "--simulate" ] - cmd <- youtubeDlCommand - liftIO $ snd <$> processTranscript cmd (toCommand opts) Nothing + liftIO $ snd <$> processTranscript youtubeDlCommand (toCommand opts) Nothing | otherwise = return (Right False) --- Ask youtube-dl for the filename of media in an url. +-- Ask yt-dlp for the filename of media in an url. -- -- (This is not always identical to the filename it uses when downloading.) youtubeDlFileName :: URLString -> Annex (Either String OsPath) @@ -245,10 +231,11 @@ youtubeDlFileNameHtmlOnly' url uo (Diff truncated)
diff --git a/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__.mdwn b/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__.mdwn new file mode 100644 index 0000000000..5774f189d0 --- /dev/null +++ b/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__.mdwn @@ -0,0 +1,8 @@ +Hi, + +Not sure why but current git annex downloads aren't available on [kitenet](https://downloads.kitenet.net/git-annex/linux/current/): + + +(only `.sig` and `.info` are present, `.tar.gz` are missing) + +Is it a temporary issue or something else?
issue resolved
diff --git a/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell.mdwn b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell.mdwn index 612be4c47a..069956349b 100644 --- a/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell.mdwn +++ b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell.mdwn @@ -175,3 +175,6 @@ git-annex version: 10.20250416 [[!meta author=yoh]] [[!tag projects/dandi]] +> [[fixed|done]] --[[yoh]] + +It was an issue with jumbo MTU on the server side!
report on failing test
diff --git a/doc/bugs/tests_fail__58___There_is_no_security_benefit_.mdwn b/doc/bugs/tests_fail__58___There_is_no_security_benefit_.mdwn new file mode 100644 index 0000000000..144a476b1c --- /dev/null +++ b/doc/bugs/tests_fail__58___There_is_no_security_benefit_.mdwn @@ -0,0 +1,52 @@ +### Please describe the problem. + +I have fixed up build env for https://github.com/datalad/git-annex/ -- now just a pure Debian stable (no neurodebian), [Dockerfile](https://github.com/datalad/git-annex/blob/master/.github/workflows/tools/containers/buildenv-git-annex/Dockerfile) + + +but then [testing of the build](https://github.com/datalad/git-annex/actions/runs/17227039565/job/48874015218) fails with + +``` +Tests + Repo Tests v10 locked + Init Tests + init: OK (0.21s) + add: OK (0.61s) + enableremote encryption changes: FAIL (0.63s) + ./Test/Framework.hs:92: + initremote failed with unexpected exit code (transcript follows) + initremote baz + git-annex: There is no security benefit to using onlyencryptcreds=yes with encryption=shared + failed + initremote: 1 failed + + Use -p '/enableremote encryption changes/' to rerun this test only. + borg remote: OK + uninit: OK (0.57s) + conflict resolution (mixed directory and file): OK (3.57s) + concurrent get of dup key regression: OK (0.69s) + migrate (via gitattributes): OK (1.87s) + fsck (basics): OK (0.82s) + copy: OK (1.39s) + drop (no remote): OK (0.59s) + shared clone: OK (0.18s) + add moved link: OK (0.56s) + +1 out of 13 tests failed (11.68s) +``` + + + +### What version of git-annex are you using? On what operating system? 
+ + +``` +git-annex version: 10.20250721+git92-g481504db36-1~ndall+1 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.6 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +``` +
Added a comment: Wrong place to post
diff --git a/doc/forum/annex_import_doesn__39__t_delete_files_during_updates/comment_1_ca15eb2b3ed57236b611db6126482736._comment b/doc/forum/annex_import_doesn__39__t_delete_files_during_updates/comment_1_ca15eb2b3ed57236b611db6126482736._comment new file mode 100644 index 0000000000..c897ab43ff --- /dev/null +++ b/doc/forum/annex_import_doesn__39__t_delete_files_during_updates/comment_1_ca15eb2b3ed57236b611db6126482736._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="tbabej" + avatar="http://cdn.libravatar.org/avatar/06a742f97224b0fcb2df4a74e6615899" + subject="Wrong place to post" + date="2025-08-25T22:00:48Z" + content=""" +Just noticed there is also a subsection for the bugs, my apologies for reporting this at the wrong place! +"""]]
diff --git a/doc/forum/annex_import_doesn__39__t_delete_files_during_updates.mdwn b/doc/forum/annex_import_doesn__39__t_delete_files_during_updates.mdwn new file mode 100644 index 0000000000..508acc612e --- /dev/null +++ b/doc/forum/annex_import_doesn__39__t_delete_files_during_updates.mdwn @@ -0,0 +1,477 @@ +## git annex import does not delete files that have not been imported before, even if they were exported + +It looks like an import after each export is required in order to keep proper track of files, at least in the case of special remote `type=directory`. + +A full reproducer is below, but the abbreviated version is the following: + +1. Create a file, add it to the annex +2. Export the file to the directory special remote +3. Remove the file from the directory using `rm` +4. Import the changes from the directory +5. The deletion of the file is never detected and the file stays hanging locally indefinitely + +### Full reproducer + +First, we initialize an empty git-annex repo and a directory that will serve as the special remote: + +``` +$ mkdir git-annex +$ cd git-annex/ +$ mkdir repository directory +$ ls +directory repository +$ cd repository^C +$ cd repository/ +$ ls +$ git init +hint: Using 'master' as the name for the initial branch. This default branch name +hint: is subject to change. To configure the initial branch name to use in all +hint: of your new repositories, which will suppress this warning, call: +hint: +hint: git config --global init.defaultBranch <name> +hint: +hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and +hint: 'development'. The just-created branch can be renamed via this command: +hint: +hint: git branch -m <name> +hint: +hint: Disable this message with "git config set advice.defaultBranchName false" +Initialized empty Git repository in /tmp/git-annex/repository/.git/ +$ git annex init +init ok +(recording state in git...) +``` + +We add initial content. 
Note that removing a file that was added in the initial export is correctly detected on subsequent import: + +``` +$ echo one > one; sleep 1; echo two > two +$ git annex add * +add one +ok +add two +ok +(recording state in git...) +$ hh +git commit -m "Initial content" +$ git commit -m "Initial content" +[master (root-commit) e3cfdea] Initial content + 2 files changed, 2 insertions(+) + create mode 120000 one + create mode 120000 two +$ git annex initremote homeserver type=directory directory=/tmp/git-annex/directory exporttree=yes import +tree=yes encryption=none +initremote homeserver ok +(recording state in git...) +$ git annex export master --to homeserver +export homeserver one ok +export homeserver two ok +(recording state in git...) +$ rm ../directory/two +$ git annex -d import master --from homeserver -m "Deleted two, this works" +[2025-08-25 17:42:17.549242346] (Utility.Process) process [1346542] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","git-annex"] +[2025-08-25 17:42:17.551512992] (Utility.Process) process [1346542] done ExitSuccess +[2025-08-25 17:42:17.551977656] (Utility.Process) process [1346543] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","--hash","refs/heads/git-annex"] +[2025-08-25 17:42:17.554079617] (Utility.Process) process [1346543] done ExitSuccess +[2025-08-25 17:42:17.554859544] (Utility.Process) process [1346544] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +log","refs/heads/git-annex..673de1e45f8751c3ac0066b4c827e3a046051c4f","--pretty=%H","-n1"] +[2025-08-25 17:42:17.557950381] (Utility.Process) process [1346544] done ExitSuccess +[2025-08-25 17:42:17.559947735] (Utility.Process) process [1346545] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +cat-file","--batch"] +[2025-08-25 17:42:17.563916375] (Utility.Process) process 
[1346546] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","--hash","refs/remotes/homeserver/master"] +[2025-08-25 17:42:17.566239833] (Utility.Process) process [1346546] done ExitSuccess +list homeserver ok +[2025-08-25 17:42:17.57127565] (Utility.Process) process [1346548] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","m +ktree","--missing","--batch","-z"] +[2025-08-25 17:42:17.574520441] (Utility.Process) process [1346548] done ExitSuccess +[2025-08-25 17:42:17.577872901] (Utility.Process) process [1346549] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +rev-parse","--verify","--quiet","refs/heads/git-annex:"] +[2025-08-25 17:42:17.580406091] (Utility.Process) process [1346549] done ExitSuccess +[2025-08-25 17:42:17.581051093] (Utility.Process) process [1346550] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +diff-tree","-z","--raw","--no-renames","-l0","-r","9ab60e3cc17c42b13c16a28ae4c59f3716502756","fcf471e16d0105550406a29e13f9d385447cbd31","--"] +[2025-08-25 17:42:17.584953927] (Utility.Process) process [1346550] done ExitSuccess +[2025-08-25 17:42:17.585114599] (Database.Handle) commitDb start +[2025-08-25 17:42:17.585975051] (Database.Handle) commitDb done +update refs/remotes/homeserver/master [2025-08-25 17:42:17.588322083] (Utility.Process) process [1346551] read: git ["--git-dir=.git","--work-tree=.","--litera +l-pathspecs","-c","annex.debug=true","rev-parse","--verify","--quiet","e3cfdead06d04f4df78a01a309a1831af4961858:"] +[2025-08-25 17:42:17.590642005] (Utility.Process) process [1346551] done ExitSuccess +[2025-08-25 17:42:17.59120988] (Utility.Process) process [1346552] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","m +ktree","--missing","--batch","-z"] +[2025-08-25 17:42:17.591563939] (Messages.explain) [ one does not 
match annex.addunlocked: nothing[FALSE] ] + +[2025-08-25 17:42:17.592509593] (Utility.Process) process [1346553] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +hash-object","-w","--no-filters","--stdin-paths"] +[2025-08-25 17:42:17.59575411] (Utility.Process) process [1346552] done ExitSuccess +[2025-08-25 17:42:17.596285972] (Utility.Process) process [1346554] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +log","e3cfdead06d04f4df78a01a309a1831af4961858","--full-history","--no-abbrev","--format=%T %H %P"] +[2025-08-25 17:42:17.598755905] (Utility.Process) process [1346554] done ExitSuccess +[2025-08-25 17:42:17.600719334] (Utility.Process) process [1346555] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","a2c2f21679a7c0864f7ca486c5d39998abb1f33f","--no-gpg-sign","-m","Deleted two, this works"] +[2025-08-25 17:42:17.603747862] (Utility.Process) process [1346555] done ExitSuccess +[2025-08-25 17:42:17.604331351] (Utility.Process) process [1346556] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","a2c2f21679a7c0864f7ca486c5d39998abb1f33f","--no-gpg-sign","-p","e3cfdead06d04f4df78a01a309a1831af4961858","-p","9344862d2a1be0162dfc63c2dcc5c1dd7 +c7406a3","-m","remote tracking branch"] +[2025-08-25 17:42:17.607310455] (Utility.Process) process [1346556] done ExitSuccess +[2025-08-25 17:42:17.609644622] (Utility.Process) process [1346557] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +diff-tree","-z","--raw","--no-renames","-l0","-r","6987476211346060b533f824472f34bd92602ccd","a2c2f21679a7c0864f7ca486c5d39998abb +1f33f","--"] +[2025-08-25 17:42:17.612718435] (Utility.Process) process [1346558] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +cat-file","--batch-check=%(objectname) 
%(objecttype) %(objectsize)"] +[2025-08-25 17:42:17.615267805] (Utility.Process) process [1346559] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +cat-file","--batch"] +[2025-08-25 17:42:17.617452533] (Utility.Process) process [1346557] done ExitSuccess +[2025-08-25 17:42:17.6175722] (Database.Handle) commitDb start +[2025-08-25 17:42:17.618291237] (Database.Handle) commitDb done +[2025-08-25 17:42:17.619671807] (Utility.Process) process [1346560] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","--hash","refs/heads/git-annex"] +[2025-08-25 17:42:17.622156138] (Utility.Process) process [1346560] done ExitSuccess +[2025-08-25 17:42:17.622664547] (Utility.Process) process [1346561] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +rev-parse","--verify","--quiet","673de1e45f8751c3ac0066b4c827e3a046051c4f:"] +[2025-08-25 17:42:17.624898473] (Utility.Process) process [1346561] done ExitSuccess +[2025-08-25 17:42:17.625451325] (Utility.Process) process [1346562] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +mktree","--missing","--batch","-z"] +[2025-08-25 17:42:17.625953724] (Utility.Process) process [1346563] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +ls-tree","--full-tree","-z","-t","--","fcf471e16d0105550406a29e13f9d385447cbd31"] +[2025-08-25 17:42:17.628045705] (Utility.Process) process [1346563] done ExitSuccess +[2025-08-25 17:42:17.629385044] (Utility.Process) process [1346562] done ExitSuccess +[2025-08-25 17:42:17.629801349] (Utility.Process) process [1346564] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","becfcef63bb6e2521c4a66e41bdd1f867eda5d62","--no-gpg-sign","-p","673de1e45f8751c3ac0066b4c827e3a046051c4f","-m","graft"] +[2025-08-25 17:42:17.632644385] 
(Utility.Process) process [1346564] done ExitSuccess +[2025-08-25 17:42:17.633131506] (Utility.Process) process [1346565] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","fcf471e16d0105550406a29e13f9d385447cbd31","--no-gpg-sign","-p","d8a2fbef419e45b925d61363326a062bd31911ae","-m","graft cleanup"] +[2025-08-25 17:42:17.635853854] (Utility.Process) process [1346565] done ExitSuccess +[2025-08-25 17:42:17.63629057] (Utility.Process) process [1346566] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","u +pdate-ref","refs/heads/git-annex","7cf4ac7abbfd7e4e8be1a9af024ffafd230f7d71"] +[2025-08-25 17:42:17.63888163] (Utility.Process) process [1346566] done ExitSuccess +[2025-08-25 17:42:17.639467114] (Annex.Branch) read export.log +[2025-08-25 17:42:17.640240661] (Annex.Branch) set export.log +[2025-08-25 17:42:17.640651989] (Utility.Process) process [1346567] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +diff-tree","-z","--raw","--no-renames","-l0","-r","6987476211346060b533f824472f34bd92602ccd","a2c2f21679a7c0864f7ca486c5d39998abb1f33f","--"] +[2025-08-25 17:42:17.643916998] (Annex.Branch) read 5a2/b05/SHA256E-s4--27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a.log +[2025-08-25 17:42:17.644707405] (Annex.Branch) set 5a2/b05/SHA256E-s4--27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a.log +[2025-08-25 17:42:17.644810572] (Utility.Process) process [1346567] done ExitSuccess +[2025-08-25 17:42:17.645419226] (Utility.Process) process [1346568] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +update-ref","refs/remotes/homeserver/master","26a7015330e6151489ca0589b49196e39412c3b7"] +[2025-08-25 17:42:17.648061306] (Utility.Process) process [1346568] done ExitSuccess +ok +[2025-08-25 17:42:17.649157468] (Utility.Process) process [1346569] feed: git 
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +update-index","-z","--index-info"] +[2025-08-25 17:42:17.651518366] (Utility.Process) process [1346569] done ExitSuccess +[2025-08-25 17:42:17.652234078] (Utility.Process) process [1346570] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","--hash","refs/heads/git-annex"] +[2025-08-25 17:42:17.654342885] (Utility.Process) process [1346570] done ExitSuccess +(recording state in git...) +[2025-08-25 17:42:17.655106125] (Utility.Process) process [1346571] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +write-tree"] +[2025-08-25 17:42:17.658098143] (Utility.Process) process [1346571] done ExitSuccess +[2025-08-25 17:42:17.658856276] (Utility.Process) process [1346572] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","4f8232e80d243df0cbd416864d9edc84ec29f5a8","--no-gpg-sign","-p","refs/heads/git-annex","-m","update"] +[2025-08-25 17:42:17.661406143] (Utility.Process) process [1346572] done ExitSuccess +[2025-08-25 17:42:17.662077275] (Utility.Process) process [1346573] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +update-ref","refs/heads/git-annex","61a5a7339eca80f146446f670aa12dc743cd3c0a"] +[2025-08-25 17:42:17.664587914] (Utility.Process) process [1346573] done ExitSuccess +[2025-08-25 17:42:17.66604483] (Utility.Process) process [1346559] done ExitSuccess +[2025-08-25 17:42:17.666703705] (Utility.Process) process [1346545] done ExitSuccess +[2025-08-25 17:42:17.667383399] (Utility.Process) process [1346558] done ExitSuccess +[2025-08-25 17:42:17.667864184] (Utility.Process) process [1346553] done ExitSuccess + +$ git annex merge homeserver/master +merge homeserver/master +Updating e3cfdea..26a7015 +Fast-forward + two | 1 - + 1 file changed, 1 deletion(-) + delete mode 120000 two +ok 
+``` + +However, adding a new file, exporting it, and immediately deleting it before ever importing from the remote leads to the file hanging locally: + +``` +$ echo three > three (Diff truncated)
Added a comment
diff --git a/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_3_b112c2224fd19e9888e62a5223928a10._comment b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_3_b112c2224fd19e9888e62a5223928a10._comment new file mode 100644 index 0000000000..14c8306c29 --- /dev/null +++ b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_3_b112c2224fd19e9888e62a5223928a10._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 3" + date="2025-08-25T21:30:55Z" + content=""" +FWIW: this might not be git-annex specific, since I now also observe stalls with `git push`, which gets stuck on `Writing objects: 96% (1855/1932)`. Using a newer git (2.45.1) from within a singularity env on the client didn't help, so it must be related to the connection or to the target git (running `git receive-pack`)... heh - the costs of upgrades! ;-) +"""]]
Added a comment
diff --git a/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_2_156f9bb49e4c93a6d8f7bed03054574c._comment b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_2_156f9bb49e4c93a6d8f7bed03054574c._comment new file mode 100644 index 0000000000..095172a399 --- /dev/null +++ b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_2_156f9bb49e4c93a6d8f7bed03054574c._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 2" + date="2025-08-25T16:56:00Z" + content=""" +Similarly, if I try to `get` onto the server from what was originally the client, I see a similar stall at + +``` +[2025-08-25 12:54:34.537964095] (Utility.Process) process [875925] done ExitSuccess +[2025-08-25 12:54:34.538354083] (Utility.Process) process [875929] read: git [\"--git-dir=/home/yoh/proj/dandi/zarr-manifests/.git\",\"--work-tree=/home/yoh/proj/dandi/zarr-manifests\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/annex/last-index\"] +[2025-08-25 12:54:34.541389023] (Utility.Process) process [875929] done ExitSuccess +[2025-08-25 12:54:34.542713902] (P2P.IO) [ThreadId 4] P2P > DATA 24576 +[2025-08-25 12:54:34.512358033] (P2P.IO) [git-annex-shell connection Just 551102] [ThreadId 4] P2P < DATA 24576 + +``` +"""]]
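When comparing stalls like the ones above, it can help to look only at the P2P protocol lines of a `--debug` transcript and see which message was exchanged last. The following is a minimal sketch, not something git-annex provides: `last_p2p_line` is a hypothetical helper name, and it assumes only that protocol messages carry the `(P2P.IO)` tag seen in the logs quoted here.

```shell
# Hypothetical helper (not part of git-annex): read a `git annex --debug`
# transcript on stdin and print the last P2P protocol line it contains.
last_p2p_line() {
    # -F matches the literal "(P2P.IO)" tag that marks protocol messages in
    # the quoted logs; tail keeps only the final match, in log order.
    grep -F '(P2P.IO)' | tail -n 1
}
```

On a saved transcript this would be run as `last_p2p_line < debug.log`, where `debug.log` is a placeholder file name. The helper deliberately uses log order rather than sorting by timestamp, because in the transcripts above the timestamps from the two ends of the connection are interleaved out of order.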
Added a comment
diff --git a/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_1_07e9419ca2b18ec2dfc4fa2cecaadf6b._comment b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_1_07e9419ca2b18ec2dfc4fa2cecaadf6b._comment new file mode 100644 index 0000000000..6785292d4f --- /dev/null +++ b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell/comment_1_07e9419ca2b18ec2dfc4fa2cecaadf6b._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 1" + date="2025-08-25T16:55:33Z" + content=""" +Similarly, if I try to `get` onto the server from what was originally the client, I see a similar stall at + +``` +[2025-08-25 12:54:34.537964095] (Utility.Process) process [875925] done ExitSuccess +[2025-08-25 12:54:34.538354083] (Utility.Process) process [875929] read: git [\"--git-dir=/home/yoh/proj/dandi/zarr-manifests/.git\",\"--work-tree=/home/yoh/proj/dandi/zarr-manifests\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/annex/last-index\"] +[2025-08-25 12:54:34.541389023] (Utility.Process) process [875929] done ExitSuccess +[2025-08-25 12:54:34.542713902] (P2P.IO) [ThreadId 4] P2P > DATA 24576 +[2025-08-25 12:54:34.512358033] (P2P.IO) [git-annex-shell connection Just 551102] [ThreadId 4] P2P < DATA 24576 + +``` +"""]]
initial report on copy to be stuck
diff --git a/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell.mdwn b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell.mdwn new file mode 100644 index 0000000000..612be4c47a --- /dev/null +++ b/doc/bugs/copy_stalls_while_through___126____47__.ssh__47__git-annex-shell.mdwn @@ -0,0 +1,177 @@ +### Please describe the problem. + +I think it was working and likely on the server (receiver) end upgrade changed the situation. + +I have on receiver (server) + +``` +$> cat ~/.ssh/git-annex-shell +#!/bin/sh +set -e +exec git-annex-shell -c "$SSH_ORIGINAL_COMMAND" + +$> grep git-annex-shell ~/.ssh/authorized_keys| sed -e 's, AAAA.*, SENSORED,g' +command="~/.ssh/git-annex-shell",no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-ed25519 SENSORED + +$> git-annex version +git-annex version: 10.20250416 +... +``` + + +And when I try to copy to it, it goes like + +```shell +yoh@typhon:~/proj/dandi/zarr-manifests$ git annex --debug copy --to falkor zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json +[2025-08-25 12:36:01.478893077] (Utility.Process) process [848930] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"] +[2025-08-25 12:36:01.482045477] (Utility.Process) process [848930] done ExitSuccess +[2025-08-25 12:36:01.482373437] (Utility.Process) process [848931] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"] +[2025-08-25 12:36:01.48528802] (Utility.Process) process [848931] done ExitSuccess +[2025-08-25 12:36:01.486194791] (Utility.Process) process [848932] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"] +[2025-08-25 12:36:01.499961366] (Utility.Process) process [848932] done ExitSuccess +[2025-08-25 12:36:01.502100538] 
(Utility.Process) process [848941] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json"] +[2025-08-25 12:36:01.502508594] (Utility.Process) process [848942] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2025-08-25 12:36:01.502927219] (Utility.Process) process [848943] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2025-08-25 12:36:01.503216764] (Utility.Process) process [848944] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] +copy zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json +[2025-08-25 12:36:01.51405112] (Utility.Process) process [848944] done ExitSuccess +[2025-08-25 12:36:01.514176808] (Utility.Process) process [848943] done ExitSuccess +[2025-08-25 12:36:01.514249119] (Utility.Process) process [848942] done ExitSuccess +[2025-08-25 12:36:01.514296537] (Utility.Process) process [848941] done ExitSuccess +[2025-08-25 12:36:01.51498527] (Utility.Process) process [848948] read: ssh ["-O","stop","-S","falkor","-o","ControlMaster=auto","-o","ControlPersist=yes","localhost"] in ".git/annex/ssh/" +[2025-08-25 12:36:01.520461531] (Utility.Process) process [848948] done ExitSuccess +[2025-08-25 12:36:01.5209307] (Utility.Process) process [848949] read: ssh ["-o","BatchMode=true","-S",".git/annex/ssh/falkor","-o","ControlMaster=auto","-o","ControlPersist=yes","-n","-T","falkor","true"] +[2025-08-25 12:36:01.821918754] 
(Utility.Process) process [848949] done ExitFailure 1 +[2025-08-25 12:36:01.82311226] (Utility.Process) process [848953] chat: ssh ["falkor","-S",".git/annex/ssh/falkor","-o","ControlMaster=auto","-o","ControlPersist=yes","-T","git-annex-shell 'p2pstdio' '/srv/datasets.datalad.org/www/dandi/zarr-manifests' '--debug' 'a5788908-4791-4455-834f-5eca0617a767' --uuid 1bdcbd2c-f8d0-496f-bbc1-22a8fea21c34"] +[2025-08-25 12:36:01.849110609] (P2P.IO) [ThreadId 4] P2P > AUTH-SUCCESS 1bdcbd2c-f8d0-496f-bbc1-22a8fea21c34 +[2025-08-25 12:36:01.880523039] (P2P.IO) [git-annex-shell connection Just 848953] [ThreadId 19] P2P < AUTH-SUCCESS 1bdcbd2c-f8d0-496f-bbc1-22a8fea21c34 +[2025-08-25 12:36:01.880770574] (P2P.IO) [git-annex-shell connection Just 848953] [ThreadId 19] P2P > VERSION 4 +[2025-08-25 12:36:01.850796001] (P2P.IO) [ThreadId 4] P2P < VERSION 4 +[2025-08-25 12:36:01.850927368] (P2P.IO) [ThreadId 4] P2P > VERSION 4 +[2025-08-25 12:36:01.881853888] (P2P.IO) [git-annex-shell connection Just 848953] [ThreadId 19] P2P < VERSION 4 +[2025-08-25 12:36:01.882131352] (P2P.IO) [git-annex-shell connection Just 848953] [ThreadId 19] P2P > CHECKPRESENT SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +[2025-08-25 12:36:01.852139336] (P2P.IO) [ThreadId 4] P2P < CHECKPRESENT SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +[2025-08-25 12:36:01.85256798] (P2P.IO) [ThreadId 4] P2P > FAILURE +[2025-08-25 12:36:01.883482037] (P2P.IO) [git-annex-shell connection Just 848953] [ThreadId 19] P2P < FAILURE +[2025-08-25 12:36:01.88500046] (P2P.IO) [git-annex-shell connection Just 848953] [ThreadId 19] P2P > PUT zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +[2025-08-25 12:36:01.855064436] (P2P.IO) [ThreadId 4] P2P < PUT 
zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +[2025-08-25 12:36:01.855407835] (P2P.IO) [ThreadId 4] P2P > PUT-FROM 0 (to falkor...) +[2025-08-25 12:36:01.886302441] (P2P.IO) [git-annex-shell connection Just 848953] [ThreadId 19] P2P < PUT-FROM 0 +[2025-08-25 12:36:01.88807408] (Utility.Process) process [848956] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"] +[2025-08-25 12:36:01.90475146] (Utility.Process) process [848956] done ExitSuccess +[2025-08-25 12:36:01.905302296] (Utility.Process) process [848957] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/annex/last-index"] +[2025-08-25 12:36:01.909254395] (Utility.Process) process [848957] done ExitSuccess +[2025-08-25 12:36:01.90983123] (P2P.IO) [git-annex-shell connection Just 848953] [ThreadId 19] P2P > DATA 8174019 +0% 31.98 KiB 1 MiB/s 6s +[2025-08-25 12:36:01.879890247] (P2P.IO) [ThreadId 4] P2P < DATA 8174019 +[2025-08-25 12:36:01.88445232] (Annex.Perms) thawing content /srv/datasets.datalad.org/www/dandi/zarr-manifests/.git/annex/tmp/SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +``` + +and stalls! 
eventually would die with smth like + +``` +Lost connection (fd:54: hPutBuf: resource vanished (Broken pipe)) +Transfer failed +``` + +removing `command` from the `~/.ssh/authorized_keys` reveals: + +``` +[2025-08-25 12:41:20.349423229] (P2P.IO) [git-annex-shell connection Just 856869] [ThreadId 19] P2P > DATA 8174019 +0% 31.98 KiB 2 MiB/s 4s +[2025-08-25 12:41:20.319241906] (P2P.IO) [ThreadId 4] P2P < DATA 8174019 + transfer already in progress, or unable to take transfer lock +git-annex: transfer already in progress, or unable to take transfer lock +p2pstdio: 1 failed + Lost connection (fd:32: hPutBuf: resource vanished (Broken pipe))done ExitFailure 1 + + Transfer failed +[2025-08-25 12:41:20.367078689] (Utility.Process) process [856874] chat: ssh ["falkor","-S",".git/annex/ssh/falkor","-o","ControlMaster=auto","-o","ControlPersist=yes","-T","git-annex-shell 'p2pstdio' '/srv/datasets.datalad.org/www/dandi/zarr-manifests' '--debug' 'a5788908-4791-4455-834f-5eca0617a767' --uuid 1bdcbd2c-f8d0-496f-bbc1-22a8fea21c34"] +``` + +killing all the already running git-annex-shell on server, and restarting leads to the similar stall + +``` +yoh@typhon:~/proj/dandi/zarr-manifests$ git annex --debug copy --to falkor zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json +[2025-08-25 12:46:05.59881201] (Utility.Process) process [863109] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"] +[2025-08-25 12:46:05.601854318] (Utility.Process) process [863109] done ExitSuccess +[2025-08-25 12:46:05.602156266] (Utility.Process) process [863110] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"] +[2025-08-25 12:46:05.604225437] (Utility.Process) process [863110] done ExitSuccess +[2025-08-25 12:46:05.605009464] (Utility.Process) process [863111] chat: git 
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"] +[2025-08-25 12:46:05.618808813] (Utility.Process) process [863111] done ExitSuccess +[2025-08-25 12:46:05.620762155] (Utility.Process) process [863120] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json"] +[2025-08-25 12:46:05.621103669] (Utility.Process) process [863121] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2025-08-25 12:46:05.621369628] (Utility.Process) process [863122] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2025-08-25 12:46:05.621925833] (Utility.Process) process [863123] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2025-08-25 12:46:05.631302214] (Utility.Process) process [863123] done ExitSuccess +[2025-08-25 12:46:05.631359154] (Utility.Process) process [863122] done ExitSuccess +[2025-08-25 12:46:05.631398574] (Utility.Process) process [863121] done ExitSuccess +[2025-08-25 12:46:05.6314255] (Utility.Process) process [863120] done ExitSuccess +copy zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json +[2025-08-25 12:46:05.632555017] (Utility.Process) process [863127] read: ssh ["-O","stop","-S","falkor","-o","ControlMaster=auto","-o","ControlPersist=yes","localhost"] in ".git/annex/ssh/" +[2025-08-25 12:46:05.640975166] (Utility.Process) process [863127] done ExitSuccess +[2025-08-25 
12:46:05.641706877] (Utility.Process) process [863128] read: ssh ["-o","BatchMode=true","-S",".git/annex/ssh/falkor","-o","ControlMaster=auto","-o","ControlPersist=yes","-n","-T","falkor","true"] +[2025-08-25 12:46:05.9354612] (Utility.Process) process [863128] done ExitSuccess +[2025-08-25 12:46:05.936449504] (Utility.Process) process [863132] chat: ssh ["falkor","-S",".git/annex/ssh/falkor","-o","ControlMaster=auto","-o","ControlPersist=yes","-T","git-annex-shell 'p2pstdio' '/srv/datasets.datalad.org/www/dandi/zarr-manifests' '--debug' 'a5788908-4791-4455-834f-5eca0617a767' --uuid 1bdcbd2c-f8d0-496f-bbc1-22a8fea21c34"] +[2025-08-25 12:46:05.92909402] (P2P.IO) [ThreadId 4] P2P > AUTH-SUCCESS 1bdcbd2c-f8d0-496f-bbc1-22a8fea21c34 +[2025-08-25 12:46:05.960402757] (P2P.IO) [git-annex-shell connection Just 863132] [ThreadId 19] P2P < AUTH-SUCCESS 1bdcbd2c-f8d0-496f-bbc1-22a8fea21c34 +[2025-08-25 12:46:05.960640816] (P2P.IO) [git-annex-shell connection Just 863132] [ThreadId 19] P2P > VERSION 4 +[2025-08-25 12:46:05.930712551] (P2P.IO) [ThreadId 4] P2P < VERSION 4 +[2025-08-25 12:46:05.93086669] (P2P.IO) [ThreadId 4] P2P > VERSION 4 +[2025-08-25 12:46:05.961883105] (P2P.IO) [git-annex-shell connection Just 863132] [ThreadId 19] P2P < VERSION 4 +[2025-08-25 12:46:05.962098313] (P2P.IO) [git-annex-shell connection Just 863132] [ThreadId 19] P2P > CHECKPRESENT SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +[2025-08-25 12:46:05.932143998] (P2P.IO) [ThreadId 4] P2P < CHECKPRESENT SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +[2025-08-25 12:46:05.932621567] (P2P.IO) [ThreadId 4] P2P > FAILURE +[2025-08-25 12:46:05.963717617] (P2P.IO) [git-annex-shell connection Just 863132] [ThreadId 19] P2P < FAILURE (to falkor...) 
+[2025-08-25 12:46:05.965729174] (P2P.IO) [git-annex-shell connection Just 863132] [ThreadId 19] P2P > PUT zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +[2025-08-25 12:46:05.935798091] (P2P.IO) [ThreadId 4] P2P < PUT zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +[2025-08-25 12:46:05.936090726] (P2P.IO) [ThreadId 4] P2P > PUT-FROM 0 +[2025-08-25 12:46:05.96690746] (P2P.IO) [git-annex-shell connection Just 863132] [ThreadId 19] P2P < PUT-FROM 0 +[2025-08-25 12:46:05.968560686] (Utility.Process) process [863135] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"] +[2025-08-25 12:46:05.98374097] (Utility.Process) process [863135] done ExitSuccess +[2025-08-25 12:46:05.984179335] (Utility.Process) process [863136] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/annex/last-index"] +[2025-08-25 12:46:05.988648755] (Utility.Process) process [863136] done ExitSuccess +[2025-08-25 12:46:05.989018015] (P2P.IO) [git-annex-shell connection Just 863132] [ThreadId 19] P2P > DATA 8174019 +0% 31.98 KiB 1 MiB/s 5s +[2025-08-25 12:46:05.958862911] (P2P.IO) [ThreadId 4] P2P < DATA 8174019 +[2025-08-25 12:46:05.960530955] (Annex.Perms) thawing content /srv/datasets.datalad.org/www/dandi/zarr-manifests/.git/annex/tmp/SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json + +``` + +and eventually clearing screen and showing + +``` +copy 
zarr-manifests-v2-sorted/d09/7af/d097af6b-8fd8-4d83-b649-fc6518e95d25/f56bc8e1854433d03cc766d2955823f6-65602--139477698137.json (to falkor...) +0% 31.98 KiB 1 MiB/s 5s +``` + +with the process on the server + +``` +$> ps auxw | grep git-annex +yoh 549790 0.3 0.0 1074142528 49344 ? Ssl 12:46 0:00 git-annex-shell p2pstdio /srv/datasets.datalad.org/www/dandi/zarr-manifests --debug a5788908-4791-4455-834f-5eca0617a767 --uuid 1bdcbd2c-f8d0-496f-bbc1-22a8fea21c34 + + +$> ls -l /proc/549790/fd/ | grep annex/ +lrwx------ 1 yoh yoh 64 Aug 25 12:47 11 -> /srv/datasets.datalad.org/www/dandi/zarr-manifests/.git/annex/transfer/download/lck/lck.SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +lrwx------ 1 yoh yoh 64 Aug 25 12:47 12 -> /srv/datasets.datalad.org/www/dandi/zarr-manifests/.git/annex/transfer/download/a5788908-4791-4455-834f-5eca0617a767/lck.SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +lrwx------ 1 yoh yoh 64 Aug 25 12:47 13 -> /srv/datasets.datalad.org/www/dandi/zarr-manifests/.git/annex/tmp/SHA256E-s8174019--5e92a791cf536c40c7b783d1cabd4ce484f909400d917439bf3a8b67e8c2f97f.json +1 9886 [1].....................................:Mon 25 Aug 2025 12:47:39 PM EDT:. +falkor:~ +$> ls -Ll /proc/549790/fd/{11,12,13} +-rw-rw-r-- 1 yoh datalad 0 Aug 25 12:36 /proc/549790/fd/11 +-rw-rw-r-- 1 yoh datalad 0 Aug 25 12:36 /proc/549790/fd/12 +-rw-rw-r-- 1 yoh datalad 0 Aug 22 11:31 /proc/549790/fd/13 + +``` + +### What version of git-annex are you using? On what operating system? + +``` +$ git annex version +git-annex version: 10.20250416 +``` + +[[!meta author=yoh]] +[[!tag projects/dandi]] +
Added a comment: Feedback on encryptonlycreds=yes
diff --git a/doc/todo/encrypt_only_the_credentials/comment_7_8a0265614583aa8a9f040b5fed0ff480._comment b/doc/todo/encrypt_only_the_credentials/comment_7_8a0265614583aa8a9f040b5fed0ff480._comment new file mode 100644 index 0000000000..9bf6c78ee5 --- /dev/null +++ b/doc/todo/encrypt_only_the_credentials/comment_7_8a0265614583aa8a9f040b5fed0ff480._comment @@ -0,0 +1,35 @@ +[[!comment format=mdwn + username="stv0g" + avatar="http://cdn.libravatar.org/avatar/6faa6cc783a165b25fc1c8f3154ba449" + subject="Feedback on encryptonlycreds=yes" + date="2025-08-24T11:20:23Z" + content=""" +Hi Joey, + +thank you very mich for the quick implementation of this feature! I have tested it already successfully :) + +There seems to be an minor issue when I am trying to set `encryptonlycreds=yes` via `SETCONFIG`, followed by a subsequent `SETCREDS`: + +``` +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> VERSION 2 +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- EXTENSIONS INFO GETGITREMOTENAME UNAVAILABLERESPONSE ASYNC +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> EXTENSIONS INFO +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- LISTCONFIGS +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> CONFIGEND +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- INITREMOTE +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> GETCONFIG encryption +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- VALUE hybrid +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> GETCONFIG onlyencryptcreds +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- VALUE +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> SETCONFIG onlyencryptcreds yes +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> SETCREDS key aes256 70482ccb1e0a0b8f8bf6f2603174d2d6f40d940033285d3ae94e7258595dca26 +git-annex: getRemoteConfigValue onlyencryptcreds found value of unexpected type PassedThrough. This is a bug in git-annex! 
+CallStack (from HasCallStack): + error, called at ./Annex/SpecialRemote/Config.hs:209:28 in main:Annex.SpecialRemote.Config + getRemoteConfigValue, called at ./Remote/Helper/Encryptable.hs:297:27 in main:Remote.Helper.Encryptable +failed +initremote: 1 failed +``` + +I am not sure if this is even supposed to be supported? Let me know if I am using it in the wrong way :) +"""]]
Added a comment: Download with the git-annex-install script fail
diff --git a/doc/Android/comment_8_7ba624f545e991bf11d5fcf8378d3ada._comment b/doc/Android/comment_8_7ba624f545e991bf11d5fcf8378d3ada._comment new file mode 100644 index 0000000000..17cd028ebd --- /dev/null +++ b/doc/Android/comment_8_7ba624f545e991bf11d5fcf8378d3ada._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="waldi1985@90ec1a42d21b8182257f7a16ba8b8b08c51a0b6e" + nickname="waldi1985" + avatar="http://cdn.libravatar.org/avatar/128c0f882560337aad72a15ab7ee3766" + subject="Download with the git-annex-install script fail" + date="2025-08-24T11:10:50Z" + content=""" + Downloading git-annex... + --2025-08-24 12:49:08-- https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-arm64-ancient.tar.gz + Resolving downloads.kitenet.net (downloads.kitenet.net)... 66.228.36.95, 2600:3c03::f03c:91ff:fe73:b0d2 + Connecting to downloads.kitenet.net (downloads.kitenet.net)|66.228.36.95|:443... connected. + HTTP request sent, awaiting response... 403 Forbidden + 2025-08-24 12:49:09 ERROR 403: Forbidden. + +The Download with the git-annex-install script fail, I guess because the build is failed. see: https://downloads.kitenet.net/git-annex/autobuild/arm64-ancient/ + + enableremote encryption changes: FAIL (0.73s) + ./Test/Framework.hs:92: + initremote failed with unexpected exit code (transcript follows) + initremote baz + git-annex: There is no security benefit to using onlyencryptcreds=yes with encryption=shared + failed + initremote: 1 failed + +It would be grate if you could fix that. Thanks in advance. +"""]]
Added a comment
diff --git a/doc/bugs/Re-Adjust_Loses_Commits/comment_4_6366ab14803dd8303752807741baec2c._comment b/doc/bugs/Re-Adjust_Loses_Commits/comment_4_6366ab14803dd8303752807741baec2c._comment new file mode 100644 index 0000000000..623afe7056 --- /dev/null +++ b/doc/bugs/Re-Adjust_Loses_Commits/comment_4_6366ab14803dd8303752807741baec2c._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Lukey" + avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b" + subject="comment 4" + date="2025-08-23T06:51:15Z" + content=""" +You can use `git reflog` to see previous states of a branch. +"""]]
fix specialRemote confusion with tahoe
tahoe: Fix bug that made initremote require an encryption= parameter,
despite git-annex encryption not being used with this special remote,
since tahoe handles encryption itself.
The chunking parameters were also accepted, although they were never
actually used. They are no longer accepted either.
The bug was introduced in commit c4ea3ca40ae6ba973287ca94e892e93973a8376e.
At that point specialRemote was being added to most remotes, and I forgot
that tahoe doesn't need these parameters.
It turns out that using embedcreds=yes did *not* cause the
introducer-furl and shared-convergence-secret to be encrypted, even
though encryption= was specified. That is not a security hole only
because encryption= was never documented to work with the tahoe special
remote at all!
It might be nice to support onlyencryptcreds=yes with tahoe, and it
would make sense to accept the encryption= parameter then, and only use
it for encrypting the creds. That would take some work, since the
encryption= parameter would need to be optional, and the usual encrypted
special remote code couldn't be used.
Sponsored-by: unqueued
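
The effect described in the commit message above can be sketched with a toy model. This is only an illustration, not git-annex's actual code: the `RemoteType` record here is a simplified stand-in with made-up fields (`requiredParams`, `optionalParams`), `specialRemoteType` is modelled as nothing more than a function that adds the `encryption` and `chunk` parameters, and `missingParams` is a hypothetical helper. The real types and parameter parsing in git-annex are considerably richer.

```haskell
import Data.List ((\\))

-- Simplified stand-in for a remote type; NOT git-annex's real RemoteType.
data RemoteType = RemoteType
  { typename       :: String
  , requiredParams :: [String]  -- initremote fails if one of these is missing
  , optionalParams :: [String]  -- accepted, but may be omitted
  } deriving (Show, Eq)

-- Assumed model of the wrapper: it makes the remote require encryption=
-- and accept a chunking parameter ("chunk" is used here for illustration).
specialRemoteType :: RemoteType -> RemoteType
specialRemoteType r = r
  { requiredParams = requiredParams r ++ ["encryption"]
  , optionalParams = optionalParams r ++ ["chunk"]
  }

-- The tahoe remote on its own, with only the embedcreds parameter modelled.
tahoeBase :: RemoteType
tahoeBase = RemoteType
  { typename       = "tahoe"
  , requiredParams = []
  , optionalParams = ["embedcreds"]
  }

-- Before the fix the remote was wrapped; after the fix it is used directly.
tahoeBefore :: RemoteType
tahoeBefore = specialRemoteType tahoeBase

tahoeAfter :: RemoteType
tahoeAfter = tahoeBase

-- Hypothetical helper: required parameters that the user did not supply.
missingParams :: RemoteType -> [String] -> [String]
missingParams r given = requiredParams r \\ given
```

Loaded into GHCi, `missingParams tahoeBefore ["embedcreds"]` evaluates to `["encryption"]`, while `missingParams tahoeAfter ["embedcreds"]` evaluates to `[]`. In this model, dropping the wrapper is what stops initremote from demanding a parameter that the remote never uses.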
diff --git a/CHANGELOG b/CHANGELOG index 9a478550fe..1e43b9638f 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -27,6 +27,9 @@ git-annex (10.20250722) UNRELEASED; urgency=medium * Don't allow the type of encryption of an existing special remote to be changed. Fixes reversion introduced in version 7.20191230. * tahoe: Support tahoe-lafs command versions newer than 1.16. + * tahoe: Fix bug that made initremote require an encryption= parameter, + despite git-annex encryption not being used with this special remote. + Fixes reversion introduced in version 7.20191230. * Removed support for git versions older than 2.22. * Bump aws build dependency to 0.24.1. * stack.yaml: Update to lts-24.2. diff --git a/Remote/Tahoe.hs b/Remote/Tahoe.hs index e51ccf9531..2f0cebf188 100644 --- a/Remote/Tahoe.hs +++ b/Remote/Tahoe.hs @@ -58,7 +58,7 @@ type IntroducerFurl = String type Capability = String remote :: RemoteType -remote = specialRemoteType $ RemoteType +remote = RemoteType { typename = "tahoe" , enumerate = const (findSpecialRemotes "tahoe") , generate = gen diff --git a/doc/special_remotes/tahoe.mdwn b/doc/special_remotes/tahoe.mdwn index 80f176d73c..281151182d 100644 --- a/doc/special_remotes/tahoe.mdwn +++ b/doc/special_remotes/tahoe.mdwn @@ -35,10 +35,6 @@ the tahoe remote. whether you want to give them access to your tahoe system before using embedcreds! -* `onlyencryptcreds` - Optional. Set to "yes" to make the `encryption` - only be used for the embedded tahoe credentials, but not used to encrypt - the content stored on the special remote. - Setup example: # TAHOE_FURL=... git annex initremote tahoe type=tahoe embedcreds=yes