Recent changes to this wiki:
comment
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment new file mode 100644 index 0000000000..12ef212684 --- /dev/null +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-23T20:52:33Z" + content=""" +Started implementation in the `repair` branch. +"""]]
update
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment index 9cd701d3fb..235402bf24 100644 --- a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment @@ -6,9 +6,14 @@ If [[todo/drop_from_export_remote]] were implemented that would take care of #1. -Since `git-annex fsck` already tells the user what to do when it finds a -corrupted file on an export remote, and that works for ones not using -versioning, I think #1 can be left to that todo to solve, -and #2 be dealt with here. That will be enough to recover the problem -dataset. +The user can export a tree that removes the file themselves. fsck even +suggests doing that when it finds a corrupted file on an exporttree remote, +since it's unable to drop it in that case. + +But notice that the fsck run above does not suggest doing that. Granted, +with a S3 bucket with versioning, exporting a tree won't remove the +corrupted version of the file from the remote anyway. + +It seems that dealing with #2 here is enough to recover the problem +dataset, and #1 can be left to that other todo. """]]
comments
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment index b740f91970..ce7ceff9a9 100644 --- a/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment @@ -3,7 +3,24 @@ subject="""comment 2""" date="2025-12-17T18:30:06Z" content=""" -In a non-export S3 bucket with versioning, fsck also cannot recover from a -corrupted object, due to the same problem with the versionId. The same -method should work to handle this case. +The OpenNeuro dataset ds005256 is a S3 bucket with versioning=yes, and a +publicurl set, and exporttree=yes. With that combination, when S3 +credentials are not set, the versionId is used, in the public url for downloading. + + git clone https://github.com/OpenNeuroDatasets/ds005256.git + git-annex get stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 + +Note that this first does a download that fails incomplete with +"Verification of content failed". Then it complains "Unable to access these +remotes: s3-PUBLIC". It's trying two different download methods; the second +one can only work with S3 credentials set. + + git-annex fsck stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 + fsck stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 (fixing location log) + ** Based on the location log, stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 + ** was expected to be present, but its content is missing. + failed + +Note that this doesn't download, but fails at the checkPresent stage. At that +point, the HTTP HEAD reports the size of the object, and it's too short. 
"""]] diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment index 1a38db539e..9cd701d3fb 100644 --- a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment @@ -8,5 +8,7 @@ of #1. Since `git-annex fsck` already tells the user what to do when it finds a corrupted file on an export remote, and that works for ones not using -versioning, I think #1 can be postponed and #2 be dealt with first. +versioning, I think #1 can be left to that todo to solve, +and #2 be dealt with here. That will be enough to recover the problem +dataset. """]] diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment new file mode 100644 index 0000000000..1218d28c06 --- /dev/null +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-23T17:21:51Z" + content=""" +After a *lot* of thought and struggling with layering issues between fsck and +the S3 remote, here is a design to solve #2: + +Add a new method `repairCorruptedKey :: Key -> Annex Bool` + +fsck calls this when it finds a remote does not have a key it expected it +to have, or when it downloads corrupted content. + +If `repairCorruptedKey` returns True, it was able to repair a problem, and +the Key should be able to be downloaded from the remote still. If it +returns False, it was not able to repair the problem. + +Most special remotes will make this `pure False`. 
For S3 with versioning=yes, +it will download the object from the bucket, using each recorded versionId. +Any versionId that does not work will be removed. And return True if any +download did succeed. + +In a case where the object size is right, but it's corrupt, +fsck will download the object, and then repairCorruptedKey will download it +a second time. If there were 2 files with the same content, it would end up +being downloaded 3 times! So this can be pretty expensive, +but it's simple and will work. +"""]]
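
The repair step in comment 4 for S3 with versioning=yes can be sketched as a small function. This is a minimal model, not the code in the `repair` branch: `VersionId`, `repairVersions`, and the `downloadOk` callback are hypothetical names, and the real method is described as `repairCorruptedKey :: Key -> Annex Bool`. The sketch returns the surviving versionIds alongside the Bool so the pruning is visible.

```haskell
import Control.Monad (filterM)
import Data.Functor.Identity (runIdentity)

-- Hypothetical stand-in; not git-annex's real type.
type VersionId = String

-- | Try a download with every recorded versionId. VersionIds whose
-- download does not verify are dropped. The Bool is True when at least
-- one download succeeded, meaning the key can still be retrieved.
repairVersions
  :: Monad m
  => (VersionId -> m Bool) -- ^ download and verify one version (assumed)
  -> [VersionId]           -- ^ versionIds recorded for the key
  -> m ([VersionId], Bool)
repairVersions downloadOk recorded = do
  good <- filterM downloadOk recorded
  return (good, not (null good))

-- Example with a pure stand-in for the download: only "v2" verifies.
example :: ([VersionId], Bool)
example = runIdentity $ repairVersions (pure . (== "v2")) ["v1", "v2", "v3"]
```

Because every recorded versionId is downloaded, the cost grows with the number of versions, which matches the comment's note that this approach is simple but can be expensive.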
comment
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment new file mode 100644 index 0000000000..1a38db539e --- /dev/null +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-23T16:42:49Z" + content=""" +If [[todo/drop_from_export_remote]] were implemented that would take care +of #1. + +Since `git-annex fsck` already tells the user what to do when it finds a +corrupted file on an export remote, and that works for ones not using +versioning, I think #1 can be postponed and #2 be dealt with first. +"""]]
comment
diff --git a/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment b/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment new file mode 100644 index 0000000000..860bb39c4f --- /dev/null +++ b/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment @@ -0,0 +1,21 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-23T16:46:05Z" + content=""" +Rather than altering the exported git tree, it could removeExport and then +update the export log to say that the export is incomplete. + +That would result in a re-export putting the file back on the remote. + +It's not uncommon to eg want to `git-annex move foo --from remote`, +due to it being low on space, or to temporarily make it unavailable, +and later send the file back to the remote. Supporting drop from export +remotes in this way would allow for such a workflow, although with the +difference that `git-annex export` would be needed to put the file back. + +It might also be possible to make sending a particular file to an export +remote succeed when the export to the remote is incomplete and the file is +in the exported tree. Then `git-annex move foo --to remote` would work to +put the file back. +"""]]
Added a comment
diff --git a/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment new file mode 100644 index 0000000000..befcbe4105 --- /dev/null +++ b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 4" + date="2026-01-23T14:39:24Z" + content=""" +> A balance might be that if it fails to connect to the remote.name.annexUrl, it could re-check it then. + +Would this include re-checking when remote.name.annexUrl is unset? That would be necessary in the situations where either the client didn't understand p2phttp when the repository was closed or when the server-side didn't provide p2phttp yet. + +Given that the clone happened in the knowledge that \"dumb http\" was the only supported http protocol and read only, I am now questioning if such a automatic upgrade to p2phttp would really be needed, or even desirable. Dumb http continues to work anyway. + +Only re-checking if remote.name.annexUrl is set already would solve the issue of relocating the p2phttp endpoint. +"""]]
diff --git a/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn index 7ebff66146..8276b76ba6 100644 --- a/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn +++ b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn @@ -140,3 +140,5 @@ I know that this is sort of abusing the URL handling in git-annex, but it was su ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) Yes! It is absolutely great, thank you for it. + +[[!tag projects/ICE4]]
comment
diff --git a/doc/todo/misleading_message_when_ssh_remote_does_not_exist.mdwn b/doc/todo/misleading_message_when_ssh_remote_does_not_exist.mdwn new file mode 100644 index 0000000000..4bd6ba54c3 --- /dev/null +++ b/doc/todo/misleading_message_when_ssh_remote_does_not_exist.mdwn @@ -0,0 +1,13 @@ + joey@darkstar:~/tmp/ben/mom4>git remote add foo localhost:/tmp/foo + joey@darkstar:~/tmp/ben/mom4>git-annex init + init + Unable to parse git config from foo + + Remote foo does not have git-annex installed; setting annex-ignore + + This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote foo + ok + +This message is wrong, git-annex-shell is installed. But since /tmp/foo does not exist, it errors out. + +Maybe `git-annex-shell configlist` should output nothing instead of erroring out in this situation? --[[Joey]]
comment
diff --git a/doc/todo/support_push_to_create/comment_6_4bb1c61505124c34618859c71821a963._comment b/doc/todo/support_push_to_create/comment_6_4bb1c61505124c34618859c71821a963._comment new file mode 100644 index 0000000000..daee1c05f7 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_6_4bb1c61505124c34618859c71821a963._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2026-01-20T19:58:18Z" + content=""" +See <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/103> +"""]]
close
diff --git a/doc/todo/support_push_to_create.mdwn b/doc/todo/support_push_to_create.mdwn index 50d9cd0a65..a2735a934e 100644 --- a/doc/todo/support_push_to_create.mdwn +++ b/doc/todo/support_push_to_create.mdwn @@ -31,3 +31,5 @@ since it would ignore annex-ignore being set, and re-probe the git config to see if a UUID has appeared. That seems a small enough price to pay. The assistant would also need to be made to handle this. --[[Joey]] + +> [[done]] --[[Joey]] diff --git a/doc/todo/support_push_to_create/comment_5_6aeb1f35417bf2035ee3072061df97bf._comment b/doc/todo/support_push_to_create/comment_5_6aeb1f35417bf2035ee3072061df97bf._comment new file mode 100644 index 0000000000..330603153e --- /dev/null +++ b/doc/todo/support_push_to_create/comment_5_6aeb1f35417bf2035ee3072061df97bf._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-20T19:46:27Z" + content=""" +Implemented both. +"""]]
sync, push: push-to-create support
When used with git forges that allow Push to Create, the remote's
annex-uuid is re-probed after the initial push.

This works, but requires the user to run git-annex sync or push. If they
opt to manually git push to create the repo, and then use other
git-annex commands, annex-ignore will remain set.

The implementation here is not ideal: the annex-ignore git config gets
unset and may then get re-set if the remote host does not support
git-annex-shell. And the use of remoteList' to regenerate the remote
does extra work. But implementing it this way avoids needing any changes
to Remote.Git, and avoids tying it to that type of remote too.
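
The interplay between annex-ignore and annex-ignore-auto can be modelled as a pure state transition. This is a sketch under stated assumptions: `IgnoreState` and `afterPush` are hypothetical names, and the two Bool inputs stand in for "the initial push succeeded" and "re-probing found an annex-uuid". It is not the `pushToCreate` code itself.

```haskell
-- Hypothetical model of the two per-remote git config settings.
data IgnoreState = IgnoreState
  { annexIgnore     :: Bool -- ^ remote.<name>.annex-ignore
  , annexIgnoreAuto :: Bool -- ^ remote.<name>.annex-ignore-auto
  } deriving (Eq, Show)

-- | Model of one sync/push run against a git remote.
-- Push-to-create handling only applies when git-annex itself set
-- annex-ignore (so annex-ignore-auto is also set) and the push worked.
afterPush
  :: Bool        -- ^ did the initial push succeed?
  -> Bool        -- ^ did re-probing find an annex-uuid?
  -> IgnoreState
  -> IgnoreState
afterPush pushSucceeds probeFindsUuid st
  | annexIgnore st && annexIgnoreAuto st && pushSucceeds =
      -- annex-ignore is cleared, then possibly re-set by the probe.
      -- annex-ignore-auto is always cleared, so this runs only once.
      IgnoreState { annexIgnore = not probeFindsUuid, annexIgnoreAuto = False }
  | otherwise = st
```

In this model, a remote whose annex-ignore was set by hand (annex-ignore-auto unset) is left alone, which is the reason the second setting exists.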
diff --git a/CHANGELOG b/CHANGELOG
index 112c34db33..6b44912669 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -3,6 +3,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* Fix behavior of local git remotes that have annex-ignore
set to be the same as ssh git remotes.
* p2phttp: Commit git-annex branch changes promptly.
+ * When used with git forges that allow Push to Create, the remote's
+ annex-uuid is re-probed after the initial push.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Command/Sync.hs b/Command/Sync.hs
index e859746f21..e1e9c146f0 100644
--- a/Command/Sync.hs
+++ b/Command/Sync.hs
@@ -1,7 +1,7 @@
{- git-annex command
-
- Copyright 2011 Joachim Breitner <mail@joachim-breitner.de>
- - Copyright 2011-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -264,7 +264,15 @@ seek' :: SyncOptions -> CommandSeek
seek' o = startConcurrency transferStages $ do
let withbranch a = a =<< getCurrentBranch
- remotes <- syncRemotes (syncWith o)
+ mc <- mergeConfig (allowUnrelatedHistories o)
+
+ unless (cleanupOption o) $
+ includeactions
+ [ [ commit o ]
+ , [ withbranch (mergeLocal mc o) ]
+ ]
+
+ remotes <- mapM (pushToCreate o) =<< syncRemotes (syncWith o)
warnSyncContentTransition o remotes
-- Remotes that git can push to and pull from.
let gitremotes = filter Remote.gitSyncableRemote remotes
@@ -277,16 +285,8 @@ seek' o = startConcurrency transferStages $ do
commandAction (withbranch cleanupLocal)
mapM_ (commandAction . withbranch . cleanupRemote) gitremotes
else do
- mc <- mergeConfig (allowUnrelatedHistories o)
-
- -- Syncing involves many actions, any of which
- -- can independently fail, without preventing
- -- the others from running.
- -- These actions cannot be run concurrently.
- mapM_ includeCommandAction $ concat
- [ [ commit o ]
- , [ withbranch (mergeLocal mc o) ]
- , map (withbranch . pullRemote o mc) gitremotes
+ includeactions
+ [ map (withbranch . pullRemote o mc) gitremotes
, [ mergeAnnex ]
]
@@ -325,8 +325,8 @@ seek' o = startConcurrency transferStages $ do
-- git-annex branch on the remotes in the
-- meantime, so pull and merge again to
-- avoid our push overwriting those changes.
- when (syncedcontent || exportedcontent) $ do
- mapM_ includeCommandAction $ concat
+ when (syncedcontent || exportedcontent) $
+ includeactions
[ map (withbranch . pullRemote o mc) gitremotes
, [ commitAnnex, mergeAnnex ]
]
@@ -334,6 +334,12 @@ seek' o = startConcurrency transferStages $ do
void $ includeCommandAction $ withbranch $ pushLocal o
-- Pushes to remotes can run concurrently.
mapM_ (commandAction . withbranch . pushRemote o) gitremotes
+ where
+ -- Syncing involves many actions, any of which
+ -- can independently fail, without preventing
+ -- the others from running.
+ -- These actions cannot be run concurrently.
+ includeactions = mapM_ includeCommandAction . concat
{- Merging may delete the current directory, so go to the top
- of the repo. This also means that sync always acts on all files in the
@@ -1188,3 +1194,43 @@ exportHasAnnexObjects = annexObjects . Remote.config
isThirdPartyPopulated :: Remote -> Bool
isThirdPartyPopulated = Remote.thirdPartyPopulated . Remote.remotetype
+
+{- Support for push-to-create of git repositories.
+ -
+ - When the remote does not exist yet, annex-ignore and
+ - annex-ignore-auto will be set. In that case, try to push.
+ -
+ - After a successful push, clear annex-ignore and regenerate the remote.
+ - That may re-set annex-ignore. Then annex-ignore-auto is cleared, so
+ - this will not run again, even when annex-ignore remains set.
+ -}
+pushToCreate :: SyncOptions -> Remote -> Annex Remote
+pushToCreate o r
+ | not (pushOption o) = return r
+ | Remote.gitSyncableRemote r && remoteAnnexIgnoreAuto (Remote.gitconfig r) =
+ ifM (liftIO $ getDynamicConfig $ remoteAnnexIgnore $ Remote.gitconfig r)
+ ( getCurrentBranch >>= \case
+ currbranch@(Just _, _) -> do
+ pushed <- includeCommandAction $
+ pushRemote o r currbranch
+ if pushed
+ then do
+ repo <- Remote.getRepo r
+ unsetRemoteIgnore repo
+ reloadConfig
+ r' <- regenremote
+ unsetRemoteIgnoreAuto repo
+ return r'
+ else return r
+ _ -> return r
+ , return r
+ )
+ | otherwise = return r
+ where
+ regenremote = do
+ -- Regenerating the remote list involves some extra work,
+ -- but push-to-create only happens once per remote.
+ rs <- Remote.remoteList' False
+ case filter (\r' -> Remote.name r' == Remote.name r) rs of
+ (r':_) -> return r'
+ _ -> return r
diff --git a/Config.hs b/Config.hs
index 892c49d4a5..11e2744648 100644
--- a/Config.hs
+++ b/Config.hs
@@ -1,6 +1,6 @@
{- Git configuration
-
- - Copyright 2011-2023 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -78,6 +78,15 @@ setRemoteAvailability r c = setConfig (remoteAnnexConfig r "availability") (show
setRemoteIgnore :: Git.Repo -> Bool -> Annex ()
setRemoteIgnore r b = setConfig (remoteAnnexConfig r "ignore") (Git.Config.boolConfig b)
+unsetRemoteIgnore :: Git.Repo -> Annex ()
+unsetRemoteIgnore r = unsetConfig (remoteAnnexConfig r "ignore")
+
+setRemoteIgnoreAuto :: Git.Repo -> Bool -> Annex ()
+setRemoteIgnoreAuto r b = setConfig (remoteAnnexConfig r "ignore-auto") (Git.Config.boolConfig b)
+
+unsetRemoteIgnoreAuto :: Git.Repo -> Annex ()
+unsetRemoteIgnoreAuto r = unsetConfig (remoteAnnexConfig r "ignore-auto")
+
setRemoteBare :: Git.Repo -> Bool -> Annex ()
setRemoteBare r b = setConfig (remoteAnnexConfig r "bare") (Git.Config.boolConfig b)
diff --git a/Remote/Git.hs b/Remote/Git.hs
index 36ebf53c65..f2c5206648 100644
--- a/Remote/Git.hs
+++ b/Remote/Git.hs
@@ -368,6 +368,7 @@ tryGitConfigRead gc autoinit r hasuuid
when longmessage $
warning $ UnquotedString $ "This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote " ++ n
setremote setRemoteIgnore True
+ setremote setRemoteIgnoreAuto True
setremote setter v = case Git.remoteName r of
Nothing -> noop
diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs
index a33d8a9dca..c31dec617f 100644
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -396,6 +396,7 @@ globalConfigs =
data RemoteGitConfig = RemoteGitConfig
{ remoteAnnexCost :: DynamicConfig (Maybe Cost)
, remoteAnnexIgnore :: DynamicConfig Bool
+ , remoteAnnexIgnoreAuto :: Bool
, remoteAnnexSync :: DynamicConfig Bool
, remoteAnnexPull :: Bool
, remoteAnnexPush :: Bool
@@ -477,6 +478,7 @@ extractRemoteGitConfig r remotename = do
return $ RemoteGitConfig
{ remoteAnnexCost = annexcost
, remoteAnnexIgnore = annexignore
+ , remoteAnnexIgnoreAuto = getbool IgnoreAutoField False
, remoteAnnexSync = annexsync
, remoteAnnexPull = getbool PullField True
, remoteAnnexPush = getbool PushField True
@@ -586,6 +588,7 @@ data RemoteGitConfigField
= CostField
| CostCommandField
| IgnoreField
+ | IgnoreAutoField
| IgnoreCommandField
| SyncField
| SyncCommandField
@@ -659,6 +662,7 @@ remoteGitConfigField = \case
CostField -> inherited True "cost"
(Diff truncated)
comment
diff --git a/doc/todo/support_push_to_create/comment_4_f0b998bee7dafad3485348435c8362af._comment b/doc/todo/support_push_to_create/comment_4_f0b998bee7dafad3485348435c8362af._comment new file mode 100644 index 0000000000..27cd636616 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_4_f0b998bee7dafad3485348435c8362af._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-20T18:48:32Z" + content=""" +The user might manually `git push`, knowing push-to-create is a thing, +but do it after `git-annex init`, and so annex-ignore is already set +and will stay set until they `git-annex push`. Which they may never do. + +To deal with this, when annex-ignore-auto is set, Remote.Git could check if +the remote tracking branch exists. If so, unset annex-ignore-auto and +annex-ignore and re-run the uuid probing. +"""]]
comment
diff --git a/doc/todo/support_push_to_create/comment_3_b07c7c7453e913b70e1608c38b08f885._comment b/doc/todo/support_push_to_create/comment_3_b07c7c7453e913b70e1608c38b08f885._comment new file mode 100644 index 0000000000..c503271998 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_3_b07c7c7453e913b70e1608c38b08f885._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-20T15:49:26Z" + content=""" +Here's a better plan: annex-ignore remains the config, but +annex-ignore-auto is set when git-annex sets annex-ignore. +If the user manually sets annex-ignore, they don't set +annex-ignore-auto. + +Then, `git-annex push` can check if push-to-create happend +and unset annex-ignore iff annex-ignore-auto is set. +"""]]
comment
diff --git a/doc/todo/support_push_to_create/comment_2_09a3a43c73e6d581727d804aa27b8e42._comment b/doc/todo/support_push_to_create/comment_2_09a3a43c73e6d581727d804aa27b8e42._comment new file mode 100644 index 0000000000..8f428c2993 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_2_09a3a43c73e6d581727d804aa27b8e42._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2026-01-20T15:16:46Z" + content=""" +Problem with a new config (call it annex-ignore-auto) is that users may +have learned to unset annex-ignore when there was a problem that got +corrected, and would need to learn to unset annex-ignore-auto instead. +While `git-annex push` would do it for them, they might not use that. + +Is this disruptive change worth it to support push-to-create? Probably. +But it does make the option of checking before push and after push and +unsetting annex-ignore seem more appealing. + +The situation where 2 users are doing push to create of the same remote +repo at the same time is very unlikely to happen. And currently what +happens is that both have to unset annex-ignore. A change that makes only +one of them but not the other need to unset it is not making things worse. +"""]]
comment
diff --git a/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_4_658fdda65fd7560de29803d46a3af22e._comment b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_4_658fdda65fd7560de29803d46a3af22e._comment new file mode 100644 index 0000000000..5af3437cc4 --- /dev/null +++ b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_4_658fdda65fd7560de29803d46a3af22e._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-19T18:57:00Z" + content=""" +p2phttp is fixed in master to commit the git-annex branch promptly. +"""]]
p2phttp: Commit git-annex branch changes promptly
Changes were piling up in the journal until p2phttp exited or another
git-annex command committed them. That could lead to situations where one
client made a change to the server, but didn't push the git-annex branch to
it, and so another client would be unaware of the change.

Rather than make a commit after every change, wait until the server has
been idle for 1 second, and then commit. This way, when a client is making
several changes, eg sending multiple files, the server waits until the end
to commit.

1 second was chosen as a time that is:

A) Short enough that no user is likely to notice that the server
   waits this long before committing.
B) Long enough that a git-annex command that makes multiple changes to
   the server is unlikely to wait this long after one change finishes
   before sending the next change.

An example situation where B does not hold is `git-annex copy --to origin`
in a large repository, where the first and last file are not in the server,
and the rest are. So it takes more than 1 second after sending the first
file to get to sending the last file. An extra git-annex branch commit
happens then.

An example situation where A does not hold would have to be something
where the same user (or an automated process) makes a change to the server
in one clone, and then immediately pulls the git-annex branch in another
clone and expects it to reflect the change. That's possible, but in any
situation where there are two different users, 1 second is plenty of time.
And of course, when the same user is doing both, they only need to push the
git-annex branch to the server before pulling it to avoid any timing
issues.

It is possible that a server has so much change activity that it is never
left idle, and so never commits. A low bandwidth series of uploads, for
example. It would be possible to commit after N minutes even when not idle,
but I don't know what would be a good value for N. And any value in minutes
would be too long to satisfy A in any case.
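
The commit-when-idle rule can be modelled as a pure function over change times. This is only a sketch: `commitTimes` and its millisecond timestamps are hypothetical, each change is treated as an instant rather than a transfer with a duration, and it is not the code in P2P/Http/State.hs.

```haskell
-- | Given an idle threshold and the sorted times (in milliseconds) at
-- which changes finish, return the times at which a commit would happen.
-- A commit fires once the threshold passes with no further change.
commitTimes :: Int -> [Int] -> [Int]
commitTimes idle = go
  where
    go [] = []
    go [t] = [t + idle]
    go (t : rest@(next : _))
      | next - t > idle = t + idle : go rest -- idle gap: commit, then continue
      | otherwise       = go rest            -- next change arrived in time
```

With a 1000 ms threshold, changes at 0, 300 and 600 ms give a single commit, while changes at 0 and 5000 ms give two commits, which is the extra commit described for the `git-annex copy --to origin` case.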
diff --git a/CHANGELOG b/CHANGELOG
index 1e1cd87ee7..112c34db33 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -2,6 +2,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* Fix behavior of local git remotes that have annex-ignore
set to be the same as ssh git remotes.
+ * p2phttp: Commit git-annex branch changes promptly.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs
index 522ad944e1..b7f773301a 100644
--- a/P2P/Http/Server.hs
+++ b/P2P/Http/Server.hs
@@ -251,7 +251,7 @@ serveRemove
-> IsSecure
-> Maybe Auth
-> Handler t
-serveRemove st resultmangle su apiver (B64Key k) cu bypass sec auth = do
+serveRemove st resultmangle su apiver (B64Key k) cu bypass sec auth = changesBranch st su $ do
res <- withP2PConnection apiver WorkerPoolRunner st cu su bypass sec auth RemoveAction id
$ \(conn, _) ->
liftIO $ proxyClientNetProto conn $ remove Nothing k
@@ -273,7 +273,7 @@ serveRemoveBefore
-> IsSecure
-> Maybe Auth
-> Handler RemoveResultPlus
-serveRemoveBefore st su apiver (B64Key k) cu bypass (Timestamp ts) sec auth = do
+serveRemoveBefore st su apiver (B64Key k) cu bypass (Timestamp ts) sec auth = changesBranch st su $ do
res <- withP2PConnection apiver WorkerPoolRunner st cu su bypass sec auth RemoveAction id
$ \(conn, _) ->
liftIO $ proxyClientNetProto conn $
@@ -320,7 +320,7 @@ servePut
-> IsSecure
-> Maybe Auth
-> Handler t
-servePut mst resultmangle su apiver (Just True) _ k cu bypass baf _ _ sec auth = do
+servePut mst resultmangle su apiver (Just True) _ k cu bypass baf _ _ sec auth = changesBranch mst su $ do
res <- withP2PConnection' apiver WorkerPoolRunner mst cu su bypass sec auth WriteAction
(\cst -> cst { connectionWaitVar = False }) (liftIO . protoaction)
servePutResult resultmangle res
@@ -328,7 +328,7 @@ servePut mst resultmangle su apiver (Just True) _ k cu bypass baf _ _ sec auth =
protoaction conn = servePutAction conn k baf $ \_offset -> do
net $ sendMessage DATA_PRESENT
checkSuccessPlus
-servePut mst resultmangle su apiver _datapresent (DataLength len) k cu bypass baf moffset stream sec auth = do
+servePut mst resultmangle su apiver _datapresent (DataLength len) k cu bypass baf moffset stream sec auth = changesBranch mst su $ do
validityv <- liftIO newEmptyTMVarIO
let validitycheck = local $ runValidityCheck $
liftIO $ atomically $ readTMVar validityv
diff --git a/P2P/Http/State.hs b/P2P/Http/State.hs
index 29355c4851..d817a5e270 100644
--- a/P2P/Http/State.hs
+++ b/P2P/Http/State.hs
@@ -2,7 +2,7 @@
-
- https://git-annex.branchable.com/design/p2p_protocol_over_http/
-
- - Copyright 2024-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2024-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -36,6 +36,7 @@ import Utility.HumanTime
import Logs.Proxy
import Annex.Proxy
import Annex.Cluster
+import qualified Annex.Branch
import qualified P2P.Proxy as Proxy
import qualified Types.Remote as Remote
import Remote.List
@@ -82,6 +83,7 @@ data PerRepoServerState = PerRepoServerState
, getServerMode :: GetServerMode
, openLocks :: TMVar (M.Map LockID Locker)
, lockedFilesQSem :: LockedFilesQSem
+ , branchChangesInProgress :: TMVar Bool
}
type AnnexWorkerPool = TMVar (WorkerPool (Annex.AnnexState, Annex.AnnexRead))
@@ -90,7 +92,7 @@ type GetServerMode = IsSecure -> Maybe Auth -> ServerMode
data ServerMode
= ServerMode
- { serverMode :: P2P.ServerMode
+ { serverMode :: P2P.ServerMode
, unauthenticatedLockingAllowed :: Bool
, authenticationAllowed :: Bool
}
@@ -105,6 +107,7 @@ mkPerRepoServerState acquireconn annexworkerpool annexstate annexread getserverm
<*> pure getservermode
<*> newTMVarIO mempty
<*> pure lockedfilesqsem
+ <*> newEmptyTMVarIO
data ActionClass = ReadAction | WriteAction | RemoveAction | LockAction
deriving (Eq)
@@ -318,15 +321,18 @@ mkP2PHttpServerState getservermode updaterepos proxyconnectionpoolsize clusterco
proxypool <- liftIO $ newTMVarIO (0, mempty)
asyncservicer <- liftIO $ async $
servicer myuuid myproxies proxypool reqv relv endv
- let endit = do
- liftIO $ atomically $ putTMVar endv ()
- liftIO $ wait asyncservicer
let servinguuids = myuuid : map proxyRemoteUUID (maybe [] S.toList myproxies)
annexstate <- liftIO . newTMVarIO =<< dupState
annexread <- Annex.getRead id
st <- liftIO $ mkPerRepoServerState
(acquireconn reqv annexstate annexread)
workerpool annexstate annexread getservermode lockedfilesqsem
+ asynccommitter <- liftIO $ async $
+ branchCommitter st endv
+ let endit = do
+ liftIO $ atomically $ putTMVar endv ()
+ liftIO $ wait asyncservicer
+ liftIO $ wait asynccommitter
return $ P2PHttpServerState
{ servedRepos = M.fromList $ zip servinguuids (repeat st)
, serverShutdownCleanup = endit
@@ -347,7 +353,7 @@ mkP2PHttpServerState getservermode updaterepos proxyconnectionpoolsize clusterco
`orElse`
(Left . Right <$> takeTMVar relv)
`orElse`
- (Left . Left <$> takeTMVar endv)
+ (Left . Left <$> readTMVar endv)
case reqrel of
Right (runnertype, annexstate, annexread, connparams, ready, respvar) -> do
servicereq runnertype annexstate annexread myuuid myproxies proxypool relv connparams ready
@@ -818,3 +824,56 @@ proxyConnectionPoolKey connparams =
, connectionBypass connparams
, connectionProtocolVersion connparams
)
+
+-- Use when running an action which may journal git-annex branch changes.
+-- This arranges for the journalled changes to be committed to the branch
+-- in a timely fashion, so that eg, soon after one client has sent a file,
+-- another client can pull the branch and see that the file is present in
+-- the server.
+changesBranch :: TMVar P2PHttpServerState -> B64UUID ServerSide -> Handler t -> Handler t
+changesBranch mstv su a = liftIO (getPerRepoServerState mstv su) >>= \case
+ Just st -> bracket_ (send st True) (send st False) a
+ Nothing -> a
+ where
+ send st b = liftIO $ atomically $
+ putTMVar (branchChangesInProgress st) b
+
+branchCommitter :: PerRepoServerState -> TMVar () -> IO ()
+branchCommitter st endv = do
+ idlev <- newEmptyTMVarIO
+ void $ async $ committer idlev
+ go idlev (0 :: Integer)
+ where
+ waitchangeorend = (Right <$> takeTMVar (branchChangesInProgress st))
+ `orElse` (Left <$> readTMVar endv)
+ go idlev n = atomically waitchangeorend >>= \case
+ Right True -> do
+ let !n' = succ n
+ -- Not idle.
+ void $ atomically $ tryTakeTMVar idlev
+ go idlev n'
+ Right False -> do
+ let n' = pred n
+ when (n' == 0) $
+ -- Idle.
+ atomically $ writeTMVar idlev ()
+ go idlev n'
+ Left () -> return ()
+ waitidleorend idlev =
+ (Right <$> readTMVar idlev)
+ `orElse` (Left <$> readTMVar endv)
+ committer idlev =
+ -- Wait until a change has completed and it's idle.
+ atomically (waitidleorend idlev) >>= \case
+ Right () -> do
+ threadDelaySeconds (Seconds 1)
+ -- Once it's been idle for a second,
+ -- commit the journalled changes.
+ atomically (tryTakeTMVar idlev) >>= \case
+ Just () ->
+ void $ handleRequestAnnex st $
+ Annex.Branch.commit =<< Annex.Branch.commitMessage
+ Nothing -> noop
+ committer idlev
+ Left () -> return ()
+
diff --git a/doc/bugs/p2phttp_timely_journal_commit.mdwn b/doc/bugs/p2phttp_timely_journal_commit.mdwn
index c0ae1c3217..65c7dfec27 100644
--- a/doc/bugs/p2phttp_timely_journal_commit.mdwn
+++ b/doc/bugs/p2phttp_timely_journal_commit.mdwn
@@ -11,3 +11,5 @@ but does not ever push its git-annex branch, other clients will never learn
that the repository has a copy of the file. --[[Joey]]
[[!tag projects/INM7]]
+
+> [[fixed|done]] --[[Joey]]
comment
diff --git a/doc/bugs/p2phttp_timely_journal_commit/comment_1_59b8cebeae88d50dbcf410ee8bec4a75._comment b/doc/bugs/p2phttp_timely_journal_commit/comment_1_59b8cebeae88d50dbcf410ee8bec4a75._comment new file mode 100644 index 0000000000..8d99d8f218 --- /dev/null +++ b/doc/bugs/p2phttp_timely_journal_commit/comment_1_59b8cebeae88d50dbcf410ee8bec4a75._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-19T15:37:59Z" + content=""" +It could simply commit after each change. +But that would bloat the git-annex branch with a lot of small commits when +a lot of files are being sent to the server in one batch. + +I think what probably makes sense is to detect when the p2phttp +server has been idle for some amount of time, and commit then. +A few seconds idle should be enough to coalesce everything done by +a typical `git annex push` into a single git-annex branch commit. +"""]]
respond, open bug
diff --git a/doc/bugs/p2phttp_timely_journal_commit.mdwn b/doc/bugs/p2phttp_timely_journal_commit.mdwn new file mode 100644 index 0000000000..c0ae1c3217 --- /dev/null +++ b/doc/bugs/p2phttp_timely_journal_commit.mdwn @@ -0,0 +1,13 @@ +`git-annex p2phttp`, when eg receiving files into the repository, leaves +git-annex location log changes in the journal and does not commit them to +the git-annex branch in a timely fashion. + +Usually git-annex branch commits happen when a git-annex command finishes, +but p2phttp runs for a long time. So a commit won't happen until it's +restarted or some other git-annex command is run in the repo. + +This causes problems. Ie, if one client copies a file to the repository, +but does not ever push its git-annex branch, other clients will never learn +that the repository has a copy of the file. --[[Joey]] + +[[!tag projects/INM7]] diff --git a/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_3_06a8830d23ffd33d425147bf859005bc._comment b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_3_06a8830d23ffd33d425147bf859005bc._comment new file mode 100644 index 0000000000..841744c287 --- /dev/null +++ b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_3_06a8830d23ffd33d425147bf859005bc._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-19T15:10:44Z" + content=""" +This seems like a bug in the p2phttp server, it should not be leaving the +git-annex branch uncommitted for long periods of time. It's easy enough to +show that it leaves changes in the journal for a long time. + +Probably we don't usually notice the bug because usually, if the p2phttp server +doesn't commit the journal, the client will record the same information +in the git-annex branch on its side, and push it out in the normal course +of events, eg during a sync. I assume your JS client doesn't do that. 
+ +I've filed a bug: [[bugs/p2phttp_timely_journal_commit]] + +(As to the p2phttp clientuuid parameter, it is actually only used in transfer +logs, which don't get into the git-annex branch. Using a made-up non-UUID there, +or for that matter, using a UUID that "belongs" to someone else won't cause +any real problem. (`git-annex info` will use the non-UUID in the "transfers +in progress" display). This does not seem related to your problem.) +"""]]
Revert "remove incorrect sentance"
This reverts commit fcb2b19910dab3c9a4a1149ca966940bed130b17.
Actually, the docs are correct. It works for a ssh remote. There is a
bug preventing it from working as documented with a local git remote
though.
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 297a7ed7b3..7ec1efb314 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1580,6 +1580,7 @@ Remotes are configured using these settings in `.git/config`. If set to `true`, prevents git-annex from storing or retrieving annexed file contents on this remote by default. + (You can still request it be used with the `--from` and `--to` options.) This is, for example, useful if the remote is located somewhere without git-annex-shell. (For example, if it's on GitHub).
Added a comment: Mismatch with observations
diff --git a/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_2_c85202a4ed5022a08e4b390cb9eb5f29._comment b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_2_c85202a4ed5022a08e4b390cb9eb5f29._comment
new file mode 100644
index 0000000000..2c1a78eeae
--- /dev/null
+++ b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_2_c85202a4ed5022a08e4b390cb9eb5f29._comment
@@ -0,0 +1,71 @@
+[[!comment format=mdwn
+ username="mih"
+ avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd"
+ subject="Mismatch with observations"
+ date="2026-01-16T15:30:17Z"
+ content="""
+Thanks for detailing the behavior. I am observing something different, though. The context is a git-annex repo at a forgejo-aneksajo site.
+
+I used a JS client to upload annex keys to a an annex with uuid `f1a8ef1c-...`. This worked. I see them in `annex/objects` at the remote
+
+```
+git@loki:~/git/repositories/internal/pool-files.git$ tree annex/objects/
+annex/objects/
+|-- d73
+| `-- 370
+| `-- SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.1
+| `-- SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.1
+|-- db2
+| `-- f4b
+| `-- SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.svg
+| `-- SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.svg
+|-- dc7
+| `-- 005
+| `-- SHA256E-s12027--6f4d23344053ca2f22c4a40eec33e178b70e67d63edd1c5e4f05d96053548b69.png
+| `-- SHA256E-s12027--6f4d23344053ca2f22c4a40eec33e178b70e67d63edd1c5e4f05d96053548b69.png
+`-- fa0
+ `-- d63
+ `-- SHA256E-s324254--d6581b4f13219fe93aa8b020df8ec8875881c5d97f28493c6b7b7ac9e80c2532.png
+ `-- SHA256E-s324254--d6581b4f13219fe93aa8b020df8ec8875881c5d97f28493c6b7b7ac9e80c2532.png
+```
+
+I also see this:
+
+```
+git@...:~/git/repositories/internal/pool-files.git$ find . -name '*f1a8ef1c-...*'
+./annex/transfer/upload/f1a8ef1c-...
+git@...:~/git/repositories/internal/pool-files.git$ grep -R 'f1a8ef1c-6d8a-40e3-970f-4634390d961f' .
+./annex/journal/db2_f4b_SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.svg.log:1766077308s 1 f1a8ef1c-...
+./annex/journal/dc7_005_SHA256E-s12027--6f4d23344053ca2f22c4a40eec33e178b70e67d63edd1c5e4f05d96053548b69.png.log:1766058908s 1 f1a8ef1c-...
+./annex/journal/fa0_d63_SHA256E-s324254--d6581b4f13219fe93aa8b020df8ec8875881c5d97f28493c6b7b7ac9e80c2532.png.log:1766060307s 1 f1a8ef1c-...
+./annex/journal/d73_370_SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.1.log:1766077792s 1 f1a8ef1c-...
+./config: uuid = f1a8ef1c-...
+```
+
+This made me (incorrectly) think whether this could mean that the repo thinks the upload came FROM f1a8ef1c-... ?
+
+The p2phttp request is made to an endpoint that is composed like this:
+
+```
+endpoint = `${baseUrl}/${targetUuid}/v4/put?key=${encodeURIComponent(fileData.value.annexKey)}&clientuuid=${encodeURIComponent(clientUuid)}`
+```
+
+where
+
+```
+ baseUrl: https://<site>/git-annex-p2phttp/git-annex
+ targetUuid: f1a8ef1c-...
+ clientUuid: not-a-uuid
+```
+
+Notice that `clientUuid` is not a UUID (redacted original value that also was not a valid UUID).
+
+I have adjusted that to be an actual UUID, and did another upload. This achieved two things:
+
+1. A new file uploaded successfully (as before)
+2. The pending logs were applied and the git-annex branch was updated -- exactly like you described.
+
+However, the new upload is now sitting in the journal, and has not been taken into account, and additional uploads do not trigger a git-annex branch update immediately.
+
+This issue may be in the realm of forgejo-aneksajo, and how it runs the p2phttp server. The previous uploads were made mid-December (as seen from the timestamps in the journal). Nothing has triggered a journal commit, also not the fetch of the git-annex branch.
+"""]]
add news item for git-annex 10.20260115
diff --git a/doc/news/version_10.20250925.mdwn b/doc/news/version_10.20250925.mdwn deleted file mode 100644 index 3cba8b8b77..0000000000 --- a/doc/news/version_10.20250925.mdwn +++ /dev/null @@ -1,28 +0,0 @@ -git-annex 10.20250925 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Fix bug that made changes to a special remote sometimes be missed when - importing a tree from it. After upgrading, any such missed changes - will be included in the next tree imported from a special remote. - Fixes reversion introduced in version 10.20230626. - * Fix crash operating on filenames that are exactly 21 bytes long - and begin with a utf-8 character. - * Fix hang that could occur when using git-annex adjust on a branch with - a number of files greater than annex.queuesize. - * Fix bug that could cause an invalid utf-8 sequence to be used in a - temporary filename when the input filename was valid utf-8. - * Improve performance when used with a local git remote that has a - large working tree. - * drop: --fast support when dropping from a remote. - * Added annex.assistant.allowunlocked config. - * Add git-remote-p2p-annex and git-remote-tor-annex to standalone builds. - * enableremote: Disallow using type= to attempt to change the type of an - existing remote. - * Add build warnings when git-annex is built without the OsPath - build flag. - * version: Report on whether it was built with the OsPath build flag. - * Avoid leaking file descriptors to child processes started by git-annex - in some situations. Note that when not built with the OsPath build - flag, these leaks can still happen. - * git-annex.cabal: Turn on the OsPath build flag by default. - * p2phttp: Fix a hang that could occur when used with --directory, - and a repository in the directory got removed. 
- * Removed support for building with unmaintained cryptonite, use crypton."""]] \ No newline at end of file diff --git a/doc/news/version_10.20260115.mdwn b/doc/news/version_10.20260115.mdwn new file mode 100644 index 0000000000..9835c5150a --- /dev/null +++ b/doc/news/version_10.20260115.mdwn @@ -0,0 +1,20 @@ +git-annex 10.20260115 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * New git configs annex.initwanted, annex.initrequired, and + annex.initgroups. + * Fix bug that could result in a tree imported from a remote containing + missing git blobs. + * fix: Populate unlocked pointer files in situations where a git command, + like git reset or git stash, leaves them unpopulated. + * Pass www-authenticate headers in to to git credential, to support + eg, git-credential-oauth. + * import: Fix display of some import errors. + * external: Respond to GETGITREMOTENAME during INITREMOTE with the remote + name. + * When displaying sqlite error messages, include the path to the database. + * webapp: Remove support for local pairing; use wormhole pairing instead. + * git-annex.cabal: Removed pairing build flag, and no longer depends + on network-multicast or network-info. + * Remove support for building with old versions of persistent and + persistent-sqlite. + * Removed support for building with ghc older than 9.6.6. + * stack.yaml: Update to lts-24.26."""]] \ No newline at end of file
Added a comment: Appending `or present` is a funny idea
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_15_96e536dfab68ddce34c92ccba0186c30._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_15_96e536dfab68ddce34c92ccba0186c30._comment new file mode 100644 index 0000000000..ffa0e586a0 --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_15_96e536dfab68ddce34c92ccba0186c30._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Appending `or present` is a funny idea" + date="2026-01-15T23:02:54Z" + content=""" +> Hmm, if the default always had \"or present\" added to it, at least the surprise drop would not be a concern. + +That is a very funny idea, I like it! +"""]]
rename defaultwanted to initwanted (etc)
This is to leave open the possibility of a git-annex config
default that is used when there is no preferred content set.
Currently, copying them over at init time feels safe, and a git-annex
config default has known safety problems that would need to be
addressed. But maybe they can be eventually.
diff --git a/Annex/Init.hs b/Annex/Init.hs
index 4d6c3a9b73..e610fbce00 100644
--- a/Annex/Init.hs
+++ b/Annex/Init.hs
@@ -20,7 +20,7 @@ module Annex.Init (
probeCrippledFileSystem,
probeCrippledFileSystem',
isCrippledFileSystem,
- propigateDefaultGitConfigs,
+ propigateInitGitConfigs,
) where
import Annex.Common
@@ -177,7 +177,7 @@ initialize' startupannex mversion _initallowed = do
)
propigateSecureHashesOnly
when (isNothing initialversion) $
- propigateDefaultGitConfigs =<< getUUID
+ propigateInitGitConfigs =<< getUUID
createInodeSentinalFile False
fixupUnusualReposAfterInit
@@ -504,12 +504,12 @@ propigateSecureHashesOnly =
=<< getGlobalConfig "annex.securehashesonly"
{- Propigate git configs that set defaults. -}
-propigateDefaultGitConfigs :: UUID -> Annex ()
-propigateDefaultGitConfigs u = do
+propigateInitGitConfigs :: UUID -> Annex ()
+propigateInitGitConfigs u = do
gc <- Annex.getGitConfig
- set (annexDefaultWanted gc) preferredContentSet
- set (annexDefaultRequired gc) requiredContentSet
- case annexDefaultGroups gc of
+ set (annexInitWanted gc) preferredContentSet
+ set (annexInitRequired gc) requiredContentSet
+ case annexInitGroups gc of
[] -> noop
groups -> groupChange u (S.union (S.fromList groups))
where
diff --git a/CHANGELOG b/CHANGELOG
index 9c4a8ae418..6c2f303b07 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,7 +1,7 @@
git-annex (10.20260115) upstream; urgency=medium
- * New git configs annex.defaultwanted, annex.defaultrequired, and
- annex.defaultgroups.
+ * New git configs annex.initwanted, annex.initrequired, and
+ annex.initgroups.
* Fix bug that could result in a tree imported from a remote containing
missing git blobs.
* fix: Populate unlocked pointer files in situations where a git command,
diff --git a/Command/InitRemote.hs b/Command/InitRemote.hs
index 6fd6a0d75c..d4e5f1086d 100644
--- a/Command/InitRemote.hs
+++ b/Command/InitRemote.hs
@@ -128,7 +128,7 @@ cleanup t u name c o = do
case sameas o of
Nothing -> do
describeUUID u (toUUIDDesc name)
- propigateDefaultGitConfigs u
+ propigateInitGitConfigs u
Logs.Remote.configSet u c
Just _ -> do
cu <- liftIO genUUID
diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs
index 9057989495..a33d8a9dca 100644
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -172,9 +172,9 @@ data GitConfig = GitConfig
, annexViewUnsetDirectory :: ViewUnset
, annexClusters :: M.Map RemoteName ClusterUUID
, annexFullyBalancedThreshhold :: Double
- , annexDefaultWanted :: Maybe String
- , annexDefaultRequired :: Maybe String
- , annexDefaultGroups :: [Group]
+ , annexInitWanted :: Maybe String
+ , annexInitRequired :: Maybe String
+ , annexInitGroups :: [Group]
}
extractGitConfig :: ConfigSource -> Git.Repo -> GitConfig
@@ -319,10 +319,10 @@ extractGitConfig configsource r = GitConfig
, annexFullyBalancedThreshhold =
fromMaybe 0.9 $ (/ 100) <$> getmayberead
(annexConfig "fullybalancedthreshhold")
- , annexDefaultWanted = getmaybe (annexConfig "defaultwanted")
- , annexDefaultRequired = getmaybe (annexConfig "defaultrequired")
- , annexDefaultGroups = map (Group . encodeBS) $
- getwords (annexConfig "defaultgroups")
+ , annexInitWanted = getmaybe (annexConfig "initwanted")
+ , annexInitRequired = getmaybe (annexConfig "initrequired")
+ , annexInitGroups = map (Group . encodeBS) $
+ getwords (annexConfig "initgroups")
}
where
getbool k d = fromMaybe d $ getmaybebool k
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 52c601b6b5..297a7ed7b3 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -1061,25 +1061,25 @@ repository, using [[git-annex-config]]. See its man page for a list.)
If this is set to `true` then it will instead use the `annex.addunlocked`
configuration to decide which files to add unlocked.
-* `annex.defaultwanted`
+* `annex.initwanted`
When this is set to a preferred content expression, all
new repositories (and special remotes) will have it copied into their
configuration when initialized, the same as if you had run
[[git-annex-wanted]](1).
-* `annex.defaultrequired`
+* `annex.initrequired`
When this is set to a preferred content expression, all
new repositories (and special remotes) will have it copied into their
configuration when initialized, the same as if you had run
[[git-annex-required]](1).
-* `annex.defaultgroups`
+* `annex.initgroups`
When this is set to a list of groups (separated by whitespace), all
- new repositories (and special remotes) will start out in those groups,
- the same as if you had run [[git-annex-group]](1).
+ new repositories (and special remotes) start out in those groups
+ when initialized, the same as if you had run [[git-annex-group]](1).
* `annex.numcopies`
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_14_60dddf4e5a89e344ad7bb03fd4539635._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_14_60dddf4e5a89e344ad7bb03fd4539635._comment
new file mode 100644
index 0000000000..3f2fe8bbf1
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_14_60dddf4e5a89e344ad7bb03fd4539635._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 14"""
+ date="2026-01-15T16:34:09Z"
+ content="""
+Hmm, if the default always had "or present" added to it, at least the
+surprise drop would not be a concern.
+
+I am going to change the names to "initwanted" etc as you suggested,
+to avoid closing off the possiblity of adding a global default later.
+"""]]
Added a comment
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_13_d16a7c55ad3b86370aa06fc3964c1ebf._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_13_d16a7c55ad3b86370aa06fc3964c1ebf._comment new file mode 100644 index 0000000000..3edf7edd5e --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_13_d16a7c55ad3b86370aa06fc3964c1ebf._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 13" + date="2026-01-15T08:51:46Z" + content=""" +> It's probably somewhat common to want to get files from origin, but not let origin make config changes that drop all the files they have previously shared. + +Fair enough. + +So I guess one can encourage users to include `git config --global annex.jobs 4` and `git config annex.defaultwanted present` in their setup. Thanks for implementing that. +"""]]
response
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_12_76cc0409b834e12aee0eaedb5d5c1e2c._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_12_76cc0409b834e12aee0eaedb5d5c1e2c._comment new file mode 100644 index 0000000000..71d36cb2f1 --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_12_76cc0409b834e12aee0eaedb5d5c1e2c._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 12""" + date="2026-01-14T17:42:46Z" + content=""" +> Anyone with write access to a repo can already freely change any group, groupwanted or wanted for any involved clone - if it's present in the git-annex branch + +A good point certianly. + +> So your concerns only apply to private repos that don't record their activity in the git-annex branch by using `annex.private=true`. + +Well also repos that lack permission to push or are simply not pushed to +origin. + +It's probably somewhat common to want to get files from origin, but not let +origin make config changes that drop all the files they have previously +shared. +"""]]
fix comment (TAB to indent markdown lists is a bad idea in the webinterface 😅)
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment index fe56f47ce2..c3ecbed1e4 100644 --- a/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment @@ -12,13 +12,18 @@ Yes, but the same is already possible for anyone with write access to a repo. I Other situations I can imagine consider groups of people (or just single users) who trust each other when using a git-annex repo. git-annex is not designed to solve such permission problems - neither is git itself. -git-annex usages: +In your publicly readable (not writable) git-annex-builds repo on the other hand, if *you* were to set `git annex config --set annex.defaultwanted nothing`, then people who just run `git annex sync|assist|assistant` in their clones would have their downloaded builds dropped, okay. -- publicly writable git-annex repo -(bad idea anyway for several reasons) -- publicly readable git-annex repo (e.g. your git-annex-builds repo) - -> people you were able to social engineer to doing that +### git-annex usage scenarios +- publicly writable git-annex repo + - (bad idea anyway for several reasons without any form of permission control on the remote side) + - malicious people could set `git annex config --set annex.defaultwanted nothing` at some point and other's clones would have files dropped on sync. +- publicly readable git-annex repo to provide assets (e.g. your git-annex-builds repo) + - only the owner could do such shenanigans. Users can avoid it by using `git annex pull` and `git annex get` instead of `sync|assist|assistant` (which arguably makes more sense in this case anyway) or explicitly stating their `git annex wanted here ...`. 
+- groups or individuals working on a repo in several clones - everyone has write access, in a team for example + - anyone can already happily destroy repo contents and control other's wanted expressions + - `git annex config annex.defaultwanted` can be set as an established "repo policy" for everyone's convenience, that anyone can overwrite locally with `git annex wanted here ...`. + - if you run `git annex assist|sync|assistant|satisfy`, you *accept the repo's policy*, as with your `securehashesonly` example. If you're paranoid, don't use these sync commands, but do only exactly what you want such as `git annex pull -g`, `git annex get <thatfile>`, `git annex wanted ...`, etc. """]]
Added a comment
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment
new file mode 100644
index 0000000000..fe56f47ce2
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="nobodyinperson"
+ avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
+ subject="comment 11"
+ date="2026-01-14T14:13:00Z"
+ content="""
+> you can set annex.defaultwanted to \"standard\", and annex.defaultgroups to some group, and then changing git-annex groupwanted will affect all repositories that copied that defaultwanted into their config
+
+> If annex.defaultwanted were able to be changed for all repositories with git-annex config, then here's a really ugly security problem [...]
+
+Yes, but the same is already possible for anyone with write access to a repo. I can `git annex wanted JOEYS-UUID nothing`, wait for your assistant or manual sync to auto-drop all files (would also need to set `{num,min}copies` to 1 for that, and even then it might not auto-drop it depending on the remotes). Anyone with write access to a repo can already freely change any group, groupwanted or wanted for any involved clone - if it's present in the git-annex branch (i.e. not made with `git config annex.private=true`). So your concerns only apply to private repos that don't record their activity in the git-annex branch by using `annex.private=true`. Making a git-annex repo private is a conscious, active choice. One does not need to do it if one only consumes files and does not have push access anyway. So that'll be people who actively change repo content, probably consume it, but don't want their repo to show up in `git annex info`. Maybe for a publicly-pushable git-annex repo where everyone can add new files (who would host that anyway...). In this case, yes, users of that repo can't trust each other and there setting something like `git annex config --set annex.defaultwanted nothing` at some point can lead to people's `git annex sync|assist|assistant` to suddenly drop their files - and probably also on the central remote. But I'd argue that this kind of publicly writable setup has so many other obvious problems that `annex.defaultwanted` is one of the minor ones.
+
+Other situations I can imagine involve groups of people (or just single users) who trust each other when using a git-annex repo. git-annex is not designed to solve such permission problems - neither is git itself.
+
+git-annex usages:
+
+- publicly writable git-annex repo
+(bad idea anyway for several reasons)
+- publicly readable git-annex repo (e.g. your git-annex-builds repo)
+
+> people you were able to social engineer to doing that
+
+
+"""]]
Added a comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_9_bf66e70d86d7c334781d6bd1827a1fe9._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_9_bf66e70d86d7c334781d6bd1827a1fe9._comment new file mode 100644 index 0000000000..99cbf2bfb9 --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_9_bf66e70d86d7c334781d6bd1827a1fe9._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 9" + date="2026-01-14T07:26:26Z" + content=""" +Thank you! +"""]]
comments
diff --git a/Remote/List.hs b/Remote/List.hs
index 80a9781f10..7b2ba4f048 100644
--- a/Remote/List.hs
+++ b/Remote/List.hs
@@ -110,8 +110,8 @@ remoteGen' adjustconfig m t g = do
Just r -> Just <$> adjustExportImport (adjustReadOnly (addHooks r)) rs
{- Updates a local git Remote, re-reading its git config. -}
-updateRemote :: Remote -> Annex (Maybe Remote)
-updateRemote remote = do
+updateRemote :: Remote -> Bool -> Annex (Maybe Remote)
+updateRemote remote honorignore = do
m <- remoteConfigMap
remote' <- updaterepo =<< getRepo remote
remoteGen m (remotetype remote) remote'
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_10_7aeec1887d1851624015c6dbf2feecf5._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_10_7aeec1887d1851624015c6dbf2feecf5._comment
new file mode 100644
index 0000000000..e68007d5cc
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_10_7aeec1887d1851624015c6dbf2feecf5._comment
@@ -0,0 +1,26 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 10"""
+ date="2026-01-13T17:42:20Z"
+ content="""
+If annex.defaultwanted were able to be changed for all repositories with
+`git-annex config`, then here's a really ugly security problem:
+
+* First, I make sure to get a copy of every annexed file.
+* Then I run `git-annex config annex.defaultwanted nothing`
+* Then I wait for git-annex to drop every file from your repository.
+* Finally, I demand $ to get your files back.
+
+Now, the same can be done by convincing people to add their repository to
+some group and set preferred content to "standard", and later
+changing the groupwanted. But that only works on people you were able to
+social engineer to doing that, not everyone who cloned a repository
+with the default settings.
+
+And beyond the ransom problem, there's the problem that once this is set,
+any change to it is going to affect most every other user of the
+repository. With groupwanted there's a communicated intent in the name of
+the group, and there can be different groups with different versions of the
+preferred content expression. This lacks that, it encourages flag day
+events.
+"""]]
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_8_30f883e3d14304b1ed1eaf3f1a6421f2._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_8_30f883e3d14304b1ed1eaf3f1a6421f2._comment
new file mode 100644
index 0000000000..ef415b80db
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_8_30f883e3d14304b1ed1eaf3f1a6421f2._comment
@@ -0,0 +1,17 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 8"""
+ date="2026-01-13T17:21:09Z"
+ content="""
+I'm on the fence about whether the kind of security impact I discussed
+earlier is really something that should prevent a global setting, or not.
+
+`git-annex config` of `annex.securehashesonly` is another example of
+something where my hypothetical "auditing repos" would be vulnerable to a
+behavior change that might be security significant. Since that gets copied
+from the git-annex config to git config at init time, behavior in a
+new clone might be different than behavior in an existing clone.
+
+Does that mean it's ok for there to be more cases where there can be such a
+potential security impact? I don't know.
+"""]]
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_9_115e5b2d9eef39ca086ca6d2b1e67627._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_9_115e5b2d9eef39ca086ca6d2b1e67627._comment
new file mode 100644
index 0000000000..0ade7fc4ff
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_9_115e5b2d9eef39ca086ca6d2b1e67627._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 9"""
+ date="2026-01-13T17:29:35Z"
+ content="""
+Note that you can set annex.defaultwanted to "standard", and
+annex.defaultgroups to some group, and then changing
+`git-annex groupwanted` will affect all repositories that copied that
+defaultwanted into their config.
+
+So that's a way to be able to make changes that will affect other people's
+clones. But only ones that they have opted into.
+"""]]
comment
diff --git a/doc/todo/support_push_to_create/comment_1_1ad156f002e61c1a3db246c103c37b63._comment b/doc/todo/support_push_to_create/comment_1_1ad156f002e61c1a3db246c103c37b63._comment new file mode 100644 index 0000000000..8c288fe8c2 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_1_1ad156f002e61c1a3db246c103c37b63._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-13T17:02:50Z" + content=""" +The annex-ignore config can be manually set by the user to prevent using an +otherwise usable remote. The man page gives the example of a network +connection that is too slow to use normally. + +It may be that no users are actually using annex-ignore like this. +Using annex-sync seems more likely. But, it's hard to rule out. + +That presents a problem, since this would need to unset annex-ignore once +the repository was created. + +Checking before push if the repository exists, and only unsetting +annex-ignore if it did not exist before sync, but does afterwards, would be +one way around this problem. It does mean that, if 2 people are making +a repository at the same location at the same time, the loser may be left +with annex-ignore set due to the other person having created the +repository. + +Or, a new config could be added, that is like annex-ignore, but is only +set by git-annex, and not by the user. Keeping annex-ignore's behavior, +but making git-annex set and unset the new config as needed. +"""]]
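The check proposed in the comment above (unset annex-ignore only when the repository did not exist before the push but does afterwards) can be sketched as a pure decision function. This is a minimal sketch, not git-annex code: `Observation`, `existedBefore`, `existsAfter` and `shouldUnsetIgnore` are hypothetical names made up for illustration.

```haskell
-- Hypothetical model of the "check before and after push" idea.
-- None of these names exist in git-annex.
data Observation = Observation
  { existedBefore :: Bool -- did the remote repository exist before the push?
  , existsAfter   :: Bool -- does it exist once the push has finished?
  }

-- Unset annex-ignore only when this push appears to have created the repository.
shouldUnsetIgnore :: Observation -> Bool
shouldUnsetIgnore o = not (existedBefore o) && existsAfter o

main :: IO ()
main = mapM_ (print . shouldUnsetIgnore)
  [ Observation False True  -- created by this push: True
  , Observation True  True  -- already existed, e.g. someone else created it first: False
  , Observation False False -- push did not create anything: False
  ]
```

The second case is the race described in the comment: if the other person's repository is already there when the check runs, annex-ignore stays set.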
remove incorrect sentence
Testing with annex-ignore set on a remote, git-annex get --from that
remote fails with "cannot access remote"
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index bf69cc4438..52c601b6b5 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1580,7 +1580,6 @@ Remotes are configured using these settings in `.git/config`. If set to `true`, prevents git-annex from storing or retrieving annexed file contents on this remote by default. - (You can still request it be used with the `--from` and `--to` options.) This is, for example, useful if the remote is located somewhere without git-annex-shell. (For example, if it's on GitHub).
Added a comment: Thanks! Maybe still consider a repo-wide setting for default wanted content?
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_7_0fcdc57b6e094bfdd450286be26c9b56._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_7_0fcdc57b6e094bfdd450286be26c9b56._comment
new file mode 100644
index 0000000000..24869b1658
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_7_0fcdc57b6e094bfdd450286be26c9b56._comment
@@ -0,0 +1,27 @@
+[[!comment format=mdwn
+ username="nobodyinperson"
+ avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
+ subject="Thanks! Maybe still consider a repo-wide setting for default wanted content? "
+ date="2026-01-13T15:50:32Z"
+ content="""
+Hi joey, thank you for picking this up. IIUC, what you implemented (`git config annex.default{wanted,required,group}`) allows you to set these configs *locally* and then spare *yourself* the initial `git annex wanted . present` (etc.) setup calls. This is cool, thanks!
+
+The problem I was trying to express here is however that `git annex assist` (the very convenient do-it-all command you can tell non-techy people to use to 'do the syncing stuff') will by default pull in *all* files, resulting in a terrible user experience: it's slow (of course nobody sets `annex.jobs=cpus` or uses `-j4`), it takes up a ridiculous amount of space, people will say 'I don't need that 3GB file, why does it download it?' (of course nobody remembers or understands to set `git annex wanted . present` or anything complex), etc. Sure, this is a question of user education, but good defaults can make for a much easier onboarding experience. (I know you are not so fond of such a do-it-all command, but this `git annex assist` single-stepping command really has been a good git annex selling point in the discussions and talks I had.)
+
+So if there was a global setting like `git annex config --set annex.defaultwanted 'present or include=*.pdf'` that would set the default wanted expression for any clone, one could define what the most important files are and tell everyone to `git annex get` the others if necessary. `git annex assist` will be fast, only pull in the most important files (or none!), people can modify or add new stuff, and run `git annex assist` quickly again.
+
+I would say `git annex config --set annex.defaultwanted <whatever>` should **not** execute `git annex wanted . <whatever>` and as such hard-code it in the git-annex branch for every repo (because then again, when would that even be executed? Would it be re-set after another `git annex config --set annex.defaultwanted <whatever2>`? When?). Instead, `git annex config --set annex.defaultwanted <whatever>` should cause the *default* (i.e. fallback) value of `git annex wanted .` to be `<whatever>`, which is currently just `\"\"`, which I guess means something like `include=*` IIRC.
+
+## Re: your security concerns
+
+I understand your hesitation to add more `git annex config ...` global repo configs. But here I would argue:
+
+- git annex does not have a permissions model anyway. Anyone with push access to a repo can change any policy, any wanted expression for any repo, etc. If that is a problem, then git annex might not be the right tool. I guess one can implement some level of permission control with post-receive hooks on the remote side, but that is outside git annex's scope. git annex assumes everyone writing to the repo is nice.
+- I don't really understand your 'auditing' repo situation. Does it mean you regularly clone some repos, run `git annex pull|assist` in them to check if it still works? In that case the only negative thing `git annex config --set annex.defaultwanted` could do is indeed to leave you with *fewer* downloaded files. If one needs all files, `git annex get --all` has always been the way to go, hasn't it? 🤔 Or what kind of external repos from bad actors maliciously setting a default wanted expression do you 'audit'? And how is not having all files after `git annex assist` bad in this case?
+
+*Should* you consider implementing `git annex config --set annex.defaultwanted`, it would conflict with the freshly introduced `git config annex.defaultwanted` local settings. We could rename those to `git config annex.initdefaultwanted` (or just `annex.initwanted`), to emphasize that those only happen on `git annex init`. Then `git annex config --set annex.defaultwanted` would also sound very sensible to me in contrast, as it really configures the default, and does not modify individual repos.
+
+Cheers,
+Yann
+
+"""]]
comment
diff --git a/doc/projects/datalad/bugs-done/add_config_var_preventing_adjusted_branch_mode/comment_7_7f0b7073893ee4ab73f273f6204e2a27._comment b/doc/projects/datalad/bugs-done/add_config_var_preventing_adjusted_branch_mode/comment_7_7f0b7073893ee4ab73f273f6204e2a27._comment new file mode 100644 index 0000000000..25c1c70b81 --- /dev/null +++ b/doc/projects/datalad/bugs-done/add_config_var_preventing_adjusted_branch_mode/comment_7_7f0b7073893ee4ab73f273f6204e2a27._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2026-01-13T15:33:19Z" + content=""" +The automatic init that git-annex does in a clone does enter adjusted +branch. I think I was not considering that because you were talking about +having an existing repository and git-annex entering the adjusted branch +later. + +We can reopen this if you want, unsure. +"""]]
response
diff --git a/doc/tips/cloning_a_repository_privately/comment_4_989bafbd0b79b7f75c199cdd7817a82f._comment b/doc/tips/cloning_a_repository_privately/comment_4_989bafbd0b79b7f75c199cdd7817a82f._comment new file mode 100644 index 0000000000..5c742c5863 --- /dev/null +++ b/doc/tips/cloning_a_repository_privately/comment_4_989bafbd0b79b7f75c199cdd7817a82f._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: buyer's remorse""" + date="2026-01-13T14:55:07Z" + content=""" +Oh good question! + +This gets a tiny bit into internals, but `.git/annex/journal-private/` is +where the private information is stored. If you move the files from there +into `.git/annex/journal/`, they will be committed on the next run of +git-annex. + +You would need to take care to avoid overwriting any existing files in the +journal, usually there won't be any though. + +Also unset annex.private of course. +"""]]
response
diff --git a/doc/bugs/available_space_miscomputed_on_large_macOS_volume/comment_2_321c61379aee8c7ad16742eb7e458562._comment b/doc/bugs/available_space_miscomputed_on_large_macOS_volume/comment_2_321c61379aee8c7ad16742eb7e458562._comment
new file mode 100644
index 0000000000..5233219bf4
--- /dev/null
+++ b/doc/bugs/available_space_miscomputed_on_large_macOS_volume/comment_2_321c61379aee8c7ad16742eb7e458562._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-01-13T14:32:53Z"
+ content="""
+I'm inclined to agree with you, it's probably a problem with
+<https://hackage.haskell.org/package/disk-free-space>
+
+I am not going to be able to reproduce this!
+
+Could you take a look at disk-free-space in ghci and see if it reproduces
+there?
+
+ ghci> import System.DiskSpace
+ ghci> getAvailSpace "/"
+ 283744563200
+ ghci> getDiskUsage "/"
+ DiskUsage {diskTotal = 501386043392, diskFree = 283761369088, diskAvail = 283744591872, blockSize = 4096}
+
+Looking at the code, it assumes bsize and frsize are CULong. I guess it's
+that or FsBlkCnt is somehow wrong.
+"""]]
response
diff --git a/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__/comment_1_92ec96134e1045804382cbe28e22154b._comment b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__/comment_1_92ec96134e1045804382cbe28e22154b._comment new file mode 100644 index 0000000000..676902768d --- /dev/null +++ b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__/comment_1_92ec96134e1045804382cbe28e22154b._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-13T14:27:09Z" + content=""" +The assistant only sends files to repositories that want them. This is not +guaranteed to make as many copies of the files as whatever you have +numcopies configured to. (Numcopies will prevent the assistant from +dropping a file from a repository if there are not enough copies.) + +All of your archive repositories only want 1 copy of a file across all of +them, so you would need 2 backup repositories (which want all files) in +order to get to 3 copies. +"""]]
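The arithmetic in the response above can be written down as a small sketch. It assumes only archive and backup repositories are counted, that all archive repositories together hold one copy, and that every backup repository holds its own copy; `expectedCopies` and `backupsNeeded` are hypothetical helper names, not part of git-annex.

```haskell
-- Copies the assistant would end up making, under the assumptions above:
-- at most one copy across all archives, plus one per backup repository.
expectedCopies :: Int -> Int -> Int
expectedCopies archives backups = min 1 archives + backups

-- Backup repositories needed to reach a target number of copies.
backupsNeeded :: Int -> Int -> Int
backupsNeeded target archives = max 0 (target - min 1 archives)

main :: IO ()
main = do
  print (expectedCopies 3 0) -- 1: three archives still hold a single copy between them
  print (expectedCopies 3 2) -- 3
  print (backupsNeeded 3 3)  -- 2
```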
response
diff --git a/doc/forum/Old_files_being_pushed_to_transfer_repository/comment_1_f96d7b83fa1d1207cbbe8d35d92a2c1c._comment b/doc/forum/Old_files_being_pushed_to_transfer_repository/comment_1_f96d7b83fa1d1207cbbe8d35d92a2c1c._comment
new file mode 100644
index 0000000000..e26f998803
--- /dev/null
+++ b/doc/forum/Old_files_being_pushed_to_transfer_repository/comment_1_f96d7b83fa1d1207cbbe8d35d92a2c1c._comment
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2026-01-13T14:21:29Z"
+ content="""
+There are two possibilities:
+
+1. Transfer repositories want files that have not yet reached all clients, so
+   maybe you had a second client repository that doesn't have the file yet.
+
+2. When there is only a single client repository, transfer repositories
+   want to contain all content, even once it's reached that client. The
+   assumption is that, since the purpose of a transfer repo is to transfer
+   between clients, there will be a second client repository added at some
+   point, and then the transfer repository will have the content to send to it.
+
+This is documented in [[preferred_content/standard_groups]].
+"""]]
response
diff --git a/doc/special_remotes/compute/comment_8_f13c86659ddbb1a027455b9b3e67296a._comment b/doc/special_remotes/compute/comment_8_f13c86659ddbb1a027455b9b3e67296a._comment new file mode 100644 index 0000000000..50a20cb97c --- /dev/null +++ b/doc/special_remotes/compute/comment_8_f13c86659ddbb1a027455b9b3e67296a._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: comment 6""" + date="2026-01-13T14:14:27Z" + content=""" +`git-annex findcomputed --inputs` is documented to output one line per +input file. If it doesn't behave that way, file a bug. + +It would be possible to run git-annex commands in the compute script if +you were able to determine where the git repository was. I don't think +git-annex sets anything in the environment that will help with that +currently. + +If the compute program set metadata though, it would re-set the same +metadata when it's used to recompute the files. That might be undesirable +behavior if the user has edited the metadata in the meantime. +"""]]
response
diff --git a/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_2_baa9ac446450aa0522c6298cae80a2c8._comment b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_2_baa9ac446450aa0522c6298cae80a2c8._comment
new file mode 100644
index 0000000000..9c84a5f9d1
--- /dev/null
+++ b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_2_baa9ac446450aa0522c6298cae80a2c8._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-01-13T14:08:21Z"
+ content="""
+I tend to agree, this adds a lot of potential for foot shooting.
+
+It might make sense as an option that enables acting on non-annexed files?
+"""]]
response
diff --git a/doc/forum/Confirming_my_preferred_content_understanding/comment_1_52c1b6396a56a2096f9ab5cf3e1c9d43._comment b/doc/forum/Confirming_my_preferred_content_understanding/comment_1_52c1b6396a56a2096f9ab5cf3e1c9d43._comment new file mode 100644 index 0000000000..a9cd3d09b0 --- /dev/null +++ b/doc/forum/Confirming_my_preferred_content_understanding/comment_1_52c1b6396a56a2096f9ab5cf3e1c9d43._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-13T14:01:50Z" + content=""" +I think that will work! + +Since moving content between the archive drives is probably reasonably +fast, it might make sense to use fullybalanced or fullysizebalanced. + +In any case, when using "balanced" things, you will need to use +[[git-annex-maxsize]] to tell it how large each repository is. +"""]]
update
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment index b9f723232c..777af18df7 100644 --- a/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment @@ -45,17 +45,17 @@ Well, there is a small one. If I have made a clone of a repository, I may be hiding the existence of that repository from others. So nobody knows its uuid, and so they cannot change its preferred content setting. But with `git-annex config` allowing overriding the default, -I'd risk a pull from origin changing it. +a clone I made yesterday may behave differently than a clone I make today. -Which, since the default is to want all files, must change my repo +Which, since the default is to want all files, must make clone to want fewer files. -So for this to be an actual security problem, I would need to be relying -on my repository getting all files for some security reason. Which could be -auditing the content of annexed files. As the auditing repository, I want -it to get every file that passes through origin. And by foolishly relying -on the current default preferred content (which after all joey seems like -he's never gonna get around to changing!), I open myself up to an attacker +So for this to be an actual security problem, I would need to be relying on +my clones getting all files for some security reason. Which could be +auditing the content of annexed files. I want the auditing clones to get +every file that passes through origin. And by foolishly relying on the +current default preferred content (which after all joey seems like he's +never gonna get around to changing!), I open myself up to an attacker breaking my auditing process. 
That's a bit tortured, but it does seem to argue against making this
git configs annex.defaultwanted, annex.defaultrequired, and annex.defaultgroups
These are propagated into the git-annex branch when a repository is
initialized for the 1st time. That includes by git-annex init, by
autoinitialization, and by git-annex initremote. Note that git-annex
reinit, git-annex init run a second time, and git-annex enableremote
do not propagate them, to avoid overwriting what is already recorded in
the git-annex branch.
git-remote-annex also propagates them for the local repository when
initializing it. It does not propagate them to the temporary special
remote that it uses for cloning. That special remote was already
initialized elsewhere, so the git-annex branch, once fetched from it, will
have the desired settings. And since git-remote-annex only downloads from
it, these configs don't matter as far as what it does.
Sponsored-by: Graham Spencer on Patreon
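The rule in the commit message above (defaults are copied only on first initialization, never on a later init) can be modelled with plain maps. This is a minimal sketch, not the code from Annex/Init.hs: `Defaults`, `Branch`, `propagate` and `initialize` are hypothetical stand-ins, and the `Maybe Int` argument plays the role of "the repository already has a version".

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- All names here are hypothetical stand-ins, not git-annex's real types.
type UUID = String

-- Local git config values (annex.defaultwanted, annex.defaultrequired,
-- annex.defaultgroups).
data Defaults = Defaults
  { defWanted   :: Maybe String
  , defRequired :: Maybe String
  , defGroups   :: [String]
  }

-- Simplified picture of what the git-annex branch records per repository.
data Branch = Branch
  { wantedLog   :: M.Map UUID String
  , requiredLog :: M.Map UUID String
  , groupLog    :: M.Map UUID (S.Set String)
  } deriving (Eq, Show)

emptyBranch :: Branch
emptyBranch = Branch M.empty M.empty M.empty

-- Copy whichever defaults are set into the branch for one repository.
propagate :: Defaults -> UUID -> Branch -> Branch
propagate d u b = b
  { wantedLog   = maybe id (M.insert u) (defWanted d) (wantedLog b)
  , requiredLog = maybe id (M.insert u) (defRequired d) (requiredLog b)
  , groupLog    = addGroups (defGroups d) (groupLog b)
  }
  where
    addGroups [] m = m
    addGroups gs m = M.insertWith S.union u (S.fromList gs) m

-- Only a repository with no version yet (a first init) receives the defaults.
initialize :: Maybe Int -> Defaults -> UUID -> Branch -> Branch
initialize Nothing  d u b = propagate d u b
initialize (Just _) _ _ b = b

main :: IO ()
main = do
  let d1    = Defaults (Just "present") Nothing ["client"]
      d2    = Defaults (Just "nothing") Nothing []
      first = initialize Nothing d1 "uuid-a" emptyBranch
      again = initialize (Just 10) d2 "uuid-a" first
  print (M.lookup "uuid-a" (wantedLog first)) -- Just "present"
  print (again == first)                      -- True: the second init changes nothing
```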
diff --git a/Annex/Init.hs b/Annex/Init.hs
index 64c924fd04..4d6c3a9b73 100644
--- a/Annex/Init.hs
+++ b/Annex/Init.hs
@@ -1,6 +1,6 @@
{- git-annex repository initialization
-
- - Copyright 2011-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -20,6 +20,7 @@ module Annex.Init (
probeCrippledFileSystem,
probeCrippledFileSystem',
isCrippledFileSystem,
+ propigateDefaultGitConfigs,
) where
import Annex.Common
@@ -34,6 +35,8 @@ import qualified Database.Fsck
import Logs.UUID
import Logs.Trust.Basic
import Logs.Config
+import Logs.PreferredContent.Raw
+import Logs.Group
import Types.TrustLevel
import Types.RepoVersion
import Annex.Version
@@ -64,6 +67,7 @@ import qualified Utility.LockFile.Posix as Posix
#endif
import qualified Data.Map as M
+import qualified Data.Set as S
import Control.Monad.IO.Class (MonadIO)
#ifndef mingw32_HOST_OS
import System.PosixCompat.Files (ownerReadMode, isNamedPipe)
@@ -150,7 +154,8 @@ initialize' startupannex mversion _initallowed = do
hookWrite preCommitHook
hookWrite postReceiveHook
setDifferences
- unlessM (isJust <$> getVersion) $
+ initialversion <- getVersion
+ unless (isJust initialversion) $
setVersion (fromMaybe defaultVersion mversion)
supportunlocked <- annexSupportUnlocked <$> Annex.getGitConfig
if supportunlocked
@@ -171,6 +176,8 @@ initialize' startupannex mversion _initallowed = do
Direct.switchHEADBack
)
propigateSecureHashesOnly
+ when (isNothing initialversion) $
+ propigateDefaultGitConfigs =<< getUUID
createInodeSentinalFile False
fixupUnusualReposAfterInit
@@ -487,7 +494,7 @@ initSharedClone True = do
trustSet u UnTrusted
setConfig (annexConfig "hardlink") (Git.Config.boolConfig True)
-{- Propagate annex.securehashesonly from then global config to local
+{- Propigate annex.securehashesonly from the global config to local
- config. This makes a clone inherit a parent's setting, but once
- a repository has a local setting, changes to the global config won't
- affect it. -}
@@ -496,6 +503,19 @@ propigateSecureHashesOnly =
maybe noop (setConfig "annex.securehashesonly" . fromConfigValue)
=<< getGlobalConfig "annex.securehashesonly"
+{- Propigate git configs that set defaults. -}
+propigateDefaultGitConfigs :: UUID -> Annex ()
+propigateDefaultGitConfigs u = do
+ gc <- Annex.getGitConfig
+ set (annexDefaultWanted gc) preferredContentSet
+ set (annexDefaultRequired gc) requiredContentSet
+ case annexDefaultGroups gc of
+ [] -> noop
+ groups -> groupChange u (S.union (S.fromList groups))
+ where
+ set (Just expr) setter = setter u expr
+ set Nothing _ = noop
+
fixupUnusualReposAfterInit :: Annex ()
fixupUnusualReposAfterInit = do
gc <- Annex.getGitConfig
diff --git a/CHANGELOG b/CHANGELOG
index 92a10ca152..fb965ce70c 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,5 +1,7 @@
git-annex (10.20251216) UNRELEASED; urgency=medium
+ * New git configs annex.defaultwanted, annex.defaultrequired, and
+ annex.defaultgroups.
* Fix bug that could result in a tree imported from a remote containing
missing git blobs.
* fix: Populate unlocked pointer files in situations where a git command,
diff --git a/Command/InitRemote.hs b/Command/InitRemote.hs
index eda978cfea..6fd6a0d75c 100644
--- a/Command/InitRemote.hs
+++ b/Command/InitRemote.hs
@@ -1,6 +1,6 @@
{- git-annex command
-
- - Copyright 2011-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -22,6 +22,7 @@ import Types.ProposedAccepted
import Config
import Git.Config
import Git.Types
+import Annex.Init
import qualified Data.Map as M
import qualified Data.Text as T
@@ -127,6 +128,7 @@ cleanup t u name c o = do
case sameas o of
Nothing -> do
describeUUID u (toUUIDDesc name)
+ propigateDefaultGitConfigs u
Logs.Remote.configSet u c
Just _ -> do
cu <- liftIO genUUID
diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs
index 4303c09961..9057989495 100644
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -1,6 +1,6 @@
{- git-annex configuration
-
- - Copyright 2012-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2012-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -51,6 +51,7 @@ import Types.RepoVersion
import Types.StallDetection
import Types.View
import Types.Cluster
+import Types.Group
import Config.DynamicConfig
import Utility.HumanTime
import Utility.Gpg (GpgCmd, mkGpgCmd)
@@ -171,6 +172,9 @@ data GitConfig = GitConfig
, annexViewUnsetDirectory :: ViewUnset
, annexClusters :: M.Map RemoteName ClusterUUID
, annexFullyBalancedThreshhold :: Double
+ , annexDefaultWanted :: Maybe String
+ , annexDefaultRequired :: Maybe String
+ , annexDefaultGroups :: [Group]
}
extractGitConfig :: ConfigSource -> Git.Repo -> GitConfig
@@ -284,7 +288,7 @@ extractGitConfig configsource r = GitConfig
(getmayberead (annexConfig "adjustedbranchrefresh"))
, annexSupportUnlocked = getbool (annexConfig "supportunlocked") True
, annexAssistantAllowUnlocked = getbool (annexConfig "assistant.allowunlocked") False
- , annexTrashbin = getmaybe "annex.trashbin"
+ , annexTrashbin = getmaybe (annexConfig "trashbin")
, coreSymlinks = getbool "core.symlinks" True
, coreSharedRepository = getSharedRepository r
, coreQuotePath = QuotePath (getbool "core.quotepath" True)
@@ -315,6 +319,10 @@ extractGitConfig configsource r = GitConfig
, annexFullyBalancedThreshhold =
fromMaybe 0.9 $ (/ 100) <$> getmayberead
(annexConfig "fullybalancedthreshhold")
+ , annexDefaultWanted = getmaybe (annexConfig "defaultwanted")
+ , annexDefaultRequired = getmaybe (annexConfig "defaultrequired")
+ , annexDefaultGroups = map (Group . encodeBS) $
+ getwords (annexConfig "defaultgroups")
}
where
getbool k d = fromMaybe d $ getmaybebool k
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index c325adc9c1..bf69cc4438 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -1061,6 +1061,26 @@ repository, using [[git-annex-config]]. See its man page for a list.)
If this is set to `true` then it will instead use the `annex.addunlocked`
configuration to decide which files to add unlocked.
+* `annex.defaultwanted`
+
+ When this is set to a preferred content expression, all
+ new repositories (and special remotes) will have it copied into their
+ configuration when initialized, the same as if you had run
+ [[git-annex-wanted]](1).
+
+* `annex.defaultrequired`
+
+ When this is set to a preferred content expression, all
+ new repositories (and special remotes) will have it copied into their
+ configuration when initialized, the same as if you had run
+ [[git-annex-required]](1).
+
+* `annex.defaultgroups`
+
+ When this is set to a list of groups (separated by whitespace), all
(Diff truncated)
comment
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment
new file mode 100644
index 0000000000..b9f723232c
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment
@@ -0,0 +1,72 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2026-01-12T15:54:26Z"
+ content="""
+Seems I really dropped the ball on following up to this one. On the other
+hand, it seems a lot of things need to be thought through still..
+
+---
+
+I suppose there are two ways a default preferred content config could work:
+
+1. Something that gets set in the repository's config at `git-annex init`
+ (or autoinit) time, when the repository does not already have a
+ preferred content setting. Also at `git-annex initremote` time for
+ special remotes.
+2. Something that is used rather than the current default of ""
+ when a repository does not have a preferred content setting.
+
+With option #1, it gets baked into the repo, while with option #2 you can
+change a single git config later and it affects whatever repos.
+
+Pretty sure people have been wanting option #1.
+
+And option #2 seems to have a problem, that git-annex could see different
+preferred content settings for the same repository when run in different
+places. Which could result in a churn of content being added to a
+repository, and later dropped from it.
+
+So option #1 seems like the right one.
+
+---
+
+Looking back at the original request, there was the idea that
+`git annex config` could set the default.
+
+Every `git annex config` setting needs to be considered for
+security and unwanted behavior.
+
+As far as security goes, if someone can set `git-annex config`,
+they can just go in and change the preferred content settings of any
+repository. So no difference?
+
+Well, there is a small one. If I have made a clone of a repository,
+I may be hiding the existence of that repository from others.
+So nobody knows its uuid, and so they cannot change its preferred content
+setting. But with `git-annex config` allowing overriding the default,
+I'd risk a pull from origin changing it.
+
+Which, since the default is to want all files, must change my repo
+to want fewer files.
+
+So for this to be an actual security problem, I would need to be relying
+on my repository getting all files for some security reason. Which could be
+auditing the content of annexed files. As the auditing repository, I want
+it to get every file that passes through origin. And by foolishly relying
+on the current default preferred content (which after all joey seems like
+he's never gonna get around to changing!), I open myself up to an attacker
+breaking my auditing process.
+
+That's a bit tortured, but it does seem to argue against making this
+a `git-annex config` setting.
+
+----
+
+The original request also included annex.defaultgroupwanted ...
+I don't see how that would work. groupwanted varies by group, it does
+not make sense to have a default that works across groups.
+
+It does seem to make sense to allow annex.defaultgroup to set the default
+group(s) of a new repository.
+"""]]
comment
diff --git a/doc/bugs/git_annex_get_is_silently_stuck_on__P2P___62___GET_0/comment_4_99ccacee9746b3b30980c5f94f5ebf49._comment b/doc/bugs/git_annex_get_is_silently_stuck_on__P2P___62___GET_0/comment_4_99ccacee9746b3b30980c5f94f5ebf49._comment
new file mode 100644
index 0000000000..ea7704974a
--- /dev/null
+++ b/doc/bugs/git_annex_get_is_silently_stuck_on__P2P___62___GET_0/comment_4_99ccacee9746b3b30980c5f94f5ebf49._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2026-01-12T15:31:48Z"
+ content="""
+Makes sense it could be locking. As part of recording the currently running
+transfer, a lock is held.
+
+Pid locking still involves a regular unix lock, the side lock, which is in
+/dev/shm or /tmp. So I guess it could be that /tmp is on nfs and lockd
+misbehaving caused the problem?
+"""]]
Added a comment
diff --git a/doc/design/external_special_remote_protocol/comment_61_f214c6f610a2be1beec00a973e3ed994._comment b/doc/design/external_special_remote_protocol/comment_61_f214c6f610a2be1beec00a973e3ed994._comment
new file mode 100644
index 0000000000..da81bf72a6
--- /dev/null
+++ b/doc/design/external_special_remote_protocol/comment_61_f214c6f610a2be1beec00a973e3ed994._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="Katie"
+ avatar="http://cdn.libravatar.org/avatar/38e04123b913160b66d8117cada14532"
+ subject="comment 61"
+ date="2026-01-11T06:18:07Z"
+ content="""
+Thanks a lot for the quick fix, Joey!
+"""]]
external: Respond to GETGITREMOTENAME during INITREMOTE with the remote name
diff --git a/CHANGELOG b/CHANGELOG
index d7f8ec5d7b..92a10ca152 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -7,6 +7,8 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
* Pass www-authenticate headers in to to git credential, to support
eg, git-credential-oauth.
* import: Fix display of some import errors.
+ * external: Respond to GETGITREMOTENAME during INITREMOTE with the remote
+ name.
* When displaying sqlite error messages, include the path to the database.
* webapp: Remove support for local pairing; use wormhole pairing instead.
* git-annex.cabal: Removed pairing build flag, and no longer depends
diff --git a/Remote/External.hs b/Remote/External.hs
index d9871eaf41..87a23a2b9e 100644
--- a/Remote/External.hs
+++ b/Remote/External.hs
@@ -1,6 +1,6 @@
{- External special remote interface.
-
- - Copyright 2013-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2013-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -193,7 +193,7 @@ externalSetup externalprogram setgitconfig ss mu remotename _ c gc = do
else do
pc' <- either giveup return $ parseRemoteConfig c' (lenientRemoteConfigParser externalprogram)
let p = fromMaybe (ExternalType externaltype) externalprogram
- external <- newExternal p (Just u) pc' (Just gc) Nothing Nothing
+ external <- newExternal p (Just u) pc' (Just gc) (Just remotename) Nothing
-- Now that we have an external, ask it to LISTCONFIGS,
-- and re-parse the RemoteConfig strictly, so we can
-- error out if the user provided an unexpected config.
@@ -953,3 +953,4 @@ remoteConfigParser externalprogram c
where
isproposed (Accepted _) = False
isproposed (Proposed _) = True
+
diff --git a/doc/design/external_special_remote_protocol.mdwn b/doc/design/external_special_remote_protocol.mdwn
index 5a1f9fa969..f79b8230ae 100644
--- a/doc/design/external_special_remote_protocol.mdwn
+++ b/doc/design/external_special_remote_protocol.mdwn
@@ -379,6 +379,9 @@ handling a request.
passed to `git-annex initremote` and `enableremote`, but it is possible
for git remotes to be renamed, and this will provide the remote's current
name.
+ If this is used during INITREMOTE, the git remote may not be
+ configured yet. (Older versions of git-annex responded with an ERROR
+ when this is used during INITREMOTE.)
(git-annex replies with VALUE followed by the name.)
This message is a protocol extension; it's only safe to send it to
git-annex after it sent an `EXTENSIONS` that included `GETGITREMOTENAME`.
diff --git a/doc/design/external_special_remote_protocol/comment_60_92ddb8c0da5260c619467ba8a5bf753c._comment b/doc/design/external_special_remote_protocol/comment_60_92ddb8c0da5260c619467ba8a5bf753c._comment
new file mode 100644
index 0000000000..12aa7212e7
--- /dev/null
+++ b/doc/design/external_special_remote_protocol/comment_60_92ddb8c0da5260c619467ba8a5bf753c._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: How do I get GETGITREMOTENAME to work in INITREMOTE?"""
+ date="2026-01-09T17:26:59Z"
+ content="""
+@Katie, thanks for pointing out that doesn't work. I was able to fix that,
+so check out a daily build.
+"""]]
comment
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_4_f6d3abcc128796acc7ccfa50a3d0f907._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_4_f6d3abcc128796acc7ccfa50a3d0f907._comment
new file mode 100644
index 0000000000..0d8cdfb882
--- /dev/null
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_4_f6d3abcc128796acc7ccfa50a3d0f907._comment
@@ -0,0 +1,35 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2026-01-08T19:46:02Z"
+ content="""
+Unfortunately, that design doesn't optimize the preferred content
+expression that you were wanting to use:
+
+`include=docs/* or (include=*.md and exclude=*/*)`
+
+In this case, the exclude limits the include to md files in the top directory,
+not subdirectories, but with the current design it will recurse and find
+all files to handle the `include=*.md`.
+
+To optimise that, it needs to look at when includes are ANDed with
+excludes. With `"exclude=*/*"`, only files in the root directory can match,
+and those are always listed. So, that include can be filtered out before
+step #3 above.
+
+The other cases of excludes that can be ANDed with an include are:
+
+* `exclude=bar/*` -- This needs to do a full listing, same reasons I
+ discussed in comment 2.
+* `exclude=*/foo.*` -- Also needs a full listing.
+* `exclude=foo` -- Also needs a full listing.
+* `exclude=foo.*` -- Also needs a full listing.
+* `exclude=*[/]*` -- Same as "exclude=*/*"
+* `exclude=*[//]*` -- Same (and so on for other numbers of slashes).
+* `exclude=*/**` -- Same (and so on for more asterisks in the front or back)
+* `exclude=*[/]**` -- Same (and so on for more slashes and asterisks in the
+ front or back)
+* `exclude=*` -- Pointless to AND with an include since the combination
+ can never match. May as well optimise it anyway by avoiding a full listing.
+* `exclude=**` -- Same as above (and so on)
+"""]]
correction
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment
index a7cca3fb31..fbdcd966f4 100644
--- a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment
@@ -3,11 +3,14 @@
subject="""comment 1"""
date="2026-01-08T13:49:52Z"
content="""
-Paths in preferred content expressions match relative to the top, so
-this preferred content expression will match only md files in the top,
+This preferred content expression will match only md files in the top,
and files in the docs subdirectory:
-`include=docs/* or include=*.md`
+`include=docs/* or (include=*.md and exclude=*/*)`
+
+I got this wrong at first; this version will work! The `"include=*.md"`
+matches files with that extension anywhere in the tree, so the `"exclude=*/*`
+is needed to limit to ones not in a subdirectory.
Only preferred content is downloaded, but S3 is still queried for the
entire list of files in the bucket.
markdown
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment
index b1a0c2585c..4d32e682d6 100644
--- a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment
@@ -12,11 +12,11 @@ subdirectories. Eg, if the bucket contains "foo", "bar/...", and "baz/...",
the response will list only the file "foo", and CommonPrefixes contains
"bar" and "baz".
-So, git-annex could make that request, and then if "include=bar/*" is not
-in preferred content, but "include=foo/*" is, it could make a request to
+So, git-annex could make that request, and then if `"include=bar/*"` is not
+in preferred content, but `"include=foo/*"` is, it could make a request to
list files prefixed by "foo/". And so avoid listing all the files in "bar".
-If preferred content contained "include=foo/x/*" and "include=foo/y/*",
+If preferred content contained `"include=foo/x/*"` and `"include=foo/y/*"`,
when CommonPrefixes includes "foo", git-annex could follow up with 2 requests
to list those subdirectories.
@@ -24,7 +24,7 @@ So this ends up making at most 1 additional request per subdirectory included
in preferred content.
When preferred content excludes a subdirectory though, more requests would
-be needed. For "exclude=bar/*", if the response lists 100 other
+be needed. For `"exclude=bar/*"`, if the response lists 100 other
subdirectories in CommonPrefixes, it would need to make 100 separate
requests to list those while avoiding listing bar. That could easily be
more expensive than the current behavior. So it does not seem to make sense
markdown
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment
index e18502378a..361471327b 100644
--- a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment
@@ -5,12 +5,12 @@
content="""
There are some complications in possible preferred content expressions:
-"include=foo*/*" -- we want "foo/*" but also "foooooom/*"... but what if
+`"include=foo*/*"` -- we want `"foo/*"` but also `"foooooom/*"`... but what if
there are 100 such subdirectories? It would be an unexpected cost to need
to make so many requests. Like exclude=, the optimisation should not be
used in this case.
-"include=foo/bar" -- we want only this file.. so would prefer to avoid
+`"include=foo/bar"` -- we want only this file.. so would prefer to avoid
recursing through the rest of foo. If there are multiple ones like this
that are all in the same subdirectory, it might be nice to make
one single request to find them all. But this seems like an edge case,
@@ -22,16 +22,16 @@ Here's a design:
2. Filter for "include=" that contain a "/" in the value. If none are
found, do the usual full listing of the bucket.
3. If any of those includes contain a glob before a "/", do the usual full
- listing of the bucket. (This handles the "include=foo*/* case)
+ listing of the bucket. (This handles the `"include=foo*/*"` case)
4. Otherwise, list the top level of the bucket with delimiter set to "/".
5. Include all the top-level files in the list.
6. Filter the includes to ones that start with a subdirectory in the
CommonPrefixes.
7. For each remaining include, make a request to list the bucket, with
the prefix set to the non-glob directory from the include. For example,
- for "include=foo/bar/*", set prefix to "foo/bar/", but for
- "include=foo/*bar", set prefix to "foo/". And for "include=foo/bar",
- set prefix to "foo/".
+ for `"include=foo/bar/*"`, set prefix to `"foo/bar/"`, but for
+ `"include=foo/*bar"`, set prefix to `"foo/"`. And for
+ `"include=foo/bar"`, set prefix to `"foo/"`.
8. Add back the prefixes to each file in the responses.
Note that, step #1 hides some complexity, because currently preferred
design
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment
new file mode 100644
index 0000000000..a7cca3fb31
--- /dev/null
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2026-01-08T13:49:52Z"
+ content="""
+Paths in preferred content expressions match relative to the top, so
+this preferred content expression will match only md files in the top,
+and files in the docs subdirectory:
+
+`include=docs/* or include=*.md`
+
+Only preferred content is downloaded, but S3 is still queried for the
+entire list of files in the bucket.
+"""]]
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment
new file mode 100644
index 0000000000..b1a0c2585c
--- /dev/null
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment
@@ -0,0 +1,32 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-01-08T14:16:26Z"
+ content="""
+I do think it would be possible to avoid the overhead of listing the
+contents of subdirectories that are not preferred content. At
+least sometimes.
+
+When a bucket is listed with a "/" delimiter, S3 does not recurse into
+subdirectories. Eg, if the bucket contains "foo", "bar/...", and "baz/...",
+the response will list only the file "foo", and CommonPrefixes contains
+"bar" and "baz".
+
+So, git-annex could make that request, and then if "include=bar/*" is not
+in preferred content, but "include=foo/*" is, it could make a request to
+list files prefixed by "foo/". And so avoid listing all the files in "bar".
+
+If preferred content contained "include=foo/x/*" and "include=foo/y/*",
+when CommonPrefixes includes "foo", git-annex could follow up with 2 requests
+to list those subdirectories.
+
+So this ends up making at most 1 additional request per subdirectory included
+in preferred content.
+
+When preferred content excludes a subdirectory though, more requests would
+be needed. For "exclude=bar/*", if the response lists 100 other
+subdirectories in CommonPrefixes, it would need to make 100 separate
+requests to list those while avoiding listing bar. That could easily be
+more expensive than the current behavior. So it does not seem to make sense
+to try to optimise handling of excludes.
+"""]]
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment
new file mode 100644
index 0000000000..e18502378a
--- /dev/null
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment
@@ -0,0 +1,42 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2026-01-08T14:46:44Z"
+ content="""
+There are some complications in possible preferred content expressions:
+
+"include=foo*/*" -- we want "foo/*" but also "foooooom/*"... but what if
+there are 100 such subdirectories? It would be an unexpected cost to need
+to make so many requests. Like exclude=, the optimisation should not be
+used in this case.
+
+"include=foo/bar" -- we want only this file.. so would prefer to avoid
+recursing through the rest of foo. If there are multiple ones like this
+that are all in the same subdirectory, it might be nice to make
+one single request to find them all. But this seems like an edge case,
+and one request per include is probably acceptable.
+
+Here's a design:
+
+1. Get preferred content expression of the remote.
+2. Filter for "include=" that contain a "/" in the value. If none are
+ found, do the usual full listing of the bucket.
+3. If any of those includes contain a glob before a "/", do the usual full
+ listing of the bucket. (This handles the "include=foo*/* case)
+4. Otherwise, list the top level of the bucket with delimiter set to "/".
+5. Include all the top-level files in the list.
+6. Filter the includes to ones that start with a subdirectory in the
+ CommonPrefixes.
+7. For each remaining include, make a request to list the bucket, with
+ the prefix set to the non-glob directory from the include. For example,
+ for "include=foo/bar/*", set prefix to "foo/bar/", but for
+ "include=foo/*bar", set prefix to "foo/". And for "include=foo/bar",
+ set prefix to "foo/".
+8. Add back the prefixes to each file in the responses.
+
+Note that, step #1 hides some complexity, because currently preferred
+content is loaded and parsed to a MatchFiles, which does not allow
+introspecting to get the expression. Since we only care about include
+expressions, it would suffice to add to MatchFiles a
+`matchInclude :: Maybe String` which gets set for includes.
+"""]]
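Steps 3 and 7 of the design in comment 3 come down to two small string operations on the value of an `include=`. What follows is only a minimal Haskell sketch of those two steps, not code from git-annex: `globBeforeSlash` and `listingPrefix` are hypothetical names, and treating `*`, `?` and `[` as the glob metacharacters is an assumption.

```haskell
import Data.List (dropWhileEnd)

-- Characters assumed (for this sketch) to start a glob in an include= value.
globChars :: String
globChars = "*?["

-- Step 3: True when a glob appears before some "/", as in "foo*/*".
-- Such an include would force the usual full listing of the bucket.
globBeforeSlash :: String -> Bool
globBeforeSlash = elem '/' . dropWhile (`notElem` globChars)

-- Step 7: the prefix to list with, which is the non-glob directory part
-- of the value. Keep everything before the first glob character, then
-- cut back to the last "/" (inclusive).
listingPrefix :: String -> String
listingPrefix = dropWhileEnd (/= '/') . takeWhile (`notElem` globChars)
```

With these definitions, `listingPrefix "foo/bar/*"` is `"foo/bar/"`, while `"foo/*bar"` and `"foo/bar"` both give `"foo/"`, matching the examples in step 7. An include with no directory part, such as `"*.md"`, gives the empty prefix.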
Added a comment: How do I get GETGITREMOTENAME to work in INITREMOTE?
diff --git a/doc/design/external_special_remote_protocol/comment_59_ca91c66cf172e0e859dfe6c6e8d62dd3._comment b/doc/design/external_special_remote_protocol/comment_59_ca91c66cf172e0e859dfe6c6e8d62dd3._comment
new file mode 100644
index 0000000000..657eca57b2
--- /dev/null
+++ b/doc/design/external_special_remote_protocol/comment_59_ca91c66cf172e0e859dfe6c6e8d62dd3._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="Katie"
+ avatar="http://cdn.libravatar.org/avatar/38e04123b913160b66d8117cada14532"
+ subject="How do I get GETGITREMOTENAME to work in INITREMOTE?"
+ date="2026-01-07T23:37:01Z"
+ content="""
+I am writing a external special remote using this protocol. This is little similar to the directory remote and there's a path on the local system where content is stored.
+
+I don't want this location to be saved in the git-annex branch and I thought I'll be able to use GETGITREMOTENAME to persist it myself. However, I'm running into an issue where GETGITREMOTENAME fails during INITREMOTE (presumably since the remote has not yet been created). It does work during Prepare, but that feels a bit late to ask for a required piece of configuration.
+
+What are my options? My ideal behavior would be if it behaves very similar to `directory=` field in directory remote, but I can hand-manage it too if that's the recommendation as long as I get some identifier for this remote (there can be multiple of these in the same repo)
+"""]]
desire for a limited import/export.
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree.mdwn b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree.mdwn
new file mode 100644
index 0000000000..3eacf23742
--- /dev/null
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree.mdwn
@@ -0,0 +1,6 @@
+I wanted to implement management and synchronization of descriptive files (README.md etc) on the top of the large S3 bucket via git-annex so I could keep files in a git repo and rely on importree/exporttree functionality to keep bucket and repo in sync.
+
+Looking at [special_remotes/S3/](https://git-annex.branchable.com/special_remotes/S3/) I didn't spot any option to achieve that.
+
+I am not sure what would be the best option for this, given that greedy me might want to also eventually `sync` some `docs/` prefix there: may be could be a white list of some keys/paths to include and/or exclude? May be some [preferred content](https://git-annex.branchable.com/preferred_content/) `include` expression could be specific enough to not demand full bucket traversal (unrealistic in feasible time) but rather limit to top level, e.g. `include=^docs/ and include=^*.md` or smth smarter?
+
Pass www-authenticate headers in to git credential
To support eg, git-credential-oauth.
diff --git a/Annex/Url.hs b/Annex/Url.hs
index 1cc742f522..6d0cb43767 100644
--- a/Annex/Url.hs
+++ b/Annex/Url.hs
@@ -157,7 +157,7 @@ withUrlOptions :: Maybe RemoteGitConfig -> (U.UrlOptions -> Annex a) -> Annex a
withUrlOptions mgc a = a =<< getUrlOptions mgc
-- When downloading an url, if authentication is needed, uses
--- git-credential to prompt for username and password.
+-- git-credential for the prompting.
--
-- Note that, when the downloader is curl, it will not use git-credential.
-- If the user wants to, they can configure curl to use a netrc file that
@@ -169,8 +169,8 @@ withUrlOptionsPromptingCreds mgc a = do
prompter <- mkPrompter
cc <- Annex.getRead Annex.gitcredentialcache
a $ uo
- { U.getBasicAuth = \u -> prompter $
- getBasicAuthFromCredential g cc u
+ { U.getBasicAuth = \u respheaders -> prompter $
+ getBasicAuthFromCredential g cc u respheaders
}
checkBoth :: U.URLString -> Maybe Integer -> U.UrlOptions -> Annex Bool
diff --git a/CHANGELOG b/CHANGELOG
index 8d8605dca5..39e6e628a2 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -13,6 +13,8 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
* import: Fix display of some import errors.
* Fix bug that could result in a tree imported from a remote containing
missing git blobs.
+ * Pass www-authenticate headers in to to git credential, to support
+ eg, git-credential-oauth.
-- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400
diff --git a/Git/Credential.hs b/Git/Credential.hs
index 379fe585b0..1b69381996 100644
--- a/Git/Credential.hs
+++ b/Git/Credential.hs
@@ -1,6 +1,6 @@
{- git credential interface
-
- - Copyright 2019-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2019-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -19,6 +19,8 @@ import Utility.Url.Parse
import qualified Data.Map as M
import Network.URI
+import Network.HTTP.Types
+import Network.HTTP.Types.Header
import Control.Concurrent.STM
data Credential = Credential { fromCredential :: M.Map String String }
@@ -35,7 +37,7 @@ credentialBasicAuth cred = BasicAuth
<*> credentialPassword cred
getBasicAuthFromCredential :: Repo -> TMVar CredentialCache -> GetBasicAuth
-getBasicAuthFromCredential r ccv u = do
+getBasicAuthFromCredential r ccv u respheaders = do
(CredentialCache cc) <- atomically $ readTMVar ccv
case mkCredentialBaseURL r u of
Just bu -> case M.lookup bu cc of
@@ -44,8 +46,8 @@ getBasicAuthFromCredential r ccv u = do
let storeincache = \c -> atomically $ do
CredentialCache cc' <- takeTMVar ccv
putTMVar ccv (CredentialCache (M.insert bu c cc'))
- go storeincache =<< getUrlCredential u r
- Nothing -> go (const noop) =<< getUrlCredential u r
+ go storeincache =<< getUrlCredential u respheaders r
+ Nothing -> go (const noop) =<< getUrlCredential u respheaders r
where
go storeincache c =
case credentialBasicAuth c of
@@ -61,8 +63,9 @@ getBasicAuthFromCredential r ccv u = do
-- | This may prompt the user for the credential, or get a cached
-- credential from git.
-getUrlCredential :: URLString -> Repo -> IO Credential
-getUrlCredential = runCredential "fill" . urlCredential
+getUrlCredential :: URLString -> ResponseHeaders -> Repo -> IO Credential
+getUrlCredential url respheaders = runCredential "fill" $
+ urlCredential url respheaders
-- | Call if the credential the user entered works, and can be cached for
-- later use if git is configured to do so.
@@ -73,8 +76,12 @@ approveUrlCredential c = void . runCredential "approve" c
rejectUrlCredential :: Credential -> Repo -> IO ()
rejectUrlCredential c = void . runCredential "reject" c
-urlCredential :: URLString -> Credential
-urlCredential = Credential . M.singleton "url"
+urlCredential :: URLString -> ResponseHeaders -> Credential
+urlCredential url respheaders = Credential $ M.fromList $
+ ("url", url) : map wwwauth (filter iswwwauth respheaders)
+ where
+ iswwwauth (h, _) = h == hWWWAuthenticate
+ wwwauth (_, v) = ("wwwauth[]", decodeBS v)
runCredential :: String -> Credential -> Repo -> IO Credential
runCredential action input r =
diff --git a/P2P/Http/Client.hs b/P2P/Http/Client.hs
index 024fce2242..1588728850 100644
--- a/P2P/Http/Client.hs
+++ b/P2P/Http/Client.hs
@@ -2,7 +2,7 @@
-
- https://git-annex.branchable.com/design/p2p_protocol_over_http/
-
- - Copyright 2024 Joey Hess <id@joeyh.name>
+ - Copyright 2024-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -42,7 +42,7 @@ import Servant hiding (BasicAuthData(..))
import Servant.Client.Streaming
import qualified Servant.Types.SourceT as S
import Network.HTTP.Types.Status
-import Network.HTTP.Client
+import Network.HTTP.Client hiding (responseHeaders)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy.Internal as LI
import qualified Data.Map as M
@@ -52,6 +52,7 @@ import Control.Concurrent
import System.IO.Unsafe
import Data.Time.Clock.POSIX
import qualified Data.ByteString.Lazy as L
+import Data.Foldable (toList)
type ClientAction a
= ClientEnv
@@ -119,7 +120,7 @@ p2pHttpClientVersions' allowedversion rmt rmtrepo fallback clientaction =
go clientenv mcred credcached mauth vs
| statusCode (responseStatusCode resp) == 401 ->
case mcred of
- Nothing -> authrequired clientenv (v:vs)
+ Nothing -> authrequired clientenv resp (v:vs)
Just cred -> do
inRepo $ Git.rejectUrlCredential cred
Just <$> fallback (showstatuscode resp)
@@ -134,9 +135,10 @@ p2pHttpClientVersions' allowedversion rmt rmtrepo fallback clientaction =
catchclienterror a = a `catch` \(ex :: ClientError) -> pure (Left ex)
- authrequired clientenv vs = do
+ authrequired clientenv resp vs = do
+ let respheaders = toList $ responseHeaders resp
cred <- prompt $
- inRepo $ Git.getUrlCredential credentialbaseurl
+ inRepo $ Git.getUrlCredential credentialbaseurl respheaders
go clientenv (Just cred) False (credauth cred) vs
showstatuscode resp =
diff --git a/Remote/GitLFS.hs b/Remote/GitLFS.hs
index 2ec2f429d7..89d70b6e91 100644
--- a/Remote/GitLFS.hs
+++ b/Remote/GitLFS.hs
@@ -316,7 +316,10 @@ discoverLFSEndpoint tro h =
resp <- makeSmallAPIRequest testreq
if needauth (responseStatus resp)
then do
- cred <- prompt $ inRepo $ Git.getUrlCredential (show lfsrepouri)
+ cred <- prompt $ inRepo $
+ Git.getUrlCredential
+ (show lfsrepouri)
+ (responseHeaders resp)
let endpoint' = addbasicauth (Git.credentialBasicAuth cred) endpoint
let testreq' = LFS.startTransferRequest endpoint' transfernothing
flip catchNonAsync (const (returnendpoint endpoint')) $ do
diff --git a/Utility/Url.hs b/Utility/Url.hs
index d98ade2738..c40a3ee748 100644
--- a/Utility/Url.hs
+++ b/Utility/Url.hs
@@ -281,7 +281,7 @@ getUrlInfo url uo = case parseURIRelaxed url of
fn <- extractFromResourceT (extractfilename resp)
return $ found len fn
else if responseStatus resp == unauthorized401
- then return $ getBasicAuth uo' (show (getUri req)) >>= \case
+ then return $ getBasicAuth uo' (show (getUri req)) (responseHeaders resp) >>= \case
Nothing -> return dne
Just (ba, signalsuccess) -> do
ui <- existsconduit'
@@ -476,7 +476,7 @@ downloadConduit meterupdate iv req file uo =
else do
rf <- extractFromResourceT (respfailure resp)
if responseStatus resp == unauthorized401
- then return $ getBasicAuth uo (show (getUri req')) >>= \case
+ then return $ getBasicAuth uo (show (getUri req')) (responseHeaders resp) >>= \case
Nothing -> giveup rf
Just ba -> retryauthed ba
else return $ giveup rf
@@ -516,7 +516,7 @@ downloadConduit meterupdate iv req file uo =
else do
rf <- extractFromResourceT (respfailure resp)
if responseStatus resp == unauthorized401
- then return $ getBasicAuth uo (show (getUri req'')) >>= \case
(Diff truncated)
sig
diff --git a/doc/todo/support_push_to_create.mdwn b/doc/todo/support_push_to_create.mdwn
index 2bda5e2520..50d9cd0a65 100644
--- a/doc/todo/support_push_to_create.mdwn
+++ b/doc/todo/support_push_to_create.mdwn
@@ -30,4 +30,4 @@ remotes that don't have a UUID. This would slow down pushes to eg github slightl
since it would ignore annex-ignore being set, and re-probe the git config
to see if a UUID has appeared. That seems a small enough price to pay.
-The assistant would also need to be made to handle this. jjjj
+The assistant would also need to be made to handle this. --[[Joey]]
break todo out of bug report
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment
index e6fed5674d..cba034b3da 100644
--- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment
+++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment
@@ -10,4 +10,6 @@ than "push to create".
I do think my idea in comment #2 would be better than how you implemented
that. But it's also not directly relevant to this bug report.
+
+I did open [[todo/support_push_to_create]].
"""]]
diff --git a/doc/todo/support_push_to_create.mdwn b/doc/todo/support_push_to_create.mdwn
new file mode 100644
index 0000000000..2bda5e2520
--- /dev/null
+++ b/doc/todo/support_push_to_create.mdwn
@@ -0,0 +1,33 @@
+"push to create" as supported by eg Forgejo makes a `git push` to a new
+git repository create the repository.
+
+Since the repository does not exist when git-annex probes the UUID,
+which happens before any push, annex-ignore is set to true.
+So a command like `git-annex push` will do the git push and create the
+repository, but fail to discover the uuid of that repository, and so
+not send annexed files to it.
+
+forgejo-aneksajo has worked around this by making git-annex's request for
+"$url/config" create the repository. See:
+
+* <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/commit/3c53e9803de9c59e9e78ac19f0bb107651bb48f8>
+* <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/85>
+* <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/83#issuecomment-5093679> and following comments
+
+But that means that `git-annex pull` will also auto-create the repository.
+Or even a command like `git-annex info` that does UUID discovery of a newly
+added remote.
+
+git-annex could support push to create better by having `git-annex push`,
+after pushing the git branches, regenerate the remote list, while
+ignoring the annex-ignore configuration of remotes.
+So if the branch push created the git repo, any annex uuid that the
+new repo has would be discovered at that point. (And at that point annex-ignore
+would need to be cleared.)
+
+The remote list regeneration would only need to be done when there are git
+remotes that don't have a UUID. This would slow down pushes to eg github slightly,
+since it would ignore annex-ignore being set, and re-probe the git config
+to see if a UUID has appeared. That seems a small enough price to pay.
+
+The assistant would also need to be made to handle this. jjjj
followup
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment new file mode 100644 index 0000000000..e6fed5674d --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2026-01-07T16:45:48Z" + content=""" +> Forgejo-aneksajo also creates the repository for requests to /config, and will git-annex-init it if the request comes from a git-annex user agent and the user has write permissions. + +Hmm, then `git-annex pull` will create a repository. Which is going further +than "push to create". + +I do think my idea in comment #2 would be better than how you implemented +that. But it's also not directly relevant to this bug report. +"""]] diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_7_13b7c0b807f6b19be1d2b097fe597f5c._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_7_13b7c0b807f6b19be1d2b097fe597f5c._comment new file mode 100644 index 0000000000..a7b472c66a --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_7_13b7c0b807f6b19be1d2b097fe597f5c._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2026-01-07T16:47:38Z" + content=""" +The www-authenticate header is also sent when the request for `/config` is +a 401. So git-annex can use that to set the wwwauth field. + +The capability fields are indicating capabilities of git. +I checked and git-credential-oauth does not rely on those capabilities. 
+ +(Wildly, git-credential-oauth is looking for "GitLab", "GitHub", and +"Gitea" in order to sniff what backend it's authenticating to, and that's +all it uses the wwwauth for.) +"""]]
Added a comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_5_53cc071de11aa604e6eecb68ce15baba._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_5_53cc071de11aa604e6eecb68ce15baba._comment new file mode 100644 index 0000000000..ebb2a6c868 --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_5_53cc071de11aa604e6eecb68ce15baba._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 5" + date="2026-01-06T17:47:53Z" + content=""" +`git push` seems to first make a GET request for something like `/m.risse/test-push-oauth2.git/info/refs?service=git-receive-pack`, which responds with a 401 and `www-authenticate: Basic realm=\"Gitea\"` among the headers. Git then seems to pass this information on to the git-credential-helper. + +`git annex push` likewise receives a 401 response from the `/config` endpoint with the same www-authenticate header, so it could pass it on to the credential helper too. + +I am not sure where the `capability`s are coming from... +"""]]
Added a comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_4_094fe78aaf919e54d5457fb3274a023e._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_4_094fe78aaf919e54d5457fb3274a023e._comment new file mode 100644 index 0000000000..d44c824587 --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_4_094fe78aaf919e54d5457fb3274a023e._comment @@ -0,0 +1,52 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 4" + date="2026-01-06T17:36:19Z" + content=""" +The chicken-and-egg problem you are describing is actually something msz has already encountered and reported, but that issue is fixed: Forgejo-aneksajo also creates the repository for requests to /config, and will git-annex-init it if the request comes from a git-annex user agent and the user has write permissions. More about that here: + +- <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/commit/3c53e9803de9c59e9e78ac19f0bb107651bb48f8> +- <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/85> +- <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/83#issuecomment-5093679> and following comments + +So that's not it... I've investigated a bit and I think I led you astray with the comment about a \"non-existing repository\". I am also seeing the issue with a pre-created repository, and even with a pre-created and git-annex-init'ialized repository. 
+ +The issue is actually that for ATRIS I rely on git-credential-oauth's \"Gitea-like-Server\" discovery here: <https://github.com/hickford/git-credential-oauth/blob/f01271d94c70b9280c19f489f90c05e9aba0d757/main.go#L206> + +When doing a `git push origin main` the git-credential-oauth helper actually receives this request: + +``` +$ git push origin main +capability[]=authtype +capability[]=state +protocol=https +host=atris.fz-juelich.de +wwwauth[]=Basic realm=\"Gitea\" +``` + +while with `git annex push` it is just this: + +``` +$ git annex push +protocol=https +host=atris.fz-juelich.de +``` + +Git-credential-oauth recognizes that it is talking to a Gitea/Forgejo server based on this `wwwauth[]=Basic realm=\"Gitea\"` data. Without it and in the absence of a more specific configuration for the server it doesn't try to handle it and falls back to the standard http credential handling of git. I am not sure where these capability and wwwauth fields are coming from, but I think git-annex should somehow do the same as git here... + +--- + +I've gotten at the data git sends to the credential helper with this trivial script: + +``` +$ cat ~/bin/git-credential-echo +#!/usr/bin/env bash + +exec cat >&2 +``` + +and configuring it as my credential helper. + +I have to say, I like this pattern of processes communicating over simple line-based protocols :) +"""]]
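To illustrate why the missing `wwwauth[]` line matters, here is a small shell sketch of a helper-side check that reads the credential request on stdin and guesses the backend from it. This is only an approximation of the sniffing described in the comments, not the actual git-credential-oauth code, and `sniff_backend` is a made-up name.

```sh
#!/bin/sh
# Hypothetical sketch: read a git credential request (key=value lines)
# on stdin and print which backend name, if any, appears in a wwwauth[]
# field. Prints "unknown" when no wwwauth[] line mentions a known name.
sniff_backend() {
    backend=unknown
    while IFS= read -r line; do
        case "$line" in
            'wwwauth[]='*GitLab*) backend=GitLab ;;
            'wwwauth[]='*GitHub*) backend=GitHub ;;
            'wwwauth[]='*Gitea*)  backend=Gitea ;;
        esac
    done
    printf '%s\n' "$backend"
}
```

Fed the request that `git push` sends, this prints `Gitea`. Fed the shorter request that `git annex push` sends, it prints `unknown`, which corresponds to the helper falling back to standard credential handling.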
comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_3_570e6b61adef7c2f8ee0dcdcff225f76._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_3_570e6b61adef7c2f8ee0dcdcff225f76._comment new file mode 100644 index 0000000000..43fc603dae --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_3_570e6b61adef7c2f8ee0dcdcff225f76._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-06T17:28:34Z" + content=""" +Looks like the 401 Unauthorized happens for all non-existent repos when accessing `/config`. + +Eg: + + joey@darkstar:~>curl https://atris.fz-juelich.de/m.risse/joeytestmadeup.git + Not found. + joey@darkstar:~>curl https://atris.fz-juelich.de/m.risse/joeytestmadeup.git/config + Unauthorized + +A bug in Forgejo? +"""]]
corrections
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment index 17c86a0550..dd6cd3e520 100644 --- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment @@ -6,24 +6,10 @@ git-annex is actually using git credential here. That's where the "Username for" prompt comes from. -I think that this is a chicken and egg problem. git-annex is doing UUID -discovery, which is the first thing it does when run with a new remote that -does not have a UUID. But the repository does not exist, so has no UUID, -and it won't be created until git push happens. - -Deferring git-annex UUID discovery would avoid the problem, but I think -that would be very complicated if possible at all. - -I wonder if there is some way that git-annex could tell, at the http level, -that this URL does not exist yet? If so, it could avoid doing UUID -discovery. Then `git-annex push` would at least be able to push the git -repo. And then on the next run git-annex would discover the UUID and would -be able to fully use the repository. Not an ideal solution perhaps, since -you would need to `git-annex push` twice in a row to fully populate the -repisitory. - -Looks like the url you gave just 404's, but I'm not sure if I'm seeing -now the same as what you would have seen. +Looks like the url you gave 404's. But git-annex is hitting +`https://atris.fz-juelich.de/m.risse/test1.git/config` and getting a 401 +Unauthorized for that. Which is why it is using git credential. +But I'm not sure if I'm seeing now the same now as what you would have seen. 
@matrs Any chance you could give me access to reproduce this using your server so I could look into that? diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment index eb8396320a..213f93e4a2 100644 --- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment @@ -3,16 +3,16 @@ subject="""comment 2""" date="2026-01-06T16:39:40Z" content=""" -The chicken and egg problem could be solved by making `git-annex push`, -after pushing the git branches, regenerate the remote list. So if the -branch push created the git repo, any annex uuid that the new repo has -would be discovered at that point. +If the server sent back 404 for the /config hit, then the early UUID +discovery would not prompt with git credential. + +Then, to make "push to create" work smoothly, `git-annex push`, +after pushing the git branches, could regenerate the remote list. So if +the branch push created the git repo, any annex uuid that the new repo +has would be discovered at that point. The remote list regeneration would only need to be done when there are git remotes that don't have a UUID yet. The assistant would also need to be made to do that. - -This, combined with avoiding prompting on 404 in -UUID discovery would make "push to create" work smoothly. """]]
update
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment index 654dc7a04c..eb8396320a 100644 --- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment @@ -13,7 +13,6 @@ git remotes that don't have a UUID yet. The assistant would also need to be made to do that. -This, combined with avoiding the early -UUID discovery that led to the git-credential prompt, would make -"push to create" work smoothly. +This, combined with avoiding prompting on 404 in +UUID discovery would make "push to create" work smoothly. """]]
comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment new file mode 100644 index 0000000000..654dc7a04c --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2026-01-06T16:39:40Z" + content=""" +The chicken and egg problem could be solved by making `git-annex push`, +after pushing the git branches, regenerate the remote list. So if the +branch push created the git repo, any annex uuid that the new repo has +would be discovered at that point. + +The remote list regeneration would only need to be done when there are +git remotes that don't have a UUID yet. + +The assistant would also need to be made to do that. + +This, combined with avoiding the early +UUID discovery that led to the git-credential prompt, would make +"push to create" work smoothly. +"""]]
verify git sha from ciddb is in git repository
Fix bug that could result in a tree imported from a remote containing
missing git blobs.
When a previous import failed, the cid log still got committed to
the git-annex branch, but no tree was generated.
That leaves GIT keys pointing to blobs that are not attached to any
tree, and so never get pushed anywhere. Running the same import in another
clone of the repository then results in a tree that references blobs that
are missing.
In the unlikely situation where the ciddb contains a git sha that
is not in the git repository, this makes it just re-download the file from
the remote. Which should be no problem, since these are small files.
This does add a performance penalty when importing, since existing
GIT keys have to be verified every time. The penalty is usually small,
but if there are a lot of non-annexed files in the imported tree, it
could be significant.
I don't see any good way to prevent the cid log from getting
committed to the git-annex branch in a failing import. If that could be
done, the check could be avoided.
But since this bug has already affected real world repositories, this check
seems to be needed in any case, to make import do the right thing in those
repositories.
Sponsored-by: Dartmouth College's DANDI project
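The check this commit adds can be approximated from the command line. A minimal sketch, assuming `git cat-file -e` is an acceptable stand-in for the `catObjectMetaData` lookup used in the Haskell code. `present_shas` is a made-up name, not a git-annex command.

```sh
#!/bin/sh
# Hypothetical helper: read git shas on stdin, one per line, and print
# only those that exist as objects in the current repository. A sha that
# is filtered out corresponds to a file that would be re-downloaded from
# the remote.
present_shas() {
    while IFS= read -r sha; do
        if git cat-file -e "$sha" 2>/dev/null; then
            printf '%s\n' "$sha"
        fi
    done
}
```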
diff --git a/Annex/Import.hs b/Annex/Import.hs
index cda82022a9..67b845ddd5 100644
--- a/Annex/Import.hs
+++ b/Annex/Import.hs
@@ -982,14 +982,23 @@ importKeys remote importtreeconfig importcontent thirdpartypopulated importablec
ImportSubTree subdir _ ->
getTopFilePath subdir </> fromImportLocation loc
- getcidkey cidmap db cid = liftIO $
+ getcidkey cidmap db cid = do
-- Avoiding querying the database when it's empty speeds up
-- the initial import.
- if CIDDb.databaseIsEmpty db
+ l <- liftIO $ if CIDDb.databaseIsEmpty db
then getcidkeymap cidmap cid
else CIDDb.getContentIdentifierKeys db rs cid >>= \case
[] -> getcidkeymap cidmap cid
l -> return l
+ filterM validcidkey l
+
+ -- Guard against a content identifier containing a git sha that is
+ -- not present in the repository. This can happen when a previous,
+ -- import failed and the tree was not recorded, and this import is
+ -- being run in another clone of the repository.
+ validcidkey k = case keyGitSha k of
+ Just sha -> isJust <$> catObjectMetaData sha
+ Nothing -> return True
getcidkeymap cidmap cid =
atomically $ maybeToList . M.lookup cid <$> readTVar cidmap
diff --git a/CHANGELOG b/CHANGELOG
index cd6231b075..8d8605dca5 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -11,6 +11,8 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
on network-multicast or network-info.
* stack.yaml: Update to lts-24.26.
* import: Fix display of some import errors.
+ * Fix bug that could result in a tree imported from a remote containing
+ missing git blobs.
-- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs.mdwn b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs.mdwn
index 938d68ba57..deec27e116 100644
--- a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs.mdwn
+++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs.mdwn
@@ -114,4 +114,4 @@ Originally all keys in the bucket
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
-
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_11_b582afc1f538b76cd7605e80fcd43adb._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_11_b582afc1f538b76cd7605e80fcd43adb._comment
index 93b97e6dd6..41a8d49fec 100644
--- a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_11_b582afc1f538b76cd7605e80fcd43adb._comment
+++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_11_b582afc1f538b76cd7605e80fcd43adb._comment
@@ -21,4 +21,10 @@ result:
error: unable to read sha1 file of 1 (d00491fd7e5bb6fa28c517a0bb32b8b506539d4d)
error: unable to read sha1 file of 2 (5716ca5987cbf97d6bb54920bea6adde242d87e6)
error: unable to read sha1 file of 3 (aab959616afa9408f5efc385eb98f63fdb990ba5)
+
+Verified that [[!commit 69e6c4d024dcff7c2f8ea1a2ed3b483a86b2cc7d]] does in
+fact avoid this problem. Running steps 9 and 10 with that commit results in
+a non-broken repository.
+
+Yay, solved!
"""]]
reproduced
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_11_b582afc1f538b76cd7605e80fcd43adb._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_11_b582afc1f538b76cd7605e80fcd43adb._comment new file mode 100644 index 0000000000..93b97e6dd6 --- /dev/null +++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_11_b582afc1f538b76cd7605e80fcd43adb._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 11""" + date="2026-01-06T16:16:56Z" + content=""" +Replicated this problem as follows: + +1. modified `importKeys` to fail at the end +2. set up a directory special remote with importtree=yes +3. git config annex.largefiles nothing +4. run, git-annex import, which fails +5. that left git-annex branch changes in the journal, for `GIT` keys +6. git-annex sync back to origin +7. return `importKeys` to usual behavior +8. make new clone from origin +9. run git-annex import in the new clone +10. merge the imported branch into master + +result: + + error: unable to read sha1 file of 1 (d00491fd7e5bb6fa28c517a0bb32b8b506539d4d) + error: unable to read sha1 file of 2 (5716ca5987cbf97d6bb54920bea6adde242d87e6) + error: unable to read sha1 file of 3 (aab959616afa9408f5efc385eb98f63fdb990ba5) +"""]]
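To see whether a repository is affected in the way the reproduction above shows, one option is to scan an imported tree for blobs that are not in the object store. This is a sketch under the assumption that `git ls-tree -r` and `git cat-file -e` are enough for that. `missing_blobs` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: given a tree-ish (for example the imported
# branch), print "<sha> <path>" for every blob the tree references
# that is missing from the repository.
missing_blobs() {
    git ls-tree -r "$1" | while read -r mode type sha path; do
        [ "$type" = blob ] || continue
        git cat-file -e "$sha" 2>/dev/null || printf '%s %s\n' "$sha" "$path"
    done
}
```

An empty result means every blob referenced by the tree is present locally.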
comment
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_10_4f08f7a0665bfd30d5c32eb326b04e66._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_10_4f08f7a0665bfd30d5c32eb326b04e66._comment new file mode 100644 index 0000000000..c844e68a3a --- /dev/null +++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_10_4f08f7a0665bfd30d5c32eb326b04e66._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 10""" + date="2026-01-06T15:57:43Z" + content=""" +I think that a previous, failed import from the remote, run in a +different clone of the repository than the import that later fails, +could have caused the problem. + +My thinking is, while import is downloading files, the content identifiers +get recorded in the git-annex branch. Only once the import is complete does +the imported tree get grafted into the git-annex branch. So, if the import +fails (or is interrupted), this can leave content identifiers in the log. +The git blobs for small files have already been stored in git, but no tree +references them. If that git-annex branch gets pushed, then in a separate +clone of the repository, running the import again would see those content +identifiers. But the git blobs referenced by them would not have been pushed, +and so would not be available. + +We already know that the import was failing due to the S3 permissions, +so the only other thing that would have been needed is for the git-annex +branch to be pushed to origin, and then this same import tried later in a +different clone. + +@yarikoptic does this seem plausibly what could have happened? +"""]]
comment
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_9_672d9ee1ac2db009702b3f307cf93517._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_9_672d9ee1ac2db009702b3f307cf93517._comment new file mode 100644 index 0000000000..f939169eca --- /dev/null +++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_9_672d9ee1ac2db009702b3f307cf93517._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2026-01-06T15:46:38Z" + content=""" +I was wrong, `git-annex forget` cannot cause this, since +[[!commit 8e7dc958d20861a91562918e24e071f70d34cf5b]] in 8.20210428 +made exported tree grafts be preserved through a forget. + +This leaves me with no scenario that might cause this problem. Unless +a git-annex version older than that were used. + +I've reverted [[!commit 69e6c4d024dcff7c2f8ea1a2ed3b483a86b2cc7d]] which I +had made to guard against the `git-annex forget` scenario, since it would +slow down imports of trees that contain a lot of small files. + +It still seems possible that commit would have avoided the problem, but +until I understand what actually caused the problem, I don't want to +unncessarily slow git-annex down with an unverified fix for it. +"""]]
Revert "verify git sha from ciddb is in git repository"
This reverts commit 69e6c4d024dcff7c2f8ea1a2ed3b483a86b2cc7d.
git-annex forget cannot cause this problem with any recent version of
git-annex, see commit 8e7dc958d20861a91562918e24e071f70d34cf5b
diff --git a/Annex/Import.hs b/Annex/Import.hs
index 6a71538563..cda82022a9 100644
--- a/Annex/Import.hs
+++ b/Annex/Import.hs
@@ -982,22 +982,14 @@ importKeys remote importtreeconfig importcontent thirdpartypopulated importablec
ImportSubTree subdir _ ->
getTopFilePath subdir </> fromImportLocation loc
- getcidkey cidmap db cid = do
+ getcidkey cidmap db cid = liftIO $
-- Avoiding querying the database when it's empty speeds up
-- the initial import.
- l <- liftIO $ if CIDDb.databaseIsEmpty db
+ if CIDDb.databaseIsEmpty db
then getcidkeymap cidmap cid
else CIDDb.getContentIdentifierKeys db rs cid >>= \case
[] -> getcidkeymap cidmap cid
l -> return l
- filterM validcidkey l
-
- -- Guard against a content identifier containing a git sha that is
- -- not present in the repository. It's possible that it's not,
- -- when git-annex forget is used.
- validcidkey k = case keyGitSha k of
- Just sha -> isJust <$> catObjectMetaData sha
- Nothing -> return True
getcidkeymap cidmap cid =
atomically $ maybeToList . M.lookup cid <$> readTVar cidmap
diff --git a/CHANGELOG b/CHANGELOG
index a8ada4f875..cd6231b075 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -11,8 +11,6 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
on network-multicast or network-info.
* stack.yaml: Update to lts-24.26.
* import: Fix display of some import errors.
- * Fix bug importing a tree from a remote after git-annex forget has been
- used, that could result in the imported tree mising git blobs.
-- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_8_4d86764ebd02d547cad7eebbcd116759._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_8_4d86764ebd02d547cad7eebbcd116759._comment
deleted file mode 100644
index 52156dd365..0000000000
--- a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_8_4d86764ebd02d547cad7eebbcd116759._comment
+++ /dev/null
@@ -1,7 +0,0 @@
-[[!comment format=mdwn
- username="joey"
- subject="""comment 8"""
- date="2026-01-02T15:59:56Z"
- content="""
-I've made it deal with the `git-annex forget` scenario now.
-"""]]
response
diff --git a/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_5_ba6b286216609fe250010b549828f4e4._comment b/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_5_ba6b286216609fe250010b549828f4e4._comment new file mode 100644 index 0000000000..06e1ed1c8c --- /dev/null +++ b/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_5_ba6b286216609fe250010b549828f4e4._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-06T15:43:39Z" + content=""" +Currently: `git-annex smudge --update` + +In next release, optionally: `git-annex fix` (can be run on the specific +file if that makes it faster in a large repo) +"""]]
Added a comment
diff --git a/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_4_42d3bc3283fbe69a8444ca62622b4932._comment b/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_4_42d3bc3283fbe69a8444ca62622b4932._comment new file mode 100644 index 0000000000..d021437301 --- /dev/null +++ b/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_4_42d3bc3283fbe69a8444ca62622b4932._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 4" + date="2026-01-02T17:36:34Z" + content=""" +Thank you Joey for looking into it. Since there was a bit of exploration above, in the nutshell, what should the tandem of git-annex command(s) for users to do after `git reset --hard COMMITISH` to \"time travel\" most efficiently (assuming heavy repos)? +"""]]
followup
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment new file mode 100644 index 0000000000..17c86a0550 --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-02T16:42:10Z" + content=""" +git-annex is actually using git credential here. That's +where the "Username for" prompt comes from. + +I think that this is a chicken and egg problem. git-annex is doing UUID +discovery, which is the first thing it does when run with a new remote that +does not have a UUID. But the repository does not exist, so has no UUID, +and it won't be created until git push happens. + +Deferring git-annex UUID discovery would avoid the problem, but I think +that would be very complicated if possible at all. + +I wonder if there is some way that git-annex could tell, at the http level, +that this URL does not exist yet? If so, it could avoid doing UUID +discovery. Then `git-annex push` would at least be able to push the git +repo. And then on the next run git-annex would discover the UUID and would +be able to fully use the repository. Not an ideal solution perhaps, since +you would need to `git-annex push` twice in a row to fully populate the +repisitory. + +Looks like the url you gave just 404's, but I'm not sure if I'm seeing +now the same as what you would have seen. + +@matrs Any chance you could give me access to reproduce this using your +server so I could look into that? +"""]]
comment
diff --git a/doc/bugs/some_conflict_resolution_tests_fail_some_time/comment_4_f9f79c336e6887c718c04608300b6040._comment b/doc/bugs/some_conflict_resolution_tests_fail_some_time/comment_4_f9f79c336e6887c718c04608300b6040._comment new file mode 100644 index 0000000000..de162d3aa3 --- /dev/null +++ b/doc/bugs/some_conflict_resolution_tests_fail_some_time/comment_4_f9f79c336e6887c718c04608300b6040._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-02T16:35:22Z" + content=""" +Nothing changed on the git-annex side I'm pretty sure that would have fixed +this. + +I am inclined to chalk this up to something having crashed in some way on +that machine, and the problem later clearing up. Ugh. +"""]]
verify git sha from ciddb is in git repository
Fix bug importing a tree from a remote after git-annex forget has been
used, that could result in the imported tree missing git blobs.
In the unlikely situation where the ciddb contains a git sha that
is not in the git repository, this makes it just re-download the file from
the remote. Which should be no problem, since these are small files.
Sponsored-by: Dartmouth College's DANDI project
diff --git a/Annex/Import.hs b/Annex/Import.hs
index cda82022a9..6a71538563 100644
--- a/Annex/Import.hs
+++ b/Annex/Import.hs
@@ -982,14 +982,22 @@ importKeys remote importtreeconfig importcontent thirdpartypopulated importablec
ImportSubTree subdir _ ->
getTopFilePath subdir </> fromImportLocation loc
- getcidkey cidmap db cid = liftIO $
+ getcidkey cidmap db cid = do
-- Avoiding querying the database when it's empty speeds up
-- the initial import.
- if CIDDb.databaseIsEmpty db
+ l <- liftIO $ if CIDDb.databaseIsEmpty db
then getcidkeymap cidmap cid
else CIDDb.getContentIdentifierKeys db rs cid >>= \case
[] -> getcidkeymap cidmap cid
l -> return l
+ filterM validcidkey l
+
+ -- Guard against a content identifier containing a git sha that is
+ -- not present in the repository. It's possible that it's not,
+ -- when git-annex forget is used.
+ validcidkey k = case keyGitSha k of
+ Just sha -> isJust <$> catObjectMetaData sha
+ Nothing -> return True
getcidkeymap cidmap cid =
atomically $ maybeToList . M.lookup cid <$> readTVar cidmap
diff --git a/CHANGELOG b/CHANGELOG
index cd6231b075..a8ada4f875 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -11,6 +11,8 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
on network-multicast or network-info.
* stack.yaml: Update to lts-24.26.
* import: Fix display of some import errors.
+ * Fix bug importing a tree from a remote after git-annex forget has been
+ used, that could result in the imported tree missing git blobs.
-- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_8_4d86764ebd02d547cad7eebbcd116759._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_8_4d86764ebd02d547cad7eebbcd116759._comment
new file mode 100644
index 0000000000..52156dd365
--- /dev/null
+++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_8_4d86764ebd02d547cad7eebbcd116759._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 8"""
+ date="2026-01-02T15:59:56Z"
+ content="""
+I've made it deal with the `git-annex forget` scenario now.
+"""]]
comment
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_7_e144dd5bab56646d07043de394b5f44b._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_7_e144dd5bab56646d07043de394b5f44b._comment
new file mode 100644
index 0000000000..95b5345ca6
--- /dev/null
+++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_7_e144dd5bab56646d07043de394b5f44b._comment
@@ -0,0 +1,29 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 7"""
+ date="2026-01-02T15:27:57Z"
+ content="""
+One way I can see that this might happen is if `git-annex forget`
+has been used, after a previous export/import.
+
+In that case, the content identifier database would be populated with a
+GIT key, which would be used instead of downloading the file to be
+imported. Resulting in a git sha being used, which could not be present in
+the git repository. Because while the git-annex branch usually gets
+imported/exported trees linked into it, `git-annex forget` erases that.
+
+So a possible scenario:
+
+ git-annex export or import
+ git-annex forget
+ pushing git-annex branch to somewhere
+ in a separate git clone, pulling that git-annex branch
+ git-annex import
+
+That is worth trying to replicate. But it seems pretty unlikely to me that
+is what you actually did ...?
+
+Leaving aside the possibility that `git hash-object` might be buggy and not
+record the object in the git repository, that's the only way I can find for
+this to possibly happen, after staring at the code for far too long.
+"""]]
comment
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_6_7d0415ce72f0c9a609dc9ebb87dc69eb._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_6_7d0415ce72f0c9a609dc9ebb87dc69eb._comment
new file mode 100644
index 0000000000..76d73e52e1
--- /dev/null
+++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_6_7d0415ce72f0c9a609dc9ebb87dc69eb._comment
@@ -0,0 +1,29 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 6"""
+ date="2026-01-02T15:04:03Z"
+ content="""
+I was able to set up this same special remote myself (manually populating remote.log)
+and use with my own S3 creds (which of course have no special access rights to this bucket
+so it was all public access only), importing into a fresh repository.
+
+Part of that import included:
+
+ import s3-dandiarchive 000345/draft/dandiset.yaml
+ HttpExceptionRequest Request {
+ [...]
+ (StatusCodeException (Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("x-amz-request-id","T0PNM10TN8STRTK4"),("x-amz-id-2","pqZXYNtU9T0mQxmHvtBjr2weztjwWwP3GleV7Jy5P3DcZbCi7Mt4Kzqo1wpPj9Zy85cZ3CUPHro="),("Content-Type","application/xml"),("Transfer-Encoding","chunked"),("Date","Fri, 02 Jan 2026 15:01:16 GMT"),("Server","AmazonS3")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose, responseOriginalRequest = Request {
+ [...]
+ , responseEarlyHints = []}) "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>T0PNM10TN8STRTK4</RequestId><HostId>pqZXYNtU9T0mQxmHvtBjr2weztjwWwP3GleV7Jy5P3DcZbCi7Mt4Kzqo1wpPj9Zy85cZ3CUPHro=</HostId></Error>")
+ ok
+
+But, the import ended with:
+
+ Failed to import some files from s3-dandiarchive. Re-run command to resume import.
+
+And did not create a branch, so I have not been able to reproduce the
+problem.
+
+Digging into why it says "ok" there, that was unfortunately only a display
+problem. Corrected that.
+"""]]
comment
diff --git a/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_5_d60e3214f167ef42f76938738acf135b._comment b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_5_d60e3214f167ef42f76938738acf135b._comment
new file mode 100644
index 0000000000..968081e5bf
--- /dev/null
+++ b/doc/bugs/s3_imported_branch_is___34__git_buggy__34____58____bad_blobs/comment_5_d60e3214f167ef42f76938738acf135b._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2026-01-02T14:43:14Z"
+ content="""
+All being small files does make me think this bug is somehow specific to
+adding the files to git. So it would be very useful to re-run the
+reproducer again, with annex.largefiles this time configured so everything
+is annexed.
+
+> > And when you replicated the problem from the backup, were you using it in the configuration where it cannot access those?
+>
+> if I got the question right and since I do not recall now -- judging from me using `( source .git/secrets.env; git-annex import master...` I think I was with credentials allowing to access them (hence no errors while importing)
+
+Well that's why I asked. It's not clear to me if it ever did show a failure,
+when used in the configuration where it couldn't access the files.
+
+It seems equally likely that it somehow incorrectly thought it succeeded.
+"""]]
response
diff --git a/doc/tips/offline_archive_drives/comment_9_ed4ebae6bb903dcb6447dd3efe6c1617._comment b/doc/tips/offline_archive_drives/comment_9_ed4ebae6bb903dcb6447dd3efe6c1617._comment
new file mode 100644
index 0000000000..574703d9ad
--- /dev/null
+++ b/doc/tips/offline_archive_drives/comment_9_ed4ebae6bb903dcb6447dd3efe6c1617._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: Directory remotes in offline drives for archiving?"""
+ date="2026-01-02T14:29:49Z"
+ content="""
+The only time git-annex will complain about being unable to lock down a file on
+a remote is when you are dropping a file from a special remote, and the only
+copy is in another special remote.
+
+ drop foo (from dirremote...) (unsafe)
+ Unable to lock down 1 copy of file necessary to safely drop it.
+
+ These remotes do not support locking: otherdirremote
+
+ (Use --force to override this check, or adjust numcopies.)
+
+In that situation, you can either use `--force` or `git-annex get` the file,
+then drop from the remote, and then drop the file from the local repository.
+The latter avoids any possible concurrency problems, but `--force` is of
+course faster, and would be fine in your situation.
+
+Dropping a file from a local repository that is present in a special remote
+does not have this problem.
+"""]]
dumbpipe version
diff --git a/doc/special_remotes/p2p/git-annex-p2p-iroh b/doc/special_remotes/p2p/git-annex-p2p-iroh
index 83be015c6f..f03f5ae0b6 100755
--- a/doc/special_remotes/p2p/git-annex-p2p-iroh
+++ b/doc/special_remotes/p2p/git-annex-p2p-iroh
@@ -1,8 +1,8 @@
#!/bin/sh
# Allows git-annex to use iroh for P2P connections.
#
-# This uses iroh's dumbpipe program. It needs a version with the
-# generate-ticket command, which was added in this pull request:
+# This uses iroh's dumbpipe program. It needs version 0.33 or newer,
+# with the generate-ticket command, which was added in this pull request:
# https://github.com/n0-computer/dumbpipe/pull/86
#
# Copyright 2025 Joey Hess; licenced under the GNU GPL version 3 or higher.
comment
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_9e31745d4e890b2d0fe8d997c9bf169a._comment b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_9e31745d4e890b2d0fe8d997c9bf169a._comment
new file mode 100644
index 0000000000..e90c172b60
--- /dev/null
+++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_9e31745d4e890b2d0fe8d997c9bf169a._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2026-01-01T18:45:26Z"
+ content="""
+Looks to me like arch is no longer stuck on the old 9.4.8 ghc but has a
+slightly newer 9.6.6. Which is the same as Debian stable.
+
+So, I am probably going to make git-annex only support back to that
+version, to simplify things.
+
+Please let me know if I have misunderstood the situation in arch land..
+"""]]
correct link to arch linux package
It seems to have moved sections? Don't understand arch
Also, remove the non-official packages, which all seem very old or gone.
diff --git a/doc/install/ArchLinux.mdwn b/doc/install/ArchLinux.mdwn
index 6919c4cbec..303904a9a8 100644
--- a/doc/install/ArchLinux.mdwn
+++ b/doc/install/ArchLinux.mdwn
@@ -1,21 +1,3 @@
-There is now an [official git-annex package for Arch](https://www.archlinux.org/packages/community/x86_64/git-annex/), so to install it:
+There is now an [official git-annex package for Arch](https://www.archlinux.org/packages/extra/x86_64/git-annex/), so to install it:
pacman -S git-annex
-
-There are at least three non non-official packages for git-annex in the Arch Linux User Repository. Any of these may be installed manually per [AUR guidelines](https://wiki.archlinux.org/index.php/AUR_User_Guidelines#Installing_packages) or using a wrapper such as [`yaourt`](https://wiki.archlinux.org/index.php/yaourt) shown below.
-
-1. A git-annex package is available in the haskell-core AUR <https://wiki.archlinux.org/index.php/ArchHaskell>
-
-2. A development package is available at [git-annex-git](https://aur.archlinux.org/packages/git-annex-git/) that functions similarly to the source package but builds directly from the HEAD of the git repository rather that the last official release.
-
- $ yaourt -Sy git-annex-git
-
-3. A Cabal sandbox build is also available
-
- $ yaourt -Sy git-annex-cabal
-
-Finally you may choose to forgo the Arch Linux package system entirely and install git-annex directly through cabal.
-
- $ pacman -S git rsync curl wget gnupg openssh cabal-install
- $ cabal update
- $ cabal install git-annex --bindir=$HOME/bin
found it
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_2818941822cf1c1563c420e4d055dd4b._comment b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_2818941822cf1c1563c420e4d055dd4b._comment
deleted file mode 100644
index 8ba801115d..0000000000
--- a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_2818941822cf1c1563c420e4d055dd4b._comment
+++ /dev/null
@@ -1,9 +0,0 @@
-[[!comment format=mdwn
- username="joey"
- subject="""comment 3"""
- date="2026-01-01T18:33:42Z"
- content="""
-@caleb from what I can see there is no current version of git-annex
-packaged in Arch, at least <https://aur.archlinux.org/packages?O=0&SeB=nd&K=git-annex&outdated=&SB=p&SO=d&PP=50&submit=Go>
-only has old stuff. Where did your package go?
-"""]]
comment
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_2818941822cf1c1563c420e4d055dd4b._comment b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_2818941822cf1c1563c420e4d055dd4b._comment
new file mode 100644
index 0000000000..8ba801115d
--- /dev/null
+++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_3_2818941822cf1c1563c420e4d055dd4b._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2026-01-01T18:33:42Z"
+ content="""
+@caleb from what I can see there is no current version of git-annex
+packaged in Arch, at least <https://aur.archlinux.org/packages?O=0&SeB=nd&K=git-annex&outdated=&SB=p&SO=d&PP=50&submit=Go>
+only has old stuff. Where did your package go?
+"""]]
Remove support for building with old versions of persistent-sqlite
Old versions of persistent-sqlite don't properly support non-ascii
paths when run in a non-unicode locale. So this both simplifies the code
and avoids buggy behavior.
diff --git a/BuildFlags.hs b/BuildFlags.hs
index d4a3a4f73e..60f240c368 100644
--- a/BuildFlags.hs
+++ b/BuildFlags.hs
@@ -80,7 +80,6 @@ dependencyVersions = map fmt $ sortBy (comparing (CI.mk . fst))
, ("uuid", VERSION_uuid)
, ("bloomfilter", VERSION_bloomfilter)
, ("http-client", VERSION_http_client)
- , ("persistent-sqlite", VERSION_persistent_sqlite)
, ("crypton", VERSION_crypton)
, ("aws", VERSION_aws)
, ("DAV", VERSION_DAV)
diff --git a/CHANGELOG b/CHANGELOG
index 85aa0528a3..96d3df4f89 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -3,6 +3,7 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
* fix: Populate unlocked pointer files in situations where a git command,
like git reset or git stash, leaves them unpopulated.
* When displaying sqlite error messages, include the path to the database.
+ * Remove support for building with old versions of persistent-sqlite.
-- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400
diff --git a/Database/ContentIdentifier.hs b/Database/ContentIdentifier.hs
index 4fdfd5b292..e5a701ba3f 100644
--- a/Database/ContentIdentifier.hs
+++ b/Database/ContentIdentifier.hs
@@ -5,7 +5,6 @@
- Licensed under the GNU AGPL version 3 or higher.
-}
-{-# LANGUAGE CPP #-}
{-# LANGUAGE QuasiQuotes, TypeFamilies, TypeOperators, TemplateHaskell #-}
{-# LANGUAGE OverloadedStrings, GADTs, FlexibleContexts, EmptyDataDecls #-}
{-# LANGUAGE MultiParamTypeClasses, GeneralizedNewtypeDeriving #-}
@@ -50,13 +49,7 @@ import qualified Logs.ContentIdentifier as Log
import Database.Persist.Sql hiding (Key)
import Database.Persist.TH
-
-#if MIN_VERSION_persistent_sqlite(2,13,3)
import Database.RawFilePath
-#else
-import Database.Persist.Sqlite (runSqlite)
-import qualified Data.Text as T
-#endif
data ContentIdentifierHandle = ContentIdentifierHandle H.DbQueue Bool
@@ -103,13 +96,8 @@ openDb = do
runMigrationSilent migrateContentIdentifier
-- Migrate from old versions of database, which had buggy
-- and suboptimal uniqueness constraints.
-#if MIN_VERSION_persistent_sqlite(2,13,3)
else liftIO $ runSqlite' (fromOsPath db) $ void $
runMigrationSilent migrateContentIdentifier
-#else
- else liftIO $ runSqlite (T.pack (fromRawFilePath db)) $ void $
- runMigrationSilent migrateContentIdentifier
-#endif
h <- liftIO $ H.openDbQueue db "content_identifiers"
return $ ContentIdentifierHandle h isnew
diff --git a/Database/Handle.hs b/Database/Handle.hs
index f859467b8e..135811ca86 100644
--- a/Database/Handle.hs
+++ b/Database/Handle.hs
@@ -195,11 +195,7 @@ runSqliteRobustly tablename db a = do
| otherwise -> rethrow $ errmsg ("after successful sqlite database " ++ fromOsPath (safeOutput db) ++ " open") ex
opensettle retries ic = do
-#if MIN_VERSION_persistent_sqlite(2,13,3)
conn <- Sqlite.open' (fromOsPath db)
-#else
- conn <- Sqlite.open (T.pack (fromOsPath db))
-#endif
settle conn retries ic
settle conn retries ic = do
diff --git a/Database/Init.hs b/Database/Init.hs
index eab3a6f32d..c516c89c76 100644
--- a/Database/Init.hs
+++ b/Database/Init.hs
@@ -5,7 +5,7 @@
- Licensed under the GNU AGPL version 3 or higher.
-}
-{-# LANGUAGE OverloadedStrings, CPP #-}
+{-# LANGUAGE OverloadedStrings #-}
module Database.Init where
@@ -13,9 +13,7 @@ import Annex.Common
import Annex.Perms
import Utility.FileMode
import qualified Utility.RawFilePath as R
-#if MIN_VERSION_persistent_sqlite(2,13,3)
import Database.RawFilePath
-#endif
import Database.Persist.Sqlite
import Lens.Micro
@@ -36,11 +34,7 @@ initDb db migration = do
let tmpdb = tmpdbdir </> literalOsPath "db"
let tmpdb' = fromOsPath tmpdb
createAnnexDirectory tmpdbdir
-#if MIN_VERSION_persistent_sqlite(2,13,3)
liftIO $ runSqliteInfo' tmpdb' (enableWAL tmpdb) migration
-#else
- liftIO $ runSqliteInfo (enableWAL tmpdb) migration
-#endif
setAnnexDirPerm tmpdbdir
-- Work around sqlite bug that prevents it from honoring
-- less restrictive umasks.
diff --git a/Database/RawFilePath.hs b/Database/RawFilePath.hs
index e154b74a3a..fdedf65762 100644
--- a/Database/RawFilePath.hs
+++ b/Database/RawFilePath.hs
@@ -31,11 +31,10 @@
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-}
-{-# LANGUAGE OverloadedStrings, CPP #-}
+{-# LANGUAGE OverloadedStrings #-}
module Database.RawFilePath where
-#if MIN_VERSION_persistent_sqlite(2,13,3)
import Database.Persist.Sqlite
import qualified Database.Sqlite as Sqlite
import Utility.RawFilePath (RawFilePath)
@@ -92,4 +91,3 @@ withSqliteConnInfo'
-> (SqlBackend -> m a)
-> m a
withSqliteConnInfo' db = withSqlConn . openWith' db const
-#endif
diff --git a/doc/bugs/Get_crashes_when_remote_contains_non-english_chars.mdwn b/doc/bugs/Get_crashes_when_remote_contains_non-english_chars.mdwn
index fe6098f754..3610ac03f3 100644
--- a/doc/bugs/Get_crashes_when_remote_contains_non-english_chars.mdwn
+++ b/doc/bugs/Get_crashes_when_remote_contains_non-english_chars.mdwn
@@ -1,6 +1,7 @@
Hi,
### Please describe the problem.
+
I'm trying to set up a git-annex repo for my books/technical papers to have easy access to them on my desktop and laptop. I'm using a centralized server (following [this guide](https://git-annex.branchable.com/tips/centralized_git_repository_tutorial/on_your_own_server/)) to make it easy to sync between my machines.
The issue is however that sqlite crashes when I'm trying to get a file from my server. See the log further down for the error message. I'm suspecting it is due to the repo on my server is named `Böcker` (swedish name for books). It does work if I'm cloning it locally on my server. E.g.
@@ -105,3 +106,11 @@ I'm not giving up on this that easily. Worst case I'll just rename my repo on my
Thank you for all the hours developing this software!
+> This seems to be the same bug that was fixed in [[!commit 8a3beabf350899e369dcd57a72432930581fbc25]]
+> and released in version 10.20231227. While this bug actually has a fixed
+> version of git-annex, the version output shows it was built with too
+> old a version of persistent-sqlite to get the fix.
+>
+> I've now updated git-annex's build deps, so all future versions will
+> be with a sufficiently new persistent-sqlite to not have this problem.
+> [[done]] --[[Joey]]
diff --git a/git-annex.cabal b/git-annex.cabal
index f0fdb7c031..1006e7a59a 100644
--- a/git-annex.cabal
+++ b/git-annex.cabal
@@ -245,7 +245,7 @@ Executable git-annex
conduit,
time (>= 1.9.1),
old-locale,
- persistent-sqlite (>= 2.8.1),
+ persistent-sqlite (>= 2.13.3),
persistent (>= 2.8.1),
persistent-template (>= 2.8.0),
unliftio-core,
When displaying sqlite error messages, include the path to the database
diff --git a/CHANGELOG b/CHANGELOG
index 476c305d8f..85aa0528a3 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -2,6 +2,7 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
* fix: Populate unlocked pointer files in situations where a git command,
like git reset or git stash, leaves them unpopulated.
+ * When displaying sqlite error messages, include the path to the database.
-- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400
diff --git a/Database/Handle.hs b/Database/Handle.hs
index ff358f7588..f859467b8e 100644
--- a/Database/Handle.hs
+++ b/Database/Handle.hs
@@ -1,6 +1,6 @@
{- Persistent sqlite database handles.
-
- - Copyright 2015-2023 Joey Hess <id@joeyh.name>
+ - Copyright 2015-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -24,6 +24,7 @@ import Utility.Debug
import Utility.DebugLocks
import Utility.InodeCache
import Utility.OsPath
+import Utility.SafeOutput
import Database.Persist.Sqlite
import qualified Database.Sqlite as Sqlite
@@ -78,7 +79,7 @@ closeDb (DbHandle _db worker jobs _) = do
- it is able to run.
-}
queryDb :: DbHandle -> SqlPersistM a -> IO a
-queryDb (DbHandle _db _ jobs errvar) a = do
+queryDb (DbHandle db _ jobs errvar) a = do
res <- newEmptyMVar
putMVar jobs $ QueryJob $
debugLocks $ liftIO . putMVar res =<< tryNonAsync a
@@ -86,7 +87,7 @@ queryDb (DbHandle _db _ jobs errvar) a = do
Right r -> either throwIO return r
Left BlockedIndefinitelyOnMVar -> do
err <- takeMVar errvar
- giveup $ "sqlite worker thread crashed: " ++ err
+ giveup $ "sqlite worker thread for " ++ fromOsPath (safeOutput db) ++ " crashed: " ++ err
{- Writes a change to the database.
-
@@ -111,7 +112,7 @@ commitDb h@(DbHandle db _ _ errvar) wa =
robustly a
Left BlockedIndefinitelyOnMVar -> do
err <- takeMVar errvar
- giveup $ "sqlite worker thread crashed: " ++ err
+ giveup $ "sqlite worker thread for " ++ fromOsPath (safeOutput db) ++ " crashed: " ++ err
briefdelay = 100000 -- 1/10th second
@@ -191,7 +192,7 @@ runSqliteRobustly tablename db a = do
briefdelay
retryHelper "access" ex maxretries db retries ic $
go conn
- | otherwise -> rethrow $ errmsg "after successful open" ex
+ | otherwise -> rethrow $ errmsg ("after successful sqlite database " ++ fromOsPath (safeOutput db) ++ " open") ex
opensettle retries ic = do
#if MIN_VERSION_persistent_sqlite(2,13,3)
@@ -217,7 +218,7 @@ runSqliteRobustly tablename db a = do
if e == Sqlite.ErrorIO
then opensettle
else settle conn
- | otherwise -> rethrow $ errmsg "while opening database connection" ex
+ | otherwise -> rethrow $ errmsg ("while opening sqlite database " ++ fromOsPath (safeOutput db) ++ " connection") ex
-- This should succeed for any table.
nullselect = T.pack $ "SELECT null from " ++ tablename ++ " limit 1"
@@ -274,7 +275,7 @@ closeRobustly db conn = go maxretries emptyDatabaseInodeCache
| e == Sqlite.ErrorBusy -> do
threadDelay briefdelay
retryHelper "close" ex maxretries db retries ic go
- | otherwise -> rethrow $ errmsg "while closing database connection" ex
+ | otherwise -> rethrow $ errmsg ("while closing sqlite database " ++ fromOsPath (safeOutput db) ++ " connection") ex
briefdelay = 1000 -- 1/1000th second
@@ -312,7 +313,7 @@ retryHelper action err maxretries db retries ic a = do
databaseAccessStalledMsg :: Show err => String -> OsPath -> err -> String
databaseAccessStalledMsg action db err =
- "Repeatedly unable to " ++ action ++ " sqlite database " ++ fromOsPath db
+ "Repeatedly unable to " ++ action ++ " sqlite database " ++ fromOsPath (safeOutput db)
++ ": " ++ show err ++ ". "
++ "Perhaps another git-annex process is suspended and is "
++ "keeping this database locked?"
diff --git a/doc/bugs/SQLite3_database_disk_image_malformed.mdwn b/doc/bugs/SQLite3_database_disk_image_malformed.mdwn
index ca25c815f4..6e38637d05 100644
--- a/doc/bugs/SQLite3_database_disk_image_malformed.mdwn
+++ b/doc/bugs/SQLite3_database_disk_image_malformed.mdwn
@@ -41,3 +41,5 @@ The only SQLite3 database I can find is in .git/annex/keysdb . I can open that u
I've been happily using git-annex for many many years, first time I've encountered an issue like this.
+> Calling this [[done]] since the sqlite error messages have been improved.
+> --[[Joey]]
diff --git a/doc/bugs/SQLite3_database_disk_image_malformed/comment_6_e16f300193b36db6793d9d6e2808e56a._comment b/doc/bugs/SQLite3_database_disk_image_malformed/comment_6_e16f300193b36db6793d9d6e2808e56a._comment
new file mode 100644
index 0000000000..c610260d53
--- /dev/null
+++ b/doc/bugs/SQLite3_database_disk_image_malformed/comment_6_e16f300193b36db6793d9d6e2808e56a._comment
@@ -0,0 +1,15 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 6"""
+ date="2026-01-01T17:58:00Z"
+ content="""
+> A useful thing to display might be the path to the corrupted database file and advice to remove it?
+
+Good idea to display the path. I've made that change.
+
+I don't think I want to make git-annex suggest deleting sqlite databases
+anytime sqlite crashes for any reason. While they are safe to delete,
+that encourages users to shrug and move on and tends to normalize any
+problem with sqlite. In reality, problems with sqlite are very rare,
+and I'd like to hear about them and understand them.
+"""]]
response
diff --git a/doc/bugs/SQLite3_database_disk_image_malformed/comment_5_2f6a291a2bb37000f6e3b757a00a0713._comment b/doc/bugs/SQLite3_database_disk_image_malformed/comment_5_2f6a291a2bb37000f6e3b757a00a0713._comment
new file mode 100644
index 0000000000..4c90a0235e
--- /dev/null
+++ b/doc/bugs/SQLite3_database_disk_image_malformed/comment_5_2f6a291a2bb37000f6e3b757a00a0713._comment
@@ -0,0 +1,21 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2026-01-01T17:29:54Z"
+ content="""
+Your previous problem with the sqlite database cannot have caused fsck to
+detect a checksum problem with your annexed file.
+
+It looks like you have somehow modified annex object files, eg files in
+`.git/annex/objects`. git-annex sets permissions that usually prevent such
+a thing from happening.
+
+There is no way to make git-annex accept a version of a file with a different
+checksum than the one recorded in git. Instead you need to `git-annex add` the
+new version of the files to the repository in place of the old version.
+
+Here is a bash script that will pull the files out of `.git/annex/bad/`
+and update the annexed files:
+
+ IFS=$'\n'; for x in $(git-annex find --format='${key}\n${file}\n'); do if [ "$l" ]; then f="$x"; l=; if [ -e ".git/annex/bad/$k" ]; then mv ".git/annex/bad/$k" "$f"; git-annex add "$f" ; fi; else k="$x"; l=1; fi; done
+"""]]
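The one-liner in the comment above is dense, so here is a more readable sketch of the same loop. It assumes the same input of alternating key and file lines that `git-annex find --format='${key}\n${file}\n'` produces. The function name `restore_bad_files` and its optional directory argument are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Sketch only: restore_bad_files is a hypothetical helper, not part of git-annex.
# Reads alternating "key" and "file" lines on stdin and, for every key that
# has a copy in the bad directory, moves that copy over the work tree file
# and re-adds it so git records the new checksum.
restore_bad_files() {
	bad_dir="${1:-.git/annex/bad}"
	while IFS= read -r key && IFS= read -r file; do
		if [ -e "$bad_dir/$key" ]; then
			mv "$bad_dir/$key" "$file"
			# Keep git-annex from consuming the key/file list on stdin.
			git-annex add "$file" </dev/null
		fi
	done
}
```

It could be run from the top of the repository as `git-annex find --format='${key}\n${file}\n' | restore_bad_files`. Like the original one-liner, it does not cope with file names that contain newlines.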
improve synopsis for fix
It operates on pointers, whether those are symlinks or unlocked pointer
files.
diff --git a/Command/Fix.hs b/Command/Fix.hs
index 05292059e5..2852fac9a3 100644
--- a/Command/Fix.hs
+++ b/Command/Fix.hs
@@ -29,7 +29,7 @@ import Utility.Touch
cmd :: Command
cmd = noCommit $ withAnnexOptions [annexedMatchingOptions, jsonOptions] $
command "fix" SectionMaintenance
- "fix up links to annexed content"
+ "fix up pointers to annexed content"
paramPaths (withParams seek)
seek :: CmdParams -> CommandSeek
diff --git a/doc/git-annex-fix.mdwn b/doc/git-annex-fix.mdwn
index 1ac2165c89..e1ec3fc771 100644
--- a/doc/git-annex-fix.mdwn
+++ b/doc/git-annex-fix.mdwn
@@ -1,6 +1,6 @@
# NAME
-git-annex fix - fix up links to annexed content
+git-annex fix - fix up pointers to annexed content
# SYNOPSIS
@@ -13,8 +13,9 @@ content. This is useful to run manually when you have been moving the
symlinks around, but is done automatically when committing a change
with git too.
-Also, populates unlocked files with annexed content. Usually this happens
-automatically, but some git commands can leave them as unpopulated.
+Also, populates unlocked pointer files with annexed content.
+Usually this happens automatically, but some git commands can leave them
+unpopulated.
Also, adjusts unlocked files to be copies or hard links as
configured by annex.thin.
fix: handle unlocked pointer files
fix: Populate unlocked pointer files in situations where a git command,
like git reset or git stash, leaves them unpopulated.
populatePointerFile' is safe to use here because seeking has found the
key, and isPointerFile is checked just before calling it.
diff --git a/Annex/Content/PointerFile.hs b/Annex/Content/PointerFile.hs
index 51c431d5ad..4c0743c2b4 100644
--- a/Annex/Content/PointerFile.hs
+++ b/Annex/Content/PointerFile.hs
@@ -1,6 +1,6 @@
{- git-annex pointer files
-
- - Copyright 2010-2018 Joey Hess <id@joeyh.name>
+ - Copyright 2010-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -33,21 +33,29 @@ import System.PosixCompat.Files (fileMode)
populatePointerFile :: Restage -> Key -> OsPath -> OsPath -> Annex (Maybe InodeCache)
populatePointerFile restage k obj f = go =<< liftIO (isPointerFile f)
where
- go (Just k') | k == k' = do
- destmode <- liftIO $ catchMaybeIO $
- fileMode <$> R.getFileStatus (fromOsPath f)
- (ic, populated) <- replaceWorkTreeFile f $ \tmp -> do
- ok <- linkOrCopy k obj tmp destmode >>= \case
- Just _ -> thawContent tmp >> return True
- Nothing -> liftIO (writePointerFile tmp k destmode) >> return False
- ic <- withTSDelta (liftIO . genInodeCache tmp)
- return (ic, ok)
- maybe noop (restagePointerFile restage f) ic
- if populated
- then return ic
- else return Nothing
+ go (Just k') | k == k' = populatePointerFile' restage k obj f
go _ = return Nothing
+{- Before calling, must verify that the pointer file is a pointer to the key.
+ -
+ - This returns Nothing when populating the pointer file fails due to eg,
+ - not enough disk space.
+ -}
+populatePointerFile' :: Restage -> Key -> OsPath -> OsPath -> Annex (Maybe InodeCache)
+populatePointerFile' restage k obj f = do
+ destmode <- liftIO $ catchMaybeIO $
+ fileMode <$> R.getFileStatus (fromOsPath f)
+ (ic, populated) <- replaceWorkTreeFile f $ \tmp -> do
+ ok <- linkOrCopy k obj tmp destmode >>= \case
+ Just _ -> thawContent tmp >> return True
+ Nothing -> liftIO (writePointerFile tmp k destmode) >> return False
+ ic <- withTSDelta (liftIO . genInodeCache tmp)
+ return (ic, ok)
+ maybe noop (restagePointerFile restage f) ic
+ if populated
+ then return ic
+ else return Nothing
+
{- Removes the content from a pointer file, replacing it with a pointer.
-
- Does not check if the pointer file is modified. -}
diff --git a/CHANGELOG b/CHANGELOG
index a034823796..476c305d8f 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,10 @@
+git-annex (10.20251216) UNRELEASED; urgency=medium
+
+ * fix: Populate unlocked pointer files in situations where a git command,
+ like git reset or git stash, leaves them unpopulated.
+
+ -- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400
+
git-annex (10.20251215) upstream; urgency=medium
* Added annex.trashbin configuration.
diff --git a/Command/Fix.hs b/Command/Fix.hs
index a12747ee49..05292059e5 100644
--- a/Command/Fix.hs
+++ b/Command/Fix.hs
@@ -1,6 +1,6 @@
{- git-annex command
-
- - Copyright 2010-2015 Joey Hess <id@joeyh.name>
+ - Copyright 2010-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -14,6 +14,7 @@ import Config
import qualified Annex
import Annex.ReplaceFile
import Annex.Content
+import Annex.Content.PointerFile
import Annex.Perms
import Annex.Link
import qualified Database.Keys
@@ -54,11 +55,24 @@ start fixwhat si file key = do
fixby $ fixSymlink file wantlink
| otherwise -> stop
Nothing -> case fixwhat of
- FixAll -> fixthin
+ FixAll -> fixpointers
FixSymlinks -> stop
where
file' = fromOsPath file
+
fixby = starting "fix" (mkActionItem (key, file)) si
+
+ fixpointers =
+ ifM (isJust <$> liftIO (isPointerFile file))
+ ( stopUnless (inAnnex key) $ fixby $ do
+ obj <- calcRepo (gitAnnexLocation key)
+ populatePointerFile' QueueRestage key obj file >>= \case
+ Just ic -> Database.Keys.addInodeCaches key [ic]
+ Nothing -> giveup "not enough disk space to populate pointer file"
+ next $ return True
+ , fixthin
+ )
+
fixthin = do
obj <- calcRepo (gitAnnexLocation key)
stopUnless (isUnmodified key file <&&> isUnmodified key obj) $ do
@@ -71,7 +85,6 @@ start fixwhat si file key = do
(Just n, Just n', False) | n > 1 && n == n' ->
fixby $ breakHardLink file key obj
_ -> stop
-
breakHardLink :: OsPath -> Key -> OsPath -> CommandPerform
breakHardLink file key obj = do
replaceWorkTreeFile file $ \tmp -> do
diff --git a/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_.mdwn b/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_.mdwn
index caed1ef7e6..d4cc46a24a 100644
--- a/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_.mdwn
+++ b/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_.mdwn
@@ -112,3 +112,5 @@ git-annex version: 10.20251114-geeb21b831e7c45078bd9447ec2b0532a691fe471
```
[[!meta title="after git reset --hard, git-annex get of unlocked unpopulated pointer file does nothing"]]
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_3_09e90b656763e3a8452260f0abead168._comment b/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_3_09e90b656763e3a8452260f0abead168._comment
new file mode 100644
index 0000000000..d52acaf827
--- /dev/null
+++ b/doc/bugs/get_of_unlocked___34__absent__34___file_does_nothing_/comment_3_09e90b656763e3a8452260f0abead168._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2026-01-01T16:17:58Z"
+ content="""
+I think it makes sense for `git-annex fix` to deal with this situation.
+In both cases the user has run a git command that affects files in the
+working tree, and it has left the annexed content not accessible.
+"""]]
diff --git a/doc/git-annex-fix.mdwn b/doc/git-annex-fix.mdwn
index 5a670cd1a0..1ac2165c89 100644
--- a/doc/git-annex-fix.mdwn
+++ b/doc/git-annex-fix.mdwn
@@ -9,10 +9,12 @@ git annex fix `[path ...]`
# DESCRIPTION
Fixes up symlinks that have become broken to again point to annexed
-content.
+content. This is useful to run manually when you have been moving the
+symlinks around, but is done automatically when committing a change
+with git too.
-This is useful to run manually when you have been moving the symlinks
-around, but is done automatically when committing a change with git too.
+Also, populates unlocked files with annexed content. Usually this happens
+automatically, but some git commands can leave them as unpopulated.
Also, adjusts unlocked files to be copies or hard links as
configured by annex.thin.
diff --git a/doc/git-annex-smudge.mdwn b/doc/git-annex-smudge.mdwn
index 6f6eba8140..7c44779641 100644
--- a/doc/git-annex-smudge.mdwn
+++ b/doc/git-annex-smudge.mdwn
@@ -47,8 +47,8 @@ it records which worktree files need to be updated, and
the content. That is run by several git hooks, including post-checkout
and post-merge. However, a few git commands, notably `git stash` and
`git cherry-pick`, do not run any hooks, so after using those commands
-you can manually run `git annex smudge --update` to update the working
-tree.
+you can manually run `git annex smudge --update` (or `git-annex fix`)
+to update the working tree.
# OPTIONS
Added a comment
diff --git a/doc/special_remotes/rclone/comment_10_edef3c4eb5f6d06e496c0e90329d8143._comment b/doc/special_remotes/rclone/comment_10_edef3c4eb5f6d06e496c0e90329d8143._comment
new file mode 100644
index 0000000000..4908d5b5fe
--- /dev/null
+++ b/doc/special_remotes/rclone/comment_10_edef3c4eb5f6d06e496c0e90329d8143._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="nadir"
+ avatar="http://cdn.libravatar.org/avatar/2af9174cf6c06de802104d632dc40071"
+ subject="comment 10"
+ date="2026-01-01T11:27:39Z"
+ content="""
+That makes a lot of sense. So if I understood things right, the correct place to work on this is rclone. I think I'll try to ask what they think of this kind of use case.
+
+Thanks for the explanation
+"""]]
Added a comment: Fixing a bit of a mess
diff --git a/doc/bugs/SQLite3_database_disk_image_malformed/comment_4_8cd94b23828fa865c6f04b021b971b55._comment b/doc/bugs/SQLite3_database_disk_image_malformed/comment_4_8cd94b23828fa865c6f04b021b971b55._comment
new file mode 100644
index 0000000000..7d4e86b3a0
--- /dev/null
+++ b/doc/bugs/SQLite3_database_disk_image_malformed/comment_4_8cd94b23828fa865c6f04b021b971b55._comment
@@ -0,0 +1,26 @@
+[[!comment format=mdwn
+ username="puck"
+ avatar="http://cdn.libravatar.org/avatar/06d3f4f0a82dd00a84f8f8fabc8e537d"
+ subject="Fixing a bit of a mess"
+ date="2026-01-01T09:07:11Z"
+ content="""
+While the database file was corrupt, I did some work (not realising it was corrupt) to fix up MP3 tags in my music collection. Now when I run git annex fsck I'm getting errors like:
+
+    fsck music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3
+    music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3: Bad file size (128 B larger); moved to .git/annex/bad/SHA256E-s17800671--1a992cda34a5ab52d42cd7a420114fc122458ff57672e468f8403faa77f209b0.mp3
+
+    ** No known copies exist of music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3
+    failed
+
+and
+
+    fsck music/Arrow/misc/Hot_Hot_Hot.mp3 (checksum...)
+    music/Arrow/misc/Hot_Hot_Hot.mp3: Bad file content; moved to .git/annex/bad/SHA256E-s3444736--3178689ce4a69a0e94fe11afaf077b6471077fd2d5128a5a65a71dcf84272ed5.mp3
+
+    ** No known copies exist of music/Arrow/misc/Hot_Hot_Hot.mp3
+    failed
+
+I've tried using git annex reinject, but that is refused as the checksum doesn't match.
+
+Can I tell git-annex to just accept the files that I have in my repository as being correct?
+"""]]
Added a comment: More details in error message?
diff --git a/doc/bugs/SQLite3_database_disk_image_malformed/comment_3_9ae97b4f4cacefef77542a65455cc1d3._comment b/doc/bugs/SQLite3_database_disk_image_malformed/comment_3_9ae97b4f4cacefef77542a65455cc1d3._comment
new file mode 100644
index 0000000000..6b40bdbb51
--- /dev/null
+++ b/doc/bugs/SQLite3_database_disk_image_malformed/comment_3_9ae97b4f4cacefef77542a65455cc1d3._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="puck"
+ avatar="http://cdn.libravatar.org/avatar/06d3f4f0a82dd00a84f8f8fabc8e537d"
+ subject="More details in error message?"
+ date="2026-01-01T07:32:23Z"
+ content="""
+Hey,
+
+I just came back to this after trying to do something in my repository. Good to hear I can just the SQlite file, done that now, and it is busy running fsck now.
+
+A useful thing to display might be the path to the corrupted database file and advice to remove it?
+"""]]
todo
diff --git a/doc/todo/support_more_backup_software_like_borg.mdwn b/doc/todo/support_more_backup_software_like_borg.mdwn
new file mode 100644
index 0000000000..3f097babf4
--- /dev/null
+++ b/doc/todo/support_more_backup_software_like_borg.mdwn
@@ -0,0 +1,17 @@
+The borg special remote allows git-annex to treat borg backups of a
+git-annex repository as just another remote. This could also be done for
+other backup software.
+
+restic seems like a good candidate. What other commonly used backup
+software might be good to support? Comments welcome with suggestions..
+
+---
+
+Currently, support for these has to be in git-annex, it cannot be an
+external special remote. Just providing a way in the external special
+remote interface to set `thirdPartyPopulated` might be enough to allow
+using external special remotes for this.
+
+The borg implementation does have getImported which looks at the git-annex
+branch, and is used in an optimisation. It would be good to factor that out
+to a common optimisation for all `thirdPartyPopulated` remotes. --[[Joey]]
response
diff --git a/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_1_470f9ec8a18e2080558af8d5a568bc97._comment b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_1_470f9ec8a18e2080558af8d5a568bc97._comment
new file mode 100644
index 0000000000..e40d60ca0e
--- /dev/null
+++ b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_1_470f9ec8a18e2080558af8d5a568bc97._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-12-31T18:58:44Z"
+ content="""
+`git-annex p2phttp` does update the git-annex branch itself when receiving
+files. And generally, any time git-annex stores an object in a repository,
+it updates the git-annex branch accordingly.
+
+So, you can fetch from the remote and learn about those objects,
+and then `git-annex unused --from=$remote` will show you unused objects in
+the remote.
+
+When running `git-annex unused` on the local repository, it does list all
+objects in the local repository. So if an object somehow does get into the
+repository without a branch update, it will still show as unused.
+
+There is no way to list all objects present in a remote. Special remotes
+are not required to support enumeration at all. So, if an object got sent
+to a special remote, and the git-annex branch record of that was lost,
+there would be no way to find that unused object.
+"""]]
response
diff --git a/doc/special_remotes/rclone/comment_9_d0c23b1d2c2267ef0e1e91e8b33385df._comment b/doc/special_remotes/rclone/comment_9_d0c23b1d2c2267ef0e1e91e8b33385df._comment
new file mode 100644
index 0000000000..6effbb7263
--- /dev/null
+++ b/doc/special_remotes/rclone/comment_9_d0c23b1d2c2267ef0e1e91e8b33385df._comment
@@ -0,0 +1,20 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: passing additional flags to rclone"""
+ date="2025-12-31T18:38:21Z"
+ content="""
+Passing arbitrary parameters to rclone is not supported. It would possibly
+be a security hole if it were supported, because if there were a parameter
+say --deleteeverything, you could `initremote` a special remote with that
+parameter, and then wait for someone else to `enableremote` and use that
+special remote and have a bad day.
+
+The "*" in `initremote --whatelse` output is a placeholder. It is not
+intended to mean that every possible thing is passed through, but that,
+if rclone supports some additional parameters, and explicitly asks for
+them (via GETCONFIG), they will be passed through to it.
+
+I think that currently, `rclone gitannex` does not request any parameters.
+It would certainly be possible to make it support something like
+"bwlimit=3000".
+"""]]
comment
diff --git a/doc/tips/using_borg_for_efficient_storage_of_old_annexed_files/comment_2_5fe65196b2f160c63305cc0274cf1530._comment b/doc/tips/using_borg_for_efficient_storage_of_old_annexed_files/comment_2_5fe65196b2f160c63305cc0274cf1530._comment
new file mode 100644
index 0000000000..10f1ccfb42
--- /dev/null
+++ b/doc/tips/using_borg_for_efficient_storage_of_old_annexed_files/comment_2_5fe65196b2f160c63305cc0274cf1530._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2025-12-31T18:18:50Z"
+ content="""
+It might well be possible to implement this for restic too.
+The crucial thing needed is for git-annex to
+be able to list the backups and find the annexed files. For borg,
+it does that by using `borg list`.
+"""]]
webapp: Remove support for local pairing
As a feature only supported by the webapp, and not by git-annex at the
command line, this is by now a very obscure corner of git-annex, and not
one I want to keep maintaining.
It's worth removing it to avoid the security exposure alone. People using
the assistant w/o the webapp probably don't expect it to be listening on
a UDP port for a handrolled protocol, but it was.
The webapp has supported pairing via magic-wormhole since 2016, which
can also link computers on the same local network, albeit with the overhead
of tor. That sort of covers the same use case. Of course advanced users
can easily enough add a ssh remote to their repository themselves, using
a hostname on the local network.
git-annex-p2p-iroh would be a great alternative, since it should
communicate over LAN when both computers are on the same one. Before
supporting that in the webapp, dumbpipe would need to be reasonably
likely to be installed.
Sponsored-by: unqueued
diff --git a/Assistant.hs b/Assistant.hs
index 64d2f3b6c5..9616895761 100644
--- a/Assistant.hs
+++ b/Assistant.hs
@@ -40,9 +40,6 @@ import Assistant.Threads.Glacier
#ifdef WITH_WEBAPP
import Assistant.WebApp
import Assistant.Threads.WebApp
-#ifdef WITH_PAIRING
-import Assistant.Threads.PairListener
-#endif
#else
import Assistant.Types.UrlRenderer
#endif
@@ -155,11 +152,6 @@ startDaemon assistant foreground startdelay cannotrun listenhost listenport star
then webappthread
else webappthread ++
[ watch commitThread
-#ifdef WITH_WEBAPP
-#ifdef WITH_PAIRING
- , assist $ pairListenerThread urlrenderer
-#endif
-#endif
, assist pushThread
, assist pushRetryThread
, assist exportThread
diff --git a/Assistant/Pairing/MakeRemote.hs b/Assistant/Pairing/MakeRemote.hs
deleted file mode 100644
index f4468bc07c..0000000000
--- a/Assistant/Pairing/MakeRemote.hs
+++ /dev/null
@@ -1,98 +0,0 @@
-{- git-annex assistant pairing remote creation
- -
- - Copyright 2012 Joey Hess <id@joeyh.name>
- -
- - Licensed under the GNU AGPL version 3 or higher.
- -}
-
-module Assistant.Pairing.MakeRemote where
-
-import Assistant.Common
-import Assistant.Ssh
-import Assistant.Pairing
-import Assistant.Pairing.Network
-import Assistant.MakeRemote
-import Assistant.Sync
-import Config.Cost
-import Config
-import qualified Types.Remote as Remote
-
-import Network.Socket
-import qualified Data.Text as T
-
-{- Authorized keys are set up before pairing is complete, so that the other
- - side can immediately begin syncing. -}
-setupAuthorizedKeys :: PairMsg -> OsPath -> IO ()
-setupAuthorizedKeys msg repodir = case validateSshPubKey $ remoteSshPubKey $ pairMsgData msg of
- Left err -> giveup err
- Right pubkey -> do
- absdir <- absPath repodir
- unlessM (liftIO $ addAuthorizedKeys True absdir pubkey) $
- giveup "failed setting up ssh authorized keys"
-
-{- When local pairing is complete, this is used to set up the remote for
- - the host we paired with. -}
-finishedLocalPairing :: PairMsg -> SshKeyPair -> Assistant ()
-finishedLocalPairing msg keypair = do
- sshdata <- liftIO $ installSshKeyPair keypair =<< pairMsgToSshData msg
- {- Ensure that we know the ssh host key for the host we paired with.
- - If we don't, ssh over to get it. -}
- liftIO $ unlessM (knownHost $ sshHostName sshdata) $
- void $ sshTranscript
- [ sshOpt "StrictHostKeyChecking" "no"
- , sshOpt "NumberOfPasswordPrompts" "0"
- , "-n"
- ]
- (genSshHost (sshHostName sshdata) (sshUserName sshdata))
- ("git-annex-shell -c configlist " ++ T.unpack (sshDirectory sshdata))
- Nothing
- r <- liftAnnex $ addRemote $ makeSshRemote sshdata
- repo <- liftAnnex $ Remote.getRepo r
- liftAnnex $ setRemoteCost repo semiExpensiveRemoteCost
- syncRemote r
-
-{- Mostly a straightforward conversion. Except:
- - * Determine the best hostname to use to contact the host.
- - * Strip leading ~/ from the directory name.
- -}
-pairMsgToSshData :: PairMsg -> IO SshData
-pairMsgToSshData msg = do
- let d = pairMsgData msg
- hostname <- liftIO $ bestHostName msg
- let dir = case remoteDirectory d of
- ('~':'/':v) -> v
- v -> v
- return SshData
- { sshHostName = T.pack hostname
- , sshUserName = Just (T.pack $ remoteUserName d)
- , sshDirectory = T.pack dir
- , sshRepoName = genSshRepoName hostname (toOsPath dir)
- , sshPort = 22
- , needsPubKey = True
- , sshCapabilities = [GitAnnexShellCapable, GitCapable, RsyncCapable]
- , sshRepoUrl = Nothing
- }
-
-{- Finds the best hostname to use for the host that sent the PairMsg.
- -
- - If remoteHostName is set, tries to use a .local address based on it.
- - That's the most robust, if this system supports .local.
- - Otherwise, looks up the hostname in the DNS for the remoteAddress,
- - if any. May fall back to remoteAddress if there's no DNS. Ugh. -}
-bestHostName :: PairMsg -> IO HostName
-bestHostName msg = case remoteHostName $ pairMsgData msg of
- Just h -> do
- let localname = h ++ ".local"
- addrs <- catchDefaultIO [] $
- getAddrInfo Nothing (Just localname) Nothing
- maybe fallback (const $ return localname) (headMaybe addrs)
- Nothing -> fallback
- where
- fallback = do
- let a = pairMsgAddr msg
- let sockaddr = case a of
- IPv4Addr addr -> SockAddrInet (fromInteger 0) addr
- IPv6Addr addr -> SockAddrInet6 (fromInteger 0) 0 addr 0
- fromMaybe (showAddr a)
- <$> catchDefaultIO Nothing
- (fst <$> getNameInfo [] True False sockaddr)
diff --git a/Assistant/Pairing/Network.hs b/Assistant/Pairing/Network.hs
deleted file mode 100644
index 62a4ea02e8..0000000000
--- a/Assistant/Pairing/Network.hs
+++ /dev/null
@@ -1,132 +0,0 @@
-{- git-annex assistant pairing network code
- -
- - All network traffic is sent over multicast UDP. For reliability,
- - each message is repeated until acknowledged. This is done using a
- - thread, that gets stopped before the next message is sent.
- -
- - Copyright 2012 Joey Hess <id@joeyh.name>
- -
- - Licensed under the GNU AGPL version 3 or higher.
- -}
-
-module Assistant.Pairing.Network where
-
-import Assistant.Common
-import Assistant.Pairing
-import Assistant.DaemonStatus
-import Utility.ThreadScheduler
-import Utility.Verifiable
-
-import Network.Multicast
-import Network.Info
-import Network.Socket
-import qualified Network.Socket.ByteString as B
-import qualified Data.ByteString.UTF8 as BU8
-import qualified Data.Map as M
-import Control.Concurrent
-
-{- This is an arbitrary port in the dynamic port range, that could
- - conceivably be used for some other broadcast messages.
- - If so, hope they ignore the garbage from us; we'll certainly
- - ignore garbage from them. Wild wild west. -}
-pairingPort :: PortNumber
-pairingPort = 55556
-
-{- Goal: Reach all hosts on the same network segment.
- - Method: Use same address that avahi uses. Other broadcast addresses seem
- - to not be let through some routers. -}
-multicastAddress :: AddrClass -> HostName
-multicastAddress IPv4AddrClass = "224.0.0.251"
-multicastAddress IPv6AddrClass = "ff02::fb"
-
-{- Multicasts a message repeatedly on all interfaces, with a 2 second
- - delay between each transmission. The message is repeated forever
- - unless a number of repeats is specified.
- -
- - The remoteHostAddress is set to the interface's IP address.
- -
- - Note that new sockets are opened each time. This is hardly efficient,
- - but it allows new network interfaces to be used as they come up.
- - On the other hand, the expensive DNS lookups are cached.
- -}
-multicastPairMsg :: Maybe Int -> Secret -> PairData -> PairStage -> IO ()
-multicastPairMsg repeats secret pairdata stage = go M.empty repeats
- where
- go _ (Just 0) = noop
- go cache n = do
- addrs <- activeNetworkAddresses
- let cache' = updatecache cache addrs
- mapM_ (sendinterface cache') addrs
- threadDelaySeconds (Seconds 2)
- go cache' $ pred <$> n
- {- The multicast library currently chokes on ipv6 addresses. -}
- sendinterface _ (IPv6Addr _) = noop
- sendinterface cache i = void $ tryIO $
(Diff truncated)