Recent changes to this wiki:
initial report on a regression
diff --git a/doc/bugs/annex-ignore_check_is_skipped_for_local_remotes.mdwn b/doc/bugs/annex-ignore_check_is_skipped_for_local_remotes.mdwn new file mode 100644 index 0000000000..c7b4683be0 --- /dev/null +++ b/doc/bugs/annex-ignore_check_is_skipped_for_local_remotes.mdwn @@ -0,0 +1,109 @@ +### Please describe the problem. + +we started to see the test test_ria_postclone_noannex fail; claude bisected it to: + + This is a git-annex regression in 10.20260213. The configRead function in Remote/Git.hs was reordered to support a "Push to Create" feature: + +``` + Before (10.20250630) — annex-ignore checked first: + (_, True, _) -> return r -- annex-ignore → bail out immediately + (True, _, _) | remoteAnnexCheckUUID gc -> tryGitConfigRead ... + + After (10.20260213) — local repos checked first, bypassing annex-ignore: + (True, _, _) | remoteAnnexCheckUUID gc -> tryGitConfigRead ... -- local repo → auto-init! + (_, True, _) | remoteAnnexIgnoreAuto gc -> checkpushedtocreate gc + + For local remotes, tryGitConfigRead → readlocalannexconfig → autoInitialize recreates the annex/ directory, even though annex-ignore=true is set. The annex-ignore case is never reached due to Haskell's top-to-bottom pattern matching. +``` + +which indeed sounds correct, as the pushToCreate feature still should not touch remotes which are already known to be annex-ignore'd, I think. If needed, the flag should be cleared first. + +<details> +<summary>reproducer it created, which passes on 10.20251029-1 and fails with 10.20260115+git119-g43a3f3aaf2-1~ndall+1 (might have been my patched version, so subtract a few commits back)</summary> + +```shell +#!/bin/bash +# +# Reproducer for git-annex regression: annex-ignore not respected for local remotes +# +# In git-annex <= 10.20250630, configRead in Remote/Git.hs checked annex-ignore +# BEFORE repoCheap (local), so local remotes with annex-ignore=true were skipped. +# +# In git-annex >= 10.20260213, the case ordering was swapped for "Push to Create" +# support. 
Now repoCheap is matched first, causing tryGitConfigRead -> +# readlocalannexconfig -> autoInitialize to run even when annex-ignore=true. +# +# Expected: annex/ directory is NOT created on the bare remote +# Actual (>= 10.20260213): annex/ directory IS created + +set -eu + +echo "=== git-annex annex-ignore regression reproducer ===" +echo "git-annex version: $(git annex version --raw 2>/dev/null || git annex version | head -1)" +echo + +WORKDIR=$(mktemp -d) +trap "chmod -R u+w '$WORKDIR' 2>/dev/null; rm -rf '$WORKDIR'" EXIT + +ORIGIN="$WORKDIR/origin" +BARE="$WORKDIR/bare.git" +CLONE="$WORKDIR/clone" + +# 1. Create an annex repo with some content +echo "--- Step 1: Create origin repo with annexed content" +git init "$ORIGIN" +cd "$ORIGIN" +git annex init "origin" +echo "hello" > file.txt +git annex add file.txt +git commit -m "add file" +echo + +# 2. Create bare repo, push git-annex branch but remove annex/ and annex UUID. +# This simulates a RIA store where the annex objects dir was removed — +# the bare repo has a git-annex branch (metadata) but no local annex. +echo "--- Step 2: Create bare repo, push, then strip local annex state" +git clone --bare "$ORIGIN" "$BARE" +cd "$ORIGIN" +git remote add bare "$BARE" +git push bare --all +git annex copy --to bare file.txt +git annex sync --content bare 2>&1 | tail -5 +# Now strip the annex/ directory and annex.uuid from the bare repo, +# simulating a store that was never locally annex-initialized +chmod -R u+w "$BARE/annex" +rm -rf "$BARE/annex" +git -C "$BARE" config --unset annex.uuid || true +git -C "$BARE" config --unset annex.version || true +echo +echo "annex/ exists in bare after stripping: $(test -d "$BARE/annex" && echo YES || echo NO)" +echo "annex.uuid in bare: $(git -C "$BARE" config annex.uuid 2>/dev/null || echo '<unset>')" +echo + +# 3. 
Clone from the bare repo, set annex-ignore BEFORE git-annex init +echo "--- Step 3: Clone from bare, set annex-ignore=true, then git-annex init" +git clone "$BARE" "$CLONE" +cd "$CLONE" +git config remote.origin.annex-ignore true +echo "remote.origin.annex-ignore = $(git config remote.origin.annex-ignore)" +git annex init "clone" +echo + +# 4. Check: was annex/ recreated on the bare repo? +echo "--- Result" +if test -d "$BARE/annex"; then + echo "FAIL: annex/ was recreated on the bare remote despite annex-ignore=true" + echo " This is the git-annex regression." + echo " annex.uuid in bare is now: $(git -C "$BARE" config annex.uuid 2>/dev/null || echo '<unset>')" + exit 1 +else + echo "OK: annex/ was NOT recreated. annex-ignore is respected." + exit 0 +fi + +``` +</details> + + +[[!meta author=yoh]] +[[!tag projects/forgejo]]
comment
diff --git a/doc/todo/Ephemeral_special_remotes/comment_3_5c222cb37669b5f0168579e7e642ef70._comment b/doc/todo/Ephemeral_special_remotes/comment_3_5c222cb37669b5f0168579e7e642ef70._comment new file mode 100644 index 0000000000..f055601ce0 --- /dev/null +++ b/doc/todo/Ephemeral_special_remotes/comment_3_5c222cb37669b5f0168579e7e642ef70._comment @@ -0,0 +1,36 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-02-19T18:10:08Z" + content=""" +After a bug fix, it's now possible to make a sameas remote that +is private to the local repository. + + git-annex initremote bar --sameas=foo --private type=... + +While not ephemeral as such, if you `git remote remove bar`, +the only trace left of it will probably +be in `.git/annex/journal-private/remote.log`, and possibly +any creds that got cached for it. +It would be possible to have a command that removes the remote, and also +clears that. + +If that is close enough to ephemeral, then we could think about the +second part, extending the external special remote protocol with +REDIRECT-REMOTE. + +That is similar to [[todo/Special_remote_redirect_to_URL]]. +And a few comments over there go in a similar direction. +In particular, the discussion of CLAIMURL. If TRANSFER-RETRIEVE-URL +and CHECKPRESENT-URL supported CLAIMURL, then if the ephemeral +special remote had some type of url that it claimed, those could be used +rather than REDIRECT-REMOTE. + +That would not cover TRANSFER STORE and REMOVE though. And it probably +doesn't make sense to extend those to urls generally. (There are too many +ways to store to an url or remove an url, everything isn't WebDAV..) + +I don't know if it is really elegant to drag +urls into this anyway. The user may be left making up an url scheme for +something that does not involve urls at all. +"""]]
Added CHECKPRESENT-URL extension to the external special remote protocol
diff --git a/Annex/Url.hs b/Annex/Url.hs
index 148fa2f188..05e408351c 100644
--- a/Annex/Url.hs
+++ b/Annex/Url.hs
@@ -15,6 +15,7 @@ module Annex.Url (
getUserAgent,
ipAddressesUnlimited,
checkBoth,
+ checkBoth',
download,
download',
exists,
@@ -197,6 +198,10 @@ checkBoth url expected_size uo =
Right r -> return r
Left err -> warning (UnquotedString err) >> return False
+checkBoth' :: U.URLString -> Maybe Integer -> U.UrlOptions -> Annex (Either String Bool)
+checkBoth' url expected_size uo = either (Left . show) id
+ <$> tryNonAsync (liftIO $ U.checkBoth url expected_size uo)
+
download :: MeterUpdate -> Maybe IncrementalVerifier -> U.URLString -> OsPath -> U.UrlOptions -> Annex Bool
download meterupdate iv url file uo =
liftIO (U.download meterupdate iv url file uo) >>= \case
diff --git a/CHANGELOG b/CHANGELOG
index a7573f8004..4301e3804a 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -3,6 +3,7 @@ git-annex (10.20260214) UNRELEASED; urgency=medium
* Fix retrival from http git remotes of keys with '%' in their names.
* Fix behavior when initremote is used with --sameas=
combined with --private.
+ * Added CHECKPRESENT-URL extension to the external special remote protocol.
-- Joey Hess <id@joeyh.name> Mon, 16 Feb 2026 13:38:21 -0400
diff --git a/Remote/External.hs b/Remote/External.hs
index 6418dc7f89..02de11dbea 100644
--- a/Remote/External.hs
+++ b/Remote/External.hs
@@ -95,7 +95,7 @@ gen rt externalprogram r u rc gc rs
{ storeExport = storeExportM external
, retrieveExport = retrieveExportM external gc
, removeExport = removeExportM external
- , checkPresentExport = checkPresentExportM external
+ , checkPresentExport = checkPresentExportM external gc
, removeExportDirectory = Just $ removeExportDirectoryM external
, renameExport = Just $ renameExportM external
}
@@ -118,7 +118,7 @@ gen rt externalprogram r u rc gc rs
(storeKeyM external)
(retrieveKeyFileM external gc)
(removeKeyM external)
- (checkPresentM external)
+ (checkPresentM external gc)
rmt
where
mk c cst ordered avail towhereis togetinfo toclaimurl tocheckurl exportactions cheapexportsupported =
@@ -276,8 +276,8 @@ removeKeyM external _proof k = either giveup return =<< go
respErrorMessage "REMOVE" errmsg
_ -> Nothing
-checkPresentM :: External -> CheckPresent
-checkPresentM external k = either giveup id <$> go
+checkPresentM :: External -> RemoteGitConfig -> CheckPresent
+checkPresentM external gc k = either giveup id <$> go
where
go = handleRequestKey external CHECKPRESENT k Nothing $ \resp ->
case resp of
@@ -288,6 +288,8 @@ checkPresentM external k = either giveup id <$> go
CHECKPRESENT_UNKNOWN k' errmsg
| k' == k -> result $ Left $
respErrorMessage "CHECKPRESENT" errmsg
+ CHECKPRESENT_URL k' url
+ | k == k' -> checkKeyUrl' gc k url
_ -> Nothing
whereisKeyM :: External -> Key -> Annex [String]
@@ -327,8 +329,8 @@ retrieveExportM external gc k loc dest p = do
_ -> Nothing
req sk = TRANSFEREXPORT Download sk (fromOsPath dest)
-checkPresentExportM :: External -> Key -> ExportLocation -> Annex Bool
-checkPresentExportM external k loc = either giveup id <$> go
+checkPresentExportM :: External -> RemoteGitConfig -> Key -> ExportLocation -> Annex Bool
+checkPresentExportM external gc k loc = either giveup id <$> go
where
go = handleRequestExport external loc CHECKPRESENTEXPORT k Nothing $ \resp -> case resp of
CHECKPRESENT_SUCCESS k'
@@ -338,6 +340,8 @@ checkPresentExportM external k loc = either giveup id <$> go
CHECKPRESENT_UNKNOWN k' errmsg
| k' == k -> result $ Left $
respErrorMessage "CHECKPRESENT" errmsg
+ CHECKPRESENT_URL k' url
+ | k == k' -> checkKeyUrl' gc k url
UNSUPPORTED_REQUEST -> result $
Left "CHECKPRESENTEXPORT not implemented by external special remote"
_ -> Nothing
@@ -861,6 +865,11 @@ checkKeyUrl gc k = do
us <- getWebUrls k
anyM (\u -> withUrlOptions (Just gc) $ checkBoth u (fromKey keySize k)) us
+checkKeyUrl' :: RemoteGitConfig -> Key -> URLString -> Maybe (Annex (ResponseHandlerResult (Either String Bool)))
+checkKeyUrl' gc k url =
+ Just $ withUrlOptions (Just gc) $ \uo ->
+ Result <$> checkBoth' url (fromKey keySize k) uo
+
getWebUrls :: Key -> Annex [URLString]
getWebUrls key = filter supported <$> getUrls key
where
diff --git a/Remote/External/Types.hs b/Remote/External/Types.hs
index 75f6d801f5..724b54486a 100644
--- a/Remote/External/Types.hs
+++ b/Remote/External/Types.hs
@@ -1,6 +1,6 @@
{- External special remote data types.
-
- - Copyright 2013-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2013-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -116,6 +116,7 @@ supportedExtensionList = ExtensionList
, "GETGITREMOTENAME"
, "UNAVAILABLERESPONSE"
, "TRANSFER-RETRIEVE-URL"
+ , "CHECKPRESENT-URL"
, asyncExtension
]
@@ -247,6 +248,7 @@ data Response
| CHECKPRESENT_SUCCESS Key
| CHECKPRESENT_FAILURE Key
| CHECKPRESENT_UNKNOWN Key ErrorMsg
+ | CHECKPRESENT_URL Key URLString
| REMOVE_SUCCESS Key
| REMOVE_FAILURE Key ErrorMsg
| COST Cost
@@ -286,6 +288,7 @@ instance Proto.Receivable Response where
parseCommand "CHECKPRESENT-SUCCESS" = Proto.parse1 CHECKPRESENT_SUCCESS
parseCommand "CHECKPRESENT-FAILURE" = Proto.parse1 CHECKPRESENT_FAILURE
parseCommand "CHECKPRESENT-UNKNOWN" = Proto.parse2 CHECKPRESENT_UNKNOWN
+ parseCommand "CHECKPRESENT-URL" = Proto.parse2 CHECKPRESENT_URL
parseCommand "REMOVE-SUCCESS" = Proto.parse1 REMOVE_SUCCESS
parseCommand "REMOVE-FAILURE" = Proto.parse2 REMOVE_FAILURE
parseCommand "COST" = Proto.parse1 COST
diff --git a/doc/design/external_special_remote_protocol.mdwn b/doc/design/external_special_remote_protocol.mdwn
index f79b8230ae..7c97d721b2 100644
--- a/doc/design/external_special_remote_protocol.mdwn
+++ b/doc/design/external_special_remote_protocol.mdwn
@@ -45,7 +45,7 @@ Recent versions of git-annex respond with a message indicating
protocol extensions that it supports. Older versions of
git-annex do not send this message.
- EXTENSIONS INFO ASYNC GETGITREMOTENAME UNAVAILABLERESPONSE TRANSFER-RETRIEVE-URL
+ EXTENSIONS INFO ASYNC GETGITREMOTENAME UNAVAILABLERESPONSE TRANSFER-RETRIEVE-URL CHECKPRESENT-URL
The special remote can respond to that with its own EXTENSIONS message, listing
any extensions it wants to use.
@@ -162,6 +162,11 @@ The following requests *must* all be supported by the special remote.
* `CHECKPRESENT-UNKNOWN Key ErrorMsg`
Indicates that it is not currently possible to verify if the key is
present in the remote. (Perhaps the remote cannot be contacted.)
+ * `CHECKPRESENT-URL Key Url`
+ Rather than the special remote checking an url itself,
+ this lets it offload that work to git-annex. This response is a protocol
+ extension; it's only safe to send it to git-annex after it sent an
+ `EXTENSIONS` that included `CHECKPRESENT-URL`.
* `REMOVE Key`
Requests the remote to remove a key's contents.
* `REMOVE-SUCCESS Key`
@@ -488,6 +493,9 @@ These protocol extensions are currently supported.
* `TRANSFER-RETRIEVE-URL`
This allows the `TRANSFER-RETRIEVE-URL` response to be used
in reply to `TRANSFER` and `TRANSFEREXPORT`.
+* `CHECKPRESENT-URL`
+ This allows the `CHECKPRESENT-URL` response to be used
+ in reply to `CHECKPRESENT` and `CHECKPRESENTEXPORT`.
## signals
diff --git a/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn b/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn
index 1f30828c48..0bb70b7e60 100644
--- a/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn
+++ b/doc/design/external_special_remote_protocol/export_and_import_appendix.mdwn
@@ -71,6 +71,11 @@ a request, it can reply with `UNSUPPORTED-REQUEST`.
* `CHECKPRESENT-UNKNOWN Key ErrorMsg`
Indicates that it is not currently possible to verify if content is
present in the remote. (Perhaps the remote cannot be contacted.)
+ * `CHECKPRESENT-URL Key Url`
+ Rather than the special remote checking an url itself,
+ this lets it offload that work to git-annex. This response is a protocol
+ extension; it's only safe to send it to git-annex after it sent an
+ `EXTENSIONS` that included `CHECKPRESENT-URL`.
* `REMOVEEXPORT Key`
Requests the remote to remove content stored by `TRANSFEREXPORT`
with the previously provided `EXPORT` Name.
diff --git a/doc/todo/CHECKPRESENT_redirect_to_URL.mdwn b/doc/todo/CHECKPRESENT_redirect_to_URL.mdwn
index a44884ef63..103c522c71 100644
--- a/doc/todo/CHECKPRESENT_redirect_to_URL.mdwn
+++ b/doc/todo/CHECKPRESENT_redirect_to_URL.mdwn
(Diff truncated)
update
diff --git a/doc/thanks/list b/doc/thanks/list index dfeda7a813..0a65388abb 100644 --- a/doc/thanks/list +++ b/doc/thanks/list @@ -126,3 +126,5 @@ Lilia.Nanne, Dusty Mabe, mpol, Andrew Poelstra, +joshingly, +Melody Tolly,
break out todo
diff --git a/doc/todo/CHECKPRESENT_redirect_to_URL.mdwn b/doc/todo/CHECKPRESENT_redirect_to_URL.mdwn new file mode 100644 index 0000000000..a44884ef63 --- /dev/null +++ b/doc/todo/CHECKPRESENT_redirect_to_URL.mdwn @@ -0,0 +1,15 @@ +Following up on [[todo/Special_remote_redirect_to_URL]], +it would be useful for CHECKPRESENT (and also CHECKPRESENTEXPORT) +to be able to redirect to an url, and let git-annex do the checking. + +This will let external special remotes that are readonly and can calculate +urls on their own avoid needing to implement HTTP at all. + +The protocol extension would be: + + EXTENSIONS CHECKPRESENT-URL + CHECKPRESENT-URL Key Url + +--[[Joey]] + +[[!tag projects/INM7]] diff --git a/doc/todo/Special_remote_redirect_to_URL/comment_9_2a6eaab78886f0d6c372c7bdc929d7c4._comment b/doc/todo/Special_remote_redirect_to_URL/comment_9_2a6eaab78886f0d6c372c7bdc929d7c4._comment new file mode 100644 index 0000000000..d83f4a8fb6 --- /dev/null +++ b/doc/todo/Special_remote_redirect_to_URL/comment_9_2a6eaab78886f0d6c372c7bdc929d7c4._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2026-02-18T20:51:57Z" + content=""" +Opened [[todo/CHECKPRESENT_redirect_to_URL]]. +"""]]
update
diff --git a/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment b/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment index f44216176b..f6180724bf 100644 --- a/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment +++ b/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment @@ -7,11 +7,9 @@ Some of this strikes me as perhaps coming at [[todo/Ephemeral_special_remotes]] from a different direction? Re the inflation of the git-annex branch when using sameas, -I've checked and `git-annex initremote --sameas=foo --private` -still writes to the git-annex branch. But -it should be possible to keep the sameas remote's -config out of the git-annex branch and only stored locally. -Opened a bug report, [[bugs/sameas_private]]. +I fixed a bug ([[bugs/sameas_private]]) and you'll be able to use +`git-annex initremote --sameas=foo --private` to keep the configuration +of the new sameas remote out of the git-annex branch. So, it seems to me that your broker, if it knows of several different urls that can be used to access `myplace`, can be configured at `initremote`
make annex-private use annex-config-uuid when set, rather than annex-uuid
Fix behavior when initremote is used with --sameas= combined with --private.
diff --git a/CHANGELOG b/CHANGELOG index 21670f4a1f..a7573f8004 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,6 +1,8 @@ git-annex (10.20260214) UNRELEASED; urgency=medium * Fix retrival from http git remotes of keys with '%' in their names. + * Fix behavior when initremote is used with --sameas= + combined with --private. -- Joey Hess <id@joeyh.name> Mon, 16 Feb 2026 13:38:21 -0400 diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs index 5772bebb0c..962cb732ae 100644 --- a/Types/GitConfig.hs +++ b/Types/GitConfig.hs @@ -307,8 +307,10 @@ extractGitConfig configsource r = GitConfig | Git.Config.isTrueFalse' v /= Just True = Nothing | isRemoteKey (remoteAnnexConfigEnd "private") k = do remotename <- remoteKeyToRemoteName k - toUUID <$> Git.Config.getMaybe - (remoteAnnexConfig remotename "uuid") r + let getu c = + toUUID <$> Git.Config.getMaybe + (remoteAnnexConfig remotename c) r + getu "config-uuid" <|> getu "uuid" | otherwise = Nothing in mapMaybe get (M.toList (Git.config r)) ] diff --git a/doc/bugs/sameas_private.mdwn b/doc/bugs/sameas_private.mdwn index db48f5b688..e6aae78912 100644 --- a/doc/bugs/sameas_private.mdwn +++ b/doc/bugs/sameas_private.mdwn @@ -1,13 +1,22 @@ `git-annex initremote --sameas=foo --private` is not actually -private. +private, or not in a way that seems to make sense. -It writes to the git-annex branch, adding in remote.log the config uuid of the -sameas remote. +Currently, it writes to the git-annex branch, adding in remote.log the +config uuid of the sameas remote. It should be possible to avoid writing that there. Since the config uuid is the only place a sameas remote touches the git-annex branch, this would -allow making up sameas remotes for local use. Location log changes -for a private sameas remote would still be recorded in the git-annex -branch, as long as the remote uuid is not itself private. --[[Joey]] +allow making up sameas remotes for local use. 
+ +But also, and worse, that actually makes location log changes for remote +foo be logged to the private journal. That happens because +remote.name.annex-private is set for the sameas remote, and +it has the same annex-uuid as foo. This is highly surprising and wrong +behavior! + +The fix will be to make remote.name.annex-private affect the +annex-config-uuid when there is one, rather than the annex-uuid. + +> [[fixed|done]] --[[Joey]] [[!tag projects/INM7]] diff --git a/doc/git-annex-initremote.mdwn b/doc/git-annex-initremote.mdwn index bcb3494b7f..f4d5705308 100644 --- a/doc/git-annex-initremote.mdwn +++ b/doc/git-annex-initremote.mdwn @@ -87,6 +87,11 @@ want to use `git annex renameremote`. branch. The special remote will only be usable from the repository where it was created. + When used in combination with `--sameas=foo`, the configuration of the + new special remote is kept private. But when files are sent to the new + special remote, it will be public that they are present in remote "foo", + unless it is also private. + * `--json` Enable JSON output. This is intended to be parsed by programs that use
formatting
diff --git a/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment b/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment index 0b59cb561c..f44216176b 100644 --- a/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment +++ b/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment @@ -1,6 +1,6 @@ [[!comment format=mdwn username="joey" - subject="""Re: "download URL broker"""" + subject="""Re: download URL broker""" date="2026-02-18T19:43:01Z" content=""" Some of this strikes me as perhaps coming at
response
diff --git a/doc/bugs/sameas_private.mdwn b/doc/bugs/sameas_private.mdwn new file mode 100644 index 0000000000..db48f5b688 --- /dev/null +++ b/doc/bugs/sameas_private.mdwn @@ -0,0 +1,13 @@ +`git-annex initremote --sameas=foo --private` is not actually +private. + +It writes to the git-annex branch, adding in remote.log the config uuid of the +sameas remote. + +It should be possible to avoid writing that there. Since the config uuid +is the only place a sameas remote touches the git-annex branch, this would +allow making up sameas remotes for local use. Location log changes +for a private sameas remote would still be recorded in the git-annex +branch, as long as the remote uuid is not itself private. --[[Joey]] + +[[!tag projects/INM7]] diff --git a/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment b/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment new file mode 100644 index 0000000000..0b59cb561c --- /dev/null +++ b/doc/todo/Special_remote_redirect_to_URL/comment_8_81c4dd572b3f91eaa577158756150f7e._comment @@ -0,0 +1,21 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: "download URL broker"""" + date="2026-02-18T19:43:01Z" + content=""" +Some of this strikes me as perhaps coming at +[[todo/Ephemeral_special_remotes]] from a different direction? + +Re the inflation of the git-annex branch when using sameas, +I've checked and `git-annex initremote --sameas=foo --private` +still writes to the git-annex branch. But +it should be possible to keep the sameas remote's +config out of the git-annex branch and only stored locally. +Opened a bug report, [[bugs/sameas_private]]. + +So, it seems to me that your broker, if it knows of several different urls +that can be used to access `myplace`, can be configured at `initremote` +time which set of urls to use. 
And you can initialize multiple instances +of the broker, each configured to use a different set of urls, with +`--sameas --private`. +"""]]
response 3
diff --git a/doc/todo/Special_remote_redirect_to_URL/comment_7_f9d5dbb6ca33aa5e9d143fe7f5c93822._comment b/doc/todo/Special_remote_redirect_to_URL/comment_7_f9d5dbb6ca33aa5e9d143fe7f5c93822._comment new file mode 100644 index 0000000000..fc5b383949 --- /dev/null +++ b/doc/todo/Special_remote_redirect_to_URL/comment_7_f9d5dbb6ca33aa5e9d143fe7f5c93822._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: CLAIMURL""" + date="2026-02-18T19:29:02Z" + content=""" +CLAIMURL is not currently used for TRANSFER-RETRIEVE-URL. (It's also +not quite accurate to say that the `web` special remote is used.) + +Supporting that would mean that, each time a remote replies with +TRANSFER-RETRIEVE-URL, git-annex would need to query each other remote +in turn to see if they claim the url. That could mean starting up a lot +of external special remote programs (when not running yet) and doing a +roundtrip through them, so latency might start to become a problem. + +Also, there would be the possibility of loops between 2 or more remotes. +Eg, remote A replies with TRANSFER-RETRIEVE-URL with an url that remote B +CLAIMURLs, only to then reply with TRANSFER-RETRIEVE-URL with an url that +remote A CLAIMURLs. +"""]]
comment 2
diff --git a/doc/todo/Special_remote_redirect_to_URL/comment_6_7bb4abf3b30527d67b3c72a397fc4cd0._comment b/doc/todo/Special_remote_redirect_to_URL/comment_6_7bb4abf3b30527d67b3c72a397fc4cd0._comment new file mode 100644 index 0000000000..ffce28fb67 --- /dev/null +++ b/doc/todo/Special_remote_redirect_to_URL/comment_6_7bb4abf3b30527d67b3c72a397fc4cd0._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: multiple URLS for a key""" + date="2026-02-18T19:16:06Z" + content=""" +TRANSFER-RETRIEVE-URL was designed as a redirect, so it only redirects to +one place. And git-annex won't try again to retrieve from the same remote +if the url fails to download. + +I could imagine extending TRANSFER-RETRIEVE-URL to have a list of urls. But +I can also imagine needing to extend it with HTTP headers to use for the +url, and these things conflict, given the simple line and word based +protocol. + +I think that sameas remotes that use other urls might be a solution. +Running eg `git-annex get` without specifying a remote, it will keep trying +different remotes until one succeeds. +"""]]
response 1
diff --git a/doc/todo/Special_remote_redirect_to_URL/comment_5_c38df20873ec737243f27f1f33882703._comment b/doc/todo/Special_remote_redirect_to_URL/comment_5_c38df20873ec737243f27f1f33882703._comment new file mode 100644 index 0000000000..cd8dd55280 --- /dev/null +++ b/doc/todo/Special_remote_redirect_to_URL/comment_5_c38df20873ec737243f27f1f33882703._comment @@ -0,0 +1,35 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: CHECKPRESENT""" + date="2026-02-18T18:47:51Z" + content=""" +Yes CHECKPRESENT still needs the special remote to do HTTP. + +I do think that was an oversight. The original todo mentioned +"taking advantage of the testing and security hardening of the +git-annex implementation" and if a special remote is read-only, +CHECKPRESENT may be the only time it needs to do HTTP. + +A protocol extension for this would look like: + + EXTENSIONS CHECKPRESENT-URL + CHECKPRESENT-URL Key Url + +--- + +> Would it impact the usage of such a special remote, if it would be configured +> with sameas=otherremote? Would both remote implementations need to implement +> CHECKPRESENT (consistently), or would one (in this case otherremote) be enough? + +git-annex won't try to use the otherremote when it's been asked to use +the sameas remote. + +If one implemented CHECKPRESENT and the other always replied with +"CHECKPRESENT-UNKNOWN", then a command like `git-annex fsck --fast --from` +when used with the former remote would be able to verify that the content +is present, and when used with the latter remote it would error out. + +So you could perhaps get away with not implementing that. For a readonly +remote, fsck is I think the only thing that uses CHECKPRESENT on a +user-specified remote. It's more used on remotes that can be written to. +"""]]
Added a comment: Functionality gaps?
diff --git a/doc/todo/Special_remote_redirect_to_URL/comment_4_d7d6814a2a19227ceb43ba2ba05c32ba._comment b/doc/todo/Special_remote_redirect_to_URL/comment_4_d7d6814a2a19227ceb43ba2ba05c32ba._comment new file mode 100644 index 0000000000..b115e0cd15 --- /dev/null +++ b/doc/todo/Special_remote_redirect_to_URL/comment_4_d7d6814a2a19227ceb43ba2ba05c32ba._comment @@ -0,0 +1,34 @@ +[[!comment format=mdwn + username="mih" + avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd" + subject="Functionality gaps?" + date="2026-02-18T18:10:07Z" + content=""" +I looked into adopting this new feature for a special remote implementation. Four questions arose: + +1. In order to implement CHECKPRESENT it appears that a special remote still needs to implement the logic for the equivalent of an HTTP HEAD request. From my POV this limits the utility of a git-annex based download, because significant logic still needs to be implemented in a special remote itself. Would it impact the usage of such a special remote, if it would be configured with `sameas=otherremote`? Would both remote implementations need to implement CHECKPRESENT (consistently), or would one (in this case `otherremote`) be enough? + +2. I am uncertain re the signaling in case of multiple possible URL targets for a key, and an eventual download failure regarding one URL communicated via TRANSFER-RETRIEVE-URL. I believe that, when git-annex fails to download from a reported URL successfully, it can only send another TRANSFER-RETRIEVE request to the special remote (possibly go to the next remote first). This would mean that the special remote either needs to maintain a state on which URL has been reported before, or it would need to implement the capacity to test for availability (essentially the topic of Q1), and can never report more than one URL. Is this correct? + +3. What is the logic git-annex uses to act on a URL communicated via TRANSFER-RETRIEVE-URL? 
Would it match it against all available special remotes via CLAIMURL, or give it straight to `web` (and only that)? + +4. I am wondering if it would be possible and sensible to use this feature for implementing a download URL \"broker\"? A use case would be an informed selection of a download URL from a set of URLs associated with a key. This is similar to the `urlinclude/exclude` feature of the `web` special remote, but (depending on Q3) is relevant also to other special remotes acting as downloader implementations. + + +Elaborating on (4) a bit more: My thinking is focused on the optimal long-term accessibility of keys -- across infrastructure transitions and different concurrent environments. From my POV git-annex provides me with the following options for making `myplace` as a special remote optimally work across space and time. + +- via `sameas=myplace`, I can have multiple special remotes point to `myplace`. In each environment I can use the additional remotes (by name) to optimally access `myplace`. The decision making process is independent of git-annex. However, the possible access options need to be encoded in the annex branch to make this work. This creates a problem of inflation of this space in case of repositories that are used in many different contexts (think public (research) data that want to capitalize on the decentralized nature of git-annex). + +- via `enableremote` I can swap out the type and parameterization of `myplace` entirely. However, unlike with `initremote` there is no `--private`, so this is more geared toward the use case of \"previous access method is no longer available\", rather than a temporary optimization. + +- when key access is (temporarily) preferred via URLs, I could generate a temporary `web` special remote via `initremote --private` and a `urlinclude` pattern. + +In all cases, I cannot simply run `git annex get`, but I need to identify a specific remote that may need to be created first, or set a low cost for it. 
+ +I'd be glad to be pointed at omissions in this assessment. Thanks! + + + + + +"""]]
diff --git a/doc/bugs/macos_switch_to_openrsync_seems_to_break_sync.mdwn b/doc/bugs/macos_switch_to_openrsync_seems_to_break_sync.mdwn new file mode 100644 index 0000000000..6a5fc6ce7e --- /dev/null +++ b/doc/bugs/macos_switch_to_openrsync_seems_to_break_sync.mdwn @@ -0,0 +1,46 @@ +### Please describe the problem. +Syncing content to an rsync remote no longer works on macOS 26.3. +(Specifically an encrypted rsync remote with shared encryption) + +### What steps will reproduce the problem? +MacOS 26.3 (with openrsync) +run: `git annex sync my-rsync-remote --content` +observe: `rsync error: unexpected end of file.` +(these files are all fine and can be opened locally) + +### What version of git-annex are you using? On what operating system? +MacOS 26.3 with git-annex from homebrew + +git-annex version: 10.20260213 +build flags: Assistant Webapp FsEvents TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV Servant +dependency versions: aws-0.25.2 bloomfilter-2.0.1.3 crypton-1.0.6 DAV-1.3.4 feed-1.3.2.1 ghc-9.14.1 http-client-0.7.19 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: darwin aarch64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 + +### Please provide any additional information below. 
+ +I have a suspicion it's because macOS now has openrsync instead of rsync. +The files themselves all seem to be fine. +The error I'm seeing seems to be some kind of rsync crash in the background. +Googling shows that openrsync doesn't accept all the command line arguments rsync does. +I don't know how git-annex is using it in the backend but I imagine this could be something that would break it. +Are you able to confirm if git-annex should work on macOS with openrsync (the default)? +If it is indeed the case that openrsync isn't supported, checking compatibility and aborting with a message directing the user on how to switch to use the other rsync would be great. +Note: It's totally possible it's something else... no idea what though. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) +Yeah, it was working great for syncing and archiving large static files across multiple drives and machines. (helps deduping them too) +Have also had success in the past syncing to NAS storage using rsync remote and then using it to pull data on demand when away from home. +Thanks for taking the time to read this.
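Since git-annex invokes whatever `rsync` it finds in `PATH`, a quick diagnostic is to check which implementation that is. This is a sketch; the assumption that openrsync's version banner contains the word "openrsync" is mine:

```python
import shutil
import subprocess

# Report which rsync implementation git-annex would find in PATH.
# Assumption: Apple's openrsync identifies itself as "openrsync" in the
# first line of `rsync --version`, while stock rsync prints "rsync version ...".
rsync = shutil.which("rsync")
if rsync is None:
    banner = ""
    print("no rsync in PATH")
else:
    out = subprocess.run([rsync, "--version"], capture_output=True, text=True).stdout
    banner = (out.splitlines() or [""])[0]
    print("openrsync" if "openrsync" in banner else "traditional rsync", "-", banner)
```

If this reports openrsync, installing stock rsync (e.g. `brew install rsync`) and putting it earlier in `PATH` is worth trying as a workaround until openrsync compatibility is confirmed.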
switch arm autobuild links to armhf
armel no longer being updated
armel no longer being updated
diff --git a/doc/install/Linux_standalone.mdwn b/doc/install/Linux_standalone.mdwn index 57bcf5bfae..83bae6af9b 100644 --- a/doc/install/Linux_standalone.mdwn +++ b/doc/install/Linux_standalone.mdwn @@ -7,7 +7,7 @@ dependencies and is self-contained. * x86-64: [download tarball](https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-amd64.tar.gz) * x86-32: [download tarball](https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-i386.tar.gz) -* arm: [download tarball](https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-armel.tar.gz) +* arm: [download tarball](https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-armhf.tar.gz) * arm64: [download tarball](https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-arm64.tar.gz) * arm64, for ancient kernels: [download tarball](https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-arm64-ancient.tar.gz) @@ -37,7 +37,7 @@ An hourly autobuild is also available, hosted by [[Joey]]: * x86-32: [download tarball](https://downloads.kitenet.net/git-annex/autobuild/i386/git-annex-standalone-i386.tar.gz) ([build logs](https://downloads.kitenet.net/git-annex/autobuild/i386/)) * arm64: [download tarball](https://downloads.kitenet.net/git-annex/autobuild/arm64/git-annex-standalone-arm64.tar.gz) ([build logs](https://downloads.kitenet.net/git-annex/autobuild/arm64/)) * arm64, for ancient kernels: [download tarball](https://downloads.kitenet.net/git-annex/autobuild/arm64-ancient/git-annex-standalone-arm64-ancient.tar.gz) ([build logs](https://downloads.kitenet.net/git-annex/autobuild/arm64-ancient/)) -* arm: [download tarball](https://downloads.kitenet.net/git-annex/autobuild/armel/git-annex-standalone-armel.tar.gz) ([build logs](https://downloads.kitenet.net/git-annex/autobuild/armel/)) +* arm: [download tarball](https://downloads.kitenet.net/git-annex/autobuild/armhf/git-annex-standalone-armhf.tar.gz) ([build 
logs](https://downloads.kitenet.net/git-annex/autobuild/armhf/)) ## download security
switch to armhf
diff --git a/doc/builds.mdwn b/doc/builds.mdwn index dc639170c7..48744dda62 100644 --- a/doc/builds.mdwn +++ b/doc/builds.mdwn @@ -9,8 +9,8 @@ <h2>Linux amd64</h2> <iframe width=1024 height=40em scrolling=no frameborder=0 marginheight=0 marginwidth=0 src="https://downloads.kitenet.net/git-annex/autobuild/amd64/build-version"> </iframe> -<h2>Linux armel</h2> -<iframe width=1024 height=40em scrolling=no frameborder=0 marginheight=0 marginwidth=0 src="https://downloads.kitenet.net/git-annex/autobuild/armel/build-version"> +<h2>Linux armhf</h2> +<iframe width=1024 height=40em scrolling=no frameborder=0 marginheight=0 marginwidth=0 src="https://downloads.kitenet.net/git-annex/autobuild/armhf/build-version"> </iframe> <h2>Linux arm64</h2> <iframe width=1024 height=40em scrolling=no frameborder=0 marginheight=0 marginwidth=0 src="https://downloads.kitenet.net/git-annex/autobuild/arm64/build-version"> @@ -34,8 +34,8 @@ <h2>Linux amd64</h2> <iframe width=1024 scrolling=no frameborder=0 marginheight=0 marginwidth=0 src="https://downloads.kitenet.net/git-annex/autobuild/amd64/"> </iframe> -<h2>Linux armel</h2> -<iframe width=1024 scrolling=no frameborder=0 marginheight=0 marginwidth=0 src="https://downloads.kitenet.net/git-annex/autobuild/armel/"> +<h2>Linux armhf</h2> +<iframe width=1024 scrolling=no frameborder=0 marginheight=0 marginwidth=0 src="https://downloads.kitenet.net/git-annex/autobuild/armhf/"> </iframe> <h2>Linux arm64</h2> <iframe width=1024 scrolling=no frameborder=0 marginheight=0 marginwidth=0 src="https://downloads.kitenet.net/git-annex/autobuild/arm64/">
comments
diff --git a/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked/comment_2_922d428bb1506ba413b9ae6e5119aa25._comment b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked/comment_2_922d428bb1506ba413b9ae6e5119aa25._comment new file mode 100644 index 0000000000..a85edbb7ce --- /dev/null +++ b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked/comment_2_922d428bb1506ba413b9ae6e5119aa25._comment @@ -0,0 +1,29 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2026-02-16T18:24:00Z" + content=""" +Looked in more detail into fixing this by moving the ignore check to after +a set of files has been gathered and fed through `git ls-files`. Unfortunately +that will be complicated significantly by the fact that, after the ignore check +it currently does things like re-writing symlinks to annex objects when the +link target needs updating. There is a chicken and egg problem here, +because the type of Change that gets queued depends on parts of that same +code having run. + +BTW: Another way this same bug can manifest is that an annex object is added +to a submodule, and the assistant updates its symlink to point out of the +submodule, to the wrong annex objects directory. + +There is some very delicate timing going on in +Assistant.Threads.Committer in order to gather Changes that happen close +together in time. Which makes me think that even a simple approach of +running `git ls-files` once per changed file, before the ignore check, +might throw the timing off enough to be a problem. As well as being murder +on the CPU when eg, a lot of files have been moved around. + +Note that [[todo/replace_assistant_with_assist]] would fix this bug, +since `git-annex assist` does use `git ls-files`. Not that implementing +that would be any easier than just fixing this bug. But, fixing this bug +moves the assistant in the direction of that todo one way or the other. 
+"""]] diff --git a/doc/todo/replace_assistant_with_assist/comment_1_685f1aa27ff31fd24cb987f9ff743d93._comment b/doc/todo/replace_assistant_with_assist/comment_1_685f1aa27ff31fd24cb987f9ff743d93._comment new file mode 100644 index 0000000000..600430c568 --- /dev/null +++ b/doc/todo/replace_assistant_with_assist/comment_1_685f1aa27ff31fd24cb987f9ff743d93._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject=""""Gather inotify events"""" + date="2026-02-16T18:51:50Z" + content=""" +The assistant has some very tricky, and probably also fragile code that +gathers related inotify events. That would need to be factored out for +this. +"""]]
close
diff --git a/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file.mdwn b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file.mdwn index 5303ad1b52..d1d59cec5c 100644 --- a/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file.mdwn +++ b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file.mdwn @@ -105,3 +105,4 @@ when I was lucky - yes. [[!meta author=yoh]] [[!tag projects/repronim]] +> [[done]] --[[Joey]]
comment
diff --git a/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_3_f03f1b609b6b3ab46081913977488230._comment b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_3_f03f1b609b6b3ab46081913977488230._comment new file mode 100644 index 0000000000..da4a15f67d --- /dev/null +++ b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_3_f03f1b609b6b3ab46081913977488230._comment @@ -0,0 +1,54 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-02-16T16:35:28Z" + content=""" +Congrats I guess, that's the first LLM-generated patch to git-annex, and +it seems approximately correct. + +It was unambiguously helpful to get the hint that `Remote/Git.hs:485` +was the location of the bug. That probably saved 10 minutes of my time. + +But, I probably would have found it easier to fix this on my own without +seeing that patch than it was to fix it given that patch. I had to do a +considerable amount of thinking about whether the patch was correct, or +just confidently sounding incorrect in a different manner than a +human-generated patch would be. (Not helped, certainly, by this being an +area of the code with no type system guardrails helping it be correct.) + +For one thing, I wondered, why does it use isUnescapedInURIComponent rather +than isUnescapedInURI? The latter handles '/' correctly without needing a +special case. + +Being faced with an LLM-generated patch also meant that I needed to consider +what its license is. I was faced with needing to clean-room my own version, +which is a bit difficult given how short the patch is (while probably still +long enough to be copyrightable). + +But, it turns out that git-annex already contains essentially the same +code in Remote/S3.hs, in genericPublicUrl: + + baseurl Posix.</> escapeURIString skipescape p + where + -- Don't need to escape '/' because the bucket object + -- is not necessarily a single url component. 
+ -- But do want to escape eg '+' and ' ' + skipescape '/' = True + skipescape c = isUnescapedInURIComponent c + +This code was presumably in the LLM's training set, and certainly appeared +to be available to it for context, so its mirroring of this could simply be +a case of Garbage In, Garbage Out. + +Note that "skipescape" is a much better name than the LLM-generated +"escchar", which behaves backwards from what its name suggests. + +Why did I use isUnescapedInURIComponent in that and isUnescapedInURI +in Remote/WebDav/DavLocation.hs? +I doubt there was a good reason for either choice, but a full analysis +did find a reason to prefer the isUnescapedInURIComponent approach, +to handle a path containing '[' or ']'. + +So, in [[!commit 8fd9b67ed82ca0f39796a8d59431d42a7eb84957]], I've +factored out a general-purpose function, and fixed this bug by using it. +"""]]
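The `skipescape` behavior described in the comment above (percent-escape everything unsafe in a URI, but leave '/' alone because the object location spans several path components) can be sketched in Python, whose `urllib.parse.quote` keeps '/' unescaped by default. Note its unreserved set differs slightly from `isUnescapedInURIComponent`, so this is an approximation, not the Haskell function:

```python
from urllib.parse import quote

# skipescape-style encoding: escape unsafe characters such as '&', '%'
# and ',', but keep '/' since the location is a multi-component path.
# The location string is the key-file path from the bug this fixes.
loc = "annex/objects/zZ/3v/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM"
escaped = quote(loc, safe="/")
print(escaped)
```

The bare '%' characters become `%25`, so the result parses as a valid URI path while the directory separators survive.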
comment
diff --git a/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_2_d2dee9d9be3ad9a6726397be2093e92d._comment b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_2_d2dee9d9be3ad9a6726397be2093e92d._comment new file mode 100644 index 0000000000..1f1214eaa9 --- /dev/null +++ b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_2_d2dee9d9be3ad9a6726397be2093e92d._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2026-02-16T16:09:47Z" + content=""" +> SHA256E keys (like SHA256E-s107998--4545...) contain only alphanumeric characters + +> The keyFile encoding produces no % or & characters + +Incorrect statements FWIW. + +Certain SHA*E keys will also be affected by this bug. +"""]]
Added a comment
diff --git a/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_3_c127a0a56bacf6d6faabe4964afad37e._comment b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_3_c127a0a56bacf6d6faabe4964afad37e._comment new file mode 100644 index 0000000000..fca8715d4a --- /dev/null +++ b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_3_c127a0a56bacf6d6faabe4964afad37e._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 3" + date="2026-02-16T11:19:58Z" + content=""" +I must have forgotten to set the email replies checkbox. Thanks for implementing this, it sounds exactly like what I want :) +"""]]
add news item for git-annex 10.20260213
diff --git a/doc/news/version_10.20250929.mdwn b/doc/news/version_10.20250929.mdwn deleted file mode 100644 index 4d46ac2cf1..0000000000 --- a/doc/news/version_10.20250929.mdwn +++ /dev/null @@ -1,7 +0,0 @@ -git-annex 10.20250929 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * enableremote: Allow type= to be provided when it does not change the - type of the special remote. - * importfeed: Fix encoding issues parsing feeds when built with OsPath. - * Fix build with ghc 9.0.2. - * Remove the Servant build flag; always build with support for - annex+http urls and git-annex p2phttp."""]] \ No newline at end of file diff --git a/doc/news/version_10.20260213.mdwn b/doc/news/version_10.20260213.mdwn new file mode 100644 index 0000000000..873cee77a3 --- /dev/null +++ b/doc/news/version_10.20260213.mdwn @@ -0,0 +1,30 @@ +git-annex 10.20260213 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * When used with git forges that allow Push to Create, the remote's + annex-uuid is re-probed after the initial push. + * addurl, importfeed: Enable --verifiable by default. + * fromkey, registerurl: When passed an url, generate a VURL key. + * unregisterurl: Unregister both VURL and URL keys. + * Fix behavior of local git remotes that have annex-ignore + set to be the same as ssh git remotes. + * Added annex.security.allow-insecure-https config, which allows + using old http servers that use TLS 1.2 without Extended Main + Secret support. + * p2phttp: Commit git-annex branch changes promptly. + * p2phttp: Fix a server stall by disabling warp's slowloris attack + prevention. + * p2phttp: Added --cpus option. + * Avoid ever starting more capabilities than the number of cpus. + * fsck: Support repairing a corrupted file in a versioned S3 remote. + * Fix incorrect transfer direction in remote transfer log when + downloading from a local git remote. 
+ * Fix bug that prevented 2 clones of a local git remote + from concurrently downloading the same file. + * rsync: Avoid deleting contents of a non-empty directory when + removing the last exported file from the directory. + * unregisterurl: Fix display of action to not be "registerurl". + * The OsPath build flag requires file-io 0.2.0, which fixes several + issues. + * Remove deprecated commands direct, indirect, proxy, and transferkeys. + * Deprecate undo command. + * Remove undo action from kde and nautilus integrations. + * Fix build on BSDs. Thanks, Greg Steuck"""]] \ No newline at end of file
Added a comment: Ensuring only one process
diff --git a/doc/design/external_special_remote_protocol/async_appendix/comment_4_9ce6f33448a4fde446365746fd094cd6._comment b/doc/design/external_special_remote_protocol/async_appendix/comment_4_9ce6f33448a4fde446365746fd094cd6._comment new file mode 100644 index 0000000000..97fa175719 --- /dev/null +++ b/doc/design/external_special_remote_protocol/async_appendix/comment_4_9ce6f33448a4fde446365746fd094cd6._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="calmofthestorm" + avatar="http://cdn.libravatar.org/avatar/9d97e9bcb1cf7680309e37cd69fab408" + subject="Ensuring only one process" + date="2026-02-13T06:01:44Z" + content=""" +I changed it so that each instance binds an RPC server to a UNIX domain socket and connects to it, so one gets chosen as coordinator, and it works great. Probably overkill, but it works. Still interested in the question of how this **should** work though. +"""]]
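The coordinator-election trick described in the comment above can be sketched briefly: every instance tries to bind the same Unix-domain socket, exactly one bind succeeds, and that process becomes the coordinator the others connect to. The socket name here is illustrative:

```python
import socket
import sys
import tempfile
import os

def try_become_coordinator(path):
    # Binding a Unix socket is atomic: only one process can hold the address.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(path)
        s.listen(1)
        return s     # the winner keeps the listening socket open
    except OSError:
        s.close()
        return None  # another instance already coordinates; connect instead

# Linux's abstract namespace ("\0" prefix) needs no filesystem cleanup;
# a filesystem path works on other platforms.
path = "\0demo-remote-coordinator" if sys.platform == "linux" else \
    os.path.join(tempfile.mkdtemp(), "coordinator.sock")
first = try_become_coordinator(path)
second = try_become_coordinator(path)  # loses the race: address in use
print(first is not None, second is None)  # → True True
```

This matches the "probably overkill, but it works" description: the bind doubles as both the lock and the rendezvous point for the RPC server.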
Added a comment: the fix
diff --git a/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_1_12ec16bb350e9a291b6ce39bceeea692._comment b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_1_12ec16bb350e9a291b6ce39bceeea692._comment new file mode 100644 index 0000000000..0ecad855ce --- /dev/null +++ b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file/comment_1_12ec16bb350e9a291b6ce39bceeea692._comment @@ -0,0 +1,75 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="the fix" + date="2026-02-13T01:43:15Z" + content=""" +TL;DR, the patch + +```patch +diff --git a/Remote/Git.hs b/Remote/Git.hs +index 6b7dc77d98..4faaea082d 100644 +--- a/Remote/Git.hs ++++ b/Remote/Git.hs +@@ -482,7 +482,12 @@ inAnnex' repo rmt st@(State connpool duc _ _ _ _) key + keyUrls :: GitConfig -> Git.Repo -> Remote -> Key -> [String] + keyUrls gc repo r key = map tourl locs' + where +- tourl l = Git.repoLocation repo ++ \"/\" ++ l ++ tourl l = Git.repoLocation repo ++ \"/\" ++ escapeURIString escchar l ++ -- Escape characters that are not allowed unescaped in a URI ++ -- path component, but don't escape '/' since the location ++ -- is a path with multiple components. ++ escchar '/' = True ++ escchar c = isUnescapedInURIComponent c + -- If the remote is known to not be bare, try the hash locations + -- used for non-bare repos first, as an optimisation. + locs +``` + +seems to work well. Built in https://github.com/datalad/git-annex/pull/251 (CI tests still run), tested locally: + +``` +❯ /usr/bin/git-annex version +git-annex version: 10.20260115+git119-g43a3f3aaf2-1~ndall+1 +build flags: Assistant Webapp Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV Servant +dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.6 http-client-0.7.17 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1 +... 
+❯ /usr/bin/git-annex get --from origin video.mkv +get video.mkv (from origin...) ok +(recording state in git...) +``` +It works. Here is claude's analysis, which led it to the fix: + +``` + Bug Analysis: fails_to_get_from_apache2_server_URL_backend_file + + Root Cause + + The bug is in Remote/Git.hs:485 — the keyUrls function constructs URLs by simple string concatenation without URL-encoding the path components: + + tourl l = Git.repoLocation repo ++ \"/\" ++ l + + How the failure occurs + + 1. Key: URL--yt:https://www.youtube.com/watch,63v,613ZXfZfnRfyM + 2. keyFile encoding (Annex/Locations.hs:783-795) converts : → &c and / → %: + URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM + 3. keyUrls concatenates this directly into the URL path: + https://datasets.datalad.org/.../.git//annex/objects/zZ/3v/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM/... + 4. parseURIRelaxed (Utility/Url/Parse.hs:45-47) tries to parse this URL. It calls escapeURIString isAllowedInURI first, but % is allowed in URIs (it's the percent-encoding introducer), so it passes through unescaped. + 5. parseURI then sees %%w and %wa which are invalid percent-encoding sequences (% must be followed by two hex digits). The parse fails, returning Nothing. + 6. download' (Utility/Url.hs:389-391) hits the Nothing branch and returns \"invalid url\". + + Why SHA256E keys work + + SHA256E keys (like SHA256E-s107998--4545...) contain only alphanumeric characters, -, and .. The keyFile encoding produces no % or & characters, so the concatenated URL is always valid. + + The fix + + keyUrls in Remote/Git.hs:485 needs to URL-encode the path components. Other remotes already do this: + + - S3 (Remote/S3.hs:1221-1229): uses escapeURIString with a custom predicate keeping / but encoding everything else + - WebDAV (Remote/WebDAV/DavLocation.hs:35): uses escapeURIString isUnescapedInURI +``` +"""]]
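The two keyFile substitutions the analysis quotes can be illustrated with a toy version. This is not a faithful reimplementation — the real keyFile in Annex/Locations.hs handles more characters (e.g. the ',63'/',61' decimal escapes visible in the transcript) — but it shows where the bare '%' characters come from:

```python
def key_file_sketch(key: str) -> str:
    # Only the two rules quoted in the analysis: ':' -> '&c' and '/' -> '%'.
    # The escape characters themselves are escaped first ('&' -> '&a',
    # '%' -> '&s') so the mapping stays reversible.
    key = key.replace("&", "&a").replace("%", "&s")
    return key.replace(":", "&c").replace("/", "%")

print(key_file_sketch("URL--yt:https://www.youtube.com/watch"))
# → URL--yt&chttps&c%%www.youtube.com%watch
```

The "//" after the scheme becomes "%%", which is exactly the invalid percent-sequence that later makes parseURI fail.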
report on inability to get video!
diff --git a/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file.mdwn b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file.mdwn new file mode 100644 index 0000000000..5303ad1b52 --- /dev/null +++ b/doc/bugs/fails_to_get_from_apache2_server_URL_backend_file.mdwn @@ -0,0 +1,107 @@ +### Please describe the problem. + +For a relaxed-url youtube video, git-annex seems to completely skip even trying (I see no apache2 log hits) to download from the http git remote, even though it points to the correct HTTP address, and then just proceeds to yt-dlp, only to fail there: + +```shell +❯ git clone https://datasets.datalad.org/repronim/ReproTube/AFNIBootcamp/.git/ +Cloning into 'AFNIBootcamp'... +remote: Enumerating objects: 5904, done. +remote: Counting objects: 100% (5904/5904), done. +remote: Compressing objects: 100% (1793/1793), done. +remote: Total 5904 (delta 2659), reused 5554 (delta 2644), pack-reused 0 (from 0) +Receiving objects: 100% (5904/5904), 743.23 KiB | 2.23 MiB/s, done. +Resolving deltas: 100% (2659/2659), done.
+❯ cd AFNIBootcamp +authors.tsv@ channel.json channel_avatar.jpg@ playlists/ videos/ +❯ git annex whereis videos/2020/04/2020-04-17_AFNI-Academy-AFNI-GUI-Clusterizing/video.mkv +whereis videos/2020/04/2020-04-17_AFNI-Academy-AFNI-GUI-Clusterizing/video.mkv (3 copies) + 00000000-0000-0000-0000-000000000001 -- web + cc815e85-73bc-4a5c-81c3-81a39b0c677b -- yoh@falkor:/srv/datasets.datalad.org/www/repronim/ReproTube/AFNIBootcamp [origin] + f574aace-b921-4987-b376-f43cfcc0e925 -- annextube YouTube archive + + web: https://www.youtube.com/watch?v=3ZXfZfnRfyM +ok +❯ git annex --debug get --from origin videos/2020/04/2020-04-17_AFNI-Academy-AFNI-GUI-Clusterizing/video.mkv +[2026-02-12 17:58:20.476127402] (Utility.Process) process [1348659] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"] +[2026-02-12 17:58:20.477586712] (Utility.Process) process [1348659] done ExitSuccess +[2026-02-12 17:58:20.477947473] (Utility.Process) process [1348660] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"] +[2026-02-12 17:58:20.479504195] (Utility.Process) process [1348660] done ExitSuccess +[2026-02-12 17:58:20.480128621] (Utility.Process) process [1348661] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..164c7074ef367be9c939366c3febb2322f70c103","--pretty=%H","-n1"] +[2026-02-12 17:58:20.482481122] (Utility.Process) process [1348661] done ExitSuccess +[2026-02-12 17:58:20.484072231] (Utility.Process) process [1348662] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"] +[2026-02-12 17:58:20.488013705] (Utility.Process) process [1348663] read: git 
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","videos/2020/04/2020-04-17_AFNI-Academy-AFNI-GUI-Clusterizing/video.mkv"] +[2026-02-12 17:58:20.488431021] (Utility.Process) process [1348664] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2026-02-12 17:58:20.488864415] (Utility.Process) process [1348665] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2026-02-12 17:58:20.489285814] (Utility.Process) process [1348666] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2026-02-12 17:58:20.491835062] (Utility.Process) process [1348666] done ExitSuccess +[2026-02-12 17:58:20.491913957] (Utility.Process) process [1348665] done ExitSuccess +[2026-02-12 17:58:20.491944604] (Utility.Process) process [1348664] done ExitSuccess +[2026-02-12 17:58:20.491970167] (Utility.Process) process [1348663] done ExitSuccess +get videos/2020/04/2020-04-17_AFNI-Academy-AFNI-GUI-Clusterizing/video.mkv (from origin...) +[2026-02-12 17:58:20.516237522] (Utility.Url) https://datasets.datalad.org/repronim/ReproTube/AFNIBootcamp/.git//annex/objects/zZ/3v/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM +[2026-02-12 17:58:20.519744566] (Utility.Url) https://datasets.datalad.org/repronim/ReproTube/AFNIBootcamp/.git//annex/objects/950/20d/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM + download failed: invalid url + + failed to download content +(Delaying 1s before retrying....) 
+[2026-02-12 17:58:21.524457718] (Utility.Url) https://datasets.datalad.org/repronim/ReproTube/AFNIBootcamp/.git//annex/objects/zZ/3v/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM +[2026-02-12 17:58:21.527050375] (Utility.Url) https://datasets.datalad.org/repronim/ReproTube/AFNIBootcamp/.git//annex/objects/950/20d/URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM/URL--yt&chttps&c%%www.youtube.com%watch,63v,6get videos/2020/04/2020-04-17_AFNI-Academy-AFNI-GUI-Clusterizing/video.mkv (from origin...) + download failed: invalid url + + failed to download content +(Delaying 1s before retrying....) + + download failed: invalid url + + failed to download content +(Delaying 2s before retrying....) + + download failed: invalid url + + failed to download content +failed +[2026-02-12 17:58:23.537290686] (Utility.Process) process [1348662] done ExitSuccess +get: 1 failed +``` + +for a simpler file -- works fine + +``` +❯ git annex whereis channel_avatar.jpg +whereis channel_avatar.jpg (2 copies) + cc815e85-73bc-4a5c-81c3-81a39b0c677b -- yoh@falkor:/srv/datasets.datalad.org/www/repronim/ReproTube/AFNIBootcamp [origin] + f574aace-b921-4987-b376-f43cfcc0e925 -- annextube YouTube archive +ok +❯ git annex get --from origin channel_avatar.jpg +get channel_avatar.jpg (from origin...) ok +(recording state in git...) +❯ ls -l channel_avatar.jpg +lrwxrwxrwx 1 yoh yoh 196 Feb 12 17:57 channel_avatar.jpg -> .git/annex/objects/54/77/SHA256E-s107998--454529608f75da5804000d74018ff790ec24a03eef3544fc44c28071e31acd15.jpg/SHA256E-s107998--454529608f75da5804000d74018ff790ec24a03eef3544fc44c28071e31acd15.jpg +``` + +### What steps will reproduce the problem? 
+ +``` + git clone https://datasets.datalad.org/repronim/ReproTube/AFNIBootcamp/.git/ + cd AFNIBootcamp + git annex whereis videos/2020/04/2020-04-17_AFNI-Academy-AFNI-GUI-Clusterizing/video.mkv + git annex --debug get --from origin videos/2020/04/2020-04-17_AFNI-Academy-AFNI-GUI-Clusterizing/video.mkv +``` + +### What version of git-annex are you using? On what operating system? + + +```shell +❯ git annex version +git-annex version: 10.20250929-gf014fd60d05a3407e2f747e0394997d3780eeafc +``` +but did try even most recent + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +when I was lucky - yes. + +[[!meta author=yoh]] +[[!tag projects/repronim]] +
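The "invalid url" failures in the transcript come from '%' characters in the encoded key name that do not form valid percent-escapes. A '%' in a URI must be followed by two hex digits; this counts the violations in the path component of the failing URL:

```python
import re

# Path component from the failing URL in the transcript above.
path = "URL--yt&chttps&c%%www.youtube.com%watch,63v,613ZXfZfnRfyM"

# Every '%' not followed by two hex digits is an invalid percent-escape.
bad = re.findall(r"%(?![0-9A-Fa-f]{2})", path)
print(len(bad))  # → 3: the "%%" pair plus the '%' before "watch"
```

A strict URI parser rejects the whole URL on the first such violation, which matches the observed "download failed: invalid url" without any HTTP request being made.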
removed
diff --git a/doc/special_remotes/S3/comment_41_4e5ec774da272291dfd39bf04e3075ed._comment b/doc/special_remotes/S3/comment_41_4e5ec774da272291dfd39bf04e3075ed._comment
deleted file mode 100644
index f91c71e705..0000000000
--- a/doc/special_remotes/S3/comment_41_4e5ec774da272291dfd39bf04e3075ed._comment
+++ /dev/null
@@ -1,31 +0,0 @@
-[[!comment format=mdwn
- username="Basile.Pinsard"
- avatar="http://cdn.libravatar.org/avatar/87e1f73acf277ad0337b90fc0253c62e"
- subject="git-annex: potential data loss with initremote and push with S3 special remotes. "
- date="2026-02-12T20:41:19Z"
- content="""
-A colleague used a wrong config, which pointed to the minio console rather than the S3 endpoint. When they ran initremote, the console wrongly replied 200-OK when PUTting the annex-uuid file, and again when they then pushed the data. The minio console always redirects to a login page and doesn't fail on PUT (which is non-compliant). So the dataset recorded all the data as present in that remote, while there was no trace of any buckets or objects in the S3.
-
-## steps to reproduce:
-
-```
-git init test_s3
-cd test_s3/
-git-annex init
-export AWS_ACCESS_KEY_ID=john AWS_SECRET_ACCESS_KEY=doe
-git annex initremote -d test_remote host=\"play.min.io\" bucket=\"test_bucket\" type=S3 encryption=none autoenable=true port=9443 protocol=https chunk=1GiB requeststyle=path
-echo test > test_annexed_file
-git-annex add test_annexed_file
-git commit -m 'add annexed file'
-git-annex copy --fast --to test_remote
-```
-
-I am showing it with the \`--fast\` flag here, as this is what datalad uses by default. Without \`--fast\`, it fails with (HeaderException {headerErrorMessage = \"ETag missing\"}), which is better.
-
-So to sum it up, the unfortunate circumstances are:
-
-1. the initremote PUT of annex-uuid does not check that the annex-uuid file was actually pushed to a bucket.
-2. minio console replies with 200-OK for all http requests
-3. datalad uses \`push --fast\` by default, which records files as pushed without performing a HEAD after the push. I guess that's for performance reasons, but that is dangerous if a server or reverse-proxy ends up responding 200-OK to all requests after init.
-
-Thanks for your help!
-"""]]
Added a comment: git-annex: potential data loss with initremote and push with S3 special remotes.
diff --git a/doc/special_remotes/S3/comment_41_4e5ec774da272291dfd39bf04e3075ed._comment b/doc/special_remotes/S3/comment_41_4e5ec774da272291dfd39bf04e3075ed._comment
new file mode 100644
index 0000000000..f91c71e705
--- /dev/null
+++ b/doc/special_remotes/S3/comment_41_4e5ec774da272291dfd39bf04e3075ed._comment
@@ -0,0 +1,31 @@
+[[!comment format=mdwn
+ username="Basile.Pinsard"
+ avatar="http://cdn.libravatar.org/avatar/87e1f73acf277ad0337b90fc0253c62e"
+ subject="git-annex: potential data loss with initremote and push with S3 special remotes. "
+ date="2026-02-12T20:41:19Z"
+ content="""
+A colleague used a wrong config, which pointed to the minio console rather than the S3 endpoint. When they ran initremote, the console wrongly replied 200-OK when PUTting the annex-uuid file, and again when they then pushed the data. The minio console always redirects to a login page and doesn't fail on PUT (which is non-compliant). So the dataset recorded all the data as present in that remote, while there was no trace of any buckets or objects in the S3.
+
+## steps to reproduce:
+
+```
+git init test_s3
+cd test_s3/
+git-annex init
+export AWS_ACCESS_KEY_ID=john AWS_SECRET_ACCESS_KEY=doe
+git annex initremote -d test_remote host=\"play.min.io\" bucket=\"test_bucket\" type=S3 encryption=none autoenable=true port=9443 protocol=https chunk=1GiB requeststyle=path
+echo test > test_annexed_file
+git-annex add test_annexed_file
+git commit -m 'add annexed file'
+git-annex copy --fast --to test_remote
+```
+
+I am showing it with the \`--fast\` flag here, as this is what datalad uses by default. Without \`--fast\`, it fails with (HeaderException {headerErrorMessage = \"ETag missing\"}), which is better.
+
+So to sum it up, the unfortunate circumstances are:
+
+1. the initremote PUT of annex-uuid does not check that the annex-uuid file was actually pushed to a bucket.
+2. minio console replies with 200-OK for all http requests
+3. datalad uses \`push --fast\` by default, which records files as pushed without performing a HEAD after the push. I guess that's for performance reasons, but that is dangerous if a server or reverse-proxy ends up responding 200-OK to all requests after init.
+
+Thanks for your help!
+"""]]
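The failure mode described above is a remote that answers 200-OK without storing anything. The defensive pattern the comment asks for is write-then-read-back: after PUTting the sentinel annex-uuid object, GET it back and compare, instead of trusting the status code. Sketched here against a stub standing in for the S3 endpoint; `StubRemote` and `verified_put` are illustrative names, not git-annex APIs:

```python
class StubRemote:
    """Stand-in for an S3 endpoint; broken=True mimics a console/proxy
    that returns 200 on every request without storing anything."""
    def __init__(self, broken=False):
        self.broken = broken
        self.store = {}
    def put(self, key, data):
        if not self.broken:
            self.store[key] = data
        return 200  # misconfigured endpoints return 200 regardless
    def get(self, key):
        return self.store.get(key)

def verified_put(remote, key, data):
    # Don't trust the status code alone: read the object back and compare.
    if remote.put(key, data) != 200 or remote.get(key) != data:
        raise RuntimeError("PUT accepted but object not readable back")

verified_put(StubRemote(), "annex-uuid", b"uuid")  # healthy remote: passes
try:
    verified_put(StubRemote(broken=True), "annex-uuid", b"uuid")
except RuntimeError as e:
    print("caught:", e)  # → caught: PUT accepted but object not readable back
```

The same read-back idea is why the non-`--fast` copy, which checks the ETag, fails loudly here instead of silently recording the content as present.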
Added a comment: Ensuring only one process
diff --git a/doc/design/external_special_remote_protocol/async_appendix/comment_3_cb80269f5a1ebc292bc63f2736f13262._comment b/doc/design/external_special_remote_protocol/async_appendix/comment_3_cb80269f5a1ebc292bc63f2736f13262._comment
new file mode 100644
index 0000000000..2aecd31bed
--- /dev/null
+++ b/doc/design/external_special_remote_protocol/async_appendix/comment_3_cb80269f5a1ebc292bc63f2736f13262._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="calmofthestorm"
+ avatar="http://cdn.libravatar.org/avatar/9d97e9bcb1cf7680309e37cd69fab408"
+ subject="Ensuring only one process"
+ date="2026-02-12T19:10:11Z"
+ content="""
+My remote (version 1) uses a database that is multithreaded but has process-level locking. Despite using async, multiple remote processes are still being started in `testremote`. Right now I have it working with POSIX advisory locks and open/close the database for each operation in each thread in each process, but that's a lot of overhead. Is there a better way to do this? I could make them coordinate via IPC, or have them release the lock only when idle/when others are waiting, but it seems like it shouldn't be that complex.
+
+I get clear failures when I use `testremote`. On real workloads (with -j 24) it is more confusing. There are no errors, but at some point the `git-annex` command hangs. Quite possibly a bug in my code, given `testremote` is failing.
+
+I guess my question is: Is there a way to force git-annex to only use one special remote process, either by configuration or by having all but the first return \"use the other one\" (without -j 1 always)? And does the way this is handled differ between actual use and `testremote`?
+
+Or to put it another way: how do you envision one should design a special remote that supports concurrency and relies on a database with process-level locking?
+"""]]
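The coordination question above can also be handled without reopening the database per operation: serialize only the database-touching sections behind a blocking POSIX advisory lock, so extra processes wait instead of erroring. A rough sketch with util-linux `flock(1)` (the lock path and the `db_op` wrapper are made-up illustrations of one possible design, not anything the special remote protocol prescribes):

```shell
LOCKFILE="${TMPDIR:-/tmp}/myremote-db.lock"

# Run one database operation under an exclusive advisory lock. Concurrent
# remote processes block here until the lock is free, rather than failing.
db_op() {
    (
        flock 9            # blocks until no other holder remains
        echo "db: $*"      # stand-in for the real database work
    ) 9>"$LOCKFILE"
}

# Demonstrate exclusivity: while this shell holds the lock on fd 9,
# a separate flock process cannot acquire it non-blockingly.
exec 9>"$LOCKFILE"
flock -n 9 && echo "lock acquired"
flock -n "$LOCKFILE" -c true || echo "second process would have to wait"
exec 9>&-                    # release the lock

db_op store KEY1             # prints: db: store KEY1
```

Under this scheme, the multiple processes that `testremote` starts would simply interleave at operation granularity, holding the lock only for the duration of each operation rather than for the process lifetime.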
update
diff --git a/doc/todo/replace_assistant_with_assist.mdwn b/doc/todo/replace_assistant_with_assist.mdwn
index 939717afab..a7fc10c923 100644
--- a/doc/todo/replace_assistant_with_assist.mdwn
+++ b/doc/todo/replace_assistant_with_assist.mdwn
@@ -23,15 +23,17 @@ Basically replacing the assistant needs 3 things:
change has been made to a remote.
3. Wait for commits and trigger `git-annex push` to remotes.
-There is more than that to the assistant, eg automatic periodic fscking,
-various attempts to diagnose and fix problems with repositories, live
-detection of configuration changes, detecting drive mount events, etc. But
-those 3 would be enough for most users.
-
Those could be 3 separate programs, which would gain the benefits of
composition. If the user only wants automatic commits but not pushing or
pulling, they can run just one program.
+There is more than that to the assistant, eg automatic periodic fscking,
+various attempts to diagnose and fix problems with repositories, live
+detection of configuration changes, detecting drive mount events, etc. But
+those 3 would be enough for most users. Alternatively, keeping that other
+stuff, but replacing the parts of the assistant that do those three things,
+would also ease maintenance.
+
This would also probably involve [[remove_webapp]], although in theory the
webapp could be retained, with only the parts of the assistant that handle
staging, committing, pull, and push replaced.
fix link
diff --git a/doc/todo/replace_assistant_with_assist.mdwn b/doc/todo/replace_assistant_with_assist.mdwn index 878d8e6da9..939717afab 100644 --- a/doc/todo/replace_assistant_with_assist.mdwn +++ b/doc/todo/replace_assistant_with_assist.mdwn @@ -5,7 +5,7 @@ that the rest of git-annex don't, and doesn't support everything that the rest of git-annex does. For example, the assistant currently -[misbehaves in repos containing a submodule](https://git-annex.branchable.com/bugs/assistant__58___nothing_added_to_commit_but_untracked../). +[misbehaves in repos containing a submodule](https://git-annex.branchable.com/bugs/assistant__58___nothing_added_to_commit_but_untracked/). Fixing that will involve the assistant running `git ls-files`, which it currently does not do. So it will get closer to how the rest of git-annex works. But approaching the rest of git-annex by degrees is an ongoing
rename bug report for windows
Windows is unable to handle a directory name ending with ".."
Windows is unable to handle a directory name ending with ".."
diff --git a/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked...mdwn b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked.mdwn
similarity index 100%
rename from doc/bugs/assistant__58___nothing_added_to_commit_but_untracked...mdwn
rename to doc/bugs/assistant__58___nothing_added_to_commit_but_untracked.mdwn
diff --git a/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked../comment_1_020687ed7ef1cac5bc083d5cffd82edb._comment b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked/comment_1_020687ed7ef1cac5bc083d5cffd82edb._comment
similarity index 100%
rename from doc/bugs/assistant__58___nothing_added_to_commit_but_untracked../comment_1_020687ed7ef1cac5bc083d5cffd82edb._comment
rename to doc/bugs/assistant__58___nothing_added_to_commit_but_untracked/comment_1_020687ed7ef1cac5bc083d5cffd82edb._comment
todo
diff --git a/doc/todo/replace_assistant_with_assist.mdwn b/doc/todo/replace_assistant_with_assist.mdwn new file mode 100644 index 0000000000..878d8e6da9 --- /dev/null +++ b/doc/todo/replace_assistant_with_assist.mdwn @@ -0,0 +1,37 @@ +`git-annex assistant` is a complicated mess of race conditions and its own +5 kloc of code that is independent of the rest of git-annex and has to be +separately maintained to keep it at feature parity. Generally it has bugs +that the rest of git-annex don't, and doesn't support everything that the +rest of git-annex does. + +For example, the assistant currently +[misbehaves in repos containing a submodule](https://git-annex.branchable.com/bugs/assistant__58___nothing_added_to_commit_but_untracked../). +Fixing that will involve the assistant running `git ls-files`, which it +currently does not do. So it will get closer to how the rest of git-annex +works. But approaching the rest of git-annex by degrees is an ongoing +maintenance burden. + +So why not throw out the current assistant and replace it with a compositional +system using parts that already exist elsewhere in git-annex? It might also +be possible to use git's own support for inotify (and similar), rather than +reinventing that wheel as well. + +Basically replacing the assistant needs 3 things: + +1. Gather inotify events and trigger an add and commit. +2. Trigger `git-annex pull` when `git-annex remotedaemon` detects a + change has been made to a remote. +3. Wait for commits and trigger `git-annex push` to remotes. + +There is more than that to the assistant, eg automatic periodic fscking, +various attempts to diagnose and fix problems with repositories, live +detection of configuration changes, detecting drive mount events, etc. But +those 3 would be enough for most users. + +Those could be 3 separate programs, which would gain the benefits of +composition. If the user only wants automatic commits but not pushing or +pulling, they can run one 1 program. 
+ +This would also probably involve [[remove_webapp]], although in theory the +webapp could be retained, with only the parts of the assistant that handle +staging, committing, pull, and push replaced.
comment
diff --git a/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked../comment_1_020687ed7ef1cac5bc083d5cffd82edb._comment b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked../comment_1_020687ed7ef1cac5bc083d5cffd82edb._comment
new file mode 100644
index 0000000000..485abcbd9e
--- /dev/null
+++ b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked../comment_1_020687ed7ef1cac5bc083d5cffd82edb._comment
@@ -0,0 +1,48 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2026-02-09T17:08:04Z"
+ content="""
+This is due to the assistant not supporting submodules. Nothing has ever
+been done to make it support them.
+
+When `git check-ignore --stdin` is passed a path in a submodule, it exits.
+We can see this happen near the top of the log:
+
+	fatal: Pathspec 'code/containers/.codespellrc' is in submodule 'code/containers'
+
+	git check-ignore EOF: user error
+
+The subsequent \"resource vanished (Broken pipe)\"
+are each time git-annex tries to talk to git check-ignore.
+
+Indeed, looking at the source code to check-ignore, if it's passed a path
+inside a submodule, it errors out, and so won't be listening to stdin
+for any more paths:
+
+	joey@darkstar:~/tmp/t>git check-ignore --stdin
+	r/x
+	fatal: Pathspec 'r/x' is in submodule 'r'
+	- exit 128
+
+And I was able to reproduce this by having a submodule with a file in it,
+and starting the assistant.
+
+In some cases, the assistant still added files despite check-ignore
+having crashed. (It will even add gitignored files when check-ignore has
+crashed.) In other cases not. The problem probably extends beyond
+check-ignore to also staging files. Eg, \"git add submodule/foo bar\" will
+error out on the file in the submodule and not ever get to the point of
+adding the second file.
+
+Fixing this would need an inexpensive way to query git about whether a file
+is in a submodule. Passing the files that
+the assistant gathers through `git ls-files --modified --others`
+might be the only way to do that.
+
+Using that at all efficiently would need
+some other changes, because it needs to come before the ignore check,
+which it currently does for each file event. The ignore check would need to
+be moved to the point where a set of files has been gathered, so
+ls-files can be run once on the set of files.
+"""]]
comment
diff --git a/doc/bugs/concurrent_get_from_separate_clones_fails/comment_4_6dc64825fb863d2bfcf7d4f6078774a5._comment b/doc/bugs/concurrent_get_from_separate_clones_fails/comment_4_6dc64825fb863d2bfcf7d4f6078774a5._comment
new file mode 100644
index 0000000000..a716080472
--- /dev/null
+++ b/doc/bugs/concurrent_get_from_separate_clones_fails/comment_4_6dc64825fb863d2bfcf7d4f6078774a5._comment
@@ -0,0 +1,15 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2026-02-09T16:56:19Z"
+ content="""
+> ideally there should be no locking for the entire duration of get since there could be hundreds of clients trying to get that file
+
+It's somewhat more complex than that, but git-annex's locking does take
+concurrency into account.
+
+The transfer locking in specific is there to avoid issues like `git-annex
+get` of the same file being run in the same repo in 2 different terminals.
+So it intentionally does not allow concurrency. Except for in this
+particular case where multiple clients are downloading.
+"""]]
use alwaysRunTransfer for local remote Upload
Fix bug that prevented 2 clones of a local git remote from concurrently
downloading the same file.
Here the Upload is running from the perspective of the local remote,
so it should not prevent 2 from running at the same time.
It seems this was missed when making the equivalent change to
git-annex-shell sendkey back in 2014, in commit
852185c2428fdaaafb240b74d99663bc5f6627e1
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
Fix bug that prevented 2 clones of a local git remote from concurrently
downloading the same file.
Here the Upload is running from the perspective of the local remote,
so it should not prevent 2 from running at the same time.
It seems this was missed when making the equivalent change to
git-annex-shell sendkey back in 2014, in commit
852185c2428fdaaafb240b74d99663bc5f6627e1
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
diff --git a/CHANGELOG b/CHANGELOG
index 7ebb6965a4..fc9641a83e 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -26,6 +26,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* Remove undo action from kde and nautilus integrations.
* Fix incorrect transfer direction in remote transfer log when
downloading from a local git remote.
+ * Fix bug that prevented 2 clones of a local git remote
+ from concurrently downloading the same file.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Remote/Git.hs b/Remote/Git.hs
index 807373b7a1..6b7dc77d98 100644
--- a/Remote/Git.hs
+++ b/Remote/Git.hs
@@ -605,7 +605,7 @@ copyFromRemote'' repo r st@(State connpool _ _ _ _ _) key af dest meterupdate vc
Just err -> giveup err
Nothing -> return True
copier <- mkFileCopier hardlink st
- (ok, v) <- runTransfer (Transfer Upload u (fromKey id key))
+ (ok, v) <- alwaysRunTransfer (Transfer Upload u (fromKey id key))
Nothing af Nothing stdRetry $ \p ->
metered (Just (combineMeterUpdate p meterupdate)) key bwlimit $ \_ p' ->
copier object dest key p' checksuccess vc
diff --git a/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn b/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn
index 7ef6832159..325cdea945 100644
--- a/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn
+++ b/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn
@@ -229,3 +229,5 @@ ok
> [[!tag projects/repronim]]
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/concurrent_get_from_separate_clones_fails/comment_3_ece1e3b836c907f4516f711aa34647fe._comment b/doc/bugs/concurrent_get_from_separate_clones_fails/comment_3_ece1e3b836c907f4516f711aa34647fe._comment
new file mode 100644
index 0000000000..4a154e08d3
--- /dev/null
+++ b/doc/bugs/concurrent_get_from_separate_clones_fails/comment_3_ece1e3b836c907f4516f711aa34647fe._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2026-02-09T16:29:19Z"
+ content="""
+Reproducer worked for me.
+
+This seems specific to using a local git remote, it will not happen over
+ssh.
+
+In Remote.Git, copyFromRemote calls `runTransfer` in the remote repository.
+That should be `alwaysRunTransfer` as is usually used when git-annex is
+running as a server to send files. Which avoids this problem.
+"""]]
Added a comment: Same issue
diff --git a/doc/special_remotes/borg/comment_3_c419e74b5e25388f1f7f99a4f72c8448._comment b/doc/special_remotes/borg/comment_3_c419e74b5e25388f1f7f99a4f72c8448._comment
new file mode 100644
index 0000000000..ec2f7c21d5
--- /dev/null
+++ b/doc/special_remotes/borg/comment_3_c419e74b5e25388f1f7f99a4f72c8448._comment
@@ -0,0 +1,20 @@
+[[!comment format=mdwn
+ username="nadir"
+ avatar="http://cdn.libravatar.org/avatar/2af9174cf6c06de802104d632dc40071"
+ subject="Same issue"
+ date="2026-02-07T10:42:10Z"
+ content="""
+I can't follow the link as it appears to be broken, so couldn't look whether there was a fix for the high RAM usage.
+
+My borg repo is only about 100 GB, but it has a large number of files:
+
+```
+local annex keys: 850128
+local annex size: 117.05 gigabytes
+annexed files in working tree: 1197089
+```
+
+The first sync action after my first borg backup (only one snapshot) is currently using about 26 GB of my 32 GB of RAM and possibly still climbing.
+
+
+"""]]
Added a comment
diff --git a/doc/bugs/concurrent_get_from_separate_clones_fails/comment_2_dfd6c875fd7c2650e36daa4e13cb5e36._comment b/doc/bugs/concurrent_get_from_separate_clones_fails/comment_2_dfd6c875fd7c2650e36daa4e13cb5e36._comment
new file mode 100644
index 0000000000..f6c4887aa9
--- /dev/null
+++ b/doc/bugs/concurrent_get_from_separate_clones_fails/comment_2_dfd6c875fd7c2650e36daa4e13cb5e36._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="yarikoptic"
+ avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
+ subject="comment 2"
+ date="2026-02-06T21:58:36Z"
+ content="""
+ideally there should be no locking for the entire duration of `get` since there could be hundreds of clients trying to get that file. If locking is needed to update git-annex branch, better be journalled + flushed at once or locked only for git-annex branch edit (anyways better to debounce multiple operations)
+"""]]
Deprecate undo command.
* Deprecate undo command.
* Remove undo action from kde and nautilus integrations.
git-annex undo is not git-annex specific in any way. So it does not make
sense for it to be part of git-annex. And the same thing can of course be
done by other git commands.
* Deprecate undo command.
* Remove undo action from kde and nautilus integrations.
git-annex undo is not git-annex specific in any way. So it does not make
sense for it to be part of git-annex. And the same thing can of course be
done by other git commands.
diff --git a/Assistant/Install.hs b/Assistant/Install.hs
index c5710ff213..64b4eff34f 100644
--- a/Assistant/Install.hs
+++ b/Assistant/Install.hs
@@ -113,7 +113,7 @@ installWrapper file content = do
installFileManagerHooks :: OsPath -> IO ()
#ifdef linux_HOST_OS
installFileManagerHooks program = unlessM osAndroid $ do
- let actions = ["get", "drop", "undo"]
+ let actions = ["get", "drop"]
-- Gnome
nautilusScriptdir <- (\d -> d </> literalOsPath "nautilus" </> literalOsPath "scripts") <$> userDataDir
diff --git a/CHANGELOG b/CHANGELOG
index 57a3181ae6..584dd0b7ce 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -22,6 +22,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* rsync: Avoid deleting contents of a non-empty directory when
removing the last exported file to the directory.
* Remove deprecated commands direct, indirect, proxy, and transferkeys.
+ * Deprecate undo command.
+ * Remove undo action from kde and nautilus integrations.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Command/Undo.hs b/Command/Undo.hs
index 289d4c35d2..5bc19486ea 100644
--- a/Command/Undo.hs
+++ b/Command/Undo.hs
@@ -22,11 +22,12 @@ import qualified Git.Branch
cmd :: Command
cmd = notBareRepo $ withAnnexOptions [jsonOptions] $
command "undo" SectionCommon
- "undo last change to a file or directory"
+ "undo last change to a file or directory (deprecated)"
paramPaths (withParams seek)
seek :: CmdParams -> CommandSeek
seek ps = do
+ warning "git-annex undo is deprecated and will be removed from a future version of git-annex"
-- Safety first; avoid any undo that would touch files that are not
-- in the index.
(fs, cleanup) <- inRepo $ LsFiles.notInRepo [] False (map toOsPath ps)
diff --git a/doc/git-annex-undo.mdwn b/doc/git-annex-undo.mdwn
index 000ece2f1c..7f754d0057 100644
--- a/doc/git-annex-undo.mdwn
+++ b/doc/git-annex-undo.mdwn
@@ -1,6 +1,6 @@
# NAME
-git-annex undo - undo last change to a file or directory
+git-annex undo - undo last change to a file or directory (deprecated)
# SYNOPSIS
@@ -13,7 +13,7 @@ file.
When passed a directory, undoes the last change that was made to the
contents of that directory.
-
+
Running undo a second time will undo the undo, returning the working
tree to the same state it had before. To support undoing an undo of
staged changes, any staged changes are first committed by the
@@ -22,6 +22,9 @@ undo command.
Note that this does not undo get/drop of a file's content; it only
operates on the file tree committed to git.
+This command is deprecated; the same things can be done
+in other ways using ordinary git commands.
+
# OPTIONS
* `--json`
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index c15d799259..9ac9dbe3a8 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -182,7 +182,7 @@ content from the key-value store.
* `undo [filename|directory] ...`
- Undo last change to a file or directory.
+ Undo last change to a file or directory. (deprecated)
See [[git-annex-undo]](1) for details.
Remove deprecated command proxy
Was only useful in direct mode repos, and it's been long enough since
moving away from those that the chance this breaks someone's script or
workflow is worth taking to finish the removal.
Especially since git-annex has since gotten support for proxying to
remotes, which is entirely different from this. So removing this avoids
some confusing name spacing.
Was only useful in direct mode repos, and it's been long enough since
moving away from those that the chance this breaks someone's script or
workflow is worth taking to finish the removal.
Especially since git-annex has since gotten support for proxying to
remotes, which is entirely different from this. So removing this avoids
some confusing name spacing.
diff --git a/CHANGELOG b/CHANGELOG
index 1439f88a6f..57a3181ae6 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -21,7 +21,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* p2phttp: Added --cpus option.
* rsync: Avoid deleting contents of a non-empty directory when
removing the last exported file to the directory.
- * Remove deprecated commands direct, indirect, and transferkeys.
+ * Remove deprecated commands direct, indirect, proxy, and transferkeys.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/CmdLine/GitAnnex.hs b/CmdLine/GitAnnex.hs
index a9d1a6fbf1..7290b660e9 100644
--- a/CmdLine/GitAnnex.hs
+++ b/CmdLine/GitAnnex.hs
@@ -116,7 +116,6 @@ import qualified Command.Forget
import qualified Command.OldKeys
import qualified Command.P2P
import qualified Command.P2PHttp
-import qualified Command.Proxy
import qualified Command.DiffDriver
import qualified Command.Smudge
import qualified Command.FilterProcess
@@ -246,7 +245,6 @@ cmds testoptparser testrunner mkbenchmarkgenerator = map addGitAnnexCommonOption
, Command.OldKeys.cmd
, Command.P2P.cmd
, Command.P2PHttp.cmd
- , Command.Proxy.cmd
, Command.DiffDriver.cmd
, Command.Smudge.cmd
, Command.FilterProcess.cmd
diff --git a/Command/Proxy.hs b/Command/Proxy.hs
deleted file mode 100644
index 4ccab2f8fa..0000000000
--- a/Command/Proxy.hs
+++ /dev/null
@@ -1,23 +0,0 @@
-{- git-annex command
- -
- - Copyright 2014 Joey Hess <id@joeyh.name>
- -
- - Licensed under the GNU AGPL version 3 or higher.
- -}
-
-module Command.Proxy where
-
-import Command
-
-cmd :: Command
-cmd = notBareRepo $
- command "proxy" SectionPlumbing
- "safely bypass direct mode guard (deprecated)"
- ("-- git command") (withParams seek)
-
-seek :: CmdParams -> CommandSeek
-seek = withWords (commandAction . start)
-
-start :: [String] -> CommandStart
-start [] = giveup "Did not specify command to run."
-start (c:ps) = liftIO $ exitWith =<< safeSystem c (map Param ps)
diff --git a/doc/git-annex-proxy.mdwn b/doc/git-annex-proxy.mdwn
deleted file mode 100644
index 374bd27c6f..0000000000
--- a/doc/git-annex-proxy.mdwn
+++ /dev/null
@@ -1,25 +0,0 @@
-# NAME
-
-git-annex proxy - safely bypass direct mode guard (deprecated)
-
-# SYNOPSIS
-
-git annex proxy `-- git cmd [options]`
-
-# DESCRIPTION
-
-This command was for use in a direct mode repository, and such
-repositories are automatically updated to use an adjusted unlocked branch.
-So, there's no reason to use this command any longer.
-
-# SEE ALSO
-
-[[git-annex]](1)
-
-[[git-annex-direct]](1)
-
-# AUTHOR
-
-Joey Hess <id@joeyh.name>
-
-Warning: Automatically converted into a man page by mdwn2man. Edit with care.
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 56a3d1676b..c15d799259 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -813,10 +813,6 @@ content from the key-value store.
Lists files in a git ref. (deprecated)
See [[git-annex-findref]](1) for details.
-
-* `proxy -- git cmd [options]`
-
- Bypass direct mode guard. (deprecated)
See [[git-annex-proxy]](1) for details.
diff --git a/git-annex.cabal b/git-annex.cabal
index 8fb3e2ad30..5e4c679b30 100644
--- a/git-annex.cabal
+++ b/git-annex.cabal
@@ -695,7 +695,6 @@ Executable git-annex
Command.P2PStdIO
Command.PostReceive
Command.PreCommit
- Command.Proxy
Command.Pull
Command.Push
Command.Recompute
Remove deprecated commands direct and indirect
The conversion from direct mode was in 7.20190912, which is far enough ago
that people should not be trying to use these old deprecated commands any
longer.
The conversion from direct mode was in 7.20190912, which is far enough ago
that people should not be trying to use these old deprecated commands any
longer.
diff --git a/CHANGELOG b/CHANGELOG
index 5354e72827..1439f88a6f 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -21,7 +21,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* p2phttp: Added --cpus option.
* rsync: Avoid deleting contents of a non-empty directory when
removing the last exported file to the directory.
- * Remove deprecated git-annex transferkeys.
+ * Remove deprecated commands direct, indirect, and transferkeys.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/CmdLine/GitAnnex.hs b/CmdLine/GitAnnex.hs
index a7e90eee59..a9d1a6fbf1 100644
--- a/CmdLine/GitAnnex.hs
+++ b/CmdLine/GitAnnex.hs
@@ -111,8 +111,6 @@ import qualified Command.RmUrl
import qualified Command.Import
import qualified Command.Export
import qualified Command.Map
-import qualified Command.Direct
-import qualified Command.Indirect
import qualified Command.Upgrade
import qualified Command.Forget
import qualified Command.OldKeys
@@ -243,8 +241,6 @@ cmds testoptparser testrunner mkbenchmarkgenerator = map addGitAnnexCommonOption
, Command.Inprogress.cmd
, Command.Migrate.cmd
, Command.Map.cmd
- , Command.Direct.cmd
- , Command.Indirect.cmd
, Command.Upgrade.cmd
, Command.Forget.cmd
, Command.OldKeys.cmd
diff --git a/Command/Direct.hs b/Command/Direct.hs
deleted file mode 100644
index ac295415a8..0000000000
--- a/Command/Direct.hs
+++ /dev/null
@@ -1,21 +0,0 @@
-{- git-annex command
- -
- - Copyright 2012 Joey Hess <id@joeyh.name>
- -
- - Licensed under the GNU AGPL version 3 or higher.
- -}
-
-module Command.Direct where
-
-import Command
-
-cmd :: Command
-cmd = notBareRepo $ noDaemonRunning $
- command "direct" SectionSetup "switch repository to direct mode (deprecated)"
- paramNothing (withParams seek)
-
-seek :: CmdParams -> CommandSeek
-seek = withNothing (commandAction start)
-
-start :: CommandStart
-start = giveup "Direct mode is not supported by this repository version. Use git-annex unlock instead."
diff --git a/Command/Indirect.hs b/Command/Indirect.hs
deleted file mode 100644
index fe5a929b04..0000000000
--- a/Command/Indirect.hs
+++ /dev/null
@@ -1,21 +0,0 @@
-{- git-annex command
- -
- - Copyright 2012 Joey Hess <id@joeyh.name>
- -
- - Licensed under the GNU AGPL version 3 or higher.
- -}
-
-module Command.Indirect where
-
-import Command
-
-cmd :: Command
-cmd = notBareRepo $ noDaemonRunning $
- command "indirect" SectionSetup "switch repository to indirect mode (deprecated)"
- paramNothing (withParams seek)
-
-seek :: CmdParams -> CommandSeek
-seek = withNothing (commandAction start)
-
-start :: CommandStart
-start = stop
diff --git a/doc/git-annex-direct.mdwn b/doc/git-annex-direct.mdwn
deleted file mode 100644
index 909015ae48..0000000000
--- a/doc/git-annex-direct.mdwn
+++ /dev/null
@@ -1,27 +0,0 @@
-# NAME
-
-git-annex direct - switch repository to direct mode (deprecated)
-
-# SYNOPSIS
-
-git annex direct
-
-# DESCRIPTION
-
-This used to switch a repository to use direct mode.
-But direct mode is no longer used; git-annex automatically converts
-direct mode repositories to v7 adjusted unlocked branches.
-
-# SEE ALSO
-
-[[git-annex]](1)
-
-[[git-annex-indirect]](1)
-
-[[git-annex-adjust]](1)
-
-# AUTHOR
-
-Joey Hess <id@joeyh.name>
-
-Warning: Automatically converted into a man page by mdwn2man. Edit with care.
diff --git a/doc/git-annex-indirect.mdwn b/doc/git-annex-indirect.mdwn
deleted file mode 100644
index 4d35afa1b5..0000000000
--- a/doc/git-annex-indirect.mdwn
+++ /dev/null
@@ -1,27 +0,0 @@
-# NAME
-
-git-annex indirect - switch repository to indirect mode (deprecated)
-
-# SYNOPSIS
-
-git annex indirect
-
-# DESCRIPTION
-
-This command was used to switch a repository back from direct mode
-indirect mode.
-
-Now git-annex automatically converts direct mode repositories to v7
-with adjusted unlocked branches, so this command does nothing.
-
-# SEE ALSO
-
-[[git-annex]](1)
-
-[[git-annex-direct]](1)
-
-# AUTHOR
-
-Joey Hess <id@joeyh.name>
-
-Warning: Automatically converted into a man page by mdwn2man. Edit with care.
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 89c4d0844a..56a3d1676b 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -406,18 +406,6 @@ content from the key-value store.
See [[git-annex-adjust]](1) for details.
-* `direct`
-
- Switches a repository to use direct mode. (deprecated)
-
- See [[git-annex-direct]](1) for details.
-
-* `indirect`
-
- Switches a repository to use indirect mode. (deprecated)
-
- See [[git-annex-indirect]](1) for details.
-
# REPOSITORY MAINTENANCE COMMANDS
* `fsck [path ...]`
diff --git a/git-annex.cabal b/git-annex.cabal
index c50725ac09..8fb3e2ad30 100644
--- a/git-annex.cabal
+++ b/git-annex.cabal
@@ -640,7 +640,6 @@ Executable git-annex
Command.Dead
Command.Describe
Command.DiffDriver
- Command.Direct
Command.Drop
Command.DropKey
Command.DropUnused
@@ -669,7 +668,6 @@ Executable git-annex
Command.Import
Command.ImportFeed
Command.InAnnex
- Command.Indirect
Command.Info
Command.Init
Command.InitCluster
Remove deprecated git-annex transferkeys
diff --git a/CHANGELOG b/CHANGELOG
index a5d470a7bd..5354e72827 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -21,6 +21,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* p2phttp: Added --cpus option.
* rsync: Avoid deleting contents of a non-empty directory when
removing the last exported file to the directory.
+ * Remove deprecated git-annex transferkeys.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/CmdLine/GitAnnex.hs b/CmdLine/GitAnnex.hs
index 97b6c6196d..a7e90eee59 100644
--- a/CmdLine/GitAnnex.hs
+++ b/CmdLine/GitAnnex.hs
@@ -39,7 +39,6 @@ import qualified Command.SetKey
import qualified Command.DropKey
import qualified Command.Transferrer
import qualified Command.TransferKey
-import qualified Command.TransferKeys
import qualified Command.SetPresentKey
import qualified Command.ReadPresentKey
import qualified Command.CheckPresentKey
@@ -212,7 +211,6 @@ cmds testoptparser testrunner mkbenchmarkgenerator = map addGitAnnexCommonOption
, Command.DropKey.cmd
, Command.Transferrer.cmd
, Command.TransferKey.cmd
- , Command.TransferKeys.cmd
, Command.SetPresentKey.cmd
, Command.ReadPresentKey.cmd
, Command.CheckPresentKey.cmd
diff --git a/Command/TransferKeys.hs b/Command/TransferKeys.hs
deleted file mode 100644
index beb37c64ba..0000000000
--- a/Command/TransferKeys.hs
+++ /dev/null
@@ -1,141 +0,0 @@
-{- git-annex command, used internally by assistant in version
- - 8.20201127 and older and provided only to avoid upgrade breakage.
- - Remove at some point when such old versions of git-annex are unlikely
- - to be running any longer.
- -
- - Copyright 2012, 2013 Joey Hess <id@joeyh.name>
- -
- - Licensed under the GNU AGPL version 3 or higher.
- -}
-
-{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}
-
-module Command.TransferKeys where
-
-import Command
-import Annex.Content
-import Logs.Location
-import Annex.Transfer
-import qualified Remote
-import Utility.SimpleProtocol (dupIoHandles)
-import qualified Database.Keys
-import Annex.BranchState
-
-data TransferRequest = TransferRequest Direction Remote Key AssociatedFile
-
-cmd :: Command
-cmd = command "transferkeys" SectionPlumbing "transfers keys (deprecated)"
- paramNothing (withParams seek)
-
-seek :: CmdParams -> CommandSeek
-seek = withNothing (commandAction start)
-
-start :: CommandStart
-start = do
- enableInteractiveBranchAccess
- (readh, writeh) <- liftIO dupIoHandles
- runRequests readh writeh runner
- stop
- where
- runner (TransferRequest direction remote key af)
- | direction == Upload = notifyTransfer direction af $
- upload' (Remote.uuid remote) key af Nothing stdRetry $ \p -> do
- tryNonAsync (Remote.storeKey remote key af Nothing p) >>= \case
- Left e -> do
- warning (UnquotedString (show e))
- return False
- Right () -> do
- Remote.logStatus NoLiveUpdate remote key InfoPresent
- return True
- | otherwise = notifyTransfer direction af $
- download' (Remote.uuid remote) key af Nothing stdRetry $ \p ->
- logStatusAfter NoLiveUpdate key $ getViaTmp (Remote.retrievalSecurityPolicy remote) (RemoteVerify remote) key Nothing $ \t -> do
- r <- tryNonAsync (Remote.retrieveKeyFile remote key af t p (RemoteVerify remote)) >>= \case
- Left e -> do
- warning (UnquotedString (show e))
- return (False, UnVerified)
- Right v -> return (True, v)
- -- Make sure we get the current
- -- associated files data for the key,
- -- not old cached data.
- Database.Keys.closeDb
- return r
-
-runRequests
- :: Handle
- -> Handle
- -> (TransferRequest -> Annex Bool)
- -> Annex ()
-runRequests readh writeh a = do
- liftIO $ hSetBuffering readh NoBuffering
- go =<< readrequests
- where
- go (d:rn:k:f:rest) = do
- case (deserialize d, deserialize rn, deserialize k, deserialize f) of
- (Just direction, Just remotename, Just key, Just file) -> do
- mremote <- Remote.byName' remotename
- case mremote of
- Left _ -> sendresult False
- Right remote -> sendresult =<< a
- (TransferRequest direction remote key file)
- _ -> sendresult False
- go rest
- go [] = noop
- go [""] = noop
- go v = giveup $ "transferkeys protocol error: " ++ show v
-
- readrequests = liftIO $ split fieldSep <$> hGetContents readh
- sendresult b = liftIO $ do
- hPutStrLn writeh $ serialize b
- hFlush writeh
-
-sendRequest :: Transfer -> TransferInfo -> Handle -> IO ()
-sendRequest t tinfo h = do
- hPutStr h $ intercalate fieldSep
- [ serialize (transferDirection t)
- , maybe (serialize ((fromUUID (transferUUID t)) :: String))
- (serialize . Remote.name)
- (transferRemote tinfo)
- , serialize (transferKey t)
- , serialize (associatedFile tinfo)
- , "" -- adds a trailing null
- ]
- hFlush h
-
-readResponse :: Handle -> IO Bool
-readResponse h = fromMaybe False . deserialize <$> hGetLine h
-
-fieldSep :: String
-fieldSep = "\0"
-
-class TCSerialized a where
- serialize :: a -> String
- deserialize :: String -> Maybe a
-
-instance TCSerialized Bool where
- serialize True = "1"
- serialize False = "0"
- deserialize "1" = Just True
- deserialize "0" = Just False
- deserialize _ = Nothing
-
-instance TCSerialized Direction where
- serialize Upload = "u"
- serialize Download = "d"
- deserialize "u" = Just Upload
- deserialize "d" = Just Download
- deserialize _ = Nothing
-
-instance TCSerialized AssociatedFile where
- serialize (AssociatedFile (Just f)) = fromOsPath f
- serialize (AssociatedFile Nothing) = ""
- deserialize "" = Just (AssociatedFile Nothing)
- deserialize f = Just (AssociatedFile (Just (toOsPath f)))
-
-instance TCSerialized RemoteName where
- serialize n = n
- deserialize n = Just n
-
-instance TCSerialized Key where
- serialize = serializeKey
- deserialize = deserializeKey
diff --git a/doc/git-annex-transferkeys.mdwn b/doc/git-annex-transferkeys.mdwn
deleted file mode 100644
index d28004e381..0000000000
--- a/doc/git-annex-transferkeys.mdwn
+++ /dev/null
@@ -1,31 +0,0 @@
-# NAME
-
-git-annex transferkeys - transfers keys (deprecated)
-
-# SYNOPSIS
-
-git annex transferkeys
-
-# DESCRIPTION
-
-This plumbing-level command is used to transfer data, by the assistant
-in git-annex version 8.20201127 and older. It is still included only
-to prevent breakage during upgrades.
-
-It is a long-running process, which is fed instructions about the keys
(Diff truncated)
diff --git a/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn b/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn index b79ba68f36..7ef6832159 100644 --- a/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn +++ b/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn @@ -228,4 +228,4 @@ ok ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) -> [[!tag projects/datalad]] +> [[!tag projects/repronim]]
diff --git a/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn b/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn index da88d2138a..b79ba68f36 100644 --- a/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn +++ b/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn @@ -47,6 +47,8 @@ originally observed with datalad 1.3.0 and git annex 10.20240831+git21-gd717e9ac ### Please provide any additional information below. +<details><summary> Reproducer script output</summary> + [[!format sh """ # If you can, paste a complete transcript of the problem occurring here. # If the problem is with the git-annex assistant, paste in .git/annex/daemon.log @@ -218,6 +220,7 @@ ok [2026-02-05 10:12:40.113699633] (Utility.Process) process [1453650] done ExitSuccess [2026-02-05 10:12:40.113998072] (Utility.Process) process [1453672] done ExitSuccess +</details> # End of transcript or log. """]]
Added a comment: Have you had any luck using git-annex before?
diff --git a/doc/bugs/concurrent_get_from_separate_clones_fails/comment_1_cefb01da9db0ca9d58ca59ebacc48126._comment b/doc/bugs/concurrent_get_from_separate_clones_fails/comment_1_cefb01da9db0ca9d58ca59ebacc48126._comment
new file mode 100644
index 0000000000..e8fe578a1f
--- /dev/null
+++ b/doc/bugs/concurrent_get_from_separate_clones_fails/comment_1_cefb01da9db0ca9d58ca59ebacc48126._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="asmacdo"
+ avatar="http://cdn.libravatar.org/avatar/546a67f17e6420c02ca544eeb1bf373e"
+ subject="Have you had any luck using git-annex before?"
+ date="2026-02-05T16:21:39Z"
+ content="""
+YES! I use git-annex under the hood of datalad all the time, usually with no git-annex issues at all, I appreciate you for making VC with data possible!
+"""]]
diff --git a/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn b/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn
new file mode 100644
index 0000000000..da88d2138a
--- /dev/null
+++ b/doc/bugs/concurrent_get_from_separate_clones_fails.mdwn
@@ -0,0 +1,228 @@
+### Please describe the problem.
+
+When two independent git-annex clones simultaneously `git annex get` the same key from the same local remote, one succeeds and the other fails immediately with:
+
+```
+transfer already in progress, or unable to take transfer lock
+failed to retrieve content from remote
+Unable to access these remotes: origin
+No other repository is known to contain the file.
+```
+
+The two clones are completely independent repositories with separate `.git` directories. The source repo is only being read from.
+
+The use case is an HPC/SLURM environment: we are attempting to run multiple jobs in parallel, each in an independent clone, and each may need access to the annexed data simultaneously.
+
+### What steps will reproduce the problem?
+
+```
+# Create source repo with a 500MB annexed file on tmpfs
+mkdir -p /dev/shm/annex-lock-test
+cd /dev/shm/annex-lock-test
+git init source && cd source
+git annex init "source"
+dd if=/dev/urandom of=bigfile bs=1M count=500
+git annex add bigfile
+git commit -m "Add 500MB test file"
+cd ..
+
+# Clone twice
+git clone source clone_1 && (cd clone_1 && git annex init "clone_1")
+git clone source clone_2 && (cd clone_2 && git annex init "clone_2")
+
+# Concurrent get
+(cd clone_1 && git annex get --debug bigfile) &
+(cd clone_2 && git annex get --debug bigfile) &
+wait
+```
+
+I reproduced this on tmpfs (`/dev/shm`, Fedora 42) and XFS (`/scratch`, RHEL on Dartmouth Discovery cluster). I did not observe it on btrfs (`/home`, Fedora 42), where `cp --reflink=always` succeeds and the copy completes in ~1ms.
+
+
+### What version of git-annex are you using? On what operating system?
+
+git-annex 10.20250630 on Fedora 42 (local)
+originally observed with datalad 1.3.0 and git annex 10.20240831+git21-gd717e9aca0-1 on RHEL (Dartmouth Discovery cluster).
+
+
+### Please provide any additional information below.
+
+[[!format sh """
+# If you can, paste a complete transcript of the problem occurring here.
+# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
+# /home/austin/devel/tmp-concurrent-branch-subdatasets/annex-only-portable/setup2.log
+hint: Using 'master' as the name for the initial branch. This default branch name
+hint: will change to "main" in Git 3.0. To configure the initial branch name
+hint: to use in all of your new repositories, which will suppress this warning,
+hint: call:
+hint:
+hint: git config --global init.defaultBranch <name>
+hint:
+hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
+hint: 'development'. The just-created branch can be renamed via this command:
+hint:
+hint: git branch -m <name>
+hint:
+hint: Disable this message with "git config set advice.defaultBranchName false"
+Initialized empty Git repository in /dev/shm/annex-lock-test/source/.git/
+init source ok
+(recording state in git...)
+500+0 records in
+500+0 records out
+524288000 bytes (524 MB, 500 MiB) copied, 0.876868 s, 598 MB/s
+add bigfile
+
+0% 31.98 KiB 86 MiB/s 5s
+13% 65.28 MiB 326 MiB/s 1s
+27% 132.94 MiB 338 MiB/s 1s
+40% 198.43 MiB 327 MiB/s 0s
+53% 265.37 MiB 335 MiB/s 0s
+67% 333.12 MiB 339 MiB/s 0s
+80% 400.15 MiB 335 MiB/s 0s
+93% 467.24 MiB 335 MiB/s 0s
+100% 500 MiB 336 MiB/s 0s
+
+ok
+(recording state in git...)
+[master (root-commit) 10206e3] Add 500MB test file
+ 1 file changed, 1 insertion(+)
+ create mode 120000 bigfile
+Cloning into 'clone_1'...
+done.
+init clone_1 ok
+(recording state in git...)
+Cloning into 'clone_2'...
+done.
+init clone_2 ok
+(recording state in git...)
+[2026-02-05 10:12:38.52048625] (Utility.Process) process [1453634] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","bigfile"]
+[2026-02-05 10:12:38.520735725] (Utility.Process) process [1453636] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:38.520660512] (Utility.Process) process [1453635] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","bigfile"]
+[2026-02-05 10:12:38.520983348] (Utility.Process) process [1453637] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:38.52106501] (Utility.Process) process [1453638] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:38.521431755] (Utility.Process) process [1453639] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
+[2026-02-05 10:12:38.521605516] (Utility.Process) process [1453640] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:38.522041612] (Utility.Process) process [1453641] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
+[2026-02-05 10:12:38.52229073] (Utility.Process) process [1453639] done ExitSuccess
+[2026-02-05 10:12:38.522528836] (Utility.Process) process [1453642] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
+[2026-02-05 10:12:38.523030765] (Utility.Process) process [1453641] done ExitSuccess
+[2026-02-05 10:12:38.523435594] (Utility.Process) process [1453642] done ExitSuccess
+[2026-02-05 10:12:38.523458506] (Utility.Process) process [1453643] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
+[2026-02-05 10:12:38.523829116] (Utility.Process) process [1453644] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..432fba2a78b87504dc9b8f8ee5680af9278029de","--pretty=%H","-n1"]
+[2026-02-05 10:12:38.524484364] (Utility.Process) process [1453643] done ExitSuccess
+[2026-02-05 10:12:38.524858446] (Utility.Process) process [1453645] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..6384d58674d16912622fa58a4fe96e843a0c1935","--pretty=%H","-n1"]
+[2026-02-05 10:12:38.525455904] (Utility.Process) process [1453644] done ExitSuccess
+[2026-02-05 10:12:38.526011738] (Utility.Process) process [1453646] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:38.526375513] (Utility.Process) process [1453645] done ExitSuccess
+[2026-02-05 10:12:38.526691082] (Utility.Process) process [1453648] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+get bigfile get bigfile [2026-02-05 10:12:38.529361155] (Utility.Process) process [1453650] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
+[2026-02-05 10:12:38.5293897] (Utility.Process) process [1453651] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
+[2026-02-05 10:12:38.531486658] (Utility.Process) process [1453652] read: git ["-c","safe.directory=*","-c","safe.bareRepository=all","config","--null","--list"] in "/dev/shm/annex-lock-test/source"
+[2026-02-05 10:12:38.531575546] (Utility.Process) process [1453653] read: git ["-c","safe.directory=*","-c","safe.bareRepository=all","config","--null","--list"] in "/dev/shm/annex-lock-test/source"
+[2026-02-05 10:12:38.532508668] (Git.Config) git config read: [("",[""]),("annex.uuid",["b5b537cc-6ede-484f-b9c2-da01703972d0"]),("annex.version",["10"]),("color.branch",["auto"]),("color.diff",["auto"]),("color.status",["auto"]),("color.ui",["true"]),("core.bare",["false"]),("core.editor",["vim"]),("core.filemode",["true"]),("core.logallrefupdates",["true"]),("core.repositoryformatversion",["0"]),("datalad.containers-run.oci-runtime",["podman"]),("filter.annex.clean",["git-annex smudge --clean -- %f"]),("filter.annex.process",["git-annex filter-process"]),("filter.annex.smudge",["git-annex smudge -- %f"]),("github.user",["asmacdo"]),("rerere.enabled",["true"]),("safe.barerepository",["all"]),("safe.directory",["*"]),("url.git@github.com:.insteadof",["https://github.com/"]),("user.email",["austin@dartmouth.edu"]),("user.name",["Austin Macdonald"])]
+[2026-02-05 10:12:38.53266739] (Utility.Process) process [1453652] done ExitSuccess
+[2026-02-05 10:12:38.532536997] (Git.Config) git config read: [("",[""]),("annex.uuid",["b5b537cc-6ede-484f-b9c2-da01703972d0"]),("annex.version",["10"]),("color.branch",["auto"]),("color.diff",["auto"]),("color.status",["auto"]),("color.ui",["true"]),("core.bare",["false"]),("core.editor",["vim"]),("core.filemode",["true"]),("core.logallrefupdates",["true"]),("core.repositoryformatversion",["0"]),("datalad.containers-run.oci-runtime",["podman"]),("filter.annex.clean",["git-annex smudge --clean -- %f"]),("filter.annex.process",["git-annex filter-process"]),("filter.annex.smudge",["git-annex smudge -- %f"]),("github.user",["asmacdo"]),("rerere.enabled",["true"]),("safe.barerepository",["all"]),("safe.directory",["*"]),("url.git@github.com:.insteadof",["https://github.com/"]),("user.email",["austin@dartmouth.edu"]),("user.name",["Austin Macdonald"])]
+[2026-02-05 10:12:38.532910522] (Utility.Process) process [1453653] done ExitSuccess
+(from origin...) (from origin...) [2026-02-05 10:12:38.534964516] (Utility.Process) process [1453654] read: git ["--git-dir=/dev/shm/annex-lock-test/source/.git","--work-tree=/dev/shm/annex-lock-test/source","--literal-pathspecs","--literal-pathspecs","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"]
+[2026-02-05 10:12:38.53616312] (Utility.Process) process [1453654] done ExitSuccess
+[2026-02-05 10:12:38.536446785] (Utility.Process) process [1453655] read: git ["--git-dir=/dev/shm/annex-lock-test/source/.git","--work-tree=/dev/shm/annex-lock-test/source","--literal-pathspecs","--literal-pathspecs","show-ref","--hash","refs/annex/last-index"]
+[2026-02-05 10:12:38.537529408] (Utility.Process) process [1453655] done ExitFailure 1
+[2026-02-05 10:12:38.537559676] (Database.Keys) reconcileStaged start
+[2026-02-05 10:12:38.53776038] (Utility.Process) process [1453656] chat: git ["--git-dir=/dev/shm/annex-lock-test/source/.git","--work-tree=/dev/shm/annex-lock-test/source","--literal-pathspecs","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:38.537919309] (Utility.Process) process [1453657] chat: git ["--git-dir=/dev/shm/annex-lock-test/source/.git","--work-tree=/dev/shm/annex-lock-test/source","--literal-pathspecs","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:38.538395887] (Utility.Process) process [1453658] read: git ["--git-dir=/dev/shm/annex-lock-test/source/.git","--work-tree=/dev/shm/annex-lock-test/source","--literal-pathspecs","--literal-pathspecs","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","-c","diff.external=","diff","4b825dc642cb6eb9a060e54bf8d69288fbee4904","33b7a7ea1a2f380d957fabd024cedb7c9ad23aa4","--raw","-z","--no-abbrev","-G/annex/objects/","--no-renames","--ignore-submodules=all","--no-textconv","--no-ext-diff"]
+[2026-02-05 10:12:38.539472717] (Utility.Process) process [1453658] done ExitSuccess
+[2026-02-05 10:12:38.540331975] (Database.Handle) commitDb start
+[2026-02-05 10:12:38.541705864] (Database.Handle) commitDb done
+[2026-02-05 10:12:38.541745995] (Utility.Process) process [1453657] done ExitSuccess
+[2026-02-05 10:12:38.541761427] (Utility.Process) process [1453656] done ExitSuccess
+[2026-02-05 10:12:38.542048459] (Utility.Process) process [1453659] call: git ["--git-dir=/dev/shm/annex-lock-test/source/.git","--work-tree=/dev/shm/annex-lock-test/source","--literal-pathspecs","--literal-pathspecs","update-ref","refs/annex/last-index","33b7a7ea1a2f380d957fabd024cedb7c9ad23aa4"]
+[2026-02-05 10:12:38.543176116] (Utility.Process) process [1453659] done ExitSuccess
+[2026-02-05 10:12:38.543287785] (Database.Keys) reconcileStaged end
+
+[2026-02-05 10:12:38.544214283] (Utility.Process) process [1453660] read: cp ["--reflink=always","--preserve=timestamps","/dev/shm/annex-lock-test/source/.git/annex/objects/W4/qk/SHA256E-s524288000--a7dbcf6c0d2641bd5d1e45f66aeda998b755d88e9493e74e035bbfe4aa3e5d09/SHA256E-s524288000--a7dbcf6c0d2641bd5d1e45f66aeda998b755d88e9493e74e035bbfe4aa3e5d09",".git/annex/tmp/SHA256E-s524288000--a7dbcf6c0d2641bd5d1e45f66aeda998b755d88e9493e74e035bbfe4aa3e5d09"]
+[2026-02-05 10:12:38.544288768] (Utility.Process) process [1453661] read: git ["--git-dir=/dev/shm/annex-lock-test/source/.git","--work-tree=/dev/shm/annex-lock-test/source","--literal-pathspecs","--literal-pathspecs","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"]
+[2026-02-05 10:12:38.544934083] (Utility.Process) process [1453660] done ExitFailure 1
+
+0% 31.98 KiB 20 MiB/s 24s[2026-02-05 10:12:38.545353638] (Utility.Process) process [1453661] done ExitSuccess
+[2026-02-05 10:12:38.545773148] (Utility.Process) process [1453662] read: git ["--git-dir=/dev/shm/annex-lock-test/source/.git","--work-tree=/dev/shm/annex-lock-test/source","--literal-pathspecs","--literal-pathspecs","show-ref","--hash","refs/annex/last-index"]
+[2026-02-05 10:12:38.546702567] (Utility.Process) process [1453662] done ExitSuccess
+
+ transfer already in progress, or unable to take transfer lock
+
+ failed to retrieve content from remote
+
+ Unable to access these remotes: origin
+
+ No other repository is known to contain the file.
+failed
+[2026-02-05 10:12:38.548068928] (Utility.Process) process [1453648] done ExitSuccess
+[2026-02-05 10:12:38.548140718] (Utility.Process) process [1453640] done ExitSuccess
+[2026-02-05 10:12:38.548864776] (Utility.Process) process [1453638] done ExitSuccess
+[2026-02-05 10:12:38.548904621] (Utility.Process) process [1453635] done ExitSuccess
+[2026-02-05 10:12:38.549288156] (Utility.Process) process [1453651] done ExitSuccess
+get: 1 failed
+
+12% 59.88 MiB 299 MiB/s 1s
+24% 122.1 MiB 311 MiB/s 1s
+37% 184.66 MiB 313 MiB/s 1s
+50% 251.72 MiB 335 MiB/s 0s
+64% 319 MiB 336 MiB/s 0s
+77% 384.19 MiB 326 MiB/s 0s
+90% 450.84 MiB 333 MiB/s 0s
+100% 500 MiB 334 MiB/s 0s
+
+[2026-02-05 10:12:40.093823309] (Annex.Perms) freezing content .git/annex/objects/W4/qk/SHA256E-s524288000--a7dbcf6c0d2641bd5d1e45f66aeda998b755d88e9493e74e035bbfe4aa3e5d09/SHA256E-s524288000--a7dbcf6c0d2641bd5d1e45f66aeda998b755d88e9493e74e035bbfe4aa3e5d09
+[2026-02-05 10:12:40.095084697] (Utility.Process) process [1453664] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"]
+[2026-02-05 10:12:40.096299746] (Utility.Process) process [1453664] done ExitSuccess
+[2026-02-05 10:12:40.096715468] (Utility.Process) process [1453665] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/annex/last-index"]
+[2026-02-05 10:12:40.097734422] (Utility.Process) process [1453665] done ExitFailure 1
+[2026-02-05 10:12:40.097763387] (Database.Keys) reconcileStaged start
+[2026-02-05 10:12:40.097926938] (Utility.Process) process [1453666] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:40.098047046] (Utility.Process) process [1453667] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
+[2026-02-05 10:12:40.098505359] (Utility.Process) process [1453668] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","-c","diff.external=","diff","4b825dc642cb6eb9a060e54bf8d69288fbee4904","33b7a7ea1a2f380d957fabd024cedb7c9ad23aa4","--raw","-z","--no-abbrev","-G/annex/objects/","--no-renames","--ignore-submodules=all","--no-textconv","--no-ext-diff"]
+[2026-02-05 10:12:40.09959837] (Utility.Process) process [1453668] done ExitSuccess
+[2026-02-05 10:12:40.100365888] (Database.Handle) commitDb start
+[2026-02-05 10:12:40.100662882] (Database.Handle) commitDb done
+[2026-02-05 10:12:40.100692003] (Utility.Process) process [1453667] done ExitSuccess
+[2026-02-05 10:12:40.100710418] (Utility.Process) process [1453666] done ExitSuccess
+[2026-02-05 10:12:40.100985587] (Utility.Process) process [1453669] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-ref","refs/annex/last-index","33b7a7ea1a2f380d957fabd024cedb7c9ad23aa4"]
+[2026-02-05 10:12:40.102017111] (Utility.Process) process [1453669] done ExitSuccess
+[2026-02-05 10:12:40.102056066] (Database.Keys) reconcileStaged end
+[2026-02-05 10:12:40.102314034] (Annex.Perms) freezing content directory .git/annex/objects/W4/qk/SHA256E-s524288000--a7dbcf6c0d2641bd5d1e45f66aeda998b755d88e9493e74e035bbfe4aa3e5d09
+[2026-02-05 10:12:40.102477675] (Utility.Process) process [1453670] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","symbolic-ref","-q","HEAD"]
+[2026-02-05 10:12:40.103247255] (Utility.Process) process [1453670] done ExitSuccess
(Diff truncated)
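The failure above ("transfer already in progress, or unable to take transfer lock") has the signature of a non-blocking lock acquisition: the second `get` finds the lock on the shared source repo already held and gives up immediately instead of waiting. A minimal sketch of that pattern using util-linux `flock` (purely illustrative, not git-annex's actual code; git-annex manages its own lock files under `.git/annex/`):

```shell
#!/bin/sh
# Holder takes the lock and keeps it for 2 seconds.
lockfile=$(mktemp)
flock -n "$lockfile" sleep 2 &
sleep 1
# Second taker uses -n (non-blocking): it fails at once rather than queueing.
if flock -n "$lockfile" true; then
	echo "lock acquired"
else
	echo "transfer already in progress, or unable to take transfer lock"
fi
wait
rm -f "$lockfile"
```

A blocking acquisition (`flock` without `-n`) would instead wait for the holder to finish, which is closer to what one might expect concurrent `git annex get` runs from independent clones to do.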
Added a comment
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_7_019d24414b781288c6c2822914ca118b._comment b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_7_019d24414b781288c6c2822914ca118b._comment
new file mode 100644
index 0000000000..c03997d3e8
--- /dev/null
+++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_7_019d24414b781288c6c2822914ca118b._comment
@@ -0,0 +1,16 @@
+[[!comment format=mdwn
+ username="matrss"
+ avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
+ subject="comment 7"
+ date="2026-02-05T14:42:49Z"
+ content="""
+I get the impression that there is quite a bit of complexity with export remotes that makes it dangerous to not let them be managed by git-annex only, and changing that sounds rather complicated. Thanks for looking into it and making some improvements.
+
+ I am planning to download them all, record their checksums
+
+ That's what an import does do.. You would end up with an imported tree which you could diff with your known correct tree, and see what's different there, and for all files that are stored on the remote correctly, git-annex get would be able to get them from there.
+
+Ohh! Thanks for spelling it out. This sounds way more convenient than what I planned with fsck'ing the remote. Populating the directory with rsync and then importing again doesn't sound like too much overhead, should be fine.
+
+Given that, I think I would be happy with import support for rsync.
+"""]]
Added a comment
diff --git a/doc/bugs/blake3_hash_support/comment_5_0307343507bf00a8d663b943b568bdc6._comment b/doc/bugs/blake3_hash_support/comment_5_0307343507bf00a8d663b943b568bdc6._comment
new file mode 100644
index 0000000000..dd3fa3200e
--- /dev/null
+++ b/doc/bugs/blake3_hash_support/comment_5_0307343507bf00a8d663b943b568bdc6._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="nadir"
+ avatar="http://cdn.libravatar.org/avatar/2af9174cf6c06de802104d632dc40071"
+ subject="comment 5"
+ date="2026-02-05T14:12:25Z"
+ content="""
+Did anything ever come off this? Blake 3 support would be nice.
+"""]]
rsync: Avoid deleting contents of a non-empty directory when removing the last exported file to the directory
If a third party has put files in the directory and git-annex doesn't know
about them, this prevents removing them.
Of course, an export can always overwrite third-party added files (unless
importtree=yes when it takes care to avoid that).
But it's more surprising that unexporting the last file in a directory also
removes any other third-party added files in that directory.
Since it's possible for the rsync remote to avoid that scenario, pursue
least surprise, even though it's not perfectly attainable.
The method is just rsync with --include=subdir/ and --exclude=*,
which excludes subdir/*. So only removing subdir when it's empty.
(Note that the includes = function always yielded [], which makes me think
I didn't think very hard when implementing this originally.)
diff --git a/CHANGELOG b/CHANGELOG
index 929c496091..a5d470a7bd 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -19,6 +19,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
issues.
* Avoid ever starting more capabilities than the number of cpus.
* p2phttp: Added --cpus option.
+ * rsync: Avoid deleting contents of a non-empty directory when
+ removing the last exported file to the directory.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Remote/Rsync.hs b/Remote/Rsync.hs
index e0f9d58065..be52c2cd2d 100644
--- a/Remote/Rsync.hs
+++ b/Remote/Rsync.hs
@@ -344,14 +344,7 @@ removeExportM o _k loc =
Just f' -> includes f'
removeExportDirectoryM :: RsyncOpts -> ExportDirectory -> Annex ()
-removeExportDirectoryM o ed = removeGeneric o $
- map fromOsPath (allbelow d : includes d)
- where
- d = fromExportDirectory ed
- allbelow f = f </> literalOsPath "***"
- includes f = f : case upFrom f of
- Nothing -> []
- Just f' -> includes f'
+removeExportDirectoryM o ed = removeGeneric o []
renameExportM :: RsyncOpts -> Key -> ExportLocation -> ExportLocation -> Annex (Maybe ())
renameExportM _ _ _ _ = return Nothing
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_6_3bca4de137ff28f345f6f2a493499fb1._comment b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_6_3bca4de137ff28f345f6f2a493499fb1._comment
new file mode 100644
index 0000000000..0784df8f9b
--- /dev/null
+++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_6_3bca4de137ff28f345f6f2a493499fb1._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 6"""
+ date="2026-02-04T19:40:43Z"
+ content="""
+Actually, it is possible to get rsync to delete a directory when it's
+empty, but preserve it otherwise. So I have implemented that.
+
+The other remotes that I mentioned will still have this behavior.
+And at least in the case of webdav, I doubt it can be made to only
+delete empty directories.
+
+Also note that the documentation is clear about this at the API level:
+
+ `REMOVEEXPORTDIRECTORY Directory`
+ [...]
Typically the directory will be empty, but it could possibly contain
+ files or other directories, and it's ok to remove those.
+"""]]
comment
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_5_d77cda0f4e44dbf229e127289df57ade._comment b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_5_d77cda0f4e44dbf229e127289df57ade._comment
new file mode 100644
index 0000000000..655da468a4
--- /dev/null
+++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_5_d77cda0f4e44dbf229e127289df57ade._comment
@@ -0,0 +1,54 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2026-02-04T16:59:05Z"
+ content="""
+Reproduced that behavior.
+
+What is happening here is that empty directories on the rsync special
+remote get cleaned up in a separate step after unexport of a file. It is
+unexporting `subdir/test1.bin`. And in this situation, due to the use of
+`export --fast`, no files have been sent to the export remote yet. So as
+far as git-annex is concerned, `subdir/` there is an empty directory, and
+so it removes it.
+
+Now, since `subdir/test1.bin` never did get sent to the remote, its old version
+does not actually need to be unexported before the new version is sent. Which
+would have avoided the cleanup and so avoided the problem. (Although I think
+there are probably good reasons for that unexport to be done, involving
+multi-writer situations. I would need to refresh my memory about some
+complicated stuff to say for sure.)
+
+But, the same thing can happen in other ways. For example, consider:
+
+    mkdir newdir
+    touch newdir/foo
+    git-annex add newdir/foo
+    git commit -m add
+    git-annex export master --to rsync
+    git rm newdir/foo
+    git commit -m rm
+    git-annex export master --to rsync
+
+That also deletes any other files that a third party has written to
+`newdir/` on the remote. And in this case, it really does need to
+unexport `newdir/foo`.
+
+Note that the directory special remote does not behave the same way; it doesn't
+need the separate step to remove "empty" directories, and it just cleans up
+empty directories after removing a file from the export. But rsync does not
+have a way to delete a directory only when it's empty, which is why git-annex
+does the separate step to identify and remove empty directories. (From
+git-annex's perspective.) Also, the adb and webdav special remotes
+behave the same as rsync.
+
+I don't know that git-annex documents anywhere that an exporttree remote
+avoids deleting files added to the remote by third parties. I did find it
+surprising that files with names that git-annex doesn't even know about get
+deleted in this case. On the other hand, if git-annex is told to export a tree
+containing file `foo`, that is going to overwrite any `foo` written to the
+remote by a third party, and I think that is expected behavior.
+
+Also note that importtree remotes don't have this problem. Including avoiding
+export overwriting files written by third parties.
+"""]]
comment
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_4_17b921a74ce1c9fe6fb538ea546a1b4a._comment b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_4_17b921a74ce1c9fe6fb538ea546a1b4a._comment
new file mode 100644
index 0000000000..b265e30405
--- /dev/null
+++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_4_17b921a74ce1c9fe6fb538ea546a1b4a._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2026-02-04T16:29:45Z"
+ content="""
+> I am planning to download them all, record their checksums
+
+That's what an import does do.. You would end up with an imported tree
+which you could diff with your known correct tree, and see what's different
+there, and for all files that are stored on the remote correctly,
+`git-annex get` would be able to get them from there.
+"""]]
Added a comment
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_3_e97a6fcbb44ec4d863deab3febc3e0fb._comment b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_3_e97a6fcbb44ec4d863deab3febc3e0fb._comment
new file mode 100644
index 0000000000..8a83b96ada
--- /dev/null
+++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_3_e97a6fcbb44ec4d863deab3febc3e0fb._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="matrss"
+ avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
+ subject="comment 3"
+ date="2026-02-03T13:22:11Z"
+ content="""
+> You could do that with a special remote also configured with importtree=yes. No need to do anything special, just import from the remote, and git-annex will learn what files are on it.
+> [...]
+> Your use case sounds like it might be one that importtree only remotes would support.
+
+I do not trust the content that is stored in this directory on the HPC system. I am able to reproduce some of the newer files directly from the third-party they were downloaded from, but for older files I get very weird slightly different data. That's why I don't want to import anything from there. Instead, I am building this DataLad Dataset which records the requests necessary to fetch each file from the third-party (via datalad-cds' URLs) and then I am planning to download them all, record their checksums, and fsck the \"legacy\" data that we have in this non-version-controlled directory. In the end I also want to be able to populate this directory from the dataset with files downloaded via git-annex. `git annex copy --to` for export remotes would be nice for that, but as of now I would probably rsync them over myself and update git-annex' content tracking for the export remote myself.
+
+I don't think importtree or even importtree-only would be the right tool for this.
+"""]]
Added a comment
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_2_5cf368edd19801e9798a4971f1d208c4._comment b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_2_5cf368edd19801e9798a4971f1d208c4._comment
new file mode 100644
index 0000000000..00921233e4
--- /dev/null
+++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_2_5cf368edd19801e9798a4971f1d208c4._comment
@@ -0,0 +1,62 @@
+[[!comment format=mdwn
+ username="matrss"
+ avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
+ subject="comment 2"
+ date="2026-02-03T12:58:32Z"
+ content="""
+Sorry, I must have misunderstood the failure condition (the full project also involves an external special remote with a custom URL scheme and an external backend, which I have now ruled out as the cause). It is actually not the first export, but rather a later re-export once something has changed. What I have been encountering this with is after migrating some keys:
+
+[[!format sh \"\"\"
+icg149@icg1911:~/Playground$ datalad create test-export-repo
+create(ok): /home/icg149/Playground/test-export-repo (dataset)
+icg149@icg1911:~/Playground$ cd test-export-repo
+icg149@icg1911:~/Playground/test-export-repo$ mkdir subdir
+icg149@icg1911:~/Playground/test-export-repo$ head -c 10K /dev/urandom > subdir/test1.bin
+icg149@icg1911:~/Playground/test-export-repo$ head -c 10K /dev/urandom > subdir/test2.bin
+icg149@icg1911:~/Playground/test-export-repo$ head -c 10K /dev/urandom > subdir/test3.bin
+icg149@icg1911:~/Playground/test-export-repo$ datalad save
+add(ok): subdir/test1.bin (file)
+add(ok): subdir/test2.bin (file)
+add(ok): subdir/test3.bin (file)
+save(ok): . (dataset)
+action summary:
+  add (ok: 3)
+  save (ok: 1)
+icg149@icg1911:~/Playground/test-export-repo$ mkdir -p ../test-export-dir/subdir
+icg149@icg1911:~/Playground/test-export-repo$ cp subdir/test*.bin ../test-export-dir/subdir
+icg149@icg1911:~/Playground/test-export-repo$ git annex initremote rsync type=rsync rsyncurl=../test-export-dir exporttree=yes encryption=none autoenable=true
+initremote rsync ok
+(recording state in git...)
+icg149@icg1911:~/Playground/test-export-repo$ git annex drop --force subdir/test2.bin
+drop subdir/test2.bin ok
+(recording state in git...)
+icg149@icg1911:~/Playground/test-export-repo$ git annex export --fast main --to rsync
+(recording state in git...)
+icg149@icg1911:~/Playground/test-export-repo$ tree ../test-export-dir/
+../test-export-dir/
+└── subdir
+    ├── test1.bin
+    ├── test2.bin
+    └── test3.bin
+
+2 directories, 3 files
+icg149@icg1911:~/Playground/test-export-repo$ git annex migrate --backend SHA256E subdir/test1.bin
+migrate subdir/test1.bin (checksum...) (checksum...) ok
+(recording state in git...)
+icg149@icg1911:~/Playground/test-export-repo$ datalad save
+save(ok): . (dataset)
+icg149@icg1911:~/Playground/test-export-repo$ git annex export --fast main --to rsync
+unexport rsync subdir/test1.bin ok
+(recording state in git...)
+icg149@icg1911:~/Playground/test-export-repo$ tree ../test-export-dir/
+../test-export-dir/
+
+0 directories, 0 files
+\"\"\"]]
+
+This seems to not only happen with a migrate, but also if the file-as-tracked-in-git changes in other ways.
+
+The unexport makes some sense given that the git-tracked-file has changed, but since in my case it is only a backend migration the content is still the same. I think this unexport shouldn't happen at all with --fast though.
+
+That this single file change also removes the other files in the same subdirectory, regardless of if they are present or not, is very surprising.
+"""]]
comment and close
diff --git a/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag.mdwn b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag.mdwn
index 60e1793236..d5308745f2 100644
--- a/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag.mdwn
+++ b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag.mdwn
@@ -92,3 +92,4 @@ local repository version: 10
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_2_7b651231ef4d37865acf5fc9d3667581._comment b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_2_7b651231ef4d37865acf5fc9d3667581._comment
new file mode 100644
index 0000000000..abfd950202
--- /dev/null
+++ b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_2_7b651231ef4d37865acf5fc9d3667581._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-02-02T18:47:29Z"
+ content="""
+Added --cpus option. And avoided a high -J value increasing the number of
+capabilities higher than the available number of cpus.
+
+I think you might want to use something like: `git-annex p2phttp --cpus=2 --jobs=100`
+"""]]
p2phttp: Added --cpus option
This allows bumping -J up to a high value to handle a lot of concurrent
requests without blocking, without the p2phttp server scaling up to use
all CPUs.
It would be easy to add --cpus to other commands too, but I don't know of a use
case for it. What would be the point of running git-annex get -J100 --cpus=10?
Usually, running more threads per CPU will slow them down due to contention
and caching. The only possible benefit might be if a lot of jobs were slow
and some fast, to avoid bottlenecking running only slow jobs, but without
using up all cores on fast jobs. So perhaps.. But that seems difficult for
the user to understand and unlikely to be something they will want to use.
diff --git a/Annex.hs b/Annex.hs
index 421d152bf6..b43b684a93 100644
--- a/Annex.hs
+++ b/Annex.hs
@@ -1,6 +1,6 @@
{- git-annex monad
-
- - Copyright 2010-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2010-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -190,6 +190,7 @@ data AnnexState = AnnexState
, remotes :: [Types.Remote.RemoteA Annex]
, output :: MessageState
, concurrency :: ConcurrencySetting
+ , cpus :: Maybe Cpus
, daemon :: Bool
, repoqueue :: Maybe (Git.Queue.Queue Annex)
, catfilehandles :: CatFileHandles
@@ -248,6 +249,7 @@ newAnnexState c r = do
, remotes = []
, output = o
, concurrency = ConcurrencyCmdLine NonConcurrent
+ , cpus = Nothing
, daemon = False
, repoqueue = Nothing
, catfilehandles = catFileHandlesNonConcurrent
diff --git a/Annex/Concurrent.hs b/Annex/Concurrent.hs
index 688895dcdb..f20f519938 100644
--- a/Annex/Concurrent.hs
+++ b/Annex/Concurrent.hs
@@ -1,6 +1,6 @@
{- git-annex concurrent state
-
- - Copyright 2015-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2015-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -62,6 +62,9 @@ setConcurrency' c f = do
, Annex.checkignorehandle = Just cih
}
+setCpus :: Cpus -> Annex ()
+setCpus n = Annex.changeState $ \s -> s { Annex.cpus = Just n }
+
{- Allows forking off a thread that uses a copy of the current AnnexState
- to run an Annex action.
-
diff --git a/CHANGELOG b/CHANGELOG
index 118d43885b..929c496091 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -18,6 +18,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* The OsPath build flag requires file-io 0.2.0, which fixes several
issues.
* Avoid ever starting more capabilities than the number of cpus.
+ * p2phttp: Added --cpus option.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/CmdLine/Action.hs b/CmdLine/Action.hs
index eef868f86a..47713a4b3a 100644
--- a/CmdLine/Action.hs
+++ b/CmdLine/Action.hs
@@ -247,7 +247,7 @@ startConcurrency usedstages a = do
goconcurrentpercpu
where
goconcurrent n = do
- liftIO $ raiseCapabilitiesForJobs n
+ raiseCapabilitiesForJobs n
withMessageState $ \s -> case outputType s of
NormalOutput -> ifM (liftIO concurrentOutputSupported)
( Regions.displayConsoleRegions $
@@ -341,10 +341,11 @@ checkSizeLimit (Just sizelimitvar) startmsg a =
reachedlimit = Annex.changeState $ \s -> s { Annex.reachedlimit = True }
-raiseCapabilitiesForJobs :: Int -> IO ()
+raiseCapabilitiesForJobs :: Int -> Annex ()
raiseCapabilitiesForJobs njobs = do
- ncpus <- getNumProcessors
+ ncpus <- maybe (liftIO getNumProcessors) (\(Cpus n) -> return n)
+ =<< Annex.getState Annex.cpus
let n = min ncpus njobs
- c <- getNumCapabilities
+ c <- liftIO getNumCapabilities
when (n > c) $
- setNumCapabilities n
+ liftIO $ setNumCapabilities n
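The hunk above caps the capability count at the minimum of the available cpus (or the `--cpus` override) and the `-J` value, and only ever raises it, never lowers it. The arithmetic can be sketched in shell with made-up numbers:

```shell
# Shell sketch of the clamping logic in raiseCapabilitiesForJobs
# (ncpus/njobs/caps are hypothetical example values, not real output):
ncpus=8        # stand-in for getNumProcessors, or the --cpus override
njobs=200      # stand-in for the -J value
caps=1         # current capability count
n=$(( njobs < ncpus ? njobs : ncpus ))
# capabilities are only ever raised, never lowered:
if [ "$n" -gt "$caps" ]; then caps=$n; fi
echo "$caps"   # prints 8: a huge -J no longer inflates the capability count
```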
diff --git a/CmdLine/GitAnnex/Options.hs b/CmdLine/GitAnnex/Options.hs
index 529935f237..cb969c8c21 100644
--- a/CmdLine/GitAnnex/Options.hs
+++ b/CmdLine/GitAnnex/Options.hs
@@ -1,6 +1,6 @@
{- git-annex command-line option parsing
-
- - Copyright 2010-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2010-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -544,6 +544,21 @@ jobsOptionParser =
<> hidden
)
+cpusOption :: [AnnexOption]
+cpusOption =
+ [ annexOption (setAnnexState . setCpus)
+ cpusOptionParser
+ ]
+
+cpusOptionParser :: Parser Cpus
+cpusOptionParser =
+ option (maybeReader parseCpus)
+ ( long "cpus"
+ <> metavar paramNumber
+ <> help "how many cpus to run jobs on"
+ <> hidden
+ )
+
timeLimitOption :: [AnnexOption]
timeLimitOption =
[ annexOption settimelimit $ option (eitherReader parseDuration)
diff --git a/Command/P2PHttp.hs b/Command/P2PHttp.hs
index 0475442891..ac155e02b3 100644
--- a/Command/P2PHttp.hs
+++ b/Command/P2PHttp.hs
@@ -11,7 +11,7 @@
module Command.P2PHttp where
-import Command hiding (jobsOption)
+import Command hiding (jobsOption, cpusOption)
import P2P.Http.Server
import P2P.Http.Url
import qualified P2P.Protocol as P2P
@@ -58,6 +58,7 @@ data Options = Options
, proxyConnectionsOption :: Maybe Integer
, jobsOption :: Maybe Concurrency
, clusterJobsOption :: Maybe Int
+ , cpusOption :: Maybe Cpus
, lockedFilesOption :: Maybe Integer
, directoryOption :: [FilePath]
}
@@ -121,6 +122,7 @@ optParser _ = Options
( long "clusterjobs" <> metavar paramNumber
<> help "number of concurrent node accesses per connection"
))
+ <*> optional cpusOptionParser
<*> optional (option auto
( long "lockedfiles" <> metavar paramNumber
<> help "number of content files that can be locked"
@@ -239,7 +241,7 @@ runServer o mst = go `finally` serverShutdownCleanup mst
mkServerState :: Options -> M.Map Auth P2P.ServerMode -> LockedFilesQSem -> Annex P2PHttpServerState
mkServerState o authenv lockedfilesqsem =
- withAnnexWorkerPool (jobsOption o) $
+ withAnnexWorkerPool (jobsOption o) (cpusOption o) $
mkP2PHttpServerState
(mkGetServerMode authenv o)
return
diff --git a/P2P/Http/State.hs b/P2P/Http/State.hs
index d817a5e270..873c24f382 100644
--- a/P2P/Http/State.hs
+++ b/P2P/Http/State.hs
@@ -642,9 +642,10 @@ dropLock lckid st = do
Nothing -> return ()
Just locker -> wait (lockerThread locker)
-withAnnexWorkerPool :: (Maybe Concurrency) -> (AnnexWorkerPool -> Annex a) -> Annex a
-withAnnexWorkerPool mc a = do
- maybe noop (setConcurrency . ConcurrencyCmdLine) mc
+withAnnexWorkerPool :: Maybe Concurrency -> Maybe Cpus -> (AnnexWorkerPool -> Annex a) -> Annex a
+withAnnexWorkerPool mconc mcpus a = do
+ maybe noop setCpus mcpus
+ maybe noop (setConcurrency . ConcurrencyCmdLine) mconc
startConcurrency transferStages $
Annex.getState Annex.workers >>= \case
Nothing -> giveup "Use -Jn or set annex.jobs to configure the number of worker threads."
diff --git a/P2P/Proxy.hs b/P2P/Proxy.hs
index 6e3165ab94..bbbd023146 100644
--- a/P2P/Proxy.hs
+++ b/P2P/Proxy.hs
@@ -719,7 +719,7 @@ concurrencyConfigJobs = (annexJobs <$> Annex.getGitConfig) >>= \case
ConcurrentPerCpu -> go =<< liftIO getNumProcessors
where
go n = do
- liftIO $ raiseCapabilitiesForJobs n
+ raiseCapabilitiesForJobs n
setConcurrency (ConcurrencyGitConfig (Concurrent n))
mkConcurrencyConfig n
diff --git a/Types/Concurrency.hs b/Types/Concurrency.hs
index 9204f1d0f4..bf825744d5 100644
--- a/Types/Concurrency.hs
+++ b/Types/Concurrency.hs
@@ -1,4 +1,4 @@
-{- Copyright 2016 Joey Hess <id@joeyh.name>
(Diff truncated)
comment
diff --git a/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_1_74a64103f22c52dc37010f25900e0c86._comment b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_1_74a64103f22c52dc37010f25900e0c86._comment
new file mode 100644
index 0000000000..970e654195
--- /dev/null
+++ b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/comment_1_74a64103f22c52dc37010f25900e0c86._comment
@@ -0,0 +1,29 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2026-02-02T17:36:12Z"
+ content="""
+Seems I was misremembering details of how ghc's "capabilities" work. From
+its [manual](https://downloads.haskell.org/ghc/latest/docs/users_guide/using-concurrent.html):
+
+> Each capability can run one Haskell thread at a time, so the number of
+> capabilities is equal to the number of Haskell threads that can run physically
+> in parallel. A capability is animated by one or more OS threads; the runtime
+> manages a pool of OS threads for each capability, so that if a Haskell thread
+> makes a foreign call (see Multi-threading and the FFI) another OS thread can
+> take over that capability.
+
+Currently git-annex raises the number of capabilities to the -J value.
+
+Probably the thread pool starts at 2 threads to have one spare preallocated
+for the first FFI call, explaining why each -J doubles the number of OS threads.
+
+I think it would make sense to have a separate option that controls the number
+of capabilities. Then you could set that to eg 2, and set a large -J value,
+in order to have `git-annex p2phttp` allow serving a large number of concurrent
+requests, threaded on only 2 cores.
+
+Also, it does not seem to make sense for the default number of capabilities,
+with a high -J value, to exceed the number of cores. As you noticed,
+each capability uses some FDs, for eventfd, eventpoll, I'm not sure what else.
+"""]]
response
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_1_a02fa74d4556269a9b0711d8fa2410e8._comment b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_1_a02fa74d4556269a9b0711d8fa2410e8._comment
new file mode 100644
index 0000000000..b655fc82d2
--- /dev/null
+++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote/comment_1_a02fa74d4556269a9b0711d8fa2410e8._comment
@@ -0,0 +1,33 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2026-02-02T17:19:34Z"
+ content="""
+I don't reproduce this:
+
+    joey@darkstar:~/tmp/bench/r>git-annex export --fast master --to rsync
+    (recording state in git...)
+
+Nothing is exported by `export --fast` (which matches its documentation),
+and examining the files in the remote's directory, none are deleted or
+overwritten.
+
+When I later run `git-annex push`, all files in the tree get exported.
+In cases where the remote's directory already contained a file with the
+same name, it is overwritten. That is as expected.
+
+> My plan was to add the directory as an exporttree remote, make git-annex
+> think that the current main branches tree should be available there via
+> the fast export, and then do a
+> `git annex fsck --from <remote>` to discover what's actually there
+
+You could do that with a special remote also configured with
+importtree=yes. No need to do anything special, just import from the
+remote, and git-annex will learn what files are on it.
+
+Unfortunately,
+[[importtree is not supported by the rsync special remote|todo/import_tree_from_rsync_special_remote]]
+
+Your use case sounds like it might be one that
+[[todo/importtree_only_remotes]] would support.
+"""]]
Added a comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_16_628b6af03979d77b652e15d8e854d8e3._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_16_628b6af03979d77b652e15d8e854d8e3._comment
new file mode 100644
index 0000000000..2bb9667b78
--- /dev/null
+++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_16_628b6af03979d77b652e15d8e854d8e3._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="matrss"
+ avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
+ subject="comment 16"
+ date="2026-02-02T11:20:13Z"
+ content="""
+I've created a follow-up issue because I don't see the behavior re: OS thread counts that you describe: <https://git-annex.branchable.com/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag/>.
+"""]]
diff --git a/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag.mdwn b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag.mdwn
new file mode 100644
index 0000000000..60e1793236
--- /dev/null
+++ b/doc/bugs/Number_of_p2phttp_OS_threads_scales_with_-J_flag.mdwn
@@ -0,0 +1,94 @@
+### Please describe the problem.
+
+In the discussion under <https://git-annex.branchable.com/bugs/p2phttp_can_get_stuck_with_interrupted_clients/> you've said that the -J flag of p2phttp should be approximately equivalent to a limit on the green threads git-annex starts, while the number of OS threads should be handled by the haskell runtime and limited by core count.
+
+What I am seeing is different though:
+
+```
+$ git annex p2phttp -J2
+$ ps -o thcount $(pgrep git-annex)
+THCNT
+    7
+```
+
+```
+$ git annex p2phttp -J3
+$ ps -o thcount $(pgrep git-annex)
+THCNT
+    9
+```
+
+```
+$ git annex p2phttp -J20
+$ ps -o thcount $(pgrep git-annex)
+THCNT
+   43
+```
+
+```
+$ git annex p2phttp -J200
+$ ps -o thcount $(pgrep git-annex)
+THCNT
+  403
+```
+
+Eventually it fails:
+
+```
+$ git annex p2phttp -J2000
+git-annex: setNumCapabilities: Attempt to increase capability count beyond maximum capability count 256; clamping...
+
+git-annex: git: createProcess: pipe: resource exhausted (Too many open files)
+```
+
+```
+$ git annex p2phttp -J252
+git-annex: git: createProcess: pipe: resource exhausted (Too many open files)
+```
+
+```
+$ git annex p2phttp -J251
+$ ps -o thcount $(pgrep git-annex)
+THCNT
+  505
+```
+
+The thread count increases together with -J (2 new threads per increment), without any limit on the number of OS threads that I am seeing. My laptop only has 4 cores + hyperthreading.
+
+Looking at the code it seems like the value of -J is passed to setNumCapabilities. The documentation says that this sets the number of OS threads (and should not be larger than the number of physical cores): <https://hackage-content.haskell.org/package/base-4.22.0.0/docs/Control-Concurrent.html#v:setNumCapabilities>.
+
+
+### What steps will reproduce the problem?
+
+Trying out different -J values for git-annex p2phttp.
+
+
+### What version of git-annex are you using? On what operating system?
+
+```
+$ git annex version
+git-annex version: 10.20260115-ge8de977f1d5b5ac57cfe7a0c66d4e1c3ff337af1
+build flags: Assistant Webapp Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV Servant OsPath
+dependency versions: aws-0.25.2 bloomfilter-2.0.1.3 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.3 http-client-0.7.19 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1
+key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
+remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask
+operating system: linux x86_64
+supported repository versions: 8 9 10
+upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
+local repository version: 10
+```
+
+
+### Please provide any additional information below.
+
+[[!format sh """
+# If you can, paste a complete transcript of the problem occurring here.
+# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
+
+
+# End of transcript or log.
+"""]]
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
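For what it's worth, the thread counts reported above fit `thcount = 2*J + 3` exactly. Reading that as 2 OS threads per capability plus 3 baseline runtime threads is an inference from these numbers, not documented GHC behavior:

```shell
# Check the reported THCNT values against the inferred formula 2*J + 3
# (the per-capability/baseline split is an assumption, not documented):
for j in 2 3 20 200 251; do
    echo "-J$j -> $(( 2 * j + 3 )) threads"
done
```

This reproduces the observed 7, 9, 43, 403, and 505 thread counts.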
assistant does not add or commit
diff --git a/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked...mdwn b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked...mdwn
new file mode 100644
index 0000000000..ae5968d8c7
--- /dev/null
+++ b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked...mdwn
@@ -0,0 +1,45 @@
+### Please describe the problem.
+
+I think I have done everything nice and clean to sync things up etc... but now assistant just does not care to add/commit new files.
+
+excerpt from [full daemon.log](https://www.oneukrainian.com/tmp/daemon.log.20260131.log):
+
+```
+  fd:31: hPutBuf: resource vanished (Broken pipe)
+
+  fd:31: hPutBuf: resource vanished (Broken pipe)
+(recording state in git...)
+On branch master
+Your branch is up to date with 'typhon/master'.
+
+Untracked files:
+  (use "git add <file>..." to include in what will be committed)
+    Videos/2026/01/2026.01.31-13.19.02.455--2026.01.31-13.19.18.475.mkv
+    Videos/2026/01/2026.01.31-13.19.02.455--2026.01.31-13.19.18.475.mkv.duct_info.json
+    Videos/2026/01/2026.01.31-13.19.02.455--2026.01.31-13.19.18.475.mkv.duct_usage.json
+    Videos/2026/01/2026.01.31-13.19.02.455--2026.01.31-13.19.18.475.mkv.log
+    events-micropython/2026-01-31T12:20:42-05:00.csv
+    logs/2026-01-14T09:33-05:00.log
+    logs/timesync-stimuli/2026.01.31-13.19.05.339--.log
+
+nothing added to commit but untracked files present (use "git add" to track)
+Everything up-to-date
+Everything up-to-date
+
+```
+
+### What version of git-annex are you using? On what operating system?
+
+well -- ideally that daemon.log should inform us that and potentially other details to help troubleshoot it. I think (since there could be multiple):
+
+```
+reprostim@reproiner:/data/reprostim$ ps auxw | grep assist
+reprost+ 1989627  0.0  0.0   9892  3896 ?  Ss   13:12   0:00 /usr/bin/git annex assistant --foreground
+reprost+ 1989628  2.8  0.8 1074363924 270904 ?  Ssl  13:12   0:19 /usr/lib/git-annex.linux/exe/git-annex --library-path /usr/lib/git-annex.linux//lib/x86_64-linux-gnu: /usr/lib/git-annex.linux/shimmed/git-annex/git-annex assistant --foreground
+reprostim@reproiner:/data/reprostim$ /usr/lib/git-annex.linux/git-annex version | head
+git-annex version: 10.20251114-1~ndall+1
+```
+
+
+[[!meta author=yoh]]
+[[!tag projects/repronim]]
improve -J docs, suggesting higher value
diff --git a/doc/git-annex-p2phttp.mdwn b/doc/git-annex-p2phttp.mdwn
index 1ee0e9a747..acef619098 100644
--- a/doc/git-annex-p2phttp.mdwn
+++ b/doc/git-annex-p2phttp.mdwn
@@ -61,9 +61,17 @@ convenient way to download the content of any key, by using the path
This or annex.jobs must be set to configure the number of worker threads,
per repository served, that serve connections to the webserver.
- This must be set to 2 or more.
+ This must be set to 2 or more, since the webserver needs one thread
+ for itself.
- A good choice is often one worker per CPU core: `--jobs=cpus`
+ Each additional job lets the webserver serve one more concurrent request.
+ When there are too many requests to serve all at once, the webserver will
+ delay responding to some requests until others have completed.
+
+ A conservative starting place would be one worker per CPU core: `--jobs=cpus`
+
+ However, to avoid delays, this can be set to much higher values.
+ Avoid setting it so high that the server runs out of file descriptors.
* `--proxyconnections=N`
comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_15_cf355648608d70edf5ac871299124546._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_15_cf355648608d70edf5ac871299124546._comment
new file mode 100644
index 0000000000..948d95f2a0
--- /dev/null
+++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_15_cf355648608d70edf5ac871299124546._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 15"""
+ date="2026-01-30T16:30:15Z"
+ content="""
+> Maybe I am overthinking this a bit though, and I should just bump the number up by one or more factors of 2 and see what happens.
+
+I think that would make sense, or even by 1 or 2 orders of magnitude.
+"""]]
response
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_14_0fb6fbd8bada65ee3a0152e6d96265c0._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_14_0fb6fbd8bada65ee3a0152e6d96265c0._comment
new file mode 100644
index 0000000000..f791b09c8f
--- /dev/null
+++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_14_0fb6fbd8bada65ee3a0152e6d96265c0._comment
@@ -0,0 +1,20 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 14"""
+ date="2026-01-30T16:20:39Z"
+ content="""
+Re the number of threads, -J will affect the number of green threads used.
+(Which will be some constant-ish multiple of the -J value.)
+Green threads won't show up in htop, only OS-native threads will.
+
+The maximum number of OS-native threads should be capped at the number of
+cores.
+
+Exactly how many OS-native threads spawn is under the control of the
+haskell runtime, and it probably spawns an additional OS-native thread
+per green thread up to the limit.
+
+(It would be possible to limit the maximum number of OS-native threads to
+less than the number of cores, if that would somehow be useful. It would
+need a new config setting.)
+"""]]
Added a comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_13_662dbd3d57681a13f5de2d067be6603f._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_13_662dbd3d57681a13f5de2d067be6603f._comment new file mode 100644 index 0000000000..5033fab7f7 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_13_662dbd3d57681a13f5de2d067be6603f._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 13" + date="2026-01-29T19:25:14Z" + content=""" +I might have some misunderstandings about what the -J flag does exactly... So far I assumed that it just sets the number of OS threads that are used as a worker pool to handle requests. In Forgejo-aneksajo it is set to -J2 because of that assumption and there being one p2phttp process per repository (if p2phttp has recently been used with the repository, it is started on demand and stopped after a while of non-usage), so larger values could multiply pretty fast. Your description sounds like it should actually just be a limit on the number of requests that can be handled concurrently, independent of the size of the worker pool. What I am observing when I increase it is that htop shows two new threads when I increment the value by one though. + +Could there be a fixed-size (small) worker pool, and a higher number of concurrent requests allowed? I agree that limiting the total resource usage makes a lot of sense, but does it have to be tied to the thread count? + +Maybe I am overthinking this a bit though, and I should just bump the number up by one or more factors of 2 and see what happens. +"""]]
response
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_11_7f0ecbac3c5186b03565ae7b6f36a46a._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_11_7f0ecbac3c5186b03565ae7b6f36a46a._comment new file mode 100644 index 0000000000..91be7b4dd2 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_11_7f0ecbac3c5186b03565ae7b6f36a46a._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 11""" + date="2026-01-29T17:33:30Z" + content=""" +The `kill -SIGINT` was my mistake; I ran the script using dash and it was +its builtin kill that does not accept that. + +So, your test case was supposed to interrupt it after all. Tested it again +with interruption and my fix does seem to have fixed it as best I can tell. +"""]] diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_12_fa5ddd60cb5ec292594547aa4f45c13b._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_12_fa5ddd60cb5ec292594547aa4f45c13b._comment new file mode 100644 index 0000000000..5b3de7cc43 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_12_fa5ddd60cb5ec292594547aa4f45c13b._comment @@ -0,0 +1,40 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 12""" + date="2026-01-29T17:36:30Z" + content=""" +Re serving more requests than workers, the point of limiting the number of +workers is that each worker can take a certain amount of resources. The +resource may only be a file descriptor and a bit of cpu and memory +usually; with proxying it could also include making outgoing connections, +running gpg, etc. The worker limit is about being able to control the +total amount of resources used. + +--- + +It would be possible to have an option where p2phttp does not limit the +number of workers at all, and the slowloris attack prevention could be left +enabled in that mode.
Of course then enough clients could overwhelm the +server, but maybe that's better for some use cases. + +IIRC forgejo-aneksajo runs one p2phttp per repository and proxies requests +to them. If so, you need a lower worker limit per p2phttp. I suppose it +would be possible to make the proxy enforce its own limits to the number of +concurrent p2phttp requests, and then it might make sense to not have +p2phttp limit the number of workers. + +--- + +p2phttp (or a proxy in front of it) could send a 503 response if it is +unable to get a worker. That would avoid this slowloris attack prevention +problem. It would leave it up to the git-annex client to retry. Which +depends on the `annex.retry` setting currently. It might make sense to have +some automatic retrying on 503 in the p2phttp client. + +One benefit of the way it works now is a `git-annex get -J10` will +automatically use as many workers as the p2phttp server has available, and +if 2 people are both running that, it naturally balances out fairly evenly +between them, and keeps the server as busy as it wants to be in an +efficient way. Client side retry would not work as nicely, there would need +to be retry delays, and it would have to time out at some point. +"""]]
Added a comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_10_8d3c5f29dd7caba70ea8594393aa7cfc._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_10_8d3c5f29dd7caba70ea8594393aa7cfc._comment new file mode 100644 index 0000000000..9b1c7edd66 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_10_8d3c5f29dd7caba70ea8594393aa7cfc._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 10" + date="2026-01-29T09:04:03Z" + content=""" +> kill -s SIGINT is not valid syntax (at least not with procps's kill), so kill fails to do anything and a bunch of git-annex processes stack up all trying to get the same files. Probably you meant kill -s INT + +That's weird, I checked and both the shell built-in kill that I was using as well as the kill from procps-ng (Ubuntu's build: procps/noble-updates,now 2:4.0.4-4ubuntu3.2 amd64) installed on my laptop accept `-s SIGINT`. + +Anyway, thank you for investigating! I agree being susceptible to DoS attacks is not great, but better than accidentally DoS'ing ourselves in normal usage... + +I wonder, would it be architecturally possible to serve multiple requests concurrently with less workers than requests? E.g. do some async/multitasking magic between requests? If that was the case then I suspect this issue wouldn't come up, because all requests would progress steadily instead of waiting for a potentially long time. +"""]]
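The `kill -s SIGINT` vs `kill -s INT` disagreement above comes down to which kill is used: POSIX specifies the signal name without the SIG prefix, bash's builtin accepts both spellings, while e.g. dash's builtin rejects the SIGINT form. A quick probe of the name/number mapping:

```shell
# POSIX kill takes signal names without the SIG prefix; -l maps between
# names and numbers (bash also tolerates the SIGINT spelling).
kill -l INT   # prints 2
kill -l 2     # prints INT
```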
p2phttp: Fix a server stall by disabling warp's slowloris attack prevention
Not great, but better than the alternative.
Hoping this is temporary and the warp bug will be fixed and I can deal
with the problem better then.
diff --git a/CHANGELOG b/CHANGELOG index 5537b57b20..084b670574 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -13,6 +13,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium * fromkey, registerurl: When passed an url, generate a VURL key. * unregisterurl: Unregister both VURL and URL keys. * unregisterurl: Fix display of action to not be "registerurl". + * p2phttp: Fix a server stall by disabling warp's slowloris attack + prevention. -- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400 diff --git a/Command/P2PHttp.hs b/Command/P2PHttp.hs index ae3fdbcd75..0475442891 100644 --- a/Command/P2PHttp.hs +++ b/Command/P2PHttp.hs @@ -184,12 +184,26 @@ startIO o <> serverShutdownCleanup oldst } +-- Disable Warp's slowloris attack prevention. Since the web server +-- only allows serving -J jobs at a time, and blocks when an additional +-- request is received, that can result in there being no network traffic +-- for a period of time, which triggers the slowloris attack prevention. +-- +-- The implementation of the P2P http server is not exception safe enough +-- to deal with Response handlers being killed at any point by warp. +-- +-- It would be better to use setTimeout, so that slowloris attacks in +-- making the request are prevented. But, it does not work! 
See +-- https://github.com/yesodweb/wai/issues/1058 +disableSlowlorisPrevention :: Warp.Settings -> Warp.Settings +disableSlowlorisPrevention = Warp.setTimeout maxBound + runServer :: Options -> P2PHttpServerState -> IO () runServer o mst = go `finally` serverShutdownCleanup mst where go = do let settings = Warp.setPort port $ Warp.setHost host $ - Warp.defaultSettings + disableSlowlorisPrevention $ Warp.defaultSettings mstv <- newTMVarIO mst let app = p2pHttpApp mstv case (certFileOption o, privateKeyFileOption o) of diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn index fa78722f50..3c1215392e 100644 --- a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn @@ -175,3 +175,5 @@ get test10.bin ^C Starting with a DataLad Dataset and by extension git-annex repository is the first thing I do whenever I have to deal with code and/or data that is not some throwaway stuff :) [[!tag projects/ICE4]] + +> [[done]] --[[Joey]] diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_8_490aae81d90a6692152d365b1bae95a1._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_8_490aae81d90a6692152d365b1bae95a1._comment new file mode 100644 index 0000000000..e173b6e18d --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_8_490aae81d90a6692152d365b1bae95a1._comment @@ -0,0 +1,31 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2026-01-28T21:10:00Z" + content=""" +Seems likely that getP2PConnection is run by serveGet, and the worker slot +is allocated. Then a ThreadKilled exception arrives before the rest of +serveGet's threads are started up. So the worker slot never gets freed. +It's even possible that getP2PConnection is itself not cancellation safe. + +So, I made all of serveGet be inside an uninterruptibleMask. 
That did seem to +make the test case get past more slowloris cancellations than before. But, +it still eventually hung. + +Given the inversion of control that servant and streaming response body +entails, it seems likely that a ThreadKilled exception could arrive at a +point entirely outside the control of git-annex, leaving the P2P connection +open with no way to close it. + +I really dislike that this slowloris attack prevention is making me need +to worry about the server threads getting cancelled at any point. That +requires significantly more robust code, if it's even possible. + +So, I think disabling the slowloris attack prevention may be the way to go, +at least until warp is fixed to allow only disabling it after the Request +is received. + +Doing so will make p2phttp more vulnerable to DDOS, but as it stands, it's +vulnerable to locking up due to entirely legitimate users just running +a few `git-annex get`s. Which is much worse! +"""]] diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_9_6da51da16a637cd889662aa2dea0953d._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_9_6da51da16a637cd889662aa2dea0953d._comment new file mode 100644 index 0000000000..6f9fa95533 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_9_6da51da16a637cd889662aa2dea0953d._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2026-01-28T21:38:46Z" + content=""" +Disabled the slowloris protection. :-/ + +I also checked with the original test case, fixed to call `kill -s INT`, +and it also passed. I'm assuming this was never a bug about interruption.. +"""]]
oops
diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs index ebcf3140c4..012b36350d 100644 --- a/P2P/Http/Server.hs +++ b/P2P/Http/Server.hs @@ -202,8 +202,8 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do waitfinal endv finalv conn annexworker = do -- Wait for everything to be transferred before -- stopping the annexworker. The finalv will usually -- -- be written to at the end. If the client disconnects -- -- early that does not happen, so catch STM exceptions. + -- be written to at the end. If the client disconnects + -- early that does not happen, so catch STM exceptions. alltransferred <- either (const False) id <$> liftIO (tryNonAsync $ atomically $ takeTMVar finalv) diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_7_cb083c0fbba4822b0894393b4d8aec05._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_7_cb083c0fbba4822b0894393b4d8aec05._comment new file mode 100644 index 0000000000..44f3f70134 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_7_cb083c0fbba4822b0894393b4d8aec05._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2026-01-28T20:43:38Z" + content=""" +[[!commit 786360cdcf7f784847715ec79ef9837ada9fa649]] catches an exception +that the slowloris attack prevention causes. It does prevent the server +locking up... but only sometimes. So the test case +gets further, but eventually still locks up. + +Since slowloris attack prevention can cancel the thread at any point, it +seems likely that there is some other point where a resource is left +un-freed. +"""]]
p2phttp: close P2P connection when streamer is canceled
Slowloris attack prevention in warp can cancel the streamer.
In that case, waitfinal never gets called. So make an exception handler
that sets finalv. This lets the P2P connection get shut down properly,
releasing the annex worker back to the pool.
Unfortunately, this does not solve the whole problem. It does prevent a
p2phttp with -J from locking up after the second time slowloris
protection triggers. But, later it does still lock up.
There must be some other resource that is leaking when slowloris attack
prevention triggers.
Note that the STM exception catching when reading finalv may not be
needed any longer? I'm not sure and it seemed like perhaps it took
longer to hang when I left it in.
diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs index b7f773301a..ebcf3140c4 100644 --- a/P2P/Http/Server.hs +++ b/P2P/Http/Server.hs @@ -2,7 +2,7 @@ - - https://git-annex.branchable.com/design/p2p_protocol_over_http/ - - - Copyright 2024 Joey Hess <id@joeyh.name> + - Copyright 2024-2026 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -145,8 +145,9 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do (Len len, bs) <- liftIO $ atomically $ takeTMVar bsv bv <- liftIO $ newMVar (filter (not . B.null) (L.toChunks bs)) szv <- liftIO $ newMVar 0 - let streamer = S.SourceT $ \s -> s =<< return - (stream (bv, szv, len, endv, validityv, finalv)) + let streamer = S.SourceT $ do + \s -> s (stream (bv, szv, len, endv, validityv, finalv)) + `onException` streamexception finalv return $ addHeader (DataLength len) streamer where stream (bv, szv, len, endv, validityv, finalv) = @@ -189,7 +190,7 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do atomically $ putTMVar endv () validity <- atomically $ takeTMVar validityv sz <- takeMVar szv - atomically $ putTMVar finalv () + atomically $ putTMVar finalv True void $ atomically $ tryPutTMVar endv () return $ case validity of Nothing -> True @@ -197,14 +198,15 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do Just Invalid -> sz /= len , pure True ) - + waitfinal endv finalv conn annexworker = do -- Wait for everything to be transferred before -- stopping the annexworker. The finalv will usually - -- be written to at the end. If the client disconnects - -- early that does not happen, so catch STM exception. - alltransferred <- isRight - <$> tryNonAsync (liftIO $ atomically $ takeTMVar finalv) +- -- be written to at the end. If the client disconnects +- -- early that does not happen, so catch STM exceptions. 
+ alltransferred <- + either (const False) id + <$> liftIO (tryNonAsync $ atomically $ takeTMVar finalv) -- Make sure the annexworker is not left blocked on endv -- if the client disconnected early. void $ liftIO $ atomically $ tryPutTMVar endv () @@ -213,6 +215,11 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do else closeP2PConnection conn void $ tryNonAsync $ wait annexworker + -- Slowloris attack prevention can cancel the streamer. Be sure to + -- close the P2P connection when that happens. + streamexception finalv = + liftIO $ atomically $ putTMVar finalv False + sizer = pure $ Len $ case startat of Just (Offset o) -> fromIntegral o Nothing -> 0 diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_6_bdb77afb1c349e52d94b54427bce18c4._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_6_bdb77afb1c349e52d94b54427bce18c4._comment new file mode 100644 index 0000000000..07e7bbbd5e --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_6_bdb77afb1c349e52d94b54427bce18c4._comment @@ -0,0 +1,47 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2026-01-28T19:22:19Z" + content=""" +Developed the below patch to use pauseTimeout after the Request is +consumed. + +Unfortunately, I then discovered that [pauseTimeout does not work](https://github.com/yesodweb/wai/issues/1058)! + +This leaves only the options of waiting for a fixed version of warp, +or disabling slowloris prevention entirely, or somehow dealing with +the way that the Response handler gets killed by the timeout.
+ + diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs + index b7f773301a..8c8ae96c06 100644 + --- a/P2P/Http/Server.hs + +++ b/P2P/Http/Server.hs + @@ -40,14 +40,27 @@ import qualified Servant.Types.SourceT as S + import qualified Data.ByteString as B + import qualified Data.ByteString.Lazy as L + import qualified Data.ByteString.Lazy.Internal as LI + +import qualified Network.Wai.Handler.Warp as Warp + import Control.Concurrent.Async + import Control.Concurrent.STM + import Control.Concurrent + import System.IO.Unsafe + import Data.Either + + +-- WAI middleware that disables warp's usual Slowloris protection after the + +-- Request is received. This is needed for the p2phttp server because + +-- after a client connects and makes its Request, and when the Request + +-- includes valid authentication, the server waits for a worker to become + +-- available to handle it. During that time, no traffic is being sent, + +-- which would usually trigger the Slowloris protection. + +avoidResponseTimeout :: Application -> Application + +avoidResponseTimeout app req resp = do + + liftIO $ Warp.pauseTimeout req + + app req resp + + + p2pHttpApp :: TMVar P2PHttpServerState -> Application + -p2pHttpApp = serve p2pHttpAPI . serveP2pHttp + +p2pHttpApp st = avoidResponseTimeout $ serve p2pHttpAPI $ serveP2pHttp st + + serveP2pHttp :: TMVar P2PHttpServerState -> Server P2PHttpAPI + serveP2pHttp st +"""]]
comments
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_4_57b418889de2942d1f2d998cca68218a._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_4_57b418889de2942d1f2d998cca68218a._comment new file mode 100644 index 0000000000..08e5365f7e --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_4_57b418889de2942d1f2d998cca68218a._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-28T17:53:26Z" + content=""" +Warp's Slowloris attack prevention seems to be causing this problem. +I was able to get the test case to not hang by applying +`Warp.setTimeout 1000000000` to the warp settings. + +I guess that, when Warp detects what it thinks is a slowloris attack, +it kills the handling thread in some unusual way. Which prevents the usual +STM exception from being thrown? + +This also explains the InvalidChunkHeaders exception, because the http +server has hung up on the client before sending the expected headers. + +`git-annex get` is triggering the slowloris attack detection because +it connects to the p2phttp server, sends a request, and then is stuck +waiting some long period of time for a worker slot to become available. + +Warp detects a slowloris attack by examining how much network traffic is +flowing. And in this case, no traffic is flowing. + +So the reason this test case triggers the problem is because it's using 1 +GB files! With smaller files, the transfers happen too fast to trigger the +default 30 second timeout. 
+"""]] diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_5_033e449b3fa974fe67c7ce9603baec92._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_5_033e449b3fa974fe67c7ce9603baec92._comment new file mode 100644 index 0000000000..dd7c7940a9 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_5_033e449b3fa974fe67c7ce9603baec92._comment @@ -0,0 +1,29 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-28T18:26:41Z" + content=""" +So, can the slowloris attack prevention just be disabled in p2phttp, +without exposing it to problems due to that attack? + +Well, the slowloris attack is a DDOS that tries to open as many http +connections to the server as possible, and keep them open with as little +bandwidth used as possible. It does so by sending partial request headers +slowly, so the server is stuck waiting to see the full request. + +Given that the p2phttp server is serving large objects, and probably runs +with a moderately low -J value (probably < 100), just opening that many +connections to the server each requesting an object, and consuming a chunk +of the response once per 30 seconds would be enough to work around Warp's +protections against the slowloris attack. Which needs little enough +bandwidth to be a viable attack. + +The client would need authentication to do that though. A slowloris attack +though just sends requests, it does not need to successfully authenticate. + +So it would be better to disable the slowloris attack prevention only after +the request has been authenticated. + +warp provides `pauseTimeout` that can do that, but I'm not sure how +to use it from inside a servant application. +"""]]
analysis
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_2_3caf6da7a34bd6580e365fe01d990f42._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_2_3caf6da7a34bd6580e365fe01d990f42._comment
new file mode 100644
index 0000000000..598c0c7df7
--- /dev/null
+++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_2_3caf6da7a34bd6580e365fe01d990f42._comment
@@ -0,0 +1,55 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-01-28T16:15:03Z"
+ content="""
+`kill -s SIGINT` is not valid syntax (at least not with procps's `kill`), so
+`kill` fails to do anything and a bunch of git-annex processes stack up all
+trying to get the same files. Probably you meant `kill -s INT`
+
+With that said, the busted test case does work in exposing a problem, since
+the git-annex get processes hang.
+
+This behaves the same:
+
+ for x in $(seq 1 5); do git annex get & done
+
+That's without anything interrupting `git-annex get` at any point.
+
+This error is displayed by some of the git-annex get processes,
+and once this has happened as many times as the number of jobs,
+the server is hung:
+
+ HttpExceptionRequest Request {
+ host = "localhost"
+ port = 3001
+ secure = False
+ requestHeaders = [("Accept","application/octet-stream")]
+ path = "/git-annex/8dd1a380-3785-4285-b93d-994e1ccb9fbf/v4/key/SHA256E-s1073741824--52fc7ce3067ad69f3989f7fef817670096f00eab7721884fe606d17b9215d6f5.bin"
+ queryString = "?clientuuid=2ab2859b-d423-4427-bac2-553e18c02197&associatedfile=test1.bin"
+ method = "GET"
+ proxy = Nothing
+ rawBody = False
+ redirectCount = 10
+ responseTimeout = ResponseTimeoutDefault
+ requestVersion = HTTP/1.1
+ proxySecureMode = ProxySecureWithConnect
+ }
+ InvalidChunkHeaders
+
+So this seems very similar to the bug that
+[[!commit f2fed42a090e081bf880dcacc9a25bfa8a0f7d8f]] was supposed to fix.
+Same InvalidChunkHeaders exception indicating the http server response
+thread probably crashed.
+
+And I've verified that when this happens, serveGet's waitfinal starts
+and never finishes, which is why the job slot remains in use.
+
+BTW, InvalidChunkHeaders is a http-client exception, so it seems this
+might involve a problem at the http layer, so with http-client or warp?
+Looking in http-client, it is thrown in 3 situations. 2 are when a read
+from the server yields 0 bytes. The 3rd is when a line is read from the server,
+and the result cannot be parsed as hexadecimal.
+So it seems likely that the http server is crashing in the middle of servicing
+a request. Possibly due to a bug in the http stack.
+"""]]
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_3_1891008a9e901b0801376a870eee8be4._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_3_1891008a9e901b0801376a870eee8be4._comment
new file mode 100644
index 0000000000..d8c8aa4ca3
--- /dev/null
+++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_3_1891008a9e901b0801376a870eee8be4._comment
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2026-01-28T17:01:37Z"
+ content="""
+The hang happens here:
+
+ -- Wait for everything to be transferred before
+ -- stopping the annexworker. The finalv will usually
+ -- be written to at the end. If the client disconnects
+ -- early that does not happen, so catch STM exception.
+ alltransferred <- isRight
+ <$> tryNonAsync (liftIO $ atomically $ takeTMVar finalv)
+
+I think what is happening is finalv is never getting filled, but for whatever
+reason, STM also is not detecting a deadlock, so this does not fail with an
+exception and waits forever.
+"""]]
add hint about how to get specific files in a hide-missing branch
diff --git a/doc/git-annex-adjust.mdwn b/doc/git-annex-adjust.mdwn index 55f7f646e7..bd2964779d 100644 --- a/doc/git-annex-adjust.mdwn +++ b/doc/git-annex-adjust.mdwn @@ -92,11 +92,15 @@ and will also propagate commits back to the original branch. set the `annex.adjustedbranchrefresh` config. Despite missing files being hidden, `git annex sync --content` will - still operate on them, and can be used to download missing + still operate on them, and can be used to retrieve missing files from remotes. It also updates the adjusted branch after transferring content. - This option can be combined with --unlock, --lock, or --fix. + To retrieve specific missing files, use eg: + `git-annex get --branch=master --include=foo` + + The `--hide-missing` option can be combined with + `--unlock`, `--lock`, or `--fix`. * `--unlock-present`
comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_1_cc08641b5ed4562ca29432907cecf5bf._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_1_cc08641b5ed4562ca29432907cecf5bf._comment new file mode 100644 index 0000000000..701aaff753 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_1_cc08641b5ed4562ca29432907cecf5bf._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-27T17:40:15Z" + content=""" +This is extremely similar to the bug that +[[!commit f2fed42a090e081bf880dcacc9a25bfa8a0f7d8f]] was supposed to fix. +But that had a small amount of gets hang, without any interruptions being +needed to cause it. So I think is different. + +There was also the similar +[[!commit 1c67f2310a7ca3e4fce183794f0cff2f4f5d1efb]] where an interrupted +drop caused later hangs. +"""]]
fromkey, registerurl: When passed an url, generate a VURL key
diff --git a/CHANGELOG b/CHANGELOG
index 8cea03728a..5537b57b20 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -10,6 +10,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
Secret support.
* fsck: Support repairing a corrupted file in a versioned S3 remote.
* addurl, importfeed: Enable --verifiable by default.
+ * fromkey, registerurl: When passed an url, generate a VURL key.
* unregisterurl: Unregister both VURL and URL keys.
* unregisterurl: Fix display of action to not be "registerurl".
diff --git a/Command/FromKey.hs b/Command/FromKey.hs
index 6649b4110e..5acfd531ed 100644
--- a/Command/FromKey.hs
+++ b/Command/FromKey.hs
@@ -93,7 +93,7 @@ keyOpt = either giveup id . keyOpt'
keyOpt' :: String -> Either String Key
keyOpt' s = case parseURIPortable s of
Just u | not (isKeyPrefix (uriScheme u)) ->
- Right $ Backend.URL.fromUrl s Nothing False
+ Right $ Backend.URL.fromUrl s Nothing True
_ -> case deserializeKey s of
Just k -> Right k
Nothing -> Left $ "bad key/url " ++ s
diff --git a/doc/git-annex-registerurl.mdwn b/doc/git-annex-registerurl.mdwn
index bf5133b8db..bddfa52f80 100644
--- a/doc/git-annex-registerurl.mdwn
+++ b/doc/git-annex-registerurl.mdwn
@@ -15,7 +15,7 @@ No verification is performed of the url's contents.
Normally the key is a git-annex formatted key. However, to make it easier
to use this to add urls, if the key cannot be parsed as a key, and is a
-valid url, an URL key is constructed from the url.
+valid url, a VURL key is constructed from the url.
Registering an url also makes git-annex treat the key as present in the
special remote that claims it. (Usually the web special remote.)
diff --git a/doc/todo/migration_to_VURL_by_default.mdwn b/doc/todo/migration_to_VURL_by_default.mdwn
index 4d658b9c89..6450f5ad7b 100644
--- a/doc/todo/migration_to_VURL_by_default.mdwn
+++ b/doc/todo/migration_to_VURL_by_default.mdwn
@@ -41,6 +41,10 @@ configuration of which kind of keys addurl uses, once VURL is the default.
> > VURL keys. (Registering one without an equivalent key would make no hash
> > verification be done, so no better than an URL key.)
> >
+> > > Wait, if it generates a VURL key with no size, wouldn't it be the
+> > > same as `git-annex addurl --verifiable --relaxed`? Which is fine;
+> > > it's currently the same as `git-annex addurl --relaxed`.
+> >
> > But, I don't think that registerurl/unregisterurl continuing to
> > generate URL keys is a big problem, it should not block making VURL
> > the default in places where it can be default. --[[Joey]]
@@ -49,16 +53,10 @@ configuration of which kind of keys addurl uses, once VURL is the default.
Made --verifiable be the default for addurl and importfeed.
-I want to think more about registerurl and unregisterurl (and fromkey's)
-generation of URL keys though.
-
-unregisterurl could generate from an url both an URL and a VURL and
-unregister both, or whichever is registered. That seems to make sense,
-because which ever might have been registered before, unregisterurl is used
-when the content can no longer be downloaded from the web (or other special
-remote that claims an url).
+Made unregisterurl generate both an URL and a VURL key from an url, and
+unregister both, or whichever is registered.
-> Implemented this..
+Made registerurl (and fromkey) generate a VURL key that behaves the same
+as addurl --relaxed.
-Could registerurl (and fromkey) generate a VURL key that behaves the same
-as addurl --relaxed? --[[Joey]]
+So all [[done]]! --[[Joey]]
fix docs to match unregisterurl behavior
As implemented, when it's passed an URL key, it also unregisters the
VURL key, and vice-versa.
I think this behavior is ok, since the idea is that the url is no longer
available. And so unregistering both URL and VURL key is ok, since
neither should work to get from it any longer.
diff --git a/CHANGELOG b/CHANGELOG
index 365aac74eb..8cea03728a 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -10,8 +10,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
Secret support.
* fsck: Support repairing a corrupted file in a versioned S3 remote.
* addurl, importfeed: Enable --verifiable by default.
- * unregisterurl: Unregister both VURL and URL keys when passed an url
- instead of a key.
+ * unregisterurl: Unregister both VURL and URL keys.
* unregisterurl: Fix display of action to not be "registerurl".
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/doc/git-annex-unregisterurl.mdwn b/doc/git-annex-unregisterurl.mdwn
index 17964145fa..1a93bac082 100644
--- a/doc/git-annex-unregisterurl.mdwn
+++ b/doc/git-annex-unregisterurl.mdwn
@@ -12,8 +12,7 @@ This plumbing-level command can be used to unregister urls when keys can
no longer be downloaded from them.
Normally the key is a git-annex formatted key. However, when the key cannot
-be parsed as a key, and is a valid url, an URL key and a VURL key are both
-constructed from the url, and both keys are unregistered.
+be parsed as a key, and is a valid url, a key is generated from the url.
Unregistering a key's last web url will make git-annex no longer treat content
as being present in the web special remote. If some other special remote
unregisterurl: Unregister both VURL and URL keys when passed an url instead of a key
The idea with doing both is that unregisterurl is used with an url when
the content of the url is no longer present. So unregistering both makes
sense. And, as git-annex transitions from using URL to VURL by default,
there can be both in repos, and so unregistering both avoids breaking
workflows that used to register URL keys, but are now registering VURL
keys.
diff --git a/Backend/URL.hs b/Backend/URL.hs
index d68b2196e3..a23409fd69 100644
--- a/Backend/URL.hs
+++ b/Backend/URL.hs
@@ -1,13 +1,14 @@
{- git-annex URL backend -- keys whose content is available from urls.
-
- - Copyright 2011-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
module Backend.URL (
backends,
- fromUrl
+ fromUrl,
+ otherUrlKey,
) where
import Annex.Common
@@ -41,3 +42,12 @@ fromUrl url size verifiable = mkKey $ \k -> k
, keyVariety = if verifiable then VURLKey else URLKey
, keySize = size
}
+
+{- From an URL key to a VURL key and vice-versa. -}
+otherUrlKey :: Key -> Maybe Key
+otherUrlKey k
+ | fromKey keyVariety k == URLKey = Just $
+ alterKey k $ \kd -> kd { keyVariety = VURLKey }
+ | fromKey keyVariety k == VURLKey = Just $
+ alterKey k $ \kd -> kd { keyVariety = URLKey }
+ | otherwise = Nothing
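The new `otherUrlKey` is a straight swap of the `keyVariety` field between the two url key varieties. A self-contained sketch of the same mapping, using plain strings in place of git-annex's `Key`/`KeyVariety` types (an assumption for illustration only):

```haskell
-- Standalone model of Backend.URL.otherUrlKey: swap URL <-> VURL,
-- and return Nothing for any other key variety. Strings stand in
-- for git-annex's Key and KeyVariety types.
otherVariety :: String -> Maybe String
otherVariety "URL"  = Just "VURL"
otherVariety "VURL" = Just "URL"
otherVariety _      = Nothing

main :: IO ()
main = do
    print (otherVariety "URL")    -- Just "VURL"
    print (otherVariety "SHA256") -- Nothing
```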
diff --git a/CHANGELOG b/CHANGELOG
index 453c8951d4..eedacf7fdc 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -10,6 +10,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
Secret support.
* fsck: Support repairing a corrupted file in a versioned S3 remote.
* addurl, importfeed: Enable --verifiable by default.
+ * unregisterurl: Unregister both VURL and URL keys when passed an url
+ instead of a key.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Command/UnregisterUrl.hs b/Command/UnregisterUrl.hs
index e8bf16c933..adc4e76dd5 100644
--- a/Command/UnregisterUrl.hs
+++ b/Command/UnregisterUrl.hs
@@ -1,6 +1,6 @@
{- git-annex command
-
- - Copyright 2015-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2015-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -12,6 +12,7 @@ module Command.UnregisterUrl where
import Command
import Logs.Web
import Command.RegisterUrl (seekBatch, start, optParser, RegisterUrlOptions(..))
+import Backend.URL
cmd :: Command
cmd = withAnnexOptions [jsonOptions] $ command "unregisterurl"
@@ -26,13 +27,19 @@ seek o = case (batchOption o, keyUrlPairs o) of
unregisterUrl :: Remote -> Key -> String -> Annex ()
unregisterUrl _remote key url = do
+ unregisterUrl' url key
+ maybe noop (unregisterUrl' url) (otherUrlKey key)
+
+unregisterUrl' :: String -> Key -> Annex ()
+unregisterUrl' url key = do
-- Remove the url no matter what downloader;
-- registerurl can set OtherDownloader, and this should also
-- be able to remove urls added by addurl, which may use
-- YoutubeDownloader.
forM_ [minBound..maxBound] $ \dl ->
setUrlMissing key (setDownloader url dl)
- -- Unlike unregisterurl, this does not update location tracking
- -- for remotes other than the web special remote. Doing so with
- -- a remote that git-annex can drop content from would rather
- -- unexpectedly leave content stranded on that remote.
+ -- Unlike registerurl, this does not update location
+ -- tracking for remotes other than the web special remote.
+ -- Doing so with a remote that git-annex can drop content
+ -- from would rather unexpectedly leave content stranded
+ -- on that remote.
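The `forM_ [minBound..maxBound]` loop in `unregisterUrl'` above works because the downloader type derives `Enum` and `Bounded`, so every variant is visited. A minimal sketch of that pattern (the constructor names here are illustrative, not git-annex's actual `Downloader` definition):

```haskell
import Control.Monad (forM_)

-- An enumerable type standing in for git-annex's Downloader.
data Downloader = WebDownloader | YoutubeDownloader | OtherDownloader
    deriving (Show, Eq, Enum, Bounded)

-- Visit every variant, mirroring
--   forM_ [minBound..maxBound] $ \dl -> setUrlMissing key (setDownloader url dl)
main :: IO ()
main = forM_ ([minBound .. maxBound] :: [Downloader]) print
```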
diff --git a/doc/git-annex-unregisterurl.mdwn b/doc/git-annex-unregisterurl.mdwn
index e8192f4084..17964145fa 100644
--- a/doc/git-annex-unregisterurl.mdwn
+++ b/doc/git-annex-unregisterurl.mdwn
@@ -11,8 +11,9 @@ git annex unregisterurl `[key url]`
This plumbing-level command can be used to unregister urls when keys can
no longer be downloaded from them.
-Normally the key is a git-annex formatted key. However, if the key cannot be
-parsed as a key, and is a valid url, an URL key is constructed from the url.
+Normally the key is a git-annex formatted key. However, when the key cannot
+be parsed as a key, and is a valid url, an URL key and a VURL key are both
+constructed from the url, and both keys are unregistered.
Unregistering a key's last web url will make git-annex no longer treat content
as being present in the web special remote. If some other special remote
diff --git a/doc/todo/migration_to_VURL_by_default.mdwn b/doc/todo/migration_to_VURL_by_default.mdwn
index 8f47a71f8a..4d658b9c89 100644
--- a/doc/todo/migration_to_VURL_by_default.mdwn
+++ b/doc/todo/migration_to_VURL_by_default.mdwn
@@ -58,5 +58,7 @@ because which ever might have been registered before, unregisterurl is used
when the content can no longer be downloaded from the web (or other special
remote that claims an url).
+> Implemented this..
+
Could registerurl (and fromkey) generate a VURL key that behaves the same
as addurl --relaxed? --[[Joey]]
addurl, importfeed: Enable --verifiable by default
diff --git a/CHANGELOG b/CHANGELOG
index 724ba92b57..453c8951d4 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -9,6 +9,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
using old http servers that use TLS 1.2 without Extended Main
Secret support.
* fsck: Support repairing a corrupted file in a versioned S3 remote.
+ * addurl, importfeed: Enable --verifiable by default.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Command/AddUrl.hs b/Command/AddUrl.hs
index cbfc71c577..5c725df7a5 100644
--- a/Command/AddUrl.hs
+++ b/Command/AddUrl.hs
@@ -1,6 +1,6 @@
{- git-annex command
-
- - Copyright 2011-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -61,7 +61,7 @@ data AddUrlOptions = AddUrlOptions
data DownloadOptions = DownloadOptions
{ relaxedOption :: Bool
- , verifiableOption :: Bool
+ , oldVerifiableOption :: Bool -- no longer configurable
, rawOption :: Bool
, noRawOption :: Bool
, rawExceptOption :: Maybe (DeferredParse Remote)
@@ -101,7 +101,7 @@ parseDownloadOptions withfileoptions = DownloadOptions
<*> switch
( long "verifiable"
<> short 'V'
- <> help "improve later verification of --fast or --relaxed content"
+ <> help "no longer needed, verifiable urls are used by default"
)
<*> switch
( long "raw"
@@ -221,7 +221,7 @@ performRemote addunlockedmatcher r o uri file sz = lookupKey file >>= \case
downloadRemoteFile :: AddUnlockedMatcher -> Remote -> DownloadOptions -> URLString -> OsPath -> Maybe Integer -> Annex (Maybe Key)
downloadRemoteFile addunlockedmatcher r o uri file sz = checkCanAdd o file $ \canadd -> do
- let urlkey = Backend.URL.fromUrl uri sz (verifiableOption o)
+ let urlkey = Backend.URL.fromUrl uri sz True
createWorkTreeDirectory (parentDir file)
ifM (Annex.getRead Annex.fast <||> pure (relaxedOption o))
( do
@@ -351,7 +351,7 @@ downloadWeb :: AddUnlockedMatcher -> DownloadOptions -> URLString -> Url.UrlInfo
downloadWeb addunlockedmatcher o url urlinfo file =
go =<< downloadWith' downloader urlkey webUUID url file
where
- urlkey = addSizeUrlKey urlinfo $ Backend.URL.fromUrl url Nothing (verifiableOption o)
+ urlkey = addSizeUrlKey urlinfo $ Backend.URL.fromUrl url Nothing True
downloader f p = Url.withUrlOptions Nothing $
downloadUrl False urlkey p Nothing [url] f
go Nothing = return Nothing
@@ -395,7 +395,7 @@ downloadWeb addunlockedmatcher o url urlinfo file =
warning (UnquotedString youtubeDlCommand <> " did not download anything")
return Nothing
mediaurl = setDownloader url YoutubeDownloader
- mediakey = Backend.URL.fromUrl mediaurl Nothing (verifiableOption o)
+ mediakey = Backend.URL.fromUrl mediaurl Nothing True
-- Does the already annexed file have the mediaurl
-- as an url? If so nothing to do.
alreadyannexed dest k = do
@@ -443,7 +443,7 @@ startingAddUrl si url o p = starting "addurl" ai si $ do
-- used to prevent two threads running concurrently when that would
-- likely fail.
ai = OnlyActionOn urlkey (ActionItemOther (Just (UnquotedString url)))
- urlkey = Backend.URL.fromUrl url Nothing (verifiableOption (downloadOptions o))
+ urlkey = Backend.URL.fromUrl url Nothing True
showDestinationFile :: OsPath -> Annex ()
showDestinationFile file = do
@@ -546,12 +546,12 @@ nodownloadWeb addunlockedmatcher o url urlinfo file
return Nothing
where
nomedia = do
- let key = Backend.URL.fromUrl url (Url.urlSize urlinfo) (verifiableOption o)
+ let key = Backend.URL.fromUrl url (Url.urlSize urlinfo) True
nodownloadWeb' o addunlockedmatcher url key file
usemedia mediafile = do
let dest = youtubeDlDestFile o file mediafile
let mediaurl = setDownloader url YoutubeDownloader
- let mediakey = Backend.URL.fromUrl mediaurl Nothing (verifiableOption o)
+ let mediakey = Backend.URL.fromUrl mediaurl Nothing True
nodownloadWeb' o addunlockedmatcher mediaurl mediakey dest
youtubeDlDestFile :: DownloadOptions -> OsPath -> OsPath -> OsPath
diff --git a/Command/ImportFeed.hs b/Command/ImportFeed.hs
index e502915c41..e3411c16e8 100644
--- a/Command/ImportFeed.hs
+++ b/Command/ImportFeed.hs
@@ -275,7 +275,7 @@ startDownload addunlockedmatcher opts cache cv todownload = case location todown
Enclosure url -> startdownloadenclosure url
MediaLink linkurl -> do
let mediaurl = setDownloader linkurl YoutubeDownloader
- let mediakey = Backend.URL.fromUrl mediaurl Nothing (verifiableOption (downloadOptions opts))
+ let mediakey = Backend.URL.fromUrl mediaurl Nothing True
-- Old versions of git-annex that used quvi might have
-- used the quviurl for this, so check if it's known
-- to avoid adding it a second time.
diff --git a/doc/backends.mdwn b/doc/backends.mdwn
index c08f3d52e6..2d1328590b 100644
--- a/doc/backends.mdwn
+++ b/doc/backends.mdwn
@@ -57,7 +57,7 @@ in `.gitattributes`:
* `VURL` -- This is like an `URL` (see below) but the content can
be verified with a cryptographically secure checksum that is
recorded in the git-annex branch. It's generated when using
- eg `git-annex addurl --fast --verifiable`.
+ eg `git-annex addurl --fast/--relaxed`.
## non-cryptographically secure backends
@@ -70,15 +70,16 @@ content of an annexed file remains unchanged.
the same filename, size, and modification time has the same content.
This is the least expensive backend, recommended for really large
files or slow systems.
-* `URL` -- This is a key that is generated from the url to a file.
- It's generated when using eg, `git annex addurl --fast`, when the file
- content is not available for hashing.
+* `URL` -- This is a key that is generated from the url to a file.
The key may not contain the full URL; for long URLs, part of the URL may be
represented by a checksum.
The URL key may contain `&` characters; be sure to quote the key if
passing it to a shell script. These types of keys are distinct from URLs/URIs
that may be attached to a key (using any backend) indicating the key's location
- on the web or in one of [[special_remotes]].
+ on the web or in one of [[special_remotes]].
+ Older versions of git-annex generated this when using
+ `git annex addurl --fast/--relaxed`, and `git-annex registerurl` still
+ generates this.
## external backends
diff --git a/doc/git-annex-addurl.mdwn b/doc/git-annex-addurl.mdwn
index c2247e1de9..4bfc3a9dc0 100644
--- a/doc/git-annex-addurl.mdwn
+++ b/doc/git-annex-addurl.mdwn
@@ -45,22 +45,27 @@ be used to get better filenames.
* `--verifiable` `-V`
- This can be used with the `--fast` or `--relaxed` option. It improves
- the safety of the resulting annexed file, by letting its content be
- verified with a checksum when it is transferred between git-annex
- repositories, as well as by things like `git-annex fsck`.
-
- When used with --relaxed, content from the web special remote will
- always be accepted, even if it has changed, and the checksum recorded
- for later verification.
-
- When used with --fast, the checksum is recorded the first time the
- content is downloaded from the web special remote. Once a checksum has
- been recorded, subsequent downloads from the web special remote
- must have the same checksum.
-
- When addurl was used without this option before, the file it added
- can be converted to be verifiable by migrating it to the VURL backend.
+ This option is now enabled by default when using `--fast` or `--relaxed`,
+ but was not the default in older versions of git-annex.
+
+ When a file is added without first downloading its content from the web,
+ the checksum of the file is not yet known.
+
+ To allow later learning and verifying the checksum, the VURL backend is
+ used. With `--fast`, the checksum is learned on initial download of the
+ file from the web, and all subsequent downloads from the web must have
+ the same checksum. With `--relaxed`, additional checksums are added each
+ time different content is downloaded from the web.
+
+ This improves the safety of the resulting annexed file, by letting
+ its content be verified with a checksum when it is transferred between
+ git-annex repositories, as well as by things like `git-annex fsck`.
+
+ Files that were added with old versions of addurl (or with
+ `git-annex registerurl` or `git-annex fromkey`) are not verifiable.
+ They can be converted to verifiable by migrating them from the URL
+ backend to the VURL backend.
+
For example: `git-annex migrate foo --backend=VURL`
* `--raw`
diff --git a/doc/todo/migration_to_VURL_by_default.mdwn b/doc/todo/migration_to_VURL_by_default.mdwn
index 69db574059..8f47a71f8a 100644
--- a/doc/todo/migration_to_VURL_by_default.mdwn
+++ b/doc/todo/migration_to_VURL_by_default.mdwn
@@ -44,3 +44,19 @@ configuration of which kind of keys addurl uses, once VURL is the default.
> > But, I don't think that registerurl/unregisterurl continuing to
> > generate URL keys is a big problem, it should not block making VURL
> > the default in places where it can be default. --[[Joey]]
+
+----
+
+Made --verifiable be the default for addurl and importfeed.
(Diff truncated)
remove duplicate documentation of --fast, --relaxed, etc
diff --git a/doc/git-annex-importfeed.mdwn b/doc/git-annex-importfeed.mdwn index 6a763e0380..134afbd80a 100644 --- a/doc/git-annex-importfeed.mdwn +++ b/doc/git-annex-importfeed.mdwn @@ -41,30 +41,6 @@ resulting in the new url being downloaded to such a filename. These options behave the same as when using [[git-annex-addurl]](1). -* `--fast` - - Avoid immediately downloading urls. The url is still checked - (via HEAD) to verify that it exists, and to get its size if possible. - -* `--relaxed` - - Don't immediately download urls, and avoid storing the size of the - url's content. This makes git-annex accept whatever content is there - at a future point. - -* `--raw` - - Prevent special handling of urls by yt-dlp, bittorrent, and other - special remotes. This will for example, make importfeed - download a .torrent file and not the contents it points to. - -* `--no-raw` - - Require content pointed to by the url to be downloaded using yt-dlp - or a special remote, rather than the raw content of the url. if that - cannot be done, the import will fail, and the next import of the feed - will retry. - * `--scrape` Rather than downloading the url and parsing it as a rss/atom feed
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote.mdwn b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote.mdwn new file mode 100644 index 0000000000..ae34d49200 --- /dev/null +++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote.mdwn @@ -0,0 +1,47 @@ +### Please describe the problem. + +`git annex export --fast main --to <remote>` deletes existing files on an rsync ssh remote. My mental model was that `--fast` usually instructs git-annex not to make (slow) network connections. + +My use-case for a fast export is to add an existing ssh-accessible non-git directory on a HPC system as a potential data source for a git-annex repository. The repository has additional information like how to retrieve the files from a third-party, while the directory on HPC only has the files (which were downloaded without git-annex involvement already). My plan was to add the directory as an exporttree remote, make git-annex think that the current main branch's tree should be available there via the fast export, and then do a `git annex fsck --from <remote>` to discover what's actually there. Obviously it is very undesirable to lose those files on export then. + + +### What steps will reproduce the problem? + +- Create a git-annex repository and add some files +- Create a plain directory with (a subset of) the same filenames in the repository +- Add this directory as an rsync export remote: `git annex initremote <remote> type=rsync rsyncurl=<host>:<path> exporttree=yes encryption=none` +- `git annex export --fast main --to <remote>` +- Observe files being deleted on the remote + + +### What version of git-annex are you using? On what operating system? 
+ +``` +git-annex version: 10.20260115-ge8de977f1d5b5ac57cfe7a0c66d4e1c3ff337af1 +build flags: Assistant Webapp Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV Servant OsPath +dependency versions: aws-0.25.2 bloomfilter-2.0.1.3 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.3 http-client-0.7.19 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + + +[[!tag projects/ICE4]]
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn index f02545f603..fa78722f50 100644 --- a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn @@ -2,7 +2,7 @@ The p2phttp server can get stuck such that it no longer sends responses when client git-annex processes are interrupted. -I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of ~4 deadlocks since the 10.20251114 that happened in regular usage of these instances and that required a server restart to fix. +I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of ~4 deadlocks since the 10.20251114 release that happened in regular usage of these instances and that required a server restart to fix. ### What steps will reproduce the problem?
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn index d2fd77e0be..f02545f603 100644 --- a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn @@ -2,7 +2,7 @@ The p2phttp server can get stuck such that it no longer sends responses when client git-annex processes are interrupted. -I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of \~4 deadlocks since +I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of ~4 deadlocks since the 10.20251114 that happened in regular usage of these instances and that required a server restart to fix. ### What steps will reproduce the problem?
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn new file mode 100644 index 0000000000..d2fd77e0be --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn @@ -0,0 +1,177 @@ +### Please describe the problem. + +The p2phttp server can get stuck such that it no longer sends responses when client git-annex processes are interrupted. + +I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of \~4 deadlocks since + + +### What steps will reproduce the problem? + +Create a repository with some data (I used datalad, but plain git-annex should be the same): + +``` +datalad create test-p2phttp-interrupt +cd test-p2phttp-interrupt +for i in $(seq 1 20); do head -c 1G /dev/urandom > test$i.bin; done +datalad save +``` + +Create two clones: + +``` +datalad clone test-p2phttp-interrupt test-p2phttp-interrupt-clone +datalad clone test-p2phttp-interrupt test-p2phttp-interrupt-clone2 +``` + +Make them use p2phttp (run in both clones): + +``` +git config remote.origin.annexUrl 'annex+http://localhost:3001' +``` + +Serve the first repo via p2phttp: + +``` +git annex p2phttp -J2 --debug --bind localhost --port 3001 --wideopen +``` + +In one clone run a get that is constantly interrupted and restarted: + +``` +while true; do +git annex get . & +pid=$! +sleep 5 +kill -s SIGINT $pid +done +``` + +In the other clone just run a regular get: + +``` +git annex get . +``` + +Observation: after letting this run for a while the get's no longer make any progress. The p2phttp process no longer logs anything new. + +Given my understanding from the previous deadlocks in p2phttp it seems like the worker process that should be used to respond to these requests somehow didn't get released after an interrupted request. + + +### What version of git-annex are you using? On what operating system? 
+ +``` +git-annex version: 10.20260115-ge8de977f1d5b5ac57cfe7a0c66d4e1c3ff337af1 +build flags: Assistant Webapp Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV Servant OsPath +dependency versions: aws-0.25.2 bloomfilter-2.0.1.3 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.3 http-client-0.7.19 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. 
+# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + +$ git annex p2phttp -J2 --debug --bind localhost --port 3001 --wideopen +[2026-01-27 15:17:52.122387435] (Utility.Process) process [1704520] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"] +[2026-01-27 15:17:52.124778937] (Utility.Process) process [1704520] done ExitSuccess +[2026-01-27 15:17:52.125127598] (Utility.Process) process [1704521] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"] +[2026-01-27 15:17:52.127448536] (Utility.Process) process [1704521] done ExitSuccess +[2026-01-27 15:17:52.128485775] (Utility.Process) process [1704522] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"] +[2026-01-27 15:17:52.131112388] (Annex.Branch) read proxy.log +[2026-01-27 15:17:56.728686389] (P2P.IO) [http client] [ThreadId 12] P2P > CHECKPRESENT MD5E-s1073741824--4c882b5dc5bbb53d59ab0d4e67e2a3c4.bin +[2026-01-27 15:17:56.728896008] (P2P.IO) [http server] [ThreadId 15] P2P < CHECKPRESENT MD5E-s1073741824--4c882b5dc5bbb53d59ab0d4e67e2a3c4.bin +[2026-01-27 15:17:56.729107393] (P2P.IO) [http server] [ThreadId 15] P2P > SUCCESS +[2026-01-27 15:17:56.729160766] (P2P.IO) [http client] [ThreadId 12] P2P < SUCCESS +[2026-01-27 15:17:57.093025365] (P2P.IO) [http client] [ThreadId 18] P2P > GET 220011077 test10.bin MD5E-s1073741824--4c882b5dc5bbb53d59ab0d4e67e2a3c4.bin +[2026-01-27 15:17:57.093145849] (P2P.IO) [http server] [ThreadId 17] P2P < GET 220011077 test10.bin MD5E-s1073741824--4c882b5dc5bbb53d59ab0d4e67e2a3c4.bin +[2026-01-27 15:17:57.093714738] (Utility.Process) process [1704639] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"] +[2026-01-27 
15:17:57.096671191] (Utility.Process) process [1704639] done ExitSuccess +[2026-01-27 15:17:57.096984805] (Utility.Process) process [1704640] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/annex/last-index"] +[2026-01-27 15:17:57.099497] (Utility.Process) process [1704640] done ExitSuccess +[2026-01-27 15:17:57.100142132] (P2P.IO) [http server] [ThreadId 17] P2P > DATA 853730747 +[2026-01-27 15:17:57.100206771] (P2P.IO) [http client] [ThreadId 18] P2P < DATA 853730747 +[2026-01-27 15:17:59.215559236] (P2P.IO) [http client] [ThreadId 24] P2P > CHECKPRESENT MD5E-s1073741824--e8bb491c04da0917cf1871a4d9f719d2.bin +[2026-01-27 15:17:59.215654747] (P2P.IO) [http server] [ThreadId 26] P2P < CHECKPRESENT MD5E-s1073741824--e8bb491c04da0917cf1871a4d9f719d2.bin +[2026-01-27 15:17:59.215723248] (P2P.IO) [http server] [ThreadId 26] P2P > SUCCESS +[2026-01-27 15:17:59.215761274] (P2P.IO) [http client] [ThreadId 24] P2P < SUCCESS +[2026-01-27 15:17:59.217064991] (P2P.IO) [http client] [ThreadId 29] P2P > GET 0 test1.bin MD5E-s1073741824--e8bb491c04da0917cf1871a4d9f719d2.bin +[2026-01-27 15:17:59.217130521] (P2P.IO) [http server] [ThreadId 28] P2P < GET 0 test1.bin MD5E-s1073741824--e8bb491c04da0917cf1871a4d9f719d2.bin +[2026-01-27 15:17:59.217519652] (P2P.IO) [http server] [ThreadId 28] P2P > DATA 1073741824 +[2026-01-27 15:17:59.21755853] (P2P.IO) [http client] [ThreadId 29] P2P < DATA 1073741824 +[2026-01-27 15:18:00.279578339] (P2P.IO) [http server] [ThreadId 17] P2P > VALID +[2026-01-27 15:18:00.279785154] (P2P.IO) [http client] [ThreadId 18] P2P < VALID +[2026-01-27 15:18:00.279818373] (P2P.IO) [http client] [ThreadId 18] P2P > SUCCESS +[2026-01-27 15:18:00.279862343] (P2P.IO) [http server] [ThreadId 17] P2P < SUCCESS +[2026-01-27 15:18:00.329523146] (P2P.IO) [http client] [ThreadId 12] P2P > CHECKPRESENT MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin +[2026-01-27 15:18:00.329702303] (P2P.IO) [http 
server] [ThreadId 33] P2P < CHECKPRESENT MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin +[2026-01-27 15:18:00.329825138] (P2P.IO) [http server] [ThreadId 33] P2P > SUCCESS +[2026-01-27 15:18:00.329871666] (P2P.IO) [http client] [ThreadId 12] P2P < SUCCESS +[2026-01-27 15:18:00.331456293] (P2P.IO) [http client] [ThreadId 36] P2P > GET 0 test11.bin MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin +[2026-01-27 15:18:00.331595061] (P2P.IO) [http server] [ThreadId 35] P2P < GET 0 test11.bin MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin +[2026-01-27 15:18:00.332346826] (P2P.IO) [http server] [ThreadId 35] P2P > DATA 1073741824 +[2026-01-27 15:18:00.332430727] (P2P.IO) [http client] [ThreadId 36] P2P < DATA 1073741824 +[2026-01-27 15:18:01.745659339] (P2P.IO) [http client] [ThreadId 39] P2P > CHECKPRESENT MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin +[2026-01-27 15:18:01.745775646] (P2P.IO) [http server] [ThreadId 41] P2P < CHECKPRESENT MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin +[2026-01-27 15:18:01.745896432] (P2P.IO) [http server] [ThreadId 41] P2P > SUCCESS +[2026-01-27 15:18:01.745947955] (P2P.IO) [http client] [ThreadId 39] P2P < SUCCESS +[2026-01-27 15:18:02.304886078] (P2P.IO) [http client] [ThreadId 44] P2P > GET 335670329 test11.bin MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin +[2026-01-27 15:18:02.305117538] (P2P.IO) [http server] [ThreadId 43] P2P < GET 335670329 test11.bin MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin +[2026-01-27 15:18:03.331345311] (P2P.IO) [http server] [ThreadId 28] P2P > VALID +[2026-01-27 15:18:03.331419419] (P2P.IO) [http client] [ThreadId 29] P2P < VALID +[2026-01-27 15:18:03.331465319] (P2P.IO) [http client] [ThreadId 29] P2P > SUCCESS +[2026-01-27 15:18:03.331492753] (P2P.IO) [http server] [ThreadId 28] P2P < SUCCESS +[2026-01-27 15:18:03.331780166] (P2P.IO) [http server] [ThreadId 43] P2P > DATA 738071495 +[2026-01-27 15:18:03.331839961] (P2P.IO) [http client] 
[ThreadId 44] P2P < DATA 738071495 +[2026-01-27 15:18:06.044717964] (P2P.IO) [http server] [ThreadId 43] P2P > VALID +[2026-01-27 15:18:06.044806699] (P2P.IO) [http client] [ThreadId 44] P2P < VALID +[2026-01-27 15:18:06.044861192] (P2P.IO) [http client] [ThreadId 44] P2P > SUCCESS +[2026-01-27 15:18:06.04490031] (P2P.IO) [http server] [ThreadId 43] P2P < SUCCESS + +$ while true; do +git annex get . & +pid=$! +sleep 5 +kill -s SIGINT $pid +done +[1] 1704547 +get test10.bin (from origin...) +ok +get test11.bin (from origin...) +29% 294.58 MiB 255 MiB/s 2s [2] 1704749 +(recording state in git...) +get test11.bin (from origin...) +ok +get test12.bin [1]- Interrupt git annex get . +[3] 1704857 +(recording state in git...) +get test12.bin [2]- Interrupt git annex get . +[4] 1704996 +get test12.bin [3]- Interrupt git annex get . +[5] 1705094 +get test12.bin [4]- Interrupt git annex get . +[6] 1705191 +get test12.bin [5]- Interrupt git annex get . +[7] 1705286 +get test12.bin ^C[6]- Interrupt git annex get . + +$ git annex get . +get test1.bin (from origin...) +ok +get test10.bin ^C + +# End of transcript or log. +"""]] + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Starting with a DataLad Dataset and by extension git-annex repository is the first thing I do whenever I have to deal with code and/or data that is not some throwaway stuff :) + +[[!tag projects/ICE4]]
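The pattern-ordering regression this report describes can be modeled with a tiny standalone example (illustrative only — these are not the real `configRead` cases from Remote/Git.hs, just a sketch of why top-to-bottom matching makes the annex-ignore case unreachable for local remotes):

```haskell
-- Minimal model of the ordering bug: when the local-repo case is listed
-- first, an annex-ignore'd local remote still gets probed (and
-- auto-initialized); the annex-ignore case can never match for it.
data Action = Probe | Ignore deriving (Eq, Show)

-- Post-10.20260213 ordering (buggy for local + annex-ignore remotes):
buggy :: Bool -> Bool -> Action
buggy isLocal ignored = case (isLocal, ignored) of
    (True, _) -> Probe   -- local repo matched first: auto-init happens
    (_, True) -> Ignore  -- unreachable when isLocal is True
    _         -> Probe

-- Pre-10.20250630 ordering (annex-ignore wins, as expected):
fixed :: Bool -> Bool -> Action
fixed isLocal ignored = case (isLocal, ignored) of
    (_, True) -> Ignore  -- annex-ignore checked first: bail out
    (True, _) -> Probe
    _         -> Probe
```

For a remote that is both local and annex-ignore'd, `buggy True True` probes (and so auto-initializes) while `fixed True True` bails out, matching the bisected behavior above.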
comment
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_8_8bb8daa43d23d8ffdb652cfeb627b2db._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_8_8bb8daa43d23d8ffdb652cfeb627b2db._comment new file mode 100644 index 0000000000..55d7c11a3d --- /dev/null +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_8_8bb8daa43d23d8ffdb652cfeb627b2db._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2026-01-26T17:01:47Z" + content=""" +I thought about making `git-annex export` checksum files before uploading, +but I don't see why export needs that any more than a regular copy to a +remote does. In either case, annex.verify will notice the bad content when +getting from the remote, and fscking the remote will also detect it, and +now, recover from it. + +It seems unlikely to me that the annex object file got truncated before +it was sent to ds005256 in any case. Seems more likely that the upload +was somehow not of the whole file. +"""]]
done
diff --git a/CHANGELOG b/CHANGELOG
index bc1e4875a5..724ba92b57 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -8,6 +8,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* Added annex.security.allow-insecure-https config, which allows
using old http servers that use TLS 1.2 without Extended Main
Secret support.
+ * fsck: Support repairing a corrupted file in a versioned S3 remote.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/doc/todo/recover_from_export_of_corrupted_object.mdwn b/doc/todo/recover_from_export_of_corrupted_object.mdwn
index 9311547310..8383eae5b1 100644
--- a/doc/todo/recover_from_export_of_corrupted_object.mdwn
+++ b/doc/todo/recover_from_export_of_corrupted_object.mdwn
@@ -37,3 +37,5 @@ Could fsck be extended to handle this? It should be possible for fsck to:
--[[Joey]]
[[!tag projects/openneuro]]
+
+> [[done]] --[[Joey]]
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_7_189e551ab3adbd42175bed435452e39b._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_7_189e551ab3adbd42175bed435452e39b._comment
new file mode 100644
index 0000000000..93ae733261
--- /dev/null
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_7_189e551ab3adbd42175bed435452e39b._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 7"""
+ date="2026-01-26T16:56:12Z"
+ content="""
+Finished implementing recovery from a corrupted S3 version id.
+"""]]
comments
diff --git a/doc/todo/drop_from_export_remote/comment_2_528e517b82a9f9a3e0f6ba5c2177e21a._comment b/doc/todo/drop_from_export_remote/comment_2_528e517b82a9f9a3e0f6ba5c2177e21a._comment new file mode 100644 index 0000000000..8d128e039b --- /dev/null +++ b/doc/todo/drop_from_export_remote/comment_2_528e517b82a9f9a3e0f6ba5c2177e21a._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2026-01-26T16:43:58Z" + content=""" +Currently `git-annex fsck --from` an export remote is unable to drop a key +if it finds corrupted data. Implementing this would also deal with that +problem. +"""]] diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_5_5d1aa23a6819b3d169c2c2090cf24041._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_5_5d1aa23a6819b3d169c2c2090cf24041._comment new file mode 100644 index 0000000000..83f222fa92 --- /dev/null +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_5_5d1aa23a6819b3d169c2c2090cf24041._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-26T16:41:32Z" + content=""" +#1 is not needed for the case of a versioned S3 bucket, because after +`git-annex fsck --from S3` corrects the problem, `git-annex export --to S3` +will see that the file is not in S3, and re-upload it. + +In the general case, #1 is still needed. I think +[[todo/drop_from_export_remote]] would solve this, and so no need to deal +with it here. +"""]]
diff --git a/doc/forum/How_do_I_prevent_annex-sync_from_eating_my_data__63__.mdwn b/doc/forum/How_do_I_prevent_annex-sync_from_eating_my_data__63__.mdwn new file mode 100644 index 0000000000..8151752177 --- /dev/null +++ b/doc/forum/How_do_I_prevent_annex-sync_from_eating_my_data__63__.mdwn @@ -0,0 +1,46 @@ +How do I prevent annex-sync from eating my data by automatic commits? + +I have already set the following: + +$ anx config --get annex.autocommit + +false + +$ anx config --get annex.resolvemerge + +false + +$ anx config --get annex.synccontent + +false + +$ anx config --get annex.synconlyannex + +true + +Workflow is as follows: + +- single repo on a PC with mixed locked and unlocked files and an adb special remote. + +- make a change to a file on the android device; + +- run git annex sync; + +Current behavior: + +Said file (sometimes; I don't get the logic) gets overwritten on the PC. + +All unlocked files are automatically locked. + +Oh and if git-annex noticed a conflict and refused to overwrite the file on the android device during export, +then if I run git annex sync again it overwrites the file on the android device anyway leading to data loss. + +What the hell? + +Desired behavior: + +Apply some conflict resolution strategies if needed and just stage the changes. + +Don't actually commit any changes. + +Don't eat my data by automatically running git annex export.
add git config for HTTPS with TLS 1.2 w/o EMS
Added annex.security.allow-insecure-https config, which allows using old
HTTPS servers that use TLS 1.2 without Extended Main Secret support.
When git-annex is built with tls-2.0, it will default to not supporting
those. Note that currently, Debian has an older version of the library,
but building with stack will get tls-2.0.
The annex.security.allow-insecure-https name and setting
were chosen to allow supporting other such things in the future.
With that said, I hope that the "tls-1.2-no-EMS" value can be removed from
git-annex at some point in the future. The number of affected HTTPS servers
must be decreasing, and they will eventually get fixed. And this is an ugly
bit of complexity.
Users will I suppose have to find the setting by googling the error
message, which is "peer does not support Extended Main Secret".
It would be possible to catch the exception,
HandshakeFailed (Error_Protocol "peer does not support Extended Main Secret" HandshakeFailure)
but it would be hard to catch it in the right places where the http manager
is used.
The added dependencies on crypton-connection and tls are free,
those were already indirect dependencies.
Sponsored-by: Leon Schuermann
diff --git a/Annex/Url.hs b/Annex/Url.hs
index 6d0cb43767..f08aa1baef 100644
--- a/Annex/Url.hs
+++ b/Annex/Url.hs
@@ -1,7 +1,7 @@
{- Url downloading, with git-annex user agent and configured http
- headers, security restrictions, etc.
-
- - Copyright 2013-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2013-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -48,6 +48,8 @@ import Network.HTTP.Client
import Network.HTTP.Client.TLS
import Text.Read
import qualified Data.Set as S
+import qualified Network.Connection as NC
+import qualified Network.TLS as TLS
defaultUserAgent :: U.UserAgent
defaultUserAgent = "git-annex/" ++ BuildInfo.packageversion
@@ -66,7 +68,7 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
return uo
where
mk = do
- (urldownloader, manager) <- checkallowedaddr
+ (urldownloader, manager) <- mk' =<< Annex.getGitConfig
U.mkUrlOptions
<$> (Just <$> getUserAgent)
<*> headers
@@ -87,7 +89,7 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
pure (remoteAnnexWebOptions gc)
_ -> annexWebOptions <$> Annex.getGitConfig
- checkallowedaddr = words . annexAllowedIPAddresses <$> Annex.getGitConfig >>= \case
+ mk' gc = case words (annexAllowedIPAddresses gc) of
["all"] -> do
curlopts <- map Param <$> getweboptions
allowedurlschemes <- annexAllowedUrlSchemes <$> Annex.getGitConfig
@@ -96,7 +98,7 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
U.DownloadWithCurlRestricted mempty
else U.DownloadWithCurl curlopts
manager <- liftIO $ U.newManager $
- avoidtimeout $ tlsManagerSettings
+ avoidtimeout managersettings
return (urldownloader, manager)
allowedaddrsports -> do
addrmatcher <- liftIO $
@@ -118,7 +120,7 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
then Nothing
else Just (connectionrestricted addr)
(settings, pr) <- liftIO $
- mkRestrictedManagerSettings r Nothing Nothing
+ mkRestrictedManagerSettings r Nothing tlssettings
case pr of
Nothing -> return ()
Just ProxyRestricted -> toplevelWarning True
@@ -130,6 +132,18 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
let urldownloader = U.DownloadWithConduit $
U.DownloadWithCurlRestricted r
return (urldownloader, manager)
+ where
+ -- When configured, allow TLS 1.2 without EMS.
+ -- In tls-2.0, the default was changed from
+ -- TLS.AllowEMS to TLS.RequireEMS.
+ tlssettings
+ | annexAllowInsecureHttps gc = Just $
+ NC.TLSSettingsSimple False False False
+ def { TLS.supportedExtendedMainSecret = TLS.AllowEMS }
+ | otherwise = Nothing
+ managersettings = case tlssettings of
+ Nothing -> tlsManagerSettings
+ Just v -> mkManagerSettings v Nothing
-- http-client defaults to timing out a request after 30 seconds
-- or so, but some web servers are slower and git-annex has its own
diff --git a/CHANGELOG b/CHANGELOG
index 6b44912669..bc1e4875a5 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -5,6 +5,9 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* p2phttp: Commit git-annex branch changes promptly.
* When used with git forges that allow Push to Create, the remote's
annex-uuid is re-probed after the initial push.
+ * Added annex.security.allow-insecure-https config, which allows
+ using old http servers that use TLS 1.2 without Extended Main
+ Secret support.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs
index c31dec617f..5772bebb0c 100644
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -147,6 +147,7 @@ data GitConfig = GitConfig
, annexRetryDelay :: Maybe Seconds
, annexAllowedUrlSchemes :: S.Set Scheme
, annexAllowedIPAddresses :: String
+ , annexAllowInsecureHttps :: Bool
, annexAllowUnverifiedDownloads :: Bool
, annexAllowedComputePrograms :: Maybe String
, annexMaxExtensionLength :: Maybe Int
@@ -268,6 +269,8 @@ extractGitConfig configsource r = GitConfig
getmaybe (annexConfig "security.allowed-ip-addresses")
<|>
getmaybe (annexConfig "security.allowed-http-addresses") -- old name
+ , annexAllowInsecureHttps = (== Just "tls-1.2-no-EMS") $
+ getmaybe (annexConfig "security.allow-insecure-https")
, annexAllowUnverifiedDownloads = (== Just "ACKTHPPT") $
getmaybe (annexConfig "security.allow-unverified-downloads")
, annexAllowedComputePrograms =
diff --git a/debian/control b/debian/control
index 7484f04658..32f6e038e1 100644
--- a/debian/control
+++ b/debian/control
@@ -10,6 +10,7 @@ Build-Depends:
libghc-data-default-dev,
libghc-hslogger-dev,
libghc-crypton-dev,
+ libghc-crypton-connection-dev,
libghc-memory-dev,
libghc-deepseq-dev,
libghc-attoparsec-dev,
@@ -27,6 +28,7 @@ Build-Depends:
libghc-uuid-dev,
libghc-aeson-dev,
libghc-tagsoup-dev,
+ libghc-tls-dev,
libghc-unordered-containers-dev,
libghc-ifelse-dev,
libghc-bloomfilter-dev,
diff --git a/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret.mdwn b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret.mdwn
index 5723e07957..a04be28bbe 100644
--- a/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret.mdwn
+++ b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret.mdwn
@@ -99,3 +99,5 @@ ewen@basadi:~/Music/podcasts$
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Absolutely, I've been using git-annex as my podcatcher (among other reasons) for about a decade at this point. Thanks for developing it!
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_6_1d9a5eeb5c5f4894460dbdb326e1edec._comment b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_6_1d9a5eeb5c5f4894460dbdb326e1edec._comment
new file mode 100644
index 0000000000..8c0e9c4498
--- /dev/null
+++ b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_6_1d9a5eeb5c5f4894460dbdb326e1edec._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 6"""
+ date="2026-01-25T20:19:57Z"
+ content="""
+Finally ran into this myself, and I observed several podcast hosts still
+not supporting EMS even now.
+
+Implemented a config to solve this:
+
+ git config annex.security.allow-insecure-https tls-1.2-no-EMS
+
+I do caution against setting this globally for security reasons. At least not
+without understanding the security implications, which I can't say I do.
+
+Even setting it in a single repo could affect other
+connections by git-annex to eg, API endpoints used for storage.
+
+Personally, I am setting it only when importing feeds from those hosts:
+
+ git -c annex.security.allow-insecure-https=tls-1.2-no-EMS annex importfeed
+"""]]
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index df3f84d8ab..4d4484dc47 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -2246,6 +2246,15 @@ Remotes are configured using these settings in `.git/config`.
If set, this is treated the same as having
annex.security.allowed-ip-addresses set.
+* `annex.security.allow-insecure-https`
+
+ This can be used to loosen the security of the HTTPS implementation.
+
+ Set to "tls-1.2-no-EMS" to allow using TLS 1.2 without Extended Main
+ Secret support. You should do this only when needing to use git-annex
+ with a server that is insecure, and where the security of TLS is not
+ important to you.
+
* `annex.security.allow-unverified-downloads`
For security reasons, git-annex refuses to download content from
diff --git a/git-annex.cabal b/git-annex.cabal
index 5361fc0acd..d4dd436f80 100644
--- a/git-annex.cabal
+++ b/git-annex.cabal
@@ -274,6 +274,8 @@ Executable git-annex
git-lfs (>= 1.2.0),
clock (>= 0.3.0),
crypton,
+ crypton-connection,
+ tls,
(Diff truncated)
workaround
diff --git a/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_5_e1650df4a90fb14cd1b0332bbb2c4e36._comment b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_5_e1650df4a90fb14cd1b0332bbb2c4e36._comment new file mode 100644 index 0000000000..fb4e080823 --- /dev/null +++ b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_5_e1650df4a90fb14cd1b0332bbb2c4e36._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""workaround""" + date="2026-01-25T19:27:58Z" + content=""" +Workaround: Make git-annex use curl for url downloads. Eg: + + git config annex.security.allowed-ip-addresses all + git config annex.web-options --netrc + +Note this using curl has other security implications, including letting +git-annex download from IPs on the LAN. +"""]]
comment
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment new file mode 100644 index 0000000000..12ef212684 --- /dev/null +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-23T20:52:33Z" + content=""" +Started implementation in the `repair` branch. +"""]]
update
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment index 9cd701d3fb..235402bf24 100644 --- a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment @@ -6,9 +6,14 @@ If [[todo/drop_from_export_remote]] were implemented that would take care of #1. -Since `git-annex fsck` already tells the user what to do when it finds a -corrupted file on an export remote, and that works for ones not using -versioning, I think #1 can be left to that todo to solve, -and #2 be dealt with here. That will be enough to recover the problem -dataset. +The user can export a tree that removes the file themselves. fsck even +suggests doing that when it finds a corrupted file on an exporttree remote, +since it's unable to drop it in that case. + +But notice that the fsck run above does not suggest doing that. Granted, +with a S3 bucket with versioning, exporting a tree won't remove the +corrupted version of the file from the remote anyway. + +It seems that dealing with #2 here is enough to recover the problem +dataset, and #1 can be left to that other todo. """]]
comments
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment index b740f91970..ce7ceff9a9 100644 --- a/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment @@ -3,7 +3,24 @@ subject="""comment 2""" date="2025-12-17T18:30:06Z" content=""" -In a non-export S3 bucket with versioning, fsck also cannot recover from a -corrupted object, due to the same problem with the versionId. The same -method should work to handle this case. +The OpenNeuro dataset ds005256 is a S3 bucket with versioning=yes, and a +publicurl set, and exporttree=yes. With that combination, when S3 +credentials are not set, the versionId is used, in the public url for downloading. + + git clone https://github.com/OpenNeuroDatasets/ds005256.git + git-annex get stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 + +Note that this first does a download that fails incomplete with +"Verification of content failed". Then it complains "Unable to access these +remotes: s3-PUBLIC". It's trying two different download methods; the second +one can only work with S3 credentials set. + + git-annex fsck stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 + fsck stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 (fixing location log) + ** Based on the location log, stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 + ** was expected to be present, but its content is missing. + failed + +Note that this doesn't download, but fails at the checkPresent stage. At that +point, the HTTP HEAD reports the size of the object, and it's too short. 
"""]] diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment index 1a38db539e..9cd701d3fb 100644 --- a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment @@ -8,5 +8,7 @@ of #1. Since `git-annex fsck` already tells the user what to do when it finds a corrupted file on an export remote, and that works for ones not using -versioning, I think #1 can be postponed and #2 be dealt with first. +versioning, I think #1 can be left to that todo to solve, +and #2 be dealt with here. That will be enough to recover the problem +dataset. """]] diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment new file mode 100644 index 0000000000..1218d28c06 --- /dev/null +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-23T17:21:51Z" + content=""" +After a *lot* of thought and struggling with layering issues between fsck and +the S3 remote, here is a design to solve #2: + +Add a new method `repairCorruptedKey :: Key -> Annex Bool` + +fsck calls this when it finds a remote does not have a key it expected it +to have, or when it downloads corrupted content. + +If `repairCorruptedKey` returns True, it was able to repair a problem, and +the Key should be able to be downloaded from the remote still. If it +returns False, it was not able to repair the problem. + +Most special remotes will make this `pure False`. 
For S3 with versioning=yes, +it will download the object from the bucket, using each recorded versionId. +Any versionId that does not work will be removed. And return True if any +download did succeed. + +In a case where the object size is right, but it's corrupt, +fsck will download the object, and then repairCorruptedKey will download it +a second time. If there were 2 files with the same content, it would end up +being downloaded 3 times! So this can be pretty expensive, +but it's simple and will work. +"""]]
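The versionId-pruning loop that comment 4 designs for `repairCorruptedKey` can be sketched as a minimal standalone model (all names here are illustrative stand-ins, not git-annex's real `Key`/`Annex` API or its S3 code):

```haskell
-- Minimal model of the repairCorruptedKey design from comment 4:
-- try each recorded versionId in turn, drop any versionId whose
-- download fails, and report success if any download worked.
repairCorruptedKey
    :: [String]              -- recorded versionIds for the key
    -> (String -> IO Bool)   -- attempt a download of one versionId
    -> IO ([String], Bool)   -- (versionIds to keep, repair succeeded?)
repairCorruptedKey vids tryDownload = go vids [] False
  where
    go [] keep ok = pure (reverse keep, ok)
    go (v:vs) keep ok = do
        good <- tryDownload v
        if good
            then go vs (v:keep) True  -- keep versionIds that still work
            else go vs keep ok        -- prune versionIds that fail
```

Most special remotes would implement this as `pure False`; only the S3 versioning case needs the per-versionId loop above.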
comment
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment new file mode 100644 index 0000000000..1a38db539e --- /dev/null +++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-23T16:42:49Z" + content=""" +If [[todo/drop_from_export_remote]] were implemented that would take care +of #1. + +Since `git-annex fsck` already tells the user what to do when it finds a +corrupted file on an export remote, and that works for ones not using +versioning, I think #1 can be postponed and #2 be dealt with first. +"""]]
comment
diff --git a/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment b/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment new file mode 100644 index 0000000000..860bb39c4f --- /dev/null +++ b/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment @@ -0,0 +1,21 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-23T16:46:05Z" + content=""" +Rather than altering the exported git tree, it could removeExport and then +update the export log to say that the export is incomplete. + +That would result in a re-export putting the file back on the remote. + +It's not uncommon to eg want to `git-annex move foo --from remote`, +due to it being low on space, or to temporarily make it unavailable, +and later send the file back to the remote. Supporting drop from export +remotes in this way would allow for such a workflow, although with the +difference that `git-annex export` would be needed to put the file back. + +It might also be possible to make sending a particular file to an export +remote succeed when the export to the remote is incomplete and the file is +in the exported tree. Then `git-annex move foo --to remote` would work to +put the file back. +"""]]
Added a comment
diff --git a/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment new file mode 100644 index 0000000000..befcbe4105 --- /dev/null +++ b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 4" + date="2026-01-23T14:39:24Z" + content=""" +> A balance might be that if it fails to connect to the remote.name.annexUrl, it could re-check it then. + +Would this include re-checking when remote.name.annexUrl is unset? That would be necessary in the situations where either the client didn't understand p2phttp when the repository was closed or when the server-side didn't provide p2phttp yet. + +Given that the clone happened in the knowledge that \"dumb http\" was the only supported http protocol and read only, I am now questioning if such a automatic upgrade to p2phttp would really be needed, or even desirable. Dumb http continues to work anyway. + +Only re-checking if remote.name.annexUrl is set already would solve the issue of relocating the p2phttp endpoint. +"""]]
diff --git a/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn index 7ebff66146..8276b76ba6 100644 --- a/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn +++ b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn @@ -140,3 +140,5 @@ I know that this is sort of abusing the URL handling in git-annex, but it was su ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) Yes! It is absolutely great, thank you for it. + +[[!tag projects/ICE4]]