Recent changes to this wiki:
assistant does not add or commit
diff --git a/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked...mdwn b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked...mdwn new file mode 100644 index 0000000000..ae5968d8c7 --- /dev/null +++ b/doc/bugs/assistant__58___nothing_added_to_commit_but_untracked...mdwn @@ -0,0 +1,45 @@ +### Please describe the problem. + +I think I have done everything nice and clean to sync things up etc... but now assistant just does not care to add/commit new files. + +excerpt from [full daemon.log](https://www.oneukrainian.com/tmp/daemon.log.20260131.log): + +``` + fd:31: hPutBuf: resource vanished (Broken pipe) + + fd:31: hPutBuf: resource vanished (Broken pipe) +(recording state in git...) +On branch master +Your branch is up to date with 'typhon/master'. + +Untracked files: + (use "git add <file>..." to include in what will be committed) + Videos/2026/01/2026.01.31-13.19.02.455--2026.01.31-13.19.18.475.mkv + Videos/2026/01/2026.01.31-13.19.02.455--2026.01.31-13.19.18.475.mkv.duct_info.json + Videos/2026/01/2026.01.31-13.19.02.455--2026.01.31-13.19.18.475.mkv.duct_usage.json + Videos/2026/01/2026.01.31-13.19.02.455--2026.01.31-13.19.18.475.mkv.log + events-micropython/2026-01-31T12:20:42-05:00.csv + logs/2026-01-14T09:33-05:00.log + logs/timesync-stimuli/2026.01.31-13.19.05.339--.log + +nothing added to commit but untracked files present (use "git add" to track) +Everything up-to-date +Everything up-to-date + +``` + +### What version of git-annex are you using? On what operating system? + +well -- ideally that daemon.log should inform us that and potentially other details to help troubleshoot it. I think (since there could be multiple): + +``` +reprostim@reproiner:/data/reprostim$ ps auxw | grep assist +reprost+ 1989627 0.0 0.0 9892 3896 ? Ss 13:12 0:00 /usr/bin/git annex assistant --foreground +reprost+ 1989628 2.8 0.8 1074363924 270904 ? 
Ssl 13:12 0:19 /usr/lib/git-annex.linux/exe/git-annex --library-path /usr/lib/git-annex.linux//lib/x86_64-linux-gnu: /usr/lib/git-annex.linux/shimmed/git-annex/git-annex assistant --foreground +reprostim@reproiner:/data/reprostim$ /usr/lib/git-annex.linux/git-annex version | head +git-annex version: 10.20251114-1~ndall+1 +``` + + +[[!meta author=yoh]] +[[!tag projects/repronim]]
improve -J docs, suggesting higher value
diff --git a/doc/git-annex-p2phttp.mdwn b/doc/git-annex-p2phttp.mdwn index 1ee0e9a747..acef619098 100644 --- a/doc/git-annex-p2phttp.mdwn +++ b/doc/git-annex-p2phttp.mdwn @@ -61,9 +61,17 @@ convenient way to download the content of any key, by using the path This or annex.jobs must be set to configure the number of worker threads, per repository served, that serve connections to the webserver. - This must be set to 2 or more. + This must be set to 2 or more, since the webserver needs one thread + for itself. - A good choice is often one worker per CPU core: `--jobs=cpus` + Each additional job lets the webserver serve one more concurrent request. + When there are too many requests to serve all at once, the webserver will + delay responding to some requests until others have completed. + + A conservative starting place would be one worker per CPU core: `--jobs=cpus` + + However, to avoid delays, this can be set to much higher values. + Avoid setting it so high that the server runs out of file descriptors. * `--proxyconnections=N`
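The docs above warn against setting `--jobs` so high that the server runs out of file descriptors. As a rough sanity check (a generic shell sketch, not from the git-annex docs), the per-process open-file limit can be compared against the planned `--jobs` value:

```shell
# Per-process open-file limit inherited by processes started from this
# shell. Each concurrently served request holds at least a socket plus
# the file being served, so keep --jobs comfortably below this number.
ulimit -n

# On Linux, the descriptors held by a running server can be counted
# directly (hypothetical pgrep pattern; adjust to your setup):
#   ls "/proc/$(pgrep -f 'git-annex p2phttp')/fd" | wc -l
```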
comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_15_cf355648608d70edf5ac871299124546._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_15_cf355648608d70edf5ac871299124546._comment new file mode 100644 index 0000000000..948d95f2a0 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_15_cf355648608d70edf5ac871299124546._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 15""" + date="2026-01-30T16:30:15Z" + content=""" +> Maybe I am overthinking this a bit though, and I should just bump the number up by one or more factors of 2 and see what happens. + +I think that would make sense, or even by 1 or 2 orders of magnitude. +"""]]
response
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_14_0fb6fbd8bada65ee3a0152e6d96265c0._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_14_0fb6fbd8bada65ee3a0152e6d96265c0._comment new file mode 100644 index 0000000000..f791b09c8f --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_14_0fb6fbd8bada65ee3a0152e6d96265c0._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 14""" + date="2026-01-30T16:20:39Z" + content=""" +Re the number of threads, -J will affect the number of green threads used. +(Which will be some constant-ish multiple of the -J value.) +Green threads won't show up in htop, only OS-native threads will. + +The maximum number of OS-native threads should be capped at the number of +cores. + +Exactly how many OS-native threads spawn is under the control of the +Haskell runtime, and it probably spawns an additional OS-native thread +per green thread up to the limit. + +(It would be possible to limit the maximum number of OS-native threads to +less than the number of cores, if that would somehow be useful. It would +need a new config setting.) +"""]]
Added a comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_13_662dbd3d57681a13f5de2d067be6603f._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_13_662dbd3d57681a13f5de2d067be6603f._comment new file mode 100644 index 0000000000..5033fab7f7 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_13_662dbd3d57681a13f5de2d067be6603f._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 13" + date="2026-01-29T19:25:14Z" + content=""" +I might have some misunderstandings about what the -J flag does exactly... So far I assumed that it just sets the number of OS threads that are used as a worker pool to handle requests. In Forgejo-aneksajo it is set to -J2 because of that assumption and there being one p2phttp process per repository (if p2phttp has recently been used with the repository, it is started on demand and stopped after a while of non-usage), so larger values could multiply pretty fast. Your description sounds like it should actually just be a limit on the number of requests that can be handled concurrently, independent of the size of the worker pool. What I am observing when I increase it is that htop shows two new threads when I increment the value by one though. + +Could there be a fixed-size (small) worker pool, and a higher number of concurrent requests allowed? I agree that limiting the total resource usage makes a lot of sense, but does it have to be tied to the thread count? + +Maybe I am overthinking this a bit though, and I should just bump the number up by one or more factors of 2 and see what happens. +"""]]
response
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_11_7f0ecbac3c5186b03565ae7b6f36a46a._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_11_7f0ecbac3c5186b03565ae7b6f36a46a._comment new file mode 100644 index 0000000000..91be7b4dd2 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_11_7f0ecbac3c5186b03565ae7b6f36a46a._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 11""" + date="2026-01-29T17:33:30Z" + content=""" +The `kill -SIGINT` was my mistake; I ran the script using dash and it was +its builtin kill that does not accept that. + +So, your test case was supposed to interrupt it after all. Tested it again +with interruption and my fix does seem to have fixed it as best I can tell. +"""]] diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_12_fa5ddd60cb5ec292594547aa4f45c13b._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_12_fa5ddd60cb5ec292594547aa4f45c13b._comment new file mode 100644 index 0000000000..5b3de7cc43 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_12_fa5ddd60cb5ec292594547aa4f45c13b._comment @@ -0,0 +1,40 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 12""" + date="2026-01-29T17:36:30Z" + content=""" +Re serving more requests than workers, the point of limiting the number +of workers is that each worker can take a certain amount of resources. The +resource may only be a file descriptor and a bit of cpu and memory +usually; with proxying it could also include making outgoing connections, +running gpg, etc. The worker limit is about being able to control the +total amount of resources used. + +--- + +It would be possible to have an option where p2phttp does not limit the +number of workers at all, and the slowloris attack prevention could be left +enabled in that mode. 
Of course then enough clients could overwhelm the +server, but maybe that's better for some use cases. + +IIRC forgejo-aneksajo runs one p2phttp per repository and proxies requests +to them. If so, you need a lower worker limit per p2phttp. I suppose it +would be possible to make the proxy enforce its own limits to the number of +concurrent p2phttp requests, and then it might make sense to not have +p2phttp limit the number of workers. + +--- + +p2phttp (or a proxy in front of it) could send a 503 response if it is +unable to get a worker. That would avoid this slowloris attack prevention +problem. It would leave it up to the git-annex client to retry. Which +depends on the `annex.retry` setting currently. It might make sense to have +some automatic retrying on 503 in the p2phttp client. + +One benefit of the way it works now is a `git-annex get -J10` will +automatically use as many workers as the p2phttp server has available, and +if 2 people are both running that, it naturally balances out fairly evenly +between them, and keeps the server as busy as it wants to be in an +efficient way. Client side retry would not work as nicely, there would need +to be retry delays, and it would have to time out at some point. +"""]]
Added a comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_10_8d3c5f29dd7caba70ea8594393aa7cfc._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_10_8d3c5f29dd7caba70ea8594393aa7cfc._comment new file mode 100644 index 0000000000..9b1c7edd66 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_10_8d3c5f29dd7caba70ea8594393aa7cfc._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 10" + date="2026-01-29T09:04:03Z" + content=""" +> kill -s SIGINT is not valid syntax (at least not with procps's kill), so kill fails to do anything and a bunch of git-annex processes stack up all trying to get the same files. Probably you meant kill -s INT + +That's weird, I checked and both the shell built-in kill that I was using as well as the kill from procps-ng (Ubuntu's build: procps/noble-updates,now 2:4.0.4-4ubuntu3.2 amd64) installed on my laptop accept `-s SIGINT`. + +Anyway, thank you for investigating! I agree being susceptible to DoS attacks is not great, but better than accidentally DoS'ing ourselves in normal usage... + +I wonder, would it be architecturally possible to serve multiple requests concurrently with less workers than requests? E.g. do some async/multitasking magic between requests? If that was the case then I suspect this issue wouldn't come up, because all requests would progress steadily instead of waiting for a potentially long time. +"""]]
p2phttp: Fix a server stall by disabling warp's slowloris attack prevention
Not great, but better than the alternative.
Hoping this is temporary and the warp bug will be fixed and I can deal
with the problem better then.
diff --git a/CHANGELOG b/CHANGELOG index 5537b57b20..084b670574 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -13,6 +13,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium * fromkey, registerurl: When passed an url, generate a VURL key. * unregisterurl: Unregister both VURL and URL keys. * unregisterurl: Fix display of action to not be "registerurl". + * p2phttp: Fix a server stall by disabling warp's slowloris attack + prevention. -- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400 diff --git a/Command/P2PHttp.hs b/Command/P2PHttp.hs index ae3fdbcd75..0475442891 100644 --- a/Command/P2PHttp.hs +++ b/Command/P2PHttp.hs @@ -184,12 +184,26 @@ startIO o <> serverShutdownCleanup oldst } +-- Disable Warp's slowloris attack prevention. Since the web server +-- only allows serving -J jobs at a time, and blocks when an additional +-- request is received, that can result in there being no network traffic +-- for a period of time, which triggers the slowloris attack prevention. +-- +-- The implementation of the P2P http server is not exception safe enough +-- to deal with Response handlers being killed at any point by warp. +-- +-- It would be better to use setTimeout, so that slowloris attacks in +-- making the request are prevented. But, it does not work! 
See +-- https://github.com/yesodweb/wai/issues/1058 +disableSlowlorisPrevention :: Warp.Settings -> Warp.Settings +disableSlowlorisPrevention = Warp.setTimeout maxBound + runServer :: Options -> P2PHttpServerState -> IO () runServer o mst = go `finally` serverShutdownCleanup mst where go = do let settings = Warp.setPort port $ Warp.setHost host $ - Warp.defaultSettings + disableSlowlorisPrevention $ Warp.defaultSettings mstv <- newTMVarIO mst let app = p2pHttpApp mstv case (certFileOption o, privateKeyFileOption o) of diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn index fa78722f50..3c1215392e 100644 --- a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn @@ -175,3 +175,5 @@ get test10.bin ^C Starting with a DataLad Dataset and by extension git-annex repository is the first thing I do whenever I have to deal with code and/or data that is not some throwaway stuff :) [[!tag projects/ICE4]] + +> [[done]] --[[Joey]] diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_8_490aae81d90a6692152d365b1bae95a1._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_8_490aae81d90a6692152d365b1bae95a1._comment new file mode 100644 index 0000000000..e173b6e18d --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_8_490aae81d90a6692152d365b1bae95a1._comment @@ -0,0 +1,31 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2026-01-28T21:10:00Z" + content=""" +Seems likely that getP2PConnection is run by serveGet, and the worker slot +is allocated. Then a ThreadKilled exception arrives before the rest of +serveGet's threads are started up. So the worker slot never gets freed. +It's even possible that getP2PConnection is itself not cancellation safe. + +So, I made all of serveGet be inside an uninterruptibleMask. 
That did seem to +make the test case get past more slowloris cancellations than before. But, +it still eventually hung. + +Given the inversion of control that servant and streaming response body +entails, it seems likely that a ThreadKilled exception could arrive at a +point entirely outside the control of git-annex, leaving the P2P connection +open with no way to close it. + +I really dislike that this slowloris attack prevention is making me need +to worry about the server threads getting cancelled at any point. That +requires significantly more robust code, if it's even possible. + +So, I think disabling the slowloris attack prevention may be the way to go, +at least until warp is fixed to allow disabling it only after the Request +is received. + +Doing so will make p2phttp more vulnerable to DDOS, but as it stands, it's +vulnerable to locking up due to entirely legitimate users just running +a few `git-annex get`s. Which is much worse! +"""]] diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_9_6da51da16a637cd889662aa2dea0953d._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_9_6da51da16a637cd889662aa2dea0953d._comment new file mode 100644 index 0000000000..6f9fa95533 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_9_6da51da16a637cd889662aa2dea0953d._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2026-01-28T21:38:46Z" + content=""" +Disabled the slowloris protection. :-/ + +I also checked with the original test case, fixed to call `kill -s INT`, +and it also passed. I'm assuming this was never a bug about interruption.. +"""]]
oops
diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs index ebcf3140c4..012b36350d 100644 --- a/P2P/Http/Server.hs +++ b/P2P/Http/Server.hs @@ -202,8 +202,8 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do waitfinal endv finalv conn annexworker = do -- Wait for everything to be transferred before -- stopping the annexworker. The finalv will usually -- -- be written to at the end. If the client disconnects -- -- early that does not happen, so catch STM exceptions. + -- be written to at the end. If the client disconnects + -- early that does not happen, so catch STM exceptions. alltransferred <- either (const False) id <$> liftIO (tryNonAsync $ atomically $ takeTMVar finalv) diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_7_cb083c0fbba4822b0894393b4d8aec05._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_7_cb083c0fbba4822b0894393b4d8aec05._comment new file mode 100644 index 0000000000..44f3f70134 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_7_cb083c0fbba4822b0894393b4d8aec05._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2026-01-28T20:43:38Z" + content=""" +[[!commit 786360cdcf7f784847715ec79ef9837ada9fa649]] catches an exception +that the slowloris attack prevention causes. It does prevent the server +locking up... but only sometimes. So the test case +gets further, but eventually still locks up. + +Since slowloris attack prevention can cancel the thread at any point, it +seems likely that there is some other point where a resource is left +un-freed. +"""]]
p2phttp: close P2P connection when streamer is canceled
Slowloris attack prevention in warp can cancel the streamer.
In that case, waitfinal never gets called. So make an exception handler
that sets finalv. This lets the P2P connection get shut down properly,
releasing the annex worker back to the pool.
Unfortunately, this does not solve the whole problem. It does prevent a
p2phttp with -J from locking up after the second time slowloris
protection triggers. But, later it does still lock up.
There must be some other resource that is leaking when slowloris attack
prevention triggers.
Note that the STM exception catching when reading finalv may not be
needed any longer? I'm not sure and it seemed like perhaps it took
longer to hang when I left it in.
diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs index b7f773301a..ebcf3140c4 100644 --- a/P2P/Http/Server.hs +++ b/P2P/Http/Server.hs @@ -2,7 +2,7 @@ - - https://git-annex.branchable.com/design/p2p_protocol_over_http/ - - - Copyright 2024 Joey Hess <id@joeyh.name> + - Copyright 2024-2026 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -145,8 +145,9 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do (Len len, bs) <- liftIO $ atomically $ takeTMVar bsv bv <- liftIO $ newMVar (filter (not . B.null) (L.toChunks bs)) szv <- liftIO $ newMVar 0 - let streamer = S.SourceT $ \s -> s =<< return - (stream (bv, szv, len, endv, validityv, finalv)) + let streamer = S.SourceT $ do + \s -> s (stream (bv, szv, len, endv, validityv, finalv)) + `onException` streamexception finalv return $ addHeader (DataLength len) streamer where stream (bv, szv, len, endv, validityv, finalv) = @@ -189,7 +190,7 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do atomically $ putTMVar endv () validity <- atomically $ takeTMVar validityv sz <- takeMVar szv - atomically $ putTMVar finalv () + atomically $ putTMVar finalv True void $ atomically $ tryPutTMVar endv () return $ case validity of Nothing -> True @@ -197,14 +198,15 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do Just Invalid -> sz /= len , pure True ) - + waitfinal endv finalv conn annexworker = do -- Wait for everything to be transferred before -- stopping the annexworker. The finalv will usually - -- be written to at the end. If the client disconnects - -- early that does not happen, so catch STM exception. - alltransferred <- isRight - <$> tryNonAsync (liftIO $ atomically $ takeTMVar finalv) +- -- be written to at the end. If the client disconnects +- -- early that does not happen, so catch STM exceptions. 
+ alltransferred <- + either (const False) id + <$> liftIO (tryNonAsync $ atomically $ takeTMVar finalv) -- Make sure the annexworker is not left blocked on endv -- if the client disconnected early. void $ liftIO $ atomically $ tryPutTMVar endv () @@ -213,6 +215,11 @@ serveGet mst su apiver (B64Key k) cu bypass baf startat sec auth = do else closeP2PConnection conn void $ tryNonAsync $ wait annexworker + -- Slowloris attack prevention can cancel the streamer. Be sure to + -- close the P2P connection when that happens. + streamexception finalv = + liftIO $ atomically $ putTMVar finalv False + sizer = pure $ Len $ case startat of Just (Offset o) -> fromIntegral o Nothing -> 0 diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_6_bdb77afb1c349e52d94b54427bce18c4._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_6_bdb77afb1c349e52d94b54427bce18c4._comment new file mode 100644 index 0000000000..07e7bbbd5e --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_6_bdb77afb1c349e52d94b54427bce18c4._comment @@ -0,0 +1,47 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2026-01-28T19:22:19Z" + content=""" +Developed the below patch to use pauseTimeout after the Request is +consumed. + +Unfortunately, I then discovered that [pauseTimeout does not work](https://github.com/yesodweb/wai/issues/1058)! + +This leaves only the options of waiting for a fixed version of warp, +or disabling slowloris prevention entirely, or somehow dealing with +the way that the Response handler gets killed by the timeout. 
+ + diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs + index b7f773301a..8c8ae96c06 100644 + --- a/P2P/Http/Server.hs + +++ b/P2P/Http/Server.hs + @@ -40,14 +40,27 @@ import qualified Servant.Types.SourceT as S + import qualified Data.ByteString as B + import qualified Data.ByteString.Lazy as L + import qualified Data.ByteString.Lazy.Internal as LI + +import qualified Network.Wai.Handler.Warp as Warp + import Control.Concurrent.Async + import Control.Concurrent.STM + import Control.Concurrent + import System.IO.Unsafe + import Data.Either + + +-- WAI middleware that disables warp's usual Slowloris protection after the + +-- Request is received. This is needed for the p2phttp server because + +-- after a client connects and makes its Request, and when the Request + +-- includes valid authentication, the server waits for a worker to become + +-- available to handle it. During that time, no traffic is being sent, + +-- which would usually trigger the Slowloris protection. + +avoidResponseTimeout :: Application -> Application + +avoidResponseTimeout app req resp = do + + liftIO $ Warp.pauseTimeout req + + app req resp + + + p2pHttpApp :: TMVar P2PHttpServerState -> Application + -p2pHttpApp = serve p2pHttpAPI . serveP2pHttp + +p2pHttpApp st = avoidResponseTimeout $ serve p2pHttpAPI $ serveP2pHttp st + + serveP2pHttp :: TMVar P2PHttpServerState -> Server P2PHttpAPI + serveP2pHttp st +"""]]
comments
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_4_57b418889de2942d1f2d998cca68218a._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_4_57b418889de2942d1f2d998cca68218a._comment new file mode 100644 index 0000000000..08e5365f7e --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_4_57b418889de2942d1f2d998cca68218a._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-28T17:53:26Z" + content=""" +Warp's Slowloris attack prevention seems to be causing this problem. +I was able to get the test case to not hang by applying +`Warp.setTimeout 1000000000` to the warp settings. + +I guess that, when Warp detects what it thinks is a slowloris attack, +it kills the handling thread in some unusual way. Which prevents the usual +STM exception from being thrown? + +This also explains the InvalidChunkHeaders exception, because the http +server has hung up on the client before sending the expected headers. + +`git-annex get` is triggering the slowloris attack detection because +it connects to the p2phttp server, sends a request, and then is stuck +waiting some long period of time for a worker slot to become available. + +Warp detects a slowloris attack by examining how much network traffic is +flowing. And in this case, no traffic is flowing. + +So the reason this test case triggers the problem is because it's using 1 +GB files! With smaller files, the transfers happen too fast to trigger the +default 30 second timeout. 
+"""]] diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_5_033e449b3fa974fe67c7ce9603baec92._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_5_033e449b3fa974fe67c7ce9603baec92._comment new file mode 100644 index 0000000000..dd7c7940a9 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_5_033e449b3fa974fe67c7ce9603baec92._comment @@ -0,0 +1,29 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-28T18:26:41Z" + content=""" +So, can the slowloris attack prevention just be disabled in p2phttp, +without exposing it to problems due to that attack? + +Well, the slowloris attack is a DDOS that tries to open as many http +connections to the server as possible, and keep them open with as little +bandwidth used as possible. It does so by sending partial request headers +slowly, so the server is stuck waiting to see the full request. + +Given that the p2phttp server is serving large objects, and probably runs +with a moderately low -J value (probably < 100), just opening that many +connections to the server each requesting an object, and consuming a chunk +of the response once per 30 seconds would be enough to work around Warp's +protections against the slowloris attack. Which needs little enough +bandwidth to be a viable attack. + +The client would need authentication to do that though. A slowloris attack +though just sends requests, it does not need to successfully authenticate. + +So it would be better to disable the slowloris attack prevention only after +the request has been authenticated. + +warp provides `pauseTimeout` that can do that, but I'm not sure how +to use it from inside a servant application. +"""]]
analysis
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_2_3caf6da7a34bd6580e365fe01d990f42._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_2_3caf6da7a34bd6580e365fe01d990f42._comment
new file mode 100644
index 0000000000..598c0c7df7
--- /dev/null
+++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_2_3caf6da7a34bd6580e365fe01d990f42._comment
@@ -0,0 +1,55 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-01-28T16:15:03Z"
+ content="""
+`kill -s SIGINT` is not valid syntax (at least not with procps's `kill`), so
+`kill` fails to do anything and a bunch of git-annex processes stack up all
+trying to get the same files. Probably you meant `kill -s INT`
+
+With that said, the busted test case does work in exposing a problem, since
+the git-annex get processes hang.
+
+This behaves the same:
+
+ for x in $(seq 1 5); do git annex get & done
+
+That's without anything interrupting `git-annex get` at any point.
+
+This error is displayed by some of the git-annex get processes,
+and once this has happened as many times as the number of jobs,
+the server is hung:
+
+ HttpExceptionRequest Request {
+ host = "localhost"
+ port = 3001
+ secure = False
+ requestHeaders = [("Accept","application/octet-stream")]
+ path = "/git-annex/8dd1a380-3785-4285-b93d-994e1ccb9fbf/v4/key/SHA256E-s1073741824--52fc7ce3067ad69f3989f7fef817670096f00eab7721884fe606d17b9215d6f5.bin"
+ queryString = "?clientuuid=2ab2859b-d423-4427-bac2-553e18c02197&associatedfile=test1.bin"
+ method = "GET"
+ proxy = Nothing
+ rawBody = False
+ redirectCount = 10
+ responseTimeout = ResponseTimeoutDefault
+ requestVersion = HTTP/1.1
+ proxySecureMode = ProxySecureWithConnect
+ }
+ InvalidChunkHeaders
+
+So this seems very similar to the bug that
+[[!commit f2fed42a090e081bf880dcacc9a25bfa8a0f7d8f]] was supposed to fix.
+Same InvalidChunkHeaders exception indicating the http server response
+thread probably crashed.
+
+And I've verified that when this happens, serveGet's waitfinal starts
+and never finishes, which is why the job slot remains in use.
+
+BTW, InvalidChunkHeaders is a http-client exception, so it seems this
+might involve a problem at the http layer, so with http-client or warp?
+Looking in http-client, it is thrown in 3 situations. 2 are when a read
+from the server yields 0 bytes. The 3rd is when a line is read from the server,
+and the result cannot be parsed as hexadecimal.
+So it seems likely that the http server is crashing in the middle of servicing
+a request. Possibly due to a bug in the http stack.
+"""]]
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_3_1891008a9e901b0801376a870eee8be4._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_3_1891008a9e901b0801376a870eee8be4._comment
new file mode 100644
index 0000000000..d8c8aa4ca3
--- /dev/null
+++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_3_1891008a9e901b0801376a870eee8be4._comment
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2026-01-28T17:01:37Z"
+ content="""
+The hang happens here:
+
+ -- Wait for everything to be transferred before
+ -- stopping the annexworker. The finalv will usually
+ -- be written to at the end. If the client disconnects
+ -- early that does not happen, so catch STM exception.
+ alltransferred <- isRight
+ <$> tryNonAsync (liftIO $ atomically $ takeTMVar finalv)
+
+I think what is happening is that finalv is never getting filled, but for
+whatever reason, STM is also not detecting a deadlock, so this does not fail
+with an exception and waits forever.
+"""]]
add hint about how to get specific files in a hide-missing branch
diff --git a/doc/git-annex-adjust.mdwn b/doc/git-annex-adjust.mdwn index 55f7f646e7..bd2964779d 100644 --- a/doc/git-annex-adjust.mdwn +++ b/doc/git-annex-adjust.mdwn @@ -92,11 +92,15 @@ and will also propagate commits back to the original branch. set the `annex.adjustedbranchrefresh` config. Despite missing files being hidden, `git annex sync --content` will - still operate on them, and can be used to download missing + still operate on them, and can be used to retrieve missing files from remotes. It also updates the adjusted branch after transferring content. - This option can be combined with --unlock, --lock, or --fix. + To retrieve specific missing files, use eg: + `git-annex get --branch=master --include=foo` + + The `--hide-missing` option can be combined with + `--unlock`, `--lock`, or `--fix`. * `--unlock-present`
comment
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_1_cc08641b5ed4562ca29432907cecf5bf._comment b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_1_cc08641b5ed4562ca29432907cecf5bf._comment new file mode 100644 index 0000000000..701aaff753 --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients/comment_1_cc08641b5ed4562ca29432907cecf5bf._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-27T17:40:15Z" + content=""" +This is extremely similar to the bug that +[[!commit f2fed42a090e081bf880dcacc9a25bfa8a0f7d8f]] was supposed to fix. +But that had a small number of gets hang, without any interruptions being +needed to cause it. So I think this is different. + +There was also the similar +[[!commit 1c67f2310a7ca3e4fce183794f0cff2f4f5d1efb]] where an interrupted +drop caused later hangs. +"""]]
fromkey, registerurl: When passed an url, generate a VURL key
diff --git a/CHANGELOG b/CHANGELOG
index 8cea03728a..5537b57b20 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -10,6 +10,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
Secret support.
* fsck: Support repairing a corrupted file in a versioned S3 remote.
* addurl, importfeed: Enable --verifiable by default.
+ * fromkey, registerurl: When passed an url, generate a VURL key.
* unregisterurl: Unregister both VURL and URL keys.
* unregisterurl: Fix display of action to not be "registerurl".
diff --git a/Command/FromKey.hs b/Command/FromKey.hs
index 6649b4110e..5acfd531ed 100644
--- a/Command/FromKey.hs
+++ b/Command/FromKey.hs
@@ -93,7 +93,7 @@ keyOpt = either giveup id . keyOpt'
keyOpt' :: String -> Either String Key
keyOpt' s = case parseURIPortable s of
Just u | not (isKeyPrefix (uriScheme u)) ->
- Right $ Backend.URL.fromUrl s Nothing False
+ Right $ Backend.URL.fromUrl s Nothing True
_ -> case deserializeKey s of
Just k -> Right k
Nothing -> Left $ "bad key/url " ++ s
diff --git a/doc/git-annex-registerurl.mdwn b/doc/git-annex-registerurl.mdwn
index bf5133b8db..bddfa52f80 100644
--- a/doc/git-annex-registerurl.mdwn
+++ b/doc/git-annex-registerurl.mdwn
@@ -15,7 +15,7 @@ No verification is performed of the url's contents.
Normally the key is a git-annex formatted key. However, to make it easier
to use this to add urls, if the key cannot be parsed as a key, and is a
-valid url, an URL key is constructed from the url.
+valid url, a VURL key is constructed from the url.
Registering an url also makes git-annex treat the key as present in the
special remote that claims it. (Usually the web special remote.)
diff --git a/doc/todo/migration_to_VURL_by_default.mdwn b/doc/todo/migration_to_VURL_by_default.mdwn
index 4d658b9c89..6450f5ad7b 100644
--- a/doc/todo/migration_to_VURL_by_default.mdwn
+++ b/doc/todo/migration_to_VURL_by_default.mdwn
@@ -41,6 +41,10 @@ configuration of which kind of keys addurl uses, once VURL is the default.
> > VURL keys. (Registering one without an equivalent key would make no hash
> > verification be done, so no better than an URL key.)
> >
+> > > Wait, if it generates a VURL key with no size, wouldn't it be the
+> > > same as `git-annex addurl --verifiable --relaxed`? Which is fine;
+> > > it's currently the same as `git-annex addurl --relaxed`.
+> >
> > But, I don't think that registerurl/unregisterurl continuing to
> > generate URL keys is a big problem, it should not block making VURL
> > the default in places where it can be default. --[[Joey]]
@@ -49,16 +53,10 @@ configuration of which kind of keys addurl uses, once VURL is the default.
Made --verifiable be the default for addurl and importfeed.
-I want to think more about registerurl and unregisterurl (and fromkey's)
-generation of URL keys though.
-
-unregisterurl could generate from an url both an URL and a VURL and
-unregister both, or whichever is registered. That seems to make sense,
-because which ever might have been registered before, unregisterurl is used
-when the content can no longer be downloaded from the web (or other special
-remote that claims an url).
+Made unregisterurl generate from an url both an URL key and a VURL key,
+and unregister both, or whichever is registered.
-> Implemented this..
+Made registerurl (and fromkey) generate a VURL key that behaves the same
+as addurl --relaxed.
-Could registerurl (and fromkey) generate a VURL key that behaves the same
-as addurl --relaxed? --[[Joey]]
+So all [[done]]! --[[Joey]]
fix docs to match unregisterurl behavior
As implemented, when it's passed an URL key, it also unregisters the
VURL key, and vice-versa.
I think this behavior is ok, since the idea is that the url is no longer
available. And so unregistering both URL and VURL key is ok, since
neither should work to get from it any longer.
diff --git a/CHANGELOG b/CHANGELOG
index 365aac74eb..8cea03728a 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -10,8 +10,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
Secret support.
* fsck: Support repairing a corrupted file in a versioned S3 remote.
* addurl, importfeed: Enable --verifiable by default.
- * unregisterurl: Unregister both VURL and URL keys when passed an url
- instead of a key.
+ * unregisterurl: Unregister both VURL and URL keys.
* unregisterurl: Fix display of action to not be "registerurl".
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/doc/git-annex-unregisterurl.mdwn b/doc/git-annex-unregisterurl.mdwn
index 17964145fa..1a93bac082 100644
--- a/doc/git-annex-unregisterurl.mdwn
+++ b/doc/git-annex-unregisterurl.mdwn
@@ -12,8 +12,7 @@ This plumbing-level command can be used to unregister urls when keys can
no longer be downloaded from them.
Normally the key is a git-annex formatted key. However, when the key cannot
-be parsed as a key, and is a valid url, an URL key and a VURL key are both
-constructed from the url, and both keys are unregistered.
+be parsed as a key, and is a valid url, a key is generated from the url.
Unregistering a key's last web url will make git-annex no longer treat content
as being present in the web special remote. If some other special remote
unregisterurl: Unregister both VURL and URL keys when passed an url instead of a key
The idea with doing both is that unregisterurl is used with an url when
the content of the url is no longer present. So unregistering both makes
sense. And, as git-annex transitions from using URL to VURL by default,
there can be both in repos, and so unregistering both avoids breaking
workflows that used to register URL keys, but are now registering VURL
keys.
diff --git a/Backend/URL.hs b/Backend/URL.hs
index d68b2196e3..a23409fd69 100644
--- a/Backend/URL.hs
+++ b/Backend/URL.hs
@@ -1,13 +1,14 @@
{- git-annex URL backend -- keys whose content is available from urls.
-
- - Copyright 2011-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
module Backend.URL (
backends,
- fromUrl
+ fromUrl,
+ otherUrlKey,
) where
import Annex.Common
@@ -41,3 +42,12 @@ fromUrl url size verifiable = mkKey $ \k -> k
, keyVariety = if verifiable then VURLKey else URLKey
, keySize = size
}
+
+{- From an URL key to a VURL key and vice-versa. -}
+otherUrlKey :: Key -> Maybe Key
+otherUrlKey k
+ | fromKey keyVariety k == URLKey = Just $
+ alterKey k $ \kd -> kd { keyVariety = VURLKey }
+ | fromKey keyVariety k == VURLKey = Just $
+ alterKey k $ \kd -> kd { keyVariety = URLKey }
+ | otherwise = Nothing
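The mapping is a pure involution on the key's variety. A simplified standalone model (Key and KeyVariety here are stand-ins for git-annex's real types, which carry more fields) behaves like the diff above:

```haskell
-- Simplified stand-ins for git-annex's Key machinery, just to show
-- the shape of otherUrlKey: URL <-> VURL, anything else -> Nothing.
data KeyVariety = URLKey | VURLKey | SHA256Key
	deriving (Eq, Show)

data Key = Key
	{ keyVariety :: KeyVariety
	, keyName :: String
	} deriving (Eq, Show)

otherUrlKey :: Key -> Maybe Key
otherUrlKey k = case keyVariety k of
	URLKey -> Just k { keyVariety = VURLKey }
	VURLKey -> Just k { keyVariety = URLKey }
	_ -> Nothing

main :: IO ()
main = do
	let u = Key URLKey "http://example.com/foo"
	print (otherUrlKey u)
	-- applying it twice gets back the original key
	print ((otherUrlKey u >>= otherUrlKey) == Just u)
	print (otherUrlKey (Key SHA256Key "abc"))
```

This is what lets unregisterurl unregister both varieties: compute the other key, and if there is one, run the same unregistration on it.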
diff --git a/CHANGELOG b/CHANGELOG
index 453c8951d4..eedacf7fdc 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -10,6 +10,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
Secret support.
* fsck: Support repairing a corrupted file in a versioned S3 remote.
* addurl, importfeed: Enable --verifiable by default.
+ * unregisterurl: Unregister both VURL and URL keys when passed an url
+ instead of a key.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Command/UnregisterUrl.hs b/Command/UnregisterUrl.hs
index e8bf16c933..adc4e76dd5 100644
--- a/Command/UnregisterUrl.hs
+++ b/Command/UnregisterUrl.hs
@@ -1,6 +1,6 @@
{- git-annex command
-
- - Copyright 2015-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2015-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -12,6 +12,7 @@ module Command.UnregisterUrl where
import Command
import Logs.Web
import Command.RegisterUrl (seekBatch, start, optParser, RegisterUrlOptions(..))
+import Backend.URL
cmd :: Command
cmd = withAnnexOptions [jsonOptions] $ command "unregisterurl"
@@ -26,13 +27,19 @@ seek o = case (batchOption o, keyUrlPairs o) of
unregisterUrl :: Remote -> Key -> String -> Annex ()
unregisterUrl _remote key url = do
+ unregisterUrl' url key
+ maybe noop (unregisterUrl' url) (otherUrlKey key)
+
+unregisterUrl' :: String -> Key -> Annex ()
+unregisterUrl' url key = do
-- Remove the url no matter what downloader;
-- registerurl can set OtherDownloader, and this should also
-- be able to remove urls added by addurl, which may use
-- YoutubeDownloader.
forM_ [minBound..maxBound] $ \dl ->
setUrlMissing key (setDownloader url dl)
- -- Unlike unregisterurl, this does not update location tracking
- -- for remotes other than the web special remote. Doing so with
- -- a remote that git-annex can drop content from would rather
- -- unexpectedly leave content stranded on that remote.
+ -- Unlike registerurl, this does not update location
+ -- tracking for remotes other than the web special remote.
+ -- Doing so with a remote that git-annex can drop content
+ -- from would rather unexpectedly leave content stranded
+ -- on that remote.
diff --git a/doc/git-annex-unregisterurl.mdwn b/doc/git-annex-unregisterurl.mdwn
index e8192f4084..17964145fa 100644
--- a/doc/git-annex-unregisterurl.mdwn
+++ b/doc/git-annex-unregisterurl.mdwn
@@ -11,8 +11,9 @@ git annex unregisterurl `[key url]`
This plumbing-level command can be used to unregister urls when keys can
no longer be downloaded from them.
-Normally the key is a git-annex formatted key. However, if the key cannot be
-parsed as a key, and is a valid url, an URL key is constructed from the url.
+Normally the key is a git-annex formatted key. However, when the key cannot
+be parsed as a key, and is a valid url, an URL key and a VURL key are both
+constructed from the url, and both keys are unregistered.
Unregistering a key's last web url will make git-annex no longer treat content
as being present in the web special remote. If some other special remote
diff --git a/doc/todo/migration_to_VURL_by_default.mdwn b/doc/todo/migration_to_VURL_by_default.mdwn
index 8f47a71f8a..4d658b9c89 100644
--- a/doc/todo/migration_to_VURL_by_default.mdwn
+++ b/doc/todo/migration_to_VURL_by_default.mdwn
@@ -58,5 +58,7 @@ because which ever might have been registered before, unregisterurl is used
when the content can no longer be downloaded from the web (or other special
remote that claims an url).
+> Implemented this..
+
Could registerurl (and fromkey) generate a VURL key that behaves the same
as addurl --relaxed? --[[Joey]]
addurl, importfeed: Enable --verifiable by default
diff --git a/CHANGELOG b/CHANGELOG
index 724ba92b57..453c8951d4 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -9,6 +9,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
using old http servers that use TLS 1.2 without Extended Main
Secret support.
* fsck: Support repairing a corrupted file in a versioned S3 remote.
+ * addurl, importfeed: Enable --verifiable by default.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Command/AddUrl.hs b/Command/AddUrl.hs
index cbfc71c577..5c725df7a5 100644
--- a/Command/AddUrl.hs
+++ b/Command/AddUrl.hs
@@ -1,6 +1,6 @@
{- git-annex command
-
- - Copyright 2011-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -61,7 +61,7 @@ data AddUrlOptions = AddUrlOptions
data DownloadOptions = DownloadOptions
{ relaxedOption :: Bool
- , verifiableOption :: Bool
+ , oldVerifiableOption :: Bool -- no longer configurable
, rawOption :: Bool
, noRawOption :: Bool
, rawExceptOption :: Maybe (DeferredParse Remote)
@@ -101,7 +101,7 @@ parseDownloadOptions withfileoptions = DownloadOptions
<*> switch
( long "verifiable"
<> short 'V'
- <> help "improve later verification of --fast or --relaxed content"
+ <> help "no longer needed, verifiable urls are used by default"
)
<*> switch
( long "raw"
@@ -221,7 +221,7 @@ performRemote addunlockedmatcher r o uri file sz = lookupKey file >>= \case
downloadRemoteFile :: AddUnlockedMatcher -> Remote -> DownloadOptions -> URLString -> OsPath -> Maybe Integer -> Annex (Maybe Key)
downloadRemoteFile addunlockedmatcher r o uri file sz = checkCanAdd o file $ \canadd -> do
- let urlkey = Backend.URL.fromUrl uri sz (verifiableOption o)
+ let urlkey = Backend.URL.fromUrl uri sz True
createWorkTreeDirectory (parentDir file)
ifM (Annex.getRead Annex.fast <||> pure (relaxedOption o))
( do
@@ -351,7 +351,7 @@ downloadWeb :: AddUnlockedMatcher -> DownloadOptions -> URLString -> Url.UrlInfo
downloadWeb addunlockedmatcher o url urlinfo file =
go =<< downloadWith' downloader urlkey webUUID url file
where
- urlkey = addSizeUrlKey urlinfo $ Backend.URL.fromUrl url Nothing (verifiableOption o)
+ urlkey = addSizeUrlKey urlinfo $ Backend.URL.fromUrl url Nothing True
downloader f p = Url.withUrlOptions Nothing $
downloadUrl False urlkey p Nothing [url] f
go Nothing = return Nothing
@@ -395,7 +395,7 @@ downloadWeb addunlockedmatcher o url urlinfo file =
warning (UnquotedString youtubeDlCommand <> " did not download anything")
return Nothing
mediaurl = setDownloader url YoutubeDownloader
- mediakey = Backend.URL.fromUrl mediaurl Nothing (verifiableOption o)
+ mediakey = Backend.URL.fromUrl mediaurl Nothing True
-- Does the already annexed file have the mediaurl
-- as an url? If so nothing to do.
alreadyannexed dest k = do
@@ -443,7 +443,7 @@ startingAddUrl si url o p = starting "addurl" ai si $ do
-- used to prevent two threads running concurrently when that would
-- likely fail.
ai = OnlyActionOn urlkey (ActionItemOther (Just (UnquotedString url)))
- urlkey = Backend.URL.fromUrl url Nothing (verifiableOption (downloadOptions o))
+ urlkey = Backend.URL.fromUrl url Nothing True
showDestinationFile :: OsPath -> Annex ()
showDestinationFile file = do
@@ -546,12 +546,12 @@ nodownloadWeb addunlockedmatcher o url urlinfo file
return Nothing
where
nomedia = do
- let key = Backend.URL.fromUrl url (Url.urlSize urlinfo) (verifiableOption o)
+ let key = Backend.URL.fromUrl url (Url.urlSize urlinfo) True
nodownloadWeb' o addunlockedmatcher url key file
usemedia mediafile = do
let dest = youtubeDlDestFile o file mediafile
let mediaurl = setDownloader url YoutubeDownloader
- let mediakey = Backend.URL.fromUrl mediaurl Nothing (verifiableOption o)
+ let mediakey = Backend.URL.fromUrl mediaurl Nothing True
nodownloadWeb' o addunlockedmatcher mediaurl mediakey dest
youtubeDlDestFile :: DownloadOptions -> OsPath -> OsPath -> OsPath
diff --git a/Command/ImportFeed.hs b/Command/ImportFeed.hs
index e502915c41..e3411c16e8 100644
--- a/Command/ImportFeed.hs
+++ b/Command/ImportFeed.hs
@@ -275,7 +275,7 @@ startDownload addunlockedmatcher opts cache cv todownload = case location todown
Enclosure url -> startdownloadenclosure url
MediaLink linkurl -> do
let mediaurl = setDownloader linkurl YoutubeDownloader
- let mediakey = Backend.URL.fromUrl mediaurl Nothing (verifiableOption (downloadOptions opts))
+ let mediakey = Backend.URL.fromUrl mediaurl Nothing True
-- Old versions of git-annex that used quvi might have
-- used the quviurl for this, so check if it's known
-- to avoid adding it a second time.
diff --git a/doc/backends.mdwn b/doc/backends.mdwn
index c08f3d52e6..2d1328590b 100644
--- a/doc/backends.mdwn
+++ b/doc/backends.mdwn
@@ -57,7 +57,7 @@ in `.gitattributes`:
* `VURL` -- This is like an `URL` (see below) but the content can
be verified with a cryptographically secure checksum that is
recorded in the git-annex branch. It's generated when using
- eg `git-annex addurl --fast --verifiable`.
+ eg `git-annex addurl --fast/--relaxed`.
## non-cryptographically secure backends
@@ -70,15 +70,16 @@ content of an annexed file remains unchanged.
the same filename, size, and modification time has the same content.
This is the least expensive backend, recommended for really large
files or slow systems.
-* `URL` -- This is a key that is generated from the url to a file.
- It's generated when using eg, `git annex addurl --fast`, when the file
- content is not available for hashing.
+* `URL` -- This is a key that is generated from the url to a file.
The key may not contain the full URL; for long URLs, part of the URL may be
represented by a checksum.
The URL key may contain `&` characters; be sure to quote the key if
passing it to a shell script. These types of keys are distinct from URLs/URIs
that may be attached to a key (using any backend) indicating the key's location
- on the web or in one of [[special_remotes]].
+ on the web or in one of [[special_remotes]].
+ Older versions of git-annex generated this when using
+ `git annex addurl --fast/--relaxed`, and `git-annex registerurl` still
+ generates this.
## external backends
diff --git a/doc/git-annex-addurl.mdwn b/doc/git-annex-addurl.mdwn
index c2247e1de9..4bfc3a9dc0 100644
--- a/doc/git-annex-addurl.mdwn
+++ b/doc/git-annex-addurl.mdwn
@@ -45,22 +45,27 @@ be used to get better filenames.
* `--verifiable` `-V`
- This can be used with the `--fast` or `--relaxed` option. It improves
- the safety of the resulting annexed file, by letting its content be
- verified with a checksum when it is transferred between git-annex
- repositories, as well as by things like `git-annex fsck`.
-
- When used with --relaxed, content from the web special remote will
- always be accepted, even if it has changed, and the checksum recorded
- for later verification.
-
- When used with --fast, the checksum is recorded the first time the
- content is downloaded from the web special remote. Once a checksum has
- been recorded, subsequent downloads from the web special remote
- must have the same checksum.
-
- When addurl was used without this option before, the file it added
- can be converted to be verifiable by migrating it to the VURL backend.
+ This option is now enabled by default when using `--fast` or `--relaxed`,
+ but was not the default in older versions of git-annex.
+
+ When a file is added without first downloading its content from the web,
+ the checksum of the file is not yet known.
+
+ To allow later learning and verifying the checksum, the VURL backend is
+ used. With `--fast`, the checksum is learned on initial download of the
+ file from the web, and all subsequent downloads from the web must have
+ the same checksum. With `--relaxed`, additional checksums are added each
+ time different content is downloaded from the web.
+
+ This improves the safety of the resulting annexed file, by letting
+ its content be verified with a checksum when it is transferred between
+ git-annex repositories, as well as by things like `git-annex fsck`.
+
+ Files that were added with old versions of addurl (or with
+ `git-annex registerurl` or `git-annex fromkey`) are not verifiable.
+ They can be converted to verifiable by migrating them from the URL
+ backend to the VURL backend.
+
For example: `git-annex migrate foo --backend=VURL`
* `--raw`
diff --git a/doc/todo/migration_to_VURL_by_default.mdwn b/doc/todo/migration_to_VURL_by_default.mdwn
index 69db574059..8f47a71f8a 100644
--- a/doc/todo/migration_to_VURL_by_default.mdwn
+++ b/doc/todo/migration_to_VURL_by_default.mdwn
@@ -44,3 +44,19 @@ configuration of which kind of keys addurl uses, once VURL is the default.
> > But, I don't think that registerurl/unregisterurl continuing to
> > generate URL keys is a big problem, it should not block making VURL
> > the default in places where it can be default. --[[Joey]]
+
+----
+
+Made --verifiable be the default for addurl and importfeed.
(Diff truncated)
remove duplicate documentation of --fast, --relaxed, etc
diff --git a/doc/git-annex-importfeed.mdwn b/doc/git-annex-importfeed.mdwn index 6a763e0380..134afbd80a 100644 --- a/doc/git-annex-importfeed.mdwn +++ b/doc/git-annex-importfeed.mdwn @@ -41,30 +41,6 @@ resulting in the new url being downloaded to such a filename. These options behave the same as when using [[git-annex-addurl]](1). -* `--fast` - - Avoid immediately downloading urls. The url is still checked - (via HEAD) to verify that it exists, and to get its size if possible. - -* `--relaxed` - - Don't immediately download urls, and avoid storing the size of the - url's content. This makes git-annex accept whatever content is there - at a future point. - -* `--raw` - - Prevent special handling of urls by yt-dlp, bittorrent, and other - special remotes. This will for example, make importfeed - download a .torrent file and not the contents it points to. - -* `--no-raw` - - Require content pointed to by the url to be downloaded using yt-dlp - or a special remote, rather than the raw content of the url. if that - cannot be done, the import will fail, and the next import of the feed - will retry. - * `--scrape` Rather than downloading the url and parsing it as a rss/atom feed
diff --git a/doc/bugs/git_annex_export_--fast_deletes_files_on_remote.mdwn b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote.mdwn new file mode 100644 index 0000000000..ae34d49200 --- /dev/null +++ b/doc/bugs/git_annex_export_--fast_deletes_files_on_remote.mdwn @@ -0,0 +1,47 @@ +### Please describe the problem. + +`git annex export --fast main --to <remote>` deletes existing files on a rsync ssh remote. My mental model was that `--fast` usually instructs git-annex to not do (slow) network connections. + +My use-case for a fast export is to add an existing ssh-accessible non-git directory on an HPC system as a potential data source for a git-annex repository. The repository has additional information like how to retrieve the files from a third-party, while the directory on HPC only has the files (which were downloaded without git-annex involvement already). My plan was to add the directory as an exporttree remote, make git-annex think that the current main branch's tree should be available there via the fast export, and then do a `git annex fsck --from <remote>` to discover what's actually there. Obviously it is very undesirable to lose those files on export then. + +From what I understand I could hack around this if I graft the tree into the git-annex branch and write export.log myself, but I am wondering if I am just encountering a bug and this should work the way I wanted it to. + + +### What steps will reproduce the problem? + +- Create a git-annex repository and add some files +- Create a plain directory with (a subset of) the same filenames in the repository +- Add this directory as an rsync export remote: `git annex initremote <remote> type=rsync rsyncurl=<host>:<path> exporttree=yes encryption=none` +- `git annex export --fast main --to <remote>` +- Observe files being deleted on the remote + + +### What version of git-annex are you using? On what operating system? 
+ +``` +git-annex version: 10.20260115-ge8de977f1d5b5ac57cfe7a0c66d4e1c3ff337af1 +build flags: Assistant Webapp Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV Servant OsPath +dependency versions: aws-0.25.2 bloomfilter-2.0.1.3 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.3 http-client-0.7.19 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + + +[[!tag projects/ICE4]]
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn index f02545f603..fa78722f50 100644 --- a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn @@ -2,7 +2,7 @@ The p2phttp server can get stuck such that it no longer sends responses when client git-annex processes are interrupted. -I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of ~4 deadlocks since the 10.20251114 that happened in regular usage of these instances and that required a server restart to fix. +I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of ~4 deadlocks since the 10.20251114 release that happened in regular usage of these instances and that required a server restart to fix. ### What steps will reproduce the problem?
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn index d2fd77e0be..f02545f603 100644 --- a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn @@ -2,7 +2,7 @@ The p2phttp server can get stuck such that it no longer sends responses when client git-annex processes are interrupted. -I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of \~4 deadlocks since +I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of ~4 deadlocks since the 10.20251114 that happened in regular usage of these instances and that required a server restart to fix. ### What steps will reproduce the problem?
diff --git a/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn new file mode 100644 index 0000000000..d2fd77e0be --- /dev/null +++ b/doc/bugs/p2phttp_can_get_stuck_with_interrupted_clients.mdwn @@ -0,0 +1,177 @@ +### Please describe the problem. + +The p2phttp server can get stuck such that it no longer sends responses when client git-annex processes are interrupted. + +I think this is the cause for deadlocks mih and I have seen (very sporadically) on Forgejo-aneksajo instances. I know of \~4 deadlocks since + + +### What steps will reproduce the problem? + +Create a repository with some data (I used datalad, but plain git-annex should be the same): + +``` +datalad create test-p2phttp-interrupt +cd test-p2phttp-interrupt +for i in $(seq 1 20); do head -c 1G /dev/urandom > test$i.bin; done +datalad save +``` + +Create two clones: + +``` +datalad clone test-p2phttp-interrupt test-p2phttp-interrupt-clone +datalad clone test-p2phttp-interrupt test-p2phttp-interrupt-clone2 +``` + +Make them use p2phttp (run in both clones): + +``` +git config remote.origin.annexUrl 'annex+http://localhost:3001' +``` + +Serve the first repo via p2phttp: + +``` +git annex p2phttp -J2 --debug --bind localhost --port 3001 --wideopen +``` + +In one clone run a get that is constantly interrupted and restarted: + +``` +while true; do +git annex get . & +pid=$! +sleep 5 +kill -s SIGINT $pid +done +``` + +In the other clone just run a regular get: + +``` +git annex get . +``` + +Observation: after letting this run for a while, the gets no longer make any progress. The p2phttp process no longer logs anything new. + +Given my understanding from the previous deadlocks in p2phttp it seems like the worker process that should be used to respond to these requests somehow didn't get released after an interrupted request. + + +### What version of git-annex are you using? On what operating system? 
+
+```
+git-annex version: 10.20260115-ge8de977f1d5b5ac57cfe7a0c66d4e1c3ff337af1
+build flags: Assistant Webapp Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV Servant OsPath
+dependency versions: aws-0.25.2 bloomfilter-2.0.1.3 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.3 http-client-0.7.19 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1
+key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
+remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask
+operating system: linux x86_64
+supported repository versions: 8 9 10
+upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
+local repository version: 10
+```
+
+
+### Please provide any additional information below.
+
+[[!format sh """
+# If you can, paste a complete transcript of the problem occurring here.
+# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
+
+$ git annex p2phttp -J2 --debug --bind localhost --port 3001 --wideopen
+[2026-01-27 15:17:52.122387435] (Utility.Process) process [1704520] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
+[2026-01-27 15:17:52.124778937] (Utility.Process) process [1704520] done ExitSuccess
+[2026-01-27 15:17:52.125127598] (Utility.Process) process [1704521] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
+[2026-01-27 15:17:52.127448536] (Utility.Process) process [1704521] done ExitSuccess
+[2026-01-27 15:17:52.128485775] (Utility.Process) process [1704522] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
+[2026-01-27 15:17:52.131112388] (Annex.Branch) read proxy.log
+[2026-01-27 15:17:56.728686389] (P2P.IO) [http client] [ThreadId 12] P2P > CHECKPRESENT MD5E-s1073741824--4c882b5dc5bbb53d59ab0d4e67e2a3c4.bin
+[2026-01-27 15:17:56.728896008] (P2P.IO) [http server] [ThreadId 15] P2P < CHECKPRESENT MD5E-s1073741824--4c882b5dc5bbb53d59ab0d4e67e2a3c4.bin
+[2026-01-27 15:17:56.729107393] (P2P.IO) [http server] [ThreadId 15] P2P > SUCCESS
+[2026-01-27 15:17:56.729160766] (P2P.IO) [http client] [ThreadId 12] P2P < SUCCESS
+[2026-01-27 15:17:57.093025365] (P2P.IO) [http client] [ThreadId 18] P2P > GET 220011077 test10.bin MD5E-s1073741824--4c882b5dc5bbb53d59ab0d4e67e2a3c4.bin
+[2026-01-27 15:17:57.093145849] (P2P.IO) [http server] [ThreadId 17] P2P < GET 220011077 test10.bin MD5E-s1073741824--4c882b5dc5bbb53d59ab0d4e67e2a3c4.bin
+[2026-01-27 15:17:57.093714738] (Utility.Process) process [1704639] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"]
+[2026-01-27 15:17:57.096671191] (Utility.Process) process [1704639] done ExitSuccess
+[2026-01-27 15:17:57.096984805] (Utility.Process) process [1704640] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/annex/last-index"]
+[2026-01-27 15:17:57.099497] (Utility.Process) process [1704640] done ExitSuccess
+[2026-01-27 15:17:57.100142132] (P2P.IO) [http server] [ThreadId 17] P2P > DATA 853730747
+[2026-01-27 15:17:57.100206771] (P2P.IO) [http client] [ThreadId 18] P2P < DATA 853730747
+[2026-01-27 15:17:59.215559236] (P2P.IO) [http client] [ThreadId 24] P2P > CHECKPRESENT MD5E-s1073741824--e8bb491c04da0917cf1871a4d9f719d2.bin
+[2026-01-27 15:17:59.215654747] (P2P.IO) [http server] [ThreadId 26] P2P < CHECKPRESENT MD5E-s1073741824--e8bb491c04da0917cf1871a4d9f719d2.bin
+[2026-01-27 15:17:59.215723248] (P2P.IO) [http server] [ThreadId 26] P2P > SUCCESS
+[2026-01-27 15:17:59.215761274] (P2P.IO) [http client] [ThreadId 24] P2P < SUCCESS
+[2026-01-27 15:17:59.217064991] (P2P.IO) [http client] [ThreadId 29] P2P > GET 0 test1.bin MD5E-s1073741824--e8bb491c04da0917cf1871a4d9f719d2.bin
+[2026-01-27 15:17:59.217130521] (P2P.IO) [http server] [ThreadId 28] P2P < GET 0 test1.bin MD5E-s1073741824--e8bb491c04da0917cf1871a4d9f719d2.bin
+[2026-01-27 15:17:59.217519652] (P2P.IO) [http server] [ThreadId 28] P2P > DATA 1073741824
+[2026-01-27 15:17:59.21755853] (P2P.IO) [http client] [ThreadId 29] P2P < DATA 1073741824
+[2026-01-27 15:18:00.279578339] (P2P.IO) [http server] [ThreadId 17] P2P > VALID
+[2026-01-27 15:18:00.279785154] (P2P.IO) [http client] [ThreadId 18] P2P < VALID
+[2026-01-27 15:18:00.279818373] (P2P.IO) [http client] [ThreadId 18] P2P > SUCCESS
+[2026-01-27 15:18:00.279862343] (P2P.IO) [http server] [ThreadId 17] P2P < SUCCESS
+[2026-01-27 15:18:00.329523146] (P2P.IO) [http client] [ThreadId 12] P2P > CHECKPRESENT MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin
+[2026-01-27 15:18:00.329702303] (P2P.IO) [http server] [ThreadId 33] P2P < CHECKPRESENT MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin
+[2026-01-27 15:18:00.329825138] (P2P.IO) [http server] [ThreadId 33] P2P > SUCCESS
+[2026-01-27 15:18:00.329871666] (P2P.IO) [http client] [ThreadId 12] P2P < SUCCESS
+[2026-01-27 15:18:00.331456293] (P2P.IO) [http client] [ThreadId 36] P2P > GET 0 test11.bin MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin
+[2026-01-27 15:18:00.331595061] (P2P.IO) [http server] [ThreadId 35] P2P < GET 0 test11.bin MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin
+[2026-01-27 15:18:00.332346826] (P2P.IO) [http server] [ThreadId 35] P2P > DATA 1073741824
+[2026-01-27 15:18:00.332430727] (P2P.IO) [http client] [ThreadId 36] P2P < DATA 1073741824
+[2026-01-27 15:18:01.745659339] (P2P.IO) [http client] [ThreadId 39] P2P > CHECKPRESENT MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin
+[2026-01-27 15:18:01.745775646] (P2P.IO) [http server] [ThreadId 41] P2P < CHECKPRESENT MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin
+[2026-01-27 15:18:01.745896432] (P2P.IO) [http server] [ThreadId 41] P2P > SUCCESS
+[2026-01-27 15:18:01.745947955] (P2P.IO) [http client] [ThreadId 39] P2P < SUCCESS
+[2026-01-27 15:18:02.304886078] (P2P.IO) [http client] [ThreadId 44] P2P > GET 335670329 test11.bin MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin
+[2026-01-27 15:18:02.305117538] (P2P.IO) [http server] [ThreadId 43] P2P < GET 335670329 test11.bin MD5E-s1073741824--a4b23db926fc7b0eed61415c0557d272.bin
+[2026-01-27 15:18:03.331345311] (P2P.IO) [http server] [ThreadId 28] P2P > VALID
+[2026-01-27 15:18:03.331419419] (P2P.IO) [http client] [ThreadId 29] P2P < VALID
+[2026-01-27 15:18:03.331465319] (P2P.IO) [http client] [ThreadId 29] P2P > SUCCESS
+[2026-01-27 15:18:03.331492753] (P2P.IO) [http server] [ThreadId 28] P2P < SUCCESS
+[2026-01-27 15:18:03.331780166] (P2P.IO) [http server] [ThreadId 43] P2P > DATA 738071495
+[2026-01-27 15:18:03.331839961] (P2P.IO) [http client] [ThreadId 44] P2P < DATA 738071495
+[2026-01-27 15:18:06.044717964] (P2P.IO) [http server] [ThreadId 43] P2P > VALID
+[2026-01-27 15:18:06.044806699] (P2P.IO) [http client] [ThreadId 44] P2P < VALID
+[2026-01-27 15:18:06.044861192] (P2P.IO) [http client] [ThreadId 44] P2P > SUCCESS
+[2026-01-27 15:18:06.04490031] (P2P.IO) [http server] [ThreadId 43] P2P < SUCCESS
+
+$ while true; do
+git annex get . &
+pid=$!
+sleep 5
+kill -s SIGINT $pid
+done
+[1] 1704547
+get test10.bin (from origin...)
+ok
+get test11.bin (from origin...)
+29% 294.58 MiB 255 MiB/s 2s [2] 1704749
+(recording state in git...)
+get test11.bin (from origin...)
+ok
+get test12.bin [1]- Interrupt git annex get .
+[3] 1704857
+(recording state in git...)
+get test12.bin [2]- Interrupt git annex get .
+[4] 1704996
+get test12.bin [3]- Interrupt git annex get .
+[5] 1705094
+get test12.bin [4]- Interrupt git annex get .
+[6] 1705191
+get test12.bin [5]- Interrupt git annex get .
+[7] 1705286
+get test12.bin ^C[6]- Interrupt git annex get .
+
+$ git annex get .
+get test1.bin (from origin...)
+ok
+get test10.bin ^C
+
+# End of transcript or log.
+"""]]
+
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+Starting with a DataLad Dataset and by extension git-annex repository is the first thing I do whenever I have to deal with code and/or data that is not some throwaway stuff :)
+
+[[!tag projects/ICE4]]
comment
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_8_8bb8daa43d23d8ffdb652cfeb627b2db._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_8_8bb8daa43d23d8ffdb652cfeb627b2db._comment
new file mode 100644
index 0000000000..55d7c11a3d
--- /dev/null
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_8_8bb8daa43d23d8ffdb652cfeb627b2db._comment
@@ -0,0 +1,15 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 8"""
+ date="2026-01-26T17:01:47Z"
+ content="""
+I thought about making `git-annex export` checksum files before uploading,
+but I don't see why export needs that any more than a regular copy to a
+remote does. In either case, annex.verify will notice the bad content when
+getting from the remote, and fscking the remote will also detect it, and
+now, recover from it.
+
+It seems unlikely to me that the annex object file got truncated before
+it was sent to ds005256 in any case. Seems more likely that the upload
+was somehow not of the whole file.
+"""]]
done
diff --git a/CHANGELOG b/CHANGELOG
index bc1e4875a5..724ba92b57 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -8,6 +8,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* Added annex.security.allow-insecure-https config, which allows
using old http servers that use TLS 1.2 without Extended Main
Secret support.
+ * fsck: Support repairing a corrupted file in a versioned S3 remote.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/doc/todo/recover_from_export_of_corrupted_object.mdwn b/doc/todo/recover_from_export_of_corrupted_object.mdwn
index 9311547310..8383eae5b1 100644
--- a/doc/todo/recover_from_export_of_corrupted_object.mdwn
+++ b/doc/todo/recover_from_export_of_corrupted_object.mdwn
@@ -37,3 +37,5 @@ Could fsck be extended to handle this? It should be possible for fsck to:
--[[Joey]]
[[!tag projects/openneuro]]
+
+> [[done]] --[[Joey]]
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_7_189e551ab3adbd42175bed435452e39b._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_7_189e551ab3adbd42175bed435452e39b._comment
new file mode 100644
index 0000000000..93ae733261
--- /dev/null
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_7_189e551ab3adbd42175bed435452e39b._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 7"""
+ date="2026-01-26T16:56:12Z"
+ content="""
+Finished implementing recovery from a corrupted S3 version id.
+"""]]
comments
diff --git a/doc/todo/drop_from_export_remote/comment_2_528e517b82a9f9a3e0f6ba5c2177e21a._comment b/doc/todo/drop_from_export_remote/comment_2_528e517b82a9f9a3e0f6ba5c2177e21a._comment
new file mode 100644
index 0000000000..8d128e039b
--- /dev/null
+++ b/doc/todo/drop_from_export_remote/comment_2_528e517b82a9f9a3e0f6ba5c2177e21a._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-01-26T16:43:58Z"
+ content="""
+Currently `git-annex fsck --from` an export remote is unable to drop a key
+if it finds corrupted data. Implementing this would also deal with that
+problem.
+"""]]
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_5_5d1aa23a6819b3d169c2c2090cf24041._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_5_5d1aa23a6819b3d169c2c2090cf24041._comment
new file mode 100644
index 0000000000..83f222fa92
--- /dev/null
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_5_5d1aa23a6819b3d169c2c2090cf24041._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2026-01-26T16:41:32Z"
+ content="""
+#1 is not needed for the case of a versioned S3 bucket, because after
+`git-annex fsck --from S3` corrects the problem, `git-annex export --to S3`
+will see that the file is not in S3, and re-upload it.
+
+In the general case, #1 is still needed. I think
+[[todo/drop_from_export_remote]] would solve this, and so no need to deal
+with it here.
+"""]]
diff --git a/doc/forum/How_do_I_prevent_annex-sync_from_eating_my_data__63__.mdwn b/doc/forum/How_do_I_prevent_annex-sync_from_eating_my_data__63__.mdwn
new file mode 100644
index 0000000000..8151752177
--- /dev/null
+++ b/doc/forum/How_do_I_prevent_annex-sync_from_eating_my_data__63__.mdwn
@@ -0,0 +1,46 @@
+How do I prevent annex-sync from eating my data by automatic commits?
+
+I have already set the following:
+
+$ anx config --get annex.autocommit
+
+false
+
+$ anx config --get annex.resolvemerge
+
+false
+
+$ anx config --get annex.synccontent
+
+false
+
+$ anx config --get annex.synconlyannex
+
+true
+
+Workflow is as follows:
+
+- single repo on a PC with mixed locked and unlocked files and an adb special remote.
+
+- make a change to a file on the android device;
+
+- run git annex sync;
+
+Current behavior:
+
+Said file (sometimes; I don't get the logic) gets overwritten on the PC.
+
+All unlocked files are automatically locked.
+
+Oh and if git-annex noticed a conflict and refused to overwrite the file on the android device during export,
+then if I run git annex sync again it overwrites the file on the android device anyway leading to data loss.
+
+What the hell?
+
+Desired behavior:
+
+Apply some conflict resolution strategies if needed and just stage the changes.
+
+Don't actually commit any changes.
+
+Don't eat my data by automatically running git annex export.
add git config for HTTPS with TLS 1.2 w/o EMS
Added annex.security.allow-insecure-https config, which allows using old
HTTPS servers that use TLS 1.2 without Extended Main Secret support.
When git-annex is built with tls-2.0, it will default to not supporting
those. Note that currently, Debian has an older version of the library,
but building with stack will get tls-2.0.
The annex.security.allow-insecure-https name and setting
was chosen to allow supporting other such things in the future.
With that said, I hope that the "tls-1.2-no-EMS" value can be removed from
git-annex at some point in the future. The number of affected HTTPS servers
must be decreasing, and they will eventually get fixed. And this is an ugly
bit of complexity.
Users will I suppose have to find the setting by googling the error
message, which is "peer does not support Extended Main Secret".
It would be possible to catch the exception,
HandshakeFailed (Error_Protocol "peer does not support Extended Main Secret" HandshakeFailure)
but it would be hard to catch it in the right places where the http manager
is used.
The added dependencies on crypton-connection and tls are free,
those were already indirect dependencies.
Sponsored-by: Leon Schuermann
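The Types/GitConfig.hs hunk in the diff below only honors the exact string "tls-1.2-no-EMS"; any other value leaves HTTPS security unchanged. A minimal standalone sketch of that parse (the function name here is illustrative, not git-annex's):

```haskell
-- Model of how annex.security.allow-insecure-https is parsed, per the
-- Types/GitConfig.hs hunk: only the exact value "tls-1.2-no-EMS"
-- enables the insecure TLS mode.
allowInsecureHttps :: Maybe String -> Bool
allowInsecureHttps = (== Just "tls-1.2-no-EMS")

main :: IO ()
main = mapM_ (print . allowInsecureHttps)
    [ Just "tls-1.2-no-EMS"  -- True: the only accepted value
    , Just "true"            -- False: arbitrary values do not enable it
    , Nothing                -- False: config unset
    ]
```

This shape makes future values like the commit message anticipates ("chosen to allow supporting other such things in the future") easy to add as further accepted strings.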
diff --git a/Annex/Url.hs b/Annex/Url.hs
index 6d0cb43767..f08aa1baef 100644
--- a/Annex/Url.hs
+++ b/Annex/Url.hs
@@ -1,7 +1,7 @@
{- Url downloading, with git-annex user agent and configured http
- headers, security restrictions, etc.
-
- - Copyright 2013-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2013-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -48,6 +48,8 @@ import Network.HTTP.Client
import Network.HTTP.Client.TLS
import Text.Read
import qualified Data.Set as S
+import qualified Network.Connection as NC
+import qualified Network.TLS as TLS
defaultUserAgent :: U.UserAgent
defaultUserAgent = "git-annex/" ++ BuildInfo.packageversion
@@ -66,7 +68,7 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
return uo
where
mk = do
- (urldownloader, manager) <- checkallowedaddr
+ (urldownloader, manager) <- mk' =<< Annex.getGitConfig
U.mkUrlOptions
<$> (Just <$> getUserAgent)
<*> headers
@@ -87,7 +89,7 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
pure (remoteAnnexWebOptions gc)
_ -> annexWebOptions <$> Annex.getGitConfig
- checkallowedaddr = words . annexAllowedIPAddresses <$> Annex.getGitConfig >>= \case
+ mk' gc = case words (annexAllowedIPAddresses gc) of
["all"] -> do
curlopts <- map Param <$> getweboptions
allowedurlschemes <- annexAllowedUrlSchemes <$> Annex.getGitConfig
@@ -96,7 +98,7 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
U.DownloadWithCurlRestricted mempty
else U.DownloadWithCurl curlopts
manager <- liftIO $ U.newManager $
- avoidtimeout $ tlsManagerSettings
+ avoidtimeout managersettings
return (urldownloader, manager)
allowedaddrsports -> do
addrmatcher <- liftIO $
@@ -118,7 +120,7 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
then Nothing
else Just (connectionrestricted addr)
(settings, pr) <- liftIO $
- mkRestrictedManagerSettings r Nothing Nothing
+ mkRestrictedManagerSettings r Nothing tlssettings
case pr of
Nothing -> return ()
Just ProxyRestricted -> toplevelWarning True
@@ -130,6 +132,18 @@ getUrlOptions mgc = Annex.getState Annex.urloptions >>= \case
let urldownloader = U.DownloadWithConduit $
U.DownloadWithCurlRestricted r
return (urldownloader, manager)
+ where
+ -- When configured, allow TLS 1.2 without EMS.
+ -- In tls-2.0, the default was changed from
+ -- TLS.AllowEMS to TLS.RequireEMS.
+ tlssettings
+ | annexAllowInsecureHttps gc = Just $
+ NC.TLSSettingsSimple False False False
+ def { TLS.supportedExtendedMainSecret = TLS.AllowEMS }
+ | otherwise = Nothing
+ managersettings = case tlssettings of
+ Nothing -> tlsManagerSettings
+ Just v -> mkManagerSettings v Nothing
 -- http-client defaults to timing out a request after 30 seconds
-- or so, but some web servers are slower and git-annex has its own
diff --git a/CHANGELOG b/CHANGELOG
index 6b44912669..bc1e4875a5 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -5,6 +5,9 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* p2phttp: Commit git-annex branch changes promptly.
* When used with git forges that allow Push to Create, the remote's
annex-uuid is re-probed after the initial push.
+ * Added annex.security.allow-insecure-https config, which allows
+ using old http servers that use TLS 1.2 without Extended Main
+ Secret support.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs
index c31dec617f..5772bebb0c 100644
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -147,6 +147,7 @@ data GitConfig = GitConfig
, annexRetryDelay :: Maybe Seconds
, annexAllowedUrlSchemes :: S.Set Scheme
, annexAllowedIPAddresses :: String
+ , annexAllowInsecureHttps :: Bool
, annexAllowUnverifiedDownloads :: Bool
, annexAllowedComputePrograms :: Maybe String
, annexMaxExtensionLength :: Maybe Int
@@ -268,6 +269,8 @@ extractGitConfig configsource r = GitConfig
getmaybe (annexConfig "security.allowed-ip-addresses")
<|>
getmaybe (annexConfig "security.allowed-http-addresses") -- old name
+ , annexAllowInsecureHttps = (== Just "tls-1.2-no-EMS") $
+ getmaybe (annexConfig "security.allow-insecure-https")
, annexAllowUnverifiedDownloads = (== Just "ACKTHPPT") $
getmaybe (annexConfig "security.allow-unverified-downloads")
, annexAllowedComputePrograms =
diff --git a/debian/control b/debian/control
index 7484f04658..32f6e038e1 100644
--- a/debian/control
+++ b/debian/control
@@ -10,6 +10,7 @@ Build-Depends:
libghc-data-default-dev,
libghc-hslogger-dev,
libghc-crypton-dev,
+ libghc-crypton-connection-dev,
libghc-memory-dev,
libghc-deepseq-dev,
libghc-attoparsec-dev,
@@ -27,6 +28,7 @@ Build-Depends:
libghc-uuid-dev,
libghc-aeson-dev,
libghc-tagsoup-dev,
+ libghc-tls-dev,
libghc-unordered-containers-dev,
libghc-ifelse-dev,
libghc-bloomfilter-dev,
diff --git a/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret.mdwn b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret.mdwn
index 5723e07957..a04be28bbe 100644
--- a/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret.mdwn
+++ b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret.mdwn
@@ -99,3 +99,5 @@ ewen@basadi:~/Music/podcasts$
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Absolutely, I've been using git-annex as my podcatcher (among other reasons) for about a decade at this point. Thanks for developing it!
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_6_1d9a5eeb5c5f4894460dbdb326e1edec._comment b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_6_1d9a5eeb5c5f4894460dbdb326e1edec._comment
new file mode 100644
index 0000000000..8c0e9c4498
--- /dev/null
+++ b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_6_1d9a5eeb5c5f4894460dbdb326e1edec._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 6"""
+ date="2026-01-25T20:19:57Z"
+ content="""
+Finally ran into this myself, and I observed several podcast hosts still
+not supporting EMS even now.
+
+Implemented a config to solve this:
+
+ git config annex.security.allow-insecure-https tls-1.2-no-EMS
+
+I do caution against setting this globally for security reasons. At least not
+without understanding the security implications, which I can't say I do.
+
+Even setting it in a single repo could affect other
+connections by git-annex to eg, API endpoints used for storage.
+
+Personally, I am setting it only when importing feeds from those hosts:
+
+ git -c annex.security.allow-insecure-https=tls-1.2-no-EMS annex importfeed
+"""]]
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index df3f84d8ab..4d4484dc47 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -2246,6 +2246,15 @@ Remotes are configured using these settings in `.git/config`.
If set, this is treated the same as having
annex.security.allowed-ip-addresses set.
+* `annex.security.allow-insecure-https`
+
+ This can be used to loosen the security of the HTTPS implementation.
+
+ Set to "tls-1.2-no-EMS" to allow using TLS 1.2 without Extended Main
+ Secret support. You should do this only when needing to use git-annex
+ with a server that is insecure, and where the security of TLS is not
+ important to you.
+
* `annex.security.allow-unverified-downloads`
For security reasons, git-annex refuses to download content from
diff --git a/git-annex.cabal b/git-annex.cabal
index 5361fc0acd..d4dd436f80 100644
--- a/git-annex.cabal
+++ b/git-annex.cabal
@@ -274,6 +274,8 @@ Executable git-annex
git-lfs (>= 1.2.0),
clock (>= 0.3.0),
crypton,
+ crypton-connection,
+ tls,
(Diff truncated)
workaround
diff --git a/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_5_e1650df4a90fb14cd1b0332bbb2c4e36._comment b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_5_e1650df4a90fb14cd1b0332bbb2c4e36._comment
new file mode 100644
index 0000000000..fb4e080823
--- /dev/null
+++ b/doc/bugs/tls__58___peer_does_not_support_Extended_Main_Secret/comment_5_e1650df4a90fb14cd1b0332bbb2c4e36._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""workaround"""
+ date="2026-01-25T19:27:58Z"
+ content="""
+Workaround: Make git-annex use curl for url downloads. Eg:
+
+ git config annex.security.allowed-ip-addresses all
+ git config annex.web-options --netrc
+
+Note that using curl has other security implications, including letting
+git-annex download from IPs on the LAN.
+"""]]
comment
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment
new file mode 100644
index 0000000000..12ef212684
--- /dev/null
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_5_aea052fe21134d421c184272372e0cd8._comment
@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2026-01-23T20:52:33Z"
+ content="""
+Started implementation in the `repair` branch.
+"""]]
update
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment
index 9cd701d3fb..235402bf24 100644
--- a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment
@@ -6,9 +6,14 @@ If [[todo/drop_from_export_remote]] were implemented that would take care
of #1.
-Since `git-annex fsck` already tells the user what to do when it finds a
-corrupted file on an export remote, and that works for ones not using
-versioning, I think #1 can be left to that todo to solve,
-and #2 be dealt with here. That will be enough to recover the problem
-dataset.
+The user can export a tree that removes the file themselves. fsck even
+suggests doing that when it finds a corrupted file on an exporttree remote,
+since it's unable to drop it in that case.
+
+But notice that the fsck run above does not suggest doing that. Granted,
+with a S3 bucket with versioning, exporting a tree won't remove the
+corrupted version of the file from the remote anyway.
+
+It seems that dealing with #2 here is enough to recover the problem
+dataset, and #1 can be left to that other todo.
"""]]
comments
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment
index b740f91970..ce7ceff9a9 100644
--- a/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_2_7ce55f8dbe9372085508cebc977587bd._comment
@@ -3,7 +3,24 @@
subject="""comment 2"""
date="2025-12-17T18:30:06Z"
content="""
-In a non-export S3 bucket with versioning, fsck also cannot recover from a
-corrupted object, due to the same problem with the versionId. The same
-method should work to handle this case.
+The OpenNeuro dataset ds005256 is a S3 bucket with versioning=yes, and a
+publicurl set, and exporttree=yes. With that combination, when S3
+credentials are not set, the versionId is used, in the public url for downloading.
+
+ git clone https://github.com/OpenNeuroDatasets/ds005256.git
+ git-annex get stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4
+
+Note that this first does a download that fails incomplete with
+"Verification of content failed". Then it complains "Unable to access these
+remotes: s3-PUBLIC". It's trying two different download methods; the second
+one can only work with S3 credentials set.
+
+ git-annex fsck stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4
+ fsck stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4 (fixing location log)
+ ** Based on the location log, stimuli/task-alignvideo/ses-01_run-02_order-01_content-harrymetsally.mp4
+ ** was expected to be present, but its content is missing.
+ failed
+
+Note that this doesn't download, but fails at the checkPresent stage. At that
+point, the HTTP HEAD reports the size of the object, and it's too short.
"""]]
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment
index 1a38db539e..9cd701d3fb 100644
--- a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment
@@ -8,5 +8,7 @@ of #1.
Since `git-annex fsck` already tells the user what to do when it finds a
corrupted file on an export remote, and that works for ones not using
-versioning, I think #1 can be postponed and #2 be dealt with first.
+versioning, I think #1 can be left to that todo to solve,
+and #2 be dealt with here. That will be enough to recover the problem
+dataset.
"""]]
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment
new file mode 100644
index 0000000000..1218d28c06
--- /dev/null
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_4_9f2be5ff2d4225c880eb39831455b2a5._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 4"""
+ date="2026-01-23T17:21:51Z"
+ content="""
+After a *lot* of thought and struggling with layering issues between fsck and
+the S3 remote, here is a design to solve #2:
+
+Add a new method `repairCorruptedKey :: Key -> Annex Bool`
+
+fsck calls this when it finds a remote does not have a key it expected it
+to have, or when it downloads corrupted content.
+
+If `repairCorruptedKey` returns True, it was able to repair a problem, and
+the Key should be able to be downloaded from the remote still. If it
+returns False, it was not able to repair the problem.
+
+Most special remotes will make this `pure False`. For S3 with versioning=yes,
+it will download the object from the bucket, using each recorded versionId.
+Any versionId that does not work will be removed. And return True if any
+download did succeed.
+
+In a case where the object size is right, but it's corrupt,
+fsck will download the object, and then repairCorruptedKey will download it
+a second time. If there were 2 files with the same content, it would end up
+being downloaded 3 times! So this can be pretty expensive,
+but it's simple and will work.
+"""]]
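The repairCorruptedKey design in comment 4 above can be sketched as a tiny pure model (names and types simplified; in git-annex itself this would run in the Annex monad and actually attempt S3 downloads, so the predicate here is a stand-in):

```haskell
import Data.List (partition)

type VersionId = String

-- Simplified model of the comment-4 design: try each recorded
-- versionId, drop the ones whose download fails, and report True
-- when at least one still works. downloadOk stands in for a real
-- download attempt from the S3 bucket.
repairCorruptedKey :: (VersionId -> Bool) -> [VersionId] -> (Bool, [VersionId])
repairCorruptedKey downloadOk recorded =
    let (good, _bad) = partition downloadOk recorded
    in (not (null good), good)

main :: IO ()
main =
    -- one recorded versionId is corrupt, one is still intact
    print (repairCorruptedKey (/= "v-corrupt") ["v-corrupt", "v-intact"])
    -- prints (True,["v-intact"])
```

The Bool result matches the contract described above: True means a problem was repaired and the key should still be downloadable from the remote.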
comment
diff --git a/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment
new file mode 100644
index 0000000000..1a38db539e
--- /dev/null
+++ b/doc/todo/recover_from_export_of_corrupted_object/comment_3_8cbdca7342dace95c69800d3adb37398._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2026-01-23T16:42:49Z"
+ content="""
+If [[todo/drop_from_export_remote]] were implemented that would take care
+of #1.
+
+Since `git-annex fsck` already tells the user what to do when it finds a
+corrupted file on an export remote, and that works for ones not using
+versioning, I think #1 can be postponed and #2 be dealt with first.
+"""]]
comment
diff --git a/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment b/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment
new file mode 100644
index 0000000000..860bb39c4f
--- /dev/null
+++ b/doc/todo/drop_from_export_remote/comment_1_dac4f33da46b3695383df63e88fc4e67._comment
@@ -0,0 +1,21 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2026-01-23T16:46:05Z"
+ content="""
+Rather than altering the exported git tree, it could removeExport and then
+update the export log to say that the export is incomplete.
+
+That would result in a re-export putting the file back on the remote.
+
+It's not uncommon to eg want to `git-annex move foo --from remote`,
+due to it being low on space, or to temporarily make it unavailable,
+and later send the file back to the remote. Supporting drop from export
+remotes in this way would allow for such a workflow, although with the
+difference that `git-annex export` would be needed to put the file back.
+
+It might also be possible to make sending a particular file to an export
+remote succeed when the export to the remote is incomplete and the file is
+in the exported tree. Then `git-annex move foo --to remote` would work to
+put the file back.
+"""]]
Added a comment
diff --git a/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment
new file mode 100644
index 0000000000..befcbe4105
--- /dev/null
+++ b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_4_16b6ef07800c1bff2e01f258b031d9b9._comment
@@ -0,0 +1,14 @@
+[[!comment format=mdwn
+ username="matrss"
+ avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
+ subject="comment 4"
+ date="2026-01-23T14:39:24Z"
+ content="""
+> A balance might be that if it fails to connect to the remote.name.annexUrl, it could re-check it then.
+
+Would this include re-checking when remote.name.annexUrl is unset? That would be necessary in the situations where either the client didn't understand p2phttp when the repository was cloned or when the server-side didn't provide p2phttp yet.
+
+Given that the clone happened in the knowledge that \"dumb http\" was the only supported http protocol and read only, I am now questioning if such an automatic upgrade to p2phttp would really be needed, or even desirable. Dumb http continues to work anyway.
+
+Only re-checking if remote.name.annexUrl is set already would solve the issue of relocating the p2phttp endpoint.
+"""]]
diff --git a/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn
index 7ebff66146..8276b76ba6 100644
--- a/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn
+++ b/doc/bugs/migrate_removes_associated_URLs_with_custom_scheme.mdwn
@@ -140,3 +140,5 @@ I know that this is sort of abusing the URL handling in git-annex, but it was su
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes! It is absolutely great, thank you for it.
+
+[[!tag projects/ICE4]]
Added a comment: Poor Bunny
diff --git a/doc/forum/OSX__39__s_default_sshd_behaviour_has_limited_paths_set/comment_4_fc4bc5c0f4e3f75b862adc517739c334._comment b/doc/forum/OSX__39__s_default_sshd_behaviour_has_limited_paths_set/comment_4_fc4bc5c0f4e3f75b862adc517739c334._comment new file mode 100644 index 0000000000..beeca24e25 --- /dev/null +++ b/doc/forum/OSX__39__s_default_sshd_behaviour_has_limited_paths_set/comment_4_fc4bc5c0f4e3f75b862adc517739c334._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="cxararea" + avatar="http://cdn.libravatar.org/avatar/2718f71ca02c851974140f2a0c457b1b" + subject="Poor Bunny" + date="2026-01-21T07:29:04Z" + content=""" +Another standout feature is replayability. Each run feels different due to <a href=\"https://poorbunnygame.com\">Poor Bunny</a> random trap patterns, and the desire to beat your previous high score creates a strong “one more try” loop. +"""]]
Added a comment: Melon playground - Gaming is good
diff --git a/doc/forum/How_to_register_arguments_for_an_external_special_remote__63__/comment_7_a5a401145f88a0bee78fcf2d1c7befbc._comment b/doc/forum/How_to_register_arguments_for_an_external_special_remote__63__/comment_7_a5a401145f88a0bee78fcf2d1c7befbc._comment new file mode 100644 index 0000000000..ade7a9ada4 --- /dev/null +++ b/doc/forum/How_to_register_arguments_for_an_external_special_remote__63__/comment_7_a5a401145f88a0bee78fcf2d1c7befbc._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="cxararea" + avatar="http://cdn.libravatar.org/avatar/2718f71ca02c851974140f2a0c457b1b" + subject="Melon playground - Gaming is good" + date="2026-01-21T07:26:00Z" + content=""" +One of the most impressive aspects of <a href=\"https://melon-playground.io/online/\">Melon Playground</a> is its physics system. Every action feels meaningful because small changes can lead to very different outcomes. Whether you’re connecting objects, applying pressure, or testing explosions, the results often feel unpredictable and entertaining. This makes experimentation highly addictive, as players are constantly curious to see “what happens if” they try something new. + +The ragdoll physics of the characters add another layer of fun. Watching how they react to impacts, tools, and environmental hazards can be both humorous and fascinating, especially when combined with creative setups. +"""]]
comment
diff --git a/doc/todo/misleading_message_when_ssh_remote_does_not_exist.mdwn b/doc/todo/misleading_message_when_ssh_remote_does_not_exist.mdwn new file mode 100644 index 0000000000..4bd6ba54c3 --- /dev/null +++ b/doc/todo/misleading_message_when_ssh_remote_does_not_exist.mdwn @@ -0,0 +1,13 @@ + joey@darkstar:~/tmp/ben/mom4>git remote add foo localhost:/tmp/foo + joey@darkstar:~/tmp/ben/mom4>git-annex init + init + Unable to parse git config from foo + + Remote foo does not have git-annex installed; setting annex-ignore + + This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote foo + ok + +This message is wrong, git-annex-shell is installed. But since /tmp/foo does not exist, it errors out. + +Maybe `git-annex-shell configlist` should output nothing instead of erroring out in this situation? --[[Joey]]
comment
diff --git a/doc/todo/support_push_to_create/comment_6_4bb1c61505124c34618859c71821a963._comment b/doc/todo/support_push_to_create/comment_6_4bb1c61505124c34618859c71821a963._comment new file mode 100644 index 0000000000..daee1c05f7 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_6_4bb1c61505124c34618859c71821a963._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2026-01-20T19:58:18Z" + content=""" +See <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/103> +"""]]
close
diff --git a/doc/todo/support_push_to_create.mdwn b/doc/todo/support_push_to_create.mdwn index 50d9cd0a65..a2735a934e 100644 --- a/doc/todo/support_push_to_create.mdwn +++ b/doc/todo/support_push_to_create.mdwn @@ -31,3 +31,5 @@ since it would ignore annex-ignore being set, and re-probe the git config to see if a UUID has appeared. That seems a small enough price to pay. The assistant would also need to be made to handle this. --[[Joey]] + +> [[done]] --[[Joey]] diff --git a/doc/todo/support_push_to_create/comment_5_6aeb1f35417bf2035ee3072061df97bf._comment b/doc/todo/support_push_to_create/comment_5_6aeb1f35417bf2035ee3072061df97bf._comment new file mode 100644 index 0000000000..330603153e --- /dev/null +++ b/doc/todo/support_push_to_create/comment_5_6aeb1f35417bf2035ee3072061df97bf._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-20T19:46:27Z" + content=""" +Implemented both. +"""]]
sync, push: push-to-create support
When used with git forges that allow Push to Create, the remote's
annex-uuid is re-probed after the initial push.
This works, but requires the user run git-annex sync or push. If they
opt to manually git push to create the repo, and then use other
git-annex commands, annex-ignore will remain set.
The implementation here is not ideal, the annex-ignore git config gets
unset and may then get re-set if the remote host does not support
git-annex-shell. And the use of remoteList' to regenerate the remote
does extra work. But implementing it this way avoids needing any changes
to Remote.Git, and avoids tying it to that type of remote too.
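The annex-ignore / annex-ignore-auto config flow described in this commit message can be sketched with plain `git config`. This is a hypothetical illustration (the remote name `forge` and paths are made up), not git-annex's actual code path, which lives in `Command/Sync.hs` and `Config.hs` below:

```shell
# Hypothetical illustration of the config dance around push-to-create.
cd "$(mktemp -d)"
git init -q demo && cd demo
git remote add forge /tmp/not-yet-created.git
# Probing the not-yet-existing remote failed, so git-annex set both flags:
git config remote.forge.annex-ignore true
git config remote.forge.annex-ignore-auto true
# After a successful push to a push-to-create forge, both are unset
# and the remote's annex-uuid is re-probed:
git config --unset remote.forge.annex-ignore
git config --unset remote.forge.annex-ignore-auto
git config --get remote.forge.annex-ignore || echo "annex-ignore unset"
```

Because annex-ignore-auto is only set by git-annex itself, a user who set annex-ignore manually is never surprised by it being cleared.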
diff --git a/CHANGELOG b/CHANGELOG
index 112c34db33..6b44912669 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -3,6 +3,8 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* Fix behavior of local git remotes that have annex-ignore
set to be the same as ssh git remotes.
* p2phttp: Commit git-annex branch changes promptly.
+ * When used with git forges that allow Push to Create, the remote's
+ annex-uuid is re-probed after the initial push.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/Command/Sync.hs b/Command/Sync.hs
index e859746f21..e1e9c146f0 100644
--- a/Command/Sync.hs
+++ b/Command/Sync.hs
@@ -1,7 +1,7 @@
{- git-annex command
-
- Copyright 2011 Joachim Breitner <mail@joachim-breitner.de>
- - Copyright 2011-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -264,7 +264,15 @@ seek' :: SyncOptions -> CommandSeek
seek' o = startConcurrency transferStages $ do
let withbranch a = a =<< getCurrentBranch
- remotes <- syncRemotes (syncWith o)
+ mc <- mergeConfig (allowUnrelatedHistories o)
+
+ unless (cleanupOption o) $
+ includeactions
+ [ [ commit o ]
+ , [ withbranch (mergeLocal mc o) ]
+ ]
+
+ remotes <- mapM (pushToCreate o) =<< syncRemotes (syncWith o)
warnSyncContentTransition o remotes
-- Remotes that git can push to and pull from.
let gitremotes = filter Remote.gitSyncableRemote remotes
@@ -277,16 +285,8 @@ seek' o = startConcurrency transferStages $ do
commandAction (withbranch cleanupLocal)
mapM_ (commandAction . withbranch . cleanupRemote) gitremotes
else do
- mc <- mergeConfig (allowUnrelatedHistories o)
-
- -- Syncing involves many actions, any of which
- -- can independently fail, without preventing
- -- the others from running.
- -- These actions cannot be run concurrently.
- mapM_ includeCommandAction $ concat
- [ [ commit o ]
- , [ withbranch (mergeLocal mc o) ]
- , map (withbranch . pullRemote o mc) gitremotes
+ includeactions
+ [ map (withbranch . pullRemote o mc) gitremotes
, [ mergeAnnex ]
]
@@ -325,8 +325,8 @@ seek' o = startConcurrency transferStages $ do
-- git-annex branch on the remotes in the
-- meantime, so pull and merge again to
-- avoid our push overwriting those changes.
- when (syncedcontent || exportedcontent) $ do
- mapM_ includeCommandAction $ concat
+ when (syncedcontent || exportedcontent) $
+ includeactions
[ map (withbranch . pullRemote o mc) gitremotes
, [ commitAnnex, mergeAnnex ]
]
@@ -334,6 +334,12 @@ seek' o = startConcurrency transferStages $ do
void $ includeCommandAction $ withbranch $ pushLocal o
-- Pushes to remotes can run concurrently.
mapM_ (commandAction . withbranch . pushRemote o) gitremotes
+ where
+ -- Syncing involves many actions, any of which
+ -- can independently fail, without preventing
+ -- the others from running.
+ -- These actions cannot be run concurrently.
+ includeactions = mapM_ includeCommandAction . concat
{- Merging may delete the current directory, so go to the top
- of the repo. This also means that sync always acts on all files in the
@@ -1188,3 +1194,43 @@ exportHasAnnexObjects = annexObjects . Remote.config
isThirdPartyPopulated :: Remote -> Bool
isThirdPartyPopulated = Remote.thirdPartyPopulated . Remote.remotetype
+
+{- Support for push-to-create of git repositories.
+ -
+ - When the remote does not exist yet, annex-ignore and
+ - annex-ignore-auto will be set. In that case, try to push.
+ -
+ - After a successful push, clear annex-ignore and regenerate the remote.
+ - That may re-set annex-ignore. Then annex-ignore-auto is cleared, so
+ - this will not run again, even when annex-ignore remains set.
+ -}
+pushToCreate :: SyncOptions -> Remote -> Annex Remote
+pushToCreate o r
+ | not (pushOption o) = return r
+ | Remote.gitSyncableRemote r && remoteAnnexIgnoreAuto (Remote.gitconfig r) =
+ ifM (liftIO $ getDynamicConfig $ remoteAnnexIgnore $ Remote.gitconfig r)
+ ( getCurrentBranch >>= \case
+ currbranch@(Just _, _) -> do
+ pushed <- includeCommandAction $
+ pushRemote o r currbranch
+ if pushed
+ then do
+ repo <- Remote.getRepo r
+ unsetRemoteIgnore repo
+ reloadConfig
+ r' <- regenremote
+ unsetRemoteIgnoreAuto repo
+ return r'
+ else return r
+ _ -> return r
+ , return r
+ )
+ | otherwise = return r
+ where
+ regenremote = do
+ -- Regenerating the remote list involves some extra work,
+ -- but push-to-create only happens once per remote.
+ rs <- Remote.remoteList' False
+ case filter (\r' -> Remote.name r' == Remote.name r) rs of
+ (r':_) -> return r'
+ _ -> return r
diff --git a/Config.hs b/Config.hs
index 892c49d4a5..11e2744648 100644
--- a/Config.hs
+++ b/Config.hs
@@ -1,6 +1,6 @@
{- Git configuration
-
- - Copyright 2011-2023 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -78,6 +78,15 @@ setRemoteAvailability r c = setConfig (remoteAnnexConfig r "availability") (show
setRemoteIgnore :: Git.Repo -> Bool -> Annex ()
setRemoteIgnore r b = setConfig (remoteAnnexConfig r "ignore") (Git.Config.boolConfig b)
+unsetRemoteIgnore :: Git.Repo -> Annex ()
+unsetRemoteIgnore r = unsetConfig (remoteAnnexConfig r "ignore")
+
+setRemoteIgnoreAuto :: Git.Repo -> Bool -> Annex ()
+setRemoteIgnoreAuto r b = setConfig (remoteAnnexConfig r "ignore-auto") (Git.Config.boolConfig b)
+
+unsetRemoteIgnoreAuto :: Git.Repo -> Annex ()
+unsetRemoteIgnoreAuto r = unsetConfig (remoteAnnexConfig r "ignore-auto")
+
setRemoteBare :: Git.Repo -> Bool -> Annex ()
setRemoteBare r b = setConfig (remoteAnnexConfig r "bare") (Git.Config.boolConfig b)
diff --git a/Remote/Git.hs b/Remote/Git.hs
index 36ebf53c65..f2c5206648 100644
--- a/Remote/Git.hs
+++ b/Remote/Git.hs
@@ -368,6 +368,7 @@ tryGitConfigRead gc autoinit r hasuuid
when longmessage $
warning $ UnquotedString $ "This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote " ++ n
setremote setRemoteIgnore True
+ setremote setRemoteIgnoreAuto True
setremote setter v = case Git.remoteName r of
Nothing -> noop
diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs
index a33d8a9dca..c31dec617f 100644
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -396,6 +396,7 @@ globalConfigs =
data RemoteGitConfig = RemoteGitConfig
{ remoteAnnexCost :: DynamicConfig (Maybe Cost)
, remoteAnnexIgnore :: DynamicConfig Bool
+ , remoteAnnexIgnoreAuto :: Bool
, remoteAnnexSync :: DynamicConfig Bool
, remoteAnnexPull :: Bool
, remoteAnnexPush :: Bool
@@ -477,6 +478,7 @@ extractRemoteGitConfig r remotename = do
return $ RemoteGitConfig
{ remoteAnnexCost = annexcost
, remoteAnnexIgnore = annexignore
+ , remoteAnnexIgnoreAuto = getbool IgnoreAutoField False
, remoteAnnexSync = annexsync
, remoteAnnexPull = getbool PullField True
, remoteAnnexPush = getbool PushField True
@@ -586,6 +588,7 @@ data RemoteGitConfigField
= CostField
| CostCommandField
| IgnoreField
+ | IgnoreAutoField
| IgnoreCommandField
| SyncField
| SyncCommandField
@@ -659,6 +662,7 @@ remoteGitConfigField = \case
CostField -> inherited True "cost"
(Diff truncated)
comment
diff --git a/doc/todo/support_push_to_create/comment_4_f0b998bee7dafad3485348435c8362af._comment b/doc/todo/support_push_to_create/comment_4_f0b998bee7dafad3485348435c8362af._comment new file mode 100644 index 0000000000..27cd636616 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_4_f0b998bee7dafad3485348435c8362af._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-20T18:48:32Z" + content=""" +The user might manually `git push`, knowing push-to-create is a thing, +but do it after `git-annex init`, and so annex-ignore is already set +and will stay set until they `git-annex push`. Which they may never do. + +To deal with this, when annex-ignore-auto is set, Remote.Git could check if +the remote tracking branch exists. If so, unset annex-ignore-auto and +annex-ignore and re-run the uuid probing. +"""]]
comment
diff --git a/doc/todo/support_push_to_create/comment_3_b07c7c7453e913b70e1608c38b08f885._comment b/doc/todo/support_push_to_create/comment_3_b07c7c7453e913b70e1608c38b08f885._comment new file mode 100644 index 0000000000..c503271998 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_3_b07c7c7453e913b70e1608c38b08f885._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-20T15:49:26Z" + content=""" +Here's a better plan: annex-ignore remains the config, but +annex-ignore-auto is set when git-annex sets annex-ignore. +If the user manually sets annex-ignore, they don't set +annex-ignore-auto. + +Then, `git-annex push` can check if push-to-create happened +and unset annex-ignore iff annex-ignore-auto is set. +"""]]
comment
diff --git a/doc/todo/support_push_to_create/comment_2_09a3a43c73e6d581727d804aa27b8e42._comment b/doc/todo/support_push_to_create/comment_2_09a3a43c73e6d581727d804aa27b8e42._comment new file mode 100644 index 0000000000..8f428c2993 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_2_09a3a43c73e6d581727d804aa27b8e42._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2026-01-20T15:16:46Z" + content=""" +Problem with a new config (call it annex-ignore-auto) is that users may +have learned to unset annex-ignore when there was a problem that got +corrected, and would need to learn to unset annex-ignore-auto instead. +While `git-annex push` would do it for them, they might not use that. + +Is this disruptive change worth it to support push-to-create? Probably. +But it does make the option of checking before push and after push and +unsetting annex-ignore seem more appealing. + +The situation where 2 users are doing push to create of the same remote +repo at the same time is very unlikely to happen. And currently what +happens is that both have to unset annex-ignore. A change that makes only +one of them but not the other need to unset it is not making things worse. +"""]]
comment
diff --git a/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_4_658fdda65fd7560de29803d46a3af22e._comment b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_4_658fdda65fd7560de29803d46a3af22e._comment new file mode 100644 index 0000000000..5af3437cc4 --- /dev/null +++ b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_4_658fdda65fd7560de29803d46a3af22e._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-19T18:57:00Z" + content=""" +p2phttp is fixed in master to commit the git-annex branch promptly. +"""]]
p2phttp: Commit git-annex branch changes promptly
Changes were piling up in the journal until p2phttp exited or another
git-annex command committed them. That could lead to situations where one
client made a change to the server, but didn't push the git-annex branch to
it, and so another client would be unaware of the change.
Rather than make a commit after every change, wait until the server has
been idle for 1 second, and then commit. This way, when a client is making
several changes, eg sending multiple files, it will wait until the end to
commit.
1 second was chosen as a time that is:
A) Short enough that no user is likely to notice that the server
waits this long before committing.
B) Long enough that a git-annex command that makes multiple changes to
the server is unlikely to wait this long after one change finishes
before sending the next change.
An example situation where B does not hold is `git-annex copy --to origin`
in a large repository, where the first and last file are not in the server,
and the rest are. So it takes more than 1 second after sending the first
file to get to sending the last file. An extra git-annex branch commit
happens then.
An example situation where A does not hold would have to be something
where the same user (or an automated process) makes a change to the server
in one clone, and then immediately pulls the git-annex branch in another
clone and expects it to reflect the change. That's possible, but in any
situation where there are two different users, 1 second is plenty of time.
And of course, when the same user is doing both, they only need to push the
git-annex branch to the server before pulling it to avoid any timing
issues.
It is possible that a server has so much change activity that it is never
left idle, and so never commits. A low bandwidth series of uploads, for
example. It would be possible to commit after N minutes even when not idle,
but I don't know what would be a good value for N. And any value in minutes
would be too long to satisfy A in any case.
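A toy sketch of the in-progress counter behind that idle detection (plain shell, no git-annex involved; the real implementation in `P2P/Http/State.hs` below uses STM): each request start bumps a counter, each finish drops it, and the journal commit only fires once the counter returns to zero.

```shell
# Toy model of branchCommitter's idle detection.
n=0
event() {
  case "$1" in
    start)  n=$((n + 1)) ;;
    finish) n=$((n - 1)) ;;
  esac
  if [ "$n" -eq 0 ]; then
    echo "idle: commit journal after 1s delay"
  fi
}
# Two overlapping uploads produce a single commit at the end:
event start; event start; event finish; event finish
```

This is why a batch of files sent by one `git annex push` coalesces into a single git-annex branch commit rather than one commit per file.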
diff --git a/CHANGELOG b/CHANGELOG
index 1e1cd87ee7..112c34db33 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -2,6 +2,7 @@ git-annex (10.20260116) UNRELEASED; urgency=medium
* Fix behavior of local git remotes that have annex-ignore
set to be the same as ssh git remotes.
+ * p2phttp: Commit git-annex branch changes promptly.
-- Joey Hess <id@joeyh.name> Mon, 19 Jan 2026 10:55:02 -0400
diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs
index 522ad944e1..b7f773301a 100644
--- a/P2P/Http/Server.hs
+++ b/P2P/Http/Server.hs
@@ -251,7 +251,7 @@ serveRemove
-> IsSecure
-> Maybe Auth
-> Handler t
-serveRemove st resultmangle su apiver (B64Key k) cu bypass sec auth = do
+serveRemove st resultmangle su apiver (B64Key k) cu bypass sec auth = changesBranch st su $ do
res <- withP2PConnection apiver WorkerPoolRunner st cu su bypass sec auth RemoveAction id
$ \(conn, _) ->
liftIO $ proxyClientNetProto conn $ remove Nothing k
@@ -273,7 +273,7 @@ serveRemoveBefore
-> IsSecure
-> Maybe Auth
-> Handler RemoveResultPlus
-serveRemoveBefore st su apiver (B64Key k) cu bypass (Timestamp ts) sec auth = do
+serveRemoveBefore st su apiver (B64Key k) cu bypass (Timestamp ts) sec auth = changesBranch st su $ do
res <- withP2PConnection apiver WorkerPoolRunner st cu su bypass sec auth RemoveAction id
$ \(conn, _) ->
liftIO $ proxyClientNetProto conn $
@@ -320,7 +320,7 @@ servePut
-> IsSecure
-> Maybe Auth
-> Handler t
-servePut mst resultmangle su apiver (Just True) _ k cu bypass baf _ _ sec auth = do
+servePut mst resultmangle su apiver (Just True) _ k cu bypass baf _ _ sec auth = changesBranch mst su $ do
res <- withP2PConnection' apiver WorkerPoolRunner mst cu su bypass sec auth WriteAction
(\cst -> cst { connectionWaitVar = False }) (liftIO . protoaction)
servePutResult resultmangle res
@@ -328,7 +328,7 @@ servePut mst resultmangle su apiver (Just True) _ k cu bypass baf _ _ sec auth =
protoaction conn = servePutAction conn k baf $ \_offset -> do
net $ sendMessage DATA_PRESENT
checkSuccessPlus
-servePut mst resultmangle su apiver _datapresent (DataLength len) k cu bypass baf moffset stream sec auth = do
+servePut mst resultmangle su apiver _datapresent (DataLength len) k cu bypass baf moffset stream sec auth = changesBranch mst su $ do
validityv <- liftIO newEmptyTMVarIO
let validitycheck = local $ runValidityCheck $
liftIO $ atomically $ readTMVar validityv
diff --git a/P2P/Http/State.hs b/P2P/Http/State.hs
index 29355c4851..d817a5e270 100644
--- a/P2P/Http/State.hs
+++ b/P2P/Http/State.hs
@@ -2,7 +2,7 @@
-
- https://git-annex.branchable.com/design/p2p_protocol_over_http/
-
- - Copyright 2024-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2024-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -36,6 +36,7 @@ import Utility.HumanTime
import Logs.Proxy
import Annex.Proxy
import Annex.Cluster
+import qualified Annex.Branch
import qualified P2P.Proxy as Proxy
import qualified Types.Remote as Remote
import Remote.List
@@ -82,6 +83,7 @@ data PerRepoServerState = PerRepoServerState
, getServerMode :: GetServerMode
, openLocks :: TMVar (M.Map LockID Locker)
, lockedFilesQSem :: LockedFilesQSem
+ , branchChangesInProgress :: TMVar Bool
}
type AnnexWorkerPool = TMVar (WorkerPool (Annex.AnnexState, Annex.AnnexRead))
@@ -90,7 +92,7 @@ type GetServerMode = IsSecure -> Maybe Auth -> ServerMode
data ServerMode
= ServerMode
- { serverMode :: P2P.ServerMode
+ { serverMode :: P2P.ServerMode
, unauthenticatedLockingAllowed :: Bool
, authenticationAllowed :: Bool
}
@@ -105,6 +107,7 @@ mkPerRepoServerState acquireconn annexworkerpool annexstate annexread getserverm
<*> pure getservermode
<*> newTMVarIO mempty
<*> pure lockedfilesqsem
+ <*> newEmptyTMVarIO
data ActionClass = ReadAction | WriteAction | RemoveAction | LockAction
deriving (Eq)
@@ -318,15 +321,18 @@ mkP2PHttpServerState getservermode updaterepos proxyconnectionpoolsize clusterco
proxypool <- liftIO $ newTMVarIO (0, mempty)
asyncservicer <- liftIO $ async $
servicer myuuid myproxies proxypool reqv relv endv
- let endit = do
- liftIO $ atomically $ putTMVar endv ()
- liftIO $ wait asyncservicer
let servinguuids = myuuid : map proxyRemoteUUID (maybe [] S.toList myproxies)
annexstate <- liftIO . newTMVarIO =<< dupState
annexread <- Annex.getRead id
st <- liftIO $ mkPerRepoServerState
(acquireconn reqv annexstate annexread)
workerpool annexstate annexread getservermode lockedfilesqsem
+ asynccommitter <- liftIO $ async $
+ branchCommitter st endv
+ let endit = do
+ liftIO $ atomically $ putTMVar endv ()
+ liftIO $ wait asyncservicer
+ liftIO $ wait asynccommitter
return $ P2PHttpServerState
{ servedRepos = M.fromList $ zip servinguuids (repeat st)
, serverShutdownCleanup = endit
@@ -347,7 +353,7 @@ mkP2PHttpServerState getservermode updaterepos proxyconnectionpoolsize clusterco
`orElse`
(Left . Right <$> takeTMVar relv)
`orElse`
- (Left . Left <$> takeTMVar endv)
+ (Left . Left <$> readTMVar endv)
case reqrel of
Right (runnertype, annexstate, annexread, connparams, ready, respvar) -> do
servicereq runnertype annexstate annexread myuuid myproxies proxypool relv connparams ready
@@ -818,3 +824,56 @@ proxyConnectionPoolKey connparams =
, connectionBypass connparams
, connectionProtocolVersion connparams
)
+
+-- Use when running an action which may journal git-annex branch changes.
+-- This arranges for the journalled changes to be committed to the branch
+-- in a timely fashion, so that eg, soon after one client has sent a file,
+-- another client can pull the branch and see that the file is present in
+-- the server.
+changesBranch :: TMVar P2PHttpServerState -> B64UUID ServerSide -> Handler t -> Handler t
+changesBranch mstv su a = liftIO (getPerRepoServerState mstv su) >>= \case
+ Just st -> bracket_ (send st True) (send st False) a
+ Nothing -> a
+ where
+ send st b = liftIO $ atomically $
+ putTMVar (branchChangesInProgress st) b
+
+branchCommitter :: PerRepoServerState -> TMVar () -> IO ()
+branchCommitter st endv = do
+ idlev <- newEmptyTMVarIO
+ void $ async $ committer idlev
+ go idlev (0 :: Integer)
+ where
+ waitchangeorend = (Right <$> takeTMVar (branchChangesInProgress st))
+ `orElse` (Left <$> readTMVar endv)
+ go idlev n = atomically waitchangeorend >>= \case
+ Right True -> do
+ let !n' = succ n
+ -- Not idle.
+ void $ atomically $ tryTakeTMVar idlev
+ go idlev n'
+ Right False -> do
+ let n' = pred n
+ when (n' == 0) $
+ -- Idle.
+ atomically $ writeTMVar idlev ()
+ go idlev n'
+ Left () -> return ()
+ waitidleorend idlev =
+ (Right <$> readTMVar idlev)
+ `orElse` (Left <$> readTMVar endv)
+ committer idlev =
+ -- Wait until a change has completed and it's idle.
+ atomically (waitidleorend idlev) >>= \case
+ Right () -> do
+ threadDelaySeconds (Seconds 1)
+ -- Once it's been idle for a second,
+ -- commit the journalled changes.
+ atomically (tryTakeTMVar idlev) >>= \case
+ Just () ->
+ void $ handleRequestAnnex st $
+ Annex.Branch.commit =<< Annex.Branch.commitMessage
+ Nothing -> noop
+ committer idlev
+ Left () -> return ()
+
diff --git a/doc/bugs/p2phttp_timely_journal_commit.mdwn b/doc/bugs/p2phttp_timely_journal_commit.mdwn
index c0ae1c3217..65c7dfec27 100644
--- a/doc/bugs/p2phttp_timely_journal_commit.mdwn
+++ b/doc/bugs/p2phttp_timely_journal_commit.mdwn
@@ -11,3 +11,5 @@ but does not ever push its git-annex branch, other clients will never learn
that the repository has a copy of the file. --[[Joey]]
[[!tag projects/INM7]]
+
+> [[fixed|done]] --[[Joey]]
comment
diff --git a/doc/bugs/p2phttp_timely_journal_commit/comment_1_59b8cebeae88d50dbcf410ee8bec4a75._comment b/doc/bugs/p2phttp_timely_journal_commit/comment_1_59b8cebeae88d50dbcf410ee8bec4a75._comment new file mode 100644 index 0000000000..8d99d8f218 --- /dev/null +++ b/doc/bugs/p2phttp_timely_journal_commit/comment_1_59b8cebeae88d50dbcf410ee8bec4a75._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-19T15:37:59Z" + content=""" +It could simply commit after each change. +But that would bloat the git-annex branch with a lot of small commits when +a lot of files are being sent to the server in one batch. + +I think what probably makes sense is to detect when the p2phttp +server has been idle for some amount of time, and commit then. +A few seconds idle should be enough to coalesce everything done by +a typical `git annex push` into a single git-annex branch commit. +"""]]
respond, open bug
diff --git a/doc/bugs/p2phttp_timely_journal_commit.mdwn b/doc/bugs/p2phttp_timely_journal_commit.mdwn new file mode 100644 index 0000000000..c0ae1c3217 --- /dev/null +++ b/doc/bugs/p2phttp_timely_journal_commit.mdwn @@ -0,0 +1,13 @@ +`git-annex p2phttp`, when eg receiving files into the repository, leaves +git-annex location log changes in the journal and does not commit them to +the git-annex branch in a timely fashion. + +Usually git-annex branch commits happen when a git-annex command finishes, +but p2phttp runs for a long time. So a commit won't happen until it's +restarted or some other git-annex command is run in the repo. + +This causes problems. Ie, if one client copies a file to the repository, +but does not ever push its git-annex branch, other clients will never learn +that the repository has a copy of the file. --[[Joey]] + +[[!tag projects/INM7]] diff --git a/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_3_06a8830d23ffd33d425147bf859005bc._comment b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_3_06a8830d23ffd33d425147bf859005bc._comment new file mode 100644 index 0000000000..841744c287 --- /dev/null +++ b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_3_06a8830d23ffd33d425147bf859005bc._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-19T15:10:44Z" + content=""" +This seems like a bug in the p2phttp server, it should not be leaving the +git-annex branch uncommitted for long periods of time. It's easy enough to +show that it leaves changes in the journal for a long time. + +Probably we don't usually notice the bug because usually, if the p2phttp server +doesn't commit the journal, the client will record the same information +in the git-annex branch on its side, and push it out in the normal course +of events, eg during a sync. I assume your JS client doesn't do that.
+ +I've filed a bug: [[bugs/p2phttp_timely_journal_commit]] + +(As to the p2phttp clientuuid parameter, it is actually only used in transfer +logs, which don't get into the git-annex branch. Using a made-up non-UUID there, +or for that matter, using a UUID that "belongs" to someone else won't cause +any real problem. (`git-annex info` will use the non-UUID in the "transfers +in progress" display). This does not seem related to your problem.) +"""]]
Revert "remove incorrect sentance"
This reverts commit fcb2b19910dab3c9a4a1149ca966940bed130b17.
Actually, the docs are correct. It works for a ssh remote. There is a
bug preventing it from working as documented with a local git remote
though.
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 297a7ed7b3..7ec1efb314 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1580,6 +1580,7 @@ Remotes are configured using these settings in `.git/config`. If set to `true`, prevents git-annex from storing or retrieving annexed file contents on this remote by default. + (You can still request it be used with the `--from` and `--to` options.) This is, for example, useful if the remote is located somewhere without git-annex-shell. (For example, if it's on GitHub).
Added a comment: Mismatch with observations
diff --git a/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_2_c85202a4ed5022a08e4b390cb9eb5f29._comment b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_2_c85202a4ed5022a08e4b390cb9eb5f29._comment
new file mode 100644
index 0000000000..2c1a78eeae
--- /dev/null
+++ b/doc/forum/Find_never__40____33____41___used_files_in_annex__63__/comment_2_c85202a4ed5022a08e4b390cb9eb5f29._comment
@@ -0,0 +1,71 @@
+[[!comment format=mdwn
+ username="mih"
+ avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd"
+ subject="Mismatch with observations"
+ date="2026-01-16T15:30:17Z"
+ content="""
+Thanks for detailing the behavior. I am observing something different, though. The context is a git-annex repo at a forgejo-aneksajo site.
+
+I used a JS client to upload annex keys to an annex with uuid `f1a8ef1c-...`. This worked. I see them in `annex/objects` at the remote:
+
+```
+git@loki:~/git/repositories/internal/pool-files.git$ tree annex/objects/
+annex/objects/
+|-- d73
+|   `-- 370
+|       `-- SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.1
+|           `-- SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.1
+|-- db2
+|   `-- f4b
+|       `-- SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.svg
+|           `-- SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.svg
+|-- dc7
+|   `-- 005
+|       `-- SHA256E-s12027--6f4d23344053ca2f22c4a40eec33e178b70e67d63edd1c5e4f05d96053548b69.png
+|           `-- SHA256E-s12027--6f4d23344053ca2f22c4a40eec33e178b70e67d63edd1c5e4f05d96053548b69.png
+`-- fa0
+    `-- d63
+        `-- SHA256E-s324254--d6581b4f13219fe93aa8b020df8ec8875881c5d97f28493c6b7b7ac9e80c2532.png
+            `-- SHA256E-s324254--d6581b4f13219fe93aa8b020df8ec8875881c5d97f28493c6b7b7ac9e80c2532.png
+```
+
+I also see this:
+
+```
+git@...:~/git/repositories/internal/pool-files.git$ find . -name '*f1a8ef1c-...*'
+./annex/transfer/upload/f1a8ef1c-...
+git@...:~/git/repositories/internal/pool-files.git$ grep -R 'f1a8ef1c-6d8a-40e3-970f-4634390d961f' .
+./annex/journal/db2_f4b_SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.svg.log:1766077308s 1 f1a8ef1c-...
+./annex/journal/dc7_005_SHA256E-s12027--6f4d23344053ca2f22c4a40eec33e178b70e67d63edd1c5e4f05d96053548b69.png.log:1766058908s 1 f1a8ef1c-...
+./annex/journal/fa0_d63_SHA256E-s324254--d6581b4f13219fe93aa8b020df8ec8875881c5d97f28493c6b7b7ac9e80c2532.png.log:1766060307s 1 f1a8ef1c-...
+./annex/journal/d73_370_SHA256E-s61542--80e2f39feca014a9d6cbaa510e37d96c4663500a1cf6dd74a4cd10b42f3e9169.1.log:1766077792s 1 f1a8ef1c-...
+./config: uuid = f1a8ef1c-...
+```
+
+This made me (incorrectly) wonder whether the repo thinks the upload came FROM f1a8ef1c-...?
+
+The p2phttp request is made to an endpoint that is composed like this:
+
+```
+endpoint = `${baseUrl}/${targetUuid}/v4/put?key=${encodeURIComponent(fileData.value.annexKey)}&clientuuid=${encodeURIComponent(clientUuid)}`
+```
+
+where
+
+```
+ baseUrl: https://<site>/git-annex-p2phttp/git-annex
+ targetUuid: f1a8ef1c-...
+ clientUuid: not-a-uuid
+```
+
+Notice that `clientUuid` is not a UUID (the redacted original value was also not a valid UUID).
+
+I have adjusted that to be an actual UUID, and did another upload. This achieved two things:
+
+1. A new file uploaded successfully (as before)
+2. The pending logs were applied and the git-annex branch was updated -- exactly like you described.
+
+However, the new upload is now itself sitting in the journal and has not been taken into account; additional uploads do not immediately trigger a git-annex branch update.
+
+This issue may be in the realm of forgejo-aneksajo, and how it runs the p2phttp server. The previous uploads were made mid-December (as seen from the timestamps in the journal). Nothing has triggered a journal commit since, not even fetching the git-annex branch.
+"""]]
add news item for git-annex 10.20260115
diff --git a/doc/news/version_10.20250925.mdwn b/doc/news/version_10.20250925.mdwn deleted file mode 100644 index 3cba8b8b77..0000000000 --- a/doc/news/version_10.20250925.mdwn +++ /dev/null @@ -1,28 +0,0 @@ -git-annex 10.20250925 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Fix bug that made changes to a special remote sometimes be missed when - importing a tree from it. After upgrading, any such missed changes - will be included in the next tree imported from a special remote. - Fixes reversion introduced in version 10.20230626. - * Fix crash operating on filenames that are exactly 21 bytes long - and begin with a utf-8 character. - * Fix hang that could occur when using git-annex adjust on a branch with - a number of files greater than annex.queuesize. - * Fix bug that could cause an invalid utf-8 sequence to be used in a - temporary filename when the input filename was valid utf-8. - * Improve performance when used with a local git remote that has a - large working tree. - * drop: --fast support when dropping from a remote. - * Added annex.assistant.allowunlocked config. - * Add git-remote-p2p-annex and git-remote-tor-annex to standalone builds. - * enableremote: Disallow using type= to attempt to change the type of an - existing remote. - * Add build warnings when git-annex is built without the OsPath - build flag. - * version: Report on whether it was built with the OsPath build flag. - * Avoid leaking file descriptors to child processes started by git-annex - in some situations. Note that when not built with the OsPath build - flag, these leaks can still happen. - * git-annex.cabal: Turn on the OsPath build flag by default. - * p2phttp: Fix a hang that could occur when used with --directory, - and a repository in the directory got removed. 
- * Removed support for building with unmaintained cryptonite, use crypton."""]] \ No newline at end of file diff --git a/doc/news/version_10.20260115.mdwn b/doc/news/version_10.20260115.mdwn new file mode 100644 index 0000000000..9835c5150a --- /dev/null +++ b/doc/news/version_10.20260115.mdwn @@ -0,0 +1,20 @@ +git-annex 10.20260115 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * New git configs annex.initwanted, annex.initrequired, and + annex.initgroups. + * Fix bug that could result in a tree imported from a remote containing + missing git blobs. + * fix: Populate unlocked pointer files in situations where a git command, + like git reset or git stash, leaves them unpopulated. + * Pass www-authenticate headers in to git credential, to support + eg, git-credential-oauth. + * import: Fix display of some import errors. + * external: Respond to GETGITREMOTENAME during INITREMOTE with the remote + name. + * When displaying sqlite error messages, include the path to the database. + * webapp: Remove support for local pairing; use wormhole pairing instead. + * git-annex.cabal: Removed pairing build flag, and no longer depends + on network-multicast or network-info. + * Remove support for building with old versions of persistent and + persistent-sqlite. + * Removed support for building with ghc older than 9.6.6. + * stack.yaml: Update to lts-24.26."""]] \ No newline at end of file
Added a comment: Appending `or present` is a funny idea
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_15_96e536dfab68ddce34c92ccba0186c30._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_15_96e536dfab68ddce34c92ccba0186c30._comment new file mode 100644 index 0000000000..ffa0e586a0 --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_15_96e536dfab68ddce34c92ccba0186c30._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Appending `or present` is a funny idea" + date="2026-01-15T23:02:54Z" + content=""" +> Hmm, if the default always had \"or present\" added to it, at least the surprise drop would not be a concern. + +That is a very funny idea, I like it! +"""]]
rename defaultwanted to initwanted (etc)
This is to leave open the possibility of a git-annex config
default that is used when there is no preferred content set.
Currently, copying them over at init time feels safe, and a git-annex
config default has known safety problems that would need to be
addressed. But maybe they can be eventually.
diff --git a/Annex/Init.hs b/Annex/Init.hs
index 4d6c3a9b73..e610fbce00 100644
--- a/Annex/Init.hs
+++ b/Annex/Init.hs
@@ -20,7 +20,7 @@ module Annex.Init (
probeCrippledFileSystem,
probeCrippledFileSystem',
isCrippledFileSystem,
- propigateDefaultGitConfigs,
+ propigateInitGitConfigs,
) where
import Annex.Common
@@ -177,7 +177,7 @@ initialize' startupannex mversion _initallowed = do
)
propigateSecureHashesOnly
when (isNothing initialversion) $
- propigateDefaultGitConfigs =<< getUUID
+ propigateInitGitConfigs =<< getUUID
createInodeSentinalFile False
fixupUnusualReposAfterInit
@@ -504,12 +504,12 @@ propigateSecureHashesOnly =
=<< getGlobalConfig "annex.securehashesonly"
{- Propigate git configs that set defaults. -}
-propigateDefaultGitConfigs :: UUID -> Annex ()
-propigateDefaultGitConfigs u = do
+propigateInitGitConfigs :: UUID -> Annex ()
+propigateInitGitConfigs u = do
gc <- Annex.getGitConfig
- set (annexDefaultWanted gc) preferredContentSet
- set (annexDefaultRequired gc) requiredContentSet
- case annexDefaultGroups gc of
+ set (annexInitWanted gc) preferredContentSet
+ set (annexInitRequired gc) requiredContentSet
+ case annexInitGroups gc of
[] -> noop
groups -> groupChange u (S.union (S.fromList groups))
where
diff --git a/CHANGELOG b/CHANGELOG
index 9c4a8ae418..6c2f303b07 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,7 +1,7 @@
git-annex (10.20260115) upstream; urgency=medium
- * New git configs annex.defaultwanted, annex.defaultrequired, and
- annex.defaultgroups.
+ * New git configs annex.initwanted, annex.initrequired, and
+ annex.initgroups.
* Fix bug that could result in a tree imported from a remote containing
missing git blobs.
* fix: Populate unlocked pointer files in situations where a git command,
diff --git a/Command/InitRemote.hs b/Command/InitRemote.hs
index 6fd6a0d75c..d4e5f1086d 100644
--- a/Command/InitRemote.hs
+++ b/Command/InitRemote.hs
@@ -128,7 +128,7 @@ cleanup t u name c o = do
case sameas o of
Nothing -> do
describeUUID u (toUUIDDesc name)
- propigateDefaultGitConfigs u
+ propigateInitGitConfigs u
Logs.Remote.configSet u c
Just _ -> do
cu <- liftIO genUUID
diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs
index 9057989495..a33d8a9dca 100644
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -172,9 +172,9 @@ data GitConfig = GitConfig
, annexViewUnsetDirectory :: ViewUnset
, annexClusters :: M.Map RemoteName ClusterUUID
, annexFullyBalancedThreshhold :: Double
- , annexDefaultWanted :: Maybe String
- , annexDefaultRequired :: Maybe String
- , annexDefaultGroups :: [Group]
+ , annexInitWanted :: Maybe String
+ , annexInitRequired :: Maybe String
+ , annexInitGroups :: [Group]
}
extractGitConfig :: ConfigSource -> Git.Repo -> GitConfig
@@ -319,10 +319,10 @@ extractGitConfig configsource r = GitConfig
, annexFullyBalancedThreshhold =
fromMaybe 0.9 $ (/ 100) <$> getmayberead
(annexConfig "fullybalancedthreshhold")
- , annexDefaultWanted = getmaybe (annexConfig "defaultwanted")
- , annexDefaultRequired = getmaybe (annexConfig "defaultrequired")
- , annexDefaultGroups = map (Group . encodeBS) $
- getwords (annexConfig "defaultgroups")
+ , annexInitWanted = getmaybe (annexConfig "initwanted")
+ , annexInitRequired = getmaybe (annexConfig "initrequired")
+ , annexInitGroups = map (Group . encodeBS) $
+ getwords (annexConfig "initgroups")
}
where
getbool k d = fromMaybe d $ getmaybebool k
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index 52c601b6b5..297a7ed7b3 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -1061,25 +1061,25 @@ repository, using [[git-annex-config]]. See its man page for a list.)
If this is set to `true` then it will instead use the `annex.addunlocked`
configuration to decide which files to add unlocked.
-* `annex.defaultwanted`
+* `annex.initwanted`
When this is set to a preferred content expression, all
new repositories (and special remotes) will have it copied into their
configuration when initialized, the same as if you had run
[[git-annex-wanted]](1).
-* `annex.defaultrequired`
+* `annex.initrequired`
When this is set to a preferred content expression, all
new repositories (and special remotes) will have it copied into their
configuration when initialized, the same as if you had run
[[git-annex-required]](1).
-* `annex.defaultgroups`
+* `annex.initgroups`
When this is set to a list of groups (separated by whitespace), all
- new repositories (and special remotes) will start out in those groups,
- the same as if you had run [[git-annex-group]](1).
+ new repositories (and special remotes) start out in those groups
+ when initialized, the same as if you had run [[git-annex-group]](1).
* `annex.numcopies`
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_14_60dddf4e5a89e344ad7bb03fd4539635._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_14_60dddf4e5a89e344ad7bb03fd4539635._comment
new file mode 100644
index 0000000000..3f2fe8bbf1
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_14_60dddf4e5a89e344ad7bb03fd4539635._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 14"""
+ date="2026-01-15T16:34:09Z"
+ content="""
+Hmm, if the default always had "or present" added to it, at least the
+surprise drop would not be a concern.
+
+I am going to change the names to "initwanted" etc as you suggested,
+to avoid closing off the possibility of adding a global default later.
+"""]]
Added a comment
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_13_d16a7c55ad3b86370aa06fc3964c1ebf._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_13_d16a7c55ad3b86370aa06fc3964c1ebf._comment new file mode 100644 index 0000000000..3edf7edd5e --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_13_d16a7c55ad3b86370aa06fc3964c1ebf._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 13" + date="2026-01-15T08:51:46Z" + content=""" +> It's probably somewhat common to want to get files from origin, but not let origin make config changes that drop all the files they have previously shared. + +Fair enough. + +So I guess one can encourage users to include `git config --global annex.jobs 4` and `git config annex.defaultwanted present` in their setup. Thanks for implementing that. +"""]]
response
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_12_76cc0409b834e12aee0eaedb5d5c1e2c._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_12_76cc0409b834e12aee0eaedb5d5c1e2c._comment new file mode 100644 index 0000000000..71d36cb2f1 --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_12_76cc0409b834e12aee0eaedb5d5c1e2c._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 12""" + date="2026-01-14T17:42:46Z" + content=""" +> Anyone with write access to a repo can already freely change any group, groupwanted or wanted for any involved clone - if it's present in the git-annex branch + +A good point certainly. + +> So your concerns only apply to private repos that don't record their activity in the git-annex branch by using `annex.private=true`. + +Well also repos that lack permission to push or are simply not pushed to +origin. + +It's probably somewhat common to want to get files from origin, but not let +origin make config changes that drop all the files they have previously +shared. +"""]]
fix comment (TAB to indent markdown lists is a bad idea in the webinterface 😅)
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment index fe56f47ce2..c3ecbed1e4 100644 --- a/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment @@ -12,13 +12,18 @@ Yes, but the same is already possible for anyone with write access to a repo. I Other situations I can imagine consider groups of people (or just single users) who trust each other when using a git-annex repo. git-annex is not designed to solve such permission problems - neither is git itself. -git-annex usages: +In your publicly readable (not writable) git-annex-builds repo on the other hand, if *you* were to set `git annex config --set annex.defaultwanted nothing`, then people who just run `git annex sync|assist|assistant` in their clones would have their downloaded builds dropped, okay. -- publicly writable git-annex repo -(bad idea anyway for several reasons) -- publicly readable git-annex repo (e.g. your git-annex-builds repo) - -> people you were able to social engineer to doing that +### git-annex usage scenarios +- publicly writable git-annex repo + - (bad idea anyway for several reasons without any form of permission control on the remote side) + - malicious people could set `git annex config --set annex.defaultwanted nothing` at some point and other's clones would have files dropped on sync. +- publicly readable git-annex repo to provide assets (e.g. your git-annex-builds repo) + - only the owner could do such shenanigans. Users can avoid it by using `git annex pull` and `git annex get` instead of `sync|assist|assistant` (which arguably makes more sense in this case anyway) or explicitly stating their `git annex wanted here ...`. 
+- groups or individuals working on a repo in several clones - everyone has write access, in a team for example + - anyone can already happily destroy repo contents and control other's wanted expressions + - `git annex config annex.defaultwanted` can be set as an established "repo policy" for everyone's convenience, that anyone can overwrite locally with `git annex wanted here ...`. + - if you run `git annex assist|sync|assistant|satisfy`, you *accept the repo's policy*, as with your `securehashesonly` example. If you're paranoid, don't use these sync commands, but do only exactly what you want such as `git annex pull -g`, `git annex get <thatfile>`, `git annex wanted ...`, etc. """]]
Added a comment
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment
new file mode 100644
index 0000000000..fe56f47ce2
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_11_075df1f72ae05f981986f23b897a79a2._comment
@@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="nobodyinperson"
+ avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
+ subject="comment 11"
+ date="2026-01-14T14:13:00Z"
+ content="""
+> you can set annex.defaultwanted to \"standard\", and annex.defaultgroups to some group, and then changing git-annex groupwanted will affect all repositories that copied that defaultwanted into their config
+
+> If annex.defaultwanted were able to be changed for all repositories with git-annex config, then here's a really ugly security problem [...]
+
+Yes, but the same is already possible for anyone with write access to a repo. I can `git annex wanted JOEYS-UUID nothing`, wait for your assistant or manual sync to auto-drop all files (would also need to set `{num,min}copies` to 1 for that, and even then it might not auto-drop it depending on the remotes). Anyone with write access to a repo can already freely change any group, groupwanted or wanted for any involved clone - if it's present in the git-annex branch (i.e. not made with `git config annex.private=true`). So your concerns only apply to private repos that don't record their activity in the git-annex branch by using `annex.private=true`. Making a git-annex repo private is a conscious, active choice. One does not need to do it if one only consumes files and does not have push access anyway. So that'll be people who actively change repo content, probably consume it, but don't want their repo to show up in `git annex info`. Maybe for a publicly-pushable git-annex repo where everyone can add new files (who would host that anyway...). In this case, yes, users of that repo can't trust each other, and there, setting something like `git annex config --set annex.defaultwanted nothing` at some point can cause people's `git annex sync|assist|assistant` to suddenly drop their files - and probably also on the central remote. But I'd argue that this kind of publicly writable setup has so many other obvious problems that `annex.defaultwanted` is one of the minor ones.
+
+Other situations I can imagine consider groups of people (or just single users) who trust each other when using a git-annex repo. git-annex is not designed to solve such permission problems - neither is git itself.
+
+git-annex usages:
+
+- publicly writable git-annex repo
+(bad idea anyway for several reasons)
+- publicly readable git-annex repo (e.g. your git-annex-builds repo)
+
+> people you were able to social engineer to doing that
+
+
+"""]]
Added a comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_9_bf66e70d86d7c334781d6bd1827a1fe9._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_9_bf66e70d86d7c334781d6bd1827a1fe9._comment new file mode 100644 index 0000000000..99cbf2bfb9 --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_9_bf66e70d86d7c334781d6bd1827a1fe9._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 9" + date="2026-01-14T07:26:26Z" + content=""" +Thank you! +"""]]
comments
diff --git a/Remote/List.hs b/Remote/List.hs
index 80a9781f10..7b2ba4f048 100644
--- a/Remote/List.hs
+++ b/Remote/List.hs
@@ -110,8 +110,8 @@ remoteGen' adjustconfig m t g = do
Just r -> Just <$> adjustExportImport (adjustReadOnly (addHooks r)) rs
{- Updates a local git Remote, re-reading its git config. -}
-updateRemote :: Remote -> Annex (Maybe Remote)
-updateRemote remote = do
+updateRemote :: Remote -> Bool -> Annex (Maybe Remote)
+updateRemote remote honorignore = do
m <- remoteConfigMap
remote' <- updaterepo =<< getRepo remote
remoteGen m (remotetype remote) remote'
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_10_7aeec1887d1851624015c6dbf2feecf5._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_10_7aeec1887d1851624015c6dbf2feecf5._comment
new file mode 100644
index 0000000000..e68007d5cc
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_10_7aeec1887d1851624015c6dbf2feecf5._comment
@@ -0,0 +1,26 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 10"""
+ date="2026-01-13T17:42:20Z"
+ content="""
+If annex.defaultwanted were able to be changed for all repositories with
+`git-annex config`, then here's a really ugly security problem:
+
+* First, I make sure to get a copy of every annexed file.
+* Then I run `git-annex config annex.defaultwanted nothing`
+* Then I wait for git-annex to drop every file from your repository.
+* Finally, I demand $ to get your files back.
+
+Now, the same can be done by convincing people to add their repository to
+some group and set preferred content to "standard", and later
+changing the groupwanted. But that only works on people you were able to
+social engineer into doing that, not everyone who cloned a repository
+with the default settings.
+
+And beyond the ransom problem, there's the problem that once this is set,
+any change to it is going to affect most every other user of the
+repository. With groupwanted there's a communicated intent in the name of
+the group, and there can be different groups with different versions of the
+preferred content expression. This lacks that, it encourages flag day
+events.
+"""]]
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_8_30f883e3d14304b1ed1eaf3f1a6421f2._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_8_30f883e3d14304b1ed1eaf3f1a6421f2._comment
new file mode 100644
index 0000000000..ef415b80db
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_8_30f883e3d14304b1ed1eaf3f1a6421f2._comment
@@ -0,0 +1,17 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 8"""
+ date="2026-01-13T17:21:09Z"
+ content="""
+I'm on the fence about whether the kind of security impact I discussed
+earlier is really something that should prevent a global setting, or not.
+
+`git-annex config` of `annex.securehashesonly` is another example of
+something where my hypothetical "auditing repos" would be vulnerable to a
+behavior change that might be security significant. Since that gets copied
+from the git-annex config to git config at init time, behavior in a
+new clone might be different than behavior in an existing clone.
+
+Does that mean it's ok for there to be more cases where there can be such a
+potential security impact? I don't know.
+"""]]
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_9_115e5b2d9eef39ca086ca6d2b1e67627._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_9_115e5b2d9eef39ca086ca6d2b1e67627._comment
new file mode 100644
index 0000000000..0ade7fc4ff
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_9_115e5b2d9eef39ca086ca6d2b1e67627._comment
@@ -0,0 +1,13 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 9"""
+ date="2026-01-13T17:29:35Z"
+ content="""
+Note that you can set annex.defaultwanted to "standard", and
+annex.defaultgroups to some group, and then changing
+`git-annex groupwanted` will affect all repositories that copied that
+defaultwanted into their config.
+
+So that's a way to be able to make changes that will affect other people's
+clones. But only ones that they have opted into.
+"""]]
comment
diff --git a/doc/todo/support_push_to_create/comment_1_1ad156f002e61c1a3db246c103c37b63._comment b/doc/todo/support_push_to_create/comment_1_1ad156f002e61c1a3db246c103c37b63._comment new file mode 100644 index 0000000000..8c288fe8c2 --- /dev/null +++ b/doc/todo/support_push_to_create/comment_1_1ad156f002e61c1a3db246c103c37b63._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-13T17:02:50Z" + content=""" +The annex-ignore config can be manually set by the user to prevent using an +otherwise usable remote. The man page gives the example of a network +connection that is too slow to use normally. + +It may be that no users are actually using annex-ignore like this. +Using annex-sync seems more likely. But, it's hard to rule out. + +That presents a problem, since this would need to unset annex-ignore once +the repository was created. + +Checking before push if the repository exists, and only unsetting +annex-ignore if it did not exist before sync, but does afterwards, would be +one way around this problem. It does mean that, if 2 people are making +a repository at the same location at the same time, the loser may be left +with annex-ignore set due to the other person having created the +repository. + +Or, a new config could be added, that is like annex-ignore, but is only +set by git-annex, and not by the user. Keeping annex-ignore's behavior, +but making git-annex set and unset the new config as needed. +"""]]
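The check-before-push idea in the comment above can be sketched quickly. This is hypothetical Python pseudologic, not git-annex code; `repo_exists`, `do_push` and `unset_annex_ignore` are made-up stand-ins for the real operations:

```python
def push_with_create(repo_exists, do_push, unset_annex_ignore) -> None:
    # Only unset annex-ignore when this push appears to have created
    # the remote repository.  If the repository already existed, the
    # user may have set annex-ignore deliberately (e.g. a slow link),
    # so leave their setting alone.
    existed_before = repo_exists()
    do_push()
    if not existed_before and repo_exists():
        unset_annex_ignore()
```

As the comment notes, this still has the race where two people create the same repository concurrently and the loser keeps annex-ignore set; the alternative of a separate git-annex-managed config avoids that at the cost of a new setting.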
remove incorrect sentance
Testing with annex-ignore set on a remote, git-annex get --from that
remote fails with "cannot access remote"
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index bf69cc4438..52c601b6b5 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1580,7 +1580,6 @@ Remotes are configured using these settings in `.git/config`. If set to `true`, prevents git-annex from storing or retrieving annexed file contents on this remote by default. - (You can still request it be used with the `--from` and `--to` options.) This is, for example, useful if the remote is located somewhere without git-annex-shell. (For example, if it's on GitHub).
Added a comment: Thanks! Maybe still consider a repo-wide setting for default wanted content?
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_7_0fcdc57b6e094bfdd450286be26c9b56._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_7_0fcdc57b6e094bfdd450286be26c9b56._comment
new file mode 100644
index 0000000000..24869b1658
--- /dev/null
+++ b/doc/todo/Setting_default_preferred_content_expressions/comment_7_0fcdc57b6e094bfdd450286be26c9b56._comment
@@ -0,0 +1,27 @@
+[[!comment format=mdwn
+ username="nobodyinperson"
+ avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5"
+ subject="Thanks! Maybe still consider a repo-wide setting for default wanted content? "
+ date="2026-01-13T15:50:32Z"
+ content="""
+Hi joey, thank you for picking this up. IIUC, what you implemented (`git config annex.default{wanted,required,group}`) allows you to set these configs *locally* and then spare *yourself* the initial `git annex wanted . present` (etc.) setup calls. This is cool, thanks!
+
+The problem I was trying to express here is however that `git annex assist` (the very convenient do-it-all command you can tell non-techy people to use to 'do the syncing stuff') will by default pull in *all* files, resulting in a terrible user experience: it's slow (of course nobody sets `annex.jobs=cpus` or uses `-j4`), it takes up a ridiculous amount of space, people will say 'I don't need that 3GB file, why does it download it?' (of course nobody remembers or understands to set `git annex wanted . present` or anything complex), etc. Sure, this is a question of user education, but good defaults can make for a much easier onboarding experience. (I know you are not so fond of such a do-it-all command, but this `git annex assist` single-stepping command really has been a good git annex selling point in the discussions and talks I had.)
+
+So if there was a global setting like `git annex config --set annex.defaultwanted 'present or include=*.pdf'` that would set the default wanted expression for any clone, one could define what the most important files are and tell everyone to `git annex get` the others if necessary. `git annex assist` will be fast, only pull in the most important files (or none!), people can modify or add new stuff, and run `git annex assist` quickly again.
+
+I would say `git annex config --set annex.defaultwanted <whatever>` should **not** execute `git annex wanted . <whatever>` and as such hard-code it in the git-annex branch for every repo (because then again, when would that even be executed? Would it be re-set after another `git annex config --set annex.defaultwanted <whatever2>`? When?). Instead, `git annex config --set annex.defaultwanted <whatever>` should cause the *default* (i.e. fallback) value of `git annex wanted .` to be `<whatever>`, which is currently just `\"\"`, which I guess means something like `include=*` IIRC.
+
+## Re: your security concerns
+
+I understand your hesitation to add more `git annex config ...` global repo configs. But here I would argue:
+
+- git annex does not have a permissions model anyway. Anyone with push access to a repo can change any policy, any wanted expression for any repo, etc. If that is a problem, then git annex might not be the right tool. I guess one can implement some level of permission control with post-receive hooks on the remote side, but that is outside git annex's scope. git annex assumes everyone writing to the repo is nice.
+- I don't really understand your 'auditing' repo situation. Does it mean you regularly clone some repos, run `git annex pull|assist` in them to check if it still works? In that case the only negative thing `git annex config --set annex.defaultwanted` could do is indeed to leave you with *fewer* downloaded files. If one needs all files, `git annex get --all` has always been the way to go, hasn't it? 🤔 Or what kind of external repos from bad actors maliciously setting a default wanted expression do you 'audit'? And how is not having all files after `git annex assist` bad in this case?
+
+*Should* you consider implementing `git annex config --set annex.defaultwanted`, it would conflict with the freshly introduced `git config annex.defaultwanted` local settings. We could rename those to `git config annex.initdefaultwanted` (or just `annex.initwanted`), to emphasize that those only happen on `git annex init`. Then `git annex config --set annex.defaultwanted` would also sound very sensible to me in contrast, as it really configures the default, and does not modify individual repos.
+
+Cheers,
+Yann
+
+"""]]
comment
diff --git a/doc/projects/datalad/bugs-done/add_config_var_preventing_adjusted_branch_mode/comment_7_7f0b7073893ee4ab73f273f6204e2a27._comment b/doc/projects/datalad/bugs-done/add_config_var_preventing_adjusted_branch_mode/comment_7_7f0b7073893ee4ab73f273f6204e2a27._comment new file mode 100644 index 0000000000..25c1c70b81 --- /dev/null +++ b/doc/projects/datalad/bugs-done/add_config_var_preventing_adjusted_branch_mode/comment_7_7f0b7073893ee4ab73f273f6204e2a27._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2026-01-13T15:33:19Z" + content=""" +The automatic init that git-annex does in a clone does enter adjusted +branch. I think I was not considering that because you were talking about +having an existing repository and git-annex entering the adjusted branch +later. + +We can reopen this if you want, unsure. +"""]]
response
diff --git a/doc/tips/cloning_a_repository_privately/comment_4_989bafbd0b79b7f75c199cdd7817a82f._comment b/doc/tips/cloning_a_repository_privately/comment_4_989bafbd0b79b7f75c199cdd7817a82f._comment new file mode 100644 index 0000000000..5c742c5863 --- /dev/null +++ b/doc/tips/cloning_a_repository_privately/comment_4_989bafbd0b79b7f75c199cdd7817a82f._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: buyer's remorse""" + date="2026-01-13T14:55:07Z" + content=""" +Oh good question! + +This gets a tiny bit into internals, but `.git/annex/journal-private/` is +where the private information is stored. If you move the files from there +into `.git/annex/journal/`, they will be committed on the next run of +git-annex. + +You would need to take care to avoid overwriting any existing files in the +journal, usually there won't be any though. + +Also unset annex.private of course. +"""]]
response
diff --git a/doc/bugs/available_space_miscomputed_on_large_macOS_volume/comment_2_321c61379aee8c7ad16742eb7e458562._comment b/doc/bugs/available_space_miscomputed_on_large_macOS_volume/comment_2_321c61379aee8c7ad16742eb7e458562._comment
new file mode 100644
index 0000000000..5233219bf4
--- /dev/null
+++ b/doc/bugs/available_space_miscomputed_on_large_macOS_volume/comment_2_321c61379aee8c7ad16742eb7e458562._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-01-13T14:32:53Z"
+ content="""
+I'm inclined to agree with you, it's probably a problem with
+<https://hackage.haskell.org/package/disk-free-space>
+
+I am not going to be able to reproduce this!
+
+Could you take a look at disk-free-space in ghci and see if it reproduces
+there?
+
+ ghci> import System.DiskSpace
+ ghci> getAvailSpace "/"
+ 283744563200
+ ghci> getDiskUsage "/"
+ DiskUsage {diskTotal = 501386043392, diskFree = 283761369088, diskAvail = 283744591872, blockSize = 4096}
+
+Looking at the code, it assumes bsize and frsize are CULong. I guess it's
+that or FsBlkCnt is somehow wrong.
+"""]]
response
diff --git a/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__/comment_1_92ec96134e1045804382cbe28e22154b._comment b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__/comment_1_92ec96134e1045804382cbe28e22154b._comment new file mode 100644 index 0000000000..676902768d --- /dev/null +++ b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__/comment_1_92ec96134e1045804382cbe28e22154b._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-13T14:27:09Z" + content=""" +The assistant only sends files to repositories that want them. This is not +guaranteed to make as many copies of the files as whatever you have +numcopies configured to. (Numcopies will prevent the assistant from +dropping a file from a repository if there are not enough copies.) + +All of your archive repositories only want 1 copy of a file across all of +them, so you would need 2 backup repositories (which want all files) in +order to get to 3 copies. +"""]]
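A minimal sketch of the setup the answer suggests. The remote names `backup1` and `backup2` are illustrative, not from the original thread; it assumes those repositories already exist as remotes.

```shell
# With archive repositories collectively wanting only 1 copy, two
# repositories in the standard "backup" group (which wants all files)
# are needed for the assistant to reach 3 copies.
git annex numcopies 3
git annex group backup1 backup
git annex wanted backup1 standard
git annex group backup2 backup
git annex wanted backup2 standard
```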
response
diff --git a/doc/forum/Old_files_being_pushed_to_transfer_repository/comment_1_f96d7b83fa1d1207cbbe8d35d92a2c1c._comment b/doc/forum/Old_files_being_pushed_to_transfer_repository/comment_1_f96d7b83fa1d1207cbbe8d35d92a2c1c._comment new file mode 100644 index 0000000000..e26f998803 --- /dev/null +++ b/doc/forum/Old_files_being_pushed_to_transfer_repository/comment_1_f96d7b83fa1d1207cbbe8d35d92a2c1c._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-13T14:21:29Z" + content=""" +There are two possibilities: + +1. Transfer repositories want files that have not yet reached all clients, so + maybe you had a second client repository that doesn't have the file yet. + +2. When there is only a single client repository, transfer repositories + want to contain all content, even once it's reached that client. The + assumption is that, since the purpose of a transfer repo is to transfer + between clients, there will be a second client repository added at some + point, and then the transfer repository will have the content to send to it. + +This is documented in [[preferred_content/standard_groups]]. +"""]]
response
diff --git a/doc/special_remotes/compute/comment_8_f13c86659ddbb1a027455b9b3e67296a._comment b/doc/special_remotes/compute/comment_8_f13c86659ddbb1a027455b9b3e67296a._comment new file mode 100644 index 0000000000..50a20cb97c --- /dev/null +++ b/doc/special_remotes/compute/comment_8_f13c86659ddbb1a027455b9b3e67296a._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: comment 6""" + date="2026-01-13T14:14:27Z" + content=""" +`git-annex findcomputed --inputs` is documented to output one line per +input file. If it doesn't behave that way, file a bug. + +It would be possible to run git-annex commands in the compute script if +you were able to determine where the git repository was. I don't think +git-annex sets anything in the environment that will help with that +currently. + +If the compute program set metadata though, it would re-set the same +metadata when it's used to recompute the files. That might be undesirable +behavior if the user has edited the metadata in the meantime. +"""]]
response
diff --git a/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_2_baa9ac446450aa0522c6298cae80a2c8._comment b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_2_baa9ac446450aa0522c6298cae80a2c8._comment new file mode 100644 index 0000000000..9c84a5f9d1 --- /dev/null +++ b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_2_baa9ac446450aa0522c6298cae80a2c8._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2026-01-13T14:08:21Z" + content=""" +I tend to agree, this adds a lot of potential for foot shooting. + +It might make sense as an option that enables acting on non-annexed files? +"""]]
response
diff --git a/doc/forum/Confirming_my_preferred_content_understanding/comment_1_52c1b6396a56a2096f9ab5cf3e1c9d43._comment b/doc/forum/Confirming_my_preferred_content_understanding/comment_1_52c1b6396a56a2096f9ab5cf3e1c9d43._comment new file mode 100644 index 0000000000..a9cd3d09b0 --- /dev/null +++ b/doc/forum/Confirming_my_preferred_content_understanding/comment_1_52c1b6396a56a2096f9ab5cf3e1c9d43._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-13T14:01:50Z" + content=""" +I think that will work! + +Since moving content between the archive drives is probably reasonably +fast, it might make sense to use fullybalanced or fullysizebalanced. + +In any case, when using "balanced" things, you will need to use +[[git-annex-maxsize]] to tell it how large each repository is. +"""]]
update
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment index b9f723232c..777af18df7 100644 --- a/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment @@ -45,17 +45,17 @@ Well, there is a small one. If I have made a clone of a repository, I may be hiding the existence of that repository from others. So nobody knows its uuid, and so they cannot change its preferred content setting. But with `git-annex config` allowing overriding the default, -I'd risk a pull from origin changing it. +a clone I made yesterday may behave differently than a clone I make today. -Which, since the default is to want all files, must change my repo +Which, since the default is to want all files, must make clone to want fewer files. -So for this to be an actual security problem, I would need to be relying -on my repository getting all files for some security reason. Which could be -auditing the content of annexed files. As the auditing repository, I want -it to get every file that passes through origin. And by foolishly relying -on the current default preferred content (which after all joey seems like -he's never gonna get around to changing!), I open myself up to an attacker +So for this to be an actual security problem, I would need to be relying on +my clones getting all files for some security reason. Which could be +auditing the content of annexed files. I want the auditing clones to get +every file that passes through origin. And by foolishly relying on the +current default preferred content (which after all joey seems like he's +never gonna get around to changing!), I open myself up to an attacker breaking my auditing process. 
That's a bit tortured, but it does seem to argue against making this a `git-annex config` setting.
git configs annex.defaultwanted, annex.defaultrequired, and annex.defaultgroups
These are propigated into the git-annex branch when a repository is
initialized for the 1st time. That includes by git-annex init, by
autoinitialization, and by git-annex initremote. Note that git-annex
reinit, git-annex init run a second time, and git-annex enableremote
do not propigate them, to avoid overwriting the git-annex branch.
git-remote-annex also propigates them for the local repository when
initializing it. It does not propigate them to the temporary special
remote that it uses for cloning. That special remote was already
initialized elsewhere, so the git-annex branch, once fetched from it, will
have the desired settings. And since git-remote-annex only downloads from
it, these configs don't matter as far as what it does.
Sponsored-by: Graham Spencer on Patreon
diff --git a/Annex/Init.hs b/Annex/Init.hs
index 64c924fd04..4d6c3a9b73 100644
--- a/Annex/Init.hs
+++ b/Annex/Init.hs
@@ -1,6 +1,6 @@
{- git-annex repository initialization
-
- - Copyright 2011-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -20,6 +20,7 @@ module Annex.Init (
probeCrippledFileSystem,
probeCrippledFileSystem',
isCrippledFileSystem,
+ propigateDefaultGitConfigs,
) where
import Annex.Common
@@ -34,6 +35,8 @@ import qualified Database.Fsck
import Logs.UUID
import Logs.Trust.Basic
import Logs.Config
+import Logs.PreferredContent.Raw
+import Logs.Group
import Types.TrustLevel
import Types.RepoVersion
import Annex.Version
@@ -64,6 +67,7 @@ import qualified Utility.LockFile.Posix as Posix
#endif
import qualified Data.Map as M
+import qualified Data.Set as S
import Control.Monad.IO.Class (MonadIO)
#ifndef mingw32_HOST_OS
import System.PosixCompat.Files (ownerReadMode, isNamedPipe)
@@ -150,7 +154,8 @@ initialize' startupannex mversion _initallowed = do
hookWrite preCommitHook
hookWrite postReceiveHook
setDifferences
- unlessM (isJust <$> getVersion) $
+ initialversion <- getVersion
+ unless (isJust initialversion) $
setVersion (fromMaybe defaultVersion mversion)
supportunlocked <- annexSupportUnlocked <$> Annex.getGitConfig
if supportunlocked
@@ -171,6 +176,8 @@ initialize' startupannex mversion _initallowed = do
Direct.switchHEADBack
)
propigateSecureHashesOnly
+ when (isNothing initialversion) $
+ propigateDefaultGitConfigs =<< getUUID
createInodeSentinalFile False
fixupUnusualReposAfterInit
@@ -487,7 +494,7 @@ initSharedClone True = do
trustSet u UnTrusted
setConfig (annexConfig "hardlink") (Git.Config.boolConfig True)
-{- Propagate annex.securehashesonly from then global config to local
+{- Propigate annex.securehashesonly from the global config to local
- config. This makes a clone inherit a parent's setting, but once
- a repository has a local setting, changes to the global config won't
- affect it. -}
@@ -496,6 +503,19 @@ propigateSecureHashesOnly =
maybe noop (setConfig "annex.securehashesonly" . fromConfigValue)
=<< getGlobalConfig "annex.securehashesonly"
+{- Propigate git configs that set defaults. -}
+propigateDefaultGitConfigs :: UUID -> Annex ()
+propigateDefaultGitConfigs u = do
+ gc <- Annex.getGitConfig
+ set (annexDefaultWanted gc) preferredContentSet
+ set (annexDefaultRequired gc) requiredContentSet
+ case annexDefaultGroups gc of
+ [] -> noop
+ groups -> groupChange u (S.union (S.fromList groups))
+ where
+ set (Just expr) setter = setter u expr
+ set Nothing _ = noop
+
fixupUnusualReposAfterInit :: Annex ()
fixupUnusualReposAfterInit = do
gc <- Annex.getGitConfig
diff --git a/CHANGELOG b/CHANGELOG
index 92a10ca152..fb965ce70c 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,5 +1,7 @@
git-annex (10.20251216) UNRELEASED; urgency=medium
+ * New git configs annex.defaultwanted, annex.defaultrequired, and
+ annex.defaultgroups.
* Fix bug that could result in a tree imported from a remote containing
missing git blobs.
* fix: Populate unlocked pointer files in situations where a git command,
diff --git a/Command/InitRemote.hs b/Command/InitRemote.hs
index eda978cfea..6fd6a0d75c 100644
--- a/Command/InitRemote.hs
+++ b/Command/InitRemote.hs
@@ -1,6 +1,6 @@
{- git-annex command
-
- - Copyright 2011-2024 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -22,6 +22,7 @@ import Types.ProposedAccepted
import Config
import Git.Config
import Git.Types
+import Annex.Init
import qualified Data.Map as M
import qualified Data.Text as T
@@ -127,6 +128,7 @@ cleanup t u name c o = do
case sameas o of
Nothing -> do
describeUUID u (toUUIDDesc name)
+ propigateDefaultGitConfigs u
Logs.Remote.configSet u c
Just _ -> do
cu <- liftIO genUUID
diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs
index 4303c09961..9057989495 100644
--- a/Types/GitConfig.hs
+++ b/Types/GitConfig.hs
@@ -1,6 +1,6 @@
{- git-annex configuration
-
- - Copyright 2012-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2012-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -51,6 +51,7 @@ import Types.RepoVersion
import Types.StallDetection
import Types.View
import Types.Cluster
+import Types.Group
import Config.DynamicConfig
import Utility.HumanTime
import Utility.Gpg (GpgCmd, mkGpgCmd)
@@ -171,6 +172,9 @@ data GitConfig = GitConfig
, annexViewUnsetDirectory :: ViewUnset
, annexClusters :: M.Map RemoteName ClusterUUID
, annexFullyBalancedThreshhold :: Double
+ , annexDefaultWanted :: Maybe String
+ , annexDefaultRequired :: Maybe String
+ , annexDefaultGroups :: [Group]
}
extractGitConfig :: ConfigSource -> Git.Repo -> GitConfig
@@ -284,7 +288,7 @@ extractGitConfig configsource r = GitConfig
(getmayberead (annexConfig "adjustedbranchrefresh"))
, annexSupportUnlocked = getbool (annexConfig "supportunlocked") True
, annexAssistantAllowUnlocked = getbool (annexConfig "assistant.allowunlocked") False
- , annexTrashbin = getmaybe "annex.trashbin"
+ , annexTrashbin = getmaybe (annexConfig "trashbin")
, coreSymlinks = getbool "core.symlinks" True
, coreSharedRepository = getSharedRepository r
, coreQuotePath = QuotePath (getbool "core.quotepath" True)
@@ -315,6 +319,10 @@ extractGitConfig configsource r = GitConfig
, annexFullyBalancedThreshhold =
fromMaybe 0.9 $ (/ 100) <$> getmayberead
(annexConfig "fullybalancedthreshhold")
+ , annexDefaultWanted = getmaybe (annexConfig "defaultwanted")
+ , annexDefaultRequired = getmaybe (annexConfig "defaultrequired")
+ , annexDefaultGroups = map (Group . encodeBS) $
+ getwords (annexConfig "defaultgroups")
}
where
getbool k d = fromMaybe d $ getmaybebool k
diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn
index c325adc9c1..bf69cc4438 100644
--- a/doc/git-annex.mdwn
+++ b/doc/git-annex.mdwn
@@ -1061,6 +1061,26 @@ repository, using [[git-annex-config]]. See its man page for a list.)
If this is set to `true` then it will instead use the `annex.addunlocked`
configuration to decide which files to add unlocked.
+* `annex.defaultwanted`
+
+ When this is set to a preferred content expression, all
+ new repositories (and special remotes) will have it copied into their
+ configuration when initialized, the same as if you had run
+ [[git-annex-wanted]](1).
+
+* `annex.defaultrequired`
+
+ When this is set to a preferred content expression, all
+ new repositories (and special remotes) will have it copied into their
+ configuration when initialized, the same as if you had run
+ [[git-annex-required]](1).
+
+* `annex.defaultgroups`
+
+ When this is set to a list of groups (separated by whitespace), all
(Diff truncated)
comment
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment new file mode 100644 index 0000000000..b9f723232c --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_5_9b8884860c6d8ddee1fb019236b98da7._comment @@ -0,0 +1,72 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2026-01-12T15:54:26Z" + content=""" +Seems I really dropped the ball on following up to this one. On the other +hand, it seems a lot of things need to be thought through still.. + +--- + +I suppose there are two ways a default preferred content config could work: + +1. Something that gets set in the repository's config at `git-annex init` + (or autoinit) time, when the repository does not already have a + preferred content setting. Also at `git-annex initremote` time for + special remotes. +2. Something that is used rather than the current default of "" + when a repository does not have a preferred content setting. + +With option #1, it gets baked into the repo, while with option #2 you can +change a single git config later and it affects whatever repos. + +Pretty sure people have been wanting option #1. + +And option #2 seems to have a problem, that git-annex could see different +preferred content settings for the same repository when run in different +places. Which could result in a churn of content being added to a +repository, and later dropped from it. + +So option #1 seems like the right one. + +--- + +Looking back at the original request, there was the idea that +`git annex config` could set the default. + +Every `git annex config` setting needs to be considered for +security and unwanted behavior. + +As far as security goes, if someone can set `git-annex config`, +they can just go in and change the preferred content settings of any +repository. So no difference? 
+ +Well, there is a small one. If I have made a clone of a repository, +I may be hiding the existence of that repository from others. +So nobody knows its uuid, and so they cannot change its preferred content +setting. But with `git-annex config` allowing overriding the default, +I'd risk a pull from origin changing it. + +Which, since the default is to want all files, must change my repo +to want fewer files. + +So for this to be an actual security problem, I would need to be relying +on my repository getting all files for some security reason. Which could be +auditing the content of annexed files. As the auditing repository, I want +it to get every file that passes through origin. And by foolishly relying +on the current default preferred content (which after all joey seems like +he's never gonna get around to changing!), I open myself up to an attacker +breaking my auditing process. + +That's a bit tortured, but it does seem to argue against making this +a `git-annex config` setting. + +---- + +The original request also included annex.defaultgroupwanted ... +I don't see how that would work. groupwanted varies by group, it does +not make sense to have a default that works across groups. + +It does seem to make sense to allow annex.defaultgroup to set the default +group(s) of a new repository. +"""]]
comment
diff --git a/doc/bugs/git_annex_get_is_silently_stuck_on__P2P___62___GET_0/comment_4_99ccacee9746b3b30980c5f94f5ebf49._comment b/doc/bugs/git_annex_get_is_silently_stuck_on__P2P___62___GET_0/comment_4_99ccacee9746b3b30980c5f94f5ebf49._comment new file mode 100644 index 0000000000..ea7704974a --- /dev/null +++ b/doc/bugs/git_annex_get_is_silently_stuck_on__P2P___62___GET_0/comment_4_99ccacee9746b3b30980c5f94f5ebf49._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-12T15:31:48Z" + content=""" +Makes sense it could be locking. As part of recording the currently running +transfer, a lock is held. + +Pid locking still involves a regular unix lock, the side lock, which is in +/dev/shm or /tmp. So I guess it could be that /tmp is on nfs and lockd +misbehaving caused the problem? +"""]]
Added a comment
diff --git a/doc/design/external_special_remote_protocol/comment_61_f214c6f610a2be1beec00a973e3ed994._comment b/doc/design/external_special_remote_protocol/comment_61_f214c6f610a2be1beec00a973e3ed994._comment new file mode 100644 index 0000000000..da81bf72a6 --- /dev/null +++ b/doc/design/external_special_remote_protocol/comment_61_f214c6f610a2be1beec00a973e3ed994._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Katie" + avatar="http://cdn.libravatar.org/avatar/38e04123b913160b66d8117cada14532" + subject="comment 61" + date="2026-01-11T06:18:07Z" + content=""" +Thanks a lot for the quick fix, Joey! +"""]]
external: Respond to GETGITREMOTENAME during INITREMOTE with the remote name
diff --git a/CHANGELOG b/CHANGELOG
index d7f8ec5d7b..92a10ca152 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -7,6 +7,8 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
* Pass www-authenticate headers in to git credential, to support
eg, git-credential-oauth.
* import: Fix display of some import errors.
+ * external: Respond to GETGITREMOTENAME during INITREMOTE with the remote
+ name.
* When displaying sqlite error messages, include the path to the database.
* webapp: Remove support for local pairing; use wormhole pairing instead.
* git-annex.cabal: Removed pairing build flag, and no longer depends
diff --git a/Remote/External.hs b/Remote/External.hs
index d9871eaf41..87a23a2b9e 100644
--- a/Remote/External.hs
+++ b/Remote/External.hs
@@ -1,6 +1,6 @@
{- External special remote interface.
-
- - Copyright 2013-2025 Joey Hess <id@joeyh.name>
+ - Copyright 2013-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -193,7 +193,7 @@ externalSetup externalprogram setgitconfig ss mu remotename _ c gc = do
else do
pc' <- either giveup return $ parseRemoteConfig c' (lenientRemoteConfigParser externalprogram)
let p = fromMaybe (ExternalType externaltype) externalprogram
- external <- newExternal p (Just u) pc' (Just gc) Nothing Nothing
+ external <- newExternal p (Just u) pc' (Just gc) (Just remotename) Nothing
-- Now that we have an external, ask it to LISTCONFIGS,
-- and re-parse the RemoteConfig strictly, so we can
-- error out if the user provided an unexpected config.
@@ -953,3 +953,4 @@ remoteConfigParser externalprogram c
where
isproposed (Accepted _) = False
isproposed (Proposed _) = True
+
diff --git a/doc/design/external_special_remote_protocol.mdwn b/doc/design/external_special_remote_protocol.mdwn
index 5a1f9fa969..f79b8230ae 100644
--- a/doc/design/external_special_remote_protocol.mdwn
+++ b/doc/design/external_special_remote_protocol.mdwn
@@ -379,6 +379,9 @@ handling a request.
passed to `git-annex initremote` and `enableremote`, but it is possible
for git remotes to be renamed, and this will provide the remote's current
name.
+ If this is used during INITREMOTE, the git remote may not be
+ configured yet. (Older versions of git-annex responded with an ERROR
+ when this is used during INITREMOTE.)
(git-annex replies with VALUE followed by the name.)
This message is a protocol extension; it's only safe to send it to
git-annex after it sent an `EXTENSIONS` that included `GETGITREMOTENAME`.
diff --git a/doc/design/external_special_remote_protocol/comment_60_92ddb8c0da5260c619467ba8a5bf753c._comment b/doc/design/external_special_remote_protocol/comment_60_92ddb8c0da5260c619467ba8a5bf753c._comment
new file mode 100644
index 0000000000..12aa7212e7
--- /dev/null
+++ b/doc/design/external_special_remote_protocol/comment_60_92ddb8c0da5260c619467ba8a5bf753c._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""Re: How do I get GETGITREMOTENAME to work in INITREMOTE?"""
+ date="2026-01-09T17:26:59Z"
+ content="""
+@Katie, thanks for pointing out that doesn't work. I was able to fix that,
+so check out a daily build.
+"""]]
comment
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_4_f6d3abcc128796acc7ccfa50a3d0f907._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_4_f6d3abcc128796acc7ccfa50a3d0f907._comment new file mode 100644 index 0000000000..0d8cdfb882 --- /dev/null +++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_4_f6d3abcc128796acc7ccfa50a3d0f907._comment @@ -0,0 +1,35 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2026-01-08T19:46:02Z" + content=""" +Unfortunately, that design doesn't optimize the preferred content +expression that you were wanting to use: + +`include=docs/* or (include=*.md and exclude=*/*)` + +In this case, the exclude limits the include to md files in the top directory, +not subdirectories, but with the current design it will recurse and find +all files to handle the `include=*.md`. + +To optimise that, it needs to look at when includes are ANDed with +excludes. With `"exclude=*/*"`, only files in the root directory can match, +and those are always listed. So, that include can be filtered out before +step #3 above. + +The other cases of excludes that can be ANDed with an include are: + +* `exclude=bar/*` -- This needs to do a full listing, same reasons I + discussed in comment 2. +* `exclude=*/foo.*` -- Also needs a full listing. +* `exclude=foo` -- Also needs a full listing. +* `exclude=foo.*` -- Also needs a full listing. +* `exclude=*[/]*` -- Same as "exclude=*/*" +* `exclude=*[//]*` -- Same (and so on for other numbers of slashes). +* `exclude=*/**` -- Same (and so on for more asterisks in the front or back) +* `exclude=*[/]**` -- Same (and so on for more slashes and asterisks in the + front or back) +* `exclude=*` -- Pointless to AND with an include since the combination + can never match. May as well optimise it anyway by avoiding a full listing. +* `exclude=**` -- Same as above (and so on) +"""]]
correction
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment index a7cca3fb31..fbdcd966f4 100644 --- a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment +++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment @@ -3,11 +3,14 @@ subject="""comment 1""" date="2026-01-08T13:49:52Z" content=""" -Paths in preferred content expressions match relative to the top, so -this preferred content expression will match only md files in the top, +This preferred content expression will match only md files in the top, and files in the docs subdirectory: -`include=docs/* or include=*.md` +`include=docs/* or (include=*.md and exclude=*/*)` + +I got this wrong at first; this version will work! The `"include=*.md"` +matches files with that extension anywhere in the tree, so the `"exclude=*/*"` +is needed to limit to ones not in a subdirectory. Only preferred content is downloaded, but S3 is still queried for the entire list of files in the bucket.
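As a sanity check of the corrected expression, here is a small shell emulation of `include=docs/* or (include=*.md and exclude=*/*)` using `case` glob patterns. This only approximates git-annex's matcher; it shows why the `exclude=*/*` limits `include=*.md` to the top directory.

```shell
# Approximate the preferred content expression with shell case globs.
# Note: in a case pattern, `*` also matches `/`, so `*/*` matches any
# path containing a slash, i.e. any file inside a subdirectory.
wanted() {
	case "$1" in
		docs/*) return 0 ;;  # include=docs/*
		*/*) return 1 ;;     # exclude=*/*: rules out subdirectory files
		*.md) return 0 ;;    # include=*.md, now only top-level files
		*) return 1 ;;
	esac
}

for f in README.md docs/guide.pdf src/notes.md data.bin; do
	wanted "$f" && echo "wanted: $f" || echo "skipped: $f"
done
```

This prints `wanted:` for `README.md` and `docs/guide.pdf`, and `skipped:` for `src/notes.md` and `data.bin`.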
markdown
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment index b1a0c2585c..4d32e682d6 100644 --- a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment +++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment @@ -12,11 +12,11 @@ subdirectories. Eg, if the bucket contains "foo", "bar/...", and "baz/...", the response will list only the file "foo", and CommonPrefixes contains "bar" and "baz". -So, git-annex could make that request, and then if "include=bar/*" is not -in preferred content, but "include=foo/*" is, it could make a request to +So, git-annex could make that request, and then if `"include=bar/*"` is not +in preferred content, but `"include=foo/*"` is, it could make a request to list files prefixed by "foo/". And so avoid listing all the files in "bar". -If preferred content contained "include=foo/x/*" and "include=foo/y/*", +If preferred content contained `"include=foo/x/*"` and `"include=foo/y/*"`, when CommonPrefixes includes "foo", git-annex could follow up with 2 requests to list those subdirectories. @@ -24,7 +24,7 @@ So this ends up making at most 1 additional request per subdirectory included in preferred content. When preferred content excludes a subdirectory though, more requests would -be needed. For "exclude=bar/*", if the response lists 100 other +be needed. For `"exclude=bar/*"`, if the response lists 100 other subdirectories in CommonPrefixes, it would need to make 100 separate requests to list those while avoiding listing bar. That could easily be more expensive than the current behavior. So it does not seem to make sense
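The delimiter behaviour described in the comment above can be modelled without talking to S3. This is a hedged Python simulation of the ListObjects roll-up semantics (a toy model, not a real API call):

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Simulate an S3 delimited listing: keys that contain the delimiter
    after the prefix are rolled up into CommonPrefixes instead of being
    listed individually."""
    contents, common = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common)

# The example from the comment: only "foo" is listed; "bar" and "baz"
# come back as CommonPrefixes.
assert list_with_delimiter(["foo", "bar/a", "bar/b", "baz/c"]) == (
    ["foo"], ["bar/", "baz/"])
```

A follow-up request with `prefix="foo/"` would then descend into just that one subdirectory, which is the basis of the optimisation.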
markdown
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment
index e18502378a..361471327b 100644
--- a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment
+++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment
@@ -5,12 +5,12 @@
content="""
There are some complications in possible preferred content expressions:
-"include=foo*/*" -- we want "foo/*" but also "foooooom/*"... but what if
+`"include=foo*/*"` -- we want `"foo/*"` but also `"foooooom/*"`... but what if
there are 100 such subdirectories? It would be an unexpected cost to need
to make so many requests. Like exclude=, the optimisation should not be
used in this case.
-"include=foo/bar" -- we want only this file.. so would prefer to avoid
+`"include=foo/bar"` -- we want only this file.. so would prefer to avoid
recursing through the rest of foo. If there are multiple ones like this
that are all in the same subdirectory, it might be nice to make
one single request to find them all. But this seems like an edge case,
@@ -22,16 +22,16 @@ Here's a design:
2. Filter for "include=" that contain a "/" in the value. If none are
found, do the usual full listing of the bucket.
3. If any of those includes contain a glob before a "/", do the usual full
- listing of the bucket. (This handles the "include=foo*/* case)
+ listing of the bucket. (This handles the `"include=foo*/*"` case)
4. Otherwise, list the top level of the bucket with delimiter set to "/".
5. Include all the top-level files in the list.
6. Filter the includes to ones that start with a subdirectory in the
CommonPrefixes.
7. For each remaining include, make a request to list the bucket, with
the prefix set to the non-glob directory from the include. For example,
- for "include=foo/bar/*", set prefix to "foo/bar/", but for
- "include=foo/*bar", set prefix to "foo/". And for "include=foo/bar",
- set prefix to "foo/".
+ for `"include=foo/bar/*"`, set prefix to `"foo/bar/"`, but for
+ `"include=foo/*bar"`, set prefix to `"foo/"`. And for
+ `"include=foo/bar"`, set prefix to `"foo/"`.
8. Add back the prefixes to each file in the responses.
Note that, step #1 hides some complexity, because currently preferred
design
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment new file mode 100644 index 0000000000..a7cca3fb31 --- /dev/null +++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_1_842a1243cd6f15004a178f607912ca33._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2026-01-08T13:49:52Z" + content=""" +Paths in preferred content expressions match relative to the top, so +this preferred content expression will match only md files in the top, +and files in the docs subdirectory: + +`include=docs/* or include=*.md` + +Only preferred content is downloaded, but S3 is still queried for the +entire list of files in the bucket. +"""]] diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment new file mode 100644 index 0000000000..b1a0c2585c --- /dev/null +++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_2_f5a391a3e62284e0c503139eade4fdda._comment @@ -0,0 +1,32 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2026-01-08T14:16:26Z" + content=""" +I do think it would be possible to avoid the overhead of listing the +contents of subdirectories that are not preferred content. At +least sometimes. + +When a bucket is listed with a "/" delimiter, S3 does not recurse into +subdirectories. Eg, if the bucket contains "foo", "bar/...", and "baz/...", +the response will list only the file "foo", and CommonPrefixes contains +"bar" and "baz". 
+ +So, git-annex could make that request, and then if "include=bar/*" is not +in preferred content, but "include=foo/*" is, it could make a request to +list files prefixed by "foo/". And so avoid listing all the files in "bar". + +If preferred content contained "include=foo/x/*" and "include=foo/y/*", +when CommonPrefixes includes "foo", git-annex could follow up with 2 requests +to list those subdirectories. + +So this ends up making at most 1 additional request per subdirectory included +in preferred content. + +When preferred content excludes a subdirectory though, more requests would +be needed. For "exclude=bar/*", if the response lists 100 other +subdirectories in CommonPrefixes, it would need to make 100 separate +requests to list those while avoiding listing bar. That could easily be +more expensive than the current behavior. So it does not seem to make sense +to try to optimise handling of excludes. +"""]] diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment new file mode 100644 index 0000000000..e18502378a --- /dev/null +++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree/comment_3_0914c14c2b2b97bd0c79f3d9c990719f._comment @@ -0,0 +1,42 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-08T14:46:44Z" + content=""" +There are some complications in possible preferred content expressions: + +"include=foo*/*" -- we want "foo/*" but also "foooooom/*"... but what if +there are 100 such subdirectories? It would be an unexpected cost to need +to make so many requests. Like exclude=, the optimisation should not be +used in this case. + +"include=foo/bar" -- we want only this file.. so would prefer to avoid +recursing through the rest of foo. 
If there are multiple ones like this +that are all in the same subdirectory, it might be nice to make +one single request to find them all. But this seems like an edge case, +and one request per include is probably acceptable. + +Here's a design: + +1. Get preferred content expression of the remote. +2. Filter for "include=" that contain a "/" in the value. If none are + found, do the usual full listing of the bucket. +3. If any of those includes contain a glob before a "/", do the usual full + listing of the bucket. (This handles the "include=foo*/* case) +4. Otherwise, list the top level of the bucket with delimiter set to "/". +5. Include all the top-level files in the list. +6. Filter the includes to ones that start with a subdirectory in the + CommonPrefixes. +7. For each remaining include, make a request to list the bucket, with + the prefix set to the non-glob directory from the include. For example, + for "include=foo/bar/*", set prefix to "foo/bar/", but for + "include=foo/*bar", set prefix to "foo/". And for "include=foo/bar", + set prefix to "foo/". +8. Add back the prefixes to each file in the responses. + +Note that, step #1 hides some complexity, because currently preferred +content is loaded and parsed to a MatchFiles, which does not allow +introspecting to get the expression. Since we only care about include +expressions, it would suffice to add to MatchFiles a +`matchInclude :: Maybe String` which gets set for includes. +"""]]
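Steps 2, 3, and 7 of the design above can be sketched as a small planner. This is an illustrative Python approximation (a hypothetical helper that omits the exclude handling and the CommonPrefixes filtering of steps 5 and 6), not git-annex code:

```python
GLOB_CHARS = "*?["

def plan_listing(includes):
    """Return None when a full bucket listing is needed, or the set of
    prefixes to request after a delimited top-level listing."""
    # step 2: only include= values containing "/" can limit recursion
    deep = [i for i in includes if "/" in i]
    if not deep:
        return None
    for i in deep:
        # step 3: a glob before the first "/" (e.g. "foo*/*") forces
        # a full listing
        if any(c in GLOB_CHARS for c in i.split("/", 1)[0]):
            return None
    prefixes = set()
    for i in deep:
        # step 7: the prefix is the leading non-glob directory part
        parts = []
        for part in i.split("/")[:-1]:
            if any(c in GLOB_CHARS for c in part):
                break
            parts.append(part)
        prefixes.add("/".join(parts) + "/")
    return prefixes

assert plan_listing(["*.md"]) is None      # step 2: full listing
assert plan_listing(["foo*/x"]) is None    # step 3: full listing
assert plan_listing(["foo/bar/*", "foo/*bar", "foo/bar"]) == {"foo/bar/", "foo/"}
```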
Added a comment: How do I get GETGITREMOTENAME to work in INITREMOTE?
diff --git a/doc/design/external_special_remote_protocol/comment_59_ca91c66cf172e0e859dfe6c6e8d62dd3._comment b/doc/design/external_special_remote_protocol/comment_59_ca91c66cf172e0e859dfe6c6e8d62dd3._comment new file mode 100644 index 0000000000..657eca57b2 --- /dev/null +++ b/doc/design/external_special_remote_protocol/comment_59_ca91c66cf172e0e859dfe6c6e8d62dd3._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="Katie" + avatar="http://cdn.libravatar.org/avatar/38e04123b913160b66d8117cada14532" + subject="How do I get GETGITREMOTENAME to work in INITREMOTE?" + date="2026-01-07T23:37:01Z" + content=""" +I am writing an external special remote using this protocol. It is a little similar to the directory remote, in that there's a path on the local system where content is stored. + +I don't want this location to be saved in the git-annex branch, and I thought I'd be able to use GETGITREMOTENAME to persist it myself. However, I'm running into an issue where GETGITREMOTENAME fails during INITREMOTE (presumably since the remote has not yet been created). It does work during Prepare, but that feels a bit late to ask for a required piece of configuration. + +What are my options? My ideal behavior would be something very similar to the `directory=` field of the directory remote, but I can hand-manage it too if that's the recommendation, as long as I get some identifier for this remote (there can be multiple of these in the same repo). +"""]]
desire for a limited import/export.
diff --git a/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree.mdwn b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree.mdwn new file mode 100644 index 0000000000..3eacf23742 --- /dev/null +++ b/doc/todo/way_to_limit_recursion_for_import__47__export_S3_tree.mdwn @@ -0,0 +1,6 @@ +I wanted to implement management and synchronization of descriptive files (README.md, etc.) on top of a large S3 bucket via git-annex, so I could keep the files in a git repo and rely on importree/exporttree functionality to keep the bucket and repo in sync. + +Looking at [special_remotes/S3/](https://git-annex.branchable.com/special_remotes/S3/) I didn't spot any option to achieve that. + +I am not sure what would be the best option for this, given that greedy me might want to also eventually `sync` some `docs/` prefix there: maybe there could be a whitelist of some keys/paths to include and/or exclude? Maybe some [preferred content](https://git-annex.branchable.com/preferred_content/) `include` expression could be specific enough to not demand full bucket traversal (unrealistic in feasible time) but rather limit to the top level, e.g. `include=^docs/ and include=^*.md` or something smarter? +
Pass www-authenticate headers in to git credential
To support eg, git-credential-oauth.
diff --git a/Annex/Url.hs b/Annex/Url.hs
index 1cc742f522..6d0cb43767 100644
--- a/Annex/Url.hs
+++ b/Annex/Url.hs
@@ -157,7 +157,7 @@ withUrlOptions :: Maybe RemoteGitConfig -> (U.UrlOptions -> Annex a) -> Annex a
withUrlOptions mgc a = a =<< getUrlOptions mgc
-- When downloading an url, if authentication is needed, uses
--- git-credential to prompt for username and password.
+-- git-credential for the prompting.
--
-- Note that, when the downloader is curl, it will not use git-credential.
-- If the user wants to, they can configure curl to use a netrc file that
@@ -169,8 +169,8 @@ withUrlOptionsPromptingCreds mgc a = do
prompter <- mkPrompter
cc <- Annex.getRead Annex.gitcredentialcache
a $ uo
- { U.getBasicAuth = \u -> prompter $
- getBasicAuthFromCredential g cc u
+ { U.getBasicAuth = \u respheaders -> prompter $
+ getBasicAuthFromCredential g cc u respheaders
}
checkBoth :: U.URLString -> Maybe Integer -> U.UrlOptions -> Annex Bool
diff --git a/CHANGELOG b/CHANGELOG
index 8d8605dca5..39e6e628a2 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -13,6 +13,8 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
* import: Fix display of some import errors.
* Fix bug that could result in a tree imported from a remote containing
missing git blobs.
+ * Pass www-authenticate headers in to to git credential, to support
+ eg, git-credential-oauth.
-- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400
diff --git a/Git/Credential.hs b/Git/Credential.hs
index 379fe585b0..1b69381996 100644
--- a/Git/Credential.hs
+++ b/Git/Credential.hs
@@ -1,6 +1,6 @@
{- git credential interface
-
- - Copyright 2019-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2019-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -19,6 +19,8 @@ import Utility.Url.Parse
import qualified Data.Map as M
import Network.URI
+import Network.HTTP.Types
+import Network.HTTP.Types.Header
import Control.Concurrent.STM
data Credential = Credential { fromCredential :: M.Map String String }
@@ -35,7 +37,7 @@ credentialBasicAuth cred = BasicAuth
<*> credentialPassword cred
getBasicAuthFromCredential :: Repo -> TMVar CredentialCache -> GetBasicAuth
-getBasicAuthFromCredential r ccv u = do
+getBasicAuthFromCredential r ccv u respheaders = do
(CredentialCache cc) <- atomically $ readTMVar ccv
case mkCredentialBaseURL r u of
Just bu -> case M.lookup bu cc of
@@ -44,8 +46,8 @@ getBasicAuthFromCredential r ccv u = do
let storeincache = \c -> atomically $ do
CredentialCache cc' <- takeTMVar ccv
putTMVar ccv (CredentialCache (M.insert bu c cc'))
- go storeincache =<< getUrlCredential u r
- Nothing -> go (const noop) =<< getUrlCredential u r
+ go storeincache =<< getUrlCredential u respheaders r
+ Nothing -> go (const noop) =<< getUrlCredential u respheaders r
where
go storeincache c =
case credentialBasicAuth c of
@@ -61,8 +63,9 @@ getBasicAuthFromCredential r ccv u = do
-- | This may prompt the user for the credential, or get a cached
-- credential from git.
-getUrlCredential :: URLString -> Repo -> IO Credential
-getUrlCredential = runCredential "fill" . urlCredential
+getUrlCredential :: URLString -> ResponseHeaders -> Repo -> IO Credential
+getUrlCredential url respheaders = runCredential "fill" $
+ urlCredential url respheaders
-- | Call if the credential the user entered works, and can be cached for
-- later use if git is configured to do so.
@@ -73,8 +76,12 @@ approveUrlCredential c = void . runCredential "approve" c
rejectUrlCredential :: Credential -> Repo -> IO ()
rejectUrlCredential c = void . runCredential "reject" c
-urlCredential :: URLString -> Credential
-urlCredential = Credential . M.singleton "url"
+urlCredential :: URLString -> ResponseHeaders -> Credential
+urlCredential url respheaders = Credential $ M.fromList $
+ ("url", url) : map wwwauth (filter iswwwauth respheaders)
+ where
+ iswwwauth (h, _) = h == hWWWAuthenticate
+ wwwauth (_, v) = ("wwwauth[]", decodeBS v)
runCredential :: String -> Credential -> Repo -> IO Credential
runCredential action input r =
diff --git a/P2P/Http/Client.hs b/P2P/Http/Client.hs
index 024fce2242..1588728850 100644
--- a/P2P/Http/Client.hs
+++ b/P2P/Http/Client.hs
@@ -2,7 +2,7 @@
-
- https://git-annex.branchable.com/design/p2p_protocol_over_http/
-
- - Copyright 2024 Joey Hess <id@joeyh.name>
+ - Copyright 2024-2026 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@@ -42,7 +42,7 @@ import Servant hiding (BasicAuthData(..))
import Servant.Client.Streaming
import qualified Servant.Types.SourceT as S
import Network.HTTP.Types.Status
-import Network.HTTP.Client
+import Network.HTTP.Client hiding (responseHeaders)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy.Internal as LI
import qualified Data.Map as M
@@ -52,6 +52,7 @@ import Control.Concurrent
import System.IO.Unsafe
import Data.Time.Clock.POSIX
import qualified Data.ByteString.Lazy as L
+import Data.Foldable (toList)
type ClientAction a
= ClientEnv
@@ -119,7 +120,7 @@ p2pHttpClientVersions' allowedversion rmt rmtrepo fallback clientaction =
go clientenv mcred credcached mauth vs
| statusCode (responseStatusCode resp) == 401 ->
case mcred of
- Nothing -> authrequired clientenv (v:vs)
+ Nothing -> authrequired clientenv resp (v:vs)
Just cred -> do
inRepo $ Git.rejectUrlCredential cred
Just <$> fallback (showstatuscode resp)
@@ -134,9 +135,10 @@ p2pHttpClientVersions' allowedversion rmt rmtrepo fallback clientaction =
catchclienterror a = a `catch` \(ex :: ClientError) -> pure (Left ex)
- authrequired clientenv vs = do
+ authrequired clientenv resp vs = do
+ let respheaders = toList $ responseHeaders resp
cred <- prompt $
- inRepo $ Git.getUrlCredential credentialbaseurl
+ inRepo $ Git.getUrlCredential credentialbaseurl respheaders
go clientenv (Just cred) False (credauth cred) vs
showstatuscode resp =
diff --git a/Remote/GitLFS.hs b/Remote/GitLFS.hs
index 2ec2f429d7..89d70b6e91 100644
--- a/Remote/GitLFS.hs
+++ b/Remote/GitLFS.hs
@@ -316,7 +316,10 @@ discoverLFSEndpoint tro h =
resp <- makeSmallAPIRequest testreq
if needauth (responseStatus resp)
then do
- cred <- prompt $ inRepo $ Git.getUrlCredential (show lfsrepouri)
+ cred <- prompt $ inRepo $
+ Git.getUrlCredential
+ (show lfsrepouri)
+ (responseHeaders resp)
let endpoint' = addbasicauth (Git.credentialBasicAuth cred) endpoint
let testreq' = LFS.startTransferRequest endpoint' transfernothing
flip catchNonAsync (const (returnendpoint endpoint')) $ do
diff --git a/Utility/Url.hs b/Utility/Url.hs
index d98ade2738..c40a3ee748 100644
--- a/Utility/Url.hs
+++ b/Utility/Url.hs
@@ -281,7 +281,7 @@ getUrlInfo url uo = case parseURIRelaxed url of
fn <- extractFromResourceT (extractfilename resp)
return $ found len fn
else if responseStatus resp == unauthorized401
- then return $ getBasicAuth uo' (show (getUri req)) >>= \case
+ then return $ getBasicAuth uo' (show (getUri req)) (responseHeaders resp) >>= \case
Nothing -> return dne
Just (ba, signalsuccess) -> do
ui <- existsconduit'
@@ -476,7 +476,7 @@ downloadConduit meterupdate iv req file uo =
else do
rf <- extractFromResourceT (respfailure resp)
if responseStatus resp == unauthorized401
- then return $ getBasicAuth uo (show (getUri req')) >>= \case
+ then return $ getBasicAuth uo (show (getUri req')) (responseHeaders resp) >>= \case
Nothing -> giveup rf
Just ba -> retryauthed ba
else return $ giveup rf
@@ -516,7 +516,7 @@ downloadConduit meterupdate iv req file uo =
else do
rf <- extractFromResourceT (respfailure resp)
if responseStatus resp == unauthorized401
- then return $ getBasicAuth uo (show (getUri req'')) >>= \case
(Diff truncated)
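The `urlCredential` change in the diff above adds one `wwwauth[]` field per WWW-Authenticate response header to the input fed to `git credential fill`. A hedged Python sketch of that mapping (the URL is made up for illustration):

```python
def credential_fill_input(url, response_headers):
    """Build the key=value lines fed to `git credential fill`, adding a
    wwwauth[] entry for each WWW-Authenticate header, mirroring what
    urlCredential now does on the Haskell side."""
    lines = ["url=" + url]
    for name, value in response_headers:
        if name.lower() == "www-authenticate":
            lines.append("wwwauth[]=" + value)
    return "\n".join(lines) + "\n"

# A 401 response carrying the header seen in the bug report below:
demo = credential_fill_input(
    "https://example.com/repo.git/config",
    [("WWW-Authenticate", 'Basic realm="Gitea"'),
     ("Content-Type", "text/plain")])
assert demo == 'url=https://example.com/repo.git/config\nwwwauth[]=Basic realm="Gitea"\n'
```

Helpers like git-credential-oauth use the `wwwauth[]` value to recognize which kind of server they are authenticating to.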
sig
diff --git a/doc/todo/support_push_to_create.mdwn b/doc/todo/support_push_to_create.mdwn index 2bda5e2520..50d9cd0a65 100644 --- a/doc/todo/support_push_to_create.mdwn +++ b/doc/todo/support_push_to_create.mdwn @@ -30,4 +30,4 @@ remotes that don't have a UUID. This would slow down pushes to eg github slightl since it would ignore annex-ignore being set, and re-probe the git config to see if a UUID has appeared. That seems a small enough price to pay. -The assistant would also need to be made to handle this. jjjj +The assistant would also need to be made to handle this. --[[Joey]]
break todo out of bug report
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment index e6fed5674d..cba034b3da 100644 --- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment @@ -10,4 +10,6 @@ than "push to create". I do think my idea in comment #2 would be better than how you implemented that. But it's also not directly relevant to this bug report. + +I did open [[todo/support_push_to_create]]. """]] diff --git a/doc/todo/support_push_to_create.mdwn b/doc/todo/support_push_to_create.mdwn new file mode 100644 index 0000000000..2bda5e2520 --- /dev/null +++ b/doc/todo/support_push_to_create.mdwn @@ -0,0 +1,33 @@ +"push to create" as supported by eg Forgejo makes a `git push` to a new +git repository create the repository. + +Since the repository does not exist when git-annex probes the UUID, +which happens before any push, annex-ignore is set to true. +So a command like `git-annex push` will do the git push and create the +repository, but fail to discover the uuid of that repository, and so +not send annexed files to it. + +forgejo-aneksajo has worked around this by making git-annex's request for +"$url/config" create the repository. See: + +* <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/commit/3c53e9803de9c59e9e78ac19f0bb107651bb48f8> +* <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/85> +* <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/83#issuecomment-5093679> and following comments + +But that means that `git-annex pull` will also auto-create the repository. +Or even a command like `git-annex info` that does UUID discovery of a newly +added remote. 
+ +git-annex could support push to create better by having `git-annex push`, +after pushing the git branches, regenerate the remote list, while +ignoring the annex-ignore configuration of remotes. +So if the branch push created the git repo, any annex uuid that the +new repo has would be discovered at that point. (And at that point annex-ignore +would need to be cleared.) + +The remote list regeneration would only need to be done when there are git +remotes that don't have a UUID. This would slow down pushes to eg github slightly, +since it would ignore annex-ignore being set, and re-probe the git config +to see if a UUID has appeared. That seems a small enough price to pay. + +The assistant would also need to be made to handle this. jjjj
followup
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment new file mode 100644 index 0000000000..e6fed5674d --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_6_1855b50e8aa0124b9f526c40b6498133._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2026-01-07T16:45:48Z" + content=""" +> Forgejo-aneksajo also creates the repository for requests to /config, and will git-annex-init it if the request comes from a git-annex user agent and the user has write permissions. + +Hmm, then `git-annex pull` will create a repository. Which is going further +than "push to create". + +I do think my idea in comment #2 would be better than how you implemented +that. But it's also not directly relevant to this bug report. +"""]] diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_7_13b7c0b807f6b19be1d2b097fe597f5c._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_7_13b7c0b807f6b19be1d2b097fe597f5c._comment new file mode 100644 index 0000000000..a7b472c66a --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_7_13b7c0b807f6b19be1d2b097fe597f5c._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2026-01-07T16:47:38Z" + content=""" +The www-authenticate header is also sent when the request for `/config` is +a 401. So git-annex can use that to set the wwwauth field. + +The capability fields are indicating capabilities of git. +I checked and git-credential-oauth does not rely on those capabilities. 
+ +(Wildly, git-credential-oauth is looking for "GitLab", "GitHub", and +"Gitea" in order to sniff what backend it's authenticating to, and that's +all it uses the wwwauth for.) +"""]]
Added a comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_5_53cc071de11aa604e6eecb68ce15baba._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_5_53cc071de11aa604e6eecb68ce15baba._comment new file mode 100644 index 0000000000..ebb2a6c868 --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_5_53cc071de11aa604e6eecb68ce15baba._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 5" + date="2026-01-06T17:47:53Z" + content=""" +`git push` seems to first make a GET request for something like `/m.risse/test-push-oauth2.git/info/refs?service=git-receive-pack`, which responds with a 401 and `www-authenticate: Basic realm=\"Gitea\"` among the headers. Git then seems to pass this information on to the git-credential-helper. + +`git annex push` likewise receives a 401 response from the `/config` endpoint with the same www-authenticate header, so it could pass it on to the credential helper too. + +I am not sure where the `capability`s are coming from... +"""]]
Added a comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_4_094fe78aaf919e54d5457fb3274a023e._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_4_094fe78aaf919e54d5457fb3274a023e._comment new file mode 100644 index 0000000000..d44c824587 --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_4_094fe78aaf919e54d5457fb3274a023e._comment @@ -0,0 +1,52 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 4" + date="2026-01-06T17:36:19Z" + content=""" +The chicken-and-egg problem you are describing is actually something msz has already encountered and reported, but that issue is fixed: Forgejo-aneksajo also creates the repository for requests to /config, and will git-annex-init it if the request comes from a git-annex user agent and the user has write permissions. More about that here: + +- <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/commit/3c53e9803de9c59e9e78ac19f0bb107651bb48f8> +- <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/85> +- <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/83#issuecomment-5093679> and following comments + +So that's not it... I've investigated a bit and I think I led you astray with the comment about a \"non-existing repository\". I am also seeing the issue with a pre-created repository, and even with a pre-created and git-annex-init'ialized repository. 
+ +The issue is actually that for ATRIS I rely on git-credential-oauth's \"Gitea-like-Server\" discovery here: <https://github.com/hickford/git-credential-oauth/blob/f01271d94c70b9280c19f489f90c05e9aba0d757/main.go#L206> + +When doing a `git push origin main` the git-credential-oauth helper actually receives this request: + +``` +$ git push origin main +capability[]=authtype +capability[]=state +protocol=https +host=atris.fz-juelich.de +wwwauth[]=Basic realm=\"Gitea\" +``` + +while with `git annex push` it is just this: + +``` +$ git annex push +protocol=https +host=atris.fz-juelich.de +``` + +Git-credential-oauth recognizes that it is talking to a Gitea/Forgejo server based on this `wwwauth[]=Basic realm=\"Gitea\"` data. Without it and in the absence of a more specific configuration for the server it doesn't try to handle it and falls back to the standard http credential handling of git. I am not sure where these capability and wwwauth fields are coming from, but I think git-annex should somehow do the same as git here... + +--- + +I've gotten at the data git sends to the credential helper with this trivial script: + +``` +$ cat ~/bin/git-credential-echo +#!/usr/bin/env bash + +exec cat >&2 +``` + +and configuring it as my credential helper. + +I have to say, I like this pattern of processes communicating over simple line-based protocols :) +"""]]
comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_3_570e6b61adef7c2f8ee0dcdcff225f76._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_3_570e6b61adef7c2f8ee0dcdcff225f76._comment new file mode 100644 index 0000000000..43fc603dae --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_3_570e6b61adef7c2f8ee0dcdcff225f76._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2026-01-06T17:28:34Z" + content=""" +Looks like the 401 Unauthorized happens for all non-existent repos when accessing `/config`. + +Eg: + + joey@darkstar:~>curl https://atris.fz-juelich.de/m.risse/joeytestmadeup.git + Not found. + joey@darkstar:~>curl https://atris.fz-juelich.de/m.risse/joeytestmadeup.git/config + Unauthorized + +A bug in Forgejo? +"""]]
corrections
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment index 17c86a0550..dd6cd3e520 100644 --- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_1_a3bb87a5cfd010f7f453f5adf1110fd9._comment @@ -6,24 +6,10 @@ git-annex is actually using git credential here. That's where the "Username for" prompt comes from. -I think that this is a chicken and egg problem. git-annex is doing UUID -discovery, which is the first thing it does when run with a new remote that -does not have a UUID. But the repository does not exist, so has no UUID, -and it won't be created until git push happens. - -Deferring git-annex UUID discovery would avoid the problem, but I think -that would be very complicated if possible at all. - -I wonder if there is some way that git-annex could tell, at the http level, -that this URL does not exist yet? If so, it could avoid doing UUID -discovery. Then `git-annex push` would at least be able to push the git -repo. And then on the next run git-annex would discover the UUID and would -be able to fully use the repository. Not an ideal solution perhaps, since -you would need to `git-annex push` twice in a row to fully populate the -repisitory. - -Looks like the url you gave just 404's, but I'm not sure if I'm seeing -now the same as what you would have seen. +Looks like the url you gave 404's. But git-annex is hitting +`https://atris.fz-juelich.de/m.risse/test1.git/config` and getting a 401 +Unauthorized for that. Which is why it is using git credential. +But I'm not sure if I'm seeing the same now as what you would have seen. 
@matrs Any chance you could give me access to reproduce this using your server so I could look into that? diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment index eb8396320a..213f93e4a2 100644 --- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment @@ -3,16 +3,16 @@ subject="""comment 2""" date="2026-01-06T16:39:40Z" content=""" -The chicken and egg problem could be solved by making `git-annex push`, -after pushing the git branches, regenerate the remote list. So if the -branch push created the git repo, any annex uuid that the new repo has -would be discovered at that point. +If the server sent back 404 for the /config hit, then the early UUID +discovery would not prompt with git credential. + +Then, to make "push to create" work smoothly, `git-annex push`, +after pushing the git branches, could regenerate the remote list. So if +the branch push created the git repo, any annex uuid that the new repo +has would be discovered at that point. The remote list regeneration would only need to be done when there are git remotes that don't have a UUID yet. The assistant would also need to be made to do that. - -This, combined with avoiding prompting on 404 in -UUID discovery would make "push to create" work smoothly. """]]
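The rule joey sketches in comment 2 — treat a 404 on `/config` as "repo not created yet" and only fall back to git credential on a 401 — can be illustrated with a minimal Python sketch. This is an illustration only, not git-annex's actual (Haskell) code; the function name `should_prompt_credentials` is hypothetical:

```python
# Hypothetical sketch of the proposed UUID-discovery behavior:
# decide from the HTTP status of the /config URL whether invoking
# git credential makes sense.

def should_prompt_credentials(config_url_status: int) -> bool:
    """Return True if UUID discovery should fall back to git credential."""
    if config_url_status == 404:
        # Repo does not exist yet: skip discovery and let the
        # subsequent `git push` create it ("push to create").
        return False
    if config_url_status == 401:
        # Server demands auth for an existing repo: prompting is right.
        return True
    # 200 etc.: config was readable without credentials, no prompt.
    return False

print(should_prompt_credentials(404))  # → False
print(should_prompt_credentials(401))  # → True
```

Under this rule, a Forgejo server that answers 401 for nonexistent repos (as seen in comment 3) would still trigger the unwanted prompt, which is why the 401-for-missing-repos behavior matters.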
update
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment
index 654dc7a04c..eb8396320a 100644
--- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment
+++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment
@@ -13,7 +13,6 @@ git remotes that don't have a UUID yet.
 
 The assistant would also need to be made to do that.
 
-This, combined with avoiding the early
-UUID discovery that led to the git-credential prompt, would make
-"push to create" work smoothly.
+This, combined with avoiding prompting on 404 in
+UUID discovery would make "push to create" work smoothly.
 """]]
comment
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment
new file mode 100644
index 0000000000..654dc7a04c
--- /dev/null
+++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth/comment_2_f4eaa6c45cc7cb20aa613617b78f5f56._comment
@@ -0,0 +1,19 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2026-01-06T16:39:40Z"
+ content="""
+The chicken and egg problem could be solved by making `git-annex push`,
+after pushing the git branches, regenerate the remote list. So if the
+branch push created the git repo, any annex uuid that the new repo has
+would be discovered at that point.
+
+The remote list regeneration would only need to be done when there are
+git remotes that don't have a UUID yet.
+
+The assistant would also need to be made to do that.
+
+This, combined with avoiding the early
+UUID discovery that led to the git-credential prompt, would make
+"push to create" work smoothly.
+"""]]