Recent changes to this wiki:
Add openneuro tag
diff --git a/doc/bugs/S3_remote_fails_for_GCP_with_multiple_prefixes.mdwn b/doc/bugs/S3_remote_fails_for_GCP_with_multiple_prefixes.mdwn index da4e6c9bce..923209bb04 100644 --- a/doc/bugs/S3_remote_fails_for_GCP_with_multiple_prefixes.mdwn +++ b/doc/bugs/S3_remote_fails_for_GCP_with_multiple_prefixes.mdwn @@ -93,3 +93,5 @@ initremote: 1 failed ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) Thanks for all your great work, Joey! + +[[!tag projects/openneuro]]
fix address example
diff --git a/doc/tips/peer_to_peer_network_with_iroh.mdwn b/doc/tips/peer_to_peer_network_with_iroh.mdwn index e20d9a6241..2259ba7af1 100644 --- a/doc/tips/peer_to_peer_network_with_iroh.mdwn +++ b/doc/tips/peer_to_peer_network_with_iroh.mdwn @@ -80,7 +80,7 @@ Here's how it all looks: remote: Compressing objects: 100% (7/7), done. remote: Total 8 (delta 0), reused 0 (delta 0) Unpacking objects: 100% (8/8), done. - From tor-annex::wa3i6wgttmworwli.onion:5162 + From p2p-annex::iroh:endpointadroxtad5dj5vaweczqnmkhk2sb7dmysazljjul6zeug7bexymejaaa 452db22..a894c60 git-annex -> peer1/git-annex c0ac431..44ca7f6 master -> peer1/master
remove now-obsolete warnings
diff --git a/doc/tips/peer_to_peer_network_with_iroh.mdwn b/doc/tips/peer_to_peer_network_with_iroh.mdwn index 6dfdbf325c..e20d9a6241 100644 --- a/doc/tips/peer_to_peer_network_with_iroh.mdwn +++ b/doc/tips/peer_to_peer_network_with_iroh.mdwn @@ -13,13 +13,6 @@ To use this, you need a few things: executable. * You also need to install [Magic Wormhole](https://github.com/warner/magic-wormhole) - here are [the installation instructions](https://magic-wormhole.readthedocs.io/en/latest/welcome.html#installation). - -*Important:* - -* The installation process must make a `wormhole` executable available - somewhere on your `$PATH`. Some distributions may only install executables - which reference the Python version, e.g. `wormhole-2.7`, in which case you - will need to manually create a symlink (and maybe file a bug with your distribution). * You need git-annex version 10.20251103 or newer. Older versions of git-annex unfortunately had a bug that prevents this process from working correctly. diff --git a/doc/tips/peer_to_peer_network_with_tor.mdwn b/doc/tips/peer_to_peer_network_with_tor.mdwn index 90f000c197..9d2d9995ba 100644 --- a/doc/tips/peer_to_peer_network_with_tor.mdwn +++ b/doc/tips/peer_to_peer_network_with_tor.mdwn @@ -16,16 +16,6 @@ To use this, you need to get Tor installed and running. See You also need to install [Magic Wormhole](https://github.com/warner/magic-wormhole) - here are [the installation instructions](https://magic-wormhole.readthedocs.io/en/latest/welcome.html#installation). -*Important:* - -* The installation process must make a `wormhole` executable available - somewhere on your `$PATH`. Some distributions may only install executables - which reference the Python version, e.g. `wormhole-2.7`, in which case you - will need to manually create a symlink (and maybe file a bug with your distribution). - -* You need git-annex version 6.20180705 or newer. 
Older versions of git-annex - unfortunately had a bug that prevents this process from working correctly. - ## pairing two repositories You have two git-annex repositories on different computers, and want to diff --git a/doc/tips/peer_to_peer_network_with_tor/comment_6_5237c2b408dc1841ca01a51084702b90._comment b/doc/tips/peer_to_peer_network_with_tor/comment_6_5237c2b408dc1841ca01a51084702b90._comment new file mode 100644 index 0000000000..d556240ce9 --- /dev/null +++ b/doc/tips/peer_to_peer_network_with_tor/comment_6_5237c2b408dc1841ca01a51084702b90._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: Issue on openSUSE with Tor's requirement for Python 2.7 """ + date="2025-11-03T19:44:09Z" + content=""" +Thanks for that. Since that issue got fixed in 2020, it seems unncessary to +complicate this tip with the warning about it, so I've removed your +addition now. +"""]]
git-annex version for p2p --pair fix for iroh
diff --git a/CHANGELOG b/CHANGELOG index 527c5d46dd..a580a28dae 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,4 +1,4 @@ -git-annex (10.20251030) UNRELEASED; urgency=medium +git-annex (10.20251103) UNRELEASED; urgency=medium * p2p --pair: Fix to work with external P2P networks. * remotedaemon: Avoid crashing when run with --debug. diff --git a/doc/tips/peer_to_peer_network_with_iroh.mdwn b/doc/tips/peer_to_peer_network_with_iroh.mdwn index ce162ffcf5..6dfdbf325c 100644 --- a/doc/tips/peer_to_peer_network_with_iroh.mdwn +++ b/doc/tips/peer_to_peer_network_with_iroh.mdwn @@ -20,6 +20,8 @@ To use this, you need a few things: somewhere on your `$PATH`. Some distributions may only install executables which reference the Python version, e.g. `wormhole-2.7`, in which case you will need to manually create a symlink (and maybe file a bug with your distribution). +* You need git-annex version 10.20251103 or newer. Older versions of git-annex + unfortunately had a bug that prevents this process from working correctly. ## pairing two repositories diff --git a/git-annex.cabal b/git-annex.cabal index 701c4c2530..087182bcbf 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -1,5 +1,5 @@ Name: git-annex -Version: 10.20251029 +Version: 10.20251103 Cabal-Version: 1.12 License: AGPL-3 Maintainer: Joey Hess <id@joeyh.name>
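The tip now requires git-annex version 10.20251103 or newer for pairing over iroh. As a rough sketch of how that requirement could be checked before pairing, the script below parses a version line and compares it against the minimum. The helpers `parse_version` and `version_ge` are hypothetical and not part of git-annex, and the comparison assumes the two-part `MAJOR.DATE` version form seen in this changelog. The sample line is copied from the `git annex version` output quoted in a bug report elsewhere on this wiki.

```sh
#!/bin/sh
# Sketch only: parse_version and version_ge are hypothetical helpers,
# not part of git-annex.

# Minimum version named in the iroh tip.
required="10.20251103"

# Read `git annex version` output on stdin and print the bare version
# number, dropping any suffix such as "-g33ab579...".
parse_version() {
    sed -n 's/^git-annex version: \([0-9][0-9.]*\).*/\1/p'
}

# version_ge A B: succeed when version A is at least version B.
# Assumes both have the two-part MAJOR.DATE form, e.g. 10.20251103.
version_ge() {
    a_major=${1%%.*}; a_date=${1#*.}
    b_major=${2%%.*}; b_date=${2#*.}
    if [ "$a_major" -ne "$b_major" ]; then
        [ "$a_major" -gt "$b_major" ]
        return
    fi
    [ "$a_date" -ge "$b_date" ]
}

# Sample first line, copied from version output quoted in a bug report.
sample="git-annex version: 10.20250929-g33ab579243742b0b18ffec2ea4ce1e3a827720b4"
installed=$(printf '%s\n' "$sample" | parse_version)

if version_ge "$installed" "$required"; then
    echo "git-annex $installed is new enough for p2p --pair over iroh"
else
    echo "git-annex $installed is too old; need $required or newer"
fi
```

To check a real installation rather than the sample line, `installed=$(git annex version | parse_version)` should work the same way, provided the first line of the output keeps the `git-annex version:` prefix.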
dumbpipe versioning
diff --git a/doc/special_remotes/p2p/git-annex-p2p-iroh b/doc/special_remotes/p2p/git-annex-p2p-iroh index b1969c7380..83be015c6f 100755 --- a/doc/special_remotes/p2p/git-annex-p2p-iroh +++ b/doc/special_remotes/p2p/git-annex-p2p-iroh @@ -1,13 +1,10 @@ #!/bin/sh # Allows git-annex to use iroh for P2P connections. # -# This uses a modified version of iroh's dumbpipe program, adding the -# generate-ticket command. This pull request has the necessary changes: +# This uses iroh's dumbpipe program. It needs a version with the +# generate-ticket command, which was added in this pull request: # https://github.com/n0-computer/dumbpipe/pull/86 # -# Quality: experimental. Has worked at least twice, but there are known and -# unknown bugs. -# # Copyright 2025 Joey Hess; licenced under the GNU GPL version 3 or higher. set -e diff --git a/doc/tips/peer_to_peer_network_with_iroh.mdwn b/doc/tips/peer_to_peer_network_with_iroh.mdwn index d743d8f46e..ce162ffcf5 100644 --- a/doc/tips/peer_to_peer_network_with_iroh.mdwn +++ b/doc/tips/peer_to_peer_network_with_iroh.mdwn @@ -8,7 +8,7 @@ It can be used with git-annex, to connect together two repositories. To use this, you need a few things: * Install [dumbpipe](https://www.dumbpipe.dev/). This will be used to talk - over Iroh. + over Iroh. Note that this needs version 0.33 or newer of dumbpipe. * Download [[special_remotes/p2p/git-annex-p2p-iroh]] and make the script executable. * You also need to install [Magic Wormhole](https://github.com/warner/magic-wormhole) -
add iroh tip
Adapted from the tor tip.
Also, removed some out of date stuff from the tor tip.
diff --git a/doc/tips/peer_to_peer_network_with_iroh.mdwn b/doc/tips/peer_to_peer_network_with_iroh.mdwn new file mode 100644 index 0000000000..d743d8f46e --- /dev/null +++ b/doc/tips/peer_to_peer_network_with_iroh.mdwn @@ -0,0 +1,139 @@ +[Iroh](https://www.iroh.computer/) is a peer to peer protocol that can +connect any two devices on the planet -- fast! + +It can be used with git-annex, to connect together two repositories. + +## dependencies + +To use this, you need a few things: + +* Install [dumbpipe](https://www.dumbpipe.dev/). This will be used to talk + over Iroh. +* Download [[special_remotes/p2p/git-annex-p2p-iroh]] and make the script + executable. +* You also need to install [Magic Wormhole](https://github.com/warner/magic-wormhole) - + here are [the installation instructions](https://magic-wormhole.readthedocs.io/en/latest/welcome.html#installation). + +*Important:* + +* The installation process must make a `wormhole` executable available + somewhere on your `$PATH`. Some distributions may only install executables + which reference the Python version, e.g. `wormhole-2.7`, in which case you + will need to manually create a symlink (and maybe file a bug with your distribution). + +## pairing two repositories + +You have two git-annex repositories on different computers, and want to +connect them together over Iroh so they share their contents. Or, you and a +friend want to connect your repositories together. Pairing is an easy way +to accomplish this. + +In each git-annex repository, run these commands: + + git annex p2p --enable iroh + git annex remotedaemon + +Now git-annex is listening for connections on Iroh, but +it will only talk to peers after pairing with them. + +In both repositories, run this command: + + git annex p2p --pair + +This will print out a pairing code, like "11-incredible-tumeric", +and prompt for you to enter the other repository's pairing code. + +So you have to get in contact with your friend to exchange codes. 
+See the section below "how to exchange pairing codes" for tips on +how to do that securely. + +Once the pairing codes are exchanged, the two repositories will be +connected to one-another via Iroh. Each will have a git remote, with a name +like "peer1", which connects to the other repository. + +Then, you can run commands like `git annex sync peer1 --content` to sync +with the paired repository. + +Pairing connects just two repositories, but you can repeat the process to +pair with as many other repositories as you like, in order to build up +larger networks of repositories. + +## example session + +Here's how it all looks: + + $ git annex p2p --enable iroh + p2p enable iroh ok + $ git annex remotedaemon + $ git annex p2p --pair + p2p pair peer1 (using Magic Wormhole) + + This repository's pairing code is: 11-incredible-tumeric + + Enter the other repository's pairing code: 1-revenue-icecream + Exchanging pairing data... + Successfully exchanged pairing data. Connecting to peer1... + ok + $ git annex sync peer1 --content + commit + On branch master + nothing to commit, working tree clean + ok + pull peer1 + remote: Enumerating objects: 10, done. + remote: Counting objects: 100% (10/10), done. + remote: Compressing objects: 100% (7/7), done. + remote: Total 8 (delta 0), reused 0 (delta 0) + Unpacking objects: 100% (8/8), done. + From tor-annex::wa3i6wgttmworwli.onion:5162 + 452db22..a894c60 git-annex -> peer1/git-annex + c0ac431..44ca7f6 master -> peer1/master + + Updating c0ac431..44ca7f6 + Fast-forward + amazing_file | 1 + + 1 file changed, 1 insertion(+) + create mode 120000 amazing_file + ok + (merging peer1/git-annex into git-annex...) + get amazing_file (from peer1...) + (checksum...) ok + +## how to exchange pairing codes + +When pairing with a friend's repository, you have to exchange +pairing codes. How to do this securely? + +The pairing codes can only be used once, so it's ok to exchange them in +a way that someone else can access later. 
However, if someone can overhear +your exchange of codes in real time, they could trick you into pairing +with them. + +Here are some suggestions for how to exchange the codes, +with the most secure ways first: + +* In person. +* In an encrypted message (gpg signed email, Off The Record (OTR) + conversation, etc). +* By a voice phone call. + +## starting git-annex remotedaemon on boot + +Notice the `git annex remotedaemon` being run in the above examples. +That command listens for incoming Iroh connections so that other peers +can connect to your repository over Tor. + +So, you may want to arrange for the remotedaemon to be started on boot. +You can do that with a simple cron job: + + @reboot cd ~/myannexrepo && git annex remotedaemon + +If you use the git-annex assistant, and have it auto-starting on boot, it +will take care of starting the remotedaemon for you. + +## speed of large transfers + +This should be fast! Iroh often gets peers directly connected to +one-another, handling the necessary punching through firewalls and NAT. +In some cases, when Iroh is not able to do that, traffic will be sent via a +relay, which could be slower. diff --git a/doc/tips/peer_to_peer_network_with_tor.mdwn b/doc/tips/peer_to_peer_network_with_tor.mdwn index 2a9287a5a8..90f000c197 100644 --- a/doc/tips/peer_to_peer_network_with_tor.mdwn +++ b/doc/tips/peer_to_peer_network_with_tor.mdwn @@ -3,6 +3,9 @@ git-annex has recently gotten support for running as a and easy to use way to connect repositories in different locations. No account on a central server is needed; it's peer-to-peer. +(See also [[peer_to_peer_network_with_iroh]] for something similar but +faster if you don't need all the layered security of tor.) + ## dependencies To use this, you need to get Tor installed and running. 
See @@ -15,15 +18,12 @@ here are [the installation instructions](https://magic-wormhole.readthedocs.io/e *Important:* -* At the time of writing, you need to install Magic Wormhole under Python 2, - because [Tor support is only available under python2.7](https://magic-wormhole.readthedocs.io/en/latest/tor.html). - * The installation process must make a `wormhole` executable available somewhere on your `$PATH`. Some distributions may only install executables which reference the Python version, e.g. `wormhole-2.7`, in which case you will need to manually create a symlink (and maybe file a bug with your distribution). -* You need git-annex version 6.20180705. Older versions of git-annex +* You need git-annex version 6.20180705 or newer. Older versions of git-annex unfortunately had a bug that prevents this process from working correctly. ## pairing two repositories
p2p --pair: Fix to work with external P2P networks
When storing a P2P authtoken, it needs to have our local address, not the
address of the peer.
diff --git a/CHANGELOG b/CHANGELOG index a6490580ce..b87b05e4cd 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,9 @@ +git-annex (10.20251030) UNRELEASED; urgency=medium + + * p2p --pair: Fix to work with external P2P networks. + + -- Joey Hess <id@joeyh.name> Mon, 03 Nov 2025 14:02:46 -0400 + git-annex (10.20251029) upstream; urgency=medium * Support ssh remotes with '#' and '?' in the path to the repository, diff --git a/Command/P2P.hs b/Command/P2P.hs index 491355507c..2aa3f674cf 100644 --- a/Command/P2P.hs +++ b/Command/P2P.hs @@ -263,7 +263,7 @@ wormholePairing remotename ouraddrs ui = do Left _e -> return ReceiveFailed Right ls -> maybe (return ReceiveFailed) - (finishPairing 100 remotename ourhalf) + (finishPairing 100 remotename ourhalf ouraddrs) (deserializePairData ls) -- | Allow the peer we're pairing with to authenticate to us, @@ -276,8 +276,8 @@ wormholePairing remotename ouraddrs ui = do -- Since we're racing the peer as they do the same, the first try is likely -- to fail to authenticate. Can retry any number of times, to avoid the -- users needing to redo the whole process. -finishPairing :: Int -> RemoteName -> HalfAuthToken -> PairData -> Annex PairingResult -finishPairing retries remotename (HalfAuthToken ourhalf) (PairData (HalfAuthToken theirhalf) theiraddrs) = do +finishPairing :: Int -> RemoteName -> HalfAuthToken -> [P2PAddress] -> PairData -> Annex PairingResult +finishPairing retries remotename (HalfAuthToken ourhalf) ouraddrs (PairData (HalfAuthToken theirhalf) theiraddrs) = do case (toAuthToken (ourhalf <> theirhalf), toAuthToken (theirhalf <> ourhalf)) of (Just ourauthtoken, Just theirauthtoken) -> do liftIO $ putStrLn $ "Successfully exchanged pairing data. Connecting to " ++ remotename ++ "..." @@ -289,9 +289,9 @@ finishPairing retries remotename (HalfAuthToken ourhalf) (PairData (HalfAuthToke liftIO $ threadDelaySeconds (Seconds 2) liftIO $ putStrLn $ "Unable to connect to " ++ remotename ++ ". Retrying..." 
go (n-1) theiraddrs theirauthtoken ourauthtoken - go n (addr:rest) theirauthtoken ourauthtoken = do - storeP2PAuthToken addr ourauthtoken - r <- setupLink remotename (P2PAddressAuth addr theirauthtoken) + go n (theiraddr:rest) theirauthtoken ourauthtoken = do + forM_ ouraddrs $ \ouraddr -> storeP2PAuthToken ouraddr ourauthtoken + r <- setupLink remotename (P2PAddressAuth theiraddr theirauthtoken) case r of LinkSuccess -> return PairSuccess _ -> go n rest theirauthtoken ourauthtoken diff --git a/doc/bugs/p2p_--pair_seems_broken_for_iroh.mdwn b/doc/bugs/p2p_--pair_seems_broken_for_iroh.mdwn index 7c4de04a2f..3a2946881c 100644 --- a/doc/bugs/p2p_--pair_seems_broken_for_iroh.mdwn +++ b/doc/bugs/p2p_--pair_seems_broken_for_iroh.mdwn @@ -3,3 +3,9 @@ magic wormhole step. `git-annex p2p --link` does work with the iroh script, so this is probably a bug in git-annex. --[[Joey]] + +> --debug shows the problem is `AUTH-FAILURE`. And it appears that the +> remotedaemon's loadP2PAuthTokens is not loading any auth tokens after +> pairing writes one to `.git/annex/creds/p2pauth`. The written auth token +> incorrectly has the address of the peer, rather than the local repository. +> [[fixed|done]] --[[Joey]
diff --git a/doc/bugs/S3_remote_fails_for_GCP_with_multiple_prefixes.mdwn b/doc/bugs/S3_remote_fails_for_GCP_with_multiple_prefixes.mdwn
new file mode 100644
index 0000000000..da4e6c9bce
--- /dev/null
+++ b/doc/bugs/S3_remote_fails_for_GCP_with_multiple_prefixes.mdwn
@@ -0,0 +1,95 @@
+### Please describe the problem.
+initremote of an S3 special remote with a GCP object storage bucket and a fileprefix fails if another repo with a different fileprefix has already been configured in the same bucket.
+
+### What steps will reproduce the problem?
+With two git-annex repos and an initially empty bucket configured without versioning or hierarchical namespaces:
+
+For the first repo:
+
+git-annex --debug initremote s3-BACKUP type=S3 partsize=1GiB fileprefix=ds001263/ encryption=none public=no bucket=openneuro-nell-test host=storage.googleapis.com storageclass=ARCHIVE cost=400
+
+For a second repo:
+
+git-annex --debug initremote s3-BACKUP type=S3 partsize=1GiB fileprefix=ds001264/ encryption=none public=no bucket=openneuro-nell-test host=storage.googleapis.com storageclass=ARCHIVE cost=400
+
+The first initremote will succeed and configure the remote. The second attempts to create a bucket and fails because it already exists. Manually populating remote.log and annex-uuid in the bucket allows this remote to function after enableremote.
+
+### What version of git-annex are you using? On what operating system?
+
+10.20250929 on Fedora 43.
+
+### Please provide any additional information below.
+
+[[!format sh """
+# If you can, paste a complete transcript of the problem occurring here.
+# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
+
+initremote s3-BACKUP [2025-11-02 15:07:58.441842173] (Utility.Process) process [3914104] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
+[2025-11-02 15:07:58.442407721] (Utility.Process) process [3914104] done ExitSuccess
+[2025-11-02 15:07:58.442547945] (Utility.Process) process [3914105] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
+[2025-11-02 15:07:58.443007509] (Utility.Process) process [3914105] done ExitSuccess
+[2025-11-02 15:07:58.443341839] (Utility.Process) process [3914106] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
+[2025-11-02 15:07:58.445120803] (Remote.S3) String to sign: "GET\n\n\nSun, 02 Nov 2025 23:07:58 GMT\n/openneuro-nell-test/?location"
+[2025-11-02 15:07:58.445148464] (Remote.S3) Host: "openneuro-nell-test.storage.googleapis.com"
+[2025-11-02 15:07:58.445161875] (Remote.S3) Path: "/"
+[2025-11-02 15:07:58.445173305] (Remote.S3) Query string: "location"
+[2025-11-02 15:07:58.445188565] (Remote.S3) Header: [("Date","Sun, 02 Nov 2025 23:07:58 GMT"),("User-Agent","git-annex/10.20250929")]
+[2025-11-02 15:07:58.635355111] (Remote.S3) Response status: Status {statusCode = 403, statusMessage = "Forbidden"}
+[2025-11-02 15:07:58.635400652] (Remote.S3) Response header 'Content-Type': 'application/xml; charset=UTF-8'
+[2025-11-02 15:07:58.635424923] (Remote.S3) Response header 'X-GUploader-UploadID': 'AOCedOEofSsg_ed3IPSuAQerc3FtHvXPALQhf2W1S26R_51sPNFu-0-ZozTZuBqhr5pV-3fK'
+[2025-11-02 15:07:58.635441664] (Remote.S3) Response header 'Content-Length': '298'
+[2025-11-02 15:07:58.635455574] (Remote.S3) Response header 'Date': 'Sun, 02 Nov 2025 23:07:58 GMT'
+[2025-11-02 15:07:58.635469194] (Remote.S3) Response header 'Expires': 'Sun, 02 Nov 2025 23:07:58 GMT'
+[2025-11-02 15:07:58.635481435] (Remote.S3) Response header 'Cache-Control': 'private, max-age=0'
+[2025-11-02 15:07:58.635495745] (Remote.S3) Response header 'Server': 'UploadServer'
+(checking bucket...) [2025-11-02 15:07:58.635780314] (Remote.S3) String to sign: "GET\n\n\nSun, 02 Nov 2025 23:07:58 GMT\n/openneuro-nell-test/ds001264/annex-uuid"
+[2025-11-02 15:07:58.635796454] (Remote.S3) Host: "openneuro-nell-test.storage.googleapis.com"
+[2025-11-02 15:07:58.635818565] (Remote.S3) Path: "/ds001264/annex-uuid"
+[2025-11-02 15:07:58.635828655] (Remote.S3) Query string: ""
+[2025-11-02 15:07:58.635840346] (Remote.S3) Header: [("Date","Sun, 02 Nov 2025 23:07:58 GMT"),("Authorization","..."),("User-Agent","git-annex/10.20250929")]
+[2025-11-02 15:07:58.685220703] (Remote.S3) Response status: Status {statusCode = 404, statusMessage = "Not Found"}
+[2025-11-02 15:07:58.685251934] (Remote.S3) Response header 'Content-Type': 'application/xml; charset=UTF-8'
+[2025-11-02 15:07:58.685268695] (Remote.S3) Response header 'X-GUploader-UploadID': 'AOCedOHoPd6zdBzYMMr-ON5aWjlDBbGd7ZIaf_Iit8Gt74l3aRT-Ty4Fayk9Tx9tlBMYuMKH'
+[2025-11-02 15:07:58.685280535] (Remote.S3) Response header 'Content-Length': '201'
+[2025-11-02 15:07:58.685290386] (Remote.S3) Response header 'Date': 'Sun, 02 Nov 2025 23:07:58 GMT'
+[2025-11-02 15:07:58.685299996] (Remote.S3) Response header 'Expires': 'Sun, 02 Nov 2025 23:07:58 GMT'
+[2025-11-02 15:07:58.685310096] (Remote.S3) Response header 'Cache-Control': 'private, max-age=0'
+[2025-11-02 15:07:58.685319476] (Remote.S3) Response header 'Server': 'UploadServer'
+[2025-11-02 15:07:58.685365338] (Remote.S3) String to sign: "GET\n\n\nSun, 02 Nov 2025 23:07:58 GMT\n/openneuro-nell-test/"
+[2025-11-02 15:07:58.685376888] (Remote.S3) Host: "openneuro-nell-test.storage.googleapis.com"
+[2025-11-02 15:07:58.685386298] (Remote.S3) Path: "/"
+[2025-11-02 15:07:58.685394309] (Remote.S3) Query string: ""
+[2025-11-02 15:07:58.685403479] (Remote.S3) Header: [("Date","Sun, 02 Nov 2025 23:07:58 GMT"),("Authorization","..."),("User-Agent","git-annex/10.20250929")]
+[2025-11-02 15:07:58.725819533] (Remote.S3) Response status: Status {statusCode = 200, statusMessage = "OK"}
+[2025-11-02 15:07:58.725847874] (Remote.S3) Response header 'Content-Type': 'application/xml; charset=UTF-8'
+[2025-11-02 15:07:58.725861764] (Remote.S3) Response header 'X-GUploader-UploadID': 'AOCedOGjVuiFnd4UNsb069xhhamfE7ttizD8j1W9S7fGeUBqVoPxKff00jMdZyvUGFo90z_N'
+[2025-11-02 15:07:58.725873324] (Remote.S3) Response header 'x-goog-metageneration': '3'
+[2025-11-02 15:07:58.725883625] (Remote.S3) Response header 'Content-Length': '784'
+[2025-11-02 15:07:58.725893065] (Remote.S3) Response header 'Date': 'Sun, 02 Nov 2025 23:07:58 GMT'
+[2025-11-02 15:07:58.725907215] (Remote.S3) Response header 'Expires': 'Sun, 02 Nov 2025 23:07:58 GMT'
+[2025-11-02 15:07:58.725983778] (Remote.S3) Response header 'Cache-Control': 'private, max-age=0'
+[2025-11-02 15:07:58.72604907] (Remote.S3) Response header 'Server': 'UploadServer'
+(creating bucket in US...) [2025-11-02 15:07:58.726309948] (Remote.S3) String to sign: "PUT\n\n\nSun, 02 Nov 2025 23:07:58 GMT\n/openneuro-nell-test/"
+[2025-11-02 15:07:58.726329498] (Remote.S3) Host: "openneuro-nell-test.storage.googleapis.com"
+[2025-11-02 15:07:58.726341689] (Remote.S3) Path: "/"
+[2025-11-02 15:07:58.726350049] (Remote.S3) Query string: ""
+[2025-11-02 15:07:58.726366349] (Remote.S3) Header: [("Date","Sun, 02 Nov 2025 23:07:58 GMT"),("Authorization","..."),("User-Agent","git-annex/10.20250929")]
+[2025-11-02 15:07:58.75553637] (Remote.S3) Response status: Status {statusCode = 409, statusMessage = "Conflict"}
+[2025-11-02 15:07:58.755576871] (Remote.S3) Response header 'Content-Type': 'application/xml; charset=UTF-8'
+[2025-11-02 15:07:58.755590572] (Remote.S3) Response header 'X-GUploader-UploadID': 'AOCedOFn-ViFqzgcWiIW6Pun3lCz6lMnBFrRxyRpyC9LIdnv9j20Yz2Cd7MnuXIcNxZ-j6_J'
+[2025-11-02 15:07:58.755603132] (Remote.S3) Response header 'Content-Length': '421'
+[2025-11-02 15:07:58.755610962] (Remote.S3) Response header 'Vary': 'Origin'
+[2025-11-02 15:07:58.755618952] (Remote.S3) Response header 'Date': 'Sun, 02 Nov 2025 23:07:58 GMT'
+[2025-11-02 15:07:58.755628153] (Remote.S3) Response header 'Server': 'UploadServer'
+
+git-annex: S3Error {s3StatusCode = Status {statusCode = 409, statusMessage = "Conflict"}, s3ErrorCode = "BucketNameUnavailable", s3ErrorMessage = "The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.", s3ErrorResource = Nothing, s3ErrorHostId = Nothing, s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
+failed
+[2025-11-02 15:07:58.755843459] (Utility.Process) process [3914106] done ExitSuccess
+initremote: 1 failed
+
+# End of transcript or log.
+"""]]
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+Thanks for all your great work, Joey!
removed
diff --git a/doc/special_remotes/directory/comment_25_11ce2a9f48ab9a043cc90d125e796685._comment b/doc/special_remotes/directory/comment_25_11ce2a9f48ab9a043cc90d125e796685._comment deleted file mode 100644 index 35fa9399e1..0000000000 --- a/doc/special_remotes/directory/comment_25_11ce2a9f48ab9a043cc90d125e796685._comment +++ /dev/null @@ -1,14 +0,0 @@ -[[!comment format=mdwn - username="hatzka" - avatar="http://cdn.libravatar.org/avatar/446138196d9d09c19f57e739e9786a99" - subject="a potentially bad idea" - date="2025-10-31T00:20:54Z" - content=""" -I have some git-annex repositories that are large enough that the objects don't fit on my SSD. I want to keep the repositories themselves on my SSD, because they also contain small versioned files that benefit from fast access. And I want git-annex to know which files are on which physical drives, so that I don't have to fsck if a drive fails (even with `--fast` it takes a while, and if one drive already failed I would rather avoid using the rest unnecessarily). - -I think it should be possible to meet all of these requirements by mounting an overlayfs over the `.git/annex/objects` folder. The writable `upperdir` would be on the same device as the rest of the repository; the read-only lower layers would be the hard drives, which I would also make accessible to git-annex as directory special remotes. This way, I could add objects to the repository normally, then move them to the hard drives without making them inaccessible. - -Obviously for this to be safe I would need to untrust the repository itself, as otherwise git-annex would see two real copies where in fact there was only one. (I'm fine with not being able to permanently store anything only on the SSD.) The other obstacle I've run into is that directory remotes don't have the same layout as an objects folder. - -Is this a terrible idea? Is there a better way? 
And, assuming the answers are \"not too terrible\" and \"not really\", how can I set up a directory special remote so that this will work? -"""]]
Added a comment: a potentially bad idea
diff --git a/doc/special_remotes/directory/comment_25_11ce2a9f48ab9a043cc90d125e796685._comment b/doc/special_remotes/directory/comment_25_11ce2a9f48ab9a043cc90d125e796685._comment new file mode 100644 index 0000000000..35fa9399e1 --- /dev/null +++ b/doc/special_remotes/directory/comment_25_11ce2a9f48ab9a043cc90d125e796685._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="hatzka" + avatar="http://cdn.libravatar.org/avatar/446138196d9d09c19f57e739e9786a99" + subject="a potentially bad idea" + date="2025-10-31T00:20:54Z" + content=""" +I have some git-annex repositories that are large enough that the objects don't fit on my SSD. I want to keep the repositories themselves on my SSD, because they also contain small versioned files that benefit from fast access. And I want git-annex to know which files are on which physical drives, so that I don't have to fsck if a drive fails (even with `--fast` it takes a while, and if one drive already failed I would rather avoid using the rest unnecessarily). + +I think it should be possible to meet all of these requirements by mounting an overlayfs over the `.git/annex/objects` folder. The writable `upperdir` would be on the same device as the rest of the repository; the read-only lower layers would be the hard drives, which I would also make accessible to git-annex as directory special remotes. This way, I could add objects to the repository normally, then move them to the hard drives without making them inaccessible. + +Obviously for this to be safe I would need to untrust the repository itself, as otherwise git-annex would see two real copies where in fact there was only one. (I'm fine with not being able to permanently store anything only on the SSD.) The other obstacle I've run into is that directory remotes don't have the same layout as an objects folder. + +Is this a terrible idea? Is there a better way? 
And, assuming the answers are \"not too terrible\" and \"not really\", how can I set up a directory special remote so that this will work? +"""]]
diff --git a/doc/bugs/p2phttp_deadlocks_with_concurrent_clients.mdwn b/doc/bugs/p2phttp_deadlocks_with_concurrent_clients.mdwn new file mode 100644 index 0000000000..78a552343d --- /dev/null +++ b/doc/bugs/p2phttp_deadlocks_with_concurrent_clients.mdwn @@ -0,0 +1,41 @@ +### Please describe the problem. + +P2phttp can deadlock with multiple concurrent clients talking to it. + + +### What steps will reproduce the problem? + +1. Create a git-annex repository with a bunch of annexed files served via p2phttp like so: `git-annex --debug p2phttp -J2 --bind 127.0.0.1 --wideopen` +2. Create multiple different clones of that repository connected via p2phttp all doing `while true; do git annex drop .; git annex get --in origin; done` +3. Observe a deadlock after an indeterminate amount of time + +This deadlock seems to occur faster the more repos you use. I've tried increasing -J to 3 and had it deadlock with two client repos once, but that seems to happen much less often. + +### What version of git-annex are you using? On what operating system? 
+ +``` +$ git annex version +git-annex version: 10.20250929-g33ab579243742b0b18ffec2ea4ce1e3a827720b4 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV Servant OsPath +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.2 http-client-0.7.19 persistent-sqlite-2.13.3.1 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +[[!tag projects/ICE4]]
diff --git a/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn b/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn index 2a52839629..af9c36998c 100644 --- a/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn +++ b/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn @@ -16,4 +16,4 @@ Worth it to note that AGit-Flow already works for contributors with write access Do you have any other ideas on how git-annex could be used in this workflow? -[[!tag projects/INM7]] +[[!tag projects/ICE4]]
typo
diff --git a/doc/projects/FJZ.mdwn b/doc/projects/FZJ.mdwn similarity index 94% rename from doc/projects/FJZ.mdwn rename to doc/projects/FZJ.mdwn index 1e2a2635c8..c2953d4600 100644 --- a/doc/projects/FJZ.mdwn +++ b/doc/projects/FZJ.mdwn @@ -1,4 +1,4 @@ -At FJZ, the INM-7 and ICE-4 data hosting infrastructures use git-annex. +At FZJ, the INM-7 and ICE-4 data hosting infrastructures use git-annex. This is a tracking page for issues relating to those projects. It includes issues relating to [forgejo-aneksajo](https://codeberg.org/matrss/forgejo-aneksajo). diff --git a/doc/projects/ICE4.mdwn b/doc/projects/ICE4.mdwn index 4e9588d414..1ff01d36d4 100644 --- a/doc/projects/ICE4.mdwn +++ b/doc/projects/ICE4.mdwn @@ -1 +1 @@ -[[!meta redir=FJZ]] +[[!meta redir=FZJ]] diff --git a/doc/projects/INM7.mdwn b/doc/projects/INM7.mdwn index 4e9588d414..1ff01d36d4 100644 --- a/doc/projects/INM7.mdwn +++ b/doc/projects/INM7.mdwn @@ -1 +1 @@ -[[!meta redir=FJZ]] +[[!meta redir=FZJ]]
Added a comment
diff --git a/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_2_a2d5b4bda70398422636652b8bf6e9f2._comment b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_2_a2d5b4bda70398422636652b8bf6e9f2._comment new file mode 100644 index 0000000000..20e97fc099 --- /dev/null +++ b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config/comment_2_a2d5b4bda70398422636652b8bf6e9f2._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 2" + date="2025-10-29T17:38:12Z" + content=""" +Due to your explanation of how p2phttp is supposed to work with proxying I realized that I have made a mistake in how I have integrated it into Forgejo-aneksajo. Right now there is a single p2phttp endpoint serving all repositories, and the provided UUID is used to determine on which repository to act. But this breaks with proxying, since the UUID then doesn't necessarily correspond to a repository on the instance. I will therefore have to move the p2phttp endpoints under the `<owner>/<repo>` routing namespace. Having git-annex retrieve updated config data from remotes would make this change propagate to clones automatically, which would be nice I think. +"""]]
diff --git a/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config.mdwn b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config.mdwn index 95f3f2f612..e483b00c76 100644 --- a/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config.mdwn +++ b/doc/todo/p2phttp__58___regularly_re-check_for_annex.url_config.mdwn @@ -4,3 +4,5 @@ From my experimentation it seems to be that git-annex does not discover the `ann 2. Likewise, if the server-side initially didn't support p2phttp and didn't set `annex.url` when the repository was cloned, but is later updated to support it, git-annex doesn't automatically pick up this change. This automatic discovery would be nice for p2phttp support in forgejo-aneksajo, as existing clones could automatically start making use of it as soon as the instance is updated to support it on the server-side and the git-annex version is updated to be recent enough on the client-side. + +[[!tag projects/ICE4]]
diff --git a/doc/todo/More_fine-grained_testremote_command.mdwn b/doc/todo/More_fine-grained_testremote_command.mdwn index bb4eb63f26..25182b28fc 100644 --- a/doc/todo/More_fine-grained_testremote_command.mdwn +++ b/doc/todo/More_fine-grained_testremote_command.mdwn @@ -9,4 +9,4 @@ If that's not possible for some reason it would also be an improvement with rega What do you think? -[[!tag projects/INM7]] +[[!tag projects/ICE4]]
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth.mdwn b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth.mdwn index da0edd22da..06a2cf4cd0 100644 --- a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth.mdwn +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth.mdwn @@ -83,4 +83,4 @@ $ ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) -[[!tag projects/INM7]] +[[!tag projects/ICE4]]
fix
diff --git a/doc/projects/FJZ.mdwn b/doc/projects/FJZ.mdwn index 849de58bb6..1e2a2635c8 100644 --- a/doc/projects/FJZ.mdwn +++ b/doc/projects/FJZ.mdwn @@ -27,7 +27,7 @@ Bugs <details> <summary>Fixed</summary> -[[!inline pages="bugs/* and !bugs/done and link(bugs/done)) and +[[!inline pages="bugs/* and !bugs/done and link(bugs/done) and (tagged(projects/INM7) or tagged(projects/ICE4))" feeds=no actions=yes archive=yes show=0 template=buglist]] </details>
fix
diff --git a/doc/projects/FJZ.mdwn b/doc/projects/FJZ.mdwn index b06738f104..849de58bb6 100644 --- a/doc/projects/FJZ.mdwn +++ b/doc/projects/FJZ.mdwn @@ -27,7 +27,7 @@ Bugs <details> <summary>Fixed</summary> -[[!inline pages="(bugs/* and !bugs/done and link(bugs/done)) and -(tagged(projects/INM7 or tagged(projects/ICE4))" feeds=no actions=yes archive=yes show=0 template=buglist]] +[[!inline pages="bugs/* and !bugs/done and link(bugs/done)) and +(tagged(projects/INM7) or tagged(projects/ICE4))" feeds=no actions=yes archive=yes show=0 template=buglist]] </details>
fix
diff --git a/doc/projects/FJZ.mdwn b/doc/projects/FJZ.mdwn index 830a29f8f5..b06738f104 100644 --- a/doc/projects/FJZ.mdwn +++ b/doc/projects/FJZ.mdwn @@ -28,6 +28,6 @@ Bugs <summary>Fixed</summary> [[!inline pages="(bugs/* and !bugs/done and link(bugs/done)) and -(tagged(projects/INM7 or tagged(projects/ICE4)" feeds=no actions=yes archive=yes show=0 template=buglist]] +(tagged(projects/INM7 or tagged(projects/ICE4))" feeds=no actions=yes archive=yes show=0 template=buglist]] </details>
add ICE4 page, as a redirect
diff --git a/doc/projects.mdwn b/doc/projects.mdwn index 9a750943dd..aacb8fe897 100644 --- a/doc/projects.mdwn +++ b/doc/projects.mdwn @@ -1,5 +1,5 @@ Projects that rely on git-annex can put pages here to do things like track bugs that affect them, etc. (See also: [[related_software]]) -[[!inline pages="projects/* and !projects/*/* and !*/Discussion and !projects/INM7" +[[!inline pages="projects/* and !projects/*/* and !*/Discussion and !projects/INM7 and !projects/ICE4" feeds=no archive=yes sort=title rootpage="projects" postformtext="Add your project:"]] diff --git a/doc/projects/ICE4.mdwn b/doc/projects/ICE4.mdwn new file mode 100644 index 0000000000..4e9588d414 --- /dev/null +++ b/doc/projects/ICE4.mdwn @@ -0,0 +1 @@ +[[!meta redir=FJZ]]
rename project page (left a redirect)
diff --git a/doc/projects.mdwn b/doc/projects.mdwn index bbf2b2d2b9..9a750943dd 100644 --- a/doc/projects.mdwn +++ b/doc/projects.mdwn @@ -1,5 +1,5 @@ Projects that rely on git-annex can put pages here to do things like track bugs that affect them, etc. (See also: [[related_software]]) -[[!inline pages="projects/* and !projects/*/* and !*/Discussion" +[[!inline pages="projects/* and !projects/*/* and !*/Discussion and !projects/INM7" feeds=no archive=yes sort=title rootpage="projects" postformtext="Add your project:"]] diff --git a/doc/projects/FJZ.mdwn b/doc/projects/FJZ.mdwn new file mode 100644 index 0000000000..830a29f8f5 --- /dev/null +++ b/doc/projects/FJZ.mdwn @@ -0,0 +1,33 @@ +At FJZ, the INM-7 and ICE-4 data hosting infrastructures use git-annex. +This is a tracking page for issues relating to those projects. +It includes issues relating to +[forgejo-aneksajo](https://codeberg.org/matrss/forgejo-aneksajo). + +TODOs +===== + +[[!inline pages="todo/* and !todo/done and !link(todo/done) and +(tagged(projects/INM7) or tagged(projects/ICE4))" sort=mtime feeds=no actions=yes archive=yes show=0 template=buglist]] + + +<details> +<summary>Done</summary> + +[[!inline pages="todo/* and !todo/done and link(todo/done) and +(tagged(projects/INM7) or tagged(projects/ICE4))" feeds=no actions=yes archive=yes show=0 template=buglist]] + +</details> + +Bugs +==== + +[[!inline pages="bugs/* and !bugs/done and !link(bugs/done) and +(tagged(projects/INM7) or tagged(projects/ICE4))" sort=mtime feeds=no actions=yes archive=yes show=0 template=buglist template=buglist]] + +<details> +<summary>Fixed</summary> + +[[!inline pages="(bugs/* and !bugs/done and link(bugs/done)) and +(tagged(projects/INM7 or tagged(projects/ICE4)" feeds=no actions=yes archive=yes show=0 template=buglist]] + +</details> diff --git a/doc/projects/INM7.mdwn b/doc/projects/INM7.mdwn index 917d1428cf..4e9588d414 100644 --- a/doc/projects/INM7.mdwn +++ b/doc/projects/INM7.mdwn @@ -1,32 +1 @@ -The INM7 data 
hosting infrastructure uses git-annex. This is a tracking -page for issues relating to that project. It includes issues relating to -[forgejo-aneksajo](https://codeberg.org/matrss/forgejo-aneksajo). - -TODOs -===== - -[[!inline pages="todo/* and !todo/done and !link(todo/done) and -tagged(projects/INM7)" sort=mtime feeds=no actions=yes archive=yes show=0 template=buglist]] - - -<details> -<summary>Done</summary> - -[[!inline pages="todo/* and !todo/done and link(todo/done) and -tagged(projects/INM7)" feeds=no actions=yes archive=yes show=0 template=buglist]] - -</details> - -Bugs -==== - -[[!inline pages="bugs/* and !bugs/done and !link(bugs/done) and -tagged(projects/INM7)" sort=mtime feeds=no actions=yes archive=yes show=0 template=buglist template=buglist]] - -<details> -<summary>Fixed</summary> - -[[!inline pages="(bugs/* and !bugs/done and link(bugs/done)) and -tagged(projects/INM7)" feeds=no actions=yes archive=yes show=0 template=buglist]] - -</details> +[[!meta redir=FJZ]]
add news item for git-annex 10.20251029
diff --git a/doc/news/version_10.20250630.mdwn b/doc/news/version_10.20250630.mdwn deleted file mode 100644 index 7761138fea..0000000000 --- a/doc/news/version_10.20250630.mdwn +++ /dev/null @@ -1,9 +0,0 @@ -git-annex 10.20250630 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Work around git 2.50 bug that caused it to crash when there is a merge - conflict with an unlocked annexed file. - * Skip and warn when a tree import includes empty filenames, - which can happen with eg a S3 bucket. - * Avoid a problem with temp file names ending in whitespace on - filesystems like VFAT that don't support such filenames. - * webapp: Rename "Upgrade Repository" to "Convert Repository" - to avoid confusion with git-annex upgrade."""]] \ No newline at end of file diff --git a/doc/news/version_10.20251029.mdwn b/doc/news/version_10.20251029.mdwn new file mode 100644 index 0000000000..b98b28583c --- /dev/null +++ b/doc/news/version_10.20251029.mdwn @@ -0,0 +1,5 @@ +git-annex 10.20251029 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * Support ssh remotes with '#' and '?' in the path to the repository, + the same way git does. + * assistant: Fix reversion that caused files to be added locked by + default."""]] \ No newline at end of file
Added a comment
diff --git a/doc/bugs/install_on_android_boox__58___xargs_Permission_denied/comment_4_2e7fc84160c9bd75f7dc1a6a44ab132a._comment b/doc/bugs/install_on_android_boox__58___xargs_Permission_denied/comment_4_2e7fc84160c9bd75f7dc1a6a44ab132a._comment new file mode 100644 index 0000000000..0564fa74db --- /dev/null +++ b/doc/bugs/install_on_android_boox__58___xargs_Permission_denied/comment_4_2e7fc84160c9bd75f7dc1a6a44ab132a._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 4" + date="2025-10-26T11:55:55Z" + content=""" +I am seeing the same issue on a Pixel 9a with GrapheneOS and Termux installed from the Play Store, but _not_ with Termux installed from F-Droid. + +Digging a bit I found that: + +- Termux does not advertise the Google Play Store as a means of installation on their website: <https://termux.dev/en/> +- The Termux wiki states that everyone who can use F-Droid should get it from there: <https://wiki.termux.com/wiki/Termux_Google_Play> +- The F-Droid and Play Store builds are created from different codebases, at least temporarily: <https://github.com/termux-play-store#:~:text=As%20the%20F%2DDroid%20build%20of,and%20details%20are%20worked%20out.> + +I've also noticed that some executables just don't exist in Termux-from-Play-Store, e.g. `termux-change-repo`. + +Considering that you might just want to try the F-Droid build, if possible. +"""]]
bug
diff --git a/doc/bugs/p2p_--pair_seems_broken_for_iroh.mdwn b/doc/bugs/p2p_--pair_seems_broken_for_iroh.mdwn
new file mode 100644
index 0000000000..7c4de04a2f
--- /dev/null
+++ b/doc/bugs/p2p_--pair_seems_broken_for_iroh.mdwn
@@ -0,0 +1,5 @@
+When using git-annex-p2p-iroh, `git-annex p2p --pair` times out after the
+magic wormhole step.
+
+`git-annex p2p --link` does work with the iroh script, so this is probably
+a bug in git-annex. --[[Joey]]
Propose ephemeral special remotes
diff --git a/doc/todo/Ephemeral_special_remotes.mdwn b/doc/todo/Ephemeral_special_remotes.mdwn
new file mode 100644
index 0000000000..7082e52512
--- /dev/null
+++ b/doc/todo/Ephemeral_special_remotes.mdwn
@@ -0,0 +1,14 @@
+Connecting to a discussion we had at distribits....
+
+It would be useful to extend the external special remote protocol with the ability to create ephemeral special remotes. Ephemeral in the sense that they are created by and during the runtime of a special remote, and only exist until that special remote process is terminated by git-annex.
+
+There could be a new protocol command that takes the same parameters as `initremote` as arguments. Its response would be the UUID of the created special remote.
+
+The second part of the protocol extension would be a third response value for `CHECKPRESENT`, `TRANSFER*`, and `REMOVE`. The addition to `SUCCESS` and `FAILURE` would be `REDIRECT-REMOTE <UUID>`, which would instruct git-annex to perform the same request against the special remote given by `UUID` instead.
+
+The corresponding change in key availability would be recorded for the original special remote.
+
+A use case would be "orchestration" special remotes that represent a particular infrastructure. They dynamically deploy appropriate transfer setups, and do not commit them to a repository. This can be useful for setups with short-lived tokens/urls. This is
+in some way also an alternative to the `sameas` approach, where the alternatives are hidden in the implementation of a special remote, rather than in *each* repository.
+
+[[!tag projects/INM7]]
diff --git a/doc/todo/Special_remote_redirect_to_URL.mdwn b/doc/todo/Special_remote_redirect_to_URL.mdwn
index 1407702620..b3b51e18a3 100644
--- a/doc/todo/Special_remote_redirect_to_URL.mdwn
+++ b/doc/todo/Special_remote_redirect_to_URL.mdwn
@@ -3,7 +3,7 @@ The [external special remote protocol](/design/external_special_remote_protocol/
* `TRANSFER-SUCCESS RETRIEVE {key}`
* `TRANSFER-FAILURE RETRIEVE {key} {message}`
-I propose a third response: `TRANSFER-REDIRECT RETRIEVE {key} {url}`
+I propose a third response: `TRANSFER-REDIRECT-URL RETRIEVE {key} {url}`
This will permit the following use cases:
Initiate request for request redirection
diff --git a/doc/todo/Special_remote_redirect_to_URL.mdwn b/doc/todo/Special_remote_redirect_to_URL.mdwn
new file mode 100644
index 0000000000..1407702620
--- /dev/null
+++ b/doc/todo/Special_remote_redirect_to_URL.mdwn
@@ -0,0 +1,15 @@
+The [external special remote protocol](/design/external_special_remote_protocol/) allows the following responses to `TRANSFER RETRIEVE {key} {file}`:
+
+* `TRANSFER-SUCCESS RETRIEVE {key}`
+* `TRANSFER-FAILURE RETRIEVE {key} {message}`
+
+I propose a third response: `TRANSFER-REDIRECT RETRIEVE {key} {url}`
+
+This will permit the following use cases:
+
+1) Make a request against an authentication server that provides a short-lived access token to the same or a different server. The authentication server does not need to relay the data.
+2) Deterministically calculate a remote URL (or local path) without reimplementing HTTP fetch logic, taking advantage of the testing and security hardening of the git-annex implementation.
+
+
+[[!meta author=cjmarkie]]
+[[!tag projects/openneuro]]
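To make the proposal above more concrete, here is a minimal sketch of how an external special remote might answer a `TRANSFER RETRIEVE` request with the proposed redirect. It uses the `TRANSFER-REDIRECT-URL` spelling from the later edit of the todo. That response is only a proposal and is not part of the external special remote protocol as it stands; the `respond_to_request` function and the `base_url` value are assumptions made up for this illustration.

```sh
#!/bin/sh
# Sketch only. TRANSFER-REDIRECT-URL is the response proposed in this
# todo; git-annex does not currently understand it. The function name
# and the base url are hypothetical.

base_url="https://example.com/objects"

# respond_to_request LINE
# Prints the reply that a special remote could send for one request line.
respond_to_request () {
	# Split the request into command, direction, key, and file.
	# The file may contain spaces; read puts the remainder into it.
	read -r cmd direction key file <<EOF
$1
EOF
	case "$cmd $direction" in
		"TRANSFER RETRIEVE")
			# Rather than fetching the content into $file, hand
			# git-annex an url to download it from. A real remote
			# would need to url-escape the key.
			echo "TRANSFER-REDIRECT-URL RETRIEVE $key $base_url/$key"
			;;
		*)
			# Everything else is out of scope for this sketch.
			echo "UNSUPPORTED-REQUEST"
			;;
	esac
}
```

A real remote would compute the url per request, for example after asking an authentication server for a short-lived token, which is the first use case listed in the todo.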
remove redundant comment
diff --git a/doc/special_remotes/p2p/git-annex-p2p-iroh b/doc/special_remotes/p2p/git-annex-p2p-iroh
index 545bdc4166..b1969c7380 100755
--- a/doc/special_remotes/p2p/git-annex-p2p-iroh
+++ b/doc/special_remotes/p2p/git-annex-p2p-iroh
@@ -33,7 +33,6 @@ if [ "$1" = address ]; then
else
socketfile="$2"
if [ -z "$socketfile" ]; then
- # Connect to the peer's address and relay stdin and stdout.
peeraddress="$1"
dumbpipe connect "$peeraddress"
else
add
diff --git a/doc/special_remotes/p2p.mdwn b/doc/special_remotes/p2p.mdwn index 37e606dd4d..6a959f4c49 100644 --- a/doc/special_remotes/p2p.mdwn +++ b/doc/special_remotes/p2p.mdwn @@ -10,6 +10,8 @@ For other P2P networks, a fairly simple program is used to connect git-annex up with the network. Install one of these programs to use the P2P network of your choice: +* [[git-annex-p2p-iroh]] + Uses [Iroh](https://www.iroh.computer/) for fast P2P with hole punching. * [[git-annex-p2p-unix-sockets]] This is only a demo, using unix sockets in `/tmp` rather than a real P2P network. Not for real world use.
add
diff --git a/doc/special_remotes/p2p/git-annex-p2p-iroh b/doc/special_remotes/p2p/git-annex-p2p-iroh
new file mode 100755
index 0000000000..545bdc4166
--- /dev/null
+++ b/doc/special_remotes/p2p/git-annex-p2p-iroh
@@ -0,0 +1,43 @@
+#!/bin/sh
+# Allows git-annex to use iroh for P2P connections.
+#
+# This uses a modified version of iroh's dumbpipe program, adding the
+# generate-ticket command. This pull request has the necessary changes:
+# https://github.com/n0-computer/dumbpipe/pull/86
+#
+# Quality: experimental. Has worked at least twice, but there are known and
+# unknown bugs.
+#
+# Copyright 2025 Joey Hess; licenced under the GNU GPL version 3 or higher.
+
+set -e
+
+git_dir=$(git rev-parse --git-dir)
+creds_dir="$git_dir/annex/creds"
+iroh_secret_file="$creds_dir/iroh-secret"
+
+get_iroh_secret () {
+ IROH_SECRET=$(cat "$iroh_secret_file")
+ export IROH_SECRET
+}
+
+if [ "$1" = address ]; then
+ if [ ! -e "$iroh_secret_file" ]; then
+ mkdir -p "$creds_dir"
+ umask 077
+ gpg --gen-random 16 32 > $iroh_secret_file
+ fi
+ get_iroh_secret
+ # avoid display of the iroh secret to stderr
+ dumbpipe generate-ticket 2>/dev/null
+else
+ socketfile="$2"
+ if [ -z "$socketfile" ]; then
+ # Connect to the peer's address and relay stdin and stdout.
+ peeraddress="$1"
+ dumbpipe connect "$peeraddress"
+ else
+ get_iroh_secret
+ dumbpipe listen-unix --socket-path="$socketfile"
+ fi
+fi
typo
diff --git a/doc/special_remotes/p2p/git-annex-p2p-unix-sockets b/doc/special_remotes/p2p/git-annex-p2p-unix-sockets index 100dee5291..2fe1568efb 100755 --- a/doc/special_remotes/p2p/git-annex-p2p-unix-sockets +++ b/doc/special_remotes/p2p/git-annex-p2p-unix-sockets @@ -4,7 +4,7 @@ # This simulates a multi-node P2P network using unix # socket files in /tmp. # -# Copyright 2025 Joey Hess; icenced under the GNU GPL version 3 or higher. +# Copyright 2025 Joey Hess; licenced under the GNU GPL version 3 or higher. set -e
Added a comment
diff --git a/doc/bugs/some_conflict_resolution_tests_fail_some_time/comment_3_951e1bd5ea9828205988d5d61a4acd54._comment b/doc/bugs/some_conflict_resolution_tests_fail_some_time/comment_3_951e1bd5ea9828205988d5d61a4acd54._comment new file mode 100644 index 0000000000..13c374630e --- /dev/null +++ b/doc/bugs/some_conflict_resolution_tests_fail_some_time/comment_3_951e1bd5ea9828205988d5d61a4acd54._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 3" + date="2025-10-24T14:26:07Z" + content=""" +ha -- ran into this issue while looking for some demo of `datalad foreach-subdatset`, FWIW, may be it could be considered \"done\"/closed since I do not see consistent re-manifestation in 2025. Last one of some kind failures `git grep` matching only in early 2024 + +"""]]
Support ssh remotes with '#' and '?' in the path to the repository
The same way git does.
Affected repository types are regular git ssh remotes, and also gcrypt
remotes, and potentially also bup remotes.
repoPath is used for such repositories accessed over ssh. uriPath is used
in some other places, eg the bittorrent special remote, where it would not
be appropriate to mimic git's behavior. The distinction seems to hold up
well from what I can see.
The ordering of uriFragment after uriQuery is to correctly handle cases
where both appear in an url. "ssh://localhost/tmp/foo?baz#bar" has an
uriFragment of "#bar" and an uriQuery of "?baz". On the other hand,
"ssh://localhost/tmp/foo#baz?bar" has an uriFragment of "#baz?bar" and no
uriQuery.
Sponsored-by: Dartmouth College's DANDI project
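The ordering argument in the commit message can be checked with a small sketch. It does not use the Haskell `Network.URI` code from the actual change; it only mimics, in shell, how a generic URI parser splits the part of an url after the host, on the assumption that the fragment starts at the first `#` and the query at the first `?` before it. The function names are hypothetical, and the unescaping done by `unEscapeString` is left out.

```sh
#!/bin/sh
# Sketch only: mimics how a generic URI parser splits the part of an ssh
# url after the host into path, query, and fragment. Function names are
# hypothetical and are not part of git-annex.

# split_uri_rest REST
# Sets the variables path, query, and fragment.
split_uri_rest () {
	rest="$1"
	fragment=""
	query=""
	case "$rest" in
		*"#"*)
			# The fragment starts at the first '#' and runs to the
			# end, so it swallows any later '?'.
			fragment="#${rest#*#}"
			rest="${rest%%#*}"
			;;
	esac
	case "$rest" in
		*"?"*)
			# The query starts at the first '?' before the fragment.
			query="?${rest#*[?]}"
			rest="${rest%%[?]*}"
			;;
	esac
	path="$rest"
}

# repo_path REST
# Reassembles the pieces in the order path, query, fragment, which
# recovers the path the way git would see it.
repo_path () {
	split_uri_rest "$1"
	printf '%s\n' "$path$query$fragment"
}

repo_path "/tmp/foo?baz#bar"
repo_path "/tmp/foo#baz?bar"
```

Both calls print their argument unchanged. Appending the fragment before the query would instead turn the first path into `/tmp/foo#bar?baz`.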
diff --git a/CHANGELOG b/CHANGELOG
index c9eabe919c..21888c75f1 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,10 @@
+git-annex (10.20250930) UNRELEASED; urgency=medium
+
+ * Support ssh remotes with '#' and '?' in the path to the repository,
+ the same way git does.
+
+ -- Joey Hess <id@joeyh.name> Mon, 20 Oct 2025 15:22:30 -0400
+
git-annex (10.20250929) upstream; urgency=medium
* enableremote: Allow type= to be provided when it does not change the
diff --git a/Git.hs b/Git.hs
index 3eafcd674b..30930c2c17 100644
--- a/Git.hs
+++ b/Git.hs
@@ -38,7 +38,7 @@ module Git (
relPath,
) where
-import Network.URI (uriPath, uriScheme, unEscapeString)
+import Network.URI (uriPath, uriScheme, uriQuery, uriFragment, unEscapeString)
#ifndef mingw32_HOST_OS
import System.Posix.Files
#endif
@@ -73,7 +73,10 @@ repoLocation Repo { location = Unknown } = giveup "unknown repoLocation"
- it's the gitdir, and for URL repositories, is the path on the remote
- host. -}
repoPath :: Repo -> OsPath
-repoPath Repo { location = Url u } = toOsPath $ unEscapeString $ uriPath u
+repoPath Repo { location = Url u } = toOsPath $ unEscapeString $
+ -- git allows the path of a ssh url to include both '?' and '#',
+ -- and treats them as part of the path
+ uriPath u ++ uriQuery u ++ uriFragment u
repoPath Repo { location = Local { worktree = Just d } } = d
repoPath Repo { location = Local { gitdir = d } } = d
repoPath Repo { location = LocalUnknown dir } = dir
diff --git a/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_.mdwn b/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_.mdwn
index bd5130b64f..1b3ebda7eb 100644
--- a/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_.mdwn
+++ b/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_.mdwn
@@ -72,3 +72,5 @@ FTR: I was trying to backup some old behavioral videos (octopus) from the laptop
[[!meta author=yoh]]
[[!tag projects/dandi]]
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_/comment_1_00c1062abe02a42cea491f6bb8e6e5dc._comment b/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_/comment_1_00c1062abe02a42cea491f6bb8e6e5dc._comment
new file mode 100644
index 0000000000..f035f5644d
--- /dev/null
+++ b/doc/bugs/fails_to_discover_uuid_over_ssh_with___35___in_path_/comment_1_00c1062abe02a42cea491f6bb8e6e5dc._comment
@@ -0,0 +1,11 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-10-20T19:16:15Z"
+ content="""
+Also affected is '?' in the path. It's somewhat surprising to me that git
+treats these parts of an url as path components, but
+not too surprising, as git's definition of "url" is pretty loose.
+
+Fixed git-annex to follow suit.
+"""]]
Added a comment: Re: support for bulk write/read/test remote - ps
diff --git a/doc/design/external_special_remote_protocol/comment_58_d2ba09a90544cdfa245e69b951107702._comment b/doc/design/external_special_remote_protocol/comment_58_d2ba09a90544cdfa245e69b951107702._comment new file mode 100644 index 0000000000..6df734ae3a --- /dev/null +++ b/doc/design/external_special_remote_protocol/comment_58_d2ba09a90544cdfa245e69b951107702._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="Re: support for bulk write/read/test remote - ps" + date="2025-10-19T05:00:34Z" + content=""" +P.S.: And to make it more clear why I told about dar first and then about writing BDXLs - when I mentioned dar - it was the stage when I was experimenting with dar to use it as an intermediary to write to BDXLs. But then I started to experiment with plain files because it could be better for a long-term archival solution. +"""]]
removed
diff --git a/doc/design/external_special_remote_protocol/comment_58_2bd7eb40046423b1424eaa2aae78ba95._comment b/doc/design/external_special_remote_protocol/comment_58_2bd7eb40046423b1424eaa2aae78ba95._comment deleted file mode 100644 index c48cd63c2b..0000000000 --- a/doc/design/external_special_remote_protocol/comment_58_2bd7eb40046423b1424eaa2aae78ba95._comment +++ /dev/null @@ -1,8 +0,0 @@ -[[!comment format=mdwn - username="psxvoid" - avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" - subject="Rr: support for bulk write/read/test remote (PS)" - date="2025-10-19T04:58:54Z" - content=""" -P.S.: And to make it more clear why I told about dar first and then about writing BDXLs - when I mentioned dar - it was the stage when I experimented with dar to use it as an intermediary to write to BDXLs. But then I started to experiment with plain files because it could be better for long-term archival solution. -"""]]
Added a comment: Rr: support for bulk write/read/test remote (PS)
diff --git a/doc/design/external_special_remote_protocol/comment_58_2bd7eb40046423b1424eaa2aae78ba95._comment b/doc/design/external_special_remote_protocol/comment_58_2bd7eb40046423b1424eaa2aae78ba95._comment new file mode 100644 index 0000000000..c48cd63c2b --- /dev/null +++ b/doc/design/external_special_remote_protocol/comment_58_2bd7eb40046423b1424eaa2aae78ba95._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="Rr: support for bulk write/read/test remote (PS)" + date="2025-10-19T04:58:54Z" + content=""" +P.S.: And to make it more clear why I told about dar first and then about writing BDXLs - when I mentioned dar - it was the stage when I experimented with dar to use it as an intermediary to write to BDXLs. But then I started to experiment with plain files because it could be better for long-term archival solution. +"""]]
Added a comment: Re: support for bulk write/read/test remote - joey
diff --git a/doc/design/external_special_remote_protocol/comment_57_20103729c97cb4392715987dca5408ae._comment b/doc/design/external_special_remote_protocol/comment_57_20103729c97cb4392715987dca5408ae._comment new file mode 100644 index 0000000000..f74740ca6f --- /dev/null +++ b/doc/design/external_special_remote_protocol/comment_57_20103729c97cb4392715987dca5408ae._comment @@ -0,0 +1,34 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="Re: support for bulk write/read/test remote - joey" + date="2025-10-19T04:36:37Z" + content=""" +Hi Joey, + +Sorry, for the late response, and thanks for the feedback. + +> \"that's fundamentally different than how git-annex works\" + +Hence the previous comment :) + +> \"And I think you could put it in your special remote.\" + +That's exactly what I was doing around a year ago. I was implementing a special remote to support writing data on BDXL disks. + +> \"So that when git-annex sends a file to your remote, the file is actually stored in the remote, rather than in a temporary location.\" + +Yep, roughly that's how I was implementing it - storing intermediate data in an sqlite database. + +I'd put the project on hold because I started to ask myself the following questions: + +1. OK, I can store transactions in the special remote. It means storing what is where on which disk. Isn't it what git annex supposed to do? +2. If a BDXL disk get's corrupted or lost, how to reflect it in the git annex repo and the special remote? I can mark it as \"lost\" in the remote, then run fsck in git annex remote. +3. Because I have to track location data separately in the special remote, what if it get's corrupted (the sqlite database)? +4. What if I buy 50GB BDXL instead of 100GB which I'm using? Does it means the special remote also should track free space on each disk? +5. Burning a disk - what if it won't be successful? 
Git annex will think that it was successful, cause it doesn't support bulk operations and numcopies rules will be violated. + +There were many more questions like this. + +And at some point the design started to look more like a blown-up feature-reach archival application/solution. The main point here is that it's definitely possible. I can limit the scope but there are many many issues, and nobody except me will be interested in it. Plus, many responsibilities would be overlapping with git annex. +"""]]
diff --git a/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth.mdwn b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth.mdwn new file mode 100644 index 0000000000..da0edd22da --- /dev/null +++ b/doc/bugs/__96__git_annex_push__96___does_not_use_git-credential-oauth.mdwn @@ -0,0 +1,86 @@ +### Please describe the problem. + +I have git-credential-oauth configured to ease http authentication against Forgejo instances: + +``` +[credential] + helper = cache --timeout 21600 + helper = oauth +``` + +When I am using `git annex push` to push to a non-existing repository on a Forgejo-aneksajo instance it doesn't utilize that credential helper though, and instead asks for username and password (see log below). The same also happens for `git annex sync`. Once oauth authorization has happened and an access token is cached (i.e. after the `git push` in the log) git-annex does use it properly. + + +### What steps will reproduce the problem? + +See log below, combined with the git-credential-oauth configuration from above. + + +### What version of git-annex are you using? On what operating system? 
+ +``` +git-annex version: 10.20250828-gfe7ecf505146342fe8df2430a0bcaf5f02d89a80 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.6 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +``` + + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + +$ datalad create test1 +create(ok): /home/icg149/Playground/test1 (dataset) +$ cd test1 +$ git remote add origin https://atris.fz-juelich.de/m.risse/test1.git +$ git annex push origin +Username for 'https://atris.fz-juelich.de': ^C +[ble: exit 130] +$ git push origin main +Please complete authentication in your browser... 
+https://atris.fz-juelich.de/login/oauth/authorize?client_id=a4792ccc-144e-407e-86c9-5e7d8d9c3269&code_challenge=uEYAd0rzQY4JG0yOkDNMUNEBHqIQInrvdMqOIFL3AWI&code_challenge_method=S256&redirect_uri=http%3A%2F%2F127.0.0.1%3A42305&response_type=code&state=9UFx41eIbP3Qn9PWh_5eGaE3UDWMNdWBKE6_nwIo_DM +Enumerating objects: 6, done. +Counting objects: 100% (6/6), done. +Delta compression using up to 8 threads +Compressing objects: 100% (5/5), done. +Writing objects: 100% (6/6), 521 bytes | 521.00 KiB/s, done. +Total 6 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0) +To https://atris.fz-juelich.de/m.risse/test1.git + * [new branch] main -> main +$ git annex push origin +push origin +Everything up-to-date +Enumerating objects: 5, done. +Counting objects: 100% (5/5), done. +Delta compression using up to 8 threads +Compressing objects: 100% (3/3), done. +Writing objects: 100% (5/5), 426 bytes | 426.00 KiB/s, done. +Total 5 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0) +remote: +remote: Create a new pull request for 'synced/main': +remote: https://atris.fz-juelich.de/m.risse/test1/compare/main...synced/main +remote: +remote: +remote: Create a new pull request for 'synced/git-annex': +remote: https://atris.fz-juelich.de/m.risse/test1/compare/main...synced/git-annex +remote: +To https://atris.fz-juelich.de/m.risse/test1.git + * [new branch] main -> synced/main + * [new branch] git-annex -> synced/git-annex +ok +$ + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + + +[[!tag projects/INM7]]
Added a comment
diff --git a/doc/todo/More_fine-grained_testremote_command/comment_3_88a943ac262a279a46d0b761d1e2a24e._comment b/doc/todo/More_fine-grained_testremote_command/comment_3_88a943ac262a279a46d0b761d1e2a24e._comment new file mode 100644 index 0000000000..8094c11623 --- /dev/null +++ b/doc/todo/More_fine-grained_testremote_command/comment_3_88a943ac262a279a46d0b761d1e2a24e._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 3" + date="2025-10-14T16:52:24Z" + content=""" +> It's not as simple as just plumbing that up though, because testremote has implicit dependencies in its test ordering. It has to do the storeKey test before it can do the present test, for example. + +I already thought that this might be the case, so running the tests independently isn't really infeasible. + +To address my second point I might be able to just parse the output of testremote into \"sub-tests\" on the Forgejo-aneksajo side. Tasty doesn't seem to have a nice streaming output format for that though, right? There is a TAP formatter, but that looks unmaintained... + +--- + +> There are actually only two write operations, storeKey and removeKey. Since removeKey is supposed to succeed when a key is not present, if storeKey fails, then removeKey will succeed. But removeKey should fail to remove a key that is stored on the remote. To test that, the --test-readonly=file option would need to be used to provide a file that is already stored on the remote. + +Now that you are saying this, is a new option even necessary? --test-readonly already takes a filename that is expected to be present on the remote, so instead of adding a new option --test-readonly could ensure that this key can't be removed, and that a different key can't be stored (and that removeKey succeeds on this not-present key). +"""]]
comment
diff --git a/doc/todo/More_fine-grained_testremote_command/comment_1_d0d1406f9b1619b57908e62ac3200f69._comment b/doc/todo/More_fine-grained_testremote_command/comment_1_d0d1406f9b1619b57908e62ac3200f69._comment new file mode 100644 index 0000000000..8d61f17ac7 --- /dev/null +++ b/doc/todo/More_fine-grained_testremote_command/comment_1_d0d1406f9b1619b57908e62ac3200f69._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-10-14T14:54:53Z" + content=""" +It would be possible to make `git-annex testremote` support the +command-line options of the underlying test framework (tasty). +`git-annex test` already does that, so has --list-tests and --pattern. + +It's not as simple as just plumbing that up though, because testremote has +implicit dependencies in its test ordering. It has to do the `storeKey` +test before it can do the `present` test, for example. Those dependencies +would need to be made explicit, rather than implicit. + +Explicit dependencies, though, would also make it not really possible to run +most of the tests separately. Running testremote 5 times to run the listed +tests, if each run does the necessary `storeKey` would add a lot of overhead. + +Not declaring dependencies and leaving it up to the user to run testremote +repeatedly to run a sequence of tests in the necessary order would also +run into problems with testremote using random test keys which change every +time it's run, as well as it having an end cleanup stage where it removes +any lingering test keys from the local repository and the remote. + +This seems to be a bit of an impasse... 
:-/ +"""]] diff --git a/doc/todo/More_fine-grained_testremote_command/comment_2_de940a1c4ca0194582cd0ad449eefe28._comment b/doc/todo/More_fine-grained_testremote_command/comment_2_de940a1c4ca0194582cd0ad449eefe28._comment new file mode 100644 index 0000000000..94958c870e --- /dev/null +++ b/doc/todo/More_fine-grained_testremote_command/comment_2_de940a1c4ca0194582cd0ad449eefe28._comment @@ -0,0 +1,34 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-10-14T15:21:02Z" + content=""" +I don't know about the "--write-only" name, but I see the value in having a +way for testremote to check that a remote that is expected to only allow +read access does not allow any writes, as well as otherwise behaving +correctly. + +There are actually only two write operations, `storeKey` and `removeKey`. +Since `removeKey` is supposed to succeed when a key is not present, if +`storeKey` fails, then `removeKey` will succeed. But `removeKey` should +fail to remove a key that is stored on the remote. To test that, +the --test-readonly=file option would need to be used to provide a file +that is already stored on the remote. + +I think it would make sense to require that option be present +in order to use this new "--write-only" (or whatever name) option. + +--- + +Also, git-annex does know internally that some remotes are readonly. For +example, a regular http git remote that does not use p2phttp. +Or any remote that has `remote.<name>.annex-readonly` set. Currently +`testremote` only skips all the write tests for those, rather than +confirming that writes fail. It would make sense for testremote of a known +readonly remote to behave as if this new option were provided. + +(But, setting `remote.<name>.annex-readonly` rather than using +the "--write-only" option would not work for you, because that config +causes git-annex to refuse to try to write to the remote. Which doesn't +tell you if your server is configured to correctly reject writes.) +"""]]
diff --git a/doc/todo/More_fine-grained_testremote_command.mdwn b/doc/todo/More_fine-grained_testremote_command.mdwn new file mode 100644 index 0000000000..bb4eb63f26 --- /dev/null +++ b/doc/todo/More_fine-grained_testremote_command.mdwn @@ -0,0 +1,12 @@ +I am using `git annex testremote` as a baseline test bench for Forgejo-aneksajo as a git-annex remote, and it is awesome to have that. I have some pain points with it though: + +- I would like to use these tests to confirm that I don't accidentally give write access to read-only users. This means I would need a way to ensure that all tests which require write access fail against the remote. +- I am spawning a `git annex testremote` subprocess within the integration tests of Forgejo-aneksajo, which are written in Go. Sometimes this "large blackbox test" gets stuck in CI and I haven't figured out why yet. It would be nice to have a more transparent integration into the Forgejo-aneksajo test suite. + +Both of those points could be addressed if `git annex testremote` provided a way to run each test individually, and to get a list of all the available tests categorized by if they are read-only or read-write. I could then integrate each as an individual sub-test into Forgejo-aneksajo's test suite and properly assert on the outcome of the test given the respective test setup. + +If that's not possible for some reason it would also be an improvement with regards to the first point if there was something like a `git annex testremote --write-only` with the option to only report success if all of those tests have failed. + +What do you think? + +[[!tag projects/INM7]]
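The read-only check discussed in this thread can be partly approximated today with the existing `--test-readonly=file` option. What follows is only a minimal sketch, assuming a remote named `forgejo` and a file `known-present.dat` that is already stored on it (both names are hypothetical), and assuming GNU coreutils `timeout` is available to bound a run that gets stuck in CI. The function only prints the command, so it can be reviewed before anything is run against a real server.

```shell
#!/bin/sh
# Sketch: compose a time-limited, read-only testremote invocation.
# The remote name, file name and time limit are illustrative assumptions.
readonly_testremote_cmd() {
    remote=$1          # name of the remote under test
    present_file=$2    # file in the repository whose content is already on the remote
    limit=${3:-600}    # seconds before timeout(1) gives up on a hung run
    printf '%s\n' "timeout $limit git annex testremote --test-readonly=$present_file $remote"
}

# Print the command for a hypothetical remote and file.
readonly_testremote_cmd forgejo known-present.dat 300
```

This does not confirm that writes are rejected, which is what the todo asks for. It only runs the tests that do not need write access, under a time limit.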
diff --git a/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn b/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn index f1662dc3a0..2a52839629 100644 --- a/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn +++ b/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn @@ -7,11 +7,12 @@ I am wondering how git-annex could best fit into this flow. I would like to be a The fundamental issue seems to be that annexed objects always belong to the entire repository, and are not scoped to any branch. I've thought of these options so far: + - Provide a "per PR special remote" that the creator of the PR could push annexed files to. This would require the user to configure an additional remote, which the AGit-Flow tries to avoid for plain-git contributions. - A per-user special remote that is assumed to contain the annexed files for all of the users AGit-PRs. If git recognizes remote configs in the users' global git config then it could be possible to get away with configuring things once, but I am not sure of the behavior of git in that case. - Allow read-only users to have append-only access to the annex. This must at least be limited to secure hashes though, and there are implications of DoS by malicious users filling disk space / quotas. -Worth it to note that AGit-Flow already works for Contributors with write access, since they can write to the annex freely anyway. +Worth it to note that AGit-Flow already works for contributors with write access, since they can write to the annex freely anyway. Do you have any other ideas on how git-annex could be used in this workflow?
diff --git a/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn b/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn new file mode 100644 index 0000000000..f1662dc3a0 --- /dev/null +++ b/doc/forum/Git-annex_in___34__AGit-Flow__34__.mdwn @@ -0,0 +1,18 @@ +Forgejo supports ["AGit-Flow"](https://forgejo.org/docs/latest/user/agit-support/) to make pull requests without requiring a user to fork a repository first. This is achieved by having a sort of branch namespace `refs/for/<target-branch>/<topic>` which can be pushed to by users that only have read access to the repository. This will open a PR from this branch to the named target branch. + +There are efforts in upstream Forgejo to make this a more prominent alternative to forking for contributions: <https://codeberg.org/forgejo/discussions/issues/131>. + +I am wondering how git-annex could best fit into this flow. I would like to be able to create PRs containing annexed files on Forgejo-aneksajo in this way (tracking issue on the Forgejo-aneksajo side: <https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/32>). Obviously annexed objects copied to the Forgejo-aneksajo instance via this path should only be available in the context of that PR in some way. + +The fundamental issue seems to be that annexed objects always belong to the entire repository, and are not scoped to any branch. + +I've thought of these options so far: +- Provide a "per PR special remote" that the creator of the PR could push annexed files to. This would require the user to configure an additional remote, which the AGit-Flow tries to avoid for plain-git contributions. +- A per-user special remote that is assumed to contain the annexed files for all of the users AGit-PRs. If git recognizes remote configs in the users' global git config then it could be possible to get away with configuring things once, but I am not sure of the behavior of git in that case. +- Allow read-only users to have append-only access to the annex. 
This must at least be limited to secure hashes though, and there are implications of DoS by malicious users filling disk space / quotas. + +Worth it to note that AGit-Flow already works for Contributors with write access, since they can write to the annex freely anyway. + +Do you have any other ideas on how git-annex could be used in this workflow? + +[[!tag projects/INM7]]
diff --git a/doc/bugs/S3_fails_with_v4_signing.mdwn b/doc/bugs/S3_fails_with_v4_signing.mdwn
new file mode 100644
index 0000000000..4713d454fe
--- /dev/null
+++ b/doc/bugs/S3_fails_with_v4_signing.mdwn
@@ -0,0 +1,45 @@
+### Please describe the problem.
+
+As mentioned in various places on this wiki, git annex fails with S3 backends requiring v4 signatures, e.g. London, Frankfurt.
+
+Support for v4 has apparently been merged [upstream](https://github.com/aristidb/aws/pull/241).
+
+Would it be possible to migrate to v4 signing? I'd do the PR myself but my Haskell is currently nonexistent, sadly.
+
+
+### What steps will reproduce the problem?
+
+Interact with a git annex remote using v4.
+
+### What version of git-annex are you using? On what operating system?
+
+```shell
+git-annex version
+git-annex version: 10.20250630
+build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Feeds Testsuite S3 WebDAV
+dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.4 http-client-0.7.19 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1
+key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
+remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask
+operating system: linux x86_64
+supported repository versions: 8 9 10
+upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
+local repository version: 10
+```
+
+On NixOS, although I *think* the same will happen anywhere.
+
+
+### Please provide any additional information below.
+
+[[!format sh """
+ git annex initremote s3 type=S3 bucket=thema-assembly-line region=EU datacenter=eu-west-2 encryption=none
+initremote s3 (checking bucket...) (creating bucket in eu-west-2...)
+git-annex: S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "InvalidRequest", s3ErrorMessage = "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.", s3ErrorResource = Nothing, s3ErrorHostId = Just "Hi0geDlta/PbTTIhfzHvtGxcoWq14VWxp/y5RugFCDEext1aOw0wFBhRP8+jVkHHDqTvoWqCgcY=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
+failed
+initremote: 1 failed
+
+"""]]
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+Yep. It's a great little tool; up till now always local network + rsync.
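The S3 special remote has a `signature` parameter, and the `AWS4-HMAC-SHA256` error in the transcript above suggests trying `signature=v4`. The sketch below reuses the bucket and datacenter from the report and is an untested suggestion, not a confirmed fix for this bug. The helper function name is hypothetical, and it prints the command rather than running it, since a real run needs AWS credentials.

```shell
#!/bin/sh
# Sketch: compose an initremote command that requests v4 signatures.
# The function name is hypothetical; the arguments are taken from the report.
v4_initremote_cmd() {
    remote=$1      # name for the new special remote
    bucket=$2      # S3 bucket name
    datacenter=$3  # S3 region that requires v4 signing, e.g. eu-west-2
    printf '%s\n' "git annex initremote $remote type=S3 bucket=$bucket datacenter=$datacenter signature=v4 encryption=none"
}

# Print the command for the values used in the bug report.
v4_initremote_cmd s3 thema-assembly-line eu-west-2
```

For a remote that was already initialized, the same parameter would presumably be passed to `git annex enableremote` instead.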
diff --git a/doc/bugs/Walrus_storage_backend.mdwn b/doc/bugs/Walrus_storage_backend.mdwn new file mode 100644 index 0000000000..c5b1168c15 --- /dev/null +++ b/doc/bugs/Walrus_storage_backend.mdwn @@ -0,0 +1,20 @@ +### Please describe the feature. + +Walrus is a new type of decentralized storage, which allows programmable ownership of blob data. +This fits git-annex well and allows storing huge amounts of blob data with 100% uptime and a good price economy. + +https://www.walrus.xyz/ + +The coordination layer is SUI, a decentralized global programmable object database. +Unfortunately, nobody has implemented git storage on SUI yet, so currently this can be seen as a normal blob storage. Long term, hosting git on SUI/walrus will create real decentralized git repos. + +As a nice bonus, when using walrus/sui, those who use the git-annex default package could pay a small fee to support the project. This would allow a steady income long term for the project. + +Walrus is a storage backend and only guarantees that objects are available for the epochs the underlying Storage object is reserved for (you buy a contingent of blob storage for a duration of time). +This Storage object therefore needs to be extended / managed. +The whole infrastructure allows building a decentralized annex cloud storage, where the user actually owns their data and only storage payment etc. is automated. Notifications etc. + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I love git annex, works like a charm. Using it for 5+ years
Revert "update"
This reverts commit 48090579d91d92318d8b394c17b26b4a6af6ee69.
diff --git a/doc/thanks/list b/doc/thanks/list index dd52e76d21..dfeda7a813 100644 --- a/doc/thanks/list +++ b/doc/thanks/list @@ -126,4 +126,3 @@ Lilia.Nanne, Dusty Mabe, mpol, Andrew Poelstra, -~1877056,
update
diff --git a/doc/thanks/list b/doc/thanks/list index dfeda7a813..dd52e76d21 100644 --- a/doc/thanks/list +++ b/doc/thanks/list @@ -126,3 +126,4 @@ Lilia.Nanne, Dusty Mabe, mpol, Andrew Poelstra, +~1877056,
update
diff --git a/doc/thanks/list b/doc/thanks/list index 1f590c5572..dfeda7a813 100644 --- a/doc/thanks/list +++ b/doc/thanks/list @@ -125,3 +125,4 @@ oz, Lilia.Nanne, Dusty Mabe, mpol, +Andrew Poelstra,
much better
diff --git a/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn b/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn index 00048cb64e..5e06039132 100644 --- a/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn +++ b/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn @@ -1,72 +1,93 @@ -# Acquaintances: Sharing Files through Connected Projects +[[!meta author="Spencer"]] + +# Friends: Sharing Files through Connected Projects I often connect repos together during my scientific work, in which I like to use the [YODA (Datalad)](https://handbook.datalad.org/en/latest/_images/dataset_modules.svg) standard of connecting related projects via submodules. However, I've recently found that sometimes I have to connect an entire repo to, say, a paper just to use one resource. For the sake of provenance, this connection is essential, but it feels extremely inefficient and unscalable to have one repo filled with submodules just for individual files. -For these specific instances, I'm devising an alternative solution: acquaintance repos. +For these specific instances, I'm devising an alternative solution: friend repos. -## Acquaintances are Unrelated Repos +## Friends are Unrelated Repos -In general, an acquaintance is a repo whose *history* (branches, worktree, commits) is not relevant to the current repo, but is the origin for some files that the current repo uses. This is unlike *clones* (where everything is related), *parents/children* (where the entire child is derived or related to the parent, e.g. like superproject team repos and their children), or other [groups](https://git-annex.branchable.com/preferred_content/standard_groups/) defined by git-annex (archives, sources, etc.) +In general, a friend is a repo whose *history* (branches, worktree, commits) is not relevant to the current repo, but is the origin for some files that the current repo uses. 
This is unlike *clones* (where everything is related), *parents/children* (where the entire child is derived or related to the parent, e.g. like superproject team repos and their children), or other [groups](https://git-annex.branchable.com/preferred_content/standard_groups/) defined by git-annex (archives, sources, etc.) This definition requires upholding some technical details: -1. Acquaintances should **never sync**. This precludes defining them as normal git remotes unless you are very dilligent about undefining `remote.<name>.fetch` and setting `remote.<name>.sync=false` -1. Acquaintances don't need to know about *all* files in the acquaintance repo (neither in a git sense or annex sense), just the files used. Therefore `git annex filter-branch` is a bit overkill, but could be done manually via selecting exactly the keys needed. +1. Friends should **never sync**. This precludes defining them as normal git remotes unless you are very dilligent about undefining `remote.<name>.fetch` and setting `remote.<name>.sync=false` +1. Friends don't need to know about *all* files in the friend repo (neither their history (git) or key logs (annex)), they just the files they use. Therefore while `git annex filter-branch` could be used to filter for just the files needed, it is a bit overkill. ## Solution - A Special Remote with Custom Groups (`gx` is short for `git annex`) -Define a special repo that points to the primary storage location for the acquaintance repo. -I like to define it with a name like `acq.X` so it's obvious by inspection that it's an acquaintance. -Other metadata also tells you this (`gx group acq.X` will list `acquaintance`, or something could be added to the description), +Define a special repo that points to the primary storage location for the friend repo. +I like to define it with a name like `fri.X` so it's obvious by inspection that it's an friend. 
+Other metadata also tells you this (`gx group fri.X` will list `friend`, or something could be added to the description), but being in the name makes it clear especially for e.g. `gx list`. ### Depot: Primary Storage The depot is where a repo stores its *own* stuff. This prevents others' stuff from being duplicated into the referencing repo. -For those familiar with the `client` group, `depot`s are just clients with acquaintances replacing archives. +For those familiar with the `client` group, `depot`s are just clients with friends replacing archives. + +```bash +gx groupwanted depot "(include=* and (not (copies=friend:1))) or approxlackingcopies=1" +``` + +#### Client Replacement Version + +If you want to be able to use the assistant or archives, here's a version that can stand in for `client`: -`gx groupwanted depot "(include=* and (not (copies=acquaintance:1))) or approxlackingcopies=1"` +```bash +gx groupwanted depot "(include=* and ((exclude=*/archive/* and exclude=archive/*) or (not (copies=archive:1 or copies=smallarchive:1 or copies=friend:1)))) or approxlackingcopies=1" +``` -### Acquaintance +### Friend: Related Repos -The acquaintance is the source for stuff the current repo references. +The friend is the source for stuff the current repo references. Therefore, it doesn't need to be stored by the repo (i.e. in its depot) -`gx groupwanted acquaintance present` +```bash +gx groupwanted friend present +``` ### Finishing Up -To actually register where acquaintance files are, the ideal way is `gx fsck`. +To actually register where friend files are, the ideal way is `gx fsck`. This is better than e.g. `gx filter-branch` mentioned above because it's automatic. The default behavior of `fsck`, like other annex commands, is to check against files *in the current worktree*, so it will only populate the metadata for a special remote about the files the current repo is trained to care about. 
-`gx fsck -f acq.X -J 10` +```bash +gx fsck -f fri.X --fast -J 10 +``` -This may be a bit slow initially because it has to check each file in the worktree by seeking the remote, downloading known files, and verifying their hashes before they're registered as present in the new acquaintance. +Without `--fast`, the process will be slower as it verifies hashes by downloading files. In short the process involves: -1. For every external file desired by a repo: - 1. Copy the file (or a symlink) to the current repo and track it with annex - 1. Define a new special remote `acq.X` pointing to the depot/storage location for the file from the acquaintance repo. - 1. Assign the special remote with group `acquaintance` - 1. Assign any storage locations for the current remote with group `depot` - 1. Run `gx fsck -f acq.X` to populate the new special remote's contents relative to the current repo's worktree/branch - 1. Run `gx sync` if desired. The result should be files present in the current repo (if desired), and only in the acquaintance but not the depot(s). - 1. Now, the acquaintance acts as a link back to the origin for referenced files without duplication or having to add the entire acquaintance as a submodule! +1. For every repo that wants a friend: + 1. Define the group `friend` with its `groupwanted` rule (above for easy copying) + 1. Define the group `depot` with its `groupwanted` rule (above for easy copying) + 1. Set existing depots to use the `depot` group and have `groupwanted` as their `wanted` rule +1. For every friend: + 1. Define a new special remote `fri.X` pointing to the depot/storage location for friend repo. + 1. Assign the special remote with group `friend` and ensure it has `groupwanted` as its `wanted` rule +1. For every batch of files added from a friend: + 1. Copy the files (or symlinks) and track them with annex + 1. Run the `gx fsck` above to update the friend with the new files + 1. Run `gx sync` if desired. + 1. 
The result should be files present in the friend (and maybe the current), but not the depot(s). + 1. Now, the friend tells us where a file came from without having to add the entire friend as a submodule! ## FAQ/Open Questions -1. Is there a way to define the custom groups globally, or will I have to re-define special groups in every repo that uses acquaitances/depots? - 1. Not sure yet. I wonder where custom groups could be defined globally? Maybe in the user `.gitconfig`. +1. Is there a way to define the custom groups globally, or will I have to re-define special groups in every repo that uses friend/depots? + 1. Not sure yet. I wonder where custom groups could be defined globally? Maybe in the user `.gitconfig`. 1. Is there a way to get CLI autocomplete to suggest custom groups? - 1. Not sure yet. -1. Will this play well with standard groups and the assistant, especially if `client`s and `archive`s are used? - 1. Probably not, I don't use the assistant, but I suspect if one wanted to they'd have to define depots as clients with the acquantaince logic added instead of substituted for archives. + 1. I don't think there's support for this yet: only the standard groups are suggested in my zsh/omz setup. +1. Is this a replacement for Datalad datasets? + 1. I think of this as a tool to use alongside datasets. Datalad datasets are great when one project depends on the entirety of another (like a technical paper on an analysis) while this technique is better for collecting files from many projects under one umbrella (like a Thesis, which coincidentally, is what I'm developing this for). + 1. This also helps separate the ideas of storage (where files live) and referencing (how files are used). When I originally started using datasets, I had one special repo for each repo since I figured each repo has to have its own unique remote for git in whatever Github/Organization/Team the project belongs to anyway. 
Now, this is motivating me to consider how to rationally store contents for projects that share some commonality (a collaboration, an experimental phase, a taskforce, a super-repo as a parent). In this way, I can maintain a provenance record while minimizing the number of clones and remotes I need to maintain. -<!-- Work in progress! Feel free to leave comments like this if you have questions about the final idea once I finish it. --> <!-- Learning in Public: I've only just begun to use this for myself and am eliciting feedback and fleshing it out by describing it here (Feynmann Technique Style) -->
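The setup steps in the Friends tip above can be collected into one sequence. This is a minimal sketch that prints the commands instead of running them. `fri.X` follows the tip's naming, while the depot repository name `mydepot` and the function name are hypothetical. It uses the plain `depot` rule, not the client replacement version.

```shell
#!/bin/sh
# Sketch: print the git-annex commands for a friend/depot setup.
# Nothing is executed; the output is a plan to review and run by hand.
friend_setup_plan() {
    friend_remote=$1   # special remote pointing at the friend's storage, e.g. fri.X
    depot_repo=$2      # hypothetical name of this repo's own storage (depot)
    cat <<EOF
git annex groupwanted depot "(include=* and (not (copies=friend:1))) or approxlackingcopies=1"
git annex groupwanted friend present
git annex group $depot_repo depot
git annex wanted $depot_repo groupwanted
git annex group $friend_remote friend
git annex wanted $friend_remote groupwanted
git annex fsck --from $friend_remote --fast -J 10
EOF
}

# Print the plan for one friend and one depot.
friend_setup_plan fri.X mydepot
```

Printing a plan keeps the sketch free of side effects. The `groupwanted` definitions only need to be run once per repository, while the `fsck` line is repeated for each batch of files added from a friend.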
rename tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn to tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn
diff --git a/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn b/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn similarity index 100% rename from doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn rename to doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn diff --git a/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment b/doc/tips/Friends_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment similarity index 100% rename from doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment rename to doc/tips/Friends_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment
comment
diff --git a/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment b/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment new file mode 100644 index 0000000000..f58b964892 --- /dev/null +++ b/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-10-06T13:31:33Z" + content=""" +Passing --fast to fsck will prevent it needing to download the files. +"""]]
new idea, work in progress
diff --git a/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn b/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn new file mode 100644 index 0000000000..00048cb64e --- /dev/null +++ b/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn @@ -0,0 +1,72 @@ +# Acquaintances: Sharing Files through Connected Projects + +I often connect repos together during my scientific work, in which I like to use the [YODA (Datalad)](https://handbook.datalad.org/en/latest/_images/dataset_modules.svg) standard of connecting related projects via submodules. However, I've recently found that sometimes I have to connect an entire repo to, say, a paper just to use one resource. For the sake of provenance, this connection is essential, but it feels extremely inefficient and unscalable to have one repo filled with submodules just for individual files. + +For these specific instances, I'm devising an alternative solution: acquaintance repos. + +## Acquaintances are Unrelated Repos + +In general, an acquaintance is a repo whose *history* (branches, worktree, commits) is not relevant to the current repo, but is the origin for some files that the current repo uses. This is unlike *clones* (where everything is related), *parents/children* (where the entire child is derived or related to the parent, e.g. like superproject team repos and their children), or other [groups](https://git-annex.branchable.com/preferred_content/standard_groups/) defined by git-annex (archives, sources, etc.) + +This definition requires upholding some technical details: + +1. Acquaintances should **never sync**. This precludes defining them as normal git remotes unless you are very dilligent about undefining `remote.<name>.fetch` and setting `remote.<name>.sync=false` +1. Acquaintances don't need to know about *all* files in the acquaintance repo (neither in a git sense or annex sense), just the files used. 
Therefore `git annex filter-branch` is a bit overkill, but could be done manually via selecting exactly the keys needed. + +## Solution - A Special Remote with Custom Groups + +(`gx` is short for `git annex`) + +Define a special repo that points to the primary storage location for the acquaintance repo. +I like to define it with a name like `acq.X` so it's obvious by inspection that it's an acquaintance. +Other metadata also tells you this (`gx group acq.X` will list `acquaintance`, or something could be added to the description), +but being in the name makes it clear especially for e.g. `gx list`. + +### Depot: Primary Storage + +The depot is where a repo stores its *own* stuff. +This prevents others' stuff from being duplicated into the referencing repo. +For those familiar with the `client` group, `depot`s are just clients with acquaintances replacing archives. + +`gx groupwanted depot "(include=* and (not (copies=acquaintance:1))) or approxlackingcopies=1"` + +### Acquaintance + +The acquaintance is the source for stuff the current repo references. +Therefore, it doesn't need to be stored by the repo (i.e. in its depot) + +`gx groupwanted acquaintance present` + +### Finishing Up + +To actually register where acquaintance files are, the ideal way is `gx fsck`. +This is better than e.g. `gx filter-branch` mentioned above because it's automatic. +The default behavior of `fsck`, like other annex commands, is to check against files *in the current worktree*, +so it will only populate the metadata for a special remote about the files the current repo is trained to care about. + +`gx fsck -f acq.X -J 10` + +This may be a bit slow initially because it has to check each file in the worktree by seeking the remote, downloading known files, and verifying their hashes before they're registered as present in the new acquaintance. + +In short the process involves: + +1. For every external file desired by a repo: + 1. 
Copy the file (or a symlink) to the current repo and track it with annex + 1. Define a new special remote `acq.X` pointing to the depot/storage location for the file from the acquaintance repo. + 1. Assign the special remote with group `acquaintance` + 1. Assign any storage locations for the current remote with group `depot` + 1. Run `gx fsck -f acq.X` to populate the new special remote's contents relative to the current repo's worktree/branch + 1. Run `gx sync` if desired. The result should be files present in the current repo (if desired), and only in the acquaintance but not the depot(s). + 1. Now, the acquaintance acts as a link back to the origin for referenced files without duplication or having to add the entire acquaintance as a submodule! + +## FAQ/Open Questions + +1. Is there a way to define the custom groups globally, or will I have to re-define special groups in every repo that uses acquaitances/depots? + 1. Not sure yet. I wonder where custom groups could be defined globally? Maybe in the user `.gitconfig`. +1. Is there a way to get CLI autocomplete to suggest custom groups? + 1. Not sure yet. +1. Will this play well with standard groups and the assistant, especially if `client`s and `archive`s are used? + 1. Probably not, I don't use the assistant, but I suspect if one wanted to they'd have to define depots as clients with the acquantaince logic added instead of substituted for archives. + +<!-- Work in progress! Feel free to leave comments like this if you have questions about the final idea once I finish it. --> +<!-- Learning in Public: I've only just begun to use this for myself and am eliciting feedback and fleshing it out by describing it here (Feynmann Technique Style) -->
Added a comment: My config works now
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add/comment_1_132d155d5445745e5ee086370be48aad._comment b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add/comment_1_132d155d5445745e5ee086370be48aad._comment new file mode 100644 index 0000000000..2055ee769c --- /dev/null +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add/comment_1_132d155d5445745e5ee086370be48aad._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="incogshift" + avatar="http://cdn.libravatar.org/avatar/fe527f5047693f6657cd03a6893da975" + subject="My config works now" + date="2025-10-04T08:05:00Z" + content=""" +I have `.gitattributes`: + +``` +* annex.largefiles=nothing filter=annex +*.pdf annex.largefiles=anything filter=annex +``` + +and git config: + +``` +[annex] + gitaddtoannex = true +``` + +Using `git add` now adds it to annex. This can be confirmed with + +``` +git annex info file.pdf +``` + +The output should show `present = true` at the end. If it wasn't added to annex, the output would show `fatal: Not a valid object name file.pdf`. + +And it seems that, by default, the files are stored in the working tree in their unlocked state. So `git add` doesn't replace the file with a symlink unlike `git annex add` +"""]]
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn index 27c94248c1..128341f4ca 100644 --- a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn @@ -14,3 +14,18 @@ My config is the one below: *.pptx annex.largefiles=anything *.docx annex.largefiles=anything ``` + +I'm using NixOS. My git annex version info is below: + +``` +git annex version +git-annex version: 10.20250630 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.4 http-client-0.7.19 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +```
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn index c43d41edda..27c94248c1 100644 --- a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn @@ -1,4 +1,4 @@ -I set up `annex.largefiles` in my global `.gitattributes` config. But git add doesn't add the defined large files to annex. But git annex works with large files and small as intended. +I set up `annex.largefiles` in my global `.gitattributes` config. But git add doesn't add the defined large files to annex. But git annex works with large and small files as intended. My config is the one below:
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn index b357fcf5a7..c43d41edda 100644 --- a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn @@ -1,7 +1,8 @@ -I set up annex.largefiles in my global .gitattributes config. But git add doesn't add the defined large files to annex. But git annex works with large files and small as intended. +I set up `annex.largefiles` in my global `.gitattributes` config. But git add doesn't add the defined large files to annex. But git annex works with large files and small as intended. My config is the one below: +``` * annex.largefiles=nothing *.pdf annex.largefiles=anything *.mp4 annex.largefiles=anything @@ -12,3 +13,4 @@ My config is the one below: *.DOC annex.largefiles=anything *.pptx annex.largefiles=anything *.docx annex.largefiles=anything +```
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn
new file mode 100644
index 0000000000..b357fcf5a7
--- /dev/null
+++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn
@@ -0,0 +1,14 @@
+I set up annex.largefiles in my global .gitattributes config. But git add doesn't add the defined large files to annex. But git annex works with large files and small as intended.
+
+My config is the one below:
+
+* annex.largefiles=nothing
+*.pdf annex.largefiles=anything
+*.mp4 annex.largefiles=anything
+*.mp3 annex.largefiles=anything
+*.mkv annex.largefiles=anything
+*.odt annex.largefiles=anything
+*.wav annex.largefiles=anything
+*.DOC annex.largefiles=anything
+*.pptx annex.largefiles=anything
+*.docx annex.largefiles=anything
comment
diff --git a/doc/todo/very_confusing_name_annex.assistant.allowunlocked/comment_1_b7ad0090e29776c61babbc7bf0ccd684._comment b/doc/todo/very_confusing_name_annex.assistant.allowunlocked/comment_1_b7ad0090e29776c61babbc7bf0ccd684._comment
new file mode 100644
index 0000000000..73e3afccec
--- /dev/null
+++ b/doc/todo/very_confusing_name_annex.assistant.allowunlocked/comment_1_b7ad0090e29776c61babbc7bf0ccd684._comment
@@ -0,0 +1,23 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-10-02T17:32:52Z"
+ content="""
+I think that "annex.assistant.allowlocked" would be as confusing, like you
+say the user would then have to RTFM to realize that they need to use
+annex.addunlocked to configure it, and that it doesn't cause files to be
+locked by default.
+
+To me, "treataddunlocked" is vague. Treat it as what?
+"allowaddunlocked" would be less vague since it does get the (full)
+name of the other config in there, so says it's allowing use of
+the other config.
+
+I agree this is a confusing name, and I wouldn't mind changing it, but I
+don't think it warrants an entire release to do that. So there would be
+perhaps a month for people to start using the current name. If this had
+come up in the 2 weeks between implementation and release I would have
+changed it, but at this point it starts to need a backwards compatability
+transition to change it, and I don't know if the minor improvement of
+"allowaddunlocked" is worth that.
+"""]]
Added a comment
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_2_33575c4a6477e3384a16533ff8b258ee._comment b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_2_33575c4a6477e3384a16533ff8b258ee._comment
new file mode 100644
index 0000000000..54f8af3378
--- /dev/null
+++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_2_33575c4a6477e3384a16533ff8b258ee._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="caleb@2b0d6f0eabf955cc8fd04c634b09f0ca4aad9233"
+ nickname="caleb"
+ avatar="http://cdn.libravatar.org/avatar/1d84382865c6c3378c04a35348fdfa07"
+ subject="comment 2"
+ date="2025-10-01T22:15:14Z"
+ content="""
+Thank you for the fix, that built just fine and I've successfully bumped the Arch Linux package to 20250929.
+"""]]
Added a comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_8_b545d29519e57fbc2d563ce6d9aafdb7._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_8_b545d29519e57fbc2d563ce6d9aafdb7._comment
new file mode 100644
index 0000000000..268f8cab57
--- /dev/null
+++ b/doc/todo/import_tree_from_rsync_special_remote/comment_8_b545d29519e57fbc2d563ce6d9aafdb7._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="yarikoptic"
+ avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
+ subject="comment 8"
+ date="2025-10-01T20:18:43Z"
+ content="""
+FTR: apparently sshfs is based on sftp and that one provides no means to access original inode. Not yet sure on what then it could say about stability of inode across remounts/as whether sensible to rely on it. Useful ref with pointers [sshfs/issues/109#issuecomment-2755824670](https://github.com/libfuse/sshfs/issues/109#issuecomment-2755824670)
+"""]]
complaining about choice of variable
diff --git a/doc/todo/very_confusing_name_annex.assistant.allowunlocked.mdwn b/doc/todo/very_confusing_name_annex.assistant.allowunlocked.mdwn
new file mode 100644
index 0000000000..e58d392424
--- /dev/null
+++ b/doc/todo/very_confusing_name_annex.assistant.allowunlocked.mdwn
@@ -0,0 +1,8 @@
+Thank you for addressing that [todo](https://git-annex.branchable.com/todo/allow_configuring_assistant_to_add_files_locked/)!
+
+But I must say though that the choice of `annex.assistant.allowunlocked` is very confusing! Without careful RTFM it suggests that by default assistant **does not** `allowunlocked`, thus using `locked` and thus to the **opposite** effect of the default behavior.
+
+Since really it instructs assistant to consider `addunlocked`, then I would have named it like `treataddunlocked` or alike.
+Or the smallest change to make it semantically sensible would have been to remove `un` from it and make `annex.assistant.allowlocked` thus allowing for `locked` files in general, which would then in reality (after RTFM) mean using `addunlocked` config.
+
+Just wanted to check if you stick to current choice before I start making use of it!
comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_7_9716fc56ccfb622c964a64b37c1c5fdc._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_7_9716fc56ccfb622c964a64b37c1c5fdc._comment new file mode 100644 index 0000000000..1036bbd920 --- /dev/null +++ b/doc/todo/import_tree_from_rsync_special_remote/comment_7_9716fc56ccfb622c964a64b37c1c5fdc._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2025-10-01T15:35:53Z" + content=""" +I wonder how sshfs manages stable inodes that differ from the actual ones? +But if it's really reliably stable, it would be ok to use it with the +directory special remote. + +Extending the external special remote interface to support +[import](https://git-annex.branchable.com/design/external_special_remote_protocol/export_and_import_appendix/#index1h2) +would let you roll your own special remote, that could use ssh with +rsync or whatever. + +The current design for that tries to support both import and export, but +noone has yet stepped up to the plate to try to implement a special remote +that supports both safely. So I am leaning toward thinking that it would be +a good idea to make the external special remote interface support *only* +import (or export) for a given external special remote, but not both. + +Then would become pretty easy to make your own special remote that +implements import only. Using whatever ssh commands make sense for the +server. +"""]] diff --git a/doc/todo/importtree_only_remotes.mdwn b/doc/todo/importtree_only_remotes.mdwn index 8f140c9450..2f9174b670 100644 --- a/doc/todo/importtree_only_remotes.mdwn +++ b/doc/todo/importtree_only_remotes.mdwn @@ -32,7 +32,9 @@ the wrong content. (So the remote should have retrievalSecurityPolicy = RetrievalVerifiableKeysSecure to make downloads be verified well enough.) 
I said this would not use a ContentIdentifier, but it seems it needs some -simple form of ContentIdentifier, which could be just an mtime. +simple form of ContentIdentifier, which could be just an mtime +(but mtime or mtime+size is not able to detect swaps of 2 files that share +both; using inode or something like that is better). Without any ContentIdentifier, it seems that each time `git annex import --from remote` is run, it would need to re-download all files from the remote, because it would have no way of knowing
followup
diff --git a/doc/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/comment_11_48d03d7cc1a5e007d3d06d9753d467ff._comment b/doc/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/comment_11_48d03d7cc1a5e007d3d06d9753d467ff._comment
new file mode 100644
index 0000000000..5e8d14000d
--- /dev/null
+++ b/doc/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/comment_11_48d03d7cc1a5e007d3d06d9753d467ff._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 11"""
+ date="2025-10-01T15:32:45Z"
+ content="""
+This did get implemented, `git config annex.assistant.allowunlocked true`
+and that will make it use your `annex.addunlocked` setting.
+"""]]
Added a comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_6_abc34860aed11d274a91d3134b6a7040._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_6_abc34860aed11d274a91d3134b6a7040._comment
new file mode 100644
index 0000000000..da3609a6fd
--- /dev/null
+++ b/doc/todo/import_tree_from_rsync_special_remote/comment_6_abc34860aed11d274a91d3134b6a7040._comment
@@ -0,0 +1,34 @@
+[[!comment format=mdwn
+ username="yarikoptic"
+ avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
+ subject="comment 6"
+ date="2025-10-01T13:09:00Z"
+ content="""
+quick check -- according to `ls` - original inodes are not mapped but some are given and persist across remounts:
+
+```
+❯ ls -li /tmp/glances-root.log ~/.emacs ~/20250807-15forzabava.pdf
+ 132280 lrwxrwxrwx 1 yoh yoh 17 Nov 11 2014 /home/yoh/.emacs -> .etc/emacs/.emacs
+152278557 -rw-rw-r-- 1 yoh yoh 207101 Aug 7 10:30 /home/yoh/20250807-15forzabava.pdf
+ 34 -rw-r--r-- 1 root root 1165 Oct 1 08:43 /tmp/glances-root.log
+
+❯ sshfs localhost:/ /tmp/localhost
+
+❯ ls -li /tmp/localhost{/tmp/glances-root.log,/home/yoh/{.emacs,20250807-15forzabava.pdf}}
+ 6 lrwxrwxrwx 1 yoh yoh 17 Nov 11 2014 /tmp/localhost/home/yoh/.emacs -> .etc/emacs/.emacs
+10 -rw-rw-r-- 1 yoh yoh 207101 Aug 7 10:30 /tmp/localhost/home/yoh/20250807-15forzabava.pdf
+ 3 -rw-r--r-- 1 root root 1165 Oct 1 08:43 /tmp/localhost/tmp/glances-root.log
+
+❯ fusermount -u /tmp/localhost
+
+❯ sshfs localhost:/ /tmp/localhost
+
+❯ ls -li /tmp/localhost{/tmp/glances-root.log,/home/yoh/{.emacs,20250807-15forzabava.pdf}}
+ 6 lrwxrwxrwx 1 yoh yoh 17 Nov 11 2014 /tmp/localhost/home/yoh/.emacs -> .etc/emacs/.emacs
+10 -rw-rw-r-- 1 yoh yoh 207101 Aug 7 10:30 /tmp/localhost/home/yoh/20250807-15forzabava.pdf
+ 3 -rw-r--r-- 1 root root 1165 Oct 1 08:43 /tmp/localhost/tmp/glances-root.log
+
+```
+
+ok, if not `sshfs` and not `rsync` -- any other way you see? e.g. could it be easily setup for some `git` with ssh URL type \"special\" remote? ;-)
+"""]]
comments
diff --git a/doc/todo/Recent_remote_activities/comment_4_766ce3ab6c4ff368ec8e06e6c6f6aa8e._comment b/doc/todo/Recent_remote_activities/comment_4_766ce3ab6c4ff368ec8e06e6c6f6aa8e._comment new file mode 100644 index 0000000000..7ca4788d07 --- /dev/null +++ b/doc/todo/Recent_remote_activities/comment_4_766ce3ab6c4ff368ec8e06e6c6f6aa8e._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="joey" + subject="""git-annex activity""" + date="2025-09-30T14:29:54Z" + content=""" +Copying a related idea from @nobodyinperson on [[todo/remove_webapp]]: + +Furthermore, a command like `git annex activity` that goes arbitrarily far back in time and statically (non-live) lists recent activities like: + +- yesterday 23:32: remote1 downloaded 5 files (45MB) +- today 10:45: you modified file `document.txt` (10MB) +- today 10:46: you uploaded file `document.txt` (from today 10:45) to remote1, remote2 and remote3 +- today 12:35: Fred McGitFace modified file `document.txt` (12MB) and uploaded to remote2 +- ... + +Basically a human-readable (or as JSON), chronological log of things that happened in the repo. This is a superpower of git-annex: all this information is available as far back as one wants, we just don't have a way to access it nicely. `git log` and `git annex log` exist, but they are too specific, too broad or a bit hard to parse on their own. For example: + +- `git annex activity --since=\"2 weeks ago\" --include='*.doc'` would list things (who committed, which remote received it, etc.) 
that happened in the last two weeks to *.doc files +- `git annex activity --only-annex --in=remote2` would list recent annex operations (in the `git-annex` branch only) of remote2 +- `git annex activity --only-changes --largerthan=10MB` would list recent file changes (additions, modifications, deletions, etc., in `git log` only) + +This `git annex assistant-log` and `git annex activity` would be a very nice feature to showcase git-annex's power (which other file syncing tool can to this? 🤔) and also solve [[todo/Recent_remote_activities]]. +"""]] diff --git a/doc/todo/Recent_remote_activities/comment_5_1f4f43b32af276ef3b3db54fc2cb33f7._comment b/doc/todo/Recent_remote_activities/comment_5_1f4f43b32af276ef3b3db54fc2cb33f7._comment new file mode 100644 index 0000000000..ca7d2061b6 --- /dev/null +++ b/doc/todo/Recent_remote_activities/comment_5_1f4f43b32af276ef3b3db54fc2cb33f7._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-09-30T14:31:59Z" + content=""" +A `git-annex activity` (or `git-annex log`) could also optionally stream live +activity as it is happening. Eg, when a transfer is started it could display +the start, and then later the end. That would be easy to build with what's +in git-annex already. The assistant already uses the transfer logs that way, +using inotify to notice changes. +"""]] diff --git a/doc/todo/Recent_remote_activities/comment_6_9e686c20ccd2c81f72f479441ca57698._comment b/doc/todo/Recent_remote_activities/comment_6_9e686c20ccd2c81f72f479441ca57698._comment new file mode 100644 index 0000000000..7f06dd5337 --- /dev/null +++ b/doc/todo/Recent_remote_activities/comment_6_9e686c20ccd2c81f72f479441ca57698._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: git-annex activity""" + date="2025-09-30T14:34:50Z" + content=""" +> `git annex activity --since="2 weeks ago" --include='*.doc' + +This is essentially the same as `git-annex log` with a path. 
It also +supports --since and --json. The difference I guess is the idea to also +include information about git commits of the files, not only git-annex +location changes. That would complicate the output, and apparently +`git-annex log`'s output is too hard to parse already. So a design for a +better output would be needed. + +> `git annex activity --only-annex --in=remote2` + +This is the same as `git-annex log --all` with the output filtered to only +list a given remote. (`--in` does not influence `--all` currently). + +> `git annex activity --only-changes --largerthan=10MB` + +Can probably be accomplished with `git log` with some +-S regexp. +"""]] diff --git a/doc/todo/remove_webapp/comment_4_d80ec1b3534ffa514df926925a0105f7._comment b/doc/todo/remove_webapp/comment_4_d80ec1b3534ffa514df926925a0105f7._comment new file mode 100644 index 0000000000..ec4a8b0ae1 --- /dev/null +++ b/doc/todo/remove_webapp/comment_4_d80ec1b3534ffa514df926925a0105f7._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-09-30T14:22:09Z" + content=""" +git-annex does support desktop notifications of file uploads/downloads, +via --notify-start and --notify-finish. (When built with dbus support.) +That can be used with the assistant w/o webapp to keep a desktop user +informed about what is going on. +"""]] diff --git a/doc/todo/remove_webapp/comment_5_75c22d9f3a84c259084468c03f5735bb._comment b/doc/todo/remove_webapp/comment_5_75c22d9f3a84c259084468c03f5735bb._comment new file mode 100644 index 0000000000..f1225bb40d --- /dev/null +++ b/doc/todo/remove_webapp/comment_5_75c22d9f3a84c259084468c03f5735bb._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-09-30T14:55:25Z" + content=""" +I've copied the `git-annex activity` idea over to +[[todo/Recent_remote_activities]] so it doesn't get lost. + +I don't think it makes sense to make that a blocker for removing the webapp +though. 
That would only let an advanced user build some kind of activity +display, doesn't address the needs of most users of the webapp. +"""]]
Added a comment: Fixed in 20250929
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_7_4d6559666e8b53957ed93ffa5928cb00._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_7_4d6559666e8b53957ed93ffa5928cb00._comment
new file mode 100644
index 0000000000..00264d5eba
--- /dev/null
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_7_4d6559666e8b53957ed93ffa5928cb00._comment
@@ -0,0 +1,26 @@
+[[!comment format=mdwn
+ username="ewen"
+ avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
+ subject="Fixed in 20250929"
+ date="2025-09-29T21:54:14Z"
+ content="""
+Thanks for the very quick turn around on a new release!
+
+Conveniently HomeBrew also turned around building the new release quickly (I suspect it might be one of the packages in their CI for auto upgrade now), so I've been able to test the HomeBrew build of 20250929.
+
+20250929 seems to be working correctly to download podcast feeds, parse them, and download the media attachments as before.
+
+Ewen
+
+PS: Test example below. But also worked for my regular podcast downloads, which were failing with 20250926.
+
+```
+ewen@basadi:/tmp/retest$ TEMPLATE='archive/${feedtitle}/${itemtitle}${extension}'
+ewen@basadi:/tmp/retest$ git annex importfeed --relaxed --template=\"${TEMPLATE}\" \"https://risky.biz/feeds/risky-business\"
+importfeed gathering known urls ok
+importfeed https://risky.biz/feeds/risky-business (\"Risky Business\") ok
+addurl https://dts.podtrac.com/redirect.mp3/media3.risky.biz/RB808.mp3 (to archive/Risky_Business/Risky_Business__808_--_Insane_megabug_in_Entra_left_all_tenants_exposed.mp3) ok
+addurl https://dts.podtrac.com/redirect.mp3/media3.risky.biz/RB807.mp3 (to archive/Risky_Business/Risky_Business__807_--_Shai-Hulud_npm_worm_wreaks_old-school_havoc.mp3) ok
+...
+```
+"""]]
Revert "webapp: Remove support for local pairing"
This reverts commit 8ea6d7acc548cb35b4905c9c663e8a7de66ac752.
Temporarily, until builds finish for today's release.
diff --git a/Assistant.hs b/Assistant.hs
index cd81895861..911ebd33d3 100644
--- a/Assistant.hs
+++ b/Assistant.hs
@@ -40,6 +40,9 @@ import Assistant.Threads.Glacier
#ifdef WITH_WEBAPP
import Assistant.WebApp
import Assistant.Threads.WebApp
+#ifdef WITH_PAIRING
+import Assistant.Threads.PairListener
+#endif
#else
import Assistant.Types.UrlRenderer
#endif
@@ -152,6 +155,11 @@ startDaemon assistant foreground startdelay cannotrun listenhost listenport star
then webappthread
else webappthread ++
[ watch commitThread
+#ifdef WITH_WEBAPP
+#ifdef WITH_PAIRING
+ , assist $ pairListenerThread urlrenderer
+#endif
+#endif
, assist pushThread
, assist pushRetryThread
, assist exportThread
diff --git a/Assistant/Pairing/MakeRemote.hs b/Assistant/Pairing/MakeRemote.hs
new file mode 100644
index 0000000000..f4468bc07c
--- /dev/null
+++ b/Assistant/Pairing/MakeRemote.hs
@@ -0,0 +1,98 @@
+{- git-annex assistant pairing remote creation
+ -
+ - Copyright 2012 Joey Hess <id@joeyh.name>
+ -
+ - Licensed under the GNU AGPL version 3 or higher.
+ -}
+
+module Assistant.Pairing.MakeRemote where
+
+import Assistant.Common
+import Assistant.Ssh
+import Assistant.Pairing
+import Assistant.Pairing.Network
+import Assistant.MakeRemote
+import Assistant.Sync
+import Config.Cost
+import Config
+import qualified Types.Remote as Remote
+
+import Network.Socket
+import qualified Data.Text as T
+
+{- Authorized keys are set up before pairing is complete, so that the other
+ - side can immediately begin syncing. -}
+setupAuthorizedKeys :: PairMsg -> OsPath -> IO ()
+setupAuthorizedKeys msg repodir = case validateSshPubKey $ remoteSshPubKey $ pairMsgData msg of
+ Left err -> giveup err
+ Right pubkey -> do
+ absdir <- absPath repodir
+ unlessM (liftIO $ addAuthorizedKeys True absdir pubkey) $
+ giveup "failed setting up ssh authorized keys"
+
+{- When local pairing is complete, this is used to set up the remote for
+ - the host we paired with. -}
+finishedLocalPairing :: PairMsg -> SshKeyPair -> Assistant ()
+finishedLocalPairing msg keypair = do
+ sshdata <- liftIO $ installSshKeyPair keypair =<< pairMsgToSshData msg
+ {- Ensure that we know the ssh host key for the host we paired with.
+ - If we don't, ssh over to get it. -}
+ liftIO $ unlessM (knownHost $ sshHostName sshdata) $
+ void $ sshTranscript
+ [ sshOpt "StrictHostKeyChecking" "no"
+ , sshOpt "NumberOfPasswordPrompts" "0"
+ , "-n"
+ ]
+ (genSshHost (sshHostName sshdata) (sshUserName sshdata))
+ ("git-annex-shell -c configlist " ++ T.unpack (sshDirectory sshdata))
+ Nothing
+ r <- liftAnnex $ addRemote $ makeSshRemote sshdata
+ repo <- liftAnnex $ Remote.getRepo r
+ liftAnnex $ setRemoteCost repo semiExpensiveRemoteCost
+ syncRemote r
+
+{- Mostly a straightforward conversion. Except:
+ - * Determine the best hostname to use to contact the host.
+ - * Strip leading ~/ from the directory name.
+ -}
+pairMsgToSshData :: PairMsg -> IO SshData
+pairMsgToSshData msg = do
+ let d = pairMsgData msg
+ hostname <- liftIO $ bestHostName msg
+ let dir = case remoteDirectory d of
+ ('~':'/':v) -> v
+ v -> v
+ return SshData
+ { sshHostName = T.pack hostname
+ , sshUserName = Just (T.pack $ remoteUserName d)
+ , sshDirectory = T.pack dir
+ , sshRepoName = genSshRepoName hostname (toOsPath dir)
+ , sshPort = 22
+ , needsPubKey = True
+ , sshCapabilities = [GitAnnexShellCapable, GitCapable, RsyncCapable]
+ , sshRepoUrl = Nothing
+ }
+
+{- Finds the best hostname to use for the host that sent the PairMsg.
+ -
+ - If remoteHostName is set, tries to use a .local address based on it.
+ - That's the most robust, if this system supports .local.
+ - Otherwise, looks up the hostname in the DNS for the remoteAddress,
+ - if any. May fall back to remoteAddress if there's no DNS. Ugh. -}
+bestHostName :: PairMsg -> IO HostName
+bestHostName msg = case remoteHostName $ pairMsgData msg of
+ Just h -> do
+ let localname = h ++ ".local"
+ addrs <- catchDefaultIO [] $
+ getAddrInfo Nothing (Just localname) Nothing
+ maybe fallback (const $ return localname) (headMaybe addrs)
+ Nothing -> fallback
+ where
+ fallback = do
+ let a = pairMsgAddr msg
+ let sockaddr = case a of
+ IPv4Addr addr -> SockAddrInet (fromInteger 0) addr
+ IPv6Addr addr -> SockAddrInet6 (fromInteger 0) 0 addr 0
+ fromMaybe (showAddr a)
+ <$> catchDefaultIO Nothing
+ (fst <$> getNameInfo [] True False sockaddr)
diff --git a/Assistant/Pairing/Network.hs b/Assistant/Pairing/Network.hs
new file mode 100644
index 0000000000..62a4ea02e8
--- /dev/null
+++ b/Assistant/Pairing/Network.hs
@@ -0,0 +1,132 @@
+{- git-annex assistant pairing network code
+ -
+ - All network traffic is sent over multicast UDP. For reliability,
+ - each message is repeated until acknowledged. This is done using a
+ - thread, that gets stopped before the next message is sent.
+ -
+ - Copyright 2012 Joey Hess <id@joeyh.name>
+ -
+ - Licensed under the GNU AGPL version 3 or higher.
+ -}
+
+module Assistant.Pairing.Network where
+
+import Assistant.Common
+import Assistant.Pairing
+import Assistant.DaemonStatus
+import Utility.ThreadScheduler
+import Utility.Verifiable
+
+import Network.Multicast
+import Network.Info
+import Network.Socket
+import qualified Network.Socket.ByteString as B
+import qualified Data.ByteString.UTF8 as BU8
+import qualified Data.Map as M
+import Control.Concurrent
+
+{- This is an arbitrary port in the dynamic port range, that could
+ - conceivably be used for some other broadcast messages.
+ - If so, hope they ignore the garbage from us; we'll certainly
+ - ignore garbage from them. Wild wild west. -}
+pairingPort :: PortNumber
+pairingPort = 55556
+
+{- Goal: Reach all hosts on the same network segment.
+ - Method: Use same address that avahi uses. Other broadcast addresses seem
+ - to not be let through some routers. -}
+multicastAddress :: AddrClass -> HostName
+multicastAddress IPv4AddrClass = "224.0.0.251"
+multicastAddress IPv6AddrClass = "ff02::fb"
+
+{- Multicasts a message repeatedly on all interfaces, with a 2 second
+ - delay between each transmission. The message is repeated forever
+ - unless a number of repeats is specified.
+ -
+ - The remoteHostAddress is set to the interface's IP address.
+ -
+ - Note that new sockets are opened each time. This is hardly efficient,
+ - but it allows new network interfaces to be used as they come up.
+ - On the other hand, the expensive DNS lookups are cached.
+ -}
+multicastPairMsg :: Maybe Int -> Secret -> PairData -> PairStage -> IO ()
+multicastPairMsg repeats secret pairdata stage = go M.empty repeats
+ where
+ go _ (Just 0) = noop
+ go cache n = do
+ addrs <- activeNetworkAddresses
+ let cache' = updatecache cache addrs
+ mapM_ (sendinterface cache') addrs
+ threadDelaySeconds (Seconds 2)
+ go cache' $ pred <$> n
+ {- The multicast library currently chokes on ipv6 addresses. -}
+ sendinterface _ (IPv6Addr _) = noop
+ sendinterface cache i = void $ tryIO $
(Diff truncated)
webapp: Remove support for local pairing
As a feature only supported by the webapp, and not by git-annex at the
command line, this is by now a very obscure corner of git-annex, and not
one I want to keep maintaining.
It's worth removing it to avoid the security exposure alone. People using
the assistant w/o the webapp probably don't expect it to be listening on
a UDP port for a handrolled protocol, but it was.
The webapp has supported pairing via magic-wormhole since 2016, which
makes a link including between local computers, albeit with the overhead
of tor. That sort of covers the same use case. Of course advanced users
can easily enough add a ssh remote to their repository themselves, using
a hostname on the local network.
Sponsored-by: unqueued
diff --git a/Assistant.hs b/Assistant.hs
index 911ebd33d3..cd81895861 100644
--- a/Assistant.hs
+++ b/Assistant.hs
@@ -40,9 +40,6 @@ import Assistant.Threads.Glacier
#ifdef WITH_WEBAPP
import Assistant.WebApp
import Assistant.Threads.WebApp
-#ifdef WITH_PAIRING
-import Assistant.Threads.PairListener
-#endif
#else
import Assistant.Types.UrlRenderer
#endif
@@ -155,11 +152,6 @@ startDaemon assistant foreground startdelay cannotrun listenhost listenport star
then webappthread
else webappthread ++
[ watch commitThread
-#ifdef WITH_WEBAPP
-#ifdef WITH_PAIRING
- , assist $ pairListenerThread urlrenderer
-#endif
-#endif
, assist pushThread
, assist pushRetryThread
, assist exportThread
diff --git a/Assistant/Pairing/MakeRemote.hs b/Assistant/Pairing/MakeRemote.hs
deleted file mode 100644
index f4468bc07c..0000000000
--- a/Assistant/Pairing/MakeRemote.hs
+++ /dev/null
@@ -1,98 +0,0 @@
-{- git-annex assistant pairing remote creation
- -
- - Copyright 2012 Joey Hess <id@joeyh.name>
- -
- - Licensed under the GNU AGPL version 3 or higher.
- -}
-
-module Assistant.Pairing.MakeRemote where
-
-import Assistant.Common
-import Assistant.Ssh
-import Assistant.Pairing
-import Assistant.Pairing.Network
-import Assistant.MakeRemote
-import Assistant.Sync
-import Config.Cost
-import Config
-import qualified Types.Remote as Remote
-
-import Network.Socket
-import qualified Data.Text as T
-
-{- Authorized keys are set up before pairing is complete, so that the other
- - side can immediately begin syncing. -}
-setupAuthorizedKeys :: PairMsg -> OsPath -> IO ()
-setupAuthorizedKeys msg repodir = case validateSshPubKey $ remoteSshPubKey $ pairMsgData msg of
- Left err -> giveup err
- Right pubkey -> do
- absdir <- absPath repodir
- unlessM (liftIO $ addAuthorizedKeys True absdir pubkey) $
- giveup "failed setting up ssh authorized keys"
-
-{- When local pairing is complete, this is used to set up the remote for
- - the host we paired with. -}
-finishedLocalPairing :: PairMsg -> SshKeyPair -> Assistant ()
-finishedLocalPairing msg keypair = do
- sshdata <- liftIO $ installSshKeyPair keypair =<< pairMsgToSshData msg
- {- Ensure that we know the ssh host key for the host we paired with.
- - If we don't, ssh over to get it. -}
- liftIO $ unlessM (knownHost $ sshHostName sshdata) $
- void $ sshTranscript
- [ sshOpt "StrictHostKeyChecking" "no"
- , sshOpt "NumberOfPasswordPrompts" "0"
- , "-n"
- ]
- (genSshHost (sshHostName sshdata) (sshUserName sshdata))
- ("git-annex-shell -c configlist " ++ T.unpack (sshDirectory sshdata))
- Nothing
- r <- liftAnnex $ addRemote $ makeSshRemote sshdata
- repo <- liftAnnex $ Remote.getRepo r
- liftAnnex $ setRemoteCost repo semiExpensiveRemoteCost
- syncRemote r
-
-{- Mostly a straightforward conversion. Except:
- - * Determine the best hostname to use to contact the host.
- - * Strip leading ~/ from the directory name.
- -}
-pairMsgToSshData :: PairMsg -> IO SshData
-pairMsgToSshData msg = do
- let d = pairMsgData msg
- hostname <- liftIO $ bestHostName msg
- let dir = case remoteDirectory d of
- ('~':'/':v) -> v
- v -> v
- return SshData
- { sshHostName = T.pack hostname
- , sshUserName = Just (T.pack $ remoteUserName d)
- , sshDirectory = T.pack dir
- , sshRepoName = genSshRepoName hostname (toOsPath dir)
- , sshPort = 22
- , needsPubKey = True
- , sshCapabilities = [GitAnnexShellCapable, GitCapable, RsyncCapable]
- , sshRepoUrl = Nothing
- }
-
-{- Finds the best hostname to use for the host that sent the PairMsg.
- -
- - If remoteHostName is set, tries to use a .local address based on it.
- - That's the most robust, if this system supports .local.
- - Otherwise, looks up the hostname in the DNS for the remoteAddress,
- - if any. May fall back to remoteAddress if there's no DNS. Ugh. -}
-bestHostName :: PairMsg -> IO HostName
-bestHostName msg = case remoteHostName $ pairMsgData msg of
- Just h -> do
- let localname = h ++ ".local"
- addrs <- catchDefaultIO [] $
- getAddrInfo Nothing (Just localname) Nothing
- maybe fallback (const $ return localname) (headMaybe addrs)
- Nothing -> fallback
- where
- fallback = do
- let a = pairMsgAddr msg
- let sockaddr = case a of
- IPv4Addr addr -> SockAddrInet (fromInteger 0) addr
- IPv6Addr addr -> SockAddrInet6 (fromInteger 0) 0 addr 0
- fromMaybe (showAddr a)
- <$> catchDefaultIO Nothing
- (fst <$> getNameInfo [] True False sockaddr)
diff --git a/Assistant/Pairing/Network.hs b/Assistant/Pairing/Network.hs
deleted file mode 100644
index 62a4ea02e8..0000000000
--- a/Assistant/Pairing/Network.hs
+++ /dev/null
@@ -1,132 +0,0 @@
-{- git-annex assistant pairing network code
- -
- - All network traffic is sent over multicast UDP. For reliability,
- - each message is repeated until acknowledged. This is done using a
- - thread, that gets stopped before the next message is sent.
- -
- - Copyright 2012 Joey Hess <id@joeyh.name>
- -
- - Licensed under the GNU AGPL version 3 or higher.
- -}
-
-module Assistant.Pairing.Network where
-
-import Assistant.Common
-import Assistant.Pairing
-import Assistant.DaemonStatus
-import Utility.ThreadScheduler
-import Utility.Verifiable
-
-import Network.Multicast
-import Network.Info
-import Network.Socket
-import qualified Network.Socket.ByteString as B
-import qualified Data.ByteString.UTF8 as BU8
-import qualified Data.Map as M
-import Control.Concurrent
-
-{- This is an arbitrary port in the dynamic port range, that could
- - conceivably be used for some other broadcast messages.
- - If so, hope they ignore the garbage from us; we'll certainly
- - ignore garbage from them. Wild wild west. -}
-pairingPort :: PortNumber
-pairingPort = 55556
-
-{- Goal: Reach all hosts on the same network segment.
- - Method: Use same address that avahi uses. Other broadcast addresses seem
- - to not be let through some routers. -}
-multicastAddress :: AddrClass -> HostName
-multicastAddress IPv4AddrClass = "224.0.0.251"
-multicastAddress IPv6AddrClass = "ff02::fb"
-
-{- Multicasts a message repeatedly on all interfaces, with a 2 second
- - delay between each transmission. The message is repeated forever
- - unless a number of repeats is specified.
- -
- - The remoteHostAddress is set to the interface's IP address.
- -
- - Note that new sockets are opened each time. This is hardly efficient,
- - but it allows new network interfaces to be used as they come up.
- - On the other hand, the expensive DNS lookups are cached.
- -}
-multicastPairMsg :: Maybe Int -> Secret -> PairData -> PairStage -> IO ()
-multicastPairMsg repeats secret pairdata stage = go M.empty repeats
- where
- go _ (Just 0) = noop
- go cache n = do
- addrs <- activeNetworkAddresses
- let cache' = updatecache cache addrs
- mapM_ (sendinterface cache') addrs
- threadDelaySeconds (Seconds 2)
- go cache' $ pred <$> n
- {- The multicast library currently chokes on ipv6 addresses. -}
- sendinterface _ (IPv6Addr _) = noop
- sendinterface cache i = void $ tryIO $
(Diff truncated)
remove old assistant release notes
diff --git a/doc/assistant/release_notes.mdwn b/doc/assistant/release_notes.mdwn deleted file mode 100644 index 6c7c432de4..0000000000 --- a/doc/assistant/release_notes.mdwn +++ /dev/null @@ -1,422 +0,0 @@ -## version 6.20170101 - -XMPP support has been removed from the assistant in this release. - -If your repositories used XMPP to keep in sync, that will no longer -work, and you should enable some other remote to keep them in sync. -A ssh server is one way, or use the new Tor pairing feature. - -## version 5.20140421 - -This release begins to deprecate XMPP support. In particular, if you use -the assistant with a ssh remote that has this version of git-annex -installed, you don't need XMPP any longer to get immediate syncing of -changes. - -## version 5.20140411 - -This release fixes a bug that could cause the assistant to use a *lot* of -CPU, when monthly fscking was set up. - -Automatic upgrading was broken on OSX for previous versions. This has been -fixed, but you'll need to manually upgrade to this version to get it going -again. Workaround: Remove the wget bundled inside the git-annex dmg. - -## version 5.20140221 - -The Windows port of the assistant and webapp is now considered to be beta -quality. There are important missing features (notably Jabber), documented -on [[todo/windows_support]], but the webapp is broadly usable on Windows -now. - -## version 5.20131221 - -There is now a arm [[install/linux_standalone]] build of git-annex, -including the assistant and webapp, -which can be installed on a variety of systems including Raspberry Pi, -Synology NAS, and Google Chromebooks. Details in -[[this forum thread|forum/new_linux_arm_tarball_build]]. - -## version 5.20131213 - -The assistant can now be used on Windows! However, it has known problems, -described in [[todo/windows_support]], and should be considered an -alpha-level preview. 
- -## version 5.20131127 - -Starting with this version, when git-annex is installed from a build on -this website, it will detect when new versions are available, and allow -easily upgrading. Automatic upgrades can also be configured if desired, -or automatic upgrade checking can be disabled in the preferences page. - -git-annex builds from distributions, like Debian will not automatically -upgrade; use the distribution's package manager for that. However, the -git-annex webapp will also detect when a distribution has upgraded -git-annex and offer to restart the assistant. - -## version 4.20131024 - -This version fixes several different bugs that could cause the webapp to -refuse to create a repository. Several other bugs are also fixed, including -a bug that caused it to not add files on Android. - -New in this release is the ability to use the webapp to set up scheduled -consistency checks of your repositories. Many problems with repositories -are now automatically corrected, and it can even repair damaged git -repositories. - -This is a recommended upgrade. - -## version 4.20131002 - -Now you can use the webapp to set up an encrypted git repository on a -remote ssh server, or on rsync.net, and use it as a live cloud backup. Or, -use the webapp to make an encrypted git repository on a removable drive, -and store it offsite as a secure backup. - -## version 4.20130920 - -This release is the first to support fully encrypted git repositories -stored on removable drives. This can be set up easily using the webapp. - -## version 4.20130909 - -This release fixes a crash that could occur when using XMPP with the -assitant. It has only been seen on OS X so far. The bug is not believed to -be explitable, but upgrading is still recommended. - -## version 4.20130802 - -This release fixes several bugs, including a reversion introduced in the last -version that broke direct mode on Windows, Android, and other crippled -filesystems. 
It contains a workaround for a bug in recent git pre-releases -that broke handling of filenames containing spaces. -It is a highly recommended upgrade. - -The webapp can now detect repositories that did not finish getting properly set -up, and can recover from one common bug that broke local pairing and remote -ssh server setups on systems using `ssh-agent`. - -## version 4.20130723 - -This release fixes some bugs. Notably it fixes a bug that could result in data -loss when adding a tarball of a git-annex repository to your git-annex -repository. - -Rsync.net have committed to support git-annex and offer a special -discounted rate for git-annex users. -<http://www.rsync.net/products/git-annex-pricing.html> - -## version 4.20130709 - -This release is mostly bug fixes. - -One of the bugs involved setting up rsync remotes on servers other than -rsync.net. The wrong `.ssh/authorized_keys` line was deployed to the -remote server. If you set up a rsync remote with a past release, and it does -not work, you will need to manually edit the `.ssh/authorized_keys` file, -and remove the `command=` forced command. - -## version 4.20130621, 4.20130627 - -These releases mostly consist of bug fixes. - -## version 4.20130601 - -This is a bugfix release, featuring significant XMPP improvements and -more robustness thanks to automated fuzz testing. Recommended upgrade. - -This version changes its XMPP protocol, so it will fail to sync with older -git-annex versions over XMPP. - -## version 4.20130521 - -This is a bugfix release. Recommended upgrade. - -## version 4.20130516 - -This version contains numerous bug fixes, and improvements. - -This is the first release with a fully usable Android app. No command-line -typing needed to set up syncing to your Android phone or tablet! -A few of the more advanced features may not work (or not work reliably) -on Android. The Android app is still beta quality. - -This is also the first release with a Windows port! 
The Windows port -is in an alpha quality state, and is missing many features. -It does not yet include the assistant. - -## version 4.20130501 - -This version contains numerous bug fixes, and improvements. - -## version 4.20130417 - -This version contains numerous bug fixes, and improvements. - -One bug that was fixed can affect users of gnome-keyring who -have set up remote repositories on ssh servers using the webapp. -The gnome-keyring may load the restricted key that is set up -for that, and make it be used for regular logins to the server; -with the result that you'll get an error message about "git-annex-shell" -when sshing to the server. - -If you experience this problem you can fix it by -moving `.ssh/key.git-annex*` to `.ssh/git-annex/` (creating -that directory first), and edit `.ssh/config` to reflect the new -location of the key. You will also need to restart gnome-keyring. - -Another change relates to files in `archive/` directories. Client repositories -now sync these files between themselves like any other files, until -the files reach an archive repository. Only then are they removed from -the client repositories. So you need to ensure you have at least one -archive repository if you want to use the `archive/` directory feature. - -## version 4.20130323, 4.20130405 - -These versions continue fixing bugs and adding features. - -## version 4.20130314 - -This version makes a great many improvements and bugfixes, and is -a recommended upgrade. - -If you have already used the webapp to locally pair two computers, -a bug caused the paired repository to not be given an appropriate cost. -To fix this, go into the Repositories page in the webapp, and drag the -repository for the locally paired computer to come before any repositories -that it's more expensive to transfer data to. - -## version 4.20130227 - -This release fixes a bug with globbing that broke preferred content expressions. 
-So, it is a recommended upgrade from the previous release, which introduced (Diff truncated)
add news item for git-annex 10.20250929
diff --git a/doc/news/version_10.20250605.mdwn b/doc/news/version_10.20250605.mdwn deleted file mode 100644 index 5a9016e9f5..0000000000 --- a/doc/news/version_10.20250605.mdwn +++ /dev/null @@ -1,19 +0,0 @@ -git-annex 10.20250605 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * sync: Push the current branch first, rather than a synced branch, - to better support git forges (gitlab, gitea, forgejo, etc.) which - use push-to-create with the first pushed branch becoming the default - branch. - * Added annex.fastcopy and remote.name.annex-fastcopy config setting. - When set, this allows the copy\_file\_range syscall to be used, which - can eg allow for server-side copies on NFS. (For fastest copying, - also disable annex.verify or remote.name.annex-verify.) - * map: Support --json option. - * map: Improve display of remote names. - * When annex.freezecontent-command or annex.thawcontent-command is - configured but fails, prevent initialization. This allows the user to - fix their configuration and avoid crippled filesystem detection - entering an adjusted branch. - * assistant: Avoid hanging at startup when a process has a *.lock file - open in the .git directory. - * Windows: Fix duplicate file bug that could occur when files were - supposed to be moved across devices."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250929.mdwn b/doc/news/version_10.20250929.mdwn new file mode 100644 index 0000000000..4d46ac2cf1 --- /dev/null +++ b/doc/news/version_10.20250929.mdwn @@ -0,0 +1,7 @@ +git-annex 10.20250929 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * enableremote: Allow type= to be provided when it does not change the + type of the special remote. + * importfeed: Fix encoding issues parsing feeds when built with OsPath. + * Fix build with ghc 9.0.2. + * Remove the Servant build flag; always build with support for + annex+http urls and git-annex p2phttp."""]] \ No newline at end of file
Fix build with ghc 9.0.2.
diff --git a/CHANGELOG b/CHANGELOG
index aa63a40a0e..a220b171d2 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -5,6 +5,7 @@ git-annex (10.20250926) UNRELEASED; urgency=medium
* enableremote: Allow type= to be provided when it does not change the
type of the special remote.
* importfeed: Fix encoding issues parsing feeds when built with OsPath.
+ * Fix build with ghc 9.0.2.
-- Joey Hess <id@joeyh.name> Thu, 25 Sep 2025 13:36:21 -0400
diff --git a/Utility/OpenFd.hs b/Utility/OpenFd.hs
index 95f18085a6..62ce4ace91 100644
--- a/Utility/OpenFd.hs
+++ b/Utility/OpenFd.hs
@@ -14,6 +14,9 @@ module Utility.OpenFd where
import System.Posix.IO.ByteString
import System.Posix.Types
+#if ! MIN_VERSION_unix(2,8,0)
+import Control.Monad
+#endif
import Utility.RawFilePath
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn
index 5ae44072f5..a0a7a2882c 100644
--- a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn
+++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn
@@ -11,3 +11,5 @@ Utility/OpenFd.hs:28:9: error:
```
I'm not sure this error is directly caused by the antiquated compiler, but also not sure how to debug this further or work around it either.
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_1_b8ace7d676bdecfd0e3bb47331e48a13._comment b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_1_b8ace7d676bdecfd0e3bb47331e48a13._comment
new file mode 100644
index 0000000000..1c923b26e1
--- /dev/null
+++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_1_b8ace7d676bdecfd0e3bb47331e48a13._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-09-29T15:19:41Z"
+ content="""
+git-annex is still targeting supporting ghc back to 9.0.2, so your old
+ghc should not yet be a problem. However, I don't have any CI left that
+uses such old versions of ghc, so it might break from time to time.
+
+I've fixed this one, which was a missing `import Control.Monad`. Please
+report if you find other build failures.
+"""]]
response
diff --git a/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__/comment_1_9657a0979fae0b88f8a9b8fcdd2417de._comment b/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__/comment_1_9657a0979fae0b88f8a9b8fcdd2417de._comment new file mode 100644 index 0000000000..07e7cefce5 --- /dev/null +++ b/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__/comment_1_9657a0979fae0b88f8a9b8fcdd2417de._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-29T15:16:06Z" + content=""" +The inode cache is something git-annex uses internally to keep track of +changes to files. There are some known situations where it can get out of +date, including an upgrade from a v8 repository. Sometimes inodes change +for various reasons, like copying a repository from one filesystem to +another. So this just means that fsck has detected and updated the +information. I would not worry about it unless git-annex has other +unexpected behavior. +"""]]
comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_5_28462adcccadd9a51a3c714a30cec23a._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_5_28462adcccadd9a51a3c714a30cec23a._comment new file mode 100644 index 0000000000..3e240acd9c --- /dev/null +++ b/doc/todo/import_tree_from_rsync_special_remote/comment_5_28462adcccadd9a51a3c714a30cec23a._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-09-29T15:12:00Z" + content=""" +How efficient that would be would depend, I think, on how stable inodes are +between remounts of a sshfs mount. If it sees new inodes, it will +re-download all the files. +"""]]
add libghc-unbounded-delays-dev to debian/control deps
diff --git a/debian/control b/debian/control index 3ae6084609..a6911a9724 100644 --- a/debian/control +++ b/debian/control @@ -75,6 +75,7 @@ Build-Depends: libghc-optparse-applicative-dev (>= 0.11.0), libghc-torrent-dev, libghc-concurrent-output-dev, + libghc-unbounded-delays-dev, libghc-disk-free-space-dev, libghc-mountpoints-dev, libghc-magic-dev, diff --git a/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn index f0e3c7b47c..e032d72f17 100644 --- a/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn +++ b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn @@ -11,3 +11,5 @@ unbounded-delays ``` oddly we still built fine I believe for the http://github.com/datalad/git-annex where we also do not have that one I think + +> [[fixed|done]] presumably --[[Joey]] diff --git a/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev/comment_1_2415e9fc5ff3a66bacc039f5476dc013._comment b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev/comment_1_2415e9fc5ff3a66bacc039f5476dc013._comment new file mode 100644 index 0000000000..b670ce5a25 --- /dev/null +++ b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev/comment_1_2415e9fc5ff3a66bacc039f5476dc013._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-29T15:07:41Z" + content=""" +git-annex has a dependency on unbounded-delays, listed in git-annex.cabal. + +Nothing has changed here since 2024 when it stopped vendoring part of that +library and added the dependency. + +I do see that the debian/control shipped with git-annex was missing that +dep, I've added it and I *guess* that will fix your problem +"""]]
don't set locale encoding when opening binary file
importfeed: Fix encoding issues parsing feeds when built with OsPath.
diff --git a/CHANGELOG b/CHANGELOG
index d61836f7c7..aa63a40a0e 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -4,6 +4,7 @@ git-annex (10.20250926) UNRELEASED; urgency=medium
annex+http urls and git-annex p2phttp.
* enableremote: Allow type= to be provided when it does not change the
type of the special remote.
+ * importfeed: Fix encoding issues parsing feeds when built with OsPath.
-- Joey Hess <id@joeyh.name> Thu, 25 Sep 2025 13:36:21 -0400
diff --git a/Utility/FileIO/CloseOnExec.hs b/Utility/FileIO/CloseOnExec.hs
index 3d1bb739f7..1a91add1e7 100644
--- a/Utility/FileIO/CloseOnExec.hs
+++ b/Utility/FileIO/CloseOnExec.hs
@@ -3,9 +3,9 @@
- All functions have been modified to set the close-on-exec
- flag to True.
-
- - Also, functions that return a Handle have been modified to
- - use the locale encoding, working around this bug:
- - https://github.com/haskell/file-io/issues/45
+ - Also, functions that return a Handle (for a non-binary file)
+ - have been modified to use the locale encoding, working around
+ - this bug: https://github.com/haskell/file-io/issues/45
-
- Copyright 2025 Joey Hess <id@joeyh.name>
- Copyright 2024 Julian Ospald
@@ -70,12 +70,12 @@ openFile osfp iomode = augmentError "openFile" osfp $
withBinaryFile :: OsPath -> IOMode -> (Handle -> IO r) -> IO r
withBinaryFile osfp iomode act = (augmentError "withBinaryFile" osfp
- $ withOpenFileEncoding osfp iomode True False closeOnExec (try . act) True)
+ $ withOpenFile' osfp iomode True False closeOnExec (try . act) True)
>>= either ioError pure
openBinaryFile :: OsPath -> IOMode -> IO Handle
openBinaryFile osfp iomode = augmentError "openBinaryFile" osfp $
- withOpenFileEncoding osfp iomode True False closeOnExec pure False
+ withOpenFile' osfp iomode True False closeOnExec pure False
readFile :: OsPath -> IO BSL.ByteString
readFile fp = withFileNoEncoding' fp ReadMode BSL.hGetContents
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn
index a0ab387188..e6ee11eb86 100644
--- a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn
@@ -96,3 +96,5 @@ And it seems a fairly recent breakage, as IIRC the previous installed was from 2
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes, for many years. git-annex has worked very well for downloading/collecting podcasts for years, which is why it was surprising it's suddenly failing like this.
+
+> [[fixed|done]] --[[Joey]]
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_6_9c6851e659c977eb5106dcd83ea7765a._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_6_9c6851e659c977eb5106dcd83ea7765a._comment
new file mode 100644
index 0000000000..c11f288b81
--- /dev/null
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_6_9c6851e659c977eb5106dcd83ea7765a._comment
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 6"""
+ date="2025-09-29T14:45:50Z"
+ content="""
+Thanks for some really good detective work @ewen.
+
+Note that this only happens when git-annex is built with the OsPath build
+flag.
+
+That seems to indicate that the problem is in
+Utility.FileIO.openBinaryFile,
+which is the only way that parseFeedFromFile' varies depending on that
+build flag.
+
+Aha yes, the problem is that uses withOpenFileEncoding, which is
+inappropriate for a binary file!
+"""]]
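The distinction drawn in the comment above, between opening a feed as a binary file and opening it with a text encoding, can be sketched with the standard `System.IO` and `Data.ByteString` functions. This is only an illustration under stated assumptions, not the git-annex code: `readFeedBytes` and the file name `feed-sample.xml` are hypothetical names chosen here, and the sample bytes are the UTF-8 encoding of U+2019.

```haskell
import qualified Data.ByteString as BS
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper (not from git-annex): read a file as raw bytes.
-- A handle opened in binary mode applies no locale or UTF-8 decoding,
-- so the parser later sees exactly the bytes that are on disk.
readFeedBytes :: FilePath -> IO BS.ByteString
readFeedBytes path = withBinaryFile path ReadMode BS.hGetContents

main :: IO ()
main = do
    -- Assumed sample data: the three bytes that encode U+2019 in UTF-8.
    let sample = BS.pack [0xE2, 0x80, 0x99]
    BS.writeFile "feed-sample.xml" sample
    bytes <- readFeedBytes "feed-sample.xml"
    -- Each element is a Word8, so every value is in the range 0..255.
    print (BS.unpack bytes)
```

Reading through a binary handle keeps every value within byte range, which is presumably what a byte-oriented feed parser expects to receive.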
Added a comment: Cross link to importfeed parsing
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_26_6552491d65593df8346a764cb1cd3709._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_26_6552491d65593df8346a764cb1cd3709._comment new file mode 100644 index 0000000000..7f32b84e2e --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_26_6552491d65593df8346a764cb1cd3709._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="Cross link to importfeed parsing" + date="2025-09-28T22:49:31Z" + content=""" +As a cross link, the changes in [comment 8 on this bug](http://git-annex.branchable.com/bugs/35_failed_tests_on_beegfs/#comment-d7e4cf0592937215e3acd3c08c03288c) seem to have changed the feed parsing from binary mode to decoding UTF-8, which appears to be breaking on feeds which actually contain UTF-8 (eg, smart quotes, smart dashes, etc). + +See [comment on bug about importfeed breaking on `toEnum` out of range](http://git-annex.branchable.com/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/#comment-dbdafeb801ad23e2ccb3c2aa066a4efb) (where it took me a while to figure out what the root cause was). + +Ewen + + +"""]]
Added a comment: Feed seems to now be parsed as UTF-8 characters, not binary mode
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_5_9982bda0b8b224edd2300083f7e1ec00._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_5_9982bda0b8b224edd2300083f7e1ec00._comment
new file mode 100644
index 0000000000..56b0b23315
--- /dev/null
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_5_9982bda0b8b224edd2300083f7e1ec00._comment
@@ -0,0 +1,31 @@
+[[!comment format=mdwn
+ username="ewen"
+ avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
+ subject="Feed seems to now be parsed as UTF-8 characters, not binary mode"
+ date="2025-09-28T22:42:32Z"
+ content="""
+I think the relevant change is likely to be:
+
+```
+* feed (update: parseFeedFromFile uses openBinaryFile, updated git-annex to open
+ the file itself instead)
+```
+
+from [https://git-annex.branchable.com/bugs/35_failed_tests_on_beegfs/#comment-d7e4cf0592937215e3acd3c08c03288c](https://git-annex.branchable.com/bugs/35_failed_tests_on_beegfs/#comment-d7e4cf0592937215e3acd3c08c03288c)
+
+This is based on the fact that it's a 2025-09-04 change (so since the previous release) and refers to `parseFeedFromFile`; the relevant commit seems to be:
+
+[http://source.git-annex.branchable.com/?p=source.git;a=commit;h=2b1e9eced2fe825c882b4e9549a3a12f41d08055](http://source.git-annex.branchable.com/?p=source.git;a=commit;h=2b1e9eced2fe825c882b4e9549a3a12f41d08055)
+
+and in particular in this file:
+
+[http://source.git-annex.branchable.com/?p=source.git;a=blobdiff;f=Command/ImportFeed.hs;h=e36e72370204ece44a05bfae5954272a46f34f5c;hp=7b66a2b5077613b7e33dc8597a8272e7fdea7102;hb=2b1e9eced2fe825c882b4e9549a3a12f41d08055;hpb=56cd59a9f4e24c5a6842179e0da9180875d837cc](http://source.git-annex.branchable.com/?p=source.git;a=blobdiff;f=Command/ImportFeed.hs;h=e36e72370204ece44a05bfae5954272a46f34f5c;hp=7b66a2b5077613b7e33dc8597a8272e7fdea7102;hb=2b1e9eced2fe825c882b4e9549a3a12f41d08055;hpb=56cd59a9f4e24c5a6842179e0da9180875d837cc)
+
+My reading of that code is that the feed parsing switched from (implicitly) \"just bytes\" (`openBinaryFile`) to decoding UTF-8 into full Unicode characters, but there's either (a) something in the later git-annex code or (b) the XML parser that does not expect to receive non-ASCII Unicode characters resulting from opening in \"character\" mode rather than \"binary\" mode, resulting in out of range values.
+
+Which results in the crash on encountering the first non-ASCII character in the feed :-/
+
+It's not clear to me why in fixing \"set close-on-exec bit on open files\" the feed parsing was changed from bytes (binary mode) to decoded characters. But it appears it wasn't tested on feeds where the text has been through a wordprocessor throwing in smart quotes and smart dashes and the like all over the place.
+
+Ewen
+"""]]
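Since the comment above ties the crash to the first non-ASCII character in a feed, a quick way to check a given feed is to count its bytes outside the 7-bit ASCII range. The following is only a minimal sketch: `count_non_ascii_bytes` is a made-up helper name, not part of git-annex, and it assumes a POSIX `tr` and `wc`.

```shell
#!/bin/sh
# Count the bytes on stdin that are outside 7-bit ASCII (0x80-0xff).
# count_non_ascii_bytes is a hypothetical helper for illustration only.
count_non_ascii_bytes() {
    # Delete every ASCII byte, count what is left, strip wc's padding.
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# "this week’s show" with the smart quote written as its UTF-8 bytes
# e2 80 99 (octal 342 200 231): three non-ASCII bytes.
printf 'this week\342\200\231s show\n' | count_non_ascii_bytes
```

A result of 0 for a saved copy of a feed (for example via `curl -s URL | count_non_ascii_bytes`) would suggest that feed does not contain the kind of character discussed here.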
Added a comment: importfeed: utf-8 XML is (now?) parsed into 8-bit characters
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_4_3ee57c43594f381747b8463b8acadb9f._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_4_3ee57c43594f381747b8463b8acadb9f._comment
new file mode 100644
index 0000000000..fb5436d072
--- /dev/null
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_4_3ee57c43594f381747b8463b8acadb9f._comment
@@ -0,0 +1,69 @@
+[[!comment format=mdwn
+ username="ewen"
+ avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
+ subject="importfeed: utf-8 XML is (now?) parsed into 8-bit characters"
+ date="2025-09-28T22:24:23Z"
+ content="""
+Based on looking at some examples, I'm fairly convinced that the podcast feeds are now being parsed into 8 bit characters (extended ASCII?), even when (only when?) they have `encoding=\"UTF-8\"` on the `<?xml ...?>` prelude tag. UTF-8 decoding can easily result in characters outside the 8-bit range, which seems to be the exception thrown, based on examining the feed contents (below) and the \"tag\" values outside range.
+
+8217 == 0x2019 (in hex).
+
+And [U+2019](https://www.compart.com/en/unicode/U+2019) is a single quotation mark, which encodes in UTF-8 as `0xE2 0x80 0x99`.
+
+The first problematic feed is littered with that exact byte sequence:
+
+```
+ewen@basadi:/tmp$ curl -s https://risky.biz/feeds/risky-business/ | head -1
+<?xml version=\"1.0\" encoding=\"utf-8\" ?>
+ewen@basadi:/tmp$
+```
+
+```
+ewen@basadi:/tmp$ curl -s https://risky.biz/feeds/risky-business/ | hexdump -C | grep \"e2 80 99\" | head
+000008b0 65 65 6b e2 80 99 73 20 73 68 6f 77 20 50 61 74 |eek...s show Pat|
+00000a20 20 77 65 65 6b e2 80 99 73 20 65 70 69 73 6f 64 | week...s episod|
+00000a60 e2 80 99 73 20 73 70 6f 6e 73 6f 72 20 69 6e 74 |...s sponsor int|
+00000bf0 20 74 68 65 20 77 65 65 6b e2 80 99 73 20 63 79 | the week...s cy|
+00000d60 20 77 65 65 6b e2 80 99 73 20 65 70 69 73 6f 64 | week...s episod|
+00000da0 e2 80 99 73 20 73 70 6f 6e 73 6f 72 20 69 6e 74 |...s sponsor int|
+00001580 65 e2 80 9d 20 69 73 6e e2 80 99 74 20 74 68 65 |e... isn...t the|
+00001c20 e2 80 99 20 61 73 20 73 75 70 70 6c 69 65 72 20 |... as supplier |
+00002290 20 74 68 69 73 20 77 65 65 6b e2 80 99 73 20 73 | this week...s s|
+000022d0 65 6b e2 80 99 73 20 63 79 62 65 72 73 65 63 75 |ek...s cybersecu|
+ewen@basadi:/tmp$
+```
+
+Another of the problematic feeds (reported as 8211; see first post) has lots of the UTF-8 sequence `e2 80 93` for [U+2013](https://www.compart.com/en/unicode/U+2013) (an en dash), and 8211 == 0x2013:
+
+```
+ewen@basadi:/tmp$ curl -s https://theamphour.libsyn.com/rss | hexdump -C | grep \" e2 80 \" | head
+0001e800 31 39 36 20 e2 80 93 20 41 6e 20 49 6e 74 65 72 |196 ... An Inter|
+0001e860 31 39 36 20 e2 80 93 20 41 6e 20 49 6e 74 65 72 |196 ... An Inter|
+0003e510 68 74 3d 22 30 22 3e 4c 6f 61 64 69 6e 67 e2 80 |ht=\"0\">Loading..|
+0003f660 3e 20 3c 70 3e 4c 6f 61 64 69 6e 67 e2 80 a6 20 |> <p>Loading... |
+00052440 6d 70 20 48 6f 75 72 20 23 33 37 39 20 e2 80 93 |mp Hour #379 ...|
+0007a7d0 e2 80 93 20 4f 73 74 72 6f 62 6f 67 75 6c 6f 75 |... Ostrobogulou|
+00088480 72 20 23 38 33 20 e2 80 94 20 41 67 67 72 61 76 |r #83 ... Aggrav|
+00088b40 41 6d 70 20 48 6f 75 72 20 23 38 32 20 e2 80 94 |Amp Hour #82 ...|
+000891e0 20 23 38 31 20 e2 80 94 20 4a 65 72 73 65 79 20 | #81 ... Jersey |
+000898a0 30 20 e2 80 94 20 4f 74 69 6f 73 65 20 4f 6e 74 |0 ... Otiose Ont|
+ewen@basadi:/tmp$
+```
+
+```
+ewen@basadi:/tmp$ curl -s https://theamphour.libsyn.com/rss | head -1
+<?xml version=\"1.0\" encoding=\"UTF-8\"?>
+ewen@basadi:/tmp$
+```
+
+The working feed appears to have no non-ASCII characters in it:
+
+```
+ewen@basadi:/tmp$ curl -s 'https://www.2600.com/oth-broadband.xml' | hexdump -C | grep ' [89abcdef][0-9a-f] '
+ewen@basadi:/tmp$
+```
+
+So it appears non-ASCII UTF-8 encoding is required to trigger this problem.
+
+Ewen
+"""]]
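The arithmetic in the comment above (8217 is 0x2019, which is the byte sequence `e2 80 99` in UTF-8) can be checked with a few lines of shell. This is a sketch for illustration: `utf8_hex` is a made-up helper, it takes a decimal code point, and it does not reject surrogates or values above U+10FFFF.

```shell
#!/bin/sh
# Print the UTF-8 encoding, as hex bytes, of a decimal Unicode code point.
# utf8_hex is a hypothetical helper; it does no validation of its input.
utf8_hex() {
    cp=$1
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '%02x\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '%02x %02x\n' \
            $(( 0xC0 | (cp >> 6) )) \
            $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%02x %02x %02x\n' \
            $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '%02x %02x %02x %02x\n' \
            $(( 0xF0 | (cp >> 18) )) \
            $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    fi
}

printf '%x\n' 8217   # the "tag" value from the error message, in hex: 2019
utf8_hex 8217        # e2 80 99
utf8_hex 8211        # e2 80 93
```

Both values are well above 255, which is consistent with the `Word8` bounds `(0,255)` reported in the error message.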
Added a comment: Example still working feed
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_3_f9d976fc829826401838b285698e22ee._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_3_f9d976fc829826401838b285698e22ee._comment
new file mode 100644
index 0000000000..b3763d3fcb
--- /dev/null
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_3_f9d976fc829826401838b285698e22ee._comment
@@ -0,0 +1,109 @@
+[[!comment format=mdwn
+ username="ewen"
+ avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
+ subject="Example still working feed"
+ date="2025-09-28T22:05:57Z"
+ content="""
+My *hunch* is that this error occurs during *parsing* the feed XML, based on not getting to the feed *title* and \"ok\" being displayed in the error case. But I'm not sure if there's a specific way to test just that.
+
+Example of a podcast feed that still works:
+
+https://www.2600.com/oth-broadband.xml
+
+There's no redirect on this one, and the `Content-Type` header has an explicit `charset=utf-8`, but so far I don't know if that matters.
+
+The failing feed has `encoding=\"utf-8\"` in the `<?xml ...?>` header of the file, which in theory is functionally equivalent in terms of XML communicating how to expect the file to be encoded. But maybe git-annex is not treating that the same any longer?
+
+```
+ewen@basadi:/tmp/podcasts$ git annex importfeed --relaxed \"https://www.2600.com/oth-broadband.xml\"
+importfeed gathering known urls ok
+importfeed https://www.2600.com/oth-broadband.xml (\"Off The Hook\") ok
+ewen@basadi:/tmp/podcasts$
+```
+
+The import above was the second attempt, matching what my podcast downloads normally do; the first one was also `--relaxed` but with `--debug`, and the debug output is quite long, so here's just the start of it, showing it got a lot further than the feeds that don't work:
+
+```
+ewen@basadi:/tmp/podcasts$ git annex importfeed --debug --relaxed \"https://www.2600.com/oth-broadband.xml\"
+[2025-09-29 10:57:01.984117] (Utility.Process) process [8003] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"git-annex\"]
+[2025-09-29 10:57:01.99142] (Utility.Process) process [8003] done ExitSuccess
+[2025-09-29 10:57:01.992598] (Utility.Process) process [8004] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
+[2025-09-29 10:57:01.999387] (Utility.Process) process [8004] done ExitSuccess
+[2025-09-29 10:57:02.00066] (Utility.Process) process [8005] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch\"]
+importfeed gathering known urls [2025-09-29 10:57:02.01013] (Utility.Process) process [8006] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"rev-parse\",\"--verify\",\"--quiet\",\"refs/heads/git-annex:\"]
+[2025-09-29 10:57:02.016994] (Utility.Process) process [8006] done ExitSuccess
+ok
+importfeed https://www.2600.com/oth-broadband.xml [2025-09-29 10:57:02.101169] (Utility.Url) Request {
+ host = \"www.2600.com\"
+ port = 443
+ secure = True
+ requestHeaders = [(\"Accept-Encoding\",\"identity\"),(\"User-Agent\",\"git-annex/10.20250925\")]
+ path = \"/oth-broadband.xml\"
+ queryString = \"\"
+ method = \"GET\"
+ proxy = Nothing
+ rawBody = False
+ redirectCount = 10
+ responseTimeout = ResponseTimeoutDefault
+ requestVersion = HTTP/1.1
+ proxySecureMode = ProxySecureWithConnect
+}
+
+(\"Off The Hook\") ok
+addurl https://download.2600.com/mediadownload/www.2600.com/offthehook/mp3files/2025/off_the_hook__20250924-128.mp3 [2025-09-29 10:57:02.798482] (Utility.Process) process [8025] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"-c\",\"annex.debug=true\",\"check-ignore\",\"-z\",\"--stdin\",\"--verbose\",\"--non-matching\"]
+(to Off_The_Hook/Off_The_Hook__-_Wed__24_Sep_2025_19_00_00_EST.mp3) [2025-09-29 10:57:02.807045] (Annex.Branch) read 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log.web
+[2025-09-29 10:57:02.808092] (Annex.Branch) set 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log.web
+[2025-09-29 10:57:02.808366] (Annex.Branch) read 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log
+[2025-09-29 10:57:02.809307] (Annex.Branch) set 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log
+[2025-09-29 10:57:02.809547] (Annex.Branch) read 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log
+[2025-09-29 10:57:02.809684] (Messages.explain) [ Off_The_Hook/Off_The_Hook__-_Wed__24_Sep_2025_19_00_00_EST.mp3 does not match annex.addunlocked: nothing[FALSE] ]
+
+[2025-09-29 10:57:02.810562] (Utility.Process) process [8026] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"symbolic-ref\",\"-q\",\"HEAD\"]
+[2025-09-29 10:57:02.816781] (Utility.Process) process [8026] done ExitSuccess
+[2025-09-29 10:57:02.817626] (Utility.Process) process [8027] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"refs/heads/main\"]
+[2025-09-29 10:57:02.824386] (Utility.Process) process [8027] done ExitFailure 1
+[2025-09-29 10:57:02.825815] (Utility.Process) process [8028] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"hash-object\",\"-w\",\"--no-filters\",\"--stdin-paths\"]
+[2025-09-29 10:57:02.833619] (Annex.Branch) read 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log.met
+[2025-09-29 10:57:02.834642] (Annex.Branch) set 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log.met
+ok
+...
+(recording state in git...)
+[2025-09-29 10:57:03.027308] (Utility.Process) process [8047] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
+[2025-09-29 10:57:03.033836] (Utility.Process) process [8047] done ExitSuccess
+[2025-09-29 10:57:03.03538] (Utility.Process) process [8048] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
+[2025-09-29 10:57:03.052124] (Utility.Process) process [8048] done ExitSuccess
+[2025-09-29 10:57:03.052815] (Utility.Process) process [8049] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
+[2025-09-29 10:57:03.060486] (Utility.Process) process [8049] done ExitSuccess
+[2025-09-29 10:57:03.061529] (Utility.Process) process [8050] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"write-tree\"]
+[2025-09-29 10:57:03.080761] (Utility.Process) process [8050] done ExitSuccess
+[2025-09-29 10:57:03.081517] (Utility.Process) process [8051] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"commit-tree\",\"6333ccc39e4da89afeb821dcb39b6ef3ba84c936\",\"--no-gpg-sign\",\"-p\",\"refs/heads/git-annex\",\"-m\",\"update\"]
+[2025-09-29 10:57:03.090011] (Utility.Process) process [8051] done ExitSuccess
+[2025-09-29 10:57:03.090729] (Utility.Process) process [8052] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-ref\",\"refs/heads/git-annex\",\"1b74c6e0e12093128fe6f6ba4c895e9115df3015\"]
+[2025-09-29 10:57:03.099106] (Utility.Process) process [8052] done ExitSuccess
+[2025-09-29 10:57:03.102583] (Utility.Process) process [8005] done ExitSuccess
+[2025-09-29 10:57:03.102969] (Utility.Process) process [8028] done ExitSuccess
+[2025-09-29 10:57:03.103307] (Utility.Process) process [8025] done ExitFailure 1
+```
+
+```
+ewen@basadi:/tmp$ curl --head https://www.2600.com/oth-broadband.xml
+HTTP/1.1 200 OK
+Date: Sun, 28 Sep 2025 21:56:26 GMT
+Server: Apache
+X-Content-Type-Options: nosniff
+X-Drupal-Cache: HIT
+Etag: \"1759088060-0\"
+Content-Language: en
+X-Frame-Options: SAMEORIGIN
+Cache-Control: public, max-age=0
+Last-Modified: Sun, 28 Sep 2025 19:34:20 GMT
+Expires: Sun, 19 Nov 1978 05:00:00 GMT
+Vary: Cookie,Accept-Encoding
+Content-Type: application/rss+xml; charset=utf-8
+Strict-Transport-Security: max-age=16070400;
+orig_req_proto: https
+Connection: close
+
+ewen@basadi:/tmp$
+```
+"""]]
Added a comment: Debug output
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_2_0a790f8fd42304f17887536102af09d4._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_2_0a790f8fd42304f17887536102af09d4._comment
new file mode 100644
index 0000000000..c39b2ba71b
--- /dev/null
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_2_0a790f8fd42304f17887536102af09d4._comment
@@ -0,0 +1,70 @@
+[[!comment format=mdwn
+ username="ewen"
+ avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
+ subject="Debug output"
+ date="2025-09-28T21:58:18Z"
+ content="""
+Having found `--debug` (by trying to scan the source; I barely know Haskell, but found almost no *explicit* `toEnum` and none that have changed in the last month AFAICT), it does seem like it's getting as far as downloading the feed URL contents, and then failing (presumably on doing something about parsing it).
+
+```
+ewen@basadi:/tmp/podcasts$ git annex importfeed --debug \"https://risky.biz/feeds/risky-business\"
+[2025-09-29 10:51:54.947712] (Utility.Process) process [7859] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"git-annex\"]
+[2025-09-29 10:51:54.954732] (Utility.Process) process [7859] done ExitSuccess
+[2025-09-29 10:51:54.955442] (Utility.Process) process [7860] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
+[2025-09-29 10:51:54.962048] (Utility.Process) process [7860] done ExitSuccess
+[2025-09-29 10:51:54.963011] (Utility.Process) process [7861] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch\"]
+importfeed gathering known urls [2025-09-29 10:51:54.97316] (Utility.Process) process [7862] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"rev-parse\",\"--verify\",\"--quiet\",\"refs/heads/git-annex:\"]
+[2025-09-29 10:51:54.980308] (Utility.Process) process [7862] done ExitSuccess
+ok
+importfeed https://risky.biz/feeds/risky-business [2025-09-29 10:51:55.067656] (Utility.Url) Request {
+ host = \"risky.biz\"
+ port = 443
+ secure = True
+ requestHeaders = [(\"Accept-Encoding\",\"identity\"),(\"User-Agent\",\"git-annex/10.20250925\")]
+ path = \"/feeds/risky-business\"
+ queryString = \"\"
+ method = \"GET\"
+ proxy = Nothing
+ rawBody = False
+ redirectCount = 10
+ responseTimeout = ResponseTimeoutDefault
+ requestVersion = HTTP/1.1
+ proxySecureMode = ProxySecureWithConnect
+}
+
+
+git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
+failed
+[2025-09-29 10:51:57.214034] (Utility.Process) process [7861] done ExitSuccess
+importfeed: 1 failed
+ewen@basadi:/tmp/podcasts$
+```
+
+Response headers (via `curl`; note there's a 301 redirect, but asking `git-annex` to download the version at the end of the redirect doesn't change the `git-annex` symptoms):
+
+```
+ewen@basadi:/tmp$ curl --head https://risky.biz/feeds/risky-business
+HTTP/1.1 301 Moved Permanently
+Date: Sun, 28 Sep 2025 21:54:28 GMT
+Server: Apache
+Strict-Transport-Security: max-age=63072000;
+Location: https://risky.biz/feeds/risky-business/
+Connection: close
+Content-Type: text/html; charset=iso-8859-1
+
+ewen@basadi:/tmp$ curl --head https://risky.biz/feeds/risky-business/
+HTTP/1.1 200 OK
+Date: Sun, 28 Sep 2025 21:54:40 GMT
+Server: Apache
+Strict-Transport-Security: max-age=63072000;
+Last-Modified: Sun, 28 Sep 2025 19:34:25 GMT
+ETag: \"864f7-63fe19b449bd4\"
+Accept-Ranges: bytes
+Content-Length: 550135
+Vary: Accept-Encoding
+Connection: close
+Content-Type: application/xml
+
+ewen@basadi:/tmp$
+```
+"""]]
Added a comment: Previous working build was 20250828
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_1_f24dadf21fc4a95e627e508d1e22488d._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_1_f24dadf21fc4a95e627e508d1e22488d._comment
new file mode 100644
index 0000000000..9e55b54394
--- /dev/null
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_1_f24dadf21fc4a95e627e508d1e22488d._comment
@@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="ewen"
+ avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
+ subject="Previous working build was 20250828"
+ date="2025-09-28T21:35:07Z"
+ content="""
+For context, the [previous HomeBrew build](https://github.com/Homebrew/homebrew-core/commit/9dac1897529335a9115830a1c646ca3e90f39292) that I would have had installed, and working, was `20250828`.
+
+Ewen
+
+
+"""]]
importfeed: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn
new file mode 100644
index 0000000000..a0ab387188
--- /dev/null
+++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn
@@ -0,0 +1,98 @@
+### Please describe the problem.
+
+Since upgrading to git-annex version 10.20250925, from the macOS HomeBrew build, `git annex importfeed` seems to *usually* fail with an `Enum.toEnum{Word8}` out of bounds error. The exact out of bounds value reported varies between feed URLs, but from a little bit of testing the error appears deterministic for each feed URL.
+
+A few examples (plus one more in the reproducer below):
+
+```
+importfeed https://popculturedetective.agency/feed/podcast
+git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
+failed
+```
+
+```
+importfeed https://contextualelectronics.com/feed/podcast/
+git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
+failed
+```
+
+```
+importfeed https://theamphour.libsyn.com/rss
+git-annex: Enum.toEnum{Word8}: tag (8211) is outside of bounds (0,255)
+failed
+```
+
+(A couple of podcast feeds with no new changes just report "ok"; but I'd also expect most of the above to not have any recent changes as they're weekly-or-less podcasts.)
+
+### What steps will reproduce the problem?
+
+Indicative example (one of the feed URLs I follow; but it's happening on *most* of them, all of which worked with the previous version of git-annex):
+
+```
+ewen@basadi:/tmp/podcasts$ git init
+Initialized empty Git repository in /private/tmp/podcasts/.git/
+ewen@basadi:/tmp/podcasts$ git annex init 'Test repo'
+init Test repo ok
+(recording state in git...)
+ewen@basadi:/tmp/podcasts$ TEMPLATE='archive/${feedtitle}/${itemtitle}${extension}'
+ewen@basadi:/tmp/podcasts$ git annex importfeed --template="${TEMPLATE}" "https://risky.biz/feeds/risky-business"
+importfeed gathering known urls ok
+importfeed https://risky.biz/feeds/risky-business
+git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
+failed
+importfeed: 1 failed
+ewen@basadi:/tmp/podcasts$
+```
+
+The `--template` part does not seem necessary to the reproducer either, as I get the same error without it (it's just that `--template` is in my standard run that I've used for years):
+
+```
+ewen@basadi:/tmp/podcasts$ git annex importfeed "https://risky.biz/feeds/risky-business"
+importfeed gathering known urls ok
+importfeed https://risky.biz/feeds/risky-business
+git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
+failed
+importfeed: 1 failed
+ewen@basadi:/tmp/podcasts$
+```
+
+### What version of git-annex are you using? On what operating system?
+
+```
+ewen@basadi:~$ git annex version
+git-annex version: 10.20250925
+build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV OsPath
+dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.3 http-client-0.7.19 persistent-sqlite-2.13.3.1 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1
+key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
+remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask
+operating system: darwin x86_64
+supported repository versions: 8 9 10
+upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
+ewen@basadi:~$
+```
+
+on macOS 15.6.1 (Sequoia), which is the latest release apart from the macOS 26 released this month. On Intel in this case, but seems to also reproduce on the same macOS 15.6.1 (Sequoia) on Apple M2 processor, with the same HomeBrew build of git-annex.
+
+### Please provide any additional information below.
+
+[[!format sh """
+ewen@basadi:/tmp/podcasts$ git annex --verbose --verbose importfeed --verbose --verbose "https://risky.biz/feeds/risky-business"
+importfeed gathering known urls ok
+importfeed https://risky.biz/feeds/risky-business
+git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
+failed
+importfeed: 1 failed
+ewen@basadi:/tmp/podcasts$
+"""]]
+
+At this stage I don't know if this is specific to `importfeed` or specific to the HomeBrew build of git-annex.
+
+Other annexes tracking files do seem to work (`git annex add` / `git annex sync` / `git annex copy ...` all work) with this version of git-annex. So I suspect it's somehow specific to importfeed and/or the HomeBrew build.
+
+And it seems a fairly recent breakage, as IIRC the previously installed version was from 2025-08.
+
+[HomeBrew git-annex package information](https://formulae.brew.sh/formula/git-annex)
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+Yes, for many years. git-annex has worked very well for downloading/collecting podcasts for years, which is why it was surprising it's suddenly failing like this.
diff --git a/doc/forum/remove_old_versions_of_files_in_archive__63__.mdwn b/doc/forum/remove_old_versions_of_files_in_archive__63__.mdwn
new file mode 100644
index 0000000000..cce238807e
--- /dev/null
+++ b/doc/forum/remove_old_versions_of_files_in_archive__63__.mdwn
@@ -0,0 +1,9 @@
+I have set up a number of annex repos for storing various different things (media, ebooks, audiobooks, gopro footage, archived files, files to sync to mobile devices over adb, etc). Many of them I sync to backblaze (as s3 special remote) and gdrive (as rclone special remote).
+
+Both backblaze and gdrive remotes are configured as "redundantarchive" groups (configured as `not (copies=redundantarchive:2)`). This all seems to be working properly.
+
+As time goes on, I expect I'll run out of storage in gdrive (backblaze I can keep storing more stuff in it as long as I keep paying money). This got me thinking about longer term storage management. How should one limit the size of an "archive" remote? Or decide to delete versions of files? Are there preferred content configs I could use to be smarter about which data I store where?
+
+What about keeping a certain number of versions of a file (last n) or versions before a particular date (no older than)? I see expireunused, but I don't think I understand how it interacts with archive groups or special remotes generally.
+
+How are we supposed to think about archives and removing old versions of data generally?
Added a comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_4_55351f379349d2d7e6c769fa54f8a7ee._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_4_55351f379349d2d7e6c769fa54f8a7ee._comment
new file mode 100644
index 0000000000..3528199efe
--- /dev/null
+++ b/doc/todo/import_tree_from_rsync_special_remote/comment_4_55351f379349d2d7e6c769fa54f8a7ee._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="yarikoptic"
+ avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
+ subject="comment 4"
+ date="2025-09-27T17:27:51Z"
+ content="""
+Am I right to assume that I might achieve this (import with efficient reimports if needed from a remote via ssh directory) using a `directory` special remote on top of `sshfs` mount?
+Or is there a better way to achieve that?
+"""]]
missing build dep for debian?
diff --git a/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn
new file mode 100644
index 0000000000..f0e3c7b47c
--- /dev/null
+++ b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn
@@ -0,0 +1,13 @@
+### Please describe the problem.
+
+Encountered while building standalone `10.20250828-1~ndall+1` under trixie:
+
+```
+ checking git-remote-gcrypt... git-remote-gcrypt
+ checking ssh connection caching... yes
+Configuring git-annex-10.20250828...
+Error: Setup: Encountered missing or private dependencies:
+unbounded-delays
+```
+
+Oddly, I believe we still built fine for http://github.com/datalad/git-annex, where I think we also do not have that one.
diff --git a/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__.mdwn b/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__.mdwn
new file mode 100644
index 0000000000..07dc326192
--- /dev/null
+++ b/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__.mdwn
@@ -0,0 +1 @@
+What does "stale or missing inode cache; updating" from a fsck mean?
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn
new file mode 100644
index 0000000000..5ae44072f5
--- /dev/null
+++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn
@@ -0,0 +1,13 @@
+I'm attempting to update Arch Linux packaging ... the major caveat being we're stuck using an old (9.4.8) version of `ghc` for now...
+
+Building the latest tagged release now produces this error:
+
+```
+Utility/OpenFd.hs:28:9: error:
+ Variable not in scope: when :: Bool -> IO () -> IO a0
+ |
+28 | when closeonexec $
+ | ^^^^
+```
+
+I'm not sure this error is directly caused by the antiquated compiler, but also not sure how to debug this further or work around it either.
add news item for git-annex 10.20250925
diff --git a/doc/news/version_10.20250520.mdwn b/doc/news/version_10.20250520.mdwn
deleted file mode 100644
index 07a4e9c893..0000000000
--- a/doc/news/version_10.20250520.mdwn
+++ /dev/null
@@ -1,12 +0,0 @@
-git-annex 10.20250520 released with [[!toggle text="these changes"]]
-[[!toggleable text=""" * Preferred content now supports "balanced=groupname:lackingcopies"
- to make files be evenly balanced among as many repositories as are
- needed to satisfy numcopies.
- * map: Fix buggy handling of remotes that are bare git repositories
- accessed via ssh.
- * map: Avoid looping forever with mutually recursive paths between
- repositories accessed via ssh.
- * whereused: Fix bug that could find matches from grafts
- in remote git-annex branches.
- * Windows: Fix bug that can cause git status to show annexed files as
- modified when built with OsPath."""]]
\ No newline at end of file
diff --git a/doc/news/version_10.20250925.mdwn b/doc/news/version_10.20250925.mdwn
new file mode 100644
index 0000000000..3cba8b8b77
--- /dev/null
+++ b/doc/news/version_10.20250925.mdwn
@@ -0,0 +1,28 @@
+git-annex 10.20250925 released with [[!toggle text="these changes"]]
+[[!toggleable text=""" * Fix bug that made changes to a special remote sometimes be missed when
+ importing a tree from it. After upgrading, any such missed changes
+ will be included in the next tree imported from a special remote.
+ Fixes reversion introduced in version 10.20230626.
+ * Fix crash operating on filenames that are exactly 21 bytes long
+ and begin with a utf-8 character.
+ * Fix hang that could occur when using git-annex adjust on a branch with
+ a number of files greater than annex.queuesize.
+ * Fix bug that could cause an invalid utf-8 sequence to be used in a
+ temporary filename when the input filename was valid utf-8.
+ * Improve performance when used with a local git remote that has a
+ large working tree.
+ * drop: --fast support when dropping from a remote.
+ * Added annex.assistant.allowunlocked config.
+ * Add git-remote-p2p-annex and git-remote-tor-annex to standalone builds.
+ * enableremote: Disallow using type= to attempt to change the type of an
+ existing remote.
+ * Add build warnings when git-annex is built without the OsPath
+ build flag.
+ * version: Report on whether it was built with the OsPath build flag.
+ * Avoid leaking file descriptors to child processes started by git-annex
+ in some situations. Note that when not built with the OsPath build
+ flag, these leaks can still happen.
+ * git-annex.cabal: Turn on the OsPath build flag by default.
+ * p2phttp: Fix a hang that could occur when used with --directory,
+ and a repository in the directory got removed.
+ * Removed support for building with unmaintained cryptonite, use crypton."""]]
\ No newline at end of file
Added a comment
diff --git a/doc/devblog/day_649-650__speeding_up_repeated_imports/comment_1_a58663214bc81c0cbd50d53f55e3325b._comment b/doc/devblog/day_649-650__speeding_up_repeated_imports/comment_1_a58663214bc81c0cbd50d53f55e3325b._comment
new file mode 100644
index 0000000000..99856a1ab3
--- /dev/null
+++ b/doc/devblog/day_649-650__speeding_up_repeated_imports/comment_1_a58663214bc81c0cbd50d53f55e3325b._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="nadir"
+ avatar="http://cdn.libravatar.org/avatar/2af9174cf6c06de802104d632dc40071"
+ subject="comment 1"
+ date="2025-09-24T21:52:32Z"
+ content="""
+A bit late, but this is actually the feature I use the most. I mainly use git-annex to catalogue my media, not so much to manage it directly. Appreciate the work you've put into improving imports (and of course git annex in general).
+"""]]
update
diff --git a/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment b/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment
index cf6c7cd167..0a5cd38c69 100644
--- a/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment
+++ b/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment
@@ -17,6 +17,11 @@ This change will fix it:
- hashtype="${0##*git-annex-backend-X}"
+ hashtype="${0##*git-annex-backend-}"
+However, since the hash is named "XXHASH", and this is an external backend,
+I think the backend name you should really be using is "XXXHASH". This
+leaves the "XXHASH" backend name free for git-annex to use if it
+implemented it as a built-in backend.
+
Once you have the program working, we can add it to the list of external
backends.
"""]]
comments
diff --git a/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment b/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment
new file mode 100644
index 0000000000..cf6c7cd167
--- /dev/null
+++ b/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment
@@ -0,0 +1,22 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 5"""
+ date="2025-09-24T16:30:12Z"
+ content="""
+This is a bug in your program. It is generating a
+key using the XH3 backend, rather than the XXH3 backend.
+
+ [2025-09-24 12:29:41.565937669] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-backend-XXH3[1] <-- GENKEY .git/annex/othertmp/ingest-bar89415-0
+ [2025-09-24 12:29:41.568293334] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-backend-XXH3[1] --> GENKEY-SUCCESS XH3-s30--88ad06d188b880a1
+
+When git-annex later wants to do something with that key,
+it expects to find a git-annex-backend-XH3 program.
+
+This change will fix it:
+
+ - hashtype="${0##*git-annex-backend-X}"
+ + hashtype="${0##*git-annex-backend-}"
+
+Once you have the program working, we can add it to the list of external
+backends.
+"""]]
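The effect of the one-character fix in comment 5 can be reproduced outside git-annex. The sketch below is only an illustration: the function names `strip_with_x` and `strip_prefix_only` are made up here, and only the two parameter expansions and the program path come from the comment and its debug log.

```sh
#!/bin/sh
# Hypothetical helper names; only the two ${...##...} expansions are
# taken from the comment above.

# Original expansion: removes everything up to and including
# "git-annex-backend-X", so the leading X of the backend name is lost.
strip_with_x() {
    prog="$1"
    printf '%s\n' "${prog##*git-annex-backend-X}"
}

# Fixed expansion: removes only the "git-annex-backend-" program prefix,
# leaving the full backend name.
strip_prefix_only() {
    prog="$1"
    printf '%s\n' "${prog##*git-annex-backend-}"
}

strip_with_x /home/joey/bin/git-annex-backend-XXH3       # prints XH3
strip_prefix_only /home/joey/bin/git-annex-backend-XXH3  # prints XXH3
```

With the original expansion the script builds keys under the name XH3, which is why git-annex then goes looking for a git-annex-backend-XH3 program.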
diff --git a/doc/todo/add_xxHash_backend/comment_6_6889f05d633cb340046c9d4796735a57._comment b/doc/todo/add_xxHash_backend/comment_6_6889f05d633cb340046c9d4796735a57._comment
new file mode 100644
index 0000000000..a13c264ef8
--- /dev/null
+++ b/doc/todo/add_xxHash_backend/comment_6_6889f05d633cb340046c9d4796735a57._comment
@@ -0,0 +1,20 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 6"""
+ date="2025-09-24T16:39:02Z"
+ content="""
+I am inclined to keep this todo open despite external backend
+programs existing, because it would be nice to have xxHash in
+git-annex natively due to its speed.
+
+I found this haskell library which includes xxh3
+and which would be easy to add as a git-annex dependency,
+although it would need to be gated behind a build flag for now:
+<https://hackage.haskell.org/package/xxhash-ffi>
+
+(Since that library uses Hashable, it generates an Int for the hash.
+This seems to limit its use to 64 bit platforms.
+<https://github.com/haskell-haskey/xxhash-ffi/issues/6>
+The lower-level Data.Digest.XXHash.FFI.C uses CULLong so will work on 32
+bit.)
+"""]]
fixed
diff --git a/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn b/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn
index 6be9060190..82eddd3aa3 100644
--- a/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn
+++ b/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn
@@ -58,3 +58,4 @@ Error: Process completed with exit code 1.
 e.g. [here](https://github.com/datalad/git-annex/actions/runs/17903960530/job/50901925018)
+> [[fixed|done]] --[[Joey]]
Added a comment: the X prefix conflicts with the eXternal backend namespace
diff --git a/doc/todo/add_xxHash_backend/comment_4_3e5b815dfea0939a6affa7443701a911._comment b/doc/todo/add_xxHash_backend/comment_4_3e5b815dfea0939a6affa7443701a911._comment
new file mode 100644
index 0000000000..bf18b44cfc
--- /dev/null
+++ b/doc/todo/add_xxHash_backend/comment_4_3e5b815dfea0939a6affa7443701a911._comment
@@ -0,0 +1,79 @@
+[[!comment format=mdwn
+ username="Arnie97"
+ avatar="http://cdn.libravatar.org/avatar/607ed64cbd8e7a4cc2035a865b6cb5b2"
+ subject="the X prefix conflicts with the eXternal backend namespace"
+ date="2025-09-24T12:05:05Z"
+ content="""
+I'm trying to create an external backend for xxHash, but ran into some weird behavior.
+
+If only `/bin/git-annex-backend-XXH3` is present in `$PATH`, and `git config annex.backend XXH3` is set, then git annex complains `Cannot run git-annex-backend-XH3 -- It is not installed in PATH`, which seems like a bug.
+And if `/bin/git-annex-backend-XXH3` is moved to `/bin/git-annex-backend-XH3` according to the error message, it will complain `Cannot run git-annex-backend-XXH3 -- It is not installed in PATH` (this is expected).
+In the end, I had to link the same shell script to both `/bin/git-annex-backend-XH3` and `/bin/git-annex-backend-XXH3` to make the backend config `XXH3` work.
+
+```bash
+#!/bin/sh
+
+set -e
+
+hashtype=\"${0##*git-annex-backend-X}\"
+
+# could send PROGRESS while doing this, but it's
+# hard to implement that in shell
+case \"$hashtype\" in
+ BLAKE3_256)
+ hashfile() { b3sum --no-names \"$1\"; } ;;
+ BLAKE3_512)
+ hashfile() { b3sum --no-names -l 64 \"$1\"; } ;;
+ XXH32|XH32)
+ hashfile() { xxhsum -H0 \"$1\" | cut -d ' ' -f 1; } ;;
+ XXH64|XH64)
+ hashfile() { xxhsum -H1 \"$1\" | cut -d ' ' -f 1; } ;;
+ XXH128|XH128)
+ hashfile() { xxhsum -H2 \"$1\" | cut -d ' ' -f 1; } ;;
+ XXH3|XH3)
+ hashfile() { xxhsum -H3 --tag \"$1\" | awk '{ print $NF }'; } ;;
+esac
+
+while read line; do
+ set -- $line
+ case \"$1\" in
+ GETVERSION)
+ echo VERSION 1
+ ;;
+ CANVERIFY)
+ echo CANVERIFY-YES
+ ;;
+ ISSTABLE)
+ echo ISSTABLE-YES
+ ;;
+ ISCRYPTOGRAPHICALLYSECURE)
+ echo ISCRYPTOGRAPHICALLYSECURE-YES
+ ;;
+ GENKEY)
+ contentfile=\"$2\"
+ hash=$(hashfile \"$contentfile\")
+ sz=$(wc -c \"$contentfile\" | cut -d ' ' -f 1)
+ if [ -n \"$hash\" ]; then
+ echo \"GENKEY-SUCCESS\" \"$hashtype-s$sz--$hash\"
+ else
+ echo \"GENKEY-FAILURE\" \"calculate hash sum failed\"
+ fi
+ ;;
+ VERIFYKEYCONTENT)
+ key=\"$2\"
+ contentfile=\"$3\"
+ hash=$(hashfile \"$contentfile\")
+ khash=$(echo \"$key\" | sed 's/.*--//')
+ if [ \"$hash\" = \"$khash\" ]; then
+ echo \"VERIFYKEYCONTENT-SUCCESS\"
+ else
+ echo \"VERIFYKEYCONTENT-FAILURE\"
+ fi
+ ;;
+ *)
+ echo ERROR protocol error
+ ;;
+ esac
+done
+```
+"""]]
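The GENKEY and VERIFYKEYCONTENT branches of the script both depend on the key layout `BACKEND-sSIZE--HASH`, as seen in the GENKEY-SUCCESS line of the debug log. A minimal sketch of that layout follows, assuming hypothetical helper names (`file_size`, `make_key`, `key_hash`) that are not part of the script or of git-annex.

```sh
#!/bin/sh
# Hypothetical helpers illustrating the BACKEND-sSIZE--HASH key layout.

# Byte count of a file. Reading from stdin keeps the filename out of
# wc's output, and the arithmetic expansion drops any leading padding
# that some wc implementations print.
file_size() {
    echo "$(( $(wc -c < "$1") ))"
}

# Assemble a key from backend name, size in bytes, and hash.
make_key() {
    printf '%s-s%s--%s\n' "$1" "$2" "$3"
}

# Extract the hash: everything after the last "--" in the key.
key_hash() {
    printf '%s\n' "${1##*--}"
}

make_key XXH3 30 88ad06d188b880a1    # prints XXH3-s30--88ad06d188b880a1
key_hash XXH3-s30--88ad06d188b880a1  # prints 88ad06d188b880a1
```

One reason to prefer `wc -c < file` over `wc -c file | cut -d ' ' -f 1`: where `wc` pads its output with leading spaces, the first space-delimited field is empty, so the size would be missing from the key.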
diff --git a/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn
index 02460f6426..9ba6311675 100644
--- a/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn
+++ b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn
@@ -21,3 +21,5 @@ I have three repositories in groups archive and backup. These five failes are on
 Just manually copying them to other repositories solved it. What did I miss or did not understand? Git assistant runs on all repositories and are ssh connected nearly 24/7 and all other syncing works fine.
+
+`git annex sync --all --content ONE_OF_THE_ARCHIVE_BACKUP_REPOSITORIES` does not change anything either.