Recent changes to this wiki:
Added a comment: Feedback on encryptonlycreds=yes
diff --git a/doc/todo/encrypt_only_the_credentials/comment_7_8a0265614583aa8a9f040b5fed0ff480._comment b/doc/todo/encrypt_only_the_credentials/comment_7_8a0265614583aa8a9f040b5fed0ff480._comment new file mode 100644 index 0000000000..9bf6c78ee5 --- /dev/null +++ b/doc/todo/encrypt_only_the_credentials/comment_7_8a0265614583aa8a9f040b5fed0ff480._comment @@ -0,0 +1,35 @@ +[[!comment format=mdwn + username="stv0g" + avatar="http://cdn.libravatar.org/avatar/6faa6cc783a165b25fc1c8f3154ba449" + subject="Feedback on encryptonlycreds=yes" + date="2025-08-24T11:20:23Z" + content=""" +Hi Joey, + +thank you very much for the quick implementation of this feature! I have already tested it successfully :) + +There seems to be a minor issue when I am trying to set `encryptonlycreds=yes` via `SETCONFIG`, followed by a subsequent `SETCREDS`: + +``` +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> VERSION 2 +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- EXTENSIONS INFO GETGITREMOTENAME UNAVAILABLERESPONSE ASYNC +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> EXTENSIONS INFO +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- LISTCONFIGS +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> CONFIGEND +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- INITREMOTE +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> GETCONFIG encryption +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- VALUE hybrid +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> GETCONFIG onlyencryptcreds +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] <-- VALUE +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> SETCONFIG onlyencryptcreds yes +(Annex.ExternalAddonProcess) git-annex-remote-tape[1] --> SETCREDS key aes256 70482ccb1e0a0b8f8bf6f2603174d2d6f40d940033285d3ae94e7258595dca26 +git-annex: getRemoteConfigValue onlyencryptcreds found value of unexpected type PassedThrough. This is a bug in git-annex! 
+CallStack (from HasCallStack): + error, called at ./Annex/SpecialRemote/Config.hs:209:28 in main:Annex.SpecialRemote.Config + getRemoteConfigValue, called at ./Remote/Helper/Encryptable.hs:297:27 in main:Remote.Helper.Encryptable +failed +initremote: 1 failed +``` + +I am not sure if this is even supposed to be supported? Let me know if I am using it in the wrong way :) +"""]]
Added a comment: Download with the git-annex-install script fail
diff --git a/doc/Android/comment_8_7ba624f545e991bf11d5fcf8378d3ada._comment b/doc/Android/comment_8_7ba624f545e991bf11d5fcf8378d3ada._comment new file mode 100644 index 0000000000..17cd028ebd --- /dev/null +++ b/doc/Android/comment_8_7ba624f545e991bf11d5fcf8378d3ada._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="waldi1985@90ec1a42d21b8182257f7a16ba8b8b08c51a0b6e" + nickname="waldi1985" + avatar="http://cdn.libravatar.org/avatar/128c0f882560337aad72a15ab7ee3766" + subject="Download with the git-annex-install script fail" + date="2025-08-24T11:10:50Z" + content=""" + Downloading git-annex... + --2025-08-24 12:49:08-- https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-arm64-ancient.tar.gz + Resolving downloads.kitenet.net (downloads.kitenet.net)... 66.228.36.95, 2600:3c03::f03c:91ff:fe73:b0d2 + Connecting to downloads.kitenet.net (downloads.kitenet.net)|66.228.36.95|:443... connected. + HTTP request sent, awaiting response... 403 Forbidden + 2025-08-24 12:49:09 ERROR 403: Forbidden. + +The download with the git-annex-install script fails, I guess because the build has failed. See: https://downloads.kitenet.net/git-annex/autobuild/arm64-ancient/ + + enableremote encryption changes: FAIL (0.73s) + ./Test/Framework.hs:92: + initremote failed with unexpected exit code (transcript follows) + initremote baz + git-annex: There is no security benefit to using onlyencryptcreds=yes with encryption=shared + failed + initremote: 1 failed + +It would be great if you could fix that. Thanks in advance. +"""]]
Added a comment
diff --git a/doc/bugs/Re-Adjust_Loses_Commits/comment_4_6366ab14803dd8303752807741baec2c._comment b/doc/bugs/Re-Adjust_Loses_Commits/comment_4_6366ab14803dd8303752807741baec2c._comment new file mode 100644 index 0000000000..623afe7056 --- /dev/null +++ b/doc/bugs/Re-Adjust_Loses_Commits/comment_4_6366ab14803dd8303752807741baec2c._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Lukey" + avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b" + subject="comment 4" + date="2025-08-23T06:51:15Z" + content=""" +You can use `git reflog` to see previous states of a branch. +"""]]
fix specialRemote confusion with tahoe
tahoe: Fix bug that made initremote require an encryption= parameter,
despite git-annex encryption not being used with this special remote,
since tahoe handles encryption itself.
The chunking parameters were also accepted, although they were never
actually used. They are no longer accepted either.
The bug was introduced in commit
c4ea3ca40ae6ba973287ca94e892e93973a8376e. At that point
specialRemote was being added to most remotes and I forgot tahoe doesn't
need these parameters.
It turns out that, when embedcreds=yes was used, it did *not* cause the
introducer-furl and shared-convergence-secret to be encrypted, even
though encryption= was specified. That is not a security hole only
because encryption= was never documented to work with the tahoe special
remote at all!
It might be nice to support onlyencryptcreds=yes with tahoe, and it
would make sense to accept the encryption= parameter then, and only use
it for encrypting the creds. That would take some work, since the
encryption= parameter would need to be optional, and the usual encrypted
special remote code couldn't be used.
Sponsored-by: unqueued
diff --git a/CHANGELOG b/CHANGELOG index 9a478550fe..1e43b9638f 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -27,6 +27,9 @@ git-annex (10.20250722) UNRELEASED; urgency=medium * Don't allow the type of encryption of an existing special remote to be changed. Fixes reversion introduced in version 7.20191230. * tahoe: Support tahoe-lafs command versions newer than 1.16. + * tahoe: Fix bug that made initremote require an encryption= parameter, + despite git-annex encryption not being used with this special remote. + Fixes reversion introduced in version 7.20191230. * Removed support for git versions older than 2.22. * Bump aws build dependency to 0.24.1. * stack.yaml: Update to lts-24.2. diff --git a/Remote/Tahoe.hs b/Remote/Tahoe.hs index e51ccf9531..2f0cebf188 100644 --- a/Remote/Tahoe.hs +++ b/Remote/Tahoe.hs @@ -58,7 +58,7 @@ type IntroducerFurl = String type Capability = String remote :: RemoteType -remote = specialRemoteType $ RemoteType +remote = RemoteType { typename = "tahoe" , enumerate = const (findSpecialRemotes "tahoe") , generate = gen diff --git a/doc/special_remotes/tahoe.mdwn b/doc/special_remotes/tahoe.mdwn index 80f176d73c..281151182d 100644 --- a/doc/special_remotes/tahoe.mdwn +++ b/doc/special_remotes/tahoe.mdwn @@ -35,10 +35,6 @@ the tahoe remote. whether you want to give them access to your tahoe system before using embedcreds! -* `onlyencryptcreds` - Optional. Set to "yes" to make the `encryption` - only be used for the embedded tahoe credentials, but not used to encrypt - the content stored on the special remote. - Setup example: # TAHOE_FURL=... git annex initremote tahoe type=tahoe embedcreds=yes
don't refer to tahoe daemon
since tahoe no longer supports daemonization
diff --git a/doc/special_remotes/tahoe.mdwn b/doc/special_remotes/tahoe.mdwn index 38762f502a..80f176d73c 100644 --- a/doc/special_remotes/tahoe.mdwn +++ b/doc/special_remotes/tahoe.mdwn @@ -14,8 +14,8 @@ Typically you will have an account on a Tahoe-LAFS storage grid, which is represented by an "introducer furl". You need to supply this to git-annex in the `TAHOE_FURL` environment variable when initializing the remote. git-annex will then generate a tahoe configuration directory for -the remote under `~/.tahoe-git-annex/`, and automatically start the tahoe -daemon as needed. +the remote under `~/.tahoe-git-annex/`, and automatically start `tahoe run` +as needed. ## configuration
fix directory name
diff --git a/doc/special_remotes/tahoe.mdwn b/doc/special_remotes/tahoe.mdwn index 60e3bad619..38762f502a 100644 --- a/doc/special_remotes/tahoe.mdwn +++ b/doc/special_remotes/tahoe.mdwn @@ -14,7 +14,7 @@ Typically you will have an account on a Tahoe-LAFS storage grid, which is represented by an "introducer furl". You need to supply this to git-annex in the `TAHOE_FURL` environment variable when initializing the remote. git-annex will then generate a tahoe configuration directory for -the remote under `~/.tahoe/git-annex/`, and automatically start the tahoe +the remote under `~/.tahoe-git-annex/`, and automatically start the tahoe daemon as needed. ## configuration
tahoe: Support tahoe-lafs command versions newer than 1.16
tahoe start was deprecated and removed in 2020.
This feels like a very janky way to run a daemon, but it does work.
Sponsored-by: k0ld
diff --git a/CHANGELOG b/CHANGELOG index 76387efc6f..9a478550fe 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -26,6 +26,7 @@ git-annex (10.20250722) UNRELEASED; urgency=medium for exporttree=yes/importtree=yes remotes. * Don't allow the type of encryption of an existing special remote to be changed. Fixes reversion introduced in version 7.20191230. + * tahoe: Support tahoe-lafs command versions newer than 1.16. * Removed support for git versions older than 2.22. * Bump aws build dependency to 0.24.1. * stack.yaml: Update to lts-24.2. diff --git a/Remote/Tahoe.hs b/Remote/Tahoe.hs index 9495a3c082..76b17caa5b 100644 --- a/Remote/Tahoe.hs +++ b/Remote/Tahoe.hs @@ -13,7 +13,7 @@ - - Tahoe has its own encryption, so git-annex's encryption is not used. - - - Copyright 2014-2020 Joey Hess <id@joeyh.name> + - Copyright 2014-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -45,6 +45,9 @@ import Utility.UserInfo import Utility.Metered import Utility.Env import Utility.ThreadScheduler +import Utility.Process.Transcript + +import Control.Concurrent {- The TMVar is left empty until tahoe has been verified to be running. -} data TahoeHandle = TahoeHandle TahoeConfigDir (TMVar ()) @@ -233,8 +236,42 @@ convergenceFile :: TahoeConfigDir -> OsPath convergenceFile configdir = configdir </> literalOsPath "private" </> literalOsPath "convergence" +{- tahoe run stays in the foreground, but behaves as a daemon, servicing + - requests. Attempting to start a second tahoe run process will fail. So, + - in order to support multiple git-annex processes, it is run in the + - background, and left running on exit. + - + - It can take a while for tahoe to begin accepting connections. + - This function waits for it to get ready. 
+ -} startTahoeDaemon :: TahoeConfigDir -> IO () -startTahoeDaemon configdir = void $ boolTahoe configdir "start" [] +startTahoeDaemon configdir = withNullHandle $ \nullh -> do + let ps = tahoeParams configdir "run" [Param "--allow-stdin-close"] + let p = (proc "tahoe" $ toCommand ps) + { std_in = UseHandle nullh + , std_out = UseHandle nullh + , std_err = UseHandle nullh + } + void $ forkIO $ void $ createProcess p + waitready (5 :: Int) + hClose nullh + where + waitready n + | n <= 0 = giveup "The tahoe run process is not responding to requests. Waited 5 seconds for it to start." + | otherwise = do + -- Need something that will always succeed + -- once the server has started and is accepting + -- requests, and this seems to do the trick. + let ps = tahoeParams configdir "check" + [ Param "--raw", Param "URI:LIT:"] + (_, ok) <- processTranscript "tahoe" + (toCommand ps) + Nothing + if ok + then return () + else do + threadDelaySeconds (Seconds 1) + waitready (pred n) {- Ensures that tahoe has been started, before running an action - that uses it. -} diff --git a/doc/bugs/tahoe_special_remote_needs_updating.mdwn b/doc/bugs/tahoe_special_remote_needs_updating.mdwn index e7b1aaf831..d9ff806b07 100644 --- a/doc/bugs/tahoe_special_remote_needs_updating.mdwn +++ b/doc/bugs/tahoe_special_remote_needs_updating.mdwn @@ -13,3 +13,4 @@ The tahoe man page says: COMMANDS run Run a node without daemonizing. This is the only command for running nodes. +> [[done]] --[[Joey]]
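The startup wait in `startTahoeDaemon` comes down to a bounded polling loop: probe, sleep, retry, and give up after a fixed number of attempts. Below is a minimal standalone sketch of that pattern, assuming only `base`. The name `waitReady` and its parameters are hypothetical and not part of git-annex; the probe stands in for running `tahoe check` and reporting whether it succeeded.

```haskell
import Control.Concurrent (threadDelay)

-- | Poll a readiness probe up to @n@ times, sleeping @delayMicros@
-- microseconds after each failed attempt. Returns True as soon as the
-- probe succeeds, and False once the attempts are used up.
-- (Hypothetical helper for illustration; not a git-annex function.)
waitReady :: Int -> Int -> IO Bool -> IO Bool
waitReady n delayMicros probe
    | n <= 0 = return False
    | otherwise = do
        ok <- probe
        if ok
            then return True
            else do
                threadDelay delayMicros
                waitReady (n - 1) delayMicros probe
```

A caller that wanted the behaviour described in the diff would pass 5 attempts and a one second delay, and treat a `False` result as a fatal error.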
comment
diff --git a/doc/profiling/comment_9_d000f254f44a2365dcaad87282104c13._comment b/doc/profiling/comment_9_d000f254f44a2365dcaad87282104c13._comment new file mode 100644 index 0000000000..6f2038a431 --- /dev/null +++ b/doc/profiling/comment_9_d000f254f44a2365dcaad87282104c13._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2025-08-22T00:19:41Z" + content=""" +3x more allocations now than before. This is probably due to the switch to +OsPath, which means extra copying from ByteString. + + Thu Aug 21 20:18 2025 Time and Allocation Profiling Report (Final) + + git-annex +RTS -p -RTS find + + total time = 0.99 secs (989 ticks @ 1000 us, 1 processor) + total alloc = 1,514,545,208 bytes (excludes profiling overheads) + + COST CENTRE MODULE SRC %time %alloc + + keyFile Annex.Locations Annex/Locations.hs:(790,1)-(806,44) 7.1 12.9 + >>=.\ Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:(148,9)-(149,44) 6.9 2.4 + >>=.\.succ' Data.Attoparsec.Internal.Types Data/Attoparsec/Internal/Types.hs:148:13-76 5.6 0.7 + keyFile.esc Annex.Locations Annex/Locations.hs:(796,9)-(801,32) 5.3 10.7 + ifM Utility.Monad Utility/Monad.hs:(62,1)-(64,44) 5.1 8.7 + ifM.\ Utility.Monad Utility/Monad.hs:64:9-44 3.5 8.6 + hashUpdates.processBlocks Crypto.Hash Crypto/Hash.hs:(103,5)-(112,76) 3.1 0.3 + inAnnex'.\ Annex.Content.Presence Annex/Content/Presence.hs:(53,61)-(68,31) 3.1 8.5 + keyFile.anyneedesc Annex.Locations Annex/Locations.hs:806:9-44 3.1 1.0 + seekFilteredKeys.exists CmdLine.Seek CmdLine/Seek.hs:465:9-92 2.8 0.4 + fileKey Annex.Locations Annex/Locations.hs:(814,1)-(824,41) 2.1 1.0 + keyPath Annex.Locations Annex/Locations.hs:(834,1)-(836,23) 1.8 5.9 +"""]]
prevent changing onlyencryptcreds of existing remote
That would break accessing data already stored in the remote, the same
as changing encryption type would do.
Sponsored-by: Jack Hill
diff --git a/Remote/Helper/Encryptable.hs b/Remote/Helper/Encryptable.hs index c18c5acf7f..7bc73e115f 100644 --- a/Remote/Helper/Encryptable.hs +++ b/Remote/Helper/Encryptable.hs @@ -1,6 +1,6 @@ {- common functions for encryptable remotes - - - Copyright 2011-2021 Joey Hess <id@joeyh.name> + - Copyright 2011-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -165,8 +165,8 @@ parseMac (Just (Proposed s)) = case readMac s of - could opt to use a shared cipher, which is stored unencrypted. -} encryptionSetup :: SetupStage -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, EncryptionIsSetup) encryptionSetup setupstage c gc = do - checkallowedchange pc <- either giveup return $ parseEncryptionConfig c + checkallowedchange pc gpgcmd <- gpgCmd <$> Annex.getGitConfig maybe (genCipher pc gpgcmd) (updateCipher pc gpgcmd) (extractCipher pc) where @@ -220,23 +220,24 @@ encryptionSetup setupstage c gc = do -- public-key encryption, hence we leave it on newer -- remotes (while being backward-compatible). 
(map Accepted ["keyid", "keyid+", "keyid-", "highRandomQuality"]) - oldpc = either (const Nothing) Just $ parseEncryptionConfig $ + moldpc = either (const Nothing) Just $ parseEncryptionConfig $ case setupstage of Init -> mempty Enable oldc -> oldc AutoEnable oldc -> oldc - checkallowedchange = case oldpc of + checkallowedchange pc = case moldpc of Nothing -> return () - Just oldpc' -> case extractCipher oldpc' of - Nothing -> req NoneEncryption - Just (EncryptedCipher _ Hybrid _) -> req HybridEncryption - Just (EncryptedCipher _ PubKey _) -> req PubKeyEncryption - Just (SharedCipher _) -> req SharedEncryption - Just (SharedPubKeyCipher _ _) -> req SharedPubKeyEncryption + Just oldpc -> do + case extractCipher oldpc of + Nothing -> req NoneEncryption + Just (EncryptedCipher _ Hybrid _) -> req HybridEncryption + Just (EncryptedCipher _ PubKey _) -> req PubKeyEncryption + Just (SharedCipher _) -> req SharedEncryption + Just (SharedPubKeyCipher _ _) -> req SharedPubKeyEncryption + when (onlyEncryptCreds oldpc /= onlyEncryptCreds pc) $ + giveup "Cannot change onlyencryptcreds of existing remotes." where - req v - | encryption /= Right v = cannotchange - | otherwise = return () + req v = when (encryption /= Right v) cannotchange data CipherPurpose t = CipherAllPurpose t | CipherOnlyCreds t diff --git a/doc/bugs/prevent_enableremote_changing_encryption.mdwn b/doc/bugs/prevent_enableremote_changing_encryption.mdwn index b4e92e571e..8ed9f1761e 100644 --- a/doc/bugs/prevent_enableremote_changing_encryption.mdwn +++ b/doc/bugs/prevent_enableremote_changing_encryption.mdwn @@ -7,9 +7,15 @@ encryption for such a remote: enableremote d (encryption setup) (encryption key stored in git repository) ok (recording state in git...) -This config change should not be allowed. This is a reversion, -probably introduced around [[!commit 71f78fe45dc91dbef0bedd79b33d6a9fed85704d]] +This config change should not be allowed. 
+ +Indeed, changing encryption type of an existing special remote should never +be allowed, whether or not it uses exporttree. This is a reversion, +probably introduced around +[[!commit 71f78fe45dc91dbef0bedd79b33d6a9fed85704d]] Also, the new onlyencryptcreds=yes setting can be passed to enableremote, which changes a previously encrypted remote to not use encryption for the data stored on it. That should also not be allowed. --[[Joey]] + +> [[fixed|done]] --[[Joey]]
Don't allow the type of encryption of an existing special remote to be changed.
For example, git-annex enableremote foo encryption=none will not remove
encryption, and other encryption= settings don't change the type of
encryption used. Either change would render data already stored in a
special remote inaccessible.
Probably fixes reversion introduced in
71f78fe45dc91dbef0bedd79b33d6a9fed85704d.
That commit got rid of the hasEncryptionConfig check, which I think would
have detected this before. I've not gone back to verify that.
Sponsored-by: mycroft
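
The rule described in this commit can be read as a small pure check: a freshly initialized remote may choose any encryption type, while enabling an existing remote must request the type it already uses. The sketch below uses simplified stand-in types. The constructor names are borrowed from the diff, but `checkAllowedChange`, the single-field `Enable`, and the error text are assumptions made for illustration and not git-annex's actual definitions.

```haskell
-- Simplified stand-ins for git-annex's types (assumptions, not the
-- real definitions).
data Encryption
    = NoneEncryption
    | SharedEncryption
    | HybridEncryption
    | PubKeyEncryption
    | SharedPubKeyEncryption
    deriving (Eq, Show)

-- | How a remote is being set up: freshly initialized, or enabled from
-- a previously recorded configuration (reduced here to just its
-- encryption type).
data SetupStage = Init | Enable Encryption
    deriving (Eq, Show)

-- | Accept the requested encryption type only when it matches what an
-- existing remote already uses; a new remote may pick anything.
-- The error message is made up for this sketch.
checkAllowedChange :: SetupStage -> Encryption -> Either String Encryption
checkAllowedChange Init requested = Right requested
checkAllowedChange (Enable old) requested
    | old == requested = Right requested
    | otherwise = Left "Cannot change encryption type of existing remotes."
```

Under these assumptions, `checkAllowedChange (Enable HybridEncryption) NoneEncryption` yields a `Left`, which corresponds to the `enableremote foo encryption=none` case mentioned above.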
diff --git a/CHANGELOG b/CHANGELOG index 8f71921a56..11787d6f6d 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -27,6 +27,8 @@ git-annex (10.20250722) UNRELEASED; urgency=medium * Bump aws build dependency to 0.24.1. * stack.yaml: Update to lts-24.2. * Removed support for git versions older than 2.22. + * Don't allow the type of encryption of an existing special remote to be + changed. Fixes reversion introduced in version 7.20191230. -- Joey Hess <id@joeyh.name> Wed, 30 Jul 2025 13:45:42 -0400 diff --git a/Remote/Adb.hs b/Remote/Adb.hs index 41f815fb0e..c512e9a4e6 100644 --- a/Remote/Adb.hs +++ b/Remote/Adb.hs @@ -140,7 +140,7 @@ gen r u rc gc rs = do (remoteAnnexAndroidSerial gc) adbSetup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, UUID) -adbSetup _ mu _ c gc = do +adbSetup ss mu _ c gc = do u <- maybe (liftIO genUUID) return mu -- verify configuration @@ -151,7 +151,7 @@ adbSetup _ mu _ c gc = do serial <- getserial =<< enumerateAdbConnected let c' = M.insert androidserialField (Proposed (fromAndroidSerial serial)) c - (c'', _encsetup) <- encryptionSetup c' gc + (c'', _encsetup) <- encryptionSetup ss c' gc ok <- adbShellBool serial [Param "mkdir", Param "-p", File (fromAndroidPath adir)] diff --git a/Remote/Bup.hs b/Remote/Bup.hs index d98901d441..b3f7c72ec1 100644 --- a/Remote/Bup.hs +++ b/Remote/Bup.hs @@ -124,13 +124,13 @@ gen r u rc gc rs = do buprepo = fromMaybe (giveup "missing buprepo") $ remoteAnnexBupRepo gc bupSetup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, UUID) -bupSetup _ mu _ c gc = do +bupSetup ss mu _ c gc = do u <- maybe (liftIO genUUID) return mu -- verify configuration is sane let buprepo = maybe (giveup "Specify buprepo=") fromProposedAccepted $ M.lookup buprepoField c - (c', _encsetup) <- encryptionSetup c gc + (c', _encsetup) <- encryptionSetup ss c gc -- bup init will create the repository. 
-- (If the repository already exists, bup init again appears safe.) diff --git a/Remote/Ddar.hs b/Remote/Ddar.hs index e9e0ba5589..760dd4cdca 100644 --- a/Remote/Ddar.hs +++ b/Remote/Ddar.hs @@ -115,13 +115,13 @@ gen r u rc gc rs = do ddarrepo = maybe (giveup "missing ddarrepo") (DdarRepo gc) (remoteAnnexDdarRepo gc) ddarSetup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, UUID) -ddarSetup _ mu _ c gc = do +ddarSetup ss mu _ c gc = do u <- maybe (liftIO genUUID) return mu -- verify configuration is sane let ddarrepo = maybe (giveup "Specify ddarrepo=") fromProposedAccepted $ M.lookup ddarrepoField c - (c', _encsetup) <- encryptionSetup c gc + (c', _encsetup) <- encryptionSetup ss c gc -- The ddarrepo is stored in git config, as well as this repo's -- persistent state, so it can vary between hosts. diff --git a/Remote/Directory.hs b/Remote/Directory.hs index 5392caafa3..da97d06c03 100644 --- a/Remote/Directory.hs +++ b/Remote/Directory.hs @@ -151,7 +151,7 @@ gen r u rc gc rs = do (remoteAnnexDirectory gc) directorySetup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, UUID) -directorySetup _ mu _ c gc = do +directorySetup ss mu _ c gc = do u <- maybe (liftIO genUUID) return mu -- verify configuration is sane let dir = maybe (giveup "Specify directory=") fromProposedAccepted $ @@ -159,7 +159,7 @@ directorySetup _ mu _ c gc = do absdir <- liftIO $ absPath (toOsPath dir) liftIO $ unlessM (doesDirectoryExist absdir) $ giveup $ "Directory does not exist: " ++ fromOsPath absdir - (c', _encsetup) <- encryptionSetup c gc + (c', _encsetup) <- encryptionSetup ss c gc -- The directory is stored in git config, not in this remote's -- persistent state, so it can vary between hosts. 
diff --git a/Remote/External.hs b/Remote/External.hs index dcfbaacbf2..c392b3f31e 100644 --- a/Remote/External.hs +++ b/Remote/External.hs @@ -172,7 +172,7 @@ gen rt externalprogram r u rc gc rs (remoteAnnexExternalType gc) externalSetup :: Maybe ExternalProgram -> Maybe (String, String) -> SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, UUID) -externalSetup externalprogram setgitconfig _ mu _ c gc = do +externalSetup externalprogram setgitconfig ss mu _ c gc = do u <- maybe (liftIO genUUID) return mu pc <- either giveup return $ parseRemoteConfig c (lenientRemoteConfigParser externalprogram) let readonlyconfig = getRemoteConfigValue readonlyField pc == Just True @@ -180,7 +180,7 @@ externalSetup externalprogram setgitconfig _ mu _ c gc = do then "readonly" else fromMaybe (giveup "Specify externaltype=") $ getRemoteConfigValue externaltypeField pc - (c', _encsetup) <- encryptionSetup c gc + (c', _encsetup) <- encryptionSetup ss c gc c'' <- if readonlyconfig then do diff --git a/Remote/GCrypt.hs b/Remote/GCrypt.hs index 180922783c..0369856b1e 100644 --- a/Remote/GCrypt.hs +++ b/Remote/GCrypt.hs @@ -220,12 +220,12 @@ unsupportedUrl :: a unsupportedUrl = giveup "unsupported repo url for gcrypt" gCryptSetup :: SetupStage -> Maybe UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, UUID) -gCryptSetup _ mu _ c gc = go $ fromProposedAccepted <$> M.lookup gitRepoField c +gCryptSetup ss mu _ c gc = go $ fromProposedAccepted <$> M.lookup gitRepoField c where remotename = fromJust (lookupName c) go Nothing = giveup "Specify gitrepo=" go (Just gitrepo) = do - (c', _encsetup) <- encryptionSetup c gc + (c', _encsetup) <- encryptionSetup ss c gc let url = Git.GCrypt.urlPrefix ++ gitrepo rs <- Annex.getGitRemotes diff --git a/Remote/GitLFS.hs b/Remote/GitLFS.hs index fde56b05ed..2ec2f429d7 100644 --- a/Remote/GitLFS.hs +++ b/Remote/GitLFS.hs @@ -148,7 +148,7 @@ mySetup :: SetupStage -> Maybe UUID 
-> Maybe CredPair -> RemoteConfig -> RemoteG mySetup ss mu _ c gc = do u <- maybe (liftIO genUUID) return mu - (c', _encsetup) <- encryptionSetup c gc + (c', _encsetup) <- encryptionSetup ss c gc pc <- either giveup return . parseRemoteConfig c' =<< configParser remote c' let failinitunlessforced msg = case ss of Init -> unlessM (Annex.getRead Annex.force) (giveup msg) diff --git a/Remote/Glacier.hs b/Remote/Glacier.hs index 4e32b88cf0..c112ceb7dc 100644 --- a/Remote/Glacier.hs +++ b/Remote/Glacier.hs @@ -124,7 +124,7 @@ glacierSetup ss mu mcreds c gc = do glacierSetup' ss u mcreds c gc glacierSetup' :: SetupStage -> UUID -> Maybe CredPair -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, UUID) glacierSetup' ss u mcreds c gc = do - (c', encsetup) <- encryptionSetup (c `M.union` defaults) gc + (c', encsetup) <- encryptionSetup ss (c `M.union` defaults) gc pc <- either giveup return . parseRemoteConfig c' =<< configParser remote c' c'' <- setRemoteCredPair ss encsetup pc gc (AWS.creds u) mcreds diff --git a/Remote/Helper/Encryptable.hs b/Remote/Helper/Encryptable.hs index 46ab018f7a..c18c5acf7f 100644 --- a/Remote/Helper/Encryptable.hs +++ b/Remote/Helper/Encryptable.hs @@ -163,8 +163,9 @@ parseMac (Just (Proposed s)) = case readMac s of - an encryption key, or not encrypt. An encrypted cipher is created, or is - updated to be accessible to an additional encryption key. Or the user - could opt to use a shared cipher, which is stored unencrypted. 
-} -encryptionSetup :: RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, EncryptionIsSetup) -encryptionSetup c gc = do +encryptionSetup :: SetupStage -> RemoteConfig -> RemoteGitConfig -> Annex (RemoteConfig, EncryptionIsSetup) +encryptionSetup setupstage c gc = do + checkallowedchange pc <- either giveup return $ parseEncryptionConfig c gpgcmd <- gpgCmd <$> Annex.getGitConfig maybe (genCipher pc gpgcmd) (updateCipher pc gpgcmd) (extractCipher pc) @@ -219,6 +220,23 @@ encryptionSetup c gc = do -- public-key encryption, hence we leave it on newer -- remotes (while being backward-compatible). (map Accepted ["keyid", "keyid+", "keyid-", "highRandomQuality"]) + oldpc = either (const Nothing) Just $ parseEncryptionConfig $ + case setupstage of + Init -> mempty + Enable oldc -> oldc + AutoEnable oldc -> oldc + checkallowedchange = case oldpc of + Nothing -> return () + Just oldpc' -> case extractCipher oldpc' of + Nothing -> req NoneEncryption + Just (EncryptedCipher _ Hybrid _) -> req HybridEncryption + Just (EncryptedCipher _ PubKey _) -> req PubKeyEncryption + Just (SharedCipher _) -> req SharedEncryption + Just (SharedPubKeyCipher _ _) -> req SharedPubKeyEncryption + where + req v + | encryption /= Right v = cannotchange (Diff truncated)
onlyencryptcreds=yes
initremote: When onlyencryptcreds=yes is used along with embedcreds=yes,
and encryption is enabled, only encrypt the embedded creds, without
encrypting the content of the special remote.
Useful for exporttree=yes/importtree=yes remotes.
Sponsored-by: Joshua Antonishen
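
One way to picture the feature is as a tag on the cipher that records what it may be used for. The sketch below reuses the `CipherPurpose` constructor names from the diff, but it is a simplified model under stated assumptions: `purpose` takes a plain `Bool` in place of the parsed remote config, and `credsCipher` and `contentCipher` are hypothetical helpers, not functions from git-annex.

```haskell
-- | Tags a cipher with what it may be used for. The constructor names
-- follow the diff; everything else here is a simplified stand-in.
data CipherPurpose t = CipherAllPurpose t | CipherOnlyCreds t
    deriving (Eq, Show)

-- | Choose the tag from the onlyencryptcreds setting
-- (a plain Bool here, which is an assumption of this sketch).
purpose :: Bool -> t -> CipherPurpose t
purpose onlyEncryptCreds
    | onlyEncryptCreds = CipherOnlyCreds
    | otherwise = CipherAllPurpose

-- | Cipher for embedded credentials: either kind of cipher applies.
-- (Hypothetical helper.)
credsCipher :: Maybe (CipherPurpose t) -> Maybe t
credsCipher (Just (CipherAllPurpose c)) = Just c
credsCipher (Just (CipherOnlyCreds c)) = Just c
credsCipher Nothing = Nothing

-- | Cipher for stored content: only an all-purpose cipher applies, so
-- a creds-only cipher leaves the content unencrypted.
-- (Hypothetical helper.)
contentCipher :: Maybe (CipherPurpose t) -> Maybe t
contentCipher (Just (CipherAllPurpose c)) = Just c
contentCipher _ = Nothing
```

Encoding the purpose in the type means any code that pattern matches on the cipher has to state which case it handles, which makes it harder to encrypt content by accident with a creds-only cipher.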
diff --git a/Annex/SpecialRemote/Config.hs b/Annex/SpecialRemote/Config.hs index 059a62f901..5f9d6db831 100644 --- a/Annex/SpecialRemote/Config.hs +++ b/Annex/SpecialRemote/Config.hs @@ -85,6 +85,9 @@ chunksizeField = Accepted "chunksize" embedCredsField :: RemoteConfigField embedCredsField = Accepted "embedcreds" +onlyEncryptCredsField :: RemoteConfigField +onlyEncryptCredsField = Accepted "onlyencryptcreds" + preferreddirField :: RemoteConfigField preferreddirField = Accepted "preferreddir" diff --git a/CHANGELOG b/CHANGELOG index a43bae3fd4..b7ce9fc91f 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -20,6 +20,10 @@ git-annex (10.20250722) UNRELEASED; urgency=medium branch, re-adjusting errors out, rather than losing that merge commit. * sync: When another branch has been manually merged into an adjusted branch, error out rather than only displaying a warning. + * initremote: When onlyencryptcreds=yes is used along with + embedcreds=yes, and encryption is enabled, only encrypt the embedded + creds, without encrypting the content of the special remote. Useful + for exporttree=yes/importtree=yes remotes. * Bump aws build dependency to 0.24.1. * stack.yaml: Update to lts-24.2. 
diff --git a/Creds.hs b/Creds.hs index 4e197d7001..2003f691e2 100644 --- a/Creds.hs +++ b/Creds.hs @@ -33,7 +33,7 @@ import Annex.Perms import Utility.FileMode import Crypto import Types.ProposedAccepted -import Remote.Helper.Encryptable (remoteCipher, remoteCipher', embedCreds, EncryptionIsSetup, extractCipher) +import Remote.Helper.Encryptable (remoteCipher, remoteCipher', CipherPurpose(..), embedCreds, EncryptionIsSetup, extractCipher) import Utility.Env (getEnv) import Utility.Base64 import qualified Utility.FileIO as F @@ -95,14 +95,19 @@ setRemoteCredPair' pc encsetup gc storage mcreds = case mcreds of where localcache creds = writeCacheCredPair creds storage - storeconfig creds key (Just cipher) = do + storeconfig creds key (Just (CipherAllPurpose cipher)) = + storeconfigcipher creds key cipher + storeconfig creds key (Just (CipherOnlyCreds cipher)) = + storeconfigcipher creds key cipher + storeconfig creds key Nothing = + storeconfig' key (Accepted (decodeBS $ toB64 $ encodeBS $ encodeCredPair creds)) + + storeconfigcipher creds key cipher = do cmd <- gpgCmd <$> Annex.getGitConfig s <- liftIO $ encrypt cmd (pc, gc) cipher (feedBytes $ L8.pack $ encodeCredPair creds) (readBytesStrictly return) storeconfig' key (Accepted (decodeBS (toB64 s))) - storeconfig creds key Nothing = - storeconfig' key (Accepted (decodeBS $ toB64 $ encodeBS $ encodeCredPair creds)) storeconfig' key val = return $ pc { parsedRemoteConfigMap = M.insert key (RemoteConfigValue val) (parsedRemoteConfigMap pc) @@ -127,14 +132,16 @@ getRemoteCredPair c gc storage = maybe fromcache (return . 
Just) =<< fromenv <|> getRemoteConfigValue key c case (getval, mcipher) of (Nothing, _) -> return Nothing - (Just enccreds, Just (cipher, storablecipher)) -> - fromenccreds (encodeBS enccreds) cipher storablecipher + (Just enccreds, Just ((CipherAllPurpose cipher, storablecipher))) -> + fromenccreds enccreds cipher storablecipher + (Just enccreds, Just ((CipherOnlyCreds cipher, storablecipher))) -> + fromenccreds enccreds cipher storablecipher (Just bcreds, Nothing) -> fromcreds $ decodeBS $ fromB64 $ encodeBS bcreds fromenccreds enccreds cipher storablecipher = do cmd <- gpgCmd <$> Annex.getGitConfig mcreds <- liftIO $ catchMaybeIO $ decrypt cmd (c, gc) cipher - (feedBytes $ L8.fromStrict $ fromB64 enccreds) + (feedBytes $ L8.fromStrict $ fromB64 $ encodeBS enccreds) (readBytesStrictly $ return . S8.unpack) case mcreds of Just creds -> fromcreds creds @@ -145,7 +152,7 @@ getRemoteCredPair c gc storage = maybe fromcache (return . Just) =<< fromenv case storablecipher of SharedCipher {} -> showLongNote "gpg error above was caused by an old git-annex bug in credentials storage. Working around it.." _ -> giveup "*** Insecure credentials storage detected for this remote! 
See https://git-annex.branchable.com/upgrades/insecure_embedded_creds/" - fromcreds $ decodeBS $ fromB64 enccreds + fromcreds $ decodeBS $ fromB64 $ encodeBS enccreds fromcreds creds = case decodeCredPair creds of Just credpair -> do writeCacheCredPair credpair storage diff --git a/Remote/Helper/Encryptable.hs b/Remote/Helper/Encryptable.hs index 33eb5b3837..46ab018f7a 100644 --- a/Remote/Helper/Encryptable.hs +++ b/Remote/Helper/Encryptable.hs @@ -15,6 +15,7 @@ module Remote.Helper.Encryptable ( encryptionConfigParsers, parseEncryptionConfig, parseEncryptionMethod, + CipherPurpose(..), remoteCipher, remoteCipher', embedCreds, @@ -63,6 +64,8 @@ encryptionConfigParsers = , optionalStringParser pubkeysField HiddenField , yesNoParser embedCredsField Nothing (FieldDesc "embed credentials into git repository") + , yesNoParser onlyEncryptCredsField Nothing + (FieldDesc "only encrypt embedded credentials, not annexed files") , macFieldParser , optionalStringParser (Accepted "keyid") (FieldDesc "gpg key id") @@ -217,12 +220,14 @@ encryptionSetup c gc = do -- remotes (while being backward-compatible). (map Accepted ["keyid", "keyid+", "keyid-", "highRandomQuality"]) -remoteCipher :: ParsedRemoteConfig -> RemoteGitConfig -> Annex (Maybe Cipher) -remoteCipher c gc = fmap fst <$> remoteCipher' c gc +data CipherPurpose t = CipherAllPurpose t | CipherOnlyCreds t {- Gets encryption Cipher. The decrypted Ciphers are cached in the Annex - state. 
-} -remoteCipher' :: ParsedRemoteConfig -> RemoteGitConfig -> Annex (Maybe (Cipher, StorableCipher)) +remoteCipher :: ParsedRemoteConfig -> RemoteGitConfig -> Annex (Maybe (CipherPurpose Cipher)) +remoteCipher c gc = fmap fst <$> remoteCipher' c gc + +remoteCipher' :: ParsedRemoteConfig -> RemoteGitConfig -> Annex (Maybe (CipherPurpose Cipher, StorableCipher)) remoteCipher' c gc = case extractCipher c of Nothing -> return Nothing Just encipher -> do @@ -230,7 +235,7 @@ remoteCipher' c gc = case extractCipher c of cachedciper <- liftIO $ atomically $ M.lookup encipher <$> readTMVar cachev case cachedciper of - Just cipher -> return $ Just (cipher, encipher) + Just cipher -> return $ Just (purpose cipher, encipher) -- Not cached; decrypt it, making sure -- to only decrypt one at a time. Avoids -- prompting for decrypting the same thing twice @@ -245,7 +250,10 @@ remoteCipher' c gc = case extractCipher c of cipher <- liftIO $ decryptCipher gpgcmd (c, gc) encipher liftIO $ atomically $ putTMVar cachev $ M.insert encipher cipher cache - return $ Just (cipher, encipher) + return $ Just (purpose cipher, encipher) + purpose + | onlyEncryptCreds c = CipherOnlyCreds + | otherwise = CipherAllPurpose {- Checks if the remote's config allows storing creds in the remote's config. - @@ -262,11 +270,19 @@ embedCreds c = case getRemoteConfigValue embedCredsField c of (Just (_ :: String), Just (_ :: String)) -> True _ -> False -{- Gets encryption Cipher, and key encryptor. -} +onlyEncryptCreds :: ParsedRemoteConfig -> Bool +onlyEncryptCreds c = case getRemoteConfigValue onlyEncryptCredsField c of + Just v -> v + Nothing -> False + +{- Gets key data encryption Cipher, and key encryptor. 
-} cipherKey :: ParsedRemoteConfig -> RemoteGitConfig -> Annex (Maybe (Cipher, EncKey)) -cipherKey c gc = fmap make <$> remoteCipher c gc +cipherKey c gc = go <$> remoteCipher c gc where - make ciphertext = (ciphertext, encryptKey mac ciphertext) + go (Just (CipherAllPurpose ciphertext)) = + Just (ciphertext, encryptKey mac ciphertext) + go (Just (CipherOnlyCreds _)) = Nothing + go Nothing = Nothing mac = fromMaybe defaultMac $ getRemoteConfigValue macField c {- Stores an StorableCipher in a remote's configuration. -} @@ -297,7 +313,7 @@ extractCipher c = case (getRemoteConfigValue cipherField c, readkeys = KeyIds . splitc ',' isEncrypted :: ParsedRemoteConfig -> Bool -isEncrypted = isJust . extractCipher +isEncrypted c = isJust (extractCipher c) && not (onlyEncryptCreds c) -- Check if encryption is enabled. This can be done before encryption -- is fully set up yet, so the cipher might not be present yet. @@ -305,12 +321,16 @@ encryptionIsEnabled :: ParsedRemoteConfig -> Bool encryptionIsEnabled c = case getRemoteConfigValue encryptionField c of Nothing -> False Just NoneEncryption -> False - Just _ -> True + Just _ -> not (onlyEncryptCreds c) describeEncryption :: ParsedRemoteConfig -> String describeEncryption c = case extractCipher c of Nothing -> "none" - Just cip -> nameCipher cip ++ " (" ++ describeCipher cip ++ ")" + Just cip + | onlyEncryptCreds c -> "creds only; " ++ desc cip + | otherwise -> desc cip (Diff truncated)
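The `CipherPurpose` wrapper introduced in the diff above can be illustrated with a minimal, standalone sketch. `Cipher` here is a hypothetical stand-in for git-annex's real type, and `credsCipher` and `contentCipher` are illustrative names that do not appear in the source; only the shape of `CipherPurpose` and the "only creds means no content cipher" rule are taken from the diff.

```haskell
-- Hypothetical stand-in; the real Cipher type lives in git-annex's Crypto module.
newtype Cipher = Cipher String
    deriving (Show, Eq)

-- Same shape as the data type added in the diff.
data CipherPurpose t = CipherAllPurpose t | CipherOnlyCreds t
    deriving (Show, Eq)

-- Credentials are encrypted with the cipher whatever its purpose is.
credsCipher :: CipherPurpose t -> t
credsCipher (CipherAllPurpose c) = c
credsCipher (CipherOnlyCreds c) = c

-- Annexed content only gets a cipher when the cipher is all-purpose,
-- mirroring the case analysis in cipherKey.
contentCipher :: Maybe (CipherPurpose t) -> Maybe t
contentCipher (Just (CipherAllPurpose c)) = Just c
contentCipher (Just (CipherOnlyCreds _)) = Nothing
contentCipher Nothing = Nothing
```

Tagging the cipher with its purpose, rather than passing a separate flag, means every caller has to pattern match and so has to decide explicitly what an only-creds cipher means for it.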
bug
diff --git a/doc/bugs/tahoe_special_remote_needs_updating.mdwn b/doc/bugs/tahoe_special_remote_needs_updating.mdwn new file mode 100644 index 0000000000..e7b1aaf831 --- /dev/null +++ b/doc/bugs/tahoe_special_remote_needs_updating.mdwn @@ -0,0 +1,15 @@ +After `apt-get install tahoe-lafs` installed version 1.20.0-6, this fails + + TAHOE_FURL=foo git-annex initremote t type=tahoe embedcreds=yes + + /usr/bin/tahoe: Unknown command: start + +The tahoe man page says: + + CONTROLLING NODES + In the past, the 'tahoe' command offered service watching (with start, restart, stop commands), but this was not very portable + and has been deprecated. + + COMMANDS + run Run a node without daemonizing. This is the only command for running nodes. +
bug
diff --git a/doc/bugs/possible_to_enable_encryption_for_exporttree_remote.mdwn b/doc/bugs/possible_to_enable_encryption_for_exporttree_remote.mdwn new file mode 100644 index 0000000000..7976058550 --- /dev/null +++ b/doc/bugs/possible_to_enable_encryption_for_exporttree_remote.mdwn @@ -0,0 +1,10 @@ +An exporttree=yes remote cannot be initremoted with encryption enabled. +However, it is possible to use enableremote after the fact to enable +encryption for such a remote: + + # git-annex initremote d type=directory exporttree=yes importtree=yes encryption=none directory=../d + # git-annex enableremote d directory=../d encryption=shared + enableremote d (encryption setup) (encryption key stored in git repository) ok + (recording state in git...) + +This config change should not be allowed. --[[Joey]]
comment
diff --git a/doc/todo/encrypt_only_the_credentials/comment_5_dc9c94892b4f8a7d072e6dc036adc05a._comment b/doc/todo/encrypt_only_the_credentials/comment_5_dc9c94892b4f8a7d072e6dc036adc05a._comment new file mode 100644 index 0000000000..a86867f989 --- /dev/null +++ b/doc/todo/encrypt_only_the_credentials/comment_5_dc9c94892b4f8a7d072e6dc036adc05a._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-08-20T17:44:07Z" + content=""" +I think I was assuming that encryption=onlycreds would use the same scheme as +encryption=hybrid, so new gpg keys can later be given access to the creds. + +It might be possible that someone would want the equivilant of +encryption=pubkey instead. (encryption=sharedpubkey is the same as +encryption=pubkey as far as encryption of creds goes). + +In future there might be some other, better encryption scheme that might be +desirable to use only for creds. Eg, something other than gpg.. + +An alternative to support such would be to use: + + encryption=<whatever> embedcreds=yes onlyencryptcreds=yes +"""]]
fixed
diff --git a/doc/bugs/Re-Adjust_Loses_Commits.mdwn b/doc/bugs/Re-Adjust_Loses_Commits.mdwn index bc0f236553..c3c75089c5 100644 --- a/doc/bugs/Re-Adjust_Loses_Commits.mdwn +++ b/doc/bugs/Re-Adjust_Loses_Commits.mdwn @@ -78,3 +78,5 @@ Git-annex is such a great piece of software, thanks for creating it. I use it to But now I started using some SW that cannot deal with symlinks, so I use an adjusted branch of main. Merging the new import branch into the adjusted branch leads to the described issue. Many thanks and have a great day! + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/Re-Adjust_Loses_Commits/comment_3_80f036b355f445776232fe4b104790f1._comment b/doc/bugs/Re-Adjust_Loses_Commits/comment_3_80f036b355f445776232fe4b104790f1._comment new file mode 100644 index 0000000000..7f75ec9550 --- /dev/null +++ b/doc/bugs/Re-Adjust_Loses_Commits/comment_3_80f036b355f445776232fe4b104790f1._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-08-20T17:08:24Z" + content=""" +I've made `git-annex sync` and also `git-annex adjust` error out in this +situation, rather than treating it as a warning. + +I also improved the error message a bit. And improved the documentation +to warn against getting into this situation. +"""]]
error out when another branch has been manually merged into the adjusted branch
This avoids losing the merge commit when re-running git-annex adjust in the
adjusted branch.
It also makes git-annex sync error out, rather than displaying a warning
and exiting successfully.
Sponsored-by: Leon Schuermann on Patreon
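The change from warning to erroring out can be sketched roughly as follows. This is a simplified illustration, not the git-annex code: `Commit`, `reverseCommitEither`, `reverseCommitOrFail`, `propagate` and `propagateReport` are hypothetical names, and `throwIO` with `ErrorCall` stands in for git-annex's `giveup`.

```haskell
import Control.Exception (ErrorCall(..), throwIO, try)

-- Hypothetical simplified commit: an id and the ids of its parents.
data Commit = Commit { commitId :: String, commitParents :: [String] }

-- Old style: the failure is a value, which the caller could downgrade
-- to a warning and then carry on.
reverseCommitEither :: Commit -> Either String String
reverseCommitEither c
    | length (commitParents c) > 1 =
        Left ("unable to propagate merge commit " ++ commitId c)
    | otherwise = Right ("reversed-" ++ commitId c)

-- New style: the failure is thrown, so the whole operation stops.
reverseCommitOrFail :: Commit -> IO String
reverseCommitOrFail c = either (throwIO . ErrorCall) return (reverseCommitEither c)

-- Propagate commits in order; aborts at the first merge commit.
propagate :: [Commit] -> IO [String]
propagate = mapM reverseCommitOrFail

-- Run the propagation and report the outcome instead of crashing.
propagateReport :: [Commit] -> IO (Either String [String])
propagateReport cs = do
    r <- try (propagate cs)
    return $ case r of
        Left (ErrorCall msg) -> Left msg
        Right shas -> Right shas
```

Because the exception is raised before any later step runs, nothing after the failing commit is rebased, which is the property the commit message relies on to avoid losing the merge commit.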
diff --git a/Annex/AdjustedBranch.hs b/Annex/AdjustedBranch.hs index 95bd8cfc34..4f72cbd979 100644 --- a/Annex/AdjustedBranch.hs +++ b/Annex/AdjustedBranch.hs @@ -524,15 +524,12 @@ propigateAdjustedCommits' propigateAdjustedCommits' warnwhendiverged origbranch adj _commitsprevented = inRepo (Git.Ref.sha basis) >>= \case Just origsha -> catCommit currbranch >>= \case - Just currcommit -> - newcommits >>= go origsha origsha False >>= \case - Left e -> do - warning (UnquotedString e) - return (Nothing, return ()) - Right newparent -> return - ( Just newparent - , rebase currcommit newparent - ) + Just currcommit -> do + newparent <- newcommits >>= go origsha origsha False + return + ( Just newparent + , rebase currcommit newparent + ) Nothing -> return (Nothing, return ()) Nothing -> do warning $ UnquotedString $ @@ -553,16 +550,14 @@ propigateAdjustedCommits' warnwhendiverged origbranch adj _commitsprevented = warning $ UnquotedString $ "Original branch " ++ fromRef origbranch ++ " has diverged from current adjusted branch " ++ fromRef currbranch _ -> inRepo $ Git.Branch.update' origbranch parent - return (Right parent) + return parent go origsha parent pastadjcommit (sha:l) = catCommit sha >>= \case Just c | hasAdjustedBranchCommitMessage c -> go origsha parent True l - | pastadjcommit -> - reverseAdjustedCommit parent adj (sha, c) origbranch - >>= \case - Left e -> return (Left e) - Right commit -> go origsha commit pastadjcommit l + | pastadjcommit -> do + commit <- reverseAdjustedCommit parent adj (sha, c) origbranch + go origsha commit pastadjcommit l _ -> go origsha parent pastadjcommit l rebase currcommit newparent = do -- Reuse the current adjusted tree, and reparent it @@ -582,10 +577,10 @@ rebaseOnTopMsg = "rebasing adjusted branch on top of updated original branch" - The commit message, and the author and committer metadata are - copied over from the basiscommit. However, any gpg signature - will be lost, and any other headers are not copied either. 
-} -reverseAdjustedCommit :: Sha -> Adjustment -> (Sha, Commit) -> OrigBranch -> Annex (Either String Sha) +reverseAdjustedCommit :: Sha -> Adjustment -> (Sha, Commit) -> OrigBranch -> Annex Sha reverseAdjustedCommit commitparent adj (csha, basiscommit) origbranch - | length (commitParent basiscommit) > 1 = return $ - Left $ "unable to propagate merge commit " ++ show csha ++ " back to " ++ show origbranch + | length (commitParent basiscommit) > 1 = giveup $ + "unable to propagate merge commit " ++ show csha ++ " back to " ++ show origbranch | otherwise = do cmode <- annexCommitMode <$> Annex.getGitConfig treesha <- reverseAdjustedTree commitparent adj csha @@ -595,7 +590,7 @@ reverseAdjustedCommit commitparent adj (csha, basiscommit) origbranch Git.Branch.commitTree cmode [commitMessage basiscommit] [commitparent] treesha - return (Right revadjcommit) + return revadjcommit {- Adjusts the tree of the basis, changing only the files that the - commit changed, and reverse adjusting those changes. diff --git a/CHANGELOG b/CHANGELOG index adba28c4d4..a43bae3fd4 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -16,6 +16,10 @@ git-annex (10.20250722) UNRELEASED; urgency=medium and display. * Improve behavior when there are special remotes configured with autoenable=yes with names that conflict with other remotes. + * adjust: When another branch has been manually merged into the adjusted + branch, re-adjusting errors out, rather than losing that merge commit. + * sync: When another branch has been manually merged into an adjusted + branch, error out rather than only displaying a warning. * Bump aws build dependency to 0.24.1. * stack.yaml: Update to lts-24.2. 
diff --git a/doc/bugs/Re-Adjust_Loses_Commits/comment_2_5b9c6c1966f967ff417c2acc39cfada4._comment b/doc/bugs/Re-Adjust_Loses_Commits/comment_2_5b9c6c1966f967ff417c2acc39cfada4._comment new file mode 100644 index 0000000000..b3ff29c440 --- /dev/null +++ b/doc/bugs/Re-Adjust_Loses_Commits/comment_2_5b9c6c1966f967ff417c2acc39cfada4._comment @@ -0,0 +1,37 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-08-20T16:34:25Z" + content=""" +Thanks for bumping this. It was in my backlog. I've taken a look at it now. + +Note that you can use the reflog to get back to the missing commits. + +The [[git-annex-adjust]] warns about merging into an adjusted branch. And +suggests to use `git-annex merge` to merge a branch into an adjusted +branch. Which avoids this problem. + +Probably the best thing for it to do in this situation is to fail in a way +that leaves the adjusted branch as-is. The user can then address the +problem, eg by resetting the adjusted branch to a point before the merge +and doing the merge some other way. + +It would be difficult to handle propagating a merge commit back to the +original branch. Usually when on an adjusted branch, any commit of annexed +files can be assumed to have the adjustment (eg unlocking) applied to the +files. And so reversing the adjustment will yield the desired state (eg +locked files). But a merge commit may not be of another adjusted branch, +it could be a non-adjusted branch. Or it could be a branch with a different +adjustment applied to it. Reversing the adjustment would then do the wrong +thing. + +Consider for example, if the --unlock adjustment were used. But then a +branch adjusted with --hide-missing were merged in. This is basically +indistingushable from merging in a branch where some unwanted annexed file +is removed. + +Also, it looks at the diff of changes made in a commit to know which +annexed files were changed and reverse adjusts those files. 
In a merge +commit, it's not clear which of the multiple parents it should diff +against. +"""]] diff --git a/doc/git-annex-adjust.mdwn b/doc/git-annex-adjust.mdwn index fd573ec904..55f7f646e7 100644 --- a/doc/git-annex-adjust.mdwn +++ b/doc/git-annex-adjust.mdwn @@ -21,7 +21,7 @@ to a public branch with commands like `git-annex unlock`. While in the adjusted branch, you can use git-annex and git commands as usual. Any commits that you make will initially only be made to the -adjusted branch. +adjusted branch. To propagate commits from the adjusted branch back to the original branch, and to other repositories, as well as to merge in changes from other @@ -31,8 +31,9 @@ made by this command. When in an adjusted branch, using `git merge otherbranch` is often not ideal, because merging a non-adjusted branch may lead to unnecessary -merge conflicts, or add files in non-adjusted form. To avoid those -problems, use `git annex merge otherbranch`. +merge conflicts, or add files in non-adjusted form. And such merges +cannot be propagated from the adjusted branch back to the original branch. +To avoid those problems, use `git annex merge otherbranch`. Re-running this command with the same options while inside the adjusted branch will update the adjusted branch
Added a comment: Reproduce Issue
diff --git a/doc/bugs/Re-Adjust_Loses_Commits/comment_1_63c5f829b53c60355da2be974bdc4f1d._comment b/doc/bugs/Re-Adjust_Loses_Commits/comment_1_63c5f829b53c60355da2be974bdc4f1d._comment new file mode 100644 index 0000000000..37d04a19a1 --- /dev/null +++ b/doc/bugs/Re-Adjust_Loses_Commits/comment_1_63c5f829b53c60355da2be974bdc4f1d._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jcjgraf" + avatar="http://cdn.libravatar.org/avatar/9dda752f83ac44906fefbadb35e8a6ac" + subject="Reproduce Issue" + date="2025-08-18T19:02:34Z" + content=""" +Bump; Are there any difficulties reproducing this issue? Let me know if I can provide any more information +"""]]
Added a comment: encryption=credsonly
diff --git a/doc/todo/encrypt_only_the_credentials/comment_4_50799c026dfe44aa2d447596e351f61a._comment b/doc/todo/encrypt_only_the_credentials/comment_4_50799c026dfe44aa2d447596e351f61a._comment new file mode 100644 index 0000000000..eff2b678de --- /dev/null +++ b/doc/todo/encrypt_only_the_credentials/comment_4_50799c026dfe44aa2d447596e351f61a._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="stv0g" + avatar="http://cdn.libravatar.org/avatar/6faa6cc783a165b25fc1c8f3154ba449" + subject="encryption=credsonly" + date="2025-08-18T17:07:40Z" + content=""" +Hi Joey, + +> I agree it would make sense to have some way to embedcreds without encrypting content stored on the remote. +> I suppose one way to express it is as encryption=onlycreds embedcreds=yes with one or more keyids. + +I am also in need of the `encryption=credsonly` option for the LTO tape special remote on which I am currently working. + +LTO tape drives provide hardware-based AES encryption which I would like to use. However, to enable this HW-accellerated encryption, I need to initialize the tape drive with an appropriate key, which I would like to store in the annex using credentials. +"""]]
bug
diff --git a/doc/bugs/bittorrent_downloading_torrent_file_is_not_concurrency_safe.mdwn b/doc/bugs/bittorrent_downloading_torrent_file_is_not_concurrency_safe.mdwn new file mode 100644 index 0000000000..05b21109ac --- /dev/null +++ b/doc/bugs/bittorrent_downloading_torrent_file_is_not_concurrency_safe.mdwn @@ -0,0 +1,3 @@ +`git-annex get -J10` of a bunch of files that were all addurled from the +same torrent will fail. It uses the same temp file for the torrent file +that each thread tries to download, resulting in "file does not exist" errors. --[[Joey]]
update
diff --git a/doc/todo/finish_sync_content_transition.mdwn b/doc/todo/finish_sync_content_transition.mdwn index 55994a2915..f7a552e939 100644 --- a/doc/todo/finish_sync_content_transition.mdwn +++ b/doc/todo/finish_sync_content_transition.mdwn @@ -12,3 +12,7 @@ A warning was added in August 2023 when it's run in a way that will change behavior. It would be good to wait until all git-annex users have gotten the version with the warning, and used it for a while, before finishing the transition. + +> Using the August 2025 debian stable release that included the warning as the +> start point, I suggest September 2026 as the transition end date. +> --[[Joey]]
warn and refuse to autoenable a special remote when name is in use
Improve behavior when there are special remotes configured with
autoenable=yes with names that conflict with other remotes.
The use of remoteList' is to avoid using the cached remote list in the case
where there are two special remotes both configured to autoenable and both
with the same name. Once the 1st is autoenabled, this makes it reload the
remote list and so see the 1st, and so refuse to autoenable the second.
This adds a little bit of overhead, but it should be sufficiently small not
to need optimising.
Sponsored-by: Dartmouth College's OpenNeuro project
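The reasoning about re-checking the remote list can be sketched in isolation. The sketch assumes a simplified `Remote` record and a `Decision` type that are not part of git-annex, and it uses `isControl` as a stand-in for the `safeOutput` check; only the order of the checks follows the commit.

```haskell
import Data.Char (isControl)

-- Hypothetical, simplified view of a configured remote.
data Remote = Remote { remoteName :: String, remoteUUID :: String }

data Decision = Enable | RefuseNameInUse | RefuseUnsafeName
    deriving (Show, Eq)

-- Decide whether a special remote with this UUID and name may be
-- auto-enabled, given the current list of remotes.
checkCanEnable :: [Remote] -> String -> String -> Decision
checkCanEnable remotes uuid name
    | any isControl name = RefuseUnsafeName
    | otherwise = case filter ((== name) . remoteName) remotes of
        (r:_) | remoteUUID r /= uuid -> RefuseNameInUse
        _ -> Enable

-- Process candidates (uuid, name) one at a time, checking each against
-- the remotes enabled so far rather than against a stale snapshot.
autoEnableAll :: [Remote] -> [(String, String)] -> [(String, Decision)]
autoEnableAll existing candidates =
    reverse (snd (foldl step (existing, []) candidates))
  where
    step (rs, acc) (uuid, name) =
        let d = checkCanEnable rs uuid name
            rs' = if d == Enable then Remote name uuid : rs else rs
        in (rs', (name, d) : acc)
```

Threading the updated list through the fold plays the role that the uncached `remoteList'` call plays in the commit: the second candidate with the same name sees the first one and is refused.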
diff --git a/Annex/SpecialRemote.hs b/Annex/SpecialRemote.hs index 2b23b06b5d..d3b1215b63 100644 --- a/Annex/SpecialRemote.hs +++ b/Annex/SpecialRemote.hs @@ -96,12 +96,7 @@ autoEnable = do Just (Sameas u') -> u' Nothing -> cu case (lookupName c, findType c) of - -- Avoid auto-enabling when the name contains a - -- control character, because git does not avoid - -- displaying control characters in the name of a - -- remote, and an attacker could leverage - -- autoenabling it as part of an attack. - (Just name, Right t) | safeOutput name == name -> do + (Just name, Right t) -> checkcanenable u name $ do showSideAction $ UnquotedString $ "Auto enabling special remote " ++ name dummycfg <- liftIO dummyRemoteGitConfig tryNonAsync (setup t (AutoEnable c) (Just u) Nothing c dummycfg) >>= \case @@ -117,6 +112,19 @@ autoEnable = do getcu r = fromMaybe (Remote.uuid r) (remoteAnnexConfigUUID (Remote.gitconfig r)) + checkcanenable u name cont + -- Avoid auto-enabling when the name contains a control + -- character, because git does not avoid displaying control + -- characters in the name of a remote, and an attacker could + -- leverage autoenabling it as part of an attack. + | safeOutput name /= name = return () + | otherwise = do + rs <- remoteList' False + case filter (\rmt -> Remote.name rmt == name) rs of + (rmt:_) | Remote.uuid rmt /= u -> warning $ + UnquotedString $ "Cannot auto enable special remote " + ++ name ++ " because there is another remote with the same name." + _ -> cont autoEnableable :: Annex (M.Map UUID RemoteConfig) autoEnableable = do diff --git a/CHANGELOG b/CHANGELOG index 16580febc0..adba28c4d4 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -14,6 +14,8 @@ git-annex (10.20250722) UNRELEASED; urgency=medium to be explicitly set. * info: Added --show option to pick which parts of the info to calculate and display. + * Improve behavior when there are special remotes configured with + autoenable=yes with names that conflict with other remotes. 
* Bump aws build dependency to 0.24.1. * stack.yaml: Update to lts-24.2. diff --git a/doc/bugs/multiple_records_in_remote.log_for_the_same_remote.mdwn b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote.mdwn index d7682da413..c79f9980f7 100644 --- a/doc/bugs/multiple_records_in_remote.log_for_the_same_remote.mdwn +++ b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote.mdwn @@ -71,3 +71,6 @@ ATM 10.20240430+git26-g5f61667f27-1~ndall+1 but I guess it is unrelated. [[!meta author=yoh]] [[!tag projects/openneuro]] [[!meta title="warn when two special remotes with the same name are both configured to autoenable, and avoid one overwriting the git config of the other"]] + +> I think I have explained what happened here, and the behavior change is +> enough to prevent the confusing behavior. [[done]] --[[Joey]] diff --git a/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_3_fe26531dcfa26b30225dc34a841153d1._comment b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_3_fe26531dcfa26b30225dc34a841153d1._comment index 46dbb73d94..e43e88f275 100644 --- a/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_3_fe26531dcfa26b30225dc34a841153d1._comment +++ b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_3_fe26531dcfa26b30225dc34a841153d1._comment @@ -3,8 +3,8 @@ subject="""comment 3""" date="2025-08-14T14:03:08Z" content=""" -Beyond a warning, it would be possible two autoenable both, but use a new -name for the second one. +Beyond a warning, it would be possible to autoenable both, but use a new +name for the second one. Although that could lead to its own problems. 
It occurs to me that it's also possible for autoenable of a special remote to overwrite/change the git config of a regular git remote that has the diff --git a/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_4_87330e2378759679069ab0fce1df3e92._comment b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_4_87330e2378759679069ab0fce1df3e92._comment new file mode 100644 index 0000000000..62cbd15350 --- /dev/null +++ b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_4_87330e2378759679069ab0fce1df3e92._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-08-14T14:58:11Z" + content=""" +I've added a warning in these cases, and it will avoid autoenabling a special +remote when there is already a remote with the same name. + +In the case of two special remotes with the same name that are both set to +autoenable, it's essentially random which gets enabled first and so "wins". + +Decided against autoenabling it with a different name, because: a) There's +the potential that the name it comes up with is actually the name of +another special remote that is also due to be autoenabled. b) It seems like +potentially confusing behavior for there to be a remote with a different +name than that usually used for a particular special remote. +"""]]
followup
diff --git a/doc/bugs/multiple_records_in_remote.log_for_the_same_remote.mdwn b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote.mdwn index a19b80b89e..d7682da413 100644 --- a/doc/bugs/multiple_records_in_remote.log_for_the_same_remote.mdwn +++ b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote.mdwn @@ -70,3 +70,4 @@ ATM 10.20240430+git26-g5f61667f27-1~ndall+1 but I guess it is unrelated. [[!meta author=yoh]] [[!tag projects/openneuro]] +[[!meta title="warn when two special remotes with the same name are both configured to autoenable, and avoid one overwriting the git config of the other"]] diff --git a/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_3_fe26531dcfa26b30225dc34a841153d1._comment b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_3_fe26531dcfa26b30225dc34a841153d1._comment new file mode 100644 index 0000000000..46dbb73d94 --- /dev/null +++ b/doc/bugs/multiple_records_in_remote.log_for_the_same_remote/comment_3_fe26531dcfa26b30225dc34a841153d1._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-08-14T14:03:08Z" + content=""" +Beyond a warning, it would be possible two autoenable both, but use a new +name for the second one. + +It occurs to me that it's also possible for autoenable of a special remote +to overwrite/change the git config of a regular git remote that has the +same name. This would be unlikely except in the case of one named "origin", +but it could happen, just needs the git remote to have been added before +git-annex inits. That seems like a problem that ought to be avoided too. +"""]]
close
diff --git a/doc/bugs/get_is_busy_doing_nothing.mdwn b/doc/bugs/get_is_busy_doing_nothing.mdwn index 3dd630ced5..2f71c53c21 100644 --- a/doc/bugs/get_is_busy_doing_nothing.mdwn +++ b/doc/bugs/get_is_busy_doing_nothing.mdwn @@ -81,3 +81,6 @@ Is there any diagnostic information I should collect to help troubleshooting the [[!tag projects/datalad]] [[!meta title="SQLite3 returned ErrorBusy while attempting to perform step: database is locked"]] + +> It seems that this bug has been taken as far as it can be toward a fix. +> So I'm going to call this [[done]] --[[Joey]]
info: Added --show option
To pick which parts of the info to calculate and display.
Sponsored-by: Dartmouth College's DANDI project
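How `--show` narrows the output can be sketched as a pure selection function. The `Stat` record below is a simplified, hypothetical version of the one in `Command/Info.hs` (its computation is plain `IO` here), and passing the `--fast` flag as a `Bool` argument is an assumption made to keep the example self-contained. It needs the `containers` package for `Data.Set`.

```haskell
import qualified Data.Set as S

-- Hypothetical simplified stat: a description plus the computation
-- that produces the value to display.
data Stat = Stat { statDesc :: String, statComp :: IO String }

-- With no --show values, fall back to the fast/slow split.
-- Otherwise keep only the stats whose description was requested,
-- so the computations of the others are never run.
selStats :: [String] -> Bool -> [Stat] -> [Stat] -> [Stat]
selStats wantedNames fast fastStats slowStats
    | null wantedNames =
        if fast then fastStats else fastStats ++ slowStats
    | otherwise =
        let wanted = S.fromList wantedNames
        in filter (\s -> S.member (statDesc s) wanted) (fastStats ++ slowStats)
```

Keeping the description outside the computation is what makes this possible: a stat can be selected or skipped by name without running it first.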
diff --git a/CHANGELOG b/CHANGELOG index 2d046bf671..8a340af35b 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -12,6 +12,8 @@ git-annex (10.20250722) UNRELEASED; urgency=medium * S3: When initremote is given the name of a bucket that already exists, automatically set datacenter to the right value, rather than needing it to be explicitly set. (This needs aws-0.23) + * info: Added --show option to pick which parts of the info to calculate + and display. * Bump aws build dependency to 0.24.1. * stack.yaml: Update to lts-24.2. diff --git a/Command/Info.hs b/Command/Info.hs index 3c0b7c030e..d2f690027a 100644 --- a/Command/Info.hs +++ b/Command/Info.hs @@ -1,6 +1,6 @@ {- git-annex command - - - Copyright 2011-2024 Joey Hess <id@joeyh.name> + - Copyright 2011-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -55,7 +55,10 @@ import qualified Command.Unused import qualified Utility.RawFilePath as R -- a named computation that produces a statistic -type Stat = StatState (Maybe (String, StatState String)) +data Stat = Stat + { statDesc :: String + , statComp :: StatState (Maybe (StatState String)) + } -- data about a set of keys data KeyInfo = KeyInfo @@ -116,6 +119,7 @@ data InfoOptions = InfoOptions , batchOption :: BatchMode , autoenableOption :: Bool , deadrepositoriesOption :: Bool + , showOption :: [String] } optParser :: CmdParamsDesc -> Parser InfoOptions @@ -134,6 +138,10 @@ optParser desc = InfoOptions ( long "dead-repositories" <> help "list repositories that have been marked as dead" ) + <*> many (strOption + ( long "show" <> metavar paramName + <> help "limit info output" + )) seek :: InfoOptions -> CommandSeek seek o = case batchOption o of @@ -158,7 +166,7 @@ globalInfo o = do u <- getUUID whenM ((==) DeadTrusted <$> lookupTrust u) $ earlyWarning "Warning: This repository is currently marked as dead." 
- stats <- selStats global_fast_stats global_slow_stats + stats <- selStats o global_fast_stats global_slow_stats showCustom "info" (SeekInput []) $ do evalStateT (mapM_ showStat stats) (emptyStatInfo o) return True @@ -211,7 +219,7 @@ noInfo s si msg = do dirInfo :: InfoOptions -> FilePath -> SeekInput -> Annex () dirInfo o dir si = showCustom (unwords ["info", dir]) si $ do - stats <- selStats + stats <- selStats o (tostats (dir_name:tree_fast_stats True)) (tostats tree_slow_stats) evalStateT (mapM_ showStat stats) =<< getDirStatInfo o dir @@ -226,7 +234,7 @@ treeishInfo o t si = do Nothing -> noInfo t si "not a directory or an annexed file or a treeish or a remote or a uuid" Just i -> showCustom (unwords ["info", t]) si $ do - stats <- selStats + stats <- selStats o (tostats (tree_name:tree_fast_stats False)) (tostats tree_slow_stats) evalStateT (mapM_ showStat stats) i @@ -247,7 +255,7 @@ remoteInfo :: InfoOptions -> Remote -> SeekInput -> Annex () remoteInfo o r si = showCustom (unwords ["info", Remote.name r]) si $ do i <- map (\(k, v) -> simpleStat k (pure v)) <$> Remote.getInfo r let u = Remote.uuid r - l <- selStats + l <- selStats o (uuid_fast_stats u ++ remote_fast_stats r ++ i) (uuid_slow_stats u) evalStateT (mapM_ showStat l) (emptyStatInfo o) @@ -255,16 +263,21 @@ remoteInfo o r si = showCustom (unwords ["info", Remote.name r]) si $ do uuidInfo :: InfoOptions -> UUID -> SeekInput -> Annex () uuidInfo o u si = showCustom (unwords ["info", fromUUID u]) si $ do - l <- selStats (uuid_fast_stats u) (uuid_slow_stats u) + l <- selStats o (uuid_fast_stats u) (uuid_slow_stats u) evalStateT (mapM_ showStat l) (emptyStatInfo o) return True -selStats :: [Stat] -> [Stat] -> Annex [Stat] -selStats fast_stats slow_stats = do - fast <- Annex.getRead Annex.fast - return $ if fast - then fast_stats - else fast_stats ++ slow_stats +selStats :: InfoOptions -> [Stat] -> [Stat] -> Annex [Stat] +selStats o fast_stats slow_stats + | null (showOption o) = do + fast <- 
Annex.getRead Annex.fast + return $ if fast + then fast_stats + else fast_stats ++ slow_stats + | otherwise = return $ + let wanted = S.fromList (showOption o) + in filter (\s -> S.member (statDesc s) wanted) + (fast_stats ++ slow_stats) {- Order is significant. Less expensive operations, and operations - that share data go together. @@ -337,14 +350,14 @@ uuid_slow_stats u = map (\s -> s u) ] stat :: String -> (String -> StatState String) -> Stat -stat desc a = return $ Just (desc, a desc) +stat desc a = Stat desc $ return $ Just $ a desc -- The json simply contains the same string that is displayed. simpleStat :: String -> StatState String -> Stat simpleStat desc getval = stat desc $ json id getval -nostat :: Stat -nostat = return Nothing +nostat :: String -> Stat +nostat desc = Stat desc $ return Nothing json :: ToJSON' j => (j -> String) -> StatState j -> String -> StatState String json fmt a desc = do @@ -356,10 +369,10 @@ nojson :: StatState String -> String -> StatState String nojson a _ = a showStat :: Stat -> StatState () -showStat s = maybe noop calc =<< s +showStat s = maybe noop calc =<< statComp s where - calc (desc, a) = do - (lift . showHeader . encodeBS) desc + calc a = do + (lift . showHeader . encodeBS) (statDesc s) lift . showRaw . encodeBS =<< a repo_list :: TrustLevel -> Stat @@ -557,15 +570,16 @@ numcopies_stats = stat "numcopies stats" $ json fmt $ reposizes_stats_tree :: Stat reposizes_stats_tree = reposizes_stats True "repositories containing these files" - =<< cachedRepoData + cachedRepoData reposizes_stats_global :: Stat reposizes_stats_global = reposizes_stats False "annex sizes of repositories" - . 
repoData =<< cachedAllRepoData + (repoData <$> cachedAllRepoData) -reposizes_stats :: Bool -> String -> M.Map UUID KeyInfo -> Stat -reposizes_stats count desc m = stat desc $ nojson $ do +reposizes_stats :: Bool -> String -> StatState (M.Map UUID KeyInfo) -> Stat +reposizes_stats count desc getm = stat desc $ nojson $ do sizer <- mkSizer + m <- getm let l = map (\(u, kd) -> (u, sizer storageUnits True (sizeKeys kd))) $ sortBy (flip (comparing (sizeKeys . snd))) $ M.toList m @@ -818,15 +832,16 @@ showSizeKeys d = do " unknown size" staleSize :: String -> (Git.Repo -> OsPath) -> Stat -staleSize label dirspec = go =<< lift (dirKeys dirspec) +staleSize label dirspec = Stat label $ do + keys <- lift $ dirKeys dirspec + onsize =<< sum <$> keysizes keys where - go [] = nostat - go keys = onsize =<< sum <$> keysizes keys - onsize 0 = nostat - onsize size = stat label $ - json (++ aside "clean up with git-annex unused") $ do + onsize 0 = return Nothing + onsize size = return $ Just $ + let val = do sizer <- mkSizer return $ sizer storageUnits False size + in json (++ aside "clean up with git-annex unused") val label keysizes keys = do dir <- lift $ fromRepo dirspec liftIO $ forM keys $ \k -> diff --git a/doc/git-annex-info.mdwn b/doc/git-annex-info.mdwn index 9b0e9715df..6e1cdd2dc6 100644 --- a/doc/git-annex-info.mdwn (Diff truncated)
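The selStats change in the diff above can be modelled in isolation. What follows is a minimal sketch, not the git-annex code: `Stat` is reduced to a description plus an `IO` computation (the real one runs in `StatState`), `InfoOptions` is a hypothetical one-field record, the `--fast` setting is passed in as a plain `Bool`, and `elem` stands in for the `Set` lookup. It shows why the description has to live outside the computation: stats can then be selected by name without running any of them.

```haskell
-- Simplified stand-in for Stat: the description can be read cheaply,
-- while the computation may be expensive. IO is an assumption here;
-- git-annex uses its own StatState monad.
data Stat = Stat
    { statDesc :: String
    , statComp :: IO (Maybe String)
    }

-- Hypothetical option record; only the field used below is modelled.
newtype InfoOptions = InfoOptions { showOption :: [String] }

-- With no --show values, keep the existing --fast behaviour. Otherwise
-- keep only the stats whose description was requested. No statComp is
-- run while selecting.
selStats :: Bool -> InfoOptions -> [Stat] -> [Stat] -> [Stat]
selStats fast o fastStats slowStats
    | null (showOption o) =
        if fast then fastStats else fastStats ++ slowStats
    | otherwise =
        filter (\s -> statDesc s `elem` showOption o) (fastStats ++ slowStats)

-- Example inputs. The slow stat would fail if its computation were
-- forced, which shows that selection never touches it.
exampleFast, exampleSlow :: [Stat]
exampleFast = [Stat "untrusted repositories" (return (Just "1"))]
exampleSlow = [Stat "annex sizes of repositories" (error "expensive")]

-- Descriptions of the stats that survive selection for one --show value.
selectedNames :: [String]
selectedNames = map statDesc $
    selStats False (InfoOptions ["annex sizes of repositories"])
        exampleFast exampleSlow
```

Evaluating `selectedNames` in GHCi returns only the requested description. Note that a `--show` value overrides `--fast` in this model, matching the guard order in the diff.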
comment
diff --git a/doc/todo/make___96__info_--in_REMOTE__96___report_only_for_the_REMOTE/comment_1_43a157f7e98b7714ef766fddda266704._comment b/doc/todo/make___96__info_--in_REMOTE__96___report_only_for_the_REMOTE/comment_1_43a157f7e98b7714ef766fddda266704._comment new file mode 100644 index 0000000000..6540cc968c --- /dev/null +++ b/doc/todo/make___96__info_--in_REMOTE__96___report_only_for_the_REMOTE/comment_1_43a157f7e98b7714ef766fddda266704._comment @@ -0,0 +1,31 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-08-13T19:34:26Z" + content=""" +The "annex sizes of repositories" table is indeed what you want. +Since about a year ago, git-annex maintains a running total of the sizes +of all repositories. So it can generally get that information very fast. + +In cases where it needs to do work to update the running total, +it has to replay changes to the location log, which is the expensive bit. +Updating the running totals for all repositories does not really impact +the speed. So focusing on the size of a specific remote doesn't seem useful. + +(Using `--in` to do it would also overload the meaning of that option in a +confusing way, bearing in mind that it can already be used with a command +like `git-annex info --in=here .`) + +I think that what is needed is a way to make git-annex info only generate +specific parts that you request, and skip the work to calculate other +parts. Eg: + + git-annex info --show "untrusted repositories" --show "annex sizes of repositories" + +The --fast option kind of does this, for things that can be generated +really quickly. The annex sizes of repositories does not quite fit in +--fast though, since it could take a long time in some edge cases to update +the running totals. + +I think this --show option would be easy to add to info. +"""]]
close
diff --git a/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__.mdwn b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__.mdwn index c658fb78ce..e3c7fcdc71 100644 --- a/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__.mdwn +++ b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__.mdwn @@ -28,5 +28,5 @@ AWS_ACCESS_KEY_ID=AKIA?GIMZPVVE??Y?NKH AWS_SECRET_ACCESS_KEY=/??U??TV?LH?L???KZJ ``` [[!meta author=yoh]] -[[!tag projects/dandi]] -[[!tag moreinfo]] + +> [[fixed|done]] --[[Joey]]
Bump aws build dependency to 0.24.1
That's the version in Debian stable now. And this removes a lot of ifdefs.
Also I'm pretty sure a recent commit broke building with older versions of
aws, although that could be fixed with sufficient testing.
diff --git a/CHANGELOG b/CHANGELOG index 6fb6f99d6b..2d046bf671 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -4,7 +4,6 @@ git-annex (10.20250722) UNRELEASED; urgency=medium provided by external commands git-annex-p2p-<netname> * Added git-remote-p2p-annex, which allows git pull and push to P2P networks provided by commands git-annex-p2p-<netname> - * stack.yaml: Update to lts-24.2. * S3: Default to signature=v4 when using an AWS endpoint, since some AWS regions need v4 and all support it. When host= is used to specify a different S3 host, the default remains signature=v2. @@ -13,6 +12,8 @@ git-annex (10.20250722) UNRELEASED; urgency=medium * S3: When initremote is given the name of a bucket that already exists, automatically set datacenter to the right value, rather than needing it to be explicitly set. (This needs aws-0.23) + * Bump aws build dependency to 0.24.1. + * stack.yaml: Update to lts-24.2. -- Joey Hess <id@joeyh.name> Wed, 30 Jul 2025 13:45:42 -0400 diff --git a/Remote/S3.hs b/Remote/S3.hs index 4da5ff3f36..cc8c6d7c83 100644 --- a/Remote/S3.hs +++ b/Remote/S3.hs @@ -438,7 +438,6 @@ retrieve hv r rs c info = fileRetriever' $ \f k p iv -> withS3Handle hv $ \case giveup "cannot download content" Right us -> unlessM (withUrlOptions Nothing $ downloadUrl False k p iv us f) $ giveup "failed to download content" - Left S3HandleAnonymousOldAws -> giveupS3HandleProblem S3HandleAnonymousOldAws (uuid r) retrieveHelper :: S3Info -> S3Handle -> (Either S3.Object S3VersionID) -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> Annex () retrieveHelper info h loc f p iv = retrieveHelper' h f p iv $ @@ -487,7 +486,6 @@ checkKey hv r rs c info k = withS3Handle hv $ \case let check u = withUrlOptions Nothing $ Url.checkBoth u (fromKey keySize k) anyM check us - Left S3HandleAnonymousOldAws -> giveupS3HandleProblem S3HandleAnonymousOldAws (uuid r) checkKeyHelper :: S3Info -> S3Handle -> (Either S3.Object S3VersionID) -> Annex Bool checkKeyHelper info h loc = 
checkKeyHelper' info h o limit @@ -528,7 +526,6 @@ retrieveExportS3 hv r info k loc f p = verifyKeyContentIncrementally AlwaysVerif withUrlOptions Nothing (Url.download' p iv (geturl exportloc) f) Nothing -> giveup $ needS3Creds (uuid r) - Left S3HandleAnonymousOldAws -> giveupS3HandleProblem S3HandleAnonymousOldAws (uuid r) where exportloc = bucketExportLocation info loc @@ -549,7 +546,6 @@ checkPresentExportS3 hv r info k loc = withS3Handle hv $ \case Just geturl -> withUrlOptions Nothing $ Url.checkBoth (geturl $ bucketExportLocation info loc) (fromKey keySize k) Nothing -> giveupS3HandleProblem S3HandleNeedCreds (uuid r) - Left S3HandleAnonymousOldAws -> giveupS3HandleProblem S3HandleAnonymousOldAws (uuid r) -- S3 has no move primitive; copy and delete. renameExportS3 :: S3HandleVar -> Remote -> RemoteStateHandle -> S3Info -> Key -> ExportLocation -> ExportLocation -> Annex (Maybe ()) @@ -787,7 +783,6 @@ checkPresentExportWithContentIdentifierS3 hv r info _k loc knowncids = getBucketLocation :: ParsedRemoteConfig -> RemoteGitConfig -> UUID -> Annex (Maybe S3.LocationConstraint) getBucketLocation c gc u = do -#if MIN_VERSION_aws(0,23,0) info <- extractS3Info c let info' = info { region = Nothing, host = Nothing } -- Force anonymous access, because this API call does not work @@ -797,9 +792,6 @@ getBucketLocation c gc u = do r <- liftIO $ tryNonAsync $ runResourceT $ sendS3Handle h (S3.getBucketLocation $ bucket info') return $ either (const Nothing) (Just . S3.gblrLocationConstraint) r -#else - return Nothing -#endif {- Generate the bucket if it does not already exist, including creating the - UUID file within the bucket. 
@@ -909,28 +901,19 @@ sendS3Handle h r = AWS.pureAws (hawscfg h) (hs3cfg h) (hmanager h) r type S3HandleVar = TVar (Either (Annex (Either S3HandleProblem S3Handle)) (Either S3HandleProblem S3Handle)) -data S3HandleProblem - = S3HandleNeedCreds - | S3HandleAnonymousOldAws +data S3HandleProblem = S3HandleNeedCreds giveupS3HandleProblem :: S3HandleProblem -> UUID -> Annex a giveupS3HandleProblem S3HandleNeedCreds u = do warning $ UnquotedString $ needS3Creds u giveup "No S3 credentials configured" -giveupS3HandleProblem S3HandleAnonymousOldAws _ = - giveup "This S3 special remote is configured with signature=anonymous, but git-annex is built with too old a version of the aws library to support that." {- Prepares a S3Handle for later use. Does not connect to S3 or do anything - else expensive. -} mkS3HandleVar :: Bool -> ParsedRemoteConfig -> RemoteGitConfig -> UUID -> Annex S3HandleVar mkS3HandleVar forceanonymous c gc u = liftIO $ newTVarIO $ Left $ if forceanonymous || isAnonymous c - then -#if MIN_VERSION_aws(0,23,0) - go =<< liftIO AWS.anonymousCredentials -#else - return (Left S3HandleAnonymousOldAws) -#endif + then go =<< liftIO AWS.anonymousCredentials else do mcreds <- getRemoteCredPair c gc (AWS.creds u) case mcreds of @@ -1011,11 +994,8 @@ s3Configuration _ua c = cfg | otherwise -> AWS.HTTP cfg = if usev4 $ getRemoteConfigValue signatureField c then (S3.s3v4 proto endpoint False S3.SignWithEffort) -#if MIN_VERSION_aws(0,24,0) { S3.s3Region = r } -#endif else (S3.s3 proto endpoint False) -#if MIN_VERSION_aws(0,24,0) { S3.s3Region = r } -- Use signature v4 for all AWS hosts by default, but don't use it by @@ -1029,7 +1009,6 @@ s3Configuration _ua c = cfg usev4 Nothing = False r = encodeBS <$> getRemoteConfigValue regionField c -#endif data S3Info = S3Info { bucket :: S3.Bucket @@ -1209,11 +1188,9 @@ s3Info :: ParsedRemoteConfig -> S3Info -> [(String, String)] s3Info c info = catMaybes [ Just ("bucket", fromMaybe "unknown" (getBucketName c)) , Just ("endpoint", 
decodeBS (S3.s3Endpoint s3c)) -#if MIN_VERSION_aws(0,24,0) , case S3.s3Region s3c of Nothing -> Nothing Just r -> Just ("region", decodeBS r) -#endif , Just ("port", show (S3.s3Port s3c)) , Just ("protocol", map toLower (show (S3.s3Protocol s3c))) , Just ("storage class", showstorageclass (getStorageClass c)) diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn index 935839a815..36c3b101a3 100644 --- a/doc/special_remotes/S3.mdwn +++ b/doc/special_remotes/S3.mdwn @@ -47,7 +47,7 @@ the S3 remote. When using Amazon S3, if the remote will be used for backup or archival, and so its files are Infrequently Accessed, `STANDARD_IA` is a - good choice to save money (requires a git-annex built with aws-0.13.0). + good choice to save money. If you have configured git-annex to preserve multiple [[copies]], also consider setting this to `ONEZONE_IA` to save even more money. @@ -56,7 +56,7 @@ the S3 remote. use the [[glacier]] special remote, rather than this one. When using Google Cloud Storage, to make a nearline bucket, set this to - `NEARLINE`. (Requires a git-annex built with aws-0.13.0) + `NEARLINE`. Note that changing the storage class of an existing S3 remote will affect new objects sent to the remote, but not objects already @@ -67,7 +67,6 @@ the S3 remote. * `region` - Specify the region to use. Only makes sense to use when you also set `host`. - (Requires a git-annex built with aws-0.24.) * `protocol` - Either "http" (the default) or "https". Setting protocol=https implies port=443. diff --git a/git-annex.cabal b/git-annex.cabal index 0da9a56845..bd678f1a5c 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -278,7 +278,7 @@ Executable git-annex tasty-quickcheck, tasty-rerun, ansi-terminal >= 0.9, - aws (>= 0.22.1), + aws (>= 0.24.1), DAV (>= 1.0), network (>= 3.0.0.0), network-bsd,
probe AWS datacenter
S3: When initremote is given the name of a bucket that already exists,
automatically set datacenter to the right value, rather than needing it to
be explicitly set.
This needs aws-0.23. But, initremote stores the datacenter value, so
a remote set up this way can be used with git-annex built with an older aws.
This is not done when signature=anonymous, because in that case,
using AWS.defaultRegion works fine for accessing buckets on other
datacenters.
It feels a bit round-about to need to do this probing. But without it,
the problem seems to be that, with a v4 signature, the location constraint
is included in the Authorization header. When that is the wrong location,
AWS S3 rejects it. I do wonder though if there is an easier way that I
am currently missing.
Sponsored-by: Dartmouth College's DANDI project
diff --git a/CHANGELOG b/CHANGELOG index 434d2c160c..6fb6f99d6b 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -10,6 +10,9 @@ git-annex (10.20250722) UNRELEASED; urgency=medium a different S3 host, the default remains signature=v2. * webapp: Support setting up S3 buckets in regions that need v4 signatures. + * S3: When initremote is given the name of a bucket that already exists, + automatically set datacenter to the right value, rather than needing it + to be explicitly set. (This needs aws-0.23) -- Joey Hess <id@joeyh.name> Wed, 30 Jul 2025 13:45:42 -0400 diff --git a/Remote/S3.hs b/Remote/S3.hs index 1fa8d44cd3..4da5ff3f36 100644 --- a/Remote/S3.hs +++ b/Remote/S3.hs @@ -193,7 +193,7 @@ gen r u rc gc rs = do c <- parsedRemoteConfig remote rc cst <- remoteCost gc c expensiveRemoteCost info <- extractS3Info c - hdl <- mkS3HandleVar c gc u + hdl <- mkS3HandleVar False c gc u magic <- liftIO initMagicMime return $ new c cst info hdl magic where @@ -287,7 +287,17 @@ s3Setup' ss u mcreds c gc =<< configParser remote c' c'' <- if isAnonymous pc then pure c' - else setRemoteCredPair ss encsetup pc gc (AWS.creds u) mcreds + else do + v <- setRemoteCredPair ss encsetup pc gc (AWS.creds u) mcreds + if M.member datacenterField c || M.member regionField c + then return v + -- Check if a bucket with this name + -- already exists, and if so, use + -- that location, rather than the + -- default datacenterField. + else getBucketLocation pc gc u >>= return . \case + Nothing -> v + Just loc -> M.insert datacenterField (Proposed $ T.unpack loc) v pc' <- either giveup return . 
parseRemoteConfig c'' =<< configParser remote c'' info <- extractS3Info pc' @@ -322,7 +332,7 @@ s3Setup' ss u mcreds c gc =<< configParser remote archiveconfig info <- extractS3Info pc' checkexportimportsafe pc' info - hdl <- mkS3HandleVar pc' gc u + hdl <- mkS3HandleVar False pc' gc u withS3HandleOrFail u hdl $ writeUUIDFile pc' u info use archiveconfig pc' info @@ -775,6 +785,22 @@ checkPresentExportWithContentIdentifierS3 hv r info _k loc knowncids = where o = T.pack $ bucketExportLocation info loc +getBucketLocation :: ParsedRemoteConfig -> RemoteGitConfig -> UUID -> Annex (Maybe S3.LocationConstraint) +getBucketLocation c gc u = do +#if MIN_VERSION_aws(0,23,0) + info <- extractS3Info c + let info' = info { region = Nothing, host = Nothing } + -- Force anonymous access, because this API call does not work + -- when used in an authenticated context. + hdl <- mkS3HandleVar True c gc u + withS3HandleOrFail u hdl $ \h -> do + r <- liftIO $ tryNonAsync $ runResourceT $ + sendS3Handle h (S3.getBucketLocation $ bucket info') + return $ either (const Nothing) (Just . S3.gblrLocationConstraint) r +#else + return Nothing +#endif + {- Generate the bucket if it does not already exist, including creating the - UUID file within the bucket. - @@ -786,7 +812,7 @@ genBucket :: ParsedRemoteConfig -> RemoteGitConfig -> UUID -> Annex () genBucket c gc u = do showAction "checking bucket" info <- extractS3Info c - hdl <- mkS3HandleVar c gc u + hdl <- mkS3HandleVar False c gc u withS3HandleOrFail u hdl $ \h -> go info h =<< checkUUIDFile c u info h where @@ -896,9 +922,9 @@ giveupS3HandleProblem S3HandleAnonymousOldAws _ = {- Prepares a S3Handle for later use. Does not connect to S3 or do anything - else expensive. 
-} -mkS3HandleVar :: ParsedRemoteConfig -> RemoteGitConfig -> UUID -> Annex S3HandleVar -mkS3HandleVar c gc u = liftIO $ newTVarIO $ Left $ - if isAnonymous c +mkS3HandleVar :: Bool -> ParsedRemoteConfig -> RemoteGitConfig -> UUID -> Annex S3HandleVar +mkS3HandleVar forceanonymous c gc u = liftIO $ newTVarIO $ Left $ + if forceanonymous || isAnonymous c then #if MIN_VERSION_aws(0,23,0) go =<< liftIO AWS.anonymousCredentials @@ -1380,7 +1406,7 @@ enableBucketVersioning ss info c gc u = do where enableversioning b = do showAction "checking bucket versioning" - hdl <- mkS3HandleVar c gc u + hdl <- mkS3HandleVar False c gc u let setversioning = S3.putBucketVersioning b S3.VersioningEnabled withS3HandleOrFail u hdl $ \h -> #if MIN_VERSION_aws(0,24,3) diff --git a/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_5_4f41ad125c1af26cb94bd40c61cfcd9f._comment b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_5_4f41ad125c1af26cb94bd40c61cfcd9f._comment new file mode 100644 index 0000000000..fc1f37bcb8 --- /dev/null +++ b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_5_4f41ad125c1af26cb94bd40c61cfcd9f._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-08-13T18:22:53Z" + content=""" +I got GetBucketLocation to work, although only when git-annex is built with +aws-0.23 or newer. + + joey@darkstar:~/tmp/a9>git-annex initremote s3-originnew type=S3 importtree=yes encryption=none autoenable=true bucket=dandiarchive fileprefix=zarr-checksums/2ac71edb-738c-40ac-bd8c-8ca985adaa12/ + initremote s3-originnew (checking bucket...) ok + (recording state in git...) +"""]] diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn index 96dfdb56d7..935839a815 100644 --- a/doc/special_remotes/S3.mdwn +++ b/doc/special_remotes/S3.mdwn @@ -35,11 +35,10 @@ the S3 remote. embedcreds without gpg encryption. 
* `datacenter` - Specifies which Amazon datacenter - to use for the bucket. Defaults to "US". Other values include "EU" - (which is EU/Ireland), "us-west-1", "us-west-2", "ap-southeast-1", - "ap-southeast-2", and "sa-east-1". See Amazon's documentation for a - complete list. Configuring this is equivilant to configuring both - `host` and `region`. + to use when creating a bucket. Defaults to "US". Other values include "EU" + (which is EU/Ireland), "us-west-1", "us-west-2", etc. See Amazon's + documentation for a complete list. Configuring this is equivilant to + configuring both `host` and `region`. * `storageclass` - Default is "STANDARD". Consult S3 provider documentation for pricing details and available
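The setup-time decision added to s3Setup' in the diff above can be reduced to a pure function. This is a sketch under stated assumptions: the remote config is an association list rather than git-annex's `RemoteConfig` map, and the result of the GetBucketLocation probe is passed in as a `Maybe String` (`Nothing` meaning the call failed or the bucket does not exist). The names `Config` and `chooseDatacenter` are hypothetical.

```haskell
import Data.Maybe (isJust)

-- Hypothetical stand-in for the remote configuration.
type Config = [(String, String)]

-- Decide what to record at initremote time.
--   * An explicit datacenter or region always wins; the probe result is
--     ignored (and, being lazy, need not even be computed).
--   * Otherwise a successful probe sets datacenter to the location of
--     the existing bucket.
--   * A failed probe leaves the config alone, so the default applies.
chooseDatacenter :: Config -> Maybe String -> Config
chooseDatacenter c probed
    | explicit = c
    | otherwise = case probed of
        Nothing -> c
        Just loc -> ("datacenter", loc) : c
  where
    explicit = isJust (lookup "datacenter" c) || isJust (lookup "region" c)
```

Because the chosen value ends up in the stored config, a remote set up this way carries its datacenter with it, which is the property the commit message relies on for builds against an older aws library.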
wishlist of faster/specific info for a remote
diff --git a/doc/todo/make___96__info_--in_REMOTE__96___report_only_for_the_REMOTE.mdwn b/doc/todo/make___96__info_--in_REMOTE__96___report_only_for_the_REMOTE.mdwn new file mode 100644 index 0000000000..8dac199517 --- /dev/null +++ b/doc/todo/make___96__info_--in_REMOTE__96___report_only_for_the_REMOTE.mdwn @@ -0,0 +1,40 @@ +I would like to discover how much data is stored in a special remote across repositories. But it might take annex many minutes to figure out other stats like size of the files in the tree etc, which I do not care. So I wondered if for e.g. + +``` +(venv-annex) dandi@drogon:/mnt/backup/dandi/dandisets$ git -C 000003 annex info --in dandi-dandisets-dropbox +trusted repositories: 0 +semitrusted repositories: 3 + 00000000-0000-0000-0000-000000000001 -- web + 00000000-0000-0000-0000-000000000002 -- bittorrent + b7fcf214-e492-4f2c-8789-708af9fd4656 -- dandi@drogon:/mnt/backup/dandi/dandisets/000003 [here] +untrusted repositories: 1 + 727f466f-60c3-4778-90b2-b2332856c2f8 -- [dandi-dandisets-dropbox] +transfers in progress: none +available local disk space: 24.88 terabytes (+100 megabytes reserved) +local annex keys: 101 +local annex size: 2.56 terabytes +annexed files in working tree: 101 +size of annexed files in working tree: 2.56 terabytes +combined annex size of all repositories: 7.68 terabytes +annex sizes of repositories: + 2.56 TB: 00000000-0000-0000-0000-000000000001 -- web + 2.56 TB: 727f466f-60c3-4778-90b2-b2332856c2f8 -- [dandi-dandisets-dropbox] + 2.56 TB: b7fcf214-e492-4f2c-8789-708af9fd4656 -- dandi@drogon:/mnt/backup/dandi/dandisets/000003 [here] +backend usage: + SHA256E: 101 +bloom filter size: 32 mebibytes (0% full) +``` + +I could just (quickly) get + +``` +untrusted repositories: + 727f466f-60c3-4778-90b2-b2332856c2f8 -- [dandi-dandisets-dropbox] +annex sizes of repositories: + 2.56 TB: 727f466f-60c3-4778-90b2-b2332856c2f8 -- [dandi-dandisets-dropbox] +``` + +with `--json` also correspondingly trimmed up. 
Or it could potentially be a different record output entirely, concentrating on that remote? + +[[!meta author=yoh]] +[[!tag projects/dandi]]
S3: Default to signature=v4 when using an AWS endpoint
* S3: Default to signature=v4 when using an AWS endpoint, since some
AWS regions need v4 and all support it. When host= is used to specify
a different S3 host, the default remains signature=v2.
* webapp: Support setting up S3 buckets in regions that need v4
signatures.
For the webapp, went ahead and added all current S3 regions
(except govcloud, which is not usable by everyone).
Sponsored-by: Dartmouth College's DANDI project
diff --git a/CHANGELOG b/CHANGELOG index 7c210b3bb9..434d2c160c 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -5,6 +5,11 @@ git-annex (10.20250722) UNRELEASED; urgency=medium * Added git-remote-p2p-annex, which allows git pull and push to P2P networks provided by commands git-annex-p2p-<netname> * stack.yaml: Update to lts-24.2. + * S3: Default to signature=v4 when using an AWS endpoint, since some + AWS regions need v4 and all support it. When host= is used to specify + a different S3 host, the default remains signature=v2. + * webapp: Support setting up S3 buckets in regions that need v4 + signatures. -- Joey Hess <id@joeyh.name> Wed, 30 Jul 2025 13:45:42 -0400 diff --git a/Remote/Helper/AWS.hs b/Remote/Helper/AWS.hs index 92608ee0a8..2c9d535a85 100644 --- a/Remote/Helper/AWS.hs +++ b/Remote/Helper/AWS.hs @@ -52,20 +52,40 @@ regionInfo service = map (\(t, r) -> (t, fromServiceRegion r)) $ concatMap (\(t, l) -> map (t,) l) regions where regions = - [ ("US East (N. Virginia)", [S3Region "US", GlacierRegion "us-east-1"]) - , ("US West (Oregon)", [BothRegion "us-west-2"]) + -- Based on the list at https://docs.aws.amazon.com/general/latest/gr/s3.html + [ ("US East (Ohio)", [S3Region "us-east-2"]) + , ("US East (N. Virginia)", [S3Region "US", GlacierRegion "us-east-1"]) , ("US West (N. 
California)", [BothRegion "us-west-1"]) - , ("EU (Ireland)", [S3Region "EU", GlacierRegion "eu-west-1"]) + , ("US West (Oregon)", [BothRegion "us-west-2"]) + , ("Africa (Cape Town)", [S3Region "af-south-1"]) + , ("Asia Pacific (Hong Kong)", [S3Region "ap-east-1"]) + , ("Asia Pacific (Hyderabad)", [S3Region "ap-south-2"]) + , ("Asia Pacific (Jakarta)", [S3Region "ap-southeast-3"]) + , ("Asia Pacific (Malaysia)", [S3Region "ap-southeast-5"]) + , ("Asia Pacific (Melbourne)", [S3Region "ap-southeast-4"]) + , ("Asia Pacific (Mumbai)", [S3Region "ap-south-1"]) + , ("Asia Pacific (Osaka)", [S3Region "ap-northeast-3"]) + , ("Asia Pacific (Seoul)", [S3Region "ap-northeast-2"]) , ("Asia Pacific (Singapore)", [S3Region "ap-southeast-1"]) - , ("Asia Pacific (Tokyo)", [BothRegion "ap-northeast-1"]) , ("Asia Pacific (Sydney)", [S3Region "ap-southeast-2"]) + , ("Asia Pacific (Taipei)", [S3Region "ap-east-2"]) + , ("Asia Pacific (Thailand)", [S3Region "ap-southeast-7"]) + , ("Asia Pacific (Tokyo)", [BothRegion "ap-northeast-1"]) + , ("Canada (Central)", [S3Region "ca-central-1"]) + , ("Canada West (Calgary)", [S3Region "ca-west-1"]) + , ("EU (Frankfurt)", [BothRegion "eu-central-1"]) + , ("EU (Ireland)", [S3Region "EU", GlacierRegion "eu-west-1"]) + , ("Europe (London)", [S3Region "eu-west-2"]) + , ("Europe (Milan)", [S3Region "eu-south-1"]) + , ("Europe (Paris)", [S3Region "eu-west-3"]) + , ("Europe (Spain)", [S3Region "eu-south-2"]) + , ("Europe (Stockholm)", [S3Region "eu-north-1"]) + , ("Europe (Zurich)", [S3Region "eu-central-2"]) + , ("Israel (Tel Aviv)", [S3Region "il-central-1"]) + , ("Mexico (Central)", [S3Region "mx-central-1"]) + , ("Middle East (Bahrain)", [S3Region "me-south-1"]) + , ("Middle East (UAE)", [S3Region "me-central-1"]) , ("South America (São Paulo)", [S3Region "sa-east-1"]) - -- These need signature V4 to be used, and currently v2 is - -- the default, so to add these would need other changes. 
- -- , ("EU (Frankfurt)", [BothRegion "eu-central-1"]) - -- , ("Asia Pacific (Seoul)", [S3Region "ap-northeast-2"]) - -- , ("Asia Pacific (Mumbai)", [S3Region "ap-south-1"]) - -- , ("US East (Ohio)", [S3Region "us-east-2"]) ] fromServiceRegion (BothRegion s) = s diff --git a/Remote/S3.hs b/Remote/S3.hs index b17b6c268a..1fa8d44cd3 100644 --- a/Remote/S3.hs +++ b/Remote/S3.hs @@ -163,11 +163,12 @@ signatureField = Accepted "signature" data SignatureVersion = SignatureVersion Int + | DefaultSignatureVersion | Anonymous signatureVersionParser :: RemoteConfigField -> FieldDesc -> RemoteConfigFieldParser signatureVersionParser f fd = - genParser go f (Just defver) fd + genParser go f (Just DefaultSignatureVersion) fd (Just (ValueDesc "v2 or v4 or anonymous")) where go "v2" = Just (SignatureVersion 2) @@ -175,8 +176,6 @@ signatureVersionParser f fd = go "anonymous" = Just Anonymous go _ = Nothing - defver = SignatureVersion 2 - isAnonymous :: ParsedRemoteConfig -> Bool isAnonymous c = case getRemoteConfigValue signatureField c of @@ -984,16 +983,25 @@ s3Configuration _ua c = cfg Nothing | port == 443 -> AWS.HTTPS | otherwise -> AWS.HTTP - cfg = case getRemoteConfigValue signatureField c of - Just (SignatureVersion 4) -> - (S3.s3v4 proto endpoint False S3.SignWithEffort) + cfg = if usev4 $ getRemoteConfigValue signatureField c + then (S3.s3v4 proto endpoint False S3.SignWithEffort) #if MIN_VERSION_aws(0,24,0) - { S3.s3Region = r } + { S3.s3Region = r } #endif - _ -> (S3.s3 proto endpoint False) + else (S3.s3 proto endpoint False) #if MIN_VERSION_aws(0,24,0) - { S3.s3Region = r } + { S3.s3Region = r } + -- Use signature v4 for all AWS hosts by default, but don't use it by + -- default for other S3 hosts, which may not support it. 
+ usev4 (Just DefaultSignatureVersion) + | h == AWS.s3DefaultHost = True + | otherwise = False + usev4 (Just (SignatureVersion 4)) = True + usev4 (Just (SignatureVersion _)) = False + usev4 (Just Anonymous) = False + usev4 Nothing = False + r = encodeBS <$> getRemoteConfigValue regionField c #endif diff --git a/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_4_d9a3fdfa34b780aafb5e5874fc8d98bc._comment b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_4_d9a3fdfa34b780aafb5e5874fc8d98bc._comment new file mode 100644 index 0000000000..c75dc2e098 --- /dev/null +++ b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_4_d9a3fdfa34b780aafb5e5874fc8d98bc._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-08-13T17:03:47Z" + content=""" +Made signature=v4 be used by default for AWS endpoints. Note that, +if you want a special remote to be able to be used by an older version +of git-annex, it would still make sense to explicitly specify signature=v4. +"""]] diff --git a/doc/special_remotes/S3.mdwn b/doc/special_remotes/S3.mdwn index 3684432f2a..96dfdb56d7 100644 --- a/doc/special_remotes/S3.mdwn +++ b/doc/special_remotes/S3.mdwn @@ -87,9 +87,8 @@ the S3 remote. indication that you need to use this. * `signature` - This controls the S3 signature version to use. - "v2" is currently the default, "v4" is needed to use some S3 services. - If you get some kind of authentication error, try "v4". - To access a S3 bucket anonymously, use "anonymous". + The default is "v4" when using Amazon S3, but "v2" when using other + hosts. To access a S3 bucket anonymously, use "anonymous". * `bucket` - S3 requires that buckets have a globally unique name, so by default, a bucket name is chosen based on the remote name
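The default picked by usev4 in the diff above can be modelled as a standalone function. This is a sketch, not the code in Remote/S3.hs: the host is passed in as a plain `String`, and `awsDefaultHost` is an assumed value standing in for `AWS.s3DefaultHost`.

```haskell
-- Mirrors the constructors introduced in the diff.
data SignatureVersion
    = SignatureVersion Int
    | DefaultSignatureVersion
    | Anonymous
    deriving (Eq, Show)

-- Assumption: the value of the default AWS S3 host name.
awsDefaultHost :: String
awsDefaultHost = "s3.amazonaws.com"

-- Use v4 when it was requested explicitly, or when nothing was requested
-- and the remote talks to the default AWS host. Other S3 hosts keep v2
-- by default, since they may not support v4.
useV4 :: String -> Maybe SignatureVersion -> Bool
useV4 host (Just DefaultSignatureVersion) = host == awsDefaultHost
useV4 _ (Just (SignatureVersion 4)) = True
useV4 _ _ = False
```

Keeping `DefaultSignatureVersion` as its own constructor, rather than defaulting the parser to `SignatureVersion 2`, is what lets the code tell "the user asked for v2" apart from "the user did not say".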
comment
diff --git a/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_3_6f35944c955cc9908450d126954e3839._comment b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_3_6f35944c955cc9908450d126954e3839._comment new file mode 100644 index 0000000000..bb9a65a9e8 --- /dev/null +++ b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_3_6f35944c955cc9908450d126954e3839._comment @@ -0,0 +1,31 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-08-13T16:23:48Z" + content=""" +I can replicate that behavior here with the command given, leaving off +"region=us-east-2", which is handy for +testing (needs AWS credentials in the environment). + +It would be possible for git-annex to use GetBucketLocation to determine +the location of an existing bucket, rather than requiring it to be manually +specified. I don't see much downside in making it do that. It would of +course fail for a new bucket, and it might be that some existing buckets +are not configured to allow that API call. Or another S3 server might not +support it. But as long as it falls back to current behavior it seems it +would only avoid a problem. + +As for determining which signature version to use, the problem is that +a S3 implementation might not support signature v4 yet. So using it by +default could break existing setups. It seems likely that there are plenty +of third party S3 servers out there that only support v2. +[According to AWS documentation](https://docs.aws.amazon.com/AmazonS3/latest/API/specify-signature-version.html), +every region supports v4 now, while some +regions do not support v2. So git-annex could have a special case for AWS +to use v4 by default. + +While AWS tried and failed to deprecate v2 entirely around 2019, +they ended up keeping it for those regions. That could still change I +suppose, and so it might be good for git-annex to proactively use v4 for +all AWS buckets. +"""]]
diff --git a/doc/bugs/exporttree_exports_plain_git_files.mdwn b/doc/bugs/exporttree_exports_plain_git_files.mdwn new file mode 100644 index 0000000000..3561413e08 --- /dev/null +++ b/doc/bugs/exporttree_exports_plain_git_files.mdwn @@ -0,0 +1,30 @@ +### Please describe the problem. + +### What steps will reproduce the problem? + + +git-annex initremote 2gt type=directory directory=/Volumes/2gt/photos encryption=none importtree=yes exporttree=yes + +git annex export HEAD -t 2gt + +.gitignore and .gitattributes (plain git, not annexed) make it to the export. + +Tried to limit this with (inbackend=SHA256E or inbackend=SHA512E) , did not help. + +### What version of git-annex are you using? On what operating system? + +10.20250721 , homebrew on Mac + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +
add https://pypi.org/project/git-annex-remote-internxt/
diff --git a/doc/special_remotes.mdwn b/doc/special_remotes.mdwn index ba483a9bf6..01b4567e0d 100644 --- a/doc/special_remotes.mdwn +++ b/doc/special_remotes.mdwn @@ -48,6 +48,7 @@ Here are specific instructions for using git-annex with various services: * [[Google Drive|tips/using_Google_Drive]] * [[hubiC|tips/using_Hubic]] * [[IMAP|forum/special_remote_for_IMAP]] +* [Internxt Drive](https://pypi.org/project/git-annex-remote-internxt/) * [[tips/Internet_Archive_via_S3]] * [[ipfs]] * [[Jottacloud|rclone]]
Added a comment: repodata is empty again 

diff --git a/doc/install/rpm_standalone/comment_6_bf6570dbd8ceafccf89cd13562429313._comment b/doc/install/rpm_standalone/comment_6_bf6570dbd8ceafccf89cd13562429313._comment new file mode 100644 index 0000000000..2ccc0b56a4 --- /dev/null +++ b/doc/install/rpm_standalone/comment_6_bf6570dbd8ceafccf89cd13562429313._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="evgeni" + avatar="http://cdn.libravatar.org/avatar/dc99bb7d8eb1b46c509f00925f486173" + subject="repodata is empty again :(" + date="2025-08-08T12:49:42Z" + content=""" +Hi, + +seems the repordata folder is empty again and no release is published :( + +Evgeni +"""]]
Add YouTube link
diff --git a/doc/videos/tdf4-life-in-git-annex.mdwn b/doc/videos/tdf4-life-in-git-annex.mdwn index b46ba97547..d58c6ec85f 100644 --- a/doc/videos/tdf4-life-in-git-annex.mdwn +++ b/doc/videos/tdf4-life-in-git-annex.mdwn @@ -1,6 +1,7 @@ Yann Büchau's (German) talk „Das Leben in Git (Annex)” („Life in Git (Annex)”) at the [Tage der Digitalen Freiheit](https://tdf.cttue.de/) in Tübingen, Germany: - [on media.ccc.de](https://media.ccc.de/v/tdf4-26-das-leben-in-git-annex-) +- [on YouTube](https://www.youtube.com/watch?v=LS2J-4LmRd0) [[!meta date="26 Jul 2025"]] [[!meta title="'life in git annex' talk by Yann Büchau at TdF2025"]]
Updated the S3-special-remote example and prose based on recent B2 experience
diff --git a/doc/tips/using_Backblaze_B2.mdwn b/doc/tips/using_Backblaze_B2.mdwn index 2669876cc6..059be63ee7 100644 --- a/doc/tips/using_Backblaze_B2.mdwn +++ b/doc/tips/using_Backblaze_B2.mdwn @@ -13,10 +13,21 @@ choices: Here is how to set up the special remote: - git annex initremote backblaze type=S3 signature=v4 host=$endpoint bucket=$bucketid protocol=https + export AWS_ACCESS_KEY_ID=$appKeyId + export AWS_SECRET_ACCESS_KEY=$appKeySecret + git annex initremote backblaze type=S3 signature=v4 host=$endpoint bucket=$bucketname protocol=https encryption=$encryption - Remember to replace $endpoint with the actual backblaze endpoint and $bucketid with - the bucketid. + Remember to replace: + + - `$appKeyId` and `$appKeySecret` with the values displayed by B2 when you created an "Application Key". + - `$endpoint` with the B2 endpoint to which your account has an affinity. + - An easy way to find the correct value is to create a temporary bucket in the service's web UI, and then use the "endpoint" field displayed for *that* bucket (before deleting the temporary bucket!). + - `$bucketname` with the name of the bucket you wish to use + - This is **not** the opaque B2 "Bucket ID". + - `$encryption` with the [[encryption]] setting you want git-annex to apply to the files it stores in the bucket. + - This is **unrelated** to the B2 server-side bucket encryption setting. + + If a bucket with the specified name does not already exist then git-annex will attempt to create it. If the access key provided does not permit bucket creation then the `initremote` command will fail. The bucket will be created in the unchangeable region that the access key's account is pinned to, with **infinite versioning enabled** and **server-side encryption disabled**. 
Given that you're probably using some form of git-annex [[encryption]], and that files with server-side encryption enabled are not included in a server-side bucket-level snapshot, you may wish to leave the server-side encryption disabled. * A dedicated special remote, <https://github.com/encryptio/git-annex-remote-b2> (Last updated 2016)
comment
diff --git a/doc/todo/Peer_to_peer_connection_purely_over_magic-wormhole/comment_4_1327394bf91f4c18be988cbfe15dfa58._comment b/doc/todo/Peer_to_peer_connection_purely_over_magic-wormhole/comment_4_1327394bf91f4c18be988cbfe15dfa58._comment new file mode 100644 index 0000000000..f86a147433 --- /dev/null +++ b/doc/todo/Peer_to_peer_connection_purely_over_magic-wormhole/comment_4_1327394bf91f4c18be988cbfe15dfa58._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-08-01T17:56:23Z" + content=""" +It should be possible to implement this with a small shell script now I +think. See [[special_remotes/p2p]] and [[design/generic_p2p_transport]]. + +I am leaving this todo open at least temporarily, but it does seem better +to implement this as addon commands rather than directly in git-annex. +"""]]
link to new page
diff --git a/doc/git-annex-p2p.mdwn b/doc/git-annex-p2p.mdwn index deb79c89aa..56e74cf8a0 100644 --- a/doc/git-annex-p2p.mdwn +++ b/doc/git-annex-p2p.mdwn @@ -17,8 +17,8 @@ network. (This needs Tor to be installed.) git-annex can also support other P2P networks, using a helper program that you can install. These programs have names of the form `git-annex-p2p-<netname>`. See -<https://git-annex.branchable.com/design/generic_p2p_transport/> -for documentation about how to create such a program. +<https://git-annex.branchable.com/special_remotes/p2p/> +for details. # OPTIONS
add section on security
diff --git a/doc/design/generic_p2p_transport.mdwn b/doc/design/generic_p2p_transport.mdwn index 32997b382a..e362c70668 100644 --- a/doc/design/generic_p2p_transport.mdwn +++ b/doc/design/generic_p2p_transport.mdwn @@ -50,3 +50,11 @@ P2P network and dummys it up by symlinking unix socket files together, its skeleton should be a good starting point. [[special_remotes/p2p/git-annex-p2p-unix-sockets]] + +## security + +This is only as secure as the underlying P2P network. +It is really designed with P2P networks in mind that are fully encrypted, +and that use cryptography to validate the identities of peers. + +See the security discussion on [[special_remotes/p2p]].
inline didn't work due to extension
diff --git a/doc/design/generic_p2p_transport.mdwn b/doc/design/generic_p2p_transport.mdwn index abfe428956..32997b382a 100644 --- a/doc/design/generic_p2p_transport.mdwn +++ b/doc/design/generic_p2p_transport.mdwn @@ -49,4 +49,4 @@ Here's a simple shell script example. While this avoids using any real P2P network and dummys it up by symlinking unix socket files together, its skeleton should be a good starting point. -[[!inline pages="special_remotes/p2p/git-annex-p2p-unix-sockets" feeds=no]] +[[special_remotes/p2p/git-annex-p2p-unix-sockets]]
layout
diff --git a/doc/special_remotes/p2p.mdwn b/doc/special_remotes/p2p.mdwn index 2a7fe93bce..37e606dd4d 100644 --- a/doc/special_remotes/p2p.mdwn +++ b/doc/special_remotes/p2p.mdwn @@ -10,9 +10,9 @@ For other P2P networks, a fairly simple program is used to connect git-annex up with the network. Install one of these programs to use the P2P network of your choice: -* [[git-annex-p2p-unix-sockets]] This is only a demo, using unix - sockets in `/tmp` rather than a real P2P network. Not for real world - use. +* [[git-annex-p2p-unix-sockets]] + This is only a demo, using unix sockets in `/tmp` rather than a real + P2P network. Not for real world use. To write your own program to use the P2P network of your choice, see [[design/generic_p2p_transport]]. Edit this page to add more programs!
add example git-annex-p2p-unix-sockets program and end-user docs
diff --git a/COPYRIGHT b/COPYRIGHT index 3ca3debd09..9dbafd6a74 100644 --- a/COPYRIGHT +++ b/COPYRIGHT @@ -10,7 +10,7 @@ Copyright: © 2012-2017 Joey Hess <id@joeyh.name> © 2014 Sören Brunk License: AGPL-3+ -Files: doc/special_remotes/external/* +Files: doc/special_remotes/external/* doc/special_remotes/p2p/git-annex-p2p-unix-sockets Copyright: © 2013 Joey Hess <id@joeyh.name> License: GPL-3+ diff --git a/doc/design/generic_p2p_transport.mdwn b/doc/design/generic_p2p_transport.mdwn index 18e966bff8..abfe428956 100644 --- a/doc/design/generic_p2p_transport.mdwn +++ b/doc/design/generic_p2p_transport.mdwn @@ -11,7 +11,13 @@ used to connect to a given peer by address across the network. A git remote using the P2P network has an url of the form `p2p-annex::<netname>:<address>` -To connect to that remote, git-annex runs the command +The program [[git-remote-p2p-annex]] is included in git-annex as a git +remote helper program. git will use that program to handle `pull` and +`push` with git remotes that use the `p2p-annex::` url scheme. + +## program interface + +To connect to a P2P remote, git-annex runs the command `git-annex-p2p-<netname>`, giving it the P2P network address as its only parameter. The command is responsible for connecting to that peer, and relaying data to it. Data fed into the command on stdin should be sent to @@ -37,6 +43,10 @@ Note that, if the P2P network does not natively use a unix socket file, a command like `socat` can be run by `git-annex-p2p-<netname> socket` to convert the P2P network's own equivilant into a unix socket file. -The program [[git-remote-p2p-annex]] is included in git-annex as a git -remote helper program. git will use that program to handle `pull` and -`push` with git remotes that use the `p2p-annex::` url scheme. +## example + +Here's a simple shell script example. While this avoids using any real +P2P network and dummys it up by symlinking unix socket files together, +its skeleton should be a good starting point. 
+ +[[!inline pages="special_remotes/p2p/git-annex-p2p-unix-sockets" feeds=no]] diff --git a/doc/special_remotes.mdwn b/doc/special_remotes.mdwn index 93c2ac297d..ba483a9bf6 100644 --- a/doc/special_remotes.mdwn +++ b/doc/special_remotes.mdwn @@ -20,6 +20,7 @@ the content of files. * [[git]] * [[hook]] * [[httpalso]] +* [[p2p]] * [[rclone]] * [[rsync]] * [[S3]] (Amazon S3, and other compatible services) diff --git a/doc/special_remotes/external.mdwn b/doc/special_remotes/external.mdwn index aa61273570..94b2bad263 100644 --- a/doc/special_remotes/external.mdwn +++ b/doc/special_remotes/external.mdwn @@ -1,4 +1,4 @@ -There are three ways to implement a new special remote: +There are four ways to implement a new special remote: 1. Using the [[hook]] special remote to tell git-annex what commands to run to store and retrieve data. This is the easiest way, and diff --git a/doc/special_remotes/p2p.mdwn b/doc/special_remotes/p2p.mdwn new file mode 100644 index 0000000000..2a7fe93bce --- /dev/null +++ b/doc/special_remotes/p2p.mdwn @@ -0,0 +1,57 @@ +A P2P network can be used to connect together git-annex repositories. This +lets a regular git remote have an url that points to another peer on the +network. Both git fetch/push and git-annex can be used with that remote +the same as any other remote. + +The [[tor]] support is a special case of this, which is built into +git-annex. + +For other P2P networks, a fairly simple program is used to connect +git-annex up with the network. Install one of these programs to use the P2P +network of your choice: + +* [[git-annex-p2p-unix-sockets]] This is only a demo, using unix + sockets in `/tmp` rather than a real P2P network. Not for real world + use. + +To write your own program to use the P2P network of your choice, +see [[design/generic_p2p_transport]]. Edit this page to add more programs! 
+ +## setup + +Once you have the program installed, the next step is to run this in your +git-annex repository to enable the P2P network: + + git-annex p2p --enable <netname> + +Replace `<netname>` with the name of the program after the +"git-annex-p2p-". + +Then [[git-annex remotedaemon|git-annex-remotedaemon]] can be used to serve +incoming connections from peers, and [[git-annex p2p|git-annex-p2p]] can be +used to set up connections to peers on the network. For example, you and a +friend could run these commands in your repositories to pair them: + + git-annex remotedaemon + git-annex p2p --pair + +Once a connection with a peer is set up, you have a git remote that can be +used like any other remote. Including `git pull`, `git push`, and using +git-annex commands to store content on it, etc. + +## security + +This is only as secure as the underlying P2P network, and it should only be +used with P2P networks that at least encrypt all traffic sent over them. + +It's a good idea, but not mandatory, for the P2P network to +cryptographically verify the identity of peers. Any modern encrypted P2P +network should do that. That prevents you from connecting to an +impersonator, and perhaps leaking data from your repository to them. +However, even if the P2P network does not verify the identity of peers, +git-annex only allows people you have paired with to connect to your +repository. + +Anyone you give access to your git-annex repository using this can get any +of the files stored in it, and can also drop the content of any annexed file +from it. diff --git a/doc/special_remotes/p2p/git-annex-p2p-unix-sockets b/doc/special_remotes/p2p/git-annex-p2p-unix-sockets new file mode 100755 index 0000000000..100dee5291 --- /dev/null +++ b/doc/special_remotes/p2p/git-annex-p2p-unix-sockets @@ -0,0 +1,36 @@ +#!/bin/sh +# Example P2P network transit program for git-annex. +# +# This simulates a multi-node P2P network using unix +# socket files in /tmp. 
+# +# Copyright 2025 Joey Hess; icenced under the GNU GPL version 3 or higher. + +set -e + +if [ "$1" = address ]; then + # Output the local P2P network address. + # + # For the purposes of this example, a new address is made up each + # time this is run. Using the current unix second to get a fairly + # unique address. + myaddress=$(date +%s) + echo "$myaddress" +else + socketfile="$2" + if [ -z "$socketfile" ]; then + # Connect to the peer's address and relay stdin and stdout. + peeraddress="$1" + # For the purposes of this demo, socat is used, and simply + # connects stdio to the unix socket file in /tmp. + socat - UNIX-CONNECT:"/tmp/$peeraddress" + else + # Arrange for incoming connections from peers to connect to + # the unix socket provided by git-annex. The local + # P2P network address is also available to use. + myaddress="$1" + # For the purposes of this demo, the socket file provided + # by git-annex is symlinked to the location in /tmp. + ln -sf $(realpath "$socketfile") "/tmp/$myaddress" + fi +fi diff --git a/doc/todo/generic_p2p_socket_transport.mdwn b/doc/todo/generic_p2p_socket_transport.mdwn index d86bb9ed97..1bc9b81eff 100644 --- a/doc/todo/generic_p2p_socket_transport.mdwn +++ b/doc/todo/generic_p2p_socket_transport.mdwn @@ -14,3 +14,5 @@ This should also make it possible to build e.g. a `git annex enable-yggstack` an What do you think? [[!tag projects/INM7]] + +> [[done]] --[[Joey]] diff --git a/doc/todo/generic_p2p_socket_transport/comment_14_f9f21446112ba73b29572a951a837429._comment b/doc/todo/generic_p2p_socket_transport/comment_14_f9f21446112ba73b29572a951a837429._comment new file mode 100644 index 0000000000..bb482cedca --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_14_f9f21446112ba73b29572a951a837429._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 14""" + date="2025-08-01T17:41:53Z" + content=""" +This is merged! 
+ +See [[special_remotes/p2p]] for the end-user documentation, +including a list where you can add whatever scripts you implement for +different P2P networks. +"""]]
update design doc with changes from genericp2p branch
That branch is basically ready to merge, but needs more testing in a
chicken and egg situation.
diff --git a/doc/design/p2p_socket_transport.mdwn b/doc/design/generic_p2p_transport.mdwn similarity index 69% rename from doc/design/p2p_socket_transport.mdwn rename to doc/design/generic_p2p_transport.mdwn index 725acb8b5e..18e966bff8 100644 --- a/doc/design/p2p_socket_transport.mdwn +++ b/doc/design/generic_p2p_transport.mdwn @@ -1,4 +1,6 @@ This is a generic interface that allows git-annex to use a P2P network. +The [[P2P_protocol]] is used, to allow accessing a peer's git-annex +repository as a git remote. Examples of such networks are tor, yggstack or fowl. (git-annex has a built-in integration with tor which does not use this interface.) @@ -7,7 +9,7 @@ Such a P2P network has some form of address, which can be used to connect to a given peer by address across the network. A git remote using the P2P network has an url of the form -`p2p-annex::<netname>+<address>` +`p2p-annex::<netname>:<address>` To connect to that remote, git-annex runs the command `git-annex-p2p-<netname>`, giving it the P2P network address as its only @@ -17,24 +19,24 @@ the peer, and data received from the peer should be output to stdout. If it is unable to connect, the command can exit nonzero. When the peer closes connection, the command can exit zero. -To handle incoming connections from peers, `git-annex remotedaemon` -runs `git-annex-p2p-<netname>` with the parameter "socket", followed -by the P2P address of the local repository. The command -should output the path of a unix socket file. When it does, `git-annex -remotedaemon` will use that socket file to listen for connections from -peers, and service them. (The [[P2P_protocol]] is spoken over these -connections.) +To configure `git-annex remotedaemon` to listen on a given P2P network, +the user runs `git-annex p2p --enable <netname>`. That also +runs `git-annex-p2p-<netname>`, this time with the parameter "address". 
+That should output one or more lines, the P2P network address (or addresses) +that can be used by peers to connect to the repository. It can first do +whatever it needs to do to set up the P2P network. + +To handle incoming connections from peers, `git-annex remotedaemon` runs +`git-annex-p2p-<netname>`, with two parameters. The first parameter is the +P2P address of the local repository, obtained earlier as described above. +The second parameter is the path to a unix socket file, which git-annex +will have already created. git-annex listens for connections from peers +that are made to the socket, and services them. Note that, if the P2P network does not natively use a unix socket file, a command like `socat` can be run by `git-annex-p2p-<netname> socket` to convert the P2P network's own equivilant into a unix socket file. -To configure `git-annex remotedaemon` to listen on a given P2P network, -the user runs `git-annex p2p --enable <netname>`. That also -runs `git-annex-p2p-<netname>`, this time with the parameter "address". -That should output the P2P network address that can be used by peers -to connect to the repository. - The program [[git-remote-p2p-annex]] is included in git-annex as a git remote helper program. git will use that program to handle `pull` and `push` with git remotes that use the `p2p-annex::` url scheme. 
diff --git a/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment b/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment index b6b2203482..941ebe62c1 100644 --- a/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment +++ b/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment @@ -3,7 +3,7 @@ subject="""comment 12""" date="2025-07-29T16:41:07Z" content=""" -I have started a design document at [[design/p2p_socket_transport]], +I have started a design document at [[design/generic_p2p_transport]], to collect all the scattered decisions here into a coherent document that can be used by someone implementing support for one of these networks. diff --git a/doc/todo/generic_p2p_socket_transport/comment_13_b415aa7cd50892c562eba9de2f4a47e9._comment b/doc/todo/generic_p2p_socket_transport/comment_13_b415aa7cd50892c562eba9de2f4a47e9._comment new file mode 100644 index 0000000000..38c9fbb62a --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_13_b415aa7cd50892c562eba9de2f4a47e9._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 13""" + date="2025-07-31T19:16:43Z" + content=""" +The `genericp2p` branch now has what seems to be a fully working +implementation of this. I have not fully tested it because I don't +have a real git-annex-p2p-foo command to test it with. Still I can see the +remotedaemon handling connections, and also making outgoing connections +seems to work. + +There have been some changes to the design, especially around how the +socket file is set up. +"""]]
Added a comment
diff --git a/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_2_05bc083421a4ba56951315d24e4a7c96._comment b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_2_05bc083421a4ba56951315d24e4a7c96._comment new file mode 100644 index 0000000000..59e582842f --- /dev/null +++ b/doc/bugs/fails_to_authenticate_into_S3_for_initremote__63__/comment_2_05bc083421a4ba56951315d24e4a7c96._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 2" + date="2025-07-31T19:07:57Z" + content=""" +- adding `signature=v4` brought a different error which mentioned `s3ErrorMessage = \"The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-east-2'\"` +- adding `region=us-east-2` while running older `10.20241031-1~ndall+1` version of git-annex seems resulted in no change +- running newer `10.20250721-g8867e7590a3a70afa8a93d2fefab94adc9a176d0` installed from pypi seems took that into account and worked + +``` +(venv-annex) dandi@drogon:/mnt/backup/dandi/tmp/dandisets/test-importtree-s3$ git annex initremote s3-origin type=S3 importtree=yes encryption=none autoenable=true bucket=dandiarchive fileprefix=zarr-checksums/2ac71edb-738c-40ac-bd8c-8ca985adaa12/ signature=v4 region=us-east-2 +initremote s3-origin (checking bucket...) ok +(recording state in git...) +``` + +and then `import` worked. + +- couldn't it figure out which signature to use? +- it definitely could/should retry automagically with a correct zone or even start with it + + +"""]]
changed design for p2p generic socket
Having the git-annex-p2p-<netname> command output the socket filename
left git-annex scrambling to listen to it in order to not miss incoming
connections. And if the command uses something like socat UNIX-CONNECT,
that expects the socket to be accepting connections and errors out when
it's not, that would be a problem.
Rather than complicating the protocol with git-annex needing to send
back a message when it's listening to the socket, simplified it by
having git-annex provide the socket path to the command.
This does mean that, if a P2P network has its own place it expects to
find a socket file, the git-annex-p2p-<netname> command would need to
somehow arrange for it to use the git-annex socket path. A symlink would
be one way to handle that situation.
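A minimal sketch of the symlink approach described above, assuming a hypothetical P2P network that expects to find its socket at a fixed path. The function name `link_socket` and both path arguments are illustrative, not part of git-annex.

```shell
#!/bin/sh
# Hypothetical helper for a git-annex-p2p-<netname> command.
# Assumption: the P2P network looks for its socket at a path of its own
# choosing, while git-annex has already created the real socket file.
set -e

link_socket() {
    socketfile="$1"      # unix socket path provided by git-annex
    networksocket="$2"   # path where the (hypothetical) network expects a socket
    # Point the network's expected location at the git-annex socket.
    ln -sf "$(realpath "$socketfile")" "$networksocket"
}

# Example call (both paths are placeholders):
#   link_socket "$2" "/tmp/$1"
```

Resolving the socket path with `realpath` before linking keeps the symlink valid even if the command later changes its working directory.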
diff --git a/doc/design/generic_p2p_transport.mdwn b/doc/design/generic_p2p_transport.mdwn index 094cfc0bc0..18e966bff8 100644 --- a/doc/design/generic_p2p_transport.mdwn +++ b/doc/design/generic_p2p_transport.mdwn @@ -1,4 +1,6 @@ This is a generic interface that allows git-annex to use a P2P network. +The [[P2P_protocol]] is used, to allow accessing a peer's git-annex +repository as a git remote. Examples of such networks are tor, yggstack or fowl. (git-annex has a built-in integration with tor which does not use this interface.) @@ -17,19 +19,6 @@ the peer, and data received from the peer should be output to stdout. If it is unable to connect, the command can exit nonzero. When the peer closes connection, the command can exit zero. -To handle incoming connections from peers, `git-annex remotedaemon` -runs `git-annex-p2p-<netname>` with the parameter "socket", followed -by the P2P address of the local repository. The command -should output a single line, the path of a unix socket file. -(Any subsequent output is ignored.) -When it does, `git-annex remotedaemon` will use that socket file to listen for connections from -peers, and service them. (The [[P2P_protocol]] is spoken over these -connections.) - -Note that, if the P2P network does not natively use a unix socket file, -a command like `socat` can be run by `git-annex-p2p-<netname> socket` -to convert the P2P network's own equivilant into a unix socket file. - To configure `git-annex remotedaemon` to listen on a given P2P network, the user runs `git-annex p2p --enable <netname>`. That also runs `git-annex-p2p-<netname>`, this time with the parameter "address". @@ -37,6 +26,17 @@ That should output one or more lines, the P2P network address (or addresses) that can be used by peers to connect to the repository. It can first do whatever it needs to do to set up the P2P network. +To handle incoming connections from peers, `git-annex remotedaemon` runs +`git-annex-p2p-<netname>`, with two parameters. 
The first parameter is the +P2P address of the local repository, obtained earlier as described above. +The second parameter is the path to a unix socket file, which git-annex +will have already created. git-annex listens for connections from peers +that are made to the socket, and services them. + +Note that, if the P2P network does not natively use a unix socket file, +a command like `socat` can be run by `git-annex-p2p-<netname> socket` +to convert the P2P network's own equivilant into a unix socket file. + The program [[git-remote-p2p-annex]] is included in git-annex as a git remote helper program. git will use that program to handle `pull` and `push` with git remotes that use the `p2p-annex::` url scheme.
Add yann's TdF talk about life in git annex
diff --git a/doc/videos/tdf4-life-in-git-annex.mdwn b/doc/videos/tdf4-life-in-git-annex.mdwn new file mode 100644 index 0000000000..b46ba97547 --- /dev/null +++ b/doc/videos/tdf4-life-in-git-annex.mdwn @@ -0,0 +1,6 @@ +Yann Büchau's (German) talk „Das Leben in Git (Annex)” („Life in Git (Annex)”) at the [Tage der Digitalen Freiheit](https://tdf.cttue.de/) in Tübingen, Germany: + +- [on media.ccc.de](https://media.ccc.de/v/tdf4-26-das-leben-in-git-annex-) + +[[!meta date="26 Jul 2025"]] +[[!meta title="'life in git annex' talk by Yann Büchau at TdF2025"]]
p2p --enable
p2p: Added --enable option, which can be used to enable P2P networks
provided by external commands git-annex-p2p-<netname>
Made git-annex p2p --enable tor behave the same as git-annex enable-tor,
to make tor a bit less of a special case. However, it cannot be run as root,
since it cannot take the user id parameter.
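The diff for this change has `p2p --enable` run the external command with the parameter "address" and give up when no addresses are output. That check can be sketched in shell as follows. This is only an illustration, and `collect_addresses` is a hypothetical name rather than anything git-annex provides.

```shell
#!/bin/sh
# Hypothetical sketch: run an address-producing command and fail when it
# prints nothing, mirroring the "did not output any P2P addresses" check.
collect_addresses() {
    # Run the command given as arguments, capturing its stdout.
    addrs=$("$@") || return 1
    if [ -z "$addrs" ]; then
        echo "$1 did not output any P2P addresses" >&2
        return 1
    fi
    # One address per line, as the command printed them.
    printf '%s\n' "$addrs"
}
```

For example, `collect_addresses git-annex-p2p-foo address` would print the addresses of a placeholder `foo` network, or return nonzero if the command printed none.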
diff --git a/Annex/ExternalAddonProcess.hs b/Annex/ExternalAddonProcess.hs index 887f9f6466..6e2af92d61 100644 --- a/Annex/ExternalAddonProcess.hs +++ b/Annex/ExternalAddonProcess.hs @@ -33,8 +33,9 @@ data ExternalAddonStartError = ProgramNotInstalled String | ProgramFailure String -startExternalAddonProcess :: String -> [CommandParam] -> ExternalAddonPID -> Annex (Either ExternalAddonStartError ExternalAddonProcess) -startExternalAddonProcess basecmd ps pid = do +-- | Starts an external addon process that speaks a protocol over stdio. +startExternalAddonProcessProtocol :: String -> [CommandParam] -> ExternalAddonPID -> Annex (Either ExternalAddonStartError ExternalAddonProcess) +startExternalAddonProcessProtocol basecmd ps pid = do errrelayer <- mkStderrRelayer g <- Annex.gitRepo cmdpath <- liftIO $ searchPath basecmd diff --git a/Backend/External.hs b/Backend/External.hs index 23977d1ce7..77373cf3ae 100644 --- a/Backend/External.hs +++ b/Backend/External.hs @@ -215,7 +215,7 @@ poolVar = unsafePerformIO $ newMVar M.empty -- using it. 
newExternalState :: ExternalBackendName -> HasExt -> ExternalAddonPID -> Annex ExternalState newExternalState ebname hasext pid = do - st <- startExternalAddonProcess basecmd [] pid + st <- startExternalAddonProcessProtocol basecmd [] pid st' <- case st of Left (ProgramNotInstalled msg) -> warnonce msg >> return st Left (ProgramFailure msg) -> warnonce msg >> return st diff --git a/CHANGELOG b/CHANGELOG index 3acf44fd33..cb1223ee0d 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,10 @@ +git-annex (10.20250722) UNRELEASED; urgency=medium + + * p2p: Added --enable option, which can be used to enable P2P networks + provided by external commands git-annex-p2p-<netname> + + -- Joey Hess <id@joeyh.name> Wed, 30 Jul 2025 13:45:42 -0400 + git-annex (10.20250721) upstream; urgency=medium * Improved workaround for git 2.50 bug, avoding an occasional test suite diff --git a/Command/EnableTor.hs b/Command/EnableTor.hs index c4b323d825..f8c1d75a8d 100644 --- a/Command/EnableTor.hs +++ b/Command/EnableTor.hs @@ -38,11 +38,11 @@ cmd = noCommit $ dontCheck repoExists $ "uid" (withParams seek) seek :: CmdParams -> CommandSeek -seek = withWords (commandAction . start) +seek = withWords (commandAction . start . Just) -- This runs as root, so avoid making any commits or initializing -- git-annex, or doing other things that create root-owned files. -start :: [String] -> CommandStart +start :: Maybe [String] -> CommandStart #ifndef mingw32_HOST_OS start os = do #else @@ -53,9 +53,11 @@ start _os = do let si = SeekInput [] curruserid <- liftIO getEffectiveUserID if curruserid == 0 - then case readish =<< headMaybe os of - Nothing -> giveup "Need user-id parameter." - Just userid -> go userid + then case os of + Just os' -> case readish =<< headMaybe os' of + Nothing -> giveup "Need user-id parameter." + Just userid -> go userid + Nothing -> giveup "Cannot run this command as root." 
else starting "enable-tor" ai si $ do gitannex <- fromOsPath <$> liftIO programPath let ps = [Param (cmdname cmd), Param (show curruserid)] diff --git a/Command/P2P.hs b/Command/P2P.hs index 0ee588f42a..491355507c 100644 --- a/Command/P2P.hs +++ b/Command/P2P.hs @@ -1,6 +1,6 @@ {- git-annex command - - - Copyright 2016 Joey Hess <id@joeyh.name> + - Copyright 2016-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -13,6 +13,7 @@ import Command import P2P.Address import P2P.Auth import P2P.IO +import P2P.Generic import qualified P2P.Protocol as P2P import Git.Types import qualified Git.Remote @@ -27,6 +28,7 @@ import Utility.ThreadScheduler import Utility.SafeOutput import qualified Utility.FileIO as F import qualified Utility.MagicWormhole as Wormhole +import qualified Command.EnableTor as EnableTor import Control.Concurrent.Async import qualified Data.Text as T @@ -40,10 +42,11 @@ data P2POpts = GenAddresses | LinkRemote | Pair + | Enable P2PNetName optParser :: CmdParamsDesc -> Parser (P2POpts, Maybe RemoteName) optParser _ = (,) - <$> (pair <|> linkremote <|> genaddresses) + <$> (pair <|> linkremote <|> genaddresses <|> enable) <*> optional name where genaddresses = flag' GenAddresses @@ -58,6 +61,10 @@ optParser _ = (,) ( long "pair" <> help "pair with another repository" ) + enable = Enable . 
P2PNetName <$> strOption + ( long "enable" <> metavar paramName + <> help "enable using a P2P network" + ) name = Git.Remote.makeLegalName <$> strOption ( long "name" <> metavar paramName @@ -75,6 +82,8 @@ seek (Pair, Just name) = commandAction $ seek (Pair, Nothing) = commandAction $ do name <- unusedPeerRemoteName startPairing name =<< loadP2PAddresses +seek (Enable netname, _) = commandAction $ + enableNetwork netname unusedPeerRemoteName :: Annex RemoteName unusedPeerRemoteName = go (1 :: Integer) =<< usednames @@ -316,3 +325,16 @@ setupLink remotename (P2PAddressAuth addr authtoken) = do return LinkSuccess go (Right Nothing) = return $ AuthenticationError "Unable to authenticate with peer. Please check the address and try again." go (Left e) = return $ AuthenticationError $ "Unable to authenticate with peer: " ++ describeProtoFailure e + +enableNetwork :: P2PNetName -> CommandStart +enableNetwork netname@(P2PNetName name) + | name == "tor" = EnableTor.start Nothing + | otherwise = starting "p2p enable" ai si $ next $ do + addrs <- liftIO $ getAddressGenericP2P netname + when (null addrs) $ + giveup $ genericP2PCommand netname ++ " did not output any P2P addresses" + mapM_ storeP2PAddress addrs + return True + where + ai = ActionItemOther (Just (UnquotedString name)) + si = SeekInput [] diff --git a/P2P/Generic.hs b/P2P/Generic.hs index 9dc118f820..ac80f3eedf 100644 --- a/P2P/Generic.hs +++ b/P2P/Generic.hs @@ -25,18 +25,18 @@ connectGenericP2P netname (UnderlyingP2PAddress address) = socketGenericP2P :: P2PNetName -> UnderlyingP2PAddress -> CreateProcess socketGenericP2P netname (UnderlyingP2PAddress address) = (proc (genericP2PCommand netname) ["socket", address]) - { std_in = CreatePipe + { std_out = CreatePipe } addressGenericP2P :: P2PNetName -> CreateProcess addressGenericP2P netname = (proc (genericP2PCommand netname) ["address"]) - { std_in = CreatePipe + { std_out = CreatePipe } getSocketGenericP2P :: P2PNetName -> UnderlyingP2PAddress -> IO (Maybe 
(OsPath, ProcessHandle)) getSocketGenericP2P netname address = do - (Just hin, Nothing, Nothing, pid) <- createProcess $ + (Nothing, Just hin, Nothing, pid) <- createProcess $ socketGenericP2P netname address hGetLineUntilExitOrEOF pid hin >>= \case Just l | not (null l) -> return $ Just (toOsPath l, pid) @@ -44,7 +44,7 @@ getSocketGenericP2P netname address = do getAddressGenericP2P :: P2PNetName -> IO [P2PAddress] getAddressGenericP2P netname = do - (Just hin, Nothing, Nothing, pid) <- createProcess $ + (Nothing, Just hin, Nothing, pid) <- createProcess $ addressGenericP2P netname go [] hin pid where diff --git a/Remote/External.hs b/Remote/External.hs index 2b26e32239..dcfbaacbf2 100644 --- a/Remote/External.hs +++ b/Remote/External.hs @@ -673,7 +673,7 @@ startExternal' external = do n <- succ <$> readTVar (externalLastPid external) writeTVar (externalLastPid external) n return n - AddonProcess.startExternalAddonProcess externalcmd externalparams pid >>= \case + AddonProcess.startExternalAddonProcessProtocol externalcmd externalparams pid >>= \case (Diff truncated)
support P2PAnnex in connectPeer
This is probably enough to support accessing remotes using p2p-annex:: urls.
Not tested yet of course since there is not yet support for serving the
other side of such a connection, or for setting up such a connection.
P2P.Generic has an implementation of the whole interface to the
git-annex-p2p-<netname> commands.
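
As an illustration of the kind of plumbing this interface needs, here is a minimal sketch of running a helper command and collecting the non-empty lines it prints on stdout. It is not git-annex's code: `readOutputLines` and `nonEmptyLines` are hypothetical names, the whole output is read at once rather than with git-annex's `hGetLineUntilExitOrEOF`, and `echo` only stands in for a real `git-annex-p2p-<netname>` program.

```haskell
import Control.Exception (evaluate)
import System.IO (hGetContents)
import System.Process
    (CreateProcess(..), StdStream(..), createProcess, proc, waitForProcess)

-- | Hypothetical helper: keep only the non-empty lines of some output.
nonEmptyLines :: String -> [String]
nonEmptyLines = filter (not . null) . lines

-- | Hypothetical helper: run a command with its stdout connected to a
-- pipe, and return the non-empty lines it wrote there.
readOutputLines :: FilePath -> [String] -> IO [String]
readOutputLines cmd params = do
    (_, Just hout, _, pid) <- createProcess $
        (proc cmd params) { std_out = CreatePipe }
    out <- hGetContents hout
    -- Force the lazily read output so the pipe is drained before waiting.
    _ <- evaluate (length out)
    _ <- waitForProcess pid
    return (nonEmptyLines out)

main :: IO ()
main =
    -- "echo" is only a stand-in for a helper's "address" mode.
    readOutputLines "echo" ["example-address"] >>= mapM_ putStrLn
```

Reading from a pipe attached to the child's stdout is what makes the output available to the parent; a pipe attached to stdin would only allow writing to the child.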
diff --git a/P2P/Generic.hs b/P2P/Generic.hs new file mode 100644 index 0000000000..9dc118f820 --- /dev/null +++ b/P2P/Generic.hs @@ -0,0 +1,57 @@ +{- P2P protocol, generic transports. + - + - See doc/design/generic_p2p_transport.mdwn + - + - Copyright 2025 Joey Hess <id@joeyh.name> + - + - Licensed under the GNU AGPL version 3 or higher. + -} + +module P2P.Generic where + +import Common +import P2P.Address + +genericP2PCommand :: P2PNetName -> String +genericP2PCommand (P2PNetName netname) = "git-annex-p2p-" ++ netname + +connectGenericP2P :: P2PNetName -> UnderlyingP2PAddress -> CreateProcess +connectGenericP2P netname (UnderlyingP2PAddress address) = + (proc (genericP2PCommand netname) [address]) + { std_in = CreatePipe + , std_out = CreatePipe + } + +socketGenericP2P :: P2PNetName -> UnderlyingP2PAddress -> CreateProcess +socketGenericP2P netname (UnderlyingP2PAddress address) = + (proc (genericP2PCommand netname) ["socket", address]) + { std_in = CreatePipe + } + +addressGenericP2P :: P2PNetName -> CreateProcess +addressGenericP2P netname = + (proc (genericP2PCommand netname) ["address"]) + { std_in = CreatePipe + } + +getSocketGenericP2P :: P2PNetName -> UnderlyingP2PAddress -> IO (Maybe (OsPath, ProcessHandle)) +getSocketGenericP2P netname address = do + (Just hin, Nothing, Nothing, pid) <- createProcess $ + socketGenericP2P netname address + hGetLineUntilExitOrEOF pid hin >>= \case + Just l | not (null l) -> return $ Just (toOsPath l, pid) + _ -> return Nothing + +getAddressGenericP2P :: P2PNetName -> IO [P2PAddress] +getAddressGenericP2P netname = do + (Just hin, Nothing, Nothing, pid) <- createProcess $ + addressGenericP2P netname + go [] hin pid + where + go addrs hin pid = hGetLineUntilExitOrEOF pid hin >>= \case + Just l + | not (null l) -> + let addr = P2PAnnex netname (UnderlyingP2PAddress l) + in go (addr:addrs) hin pid + | otherwise -> go addrs hin pid + Nothing -> return addrs diff --git a/P2P/IO.hs b/P2P/IO.hs index d712be7d03..bc3bc61745 100644 
--- a/P2P/IO.hs +++ b/P2P/IO.hs @@ -1,6 +1,6 @@ {- P2P protocol, IO implementation - - - Copyright 2016-2024 Joey Hess <id@joeyh.name> + - Copyright 2016-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -20,7 +20,6 @@ module P2P.IO , connectPeer , closeConnection , serveUnixSocket - , setupHandle , ProtoFailure(..) , describeProtoFailure , runNetProto @@ -31,6 +30,7 @@ module P2P.IO import Common import P2P.Protocol import P2P.Address +import P2P.Generic import Git import Git.Command import Utility.AuthToken @@ -138,7 +138,7 @@ stdioP2PConnectionDupped g = do -- Opens a connection to a peer. Does not authenticate with it. connectPeer :: Maybe Git.Repo -> P2PAddress -> IO P2PConnection connectPeer g (TorAnnex onionaddress onionport) = do - h <- setupHandle =<< connectHiddenService onionaddress onionport + h <- setupHandleFromSocket =<< connectHiddenService onionaddress onionport return $ P2PConnection { connRepo = g , connCheckAuth = const False @@ -147,6 +147,17 @@ connectPeer g (TorAnnex onionaddress onionport) = do , connProcess = Nothing , connIdent = ConnIdent Nothing } +connectPeer g (P2PAnnex netname address) = do + (Just hin, Just hout, Nothing, pid) <- createProcess $ + connectGenericP2P netname address + return $ P2PConnection + { connRepo = g + , connCheckAuth = const False + , connIhdl = P2PHandle hout + , connOhdl = P2PHandle hin + , connProcess = Just pid + , connIdent = ConnIdent Nothing + } closeConnection :: P2PConnection -> IO () closeConnection conn = do @@ -185,10 +196,10 @@ serveUnixSocket unixsocket serveconn = do S.listen soc 2 forever $ do (conn, _) <- S.accept soc - setupHandle conn >>= serveconn + setupHandleFromSocket conn >>= serveconn -setupHandle :: Socket -> IO Handle -setupHandle s = do +setupHandleFromSocket :: Socket -> IO Handle +setupHandleFromSocket s = do h <- socketToHandle s ReadWriteMode hSetBuffering h LineBuffering hSetBinaryMode h False diff --git 
a/doc/design/generic_p2p_transport.mdwn b/doc/design/generic_p2p_transport.mdwn index 2e06da195b..094cfc0bc0 100644 --- a/doc/design/generic_p2p_transport.mdwn +++ b/doc/design/generic_p2p_transport.mdwn @@ -33,9 +33,9 @@ to convert the P2P network's own equivilant into a unix socket file. To configure `git-annex remotedaemon` to listen on a given P2P network, the user runs `git-annex p2p --enable <netname>`. That also runs `git-annex-p2p-<netname>`, this time with the parameter "address". -That should output a single line, the P2P network address that can be used -by peers to connect to the repository. It can first do whatever it needs to -do to set up the P2P network. +That should output one or more lines, the P2P network address (or addresses) +that can be used by peers to connect to the repository. It can first do +whatever it needs to do to set up the P2P network. The program [[git-remote-p2p-annex]] is included in git-annex as a git remote helper program. git will use that program to handle `pull` and diff --git a/git-annex.cabal b/git-annex.cabal index 7170559264..ac9941b902 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -923,6 +923,7 @@ Executable git-annex P2P.Address P2P.Annex P2P.Auth + P2P.Generic P2P.Http.Types P2P.Http.Client P2P.Http.Url
Added a comment: Use an older version e.g. from archive.org
diff --git a/doc/install/comment_10_484a35a4739e8168019668aaf474bae9._comment b/doc/install/comment_10_484a35a4739e8168019668aaf474bae9._comment new file mode 100644 index 0000000000..0457f11cbb --- /dev/null +++ b/doc/install/comment_10_484a35a4739e8168019668aaf474bae9._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Use an older version e.g. from archive.org" + date="2025-07-30T16:17:44Z" + content=""" +This happens sometimes and will eventually get fixed when joey notices it. Currently the binary is only on his laptop apparently. + +You can use an older version which is eventually available on archive.org: + +[[!format bash \"\"\" +yann in yann-desktop-nixos in …/OSX/current/10.15_Catalina on master took 2s123ms +🐟 ❯ git co d8a7d5d54d24d17810f07c0756e7334e998650fe +HEAD ist jetzt bei d8a7d5d54d publishing git-annex 10.20250630 10.20250606 +yann in yann-desktop-nixos in …/OSX/current/10.15_Catalina on HEAD (d8a7d5d) +🐟 ❯ git annex whereis +whereis git-annex.dmg (1 copy) + 5dc2ccd1-e534-4dae-8e8c-f31c8015e26e -- archive.org via S3 + + The following untrusted locations may also have copies: + 00000000-0000-0000-0000-000000000001 -- web + + web: http://archive.org/download/git-annex-builds/SHA256E-s28610967--7fc0dbf3f0a1f275a95730899327694b90dcd60c4ba8d8070a3efde44983a719.dmg +ok +\"\"\"]] + +So for example [this link](http://archive.org/download/git-annex-builds/SHA256E-s28610967--7fc0dbf3f0a1f275a95730899327694b90dcd60c4ba8d8070a3efde44983a719.dmg): +"""]]
add P2PAnnex constructor
This is for p2p-annex:: urls that will use the new generic P2P
transport.
In addressCredsFile, threw in an url encoding of any non-alphanumeric
characters that are in the address. This is to avoid any possible path
traversal attacks via a p2p-annex:: url, since the address part of it
could contain any characters. And, went ahead and did the same url
encoding of tor-annex:: urls, even though tor onion addresses are all
alphanumerics, on the off chance that might avoid a similar problem.
(It does not seem likely enough to treat it as a security hole.)
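
To make the effect of that encoding concrete, here is a small sketch, assuming the `network-uri` package is available. `credsFileName` is a hypothetical stand-in for the string handling inside `addressCredsFile`, and both example addresses are made up.

```haskell
import Data.Char (isAlphaNum)
import Network.URI (escapeURIString)

-- | Hypothetical stand-in: percent-encode every character that is not
-- alphanumeric, so the result is safe to use as a single file name.
credsFileName :: String -> String
credsFileName = escapeURIString isAlphaNum

main :: IO ()
main = mapM_ (putStrLn . credsFileName)
    [ "examplenet:../../etc/passwd" -- made-up hostile address
    , "abcdefghijklmnop"            -- made-up all-alphanumeric address
    ]
```

Because `/` and `.` are not alphanumeric, they are escaped, so the encoded name cannot contain a path separator or a `..` component. A purely alphanumeric address passes through unchanged.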
diff --git a/Command/EnableTor.hs b/Command/EnableTor.hs index 03293d2af4..b36136553a 100644 --- a/Command/EnableTor.hs +++ b/Command/EnableTor.hs @@ -102,6 +102,7 @@ checkHiddenService = bracket setup cleanup go go _ = check (150 :: Int) =<< filter istoraddr <$> loadP2PAddresses istoraddr (TorAnnex _ _) = True + istoraddr _ = False check 0 _ = giveup "Still unable to connect to hidden service. It might not yet be usable by others. Please check Tor's logs for details." check _ [] = giveup "Somehow didn't get an onion address." diff --git a/P2P/Address.hs b/P2P/Address.hs index 1a3186aca9..052f0af6ce 100644 --- a/P2P/Address.hs +++ b/P2P/Address.hs @@ -1,6 +1,6 @@ {- P2P protocol addresses - - - Copyright 2016 Joey Hess <id@joeyh.name> + - Copyright 2016-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -25,7 +25,15 @@ import System.PosixCompat.Files (fileOwner, fileGroup) -- -- This is enough information to connect to the peer, -- but not enough to authenticate with it. -data P2PAddress = TorAnnex OnionAddress OnionPort +data P2PAddress + = TorAnnex OnionAddress OnionPort + | P2PAnnex P2PNetName UnderlyingP2PAddress + deriving (Eq, Show) + +newtype P2PNetName = P2PNetName String + deriving (Eq, Show) + +newtype UnderlyingP2PAddress = UnderlyingP2PAddress String deriving (Eq, Show) -- | A P2P address, with an AuthToken. 
@@ -42,17 +50,26 @@ class FormatP2PAddress a where instance FormatP2PAddress P2PAddress where formatP2PAddress (TorAnnex (OnionAddress onionaddr) onionport) = torAnnexScheme ++ ":" ++ onionaddr ++ ":" ++ show onionport + formatP2PAddress (P2PAnnex (P2PNetName netname) (UnderlyingP2PAddress address)) = + p2pAnnexScheme ++ ":" ++ netname ++ ":" ++ address unformatP2PAddress s - | (torAnnexScheme ++ ":") `isPrefixOf` s = do - let s' = dropWhile (== ':') $ dropWhile (/= ':') s - let (onionaddr, ps) = separate (== ':') s' - onionport <- readish ps - return (TorAnnex (OnionAddress onionaddr) onionport) + | schemeprefixed torAnnexScheme = do + onionport <- readish bs + return (TorAnnex (OnionAddress as) onionport) + | schemeprefixed p2pAnnexScheme = + return (P2PAnnex (P2PNetName as) (UnderlyingP2PAddress bs)) | otherwise = Nothing + where + schemeprefixed scheme = (scheme ++ ":") `isPrefixOf` s + (as, bs) = separate (== ':') $ + dropWhile (== ':') $ dropWhile (/= ':') s torAnnexScheme :: String torAnnexScheme = "tor-annex:" +p2pAnnexScheme :: String +p2pAnnexScheme = "p2p-annex:" + instance FormatP2PAddress P2PAddressAuth where formatP2PAddress (P2PAddressAuth addr authtoken) = formatP2PAddress addr ++ ":" ++ T.unpack (fromAuthToken authtoken) diff --git a/P2P/Auth.hs b/P2P/Auth.hs index 8de3eda39c..f7b79678e6 100644 --- a/P2P/Auth.hs +++ b/P2P/Auth.hs @@ -16,6 +16,8 @@ import Utility.AuthToken import Utility.Tor import Utility.Env +import Network.URI +import Data.Char import qualified Data.Text as T -- | Load authtokens that are accepted by this repository for tor. 
@@ -55,7 +57,7 @@ storeP2PAuthToken addr t = do where v = case addr of TorAnnex _ _ -> (t, Nothing) - -- _ -> (t, Just addr) + _ -> (t, Just addr) fmt (tok, Nothing) = T.unpack (fromAuthToken tok) fmt (tok, Just addr') = T.unpack (fromAuthToken tok) @@ -86,9 +88,13 @@ storeP2PRemoteAuthToken addr t = writeCreds (T.unpack $ fromAuthToken t) (addressCredsFile addr) +-- | Unusual characters in the address are url encoded. addressCredsFile :: P2PAddress -> OsPath --- We can omit the port and just use the onion address for the creds file, --- because any given tor hidden service runs on a single port and has a --- unique onion address. -addressCredsFile (TorAnnex (OnionAddress onionaddr) _port) = - toOsPath onionaddr +addressCredsFile addr = toOsPath $ escapeURIString isAlphaNum $ case addr of + -- We can omit the port and just use the onion address for the + -- creds file, because any given tor hidden service runs on a + -- single port and has a unique onion address. + TorAnnex (OnionAddress onionaddr) _port -> + onionaddr + P2PAnnex (P2PNetName netname) (UnderlyingP2PAddress address) -> + netname ++ ":" ++ address diff --git a/doc/design/generic_p2p_transport.mdwn b/doc/design/generic_p2p_transport.mdwn index 20860504e9..2e06da195b 100644 --- a/doc/design/generic_p2p_transport.mdwn +++ b/doc/design/generic_p2p_transport.mdwn @@ -7,7 +7,7 @@ Such a P2P network has some form of address, which can be used to connect to a given peer by address across the network. A git remote using the P2P network has an url of the form -`p2p-annex::<netname>+<address>` +`p2p-annex::<netname>:<address>` To connect to that remote, git-annex runs the command `git-annex-p2p-<netname>`, giving it the P2P network address as its only
Added a comment: Cataline build missing
diff --git a/doc/install/comment_9_a17ccf32e2e450b2b744f11a1c5edc8c._comment b/doc/install/comment_9_a17ccf32e2e450b2b744f11a1c5edc8c._comment new file mode 100644 index 0000000000..e5bd2b3b47 --- /dev/null +++ b/doc/install/comment_9_a17ccf32e2e450b2b744f11a1c5edc8c._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="h0b0" + avatar="http://cdn.libravatar.org/avatar/bf8483b4623b379c3443d63ecdff22a2" + subject="Cataline build missing" + date="2025-07-30T15:19:06Z" + content=""" +Somehow [the Catalina binary got lost](https://downloads.kitenet.net/git-annex/OSX/current/10.15_Catalina/). Considering that homebrew also fails due to the old OS this is a problem. I'd be happy if this was made available again. +"""]]
document output as a single line
diff --git a/doc/design/generic_p2p_transport.mdwn b/doc/design/generic_p2p_transport.mdwn index 0302607cc7..20860504e9 100644 --- a/doc/design/generic_p2p_transport.mdwn +++ b/doc/design/generic_p2p_transport.mdwn @@ -20,8 +20,9 @@ connection, the command can exit zero. To handle incoming connections from peers, `git-annex remotedaemon` runs `git-annex-p2p-<netname>` with the parameter "socket", followed by the P2P address of the local repository. The command -should output the path of a unix socket file. When it does, `git-annex -remotedaemon` will use that socket file to listen for connections from +should output a single line, the path of a unix socket file. +(Any subsequent output is ignored.) +When it does, `git-annex remotedaemon` will use that socket file to listen for connections from peers, and service them. (The [[P2P_protocol]] is spoken over these connections.) @@ -32,9 +33,9 @@ to convert the P2P network's own equivilant into a unix socket file. To configure `git-annex remotedaemon` to listen on a given P2P network, the user runs `git-annex p2p --enable <netname>`. That also runs `git-annex-p2p-<netname>`, this time with the parameter "address". -That should output the P2P network address that can be used by peers -to connect to the repository. It can first do whatever it needs to do to -set up the P2P networl. +That should output a single line, the P2P network address that can be used +by peers to connect to the repository. It can first do whatever it needs to +do to set up the P2P network. The program [[git-remote-p2p-annex]] is included in git-annex as a git remote helper program. git will use that program to handle `pull` and
rename design page
diff --git a/doc/design/p2p_socket_transport.mdwn b/doc/design/generic_p2p_transport.mdwn similarity index 100% rename from doc/design/p2p_socket_transport.mdwn rename to doc/design/generic_p2p_transport.mdwn diff --git a/doc/git-annex-p2p.mdwn b/doc/git-annex-p2p.mdwn index 7d96c0dbf8..7d4c6ee80a 100644 --- a/doc/git-annex-p2p.mdwn +++ b/doc/git-annex-p2p.mdwn @@ -17,7 +17,7 @@ network. (This needs Tor to be installed.) git-annex can also support other P2P networks, using a helper program that you can install. These programs have names of the form `git-annex-p2p-<netname>`. See -<https://git-annex.branchable.com/design/p2p_socket_transport/> +<https://git-annex.branchable.com/design/generic_p2p_transport/> for documentation about how to create such a program. # OPTIONS diff --git a/doc/git-remote-p2p-annex.mdwn b/doc/git-remote-p2p-annex.mdwn index 175ae3bc29..5021312cae 100644 --- a/doc/git-remote-p2p-annex.mdwn +++ b/doc/git-remote-p2p-annex.mdwn @@ -28,7 +28,7 @@ gitremote-helpers(1) [[git-remote-tor-annex]](1) -<https://git-annex.branchable.com/design/p2p_socket_transport/> +<https://git-annex.branchable.com/design/generic_p2p_transport/> # AUTHOR diff --git a/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment b/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment index b6b2203482..941ebe62c1 100644 --- a/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment +++ b/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment @@ -3,7 +3,7 @@ subject="""comment 12""" date="2025-07-29T16:41:07Z" content=""" -I have started a design document at [[design/p2p_socket_transport]], +I have started a design document at [[design/generic_p2p_transport]], to collect all the scattered decisions here into a coherent document that can be used by someone implementing support for one of these networks.
documentation for generic P2P transports
diff --git a/doc/design/p2p_socket_transport.mdwn b/doc/design/p2p_socket_transport.mdwn index 725acb8b5e..0302607cc7 100644 --- a/doc/design/p2p_socket_transport.mdwn +++ b/doc/design/p2p_socket_transport.mdwn @@ -33,7 +33,8 @@ To configure `git-annex remotedaemon` to listen on a given P2P network, the user runs `git-annex p2p --enable <netname>`. That also runs `git-annex-p2p-<netname>`, this time with the parameter "address". That should output the P2P network address that can be used by peers -to connect to the repository. +to connect to the repository. It can first do whatever it needs to do to +set up the P2P networl. The program [[git-remote-p2p-annex]] is included in git-annex as a git remote helper program. git will use that program to handle `pull` and diff --git a/doc/git-annex-enable-tor.mdwn b/doc/git-annex-enable-tor.mdwn index 3da497233a..e4633aab0a 100644 --- a/doc/git-annex-enable-tor.mdwn +++ b/doc/git-annex-enable-tor.mdwn @@ -18,7 +18,8 @@ that file. If you run it as root, pass it your non-root user id number, as output by `id -u` After this command is run, `git annex remotedaemon` can be run to serve the -tor hidden service, and then `git-annex p2p --gen-addresses` can be run to +tor hidden service, and then `git-annex p2p --pair` or +`git-annex p2p --gen-addresses` can be run to give other users access to your repository via the tor hidden service. # OPTIONS diff --git a/doc/git-annex-p2p.mdwn b/doc/git-annex-p2p.mdwn index 6b9d60c381..7d96c0dbf8 100644 --- a/doc/git-annex-p2p.mdwn +++ b/doc/git-annex-p2p.mdwn @@ -11,11 +11,27 @@ git annex p2p [options] This command can be used to link git-annex repositories over peer-2-peer networks. -Currently, the only P2P network supported by git-annex is Tor hidden -services. +git-annex includes built-in support to use Tor hidden services as a P2P +network. (This needs Tor to be installed.) + +git-annex can also support other P2P networks, using a helper program +that you can install. 
These programs have names of the form +`git-annex-p2p-<netname>`. See +<https://git-annex.branchable.com/design/p2p_socket_transport/> +for documentation about how to create such a program. # OPTIONS +* `--enable <netname>` + + Enable using the P2P network with the specified name. + This needs the helper program `git-annex-p2p-<netname>` to be installed. + + After this command is run, `git annex remotedaemon` can be run to serve + incoming connections from peers, and `git-annex p2p --pair` or + `git-annex p2p --gen-addresses` can be run to give other users access + to your repository via the P2P network. + * `--pair` Run this in two repositories to pair them together over the P2P network. diff --git a/doc/git-annex-remotedaemon.mdwn b/doc/git-annex-remotedaemon.mdwn index 8fecee287c..02fcee7c12 100644 --- a/doc/git-annex-remotedaemon.mdwn +++ b/doc/git-annex-remotedaemon.mdwn @@ -24,6 +24,12 @@ accepting connections from other nodes and serving up the contents of the repository. This is only done if you first run `git annex enable-tor`. Use `git annex p2p` to configure access to tor-annex remotes. +For p2p-annex remotes, the remotedaemon runs a command +`git-annex-p2p-<netname>` to get a socket file, listens to the socket +file for connections from other peers, and serves up the contents of the +repository. This is only done if you first run +`git-annex p2p --enable <netname>`. + Note that when `remote.<name>.annex-pull` is set to false, the remotedaemon will avoid fetching changes from that remote. 
diff --git a/doc/git-remote-p2p-annex.mdwn b/doc/git-remote-p2p-annex.mdwn new file mode 100644 index 0000000000..175ae3bc29 --- /dev/null +++ b/doc/git-remote-p2p-annex.mdwn @@ -0,0 +1,37 @@ +# NAME + +git-remote-p2p-annex - remote helper program to talk to git-annex over a P2P network + +# SYNOPSIS + +git fetch p2p-annex::netname+address + +git remote add foo p2p-annex::netname+address + +# DESCRIPTION + +This is a git remote helper program that allows git to pull and push +over a P2P network. + +This uses a command `git-annex-p2p-<netname>` to communicate with +the P2P network. + +# SEE ALSO + +gitremote-helpers(1) + +[[git-annex]](1) + +[[git-annex-p2p]](1) + +[[git-annex-remotedaemon]](1) + +[[git-remote-tor-annex]](1) + +<https://git-annex.branchable.com/design/p2p_socket_transport/> + +# AUTHOR + +Joey Hess <id@joeyh.name> + +Warning: Automatically converted into a man page by mdwn2man. Edit with care. diff --git a/doc/git-remote-tor-annex.mdwn b/doc/git-remote-tor-annex.mdwn index e32b711e4c..baa18d7aae 100644 --- a/doc/git-remote-tor-annex.mdwn +++ b/doc/git-remote-tor-annex.mdwn @@ -29,6 +29,8 @@ gitremote-helpers(1) [[git-annex-remotedaemon]](1) +[[git-remote-p2p-annex]](1) + # AUTHOR Joey Hess <id@joeyh.name>
design for p2p socket transport
diff --git a/doc/design/p2p_socket_transport.mdwn b/doc/design/p2p_socket_transport.mdwn new file mode 100644 index 0000000000..725acb8b5e --- /dev/null +++ b/doc/design/p2p_socket_transport.mdwn @@ -0,0 +1,40 @@ +This is a generic interface that allows git-annex to use a P2P network. + +Examples of such networks are tor, yggstack or fowl. (git-annex has a +built-in integration with tor which does not use this interface.) + +Such a P2P network has some form of address, which can be +used to connect to a given peer by address across the network. + +A git remote using the P2P network has an url of the form +`p2p-annex::<netname>+<address>` + +To connect to that remote, git-annex runs the command +`git-annex-p2p-<netname>`, giving it the P2P network address as its only +parameter. The command is responsible for connecting to that peer, and +relaying data to it. Data fed into the command on stdin should be sent to +the peer, and data received from the peer should be output to stdout. If it +is unable to connect, the command can exit nonzero. When the peer closes +connection, the command can exit zero. + +To handle incoming connections from peers, `git-annex remotedaemon` +runs `git-annex-p2p-<netname>` with the parameter "socket", followed +by the P2P address of the local repository. The command +should output the path of a unix socket file. When it does, `git-annex +remotedaemon` will use that socket file to listen for connections from +peers, and service them. (The [[P2P_protocol]] is spoken over these +connections.) + +Note that, if the P2P network does not natively use a unix socket file, +a command like `socat` can be run by `git-annex-p2p-<netname> socket` +to convert the P2P network's own equivilant into a unix socket file. + +To configure `git-annex remotedaemon` to listen on a given P2P network, +the user runs `git-annex p2p --enable <netname>`. That also +runs `git-annex-p2p-<netname>`, this time with the parameter "address". 
+That should output the P2P network address that can be used by peers +to connect to the repository. + +The program [[git-remote-p2p-annex]] is included in git-annex as a git +remote helper program. git will use that program to handle `pull` and +`push` with git remotes that use the `p2p-annex::` url scheme. diff --git a/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment b/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment new file mode 100644 index 0000000000..b6b2203482 --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_12_c106b0a8011cc5f66e35894e9899c2ab._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 12""" + date="2025-07-29T16:41:07Z" + content=""" +I have started a design document at [[design/p2p_socket_transport]], +to collect all the scattered decisions here into a coherent document that +can be used by someone implementing support for one of these networks. + +In the process, I realized that rather than defining a path where git-annex +expects the socket file to be, it could run the same `git-annex-p2p-foo` +command in a mode where the command outputs the path to the socket file. +That also lets the command run socat to connect up the socket, for example. + +I also put in there that `git-annex p2p --enable <netname>` will run +`git-annex-p2p-<netname> address`, which will output the P2P address that +peers will use to connect the the repository. That seemed nicer than +requiring the user to somehow come up with the P2P address on their own. + +I have started writing the user-level documentation too, on the +`genericp2p` branch. +"""]]
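
The three invocation modes described in the design page above could be dispatched by a helper program roughly as follows. This is only a sketch under stated assumptions: `Mode` and `parseMode` are hypothetical names, the printed address and socket path are placeholders, and the actual relaying of data to a peer is left out.

```haskell
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- | The ways the design page says git-annex-p2p-<netname> is run.
data Mode
    = ShowAddress       -- ^ run with the parameter "address"
    | ShowSocket String -- ^ run with "socket" and the local P2P address
    | ConnectTo String  -- ^ run with a peer's P2P address as only parameter
    deriving (Eq, Show)

-- | Hypothetical parser. Note that a peer address that is literally
-- "address" would be taken for the "address" mode.
parseMode :: [String] -> Maybe Mode
parseMode ["address"] = Just ShowAddress
parseMode ["socket", addr] = Just (ShowSocket addr)
parseMode [peer] = Just (ConnectTo peer)
parseMode _ = Nothing

main :: IO ()
main = do
    args <- getArgs
    case parseMode args of
        -- Placeholder output; a real helper would set up the network first.
        Just ShowAddress -> putStrLn "example-address"
        -- Placeholder path; a real helper would create this unix socket.
        Just (ShowSocket _) -> putStrLn "/tmp/example.sock"
        Just (ConnectTo peer) -> do
            -- A real helper would relay stdin to the peer and the peer's
            -- data to stdout; failing to connect means exiting nonzero.
            hPutStrLn stderr ("cannot connect to " ++ peer ++ " in this sketch")
            exitFailure
        Nothing -> do
            hPutStrLn stderr "expected: address | socket ADDRESS | PEERADDRESS"
            exitFailure
```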
Added a comment: fsck can do this
diff --git a/doc/forum/Check_export_or_force_re-export_to_special_remote/comment_1_9064573903a7314ee43786309fa9f920._comment b/doc/forum/Check_export_or_force_re-export_to_special_remote/comment_1_9064573903a7314ee43786309fa9f920._comment new file mode 100644 index 0000000000..c71e1ead98 --- /dev/null +++ b/doc/forum/Check_export_or_force_re-export_to_special_remote/comment_1_9064573903a7314ee43786309fa9f920._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="apoelstra" + avatar="http://cdn.libravatar.org/avatar/e0256d19738263aece2b0016e401864d" + subject="fsck can do this" + date="2025-07-24T14:44:52Z" + content=""" +If you run + +``` +git annex fsck --from=android --fast # don't forget the = +``` + +then it will report a bunch of errors for all missing files. You can then run + +``` +git annex export master --to android +``` + +again and it'll work. + +(`git-annex-fsck --fast` will just check the presence of files; without it, it will actually checksum the files, which might be what you want.) + +`fsck` is smart enough to understand that files are missing even if you exported some weird treeish; e.g. if you had done `git annex export master:photos/ --to android`, even though the paths would all be relative to `photos/` rather than the root, it'll still realize they're missing and do the right thing. In this case you might want to add `photos/` to your `git-annex-fsck` command to save yourself the time of confirming that files aren't there that aren't supposed to be. +"""]]
add news item for git-annex 10.20250721
diff --git a/doc/news/version_10.20250721.mdwn b/doc/news/version_10.20250721.mdwn new file mode 100644 index 0000000000..09ca1b73f0 --- /dev/null +++ b/doc/news/version_10.20250721.mdwn @@ -0,0 +1,17 @@ +git-annex 10.20250721 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * Improved workaround for git 2.50 bug, avoding an occasional test suite + failure, as well as some situations where an unlocked file did not get + populated when adding another file to the repository with the same + content. + * Add --url option and url= preferred content expression, to match + content that is recorded as present in an url. + * p2phttp: Scan multilevel directories with --directory. + * p2phttp: Added --socket option. + * Fix bug in handling of linked worktrees on filesystems not supporting + symlinks, that caused annexed file content to be stored in the wrong + location inside the git directory, and also caused pointer files to not + get populated. + * fsck: Fix location of annexed files when run in linked worktrees + that have experienced the above bug. + * Fix symlinks generated to annexed content when in adjusted unlocked + branch in a linked worktree on a filesystem not supporting symlinks."""]] \ No newline at end of file
Added a comment: Works!
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_10_9129b86d6c82fa7c76157acc28bd03a3._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_10_9129b86d6c82fa7c76157acc28bd03a3._comment new file mode 100644 index 0000000000..1c6d7b3667 --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_10_9129b86d6c82fa7c76157acc28bd03a3._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="mih" + avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd" + subject="Works!" + date="2025-07-22T13:31:21Z" + content=""" +I can confirm both the build working and the fix to address the issue. Thanks again! +"""]]
Added a comment: Workaround for default wanted content?
diff --git a/doc/todo/Setting_default_preferred_content_expressions/comment_4_c8a5f77b8a90524afb6bf44a8bde5975._comment b/doc/todo/Setting_default_preferred_content_expressions/comment_4_c8a5f77b8a90524afb6bf44a8bde5975._comment new file mode 100644 index 0000000000..c57019c0d6 --- /dev/null +++ b/doc/todo/Setting_default_preferred_content_expressions/comment_4_c8a5f77b8a90524afb6bf44a8bde5975._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Workaround for default wanted content?" + date="2025-07-21T18:12:59Z" + content=""" +I am trying to work around this, but can't really find a solution. Any of the following for one-off overriding of the preferred content would help, but apparently something like this doesn't exist, right? One has to go through `git annex wanted . present`? + +- `git -c annex.wanted=present annex assist` +- `GIT_ANNEX_WANTED=present git annex assist` +- `git annex assist|sync --wanted=present` + +[[preferred_content]] and [[git-annex-common-options]] don't list anything obvious in that direction. + +An environment variable would be particularly great as that's easy to deploy globally. +"""]]
run reconcileStaged even in smudge clean filter, using alternate code path
Improved workaround for git 2.50 bug, avoiding an occasional test suite
failure, as well as some situations where an unlocked file did not get
populated when adding another file to the repository with the same content.
This uses the alternate code path that was already used when there was
a conflict. Since that code path is not able to record its work,
it will redo the same work next time. If the only way reconcileStaged
is getting run is via the smudge clean filter, that could result in
more and more changes getting processed redundantly each time. Once
some other git-annex command runs and calls reconcileStaged, it
will stop redoing that work. I don't think the extra work will be a
problem.
diff --git a/CHANGELOG b/CHANGELOG index e0407b3919..deaa942d35 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -12,6 +12,10 @@ git-annex (10.20250631) UNRELEASED; urgency=medium branch in a linked worktree on a filesystem not supporting symlinks. * Add --url option and url= preferred content expression, to match content that is recorded as present in an url. + * Improved workaround for git 2.50 bug, avoding an occasional test suite + failure, as well as some situations where an unlocked file did not get + populated when adding another file to the repository with the same + content. -- Joey Hess <id@joeyh.name> Mon, 07 Jul 2025 15:59:42 -0400 diff --git a/Database/Keys.hs b/Database/Keys.hs index 98a1db9053..d3fce7bbd8 100644 --- a/Database/Keys.hs +++ b/Database/Keys.hs @@ -260,7 +260,7 @@ isInodeKnown i s = or <$> runReaderIO ContentTable - is an associated file. -} reconcileStaged :: Bool -> H.DbQueue -> Annex DbTablesChanged -reconcileStaged dbisnew qh = ifM notneeded +reconcileStaged dbisnew qh = ifM isBareRepo ( return mempty , do gitindex <- inRepo currentIndexFile @@ -299,12 +299,12 @@ reconcileStaged dbisnew qh = ifM notneeded inRepo $ update' lastindexref newtree fastDebug "Database.Keys" "reconcileStaged end" return (DbTablesChanged True True) - -- git write-tree will fail if the index is locked or when there is - -- a merge conflict. To get up-to-date with the current index, - -- diff --staged with the old index tree. The current index tree - -- is not known, so not recorded, and the inode cache is not updated, - -- so the next time git-annex runs, it will diff again, even - -- if the index is unchanged. + -- Was not able to run git write-tree, or it failed due to the + -- index being locked or a merge conflict. To get up-to-date with + -- the current index, diff --staged with the old index tree. 
The + -- current index tree is not known, so not recorded, and the inode + -- cache is not updated, so the next time git-annex runs, it will + -- diff again, even if the index is unchanged. -- -- When there is a merge conflict, that will not see the new local -- version of the files that are conflicted. So a second diff @@ -327,21 +327,22 @@ reconcileStaged dbisnew qh = ifM notneeded processor l False `finally` void cleanup - -- Avoid running smudge clean filter, which would block trying to - -- access the locked database. git write-tree sometimes calls it, - -- even though it is not adding work tree files to the index, - -- and so the filter cannot have an effect on the contents of the - -- index or on the tree that gets written from it. - getindextree = inRepo $ \r -> writeTreeQuiet $ r - { gitGlobalOpts = gitGlobalOpts r ++ bypassSmudgeConfig } - - notneeded = isBareRepo - -- Avoid doing anything when run by the - -- smudge clean filter. When that happens in a conflicted - -- merge situation, running git write-tree - -- here would cause git merge to fail with an internal - -- error. This works around around that bug in git. - <||> Annex.getState Annex.insmudgecleanfilter + -- This avoids running git write-tree when run by the smudge clean + -- filter, in order to work around a bug in git. That causes + -- git merge to fail with an internal error when git write-tree is + -- run by the smudge clean filter in conflicted merge situation. + -- + -- When running git write-tree, avoid it running the smudge clean + -- filter, which would block trying to access the locked database. + -- git write-tree sometimes calls it, even though it is not adding + -- work tree files to the index, and so the filter cannot have an + -- effect on the contents of the index or on the tree that gets + -- written from it. 
+ getindextree = ifM (Annex.getState Annex.insmudgecleanfilter) + ( return Nothing + , inRepo $ \r -> writeTreeQuiet $ r + { gitGlobalOpts = gitGlobalOpts r ++ bypassSmudgeConfig } + ) diff old new = -- Avoid running smudge clean filter, since we want the diff --git a/doc/bugs/flaky_test_failure_add_dup.mdwn b/doc/bugs/flaky_test_failure_add_dup.mdwn new file mode 100644 index 0000000000..f685a9861a --- /dev/null +++ b/doc/bugs/flaky_test_failure_add_dup.mdwn @@ -0,0 +1,74 @@ + Repo Tests v10 unlocked + [...] + add dup: FAIL (0.25s) + ./Test/Framework.hs:393: + checkcontent foo + expected: "annexed file content" + but got: "/annex/objects/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77\n" + Use -p '/add dup/' to rerun this test only. + +I am able to produce this failure after about a minute of running the +test in a loop with: + + while git-annex test -p '/add dup/' ; do :;done + +Inside the test repo, file "foo" indeed is an unpopulated pointer file, +despite the file "foodup", which has the same git-annex key, being populated. + +Reverting [[!commit fb155b1e3e59cc1f9cf8a4fe7d47cba49d1c81af]] avoids +this test suite failure. (Or at least if it is flaky, it's much mess likely to +fail. I ran the loop for 10 minutes.) + +--- + +What the test suite is doing is a `git add` and is using the smudge filter +to add the new file as an unlocked annexed file. + +I have reproduced the problem doing the same outside the test suite. + +---- + +It seems that the keys database does not always get updated to indicate the +key used by file "foo". As shown here looking at the keys db in a failed +testcase dir: + + sqlite> select * from associated; + 1|SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77|foodup + sqlite> + +That happens in a fresh clone of the repository. So, `reconcileStaged` never +gets a chance to do anything, because it's only ever called from inside the +smudge filter. 
And [[!commit fb155b1e3e59cc1f9cf8a4fe7d47cba49d1c81af]] made it +not run in the smudge filter. + +This particular failure could be avoided if `git-annex init` called +`reconcileStaged`. Then it would learn about pointer files in the tree. + +But would that be a complete fix for all situations? If the user is +only running git-annex via `git add` (the smudge clean filter), +but is making other changes to the tree too, I don't think +it would. Consider for example: + + git clone r r2 + cd r2 + git-annex fsck + git mv foo bar + git config annex.largefiles anything + echo hi > baz + git add baz + +In the above example, the `git-annex fsck` updates the associated files, +so it know that the file foo has the key. But then foo is renamed to bar +and when `git add` is run on a file, generating the same key, +`reconcileStaged` does not update the associated files, so it does not +know about the rename to baz. So it leaves bar unpopulated. + +Conclusion: `reconcileStaged` needs to run even in the smudge clean filter. +But to avoid the git bug worked around by +[[!commit fb155b1e3e59cc1f9cf8a4fe7d47cba49d1c81af]], it must avoid +running `git write-tree` when called in smudge clean, at least when +in a conflicted merge situation. Luckily, `reconcileStaged` does contain +code to update things when it is unable to run `git write-tree`, so that +only needs to be used when in the smudge clean filter. + +> [[fixed|done]] and confirmed the fix works with git 2.50. --[[Joey]]
fixed build failure
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_9_fe973ec7a8a2ae37d6ec6b36a1e24873._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_9_fe973ec7a8a2ae37d6ec6b36a1e24873._comment new file mode 100644 index 0000000000..84b863c31e --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_9_fe973ec7a8a2ae37d6ec6b36a1e24873._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2025-07-21T16:27:22Z" + content=""" +That build failure has been fixed now. +"""]]
response
diff --git a/doc/forum/strange_content_of_files/comment_3_57f12bb9c4333f48755d722a5b3438ac._comment b/doc/forum/strange_content_of_files/comment_3_57f12bb9c4333f48755d722a5b3438ac._comment new file mode 100644 index 0000000000..6366958c1c --- /dev/null +++ b/doc/forum/strange_content_of_files/comment_3_57f12bb9c4333f48755d722a5b3438ac._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-07-21T16:15:37Z" + content=""" +These are pointer files, which are what is stored in git to represent an +unlocked annexed file, rather than the symlink that is stored to represent +a locked file. + +You would usually see these if the file's content is not present in the +repository; running `git-annex get` should get it and populate it with the +actual file content. + +The "<<<<<<< HEAD" is a git conflict marker; somehow or other you have gotten +two versions of an unlocked annexed file in conflict. This can be resolved the +usual way a git conflict would be resolved, by editing it to contain one +pointer or the other. Or you can run `git-annex resolvemerge` to resolve +the conflict the same way `git-annex sync` does, by checking in both +versions of the file with different filenames. +"""]]
Add --url option and url= preferred content expression
To match content that is recorded as present in an url.
Note that this cannot ask remotes to provide an url using whereisKey, like
whereis does, because preferred content expressions need to match the same
from multiple perspectives, and the remote would not always be available.
That's why the docs say "recorded as present", but this may still
surprise some who see an url in whereis output and find that they
cannot match on it.
The use of getDownloader is to strip the downloader prefix from urls like
"yt:". Note that, when OtherDownloader is used, this strips the ":" prefix,
and allows matching on those urls too.
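A rough shell sketch of the matching described above, for illustration only.
`strip_downloader` and `url_matches` are hypothetical helpers, not git-annex
code. The sketch assumes that `*` in the glob may also match `/`, which the
`*/example.com/*` example in the todo item implies.

```shell
# Hypothetical helper (not part of git-annex): strip a downloader prefix
# such as "yt:", or a bare ":" as used for other downloaders, from a
# recorded url.
strip_downloader() {
    case "$1" in
        yt:*) printf '%s\n' "${1#yt:}" ;;
        :*)   printf '%s\n' "${1#:}" ;;
        *)    printf '%s\n' "$1" ;;
    esac
}

# Hypothetical helper: succeed when the url, with any downloader prefix
# removed, matches the glob. The unquoted $glob is used as a case pattern,
# where "*" also matches "/".
url_matches() {
    glob=$1
    url=$(strip_downloader "$2")
    case "$url" in
        $glob) return 0 ;;
        *)     return 1 ;;
    esac
}
```

With the real option, the same glob would be passed as
`git-annex find --url='*/example.com/*'`, quoted so the shell does not
expand it.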
diff --git a/Annex/FileMatcher.hs b/Annex/FileMatcher.hs index 6157efa3f0..385e23a16e 100644 --- a/Annex/FileMatcher.hs +++ b/Annex/FileMatcher.hs @@ -1,6 +1,6 @@ {- git-annex file matching - - - Copyright 2012-2024 Joey Hess <id@joeyh.name> + - Copyright 2012-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -194,6 +194,7 @@ preferredContentTokens pcd = , ValueToken "approxlackingcopies" (usev $ limitLackingCopies "approxlackingcopies" True) , ValueToken "inbackend" (usev limitInBackend) , ValueToken "metadata" (usev limitMetaData) + , ValueToken "url" (usev limitUrl) , ValueToken "inallgroup" (usev $ limitInAllGroup $ getGroupMap pcd) , ValueToken "onlyingroup" (usev $ limitOnlyInGroup $ getGroupMap pcd) , ValueToken "balanced" (usev $ limitBalanced (repoUUID pcd) (getGroupMap pcd)) diff --git a/CHANGELOG b/CHANGELOG index 7216b21fbb..e0407b3919 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -10,6 +10,8 @@ git-annex (10.20250631) UNRELEASED; urgency=medium that have experienced the above bug. * Fix symlinks generated to annexed content when in adjusted unlocked branch in a linked worktree on a filesystem not supporting symlinks. + * Add --url option and url= preferred content expression, to match + content that is recorded as present in an url. -- Joey Hess <id@joeyh.name> Mon, 07 Jul 2025 15:59:42 -0400 diff --git a/CmdLine/GitAnnex/Options.hs b/CmdLine/GitAnnex/Options.hs index 890f9654de..4b44edda56 100644 --- a/CmdLine/GitAnnex/Options.hs +++ b/CmdLine/GitAnnex/Options.hs @@ -348,6 +348,11 @@ keyMatchingOptions' = <> help "match files with attached metadata" <> hidden ) + , annexOption (setAnnexState . 
Limit.addUrl) $ strOption + ( long "url" <> metavar paramGlob + <> help "match files by url" + <> hidden + ) , annexFlag (setAnnexState Limit.Wanted.addWantGet) ( long "want-get" <> help "match files the local repository wants to get" diff --git a/Limit.hs b/Limit.hs index d090e09d88..1916a606d5 100644 --- a/Limit.hs +++ b/Limit.hs @@ -31,6 +31,7 @@ import Types.FileMatcher import Types.MetaData import Annex.MetaData import Logs.MetaData +import Logs.Web import Logs.Group import Logs.Unused import Logs.Location @@ -867,6 +868,26 @@ limitMetaData s = case parseMetaDataMatcher s of . S.filter matching . metaDataValues f <$> getCurrentMetaData k +addUrl :: String -> Annex () +addUrl = addLimit . limitUrl + +limitUrl :: MkLimit Annex +limitUrl glob = Right $ MatchFiles + { matchAction = const $ const $ checkKey check + , matchNeedsFileName = False + , matchNeedsFileContent = False + , matchNeedsKey = True + , matchNeedsLocationLog = False + , matchNeedsLiveRepoSize = False + , matchNegationUnstable = False + , matchDesc = "url" =? glob + } + where + check k = any (matchGlob cglob) + . map (fst . getDownloader) + <$> getUrls k + cglob = compileGlob glob CaseSensitive (GlobFilePath False) -- memoized + addAccessedWithin :: Duration -> Annex () addAccessedWithin duration = do now <- liftIO getPOSIXTime diff --git a/doc/git-annex-matching-options.mdwn b/doc/git-annex-matching-options.mdwn index ea29f98848..cf964cc71d 100644 --- a/doc/git-annex-matching-options.mdwn +++ b/doc/git-annex-matching-options.mdwn @@ -178,6 +178,11 @@ in either of two repositories. (Note that you will need to quote the second parameter to avoid the shell doing redirection.) +* `--url=glob` + + Matches when the content is recorded as being present in an url that + matches the glob. 
+ * `--want-get` Matches only when the preferred content settings for the local repository diff --git a/doc/git-annex-preferred-content.mdwn b/doc/git-annex-preferred-content.mdwn index 52c6ff225e..6b9fc521ac 100644 --- a/doc/git-annex-preferred-content.mdwn +++ b/doc/git-annex-preferred-content.mdwn @@ -166,6 +166,11 @@ content not being configured. To match PDFs with between 100 and 200 pages (assuming something has set that metadata), use `metadata=pagecount>=100 and metadata=pagecount<=200` +* `url=glob` + + Matches when the content is recorded as being present in an url that + matches the glob. + * `present` Makes content be wanted if it's present, but not otherwise. diff --git a/doc/todo/match_on_url.mdwn b/doc/todo/match_on_url.mdwn index 6623debbed..5ef885e02d 100644 --- a/doc/todo/match_on_url.mdwn +++ b/doc/todo/match_on_url.mdwn @@ -10,3 +10,5 @@ expression if adding that. An alternative way could be to populate a metadata field with the url, if that were done without increasing the size of the git repository. --[[Joey]] + +> [[done]] --[[Joey]]
todo
diff --git a/doc/todo/match_on_url.mdwn b/doc/todo/match_on_url.mdwn new file mode 100644 index 0000000000..6623debbed --- /dev/null +++ b/doc/todo/match_on_url.mdwn @@ -0,0 +1,12 @@ +Add a matching option that matches on the recorded url of a file. + +My use case is eg using, `git-annex find` to list files that were addurled +from a given host. So I want a way to match on the url with a glob, eg +`--url=*/example.com/*` + +Seems likely that there would also be a corresponding preferred content +expression if adding that. + +An alternative way could be to populate a metadata field with the url, +if that were done without increasing the size of the git repository. +--[[Joey]]
Added a comment
diff --git a/doc/forum/strange_content_of_files/comment_2_422dbe52b347670a8dde34fb18a81999._comment b/doc/forum/strange_content_of_files/comment_2_422dbe52b347670a8dde34fb18a81999._comment new file mode 100644 index 0000000000..ca88636a3f --- /dev/null +++ b/doc/forum/strange_content_of_files/comment_2_422dbe52b347670a8dde34fb18a81999._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jnkl" + avatar="http://cdn.libravatar.org/avatar/2ab576f3bf2e0d96b1ee935bb7f33dbe" + subject="comment 2" + date="2025-07-19T09:05:46Z" + content=""" +Yes, it's btrfs. +"""]]
todo
diff --git a/doc/todo/should_pull_drop_from_remote.mdwn b/doc/todo/should_pull_drop_from_remote.mdwn new file mode 100644 index 0000000000..8fc051ca53 --- /dev/null +++ b/doc/todo/should_pull_drop_from_remote.mdwn @@ -0,0 +1,42 @@ +Currently, `git-annex pull` drops unwanted files from the remote it pulls +from. I wonder if this is a good choice? + +I was surprised by this behavior today myself. I was thinking of pull as +conceptually not modifying the state of the remote, only getting data from +it. That's what `git pull` does after all. + +My use case was that I knew the remote didn't want a file, but I wanted to +leave the file on the remote for a little while longer. So I did a pull, +rather than a full sync. + +It's also the case that `git-annex push` drops unwanted files from the +local repository. The same analogy to `git push` would say it should not do +that. + +Separating out these behaviors would have pull drop unwanted files from the +local repository, while push drops unwanted files from the remote. The +latter seems unambiguously what the user would want; the former might be +surprising to some, but one of pull/push needs to drop from local in order +for them combined to be the same as sync. + +Looking at [[!commit 5df89d58c7d43b5cd26829cb8c4699e02fc352f3]] that +implemented pull and push, I think this behavior was emergent, not +designed. The existing `git-annex sync --pull` happened to drop unwanted +content from the remote and `git-annex pull` inherited that behavior. +Looking back to [[!commit 1cc1f9f4e5e3e974ddec069b2a6a3edf0893c369]] that +implemented `--pull`, it also doesn't seem to have considered what to do +about dropping. + +Note that there is some risk of a wider behavior change than expected if +implementing this. `handleDropsFrom` drops from remotes first, and from the +local repository last. 
So if a file is unwanted by both local and remote, +and both start with a copy, `git-annex pull` will drop it from the remote, +then be unable to drop it from the local, and so it will stay on the local +repo. If it were changed to only drop from the local repo, it would be able +to drop it from local, and the file would stay on the remote. It's not +clear to me that either behavior is better than the other; both are legal +solutions to that preferred content situation of course. It might be +possible to only document this behavior change, and if a user has set up +such a preferred content, they can of course change it to something that +picks the repository they want to keep the copy. +--[[Joey]]
Added a comment: fs type?
diff --git a/doc/forum/strange_content_of_files/comment_1_c7e71dd9db16a9316eaf9ed8aa8fb8aa._comment b/doc/forum/strange_content_of_files/comment_1_c7e71dd9db16a9316eaf9ed8aa8fb8aa._comment new file mode 100644 index 0000000000..f97155ce93 --- /dev/null +++ b/doc/forum/strange_content_of_files/comment_1_c7e71dd9db16a9316eaf9ed8aa8fb8aa._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jose1711" + avatar="http://cdn.libravatar.org/avatar/bca6c53d89f86ddae44aa62f99c670a4" + subject="fs type?" + date="2025-07-18T09:10:50Z" + content=""" +does the underlying filesystem support symlinks? +"""]]
diff --git a/doc/forum/strange_content_of_files.mdwn b/doc/forum/strange_content_of_files.mdwn new file mode 100644 index 0000000000..e166616dc7 --- /dev/null +++ b/doc/forum/strange_content_of_files.mdwn @@ -0,0 +1,16 @@ +Some of my files suddenly have a content like this: + + + <<<<<<< HEAD + /annex/objects/SHA256E-s761469--0311429015097e62e392d8c0c7c8250a07eadbf9670f8396286024ebc7b2d9ec.svg + ======= + /annex/objects/SHA256E-s744177--4b9dce2dc6a6933defa7b69e5ebbd8f4fd653246c31891b7fb9b27ffec038f22.svg + >>>>>>> refs/remotes/singapore/master + +or just + + /annex/objects/SHA256E-s1155603--95cefc864632f441029328e96414b026492718789941377f59b1a4bf6694be69.jpg + +instead of the actual content. These are not links or something. + +What happened?
Added a comment: Thanks!
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_8_f8637dc282e5867bdce23e88f55474e5._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_8_f8637dc282e5867bdce23e88f55474e5._comment new file mode 100644 index 0000000000..4eacc8d3a8 --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_8_f8637dc282e5867bdce23e88f55474e5._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="mih" + avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd" + subject="Thanks!" + date="2025-07-17T06:58:01Z" + content=""" +Thanks for the fix! I tried to confirm it in my original test case, but currently it FTBFS with + +``` +[570 of 755] Compiling Command.P2PHttp +/git-annex-wheel/git-annex/Command/P2PHttp.hs:305:73: error: [GHC-83865] +[571 of 755] Compiling Command.P2P + • Couldn't match type ‘OsString’ +``` +"""]]
fsck: Fix location of annexed files when run in linked worktrees
This cleans up after the bug that was fixed in commit
6a9e923c74a1ae5c4c904e7a28e9339bdc241427
Object files that were stored in the wrong location are rescued,
and after that any wrong location logs will be fixed by the usual fsck.
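A small shell sketch of the condition that triggers the cleanup, mirroring
the `dirnotsymlink` check in the diff below. `annex_dir_is_stray` is a
hypothetical name used only for illustration; the actual rescue of object
files is done by `git-annex fsck` and is not reproduced here.

```shell
# Hypothetical helper (not part of git-annex): succeed when the given path
# exists and is not a symlink. A symlink (even a dangling one) or a missing
# path counts as "nothing to clean up".
annex_dir_is_stray() {
    [ ! -L "$1" ] && [ -e "$1" ]
}
```

For a linked worktree named `foo`, the path to test would be
`.git/worktrees/foo/annex` in the main repository. If it turns out to be a
real directory, running `git-annex fsck` in that worktree should move the
objects to the right place and remove it.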
diff --git a/Annex.hs b/Annex.hs index 84b2bda0ad..aba23587fb 100644 --- a/Annex.hs +++ b/Annex.hs @@ -12,6 +12,7 @@ module Annex ( AnnexState(..), AnnexRead(..), new, + new', run, eval, makeRunner, @@ -291,10 +292,13 @@ newAnnexState c r = do - Ensures the config is read, if it was not already, and performs - any necessary git repo fixups. -} new :: Git.Repo -> IO (AnnexState, AnnexRead) -new r = do +new = new' fixupRepo + +new' :: (Git.Repo -> GitConfig -> IO Git.Repo) -> Git.Repo -> IO (AnnexState, AnnexRead) +new' f r = do r' <- Git.Config.read r let c = extractGitConfig FromGitConfig r' - st <- newAnnexState c =<< fixupRepo r' c + st <- newAnnexState c =<< f r' c rd <- newAnnexRead c return (st, rd) diff --git a/CHANGELOG b/CHANGELOG index d85abaa2c1..7216b21fbb 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -6,6 +6,8 @@ git-annex (10.20250631) UNRELEASED; urgency=medium symlinks, that caused annexed file content to be stored in the wrong location inside the git directory, and also caused pointer files to not get populated. + * fsck: Fix location of annexed files when run in linked worktrees + that have experienced the above bug. * Fix symlinks generated to annexed content when in adjusted unlocked branch in a linked worktree on a filesystem not supporting symlinks. diff --git a/Command/Fsck.hs b/Command/Fsck.hs index 50c21149b3..cd9dd335e1 100644 --- a/Command/Fsck.hs +++ b/Command/Fsck.hs @@ -1,6 +1,6 @@ {- git-annex command - - - Copyright 2010-2023 Joey Hess <id@joeyh.name> + - Copyright 2010-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. 
-} @@ -15,6 +15,7 @@ import qualified Annex import qualified Remote import qualified Types.Backend import qualified Backend +import qualified Git import Annex.Content import Annex.Verify #ifndef mingw32_HOST_OS @@ -24,6 +25,7 @@ import Annex.Content.Presence import Annex.Content.Presence.LowLevel import Annex.Perms import Annex.Link +import Annex.Fixup import Logs.Location import Logs.Trust import Logs.Activity @@ -102,6 +104,8 @@ seek o = startConcurrency commandStages $ do from <- maybe (pure Nothing) (Just <$$> getParsed) (fsckFromOption o) u <- maybe getUUID (pure . Remote.uuid) from checkDeadRepo u + when (isNothing from) $ + cleanupLinkedWorkTreeBug i <- prepIncremental u (incrementalOpt o) let seeker = AnnexedFileSeeker { startAction = const $ start from i @@ -768,4 +772,38 @@ withFsckDb (ContIncremental h) a = a h withFsckDb (StartIncremental h) a = a h withFsckDb (NonIncremental mh) a = maybe noop a mh withFsckDb (ScheduleIncremental _ _ i) a = withFsckDb i a - + +-- A bug caused linked worktrees on filesystems not supporting symlinks +-- to not use the common annex directory, but one annex directory per +-- linked worktree. Object files could end up stored in those directories. +-- +-- When run in a linked worktree with its own annex directory that is not a +-- symlink, move any object files to the right location, and delete the +-- annex directory. +cleanupLinkedWorkTreeBug :: Annex () +cleanupLinkedWorkTreeBug = + whenM (Annex.inRepo needsGitLinkFixup) $ do + r <- Annex.gitRepo + -- mainWorkTreePath is set by fixupUnusualRepos. + -- Unsetting it makes a version of the Repo that uses + -- the wrong object location. 
+ let r' = r { Git.mainWorkTreePath = Nothing } + let dir = gitAnnexDir r' + whenM (liftIO $ dirnotsymlink dir) $ do + showSideAction $ "Cleaning up directory " + <> QuotedPath dir + <> " created by buggy version of git-annex" + (st, rd) <- liftIO $ Annex.new' (\r'' _c -> pure r'') r' + ks <- liftIO $ Annex.eval (st, rd) $ + listKeys InAnnex + forM_ ks $ \k -> void $ tryNonAsync $ do + loc <- liftIO $ gitAnnexLocation k r' + (Annex.gitconfig st) + moveAnnex k loc + void $ tryNonAsync $ liftIO $ + removeDirectoryRecursive dir + where + dirnotsymlink dir = + tryIO (R.getSymbolicLinkStatus (fromOsPath dir)) >>= \case + Right st -> return $ not (isSymbolicLink st) + Left _ -> return False diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn index bbbb8143b3..9aa0d48d6d 100644 --- a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn @@ -97,3 +97,5 @@ This is on win11. Absolutely! Approaching 15 years of bigger, better, faster, more ;-) [[!tag projects/INM7]] + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_7_cfb97162cd21818995f9d71a5ce3a0d0._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_7_cfb97162cd21818995f9d71a5ce3a0d0._comment new file mode 100644 index 0000000000..bb752665bf --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_7_cfb97162cd21818995f9d71a5ce3a0d0._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2025-07-15T17:07:43Z" + content=""" +Running `git-annex fsck` in the affected worktree will clean up from this +bug. +"""]]
tag INM7
Based on submitter, I assume so..
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn index e99aec49d3..bbbb8143b3 100644 --- a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn @@ -96,3 +96,4 @@ This is on win11. Absolutely! Approaching 15 years of bigger, better, faster, more ;-) +[[!tag projects/INM7]]
fixed but not ready to close yet
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_6_03de310223c158d6ff8e2485854792fc._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_6_03de310223c158d6ff8e2485854792fc._comment new file mode 100644 index 0000000000..2610b6ff3f --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_6_03de310223c158d6ff8e2485854792fc._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2025-07-14T17:35:14Z" + content=""" +I have fixed this bug. + +I still need to make `git-annex fsck` clean up repositories that +encountered this bug. +"""]]
subtlety
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_4_8643dcafa1f6ef04ca32cda6aa841a11._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_4_8643dcafa1f6ef04ca32cda6aa841a11._comment index 7b0d351849..889eb5235d 100644 --- a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_4_8643dcafa1f6ef04ca32cda6aa841a11._comment +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_4_8643dcafa1f6ef04ca32cda6aa841a11._comment @@ -11,7 +11,7 @@ which is not the right path to the object file. Should be (In the ext4 case that does not happen, instead the reconcileStaged `git diff` does not include the new file. So that is a different problem.) -It seems that `.git/worktrees/foo/annex` is a symlink when the filesystem +Turns out that `.git/worktrees/foo/annex` is a symlink when the filesystem supports symlinks. But, when symlinks are not supported, that symlink is not made. And so it looks for objects there, but they're not there. This could also cause other behavior differences, since other state files @@ -23,7 +23,20 @@ But git-annex shouldn't rely on the symlink in things like `gitAnnexLocation`. Luckily, `annexDir` exists, and I've checked and it is the *only* thing -that produces "annex" as a path to the annex directory. So `annexDir` will -need to be made into a function that is passed the git repository and -handles this special case. +that produces "annex" as a path to the annex directory. So `annexDir` could +be made into a function that is passed the git repository and +handles this special case, by returning a path like "../../annex", which +when combined with the git directory in a linked worktree, ends up pointing +to the main repository's ".git/annex". + +Except, `annexDir` is not only used to find the paths to object files. It's +also used to *generate* the symlink target. 
When `git-annex add` is run in +a linked worktree, and symlinks are supported, the symlink target needs to +be of the form ".git/annex/". With this `annexDir` change, it would not be +right. + +So, it seems that `annexDir`, and some functions that call it need to behave +differently when they're generating a path into the annex directory, vs +when they're generating a symlink target or other similar thing. +Which is a subtle distinction to introduce. """]]
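The distinction described in the revised comment 4 could be sketched roughly as follows. This is only an illustration under stated assumptions: `RepoKind`, `annexDirForObjects`, `annexDirForLinkTarget` and `objectDir` are hypothetical names, not git-annex's actual API, and the real `annexDir` operates on git-annex's own path types rather than `FilePath`.

```haskell
import System.FilePath ((</>))

-- Hypothetical stand-in for what git-annex knows about the repository.
data RepoKind = MainWorktree | LinkedWorktree
    deriving (Eq, Show)

-- Path to the annex directory, relative to the git directory, for use
-- when locating object files and other state. In a linked worktree the
-- git directory is .git/worktrees/foo, so "../../annex" leads back to
-- the main repository's .git/annex without relying on a symlink.
annexDirForObjects :: RepoKind -> FilePath
annexDirForObjects MainWorktree = "annex"
annexDirForObjects LinkedWorktree = ".." </> ".." </> "annex"

-- Component used when generating an annex symlink target, which keeps
-- the ".git/annex/" form in every kind of worktree.
annexDirForLinkTarget :: FilePath
annexDirForLinkTarget = "annex"

-- Assumed example: combine a git directory with the object-path variant.
objectDir :: FilePath -> RepoKind -> FilePath
objectDir gitdir kind = gitdir </> annexDirForObjects kind </> "objects"
```

With these definitions, `objectDir ".git/worktrees/foo" LinkedWorktree` resolves to the main repository's `.git/annex/objects`, while anything that builds a symlink target would keep using `annexDirForLinkTarget`.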
complication
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_5_ca4480178d6723bcad985793364ad855._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_5_ca4480178d6723bcad985793364ad855._comment new file mode 100644 index 0000000000..c3bbfe80f0 --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_5_ca4480178d6723bcad985793364ad855._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-07-11T18:38:59Z" + content=""" +When a secondary worktree is used on a filesystem not supporting symlinks, +it would be possible for `git-annex move` to move an object from another +repository. And store it to the wrong location, under +`.git/worktrees/foo/annex/objects/`. The object would still be accessible, +and a later `git-annex copy --to remote`, if run in the same worktree, +would be able to send the object on to a remote. + +But if this bug gets fixed, then the misplaced object file will be left, +and won't be used any longer. Which could appear to the user as data loss +in some situations. Eg, the copy to the remote would fail. (There might be +situations where the populated worktree file would be used as a copy of the +object, but that assumes the worktree file is still populated.) + +Also, `git-annex drop` would not delete such misplaced object files, so the +user would be left with bloated repository. + +So, `git-annex fsck` will need to be made to search out such misplaced +object files and move them to the correct objects directory. +"""]]
analysis
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_4_8643dcafa1f6ef04ca32cda6aa841a11._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_4_8643dcafa1f6ef04ca32cda6aa841a11._comment new file mode 100644 index 0000000000..7b0d351849 --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_4_8643dcafa1f6ef04ca32cda6aa841a11._comment @@ -0,0 +1,29 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-07-11T17:41:03Z" + content=""" +Apparently in the FAT case `gitAnnexLocation` is returning something like +`../demo/.git/worktrees/demo-wt3/annex/objects/d13/2dd/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999` +which is not the right path to the object file. Should be +`../demo/.git/annex/objects/d13/2dd/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999` + +(In the ext4 case that does not happen, instead the reconcileStaged `git diff` +does not include the new file. So that is a different problem.) + +It seems that `.git/worktrees/foo/annex` is a symlink when the filesystem +supports symlinks. But, when symlinks are not supported, that symlink is +not made. And so it looks for objects there, but they're not there. +This could also cause other behavior differences, since other state files +that go in the annex directory get written there, so git-annex inside +and outside the worktree, or in different worktrees, can have different states. + +That symlink is needed to make annex symlinks point to the object files. +But git-annex shouldn't rely on the symlink in things like +`gitAnnexLocation`. + +Luckily, `annexDir` exists, and I've checked and it is the *only* thing +that produces "annex" as a path to the annex directory. 
So `annexDir` will +need to be made into a function that is passed the git repository and +handles this special case. +"""]]
update
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_3_69386f9f7c39782f4809beb5a7b6df92._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_3_69386f9f7c39782f4809beb5a7b6df92._comment new file mode 100644 index 0000000000..85e3b7bf95 --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_3_69386f9f7c39782f4809beb5a7b6df92._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-07-11T16:42:26Z" + content=""" +I've verified that `populatePointerFile` is not getting called in this case, +and does get called in the same situation on ext4. And that call is made by +`reconcileStaged`, which is getting called. +So I would look in there for the bug. + +Except, interestingly, some percent of the time, on ext4, manually +populating the pointer file followed by git-annex add also does not call +`populatePointerFile`. The pointer file remains unpopulated until another +process calls `reconcileStaged`, and it gets populated then. This seems +like also a bug, possibly another case of the same bug? +"""]]
another case
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_2_ddd574497f95e5b30c3c7706ef0ad6a5._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_2_ddd574497f95e5b30c3c7706ef0ad6a5._comment new file mode 100644 index 0000000000..0b430e5f87 --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_2_ddd574497f95e5b30c3c7706ef0ad6a5._comment @@ -0,0 +1,21 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-07-11T16:18:16Z" + content=""" +In a FAT filesystem after reproducing this bug with initial file `foo`, +the following thing also happens: + + joey@darkstar:~/mnt/demo-wt3#demo-wt3>cat foo + /annex/objects/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999 + joey@darkstar:~/mnt/demo-wt3#demo-wt3>cp foo bar + joey@darkstar:~/mnt/demo-wt3#demo-wt3>git add bar + joey@darkstar:~/mnt/demo-wt3#demo-wt3>cat bar + /annex/objects/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999 + +This seems to be another case of the bug, because the content of the object +is present in the repository, so usually `git add` of a pointer file +should result in the smudge filter populating it. + +`git-annex add` behaves the same as well. +"""]]
reproed
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_1_a431120abf810e41b03a071eb748ea8b._comment b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_1_a431120abf810e41b03a071eb748ea8b._comment new file mode 100644 index 0000000000..8adb5fd491 --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__/comment_1_a431120abf810e41b03a071eb748ea8b._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-07-11T16:05:53Z" + content=""" +I reproduced the same behavior on linux, when using a FAT filesystem. + +So, this has something to do with automatic entry of an unlocked adjusted +branch on a crippled filesystem. + +Interestingly, doing the same on ext4, and manually using `git-annex +unlock` on the file and committing before checking out the worktree does +not replicate the problem. The unlocked file is automatically populated on +worktree checkout there. And manual `git-annex adjust --unlock` before +worktree creation also doesn't have the problem, even though the worktree +does end up in an adjusted unlocked branch. + +(The output of `git-annex get` is also weird. I think what's happening is +that, since the unlocked file is not populated, it is enumerated as a file +that `get` can operate on. But then when it runs, since there is no other +location, it displays that message. The command does not have anything to +handle this unusual case of the file being a pointer file but its content +being present in the repisitory. And, usually there is no way that can +happen, eg even writing a pointer file manually followed by `git add` of it +populates it. So I think this unusual behavior of `git-annex get` doesn't +need to change, once this bug is fixed it should not be possible to see +that behavior.) +"""]]
Added a comment
diff --git a/doc/bugs/error_during_copy_to_S3/comment_4_6c2bcd8f475459e76aed9a5ea6661641._comment b/doc/bugs/error_during_copy_to_S3/comment_4_6c2bcd8f475459e76aed9a5ea6661641._comment new file mode 100644 index 0000000000..bfbd226f2f --- /dev/null +++ b/doc/bugs/error_during_copy_to_S3/comment_4_6c2bcd8f475459e76aed9a5ea6661641._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="gioele@678b7c03f524f2669b179b603f65352fcc16774e" + nickname="gioele" + avatar="http://cdn.libravatar.org/avatar/366dbda84e78aff8a8a070622aeb63ce" + subject="comment 4" + date="2025-07-09T19:36:24Z" + content=""" +For the record, I've experienced a similar problem when uploading a 11 GB file to Backblaze B2: + + +> 6:1 (158)-6:8 (165): Expected end element for: Name {nameLocalName = \"hr\", nameNamespace = Nothing, namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = \"body\", nameNamespace = Nothing, namePrefix = Nothing}) + +Using `git annex copy --explain` shows that the remote provided a more useful error message: + +> 1% 116.47 MiB 5 MiB/s 38m9s +> +> [22:57:03.01203742] (Remote.S3) Response status: Status {statusCode = 413, statusMessage = \"Request Entity Too Large\"} + +Maybe the status code could be inspected before parsing the body of the response to provide a clearer error message? +"""]]
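The suggestion in the comment above, inspecting the status code before trying to parse the response body, might look something like this. It is a sketch only: `statusError` and `exampleStatus` are hypothetical names, and how git-annex's S3 remote actually obtains the response status is not shown in the comment, so this uses the `Status` type from the `http-types` package on its own.

```haskell
import qualified Data.ByteString.Char8 as B8
import Network.HTTP.Types.Status
    (Status, mkStatus, statusCode, statusIsSuccessful, statusMessage)

-- Hypothetical helper: produce a readable error for a non-2xx status,
-- so the caller can report it instead of attempting to parse a body
-- that may be an HTML error page rather than the expected XML.
statusError :: Status -> Maybe String
statusError st
    | statusIsSuccessful st = Nothing
    | otherwise = Just $ "remote responded with HTTP "
        ++ show (statusCode st) ++ " " ++ B8.unpack (statusMessage st)

-- Example value matching the status in the log line quoted above.
exampleStatus :: Status
exampleStatus = mkStatus 413 (B8.pack "Request Entity Too Large")
```

A caller would check `statusError` first and only hand the body to the XML parser when it returns `Nothing`.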
Bug report with reproducer
diff --git a/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn new file mode 100644 index 0000000000..e99aec49d3 --- /dev/null +++ b/doc/bugs/Missing_file_content_in_secondary_worktree___40__win__41__.mdwn @@ -0,0 +1,98 @@ +### Please describe the problem. + +In a secondary worktree on Windows, pointer files remain even with keys being available. +A git annex get fails with "not available", even though "whereis" reports "[here]". + +I would expect the behavior to be internally consistent (keys being available in the +worktree when they are found to be available in the repository). + +This report assumes that secondary worktrees are, in principle, a supported use case, +based on the statement at https://git-annex.branchable.com/tips/Using_git-worktree_with_annex/: +"Getting, dropping and syncing content works fine in a worktree". + +### What steps will reproduce the problem? + +``` +C:\Users\mih>md demo +C:\Users\mih>cd demo +C:\Users\mih\demo>git init +Initialized empty Git repository in C:/Users/mih/demo/.git/ + +C:\Users\mih\demo>git annex init +init + Detected a filesystem without fifo support. + Disabling ssh connection caching. + Detected a crippled filesystem. + Entering an adjusted branch where files are unlocked as this filesystem does not support locked files. +Switched to branch 'adjusted/master(unlocked)' +ok +(recording state in git...) + +C:\Users\mih\demo>echo "onetwothree" > 123.txt +C:\Users\mih\demo>git annex add 123.txt +add 123.txt +ok +(recording state in git...) 
+C:\Users\mih\demo>git commit -m Demo +[adjusted/master(unlocked) abf6dc4] Demo + 1 file changed, 1 insertion(+) + create mode 100644 123.txt + +C:\Users\mih\demo>git status +On branch adjusted/master(unlocked) +nothing to commit, working tree clean + +C:\Users\mih\demo>git worktree add ..\demo-wt2 +Preparing worktree (new branch 'demo-wt2') +HEAD is now at abf6dc4 Demo + +C:\Users\mih\demo>cd ..\demo-wt2 +C:\Users\mih\demo-wt2>dir + Volume in drive C is Windows + Volume Serial Number is ECB5-B3C0 + + Directory of C:\Users\mih\demo-wt2 + +08/07/2025 12:30 <DIR> . +08/07/2025 12:30 <DIR> .. +08/07/2025 12:30 98 123.txt + 1 File(s) 98 bytes + 2 Dir(s) 384.871.727.104 bytes free + +C:\Users\mih\demo-wt2>type 123.txt +/annex/objects/SHA256E-s16--7091a3bb554a96356db798ae528b2eb2ec9ca6ef99daa0263e6f0af65b17bd5c.txt + +C:\Users\mih\demo-wt2>git annex get 123.txt +get 123.txt (not available) + No other repository is known to contain the file. +failed +get: 1 failed + +C:\Users\mih\demo-wt2>git annex whereis 123.txt +whereis 123.txt (1 copy) + a4127935-16c9-4d81-98c6-9fd65461293c -- mih@bnbnb67:~/demo [here] +ok +``` + + +### What version of git-annex are you using? On what operating system? 
+ +``` +git-annex version: 10.20250630-gc6b6be2eab17fd5d8921f3af9376d15f2cf917f5 +build flags: Assistant Webapp Pairing TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.1 http-client-0.7.19 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: mingw32 x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +This is on win11. + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Absolutely! Approaching 15 years of bigger, better, faster, more ;-) +
Added a comment: We'll call this solved...
diff --git a/doc/todo/Relative_Ignores_for_Relative_Imports/comment_2_cfc8c1f62d5524debb5aadfecf17a9ee._comment b/doc/todo/Relative_Ignores_for_Relative_Imports/comment_2_cfc8c1f62d5524debb5aadfecf17a9ee._comment new file mode 100644 index 0000000000..d1aea9bdb9 --- /dev/null +++ b/doc/todo/Relative_Ignores_for_Relative_Imports/comment_2_cfc8c1f62d5524debb5aadfecf17a9ee._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="We'll call this solved..." + date="2025-07-08T07:01:21Z" + content=""" +OK, I verified that what you accomplished also works on my end. This must be yet another gotcha related to my [previous issue regarding imports](https://git-annex.branchable.com/forum/Import_-_Changing_Largefiles/): once something is imported it is not possible to attempt a \"clean\" re-import. At some point in my initial testing I must have accidentally imported the files ignored at `foo/*.c` (`root-ignore/c` in my case) and from there I could not get those files not to import once again. +"""]]
p2phttp: Added --socket option
Used protectedOutput to set up a umask that makes the socket only
accessible by the current user.
Authentication is still needed when using this option unless it is combined
with --wideopen. It was just simpler to keep authentication separate from
this.
diff --git a/CHANGELOG b/CHANGELOG index 09d75769af..76ce9410b1 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,6 +1,7 @@ git-annex (10.20250631) UNRELEASED; urgency=medium * p2phttp: Scan multilevel directories with --directory. + * p2phttp: Added --socket option. -- Joey Hess <id@joeyh.name> Mon, 07 Jul 2025 15:59:42 -0400 diff --git a/Command/P2PHttp.hs b/Command/P2PHttp.hs index 3d61c213ef..1463ec5dda 100644 --- a/Command/P2PHttp.hs +++ b/Command/P2PHttp.hs @@ -22,11 +22,13 @@ import qualified Git.Construct import qualified Annex import Types.Concurrency import qualified Utility.RawFilePath as R +import Utility.FileMode import Servant import qualified Network.Wai.Handler.Warp as Warp import qualified Network.Wai.Handler.WarpTLS as Warp import Network.Socket (PortNumber) +import qualified Network.Socket as Socket import System.PosixCompat.Files (isSymbolicLink) import qualified Data.Map as M import Data.String @@ -42,6 +44,7 @@ cmd = noMessages $ dontCheck repoExists $ data Options = Options { portOption :: Maybe PortNumber , bindOption :: Maybe String + , socketOption :: Maybe FilePath , certFileOption :: Maybe FilePath , privateKeyFileOption :: Maybe FilePath , chainFileOption :: [FilePath] @@ -67,6 +70,10 @@ optParser _ = Options ( long "bind" <> metavar paramAddress <> help "specify address to bind to" )) + <*> optional (strOption + ( long "socket" <> metavar paramPath + <> help "bind to unix domain socket" + )) <*> optional (strOption ( long "certfile" <> metavar paramFile <> help "TLS certificate file for HTTPS" @@ -174,12 +181,20 @@ runServer o mst = go `finally` serverShutdownCleanup mst let settings = Warp.setPort port $ Warp.setHost host $ Warp.defaultSettings mstv <- newTMVarIO mst + let app = p2pHttpApp mstv case (certFileOption o, privateKeyFileOption o) of - (Nothing, Nothing) -> Warp.runSettings settings (p2pHttpApp mstv) - (Just certfile, Just privatekeyfile) -> do - let tlssettings = Warp.tlsSettingsChain - certfile (chainFileOption o) 
privatekeyfile - Warp.runTLS tlssettings settings (p2pHttpApp mstv) + (Nothing, Nothing) -> case socketOption o of + Nothing -> Warp.runSettings settings app + Just socketpath -> + withsocket socketpath $ \sock -> + Warp.runSettingsSocket settings sock app + (Just certfile, Just privatekeyfile) -> + case socketOption o of + Nothing -> do + let tlssettings = Warp.tlsSettingsChain + certfile (chainFileOption o) privatekeyfile + Warp.runTLS tlssettings settings app + Just _socketpath -> giveup "HTTPS is not supported with --socket" _ -> giveup "You must use both --certfile and --privatekeyfile options to enable HTTPS." port = maybe (fromIntegral defaultP2PHttpProtocolPort) @@ -189,6 +204,13 @@ runServer o mst = go `finally` serverShutdownCleanup mst (fromString "*") -- both ipv4 and ipv6 fromString (bindOption o) + withsocket socketpath = + bracket (opensocket socketpath) Socket.close + opensocket socketpath = protectedOutput $ do + sock <- Socket.socket Socket.AF_UNIX Socket.Stream 0 + Socket.bind sock $ Socket.SockAddrUnix socketpath + Socket.listen sock Socket.maxListenQueue + return sock mkServerState :: Options -> M.Map Auth P2P.ServerMode -> Annex P2PHttpServerState mkServerState o authenv = diff --git a/doc/git-annex-p2phttp.mdwn b/doc/git-annex-p2phttp.mdwn index c712721fb1..a7ecd581d8 100644 --- a/doc/git-annex-p2phttp.mdwn +++ b/doc/git-annex-p2phttp.mdwn @@ -103,6 +103,12 @@ convenient way to download the content of any key, by using the path What address to bind to. The default is to bind to all addresses. +* `--socket=path` + + Rather than binding to an address, create and listen to a unix domain + socket at the specified location. This can be useful when proxying + to `git-annex p2phttp`. + * `--certfile=filename` TLS certificate file to use. 
Combining this with `--privatekeyfile` diff --git a/doc/todo/p2phttp__58___listen_on_unix_domain_sockets.mdwn b/doc/todo/p2phttp__58___listen_on_unix_domain_sockets.mdwn index d61a8a5402..58a79fe2f4 100644 --- a/doc/todo/p2phttp__58___listen_on_unix_domain_sockets.mdwn +++ b/doc/todo/p2phttp__58___listen_on_unix_domain_sockets.mdwn @@ -1,3 +1,5 @@ For p2phttp support in forgejo-aneksajo I decided to just spawn a `git annex p2phttp --wideopen` server, do authentication on the Forgejo side, and then proxy requests to p2phttp. Since p2phttp only supports serving one repository at the moment this means that I have to allocate one free port per repository. Actually finding a free port adds complexity and a race condition, as there also seems to be no way to set `--port 0` for p2phttp and then figure out which port it bound to. This would be simplified if p2phttp could listen on unix domain sockets instead. + +> [[done]] --[[Joey]] diff --git a/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_3_eadeac1803ffc89e7684df508765e561._comment b/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_3_eadeac1803ffc89e7684df508765e561._comment new file mode 100644 index 0000000000..b596538988 --- /dev/null +++ b/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_3_eadeac1803ffc89e7684df508765e561._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-07-07T19:26:56Z" + content=""" +I've made it support nested directories, which was easy. + +Should be possible to make it use runSettingsSocket indeed though. 
+"""]] diff --git a/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_4_3d8584bfb819f3c6ece5ecdec3d9c020._comment b/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_4_3d8584bfb819f3c6ece5ecdec3d9c020._comment new file mode 100644 index 0000000000..43e4f2ef3a --- /dev/null +++ b/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_4_3d8584bfb819f3c6ece5ecdec3d9c020._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-07-07T20:28:38Z" + content=""" +Implemented a --socket option. I have not tried connecting to it as a +client, but it seems to be listening to it, so I assume all is good. + +Note that it still checks for authentication when using the socket, +so you will probably want to combine it with --wideopen. The socket mode +allows only the current user to access it. +"""]]
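Since comment 4 mentions that connecting as a client was not tried, here is a minimal sketch of such a check using the same `network` package the server code uses. `probeRequest` and `probeSocket` are hypothetical names, and the request path `/` is an arbitrary choice, not a documented p2phttp endpoint. The point is only to see whether anything answers on the socket.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Exception (bracket, bracketOnError)
import qualified Data.ByteString.Char8 as B8
import qualified Network.Socket as Socket
import qualified Network.Socket.ByteString as SocketBS

-- A bare HTTP/1.0 request. The path is arbitrary (an assumption).
probeRequest :: B8.ByteString
probeRequest = "GET / HTTP/1.0\r\n\r\n"

-- Connect to a unix domain socket, send the request, and return the
-- first chunk of whatever the server sends back.
probeSocket :: FilePath -> IO B8.ByteString
probeSocket socketpath = bracket connectsocket Socket.close $ \sock -> do
    SocketBS.sendAll sock probeRequest
    SocketBS.recv sock 4096
  where
    -- Close the socket again if connecting fails.
    connectsocket = bracketOnError
        (Socket.socket Socket.AF_UNIX Socket.Stream Socket.defaultProtocol)
        Socket.close
        (\sock -> do
            Socket.connect sock (Socket.SockAddrUnix socketpath)
            return sock)
```

Running `probeSocket "/path/to/socket" >>= B8.putStrLn` as the user that started `git-annex p2phttp --socket` should print the start of an HTTP response if the server is answering. Since the socket is created under a protective umask, other users would be expected to fail at the connect step.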
p2phttp: Scan multilevel directories with --directory
This allows for eg dir/user/repo structure. But also other layouts. It
still does not look for repositories that are nested inside other
repositories.
The check for symlinks is mostly to avoid cycles that would prevent
findRepos from returning. Eg, foo/bar/baz being a symlink to foo/bar.
If the directory is writable by someone else they can still race it and
get it to follow a symlink to some other directory. I don't think p2phttp
needs to worry about that kind of situation though, and I doubt it avoids
such problems when operating on files in a git-annex repository either.
diff --git a/CHANGELOG b/CHANGELOG index a0e380ace9..09d75769af 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,9 @@ +git-annex (10.20250631) UNRELEASED; urgency=medium + + * p2phttp: Scan multilevel directories with --directory. + + -- Joey Hess <id@joeyh.name> Mon, 07 Jul 2025 15:59:42 -0400 + git-annex (10.20250630) upstream; urgency=medium * Work around git 2.50 bug that caused it to crash when there is a merge diff --git a/Command/P2PHttp.hs b/Command/P2PHttp.hs index 029307ed10..3d61c213ef 100644 --- a/Command/P2PHttp.hs +++ b/Command/P2PHttp.hs @@ -1,6 +1,6 @@ {- git-annex command - - - Copyright 2024 Joey Hess <id@joeyh.name> + - Copyright 2024-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -21,11 +21,13 @@ import qualified Git import qualified Git.Construct import qualified Annex import Types.Concurrency +import qualified Utility.RawFilePath as R import Servant import qualified Network.Wai.Handler.Warp as Warp import qualified Network.Wai.Handler.WarpTLS as Warp import Network.Socket (PortNumber) +import System.PosixCompat.Files (isSymbolicLink) import qualified Data.Map as M import Data.String import Control.Concurrent.STM @@ -268,6 +270,20 @@ findRepos :: Options -> IO [Git.Repo] findRepos o = do files <- concat <$> mapM (dirContents . toOsPath) (directoryOption o) - map Git.Construct.newFrom . catMaybes - <$> mapM Git.Construct.checkForRepo files - + concat <$> mapM go files + where + go f = Git.Construct.checkForRepo f >>= \case + Just loc -> return [Git.Construct.newFrom loc] + Nothing -> + -- Avoid following symlinks, both to avoid + -- cycles and in case there is an unexpected + -- symlink to some other directory we are not + -- supposed to serve. + ifM (isSymbolicLink <$> R.getSymbolicLinkStatus f) + ( return [] + -- Ignore any errors getting the contents of a + -- subdirectory. 
+ , catchNonAsync + (concat <$> (mapM go =<< dirContents f)) + (const (return [])) + ) diff --git a/doc/git-annex-p2phttp.mdwn b/doc/git-annex-p2phttp.mdwn index 4dd7869c92..c712721fb1 100644 --- a/doc/git-annex-p2phttp.mdwn +++ b/doc/git-annex-p2phttp.mdwn @@ -41,8 +41,10 @@ convenient way to download the content of any key, by using the path * `--directory=path` - Serve each git-annex repository found in immediate - subdirectories of a directory. + Serve each git-annex repository found in subdirectories of the directory. + For example, `--directory=/foo` will find git-annex repositories + in `/foo/bar`, `/foo/user/bar`, and so on. Note that a git-annex + repository located within another git-annex repository will not be found. This option can be provided more than once to serve several directories full of git-annex repositories.
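For readers without the git-annex source at hand, the scan described in this commit can be approximated with only the `directory` and `filepath` packages. This is a standalone sketch, not the committed code: `looksLikeRepo`, `safeContents` and `findReposUnder` are hypothetical names, and the repository test (a `.git` subdirectory) is a simplifying assumption that misses bare repositories, which `Git.Construct.checkForRepo` may well handle.

```haskell
import Control.Exception (IOException, try)
import System.Directory
    (doesDirectoryExist, listDirectory, pathIsSymbolicLink)
import System.FilePath ((</>))

-- Assumption for this sketch only: a directory is a repository if it
-- contains a ".git" directory.
looksLikeRepo :: FilePath -> IO Bool
looksLikeRepo dir = doesDirectoryExist (dir </> ".git")

-- List a directory's entries as full paths, treating any IO error
-- (missing directory, permission denied) as an empty listing.
safeContents :: FilePath -> IO [FilePath]
safeContents dir = do
    r <- try (listDirectory dir) :: IO (Either IOException [FilePath])
    return $ either (const []) (map (dir </>)) r

-- Recursively collect repositories below a directory. A repository is
-- not descended into, and symlinks are not followed, which avoids
-- cycles such as foo/bar/baz pointing back at foo/bar.
findReposUnder :: FilePath -> IO [FilePath]
findReposUnder top = concat <$> (mapM go =<< safeContents top)
  where
    go path = do
        isrepo <- looksLikeRepo path
        if isrepo
            then return [path]
            else do
                r <- try (pathIsSymbolicLink path)
                    :: IO (Either IOException Bool)
                case r of
                    Right False -> do
                        isdir <- doesDirectoryExist path
                        if isdir then findReposUnder path else return []
                    -- A symlink, or a path that could not be examined.
                    _ -> return []
```

As in the commit message, this does nothing about someone racing the scan by swapping a directory for a symlink between the check and the descent.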
comment
diff --git a/doc/todo/generic_p2p_socket_transport/comment_11_5cf016637c51fb639d3ede23b6df636b._comment b/doc/todo/generic_p2p_socket_transport/comment_11_5cf016637c51fb639d3ede23b6df636b._comment new file mode 100644 index 0000000000..dcef66647c --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_11_5cf016637c51fb639d3ede23b6df636b._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 11""" + date="2025-07-07T19:20:36Z" + content=""" +I did some necessary groundwork for this in +[[!commit 46ee651c9438a5dfc430b231089d3ac1e0d09e3c]]. + +I am about ready to really start implementing this, I think. The design +seems to be ready. +"""]]
design
diff --git a/doc/todo/generic_p2p_socket_transport/comment_10_d1bb9a968329e889e05879001a2b41de._comment b/doc/todo/generic_p2p_socket_transport/comment_10_d1bb9a968329e889e05879001a2b41de._comment new file mode 100644 index 0000000000..49b17eb5f7 --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_10_d1bb9a968329e889e05879001a2b41de._comment @@ -0,0 +1,38 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 10""" + date="2025-07-07T17:29:16Z" + content=""" +I had suggested using the remote's configuration to determine the socket +that remotedaemon listens on. + +> Eg, a remote with uuid U could use .git/annex/p2p/U as its socket file. + +But it may be that only incoming connections are wanted to be served, +without having any remotes configured that use a P2P network. (And there +could be multiple remotes that use the same P2P network.) + +Instead, I think that remotedaemon should use socket files in the form +`.git/annex/p2p/$address`, for each P2P address that loadP2PAddresses +returns (except tor ones). + +There could be a `git-annex p2p --enable` command, which is passed +the P2P address to enable. Eg: + + git-annex p2p --enable p2p-annex::yggstack+somepubkey.pk.ygg + +That is similar to `git-annex enable-tor` in that it would run +`storeP2PAddress`. And so configure remotedaemon to listen on the socket +file for that address. + +It could also generate an AuthToken and output a version of the address +with the AuthToken included, similar to `git-annex p2p --gen-addresses`. + +That would let its output be communicated to the remote users, who can feed +it into `git-annex p2p --link`. For that matter, I think that `git-annex +p2p --pair` would also work. + +The address passed to `git-annex p2p --enable` could be anything, +but using a p2p-annex::foo address makes a `git-annex-p2p-foo` command be +used when connecting to the address. 
+"""]] diff --git a/doc/todo/generic_p2p_socket_transport/comment_8_eae9998285899a22b7619fc75c52e270._comment b/doc/todo/generic_p2p_socket_transport/comment_8_eae9998285899a22b7619fc75c52e270._comment new file mode 100644 index 0000000000..cf639281c8 --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_8_eae9998285899a22b7619fc75c52e270._comment @@ -0,0 +1,69 @@ +[[!comment format=mdwn + username="joey" + subject="""AuthTokens""" + date="2025-07-07T14:54:36Z" + content=""" +I wrote: + +> If the P2P protocol's AUTH is provided with an AuthToken, there would +> need to be an interface to record the one to use for a given p2p +> connection. + +But, as implemented `git-annex remotedaemon` will accept +any of the authtokens in its list for any p2p connection. So if there are +2 onion services for the same repository for some reason, there will be 2 +authtokens, but either can be used with either. + +If there are 2 P2P connections and you decide to stop listening to one of +them, it does mean that authtoken needs to be removed from the list, +otherwise someone could still use it with the other P2P connection. If we +think about 2 different P2P protocols, one might turn out to be insecure, +so you stop using it. But then if the insecurity allowed someone else to +observe the authtoken that was used with it, and you didn't remove it from +the list, they could use that to connect via the other P2P service. + +And the user does not know about authtokens, they're an implementation +detail currently. So expecting the user to remove them from the list isn't +really sufficient. + +So it seems better for each P2P address to have its own unique authtoken, +that is not accepted for any other address. Or at least each P2P address +that needs an authtoken; perhaps some don't. (I don't think it's a problem +that for tor each hidden service accepts all listed authtokens though.) 
+ +@matrrs wrote: + +> A configuration option annex.start-p2psocket=true would instruct +> remotedaemon to listen on .git/annex/p2psocket (I think a hardcoded +> location is fine, as there only really needs to be one such socket even +> with multiple networks + +That single socket wouldn't work if each P2P address has its own unique +authtoken. Because remotedaemon would have no way to know what P2P address +that socket was connected with. + +It also could be that some P2P protocol is 100% certain not to need an +authtoken for security. That would need a separate socket where +remotedaemon does not require AUTH with a valid authtoken. Or, setting up +a P2P connection for such a network would need to exchange authtokens, even +though there is no security benefit in doing so. + +I don't know if I would want to make the determination of whether or not +some P2P protocol needs an authtoken or not. It may be that the security +situation of a P2P protocol evolves over time. +Consider the case of tor, where it used to be fairly trivially possible to +enumerate onion addresses. See for example +[this paper](https://pure.port.ac.uk/ws/files/11523722/paper.pdf). +(Which is why I made tor use AuthTokens in the first place IIRC.) +Apparently changes were later made to tor to prevent that. I don't know +how secure it is considered to be in this area now though. + +If `git-annex p2p` is used to set up the P2P connection, it handles +generating the authtokens and exchanging them, fairly transparently to the +user. So maybe it would be simplest to always require authtokens. + +There is another reason for the authtoken: The socket file may be +accessible by other users of the system. This is the case with the tor +socket, since tor runs as another user, and so the socket file is made +world writable. 
+"""]] diff --git a/doc/todo/generic_p2p_socket_transport/comment_9_3ec0a998f89533555c14a9745956f800._comment b/doc/todo/generic_p2p_socket_transport/comment_9_3ec0a998f89533555c14a9745956f800._comment new file mode 100644 index 0000000000..d2778b8a19 --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_9_3ec0a998f89533555c14a9745956f800._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2025-07-07T16:00:45Z" + content=""" +> A configuration option annex.expose-p2p-via=foo that could be supplied +> zero, one, or multiple times, and each of these configurations would +> instruct remotedaemon to start the external program +> git-annex-p2ptransport-foo after the p2p socket is ready + +Hmm, I don't know if it would generally make sense for remotedaemon to +start up external programs that run P2P networks. That might be something +that runs system-wide, like tor (often) does. Or the user might expect to +run it themselves and only have git-annex use it when it's running. + +It seems to me that in your yggstack example, there's no real need +for remotedaemon to be responsible for running +`git-annex-p2ptransport-yggstack`. You could run that yourself first. +Then the remotedaemon can create the socket file and listen to it. + +If a tcp connection comes in before the socket file exists, socat handles +it by closing that connection, and keeps listening for further +connections. +"""]]
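The per-address authtoken idea from comment 8 above (each P2P address gets its own token, and a token is never accepted for another address) can be illustrated with a small sketch. This is not git-annex's implementation, which is written in Haskell; the class name `AuthTokenRegistry`, its methods, and the address strings are hypothetical and exist only for illustration.

```python
import hmac
import secrets


class AuthTokenRegistry:
    """Hypothetical registry that ties each P2P address to its own authtoken."""

    def __init__(self):
        self._tokens = {}  # maps P2P address -> authtoken

    def issue(self, address):
        """Generate and record a fresh token for one P2P address."""
        token = secrets.token_hex(16)
        self._tokens[address] = token
        return token

    def revoke(self, address):
        """Stop listening on an address; its token stops working everywhere."""
        self._tokens.pop(address, None)

    def accepts(self, address, token):
        """Accept a token only for the address it was issued for."""
        expected = self._tokens.get(address)
        if expected is None:
            return False
        # Constant-time comparison, to avoid leaking token contents via timing.
        return hmac.compare_digest(expected.encode(), token.encode())
```

Under this scheme, a token observed on one (possibly insecure) transport is useless on any other, and dropping a transport removes its token without the user having to manage a shared list. It also shows why a single shared socket is a problem: `accepts` needs to know which address a connection arrived on.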
correction
diff --git a/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment b/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment index 2780e6b75b..9367e9e323 100644 --- a/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment +++ b/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment @@ -20,8 +20,9 @@ the generic one could be "p2p-annex::<path-to-socket-file>". Or it could be `git-annex-p2p-foo <bar>` and talk to its stdin and stdout. That's for outgoing connections. For incoming connections, -for tor, the remotedaemon looks to see if the socket file exists and -if so it accepts connections from it. (That tor socket is not used for +for tor, the remotedaemon creates the socket file that tor is configured to +use for the hidden service, and listens to it +to accept connections from tor. (That tor socket is not used for outgoing connections.) It would be easy to generalize this to additional socket filenames. Eg, a remote with uuid U could use `.git/annex/p2p/U` as its socket file.
fix name of man page
diff --git a/doc/git-annex-enable-tor.mdwn b/doc/git-annex-enable-tor.mdwn index e56008ec0f..3da497233a 100644 --- a/doc/git-annex-enable-tor.mdwn +++ b/doc/git-annex-enable-tor.mdwn @@ -29,7 +29,7 @@ give other users access to your repository via the tor hidden service. [[git-annex]](1) -[[git-annex-p2p-auth]](1) +[[git-annex-p2p]](1) [[git-annex-remotedaemon]](1)
diff --git a/doc/bugs/S3_fsck_from_chunked_file__through_publicurl_fails.mdwn b/doc/bugs/S3_fsck_from_chunked_file__through_publicurl_fails.mdwn index 88ab4bb205..f0d972a4fb 100644 --- a/doc/bugs/S3_fsck_from_chunked_file__through_publicurl_fails.mdwn +++ b/doc/bugs/S3_fsck_from_chunked_file__through_publicurl_fails.mdwn @@ -10,7 +10,7 @@ When running fsck (fast or slow) from an S3 remote that has anonymous access and - set AWS creds as env vars - initremote S3 special remote with chunk size smaller than the file size and publicurl - push data (authenticated) to the S3 remote, ensure it get chunked. -- unset AWS auth vars (and remote .git/annex/creds if necessary) +- unset AWS auth vars (and remove .git/annex/creds if necessary) - fsck from remote should fail on chunked file
diff --git a/doc/bugs/S3_fsck_from_chunked_file__through_publicurl_fails.mdwn b/doc/bugs/S3_fsck_from_chunked_file__through_publicurl_fails.mdwn new file mode 100644 index 0000000000..88ab4bb205 --- /dev/null +++ b/doc/bugs/S3_fsck_from_chunked_file__through_publicurl_fails.mdwn @@ -0,0 +1,33 @@ +### Please describe the problem. + +When running fsck (fast or slow) from an S3 remote that has anonymous access and git-annex publicurl set, it fails only on the files that get chunked into multiple parts because they exceed the configured chunk size for the remote. + + +### What steps will reproduce the problem? + +- create repo +- annex a file +- set AWS creds as env vars +- initremote S3 special remote with chunk size smaller than the file size and publicurl +- push data (authenticated) to the S3 remote, ensure it get chunked. +- unset AWS auth vars (and remote .git/annex/creds if necessary) +- fsck from remote should fail on chunked file + + +### What version of git-annex are you using? On what operating system? + +git-annex version: 10.20250417-gfd493804c004f7facb4b99d4bf21ed49a081c5cf + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Yes indeed! :D
Added a comment
diff --git a/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_2_e075c37cbeab627a6a96dcfb1525e21d._comment b/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_2_e075c37cbeab627a6a96dcfb1525e21d._comment new file mode 100644 index 0000000000..187ed0fc5e --- /dev/null +++ b/doc/todo/p2phttp__58___listen_on_unix_domain_sockets/comment_2_e075c37cbeab627a6a96dcfb1525e21d._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 2" + date="2025-07-03T11:07:06Z" + content=""" +Unfortunately I wasn't able to make use of the multiple repositories feature because Forgejo stores repositories in nested directories (`<username-or-organisation>/<repository>`). Even if I was able to use that feature, using unix sockets would still feel cleaner and avoid some security concerns around running the p2phttp server with `--wideopen` (as-is it is accessible to all local users, with unix sockets permissions could be used to restrict it; but this is more of a theoretical concern, I am not aware of anyone running a Forgejo-aneksajo server on a host with untrusted users). + +According to <https://stackoverflow.com/questions/22621623/warp-binding-to-unix-domain-sockets> it should be possible to use warp's runSettingsSocket with a unix socket, instead of runSettings. I am not familiar enough with Haskell or git-annex to judge if there are other obstacles though... +"""]]
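The point in the comment above about using unix socket permissions to restrict access can be sketched in general terms. This is not warp or git-annex code; the function name `listen_private` and the mode `0600` are assumptions chosen for illustration, and the sketch assumes a POSIX system where the umask applies to the socket file created by `bind`.

```python
import os
import socket


def listen_private(path):
    """Listen on a unix stream socket that only the owning user can access."""
    # Remove a stale socket file left behind by an earlier run.
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Restrict the umask while binding so the socket file is created
    # with mode 0600 rather than being opened up to other local users.
    old_umask = os.umask(0o177)
    try:
        sock.bind(path)
    finally:
        os.umask(old_umask)
    sock.listen(8)
    return sock
```

A server bound this way is reachable only by processes running as the same user, in contrast to a TCP port on localhost, which any local user can connect to.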
diff --git a/doc/forum/Check_export_or_force_re-export_to_special_remote.mdwn b/doc/forum/Check_export_or_force_re-export_to_special_remote.mdwn new file mode 100644 index 0000000000..c94b17263d --- /dev/null +++ b/doc/forum/Check_export_or_force_re-export_to_special_remote.mdwn @@ -0,0 +1,6 @@ +Hello. + +I'm using `git annex export` to export files to my android. It works, but sometimes I accidentally remove files from special remotes and want to get them back. +I repeat `git annex export master --to android` command, it says everything OK, but it doesn't actually transfer anything. + +How to force export of a tree to existing special remote?
add news item for git-annex 10.20250630
diff --git a/doc/news/version_10.20250320.mdwn b/doc/news/version_10.20250320.mdwn deleted file mode 100644 index 17bb165883..0000000000 --- a/doc/news/version_10.20250320.mdwn +++ /dev/null @@ -1,14 +0,0 @@ -git-annex 10.20250320 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Added the compute special remote. - * addcomputed: New command, adds a file that is generated by a compute - special remote. - * recompute: New command, recomputes computed files. - * findcomputed: New command, displays information about computed files. - * Support help.autocorrect settings "prompt", "never", and "immediate". - * Allow setting remote.foo.annex-tracking-branch to a branch name - that contains "/", as long as it's not a remote tracking branch. - * Added OsPath build flag, which speeds up git-annex's operations on files. - * git-lfs: Added an optional apiurl parameter. - (This needs version 1.2.5 of the haskell git-lfs library to be used.) - * fsck: Remember the files that are checked, so a later run with --more - will skip them, without needing to use --incremental."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250630.mdwn b/doc/news/version_10.20250630.mdwn new file mode 100644 index 0000000000..7761138fea --- /dev/null +++ b/doc/news/version_10.20250630.mdwn @@ -0,0 +1,9 @@ +git-annex 10.20250630 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * Work around git 2.50 bug that caused it to crash when there is a merge + conflict with an unlocked annexed file. + * Skip and warn when a tree import includes empty filenames, + which can happen with eg a S3 bucket. + * Avoid a problem with temp file names ending in whitespace on + filesystems like VFAT that don't support such filenames. + * webapp: Rename "Upgrade Repository" to "Convert Repository" + to avoid confusion with git-annex upgrade."""]] \ No newline at end of file