Recent changes to this wiki:
diff --git a/doc/bugs/_Files_fail_to_download_from_NTFS_special_remote_w.mdwn b/doc/bugs/_Files_fail_to_download_from_NTFS_special_remote_w.mdwn new file mode 100644 index 0000000000..205aac9f5a --- /dev/null +++ b/doc/bugs/_Files_fail_to_download_from_NTFS_special_remote_w.mdwn @@ -0,0 +1,60 @@ +# Environment: +- git-annex running in WSL (ext4) +- Special remote on NTFS filesystem + +# Issue: +Files in a special remote with exporttree/importtree enabled fail to download during fsck operations. The issue affects only some files, with no clear pattern between affected and unaffected files (it occurs with both binary and text files). Steps to reproduce: + +- Set up git-annex repository in WSL (ext4) +- Configure special remote on NTFS with exporttree/importtree +- Run git annex fsck on files +- Some files fail with download error + +# Verification: + +Direct comparison shows files are identical: + +```sh +diff ~/mydir/work/info/templates/form.svg .git/annex/objects/qg/Vk/SHA256E-s2258--0faa4a5b2bbb98665d79741ce8e88aa0fe9fb526ba9990bc51b830e1d767c3fe.svg/SHA256E-s2258--0faa4a5b2bbb98665d79741ce8e88aa0fe9fb526ba9990bc51b830e1d767c3fe.svg +``` +(outputs nothing, files are identical) + +Current behavior: + +```sh +git annex fsck info/templates/form.svg -f specrepo +failed to download file from remote +failed +(recording state in git...) +fsck: 1 failed +``` +```sh +git annex get info/templates/form.svg -f specrepo +get info/templates/form.svg (from specrepo...) + file content has changed +failed +get: 1 failed +``` + +# Expected behavior: + +fsck should complete verification successfully. 
All files should download and verify correctly + +# Additional notes: +- Issue appears to be related to filesystem differences (NTFS vs ext4) +- Problem affects only some files, while others work correctly +- No clear pattern between affected and unaffected files (occurs with both binary and text files) +- Standard remediation steps (fsck, fix, reinject) do not resolve the issue + +# git annex version info +``` +git-annex version: 10.20220121-gdf6a8476e +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV +dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.26 DAV-1.3.4 feed-1.3.0.1 ghc-8.8.4 http-client-0.6.4.1 persistent-sqlite-2.10.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +```
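Another way to double-check the verification above is to hash the file stored on the remote and compare it with the size and SHA256 recorded in the key name. This is a minimal sketch under a few assumptions:

- GNU coreutils (`sha256sum`) is available, as it is in a typical WSL distribution.
- The key is a plain SHA256E key of the form `SHA256E-s<size>--<hex>.<ext>`.
- `check_sha256e` is a hypothetical helper, not a git-annex command.

```shell
#!/bin/sh
# check_sha256e KEY FILE
# Hypothetical helper (not part of git-annex): succeeds if FILE has the size
# and SHA256 encoded in a SHA256E key such as SHA256E-s2258--<hex>.svg.
check_sha256e() {
    key=$1
    file=$2
    case $key in
        SHA256E-s*--*) ;;
        *) echo "not a SHA256E key: $key" >&2; return 2 ;;
    esac
    rest=${key#SHA256E-s}     # "<size>--<hex>.<ext>"
    size=${rest%%--*}         # "<size>"
    hash=${rest#*--}          # "<hex>.<ext>"
    hash=${hash%%.*}          # strip the extension, if any
    # wc output is space-padded on some platforms, so strip spaces
    actual_size=$(wc -c < "$file" | tr -d ' ')
    actual_hash=$(sha256sum < "$file" | cut -d ' ' -f 1)
    [ "$size" = "$actual_size" ] && [ "$hash" = "$actual_hash" ]
}
```

Example use, where the path on the NTFS remote is a placeholder:

`check_sha256e SHA256E-s2258--0faa4a5b2bbb98665d79741ce8e88aa0fe9fb526ba9990bc51b830e1d767c3fe.svg /path/on/ntfs/remote/info/templates/form.svg && echo match`

A match would suggest the bytes on the remote are intact. That would point the investigation at how the remote's files are identified, rather than at the content itself.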
diff --git a/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items.mdwn b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items.mdwn index bbe37ffaae..0ccbfbe953 100644 --- a/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items.mdwn +++ b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items.mdwn @@ -17,6 +17,3 @@ Yes, there is git annex diffdriver but it is a barrier to write a script for it. It would be a quality-of-life improvement if `git annex unlock` and `git annex fix` would do their job regardless whether the file is checked in to git or not. After all, the many git annex commands are hard to memorize in addition to the many git commands there already are, and this would make their usage more orthogonal to other commands and thus easier to understand. - -# Add local directors -git annex
diff --git a/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items.mdwn b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items.mdwn new file mode 100644 index 0000000000..bbe37ffaae --- /dev/null +++ b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items.mdwn @@ -0,0 +1,22 @@ +git annex requires files to be checked in to git before allowing the "unlock" and "fix" operations. + +This makes things needlessly complicated in non-orthodox usage situations. + +Example 1: The inadvertently added files. + +I add my directory to git annex, only to see some files that I do not want to add. In my case, it was the VSCode Remote Host Extension, which is not relevant for the reproducibility of my analysis. I Ctrl-C the adding process, undo it with `git rm -r --cached`, and commit. Then I add the directory to .gitignore. + +Yes, I know, I should have unlocked the added files first, but I forgot. Now I cannot do it anymore, because "git annex unlock IGNORED_FOLDER" says "pathspec "..." did not match any files known to git". But the solution is so simple: just copy the annex'ed content back to the original location. No information needs to be transmitted to git. I have to do it using a custom `find IGNORED_FOLDER -type l -execdir '....'` command. It would be awesome if git annex were forgiving of this kind of mistake, which (especially new) users can easily make, and just unlocked the ignored files. + +Example 2: Moving annex'ed files + +I want to compare two versions of a file. Maybe they have a complicated history, so one of the files was in another folder. I restore the file from an earlier commit using `git checkout commit -- file`. I move the old analysis to the new location temporarily to be able to diff them without typing all the folder names etc. Now I need to update the symlink target of the old, temporarily-checked-out file, but `git annex fix` does not do it because it is not checked into git. 
I had this use-case in an analysis where I wanted to compare two result outputs, but the folder structure had changed in the meantime. + +Yes, there is git annex diffdriver, but writing a script for it is a barrier. It pays off, but sometimes you need a simple solution to start with so you can focus on the problem at hand instead of on git annex details. I think it would be easy for git annex fix to fix up the symlink target without any knowledge from git: just update the number of `../`s in the symlink so that it again matches the position of the `.git` directory. + +It would be a quality-of-life improvement if `git annex unlock` and `git annex fix` would do their job regardless of whether the file is checked in to git or not. After all, the many git annex commands are hard to memorize in addition to the many git commands there already are, and this would make their usage more orthogonal to other commands and thus easier to understand. + + + +# Add local directors +git annex
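The fix this request describes, recomputing how many `../` components a link needs from the file's depth below the repository root, can be sketched in POSIX sh. Some caveats about this sketch:

- `annex_link_target` is a hypothetical helper, not a git-annex command.
- It assumes the current link target contains a `.git/` component. It will not handle layouts where `.git` is a file, such as linked worktrees or submodules.
- It only prints the corrected target and does not touch git state.

```shell
#!/bin/sh
# annex_link_target REL_PATH OLD_TARGET
# Hypothetical helper (not part of git-annex): given a file's path relative
# to the repository root and its current (possibly stale) symlink target,
# print a target with the right number of "../" for that location.
annex_link_target() {
    rel_path=$1
    old_target=$2
    # Keep only the part of the target from ".git/" onward.
    case $old_target in
        */.git/*) tail=.git/${old_target#*/.git/} ;;
        .git/*) tail=$old_target ;;
        *) echo "not an annex link: $old_target" >&2; return 1 ;;
    esac
    # Add one "../" per directory level between the file and the root.
    prefix=
    dir=$(dirname "$rel_path")
    while [ "$dir" != "." ]; do
        prefix="../$prefix"
        dir=$(dirname "$dir")
    done
    printf '%s%s\n' "$prefix" "$tail"
}
```

Run from the repository root, a moved link could then be repointed in place. The path below is a placeholder:

`f=sub/dir/file.txt; ln -sfn "$(annex_link_target "$f" "$(readlink "$f")")" "$f"`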
Added a comment: update 4: union is correct
diff --git a/doc/todo/Adding_unmatched_files_to_a_view/comment_7_951de7e01709d19a01175e44e3fc3b7f._comment b/doc/todo/Adding_unmatched_files_to_a_view/comment_7_951de7e01709d19a01175e44e3fc3b7f._comment new file mode 100644 index 0000000000..b5af951ae4 --- /dev/null +++ b/doc/todo/Adding_unmatched_files_to_a_view/comment_7_951de7e01709d19a01175e44e3fc3b7f._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="update 4: union is correct" + date="2024-12-20T07:27:43Z" + content=""" +After even closer inspection, `_` does work as expected. + +I was confused because some folders with `tag-b` were appearing in `_`: those folders contained `.jpg.jpg` files, probably produced by accident or thumbnails generated by some app. +"""]]
Added a comment: update 3: strange union behavior
diff --git a/doc/todo/Adding_unmatched_files_to_a_view/comment_6_3b614fe1250aca7e12f41473f317ea00._comment b/doc/todo/Adding_unmatched_files_to_a_view/comment_6_3b614fe1250aca7e12f41473f317ea00._comment new file mode 100644 index 0000000000..c309555412 --- /dev/null +++ b/doc/todo/Adding_unmatched_files_to_a_view/comment_6_3b614fe1250aca7e12f41473f317ea00._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="update 3: strange union behavior" + date="2024-12-20T07:12:59Z" + content=""" +After closer inspection of the view described in the previous comment, it seems the content of `_` is NOT purely \"all except tag-a and tag-b\". + +I was hoping that the content of \"unset dir\" would be `_ = dir-c-children - ((tag-a ∩ dir-c-children) ∪ (tag-b ∩ dir-c-children))`. +Or simply `_ = dir-c-children ∩ !(tag-a ∪ tag-b)`. + +But what I'm observing is that `_` contains some dir-c-children that are also part of `tag-b` but not `tag-a` (`_ = dir-c-children ∩ !tag-a`). +"""]]
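The intended contents of `_` are plain set arithmetic: the dir-c children that appear in neither tag. Here is a small sketch of that computation with `sort` and `comm`. It uses hypothetical one-name-per-line lists in place of the real view contents, and `untagged` is an illustrative helper, not part of git-annex.

```shell
#!/bin/sh
# untagged CHILDREN TAG_A TAG_B
# Illustrative helper: print entries of CHILDREN that are in neither TAG_A
# nor TAG_B, i.e. children - (tag-a ∪ tag-b). Each file holds one name per line.
untagged() {
    tagged=$(mktemp)
    sorted_children=$(mktemp)
    # comm needs sorted input; sort -u also merges the two tag lists (the union)
    sort -u "$2" "$3" > "$tagged"
    sort -u "$1" > "$sorted_children"
    # -23 suppresses lines only in the second file and lines common to both,
    # leaving the children that carry neither tag
    comm -23 "$sorted_children" "$tagged"
    rm -f "$tagged" "$sorted_children"
}
```

With children `child-a child-b child-c`, `tag-a` holding `child-a` and `tag-b` holding `child-b`, only `child-c` remains. That is the result the comment expected to see in `_`.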
Added a comment: update: unset directory is cool feature, but not always work
diff --git a/doc/todo/Adding_unmatched_files_to_a_view/comment_5_4f67c6b5d57f23b66c8825392fa37a05._comment b/doc/todo/Adding_unmatched_files_to_a_view/comment_5_4f67c6b5d57f23b66c8825392fa37a05._comment new file mode 100644 index 0000000000..f6754c8329 --- /dev/null +++ b/doc/todo/Adding_unmatched_files_to_a_view/comment_5_4f67c6b5d57f23b66c8825392fa37a05._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="update: unset directory is cool feature, but not always work" + date="2024-12-20T06:52:56Z" + content=""" +After fixing long paths and running: + +```shell +git annex view ?tag tag=tag-a tag=tag-b 'dir-a/dir-b/dir-c/=*' +``` + +it does exactly what I need, and produces: + +``` +/ + _/ + dir-c-children-c + tag-a/ + dir-c-children-a + tag-b/ + dir-c-children-b +``` + +Haven't tried to use it with `tag=*` but for now it's even better as I'm mostly interested in `tag-a` and `tag-b`. +"""]]
Added a comment: unset directory is cool feature, but not always work
diff --git a/doc/todo/Adding_unmatched_files_to_a_view/comment_4_3ede457edaf2555e8cd9ea60d98f1ade._comment b/doc/todo/Adding_unmatched_files_to_a_view/comment_4_3ede457edaf2555e8cd9ea60d98f1ade._comment new file mode 100644 index 0000000000..9714c8cd27 --- /dev/null +++ b/doc/todo/Adding_unmatched_files_to_a_view/comment_4_3ede457edaf2555e8cd9ea60d98f1ade._comment @@ -0,0 +1,50 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="unset directory is cool feature, but not always work" + date="2024-12-19T17:49:52Z" + content=""" +Hi, + +First of all, thanks a lot for implementing this feature! + +I was trying to combine a tag filter with a path filter, though it doesn't do what I was expecting. + +## Case 1: When I filter by a directory and then vadd a tag, it kind of works + +```shell +git annex view 'dir-a/dir-b/dir-c/?=*' +git annex vadd tag?=* +``` + +Notice: `_` (unset dir) is showing after running the `git annex view 'dir-a/dir-b/dir-c/?=*'` command. +But I'm getting \"path too long\" on `10.20241031` in vadd. + +## Case 2: When I do it like this (no path too long errors) + +```shell +git annex view tag?=* 'dir-a/dir-b/dir-c/?=*' +``` + +It produces something like this: + +``` +/tag-a/ + dir-c-child-1 + dir-c-child-2 +/tag-b/ + dir-c-child-1 + dir-c-child-2 +``` + +It does filter directories as expected, except that `_` (unset directory) is missing from the root and subdirectories. + +## Case 3: it behaves exactly like case 2 (no path too long errors) + +```shell +git annex view 'dir-a/dir-b/dir-c/=*' +git annex view tag?=* +``` + +The last case should illustrate why I need this: I want a list of files limited by the path `dir-a/dir-b/dir-c/` (which also preserves subdirectories in `dir_c`, which I also need), and then I want to group directories by tag INCLUDING \"unset tag dir\" in order to assign tags to subfolders of `dir_c` (those that don't already have tags). 
But even though I specified `tag?=*` (notice the question mark), the \"unset dir\" does not appear. +"""]]
annex.addunlocked support for tree imports
Honor annex.addunlocked configuration when importing a tree from a special
remote.
Note, in a --no-content import, the object file will not be populated
(usually) and so expressions that match on mime type will not match. Tested
this and it works ok, the file just ends up locked. Updated docs for the
mime expressions to mention that they can't match when the file is not present.
Note that in Command.Sync.pullThirdPartyPopulated, recordImportTree is
called without an AddUnlockedMatcher. Since the tree generated here is not
exposed to the user and does not contain usual filenames, there is no need
for the overhead of checking it.
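With a git-annex build that includes this change, `annex.addunlocked` applies to tree imports as well as to `git annex add`. Below is a configuration sketch. It assumes an importtree special remote named `myremote` (a placeholder) and, as an arbitrary example rule, that only text files should be unlocked.

```shell
# Add *.txt files unlocked; everything else stays locked (example rule).
git config annex.addunlocked 'include=*.txt'

# Import the remote's tree; the result lands on the myremote/master
# remote tracking branch.
git annex import master --from myremote

# Alternative: with --no-content the object files are usually not
# populated, so an expression such as mimetype=text/* cannot match and
# those files end up locked.
git annex import --no-content master --from myremote
```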
diff --git a/Annex/Import.hs b/Annex/Import.hs index 3cc068692c..587d866a96 100644 --- a/Annex/Import.hs +++ b/Annex/Import.hs @@ -107,9 +107,10 @@ buildImportCommit :: Remote -> ImportTreeConfig -> ImportCommitConfig + -> AddUnlockedMatcher -> Imported -> Annex (Maybe Ref) -buildImportCommit remote importtreeconfig importcommitconfig imported = +buildImportCommit remote importtreeconfig importcommitconfig addunlockedmatcher imported = case importCommitTracking importcommitconfig of Nothing -> go Nothing Just trackingcommit -> inRepo (Git.Ref.tree trackingcommit) >>= \case @@ -117,7 +118,7 @@ buildImportCommit remote importtreeconfig importcommitconfig imported = Just _ -> go (Just trackingcommit) where go trackingcommit = do - (importedtree, updatestate) <- recordImportTree remote importtreeconfig imported + (importedtree, updatestate) <- recordImportTree remote importtreeconfig (Just addunlockedmatcher) imported buildImportCommit' remote importcommitconfig trackingcommit importedtree >>= \case Just finalcommit -> do updatestate @@ -132,10 +133,11 @@ buildImportCommit remote importtreeconfig importcommitconfig imported = recordImportTree :: Remote -> ImportTreeConfig + -> Maybe AddUnlockedMatcher -> Imported -> Annex (History Sha, Annex ()) -recordImportTree remote importtreeconfig imported = do - importedtree@(History finaltree _) <- buildImportTrees basetree subdir imported +recordImportTree remote importtreeconfig addunlockedmatcher imported = do + importedtree@(History finaltree _) <- buildImportTrees basetree subdir addunlockedmatcher imported return (importedtree, updatestate finaltree) where basetree = case importtreeconfig of @@ -177,7 +179,7 @@ recordImportTree remote importtreeconfig imported = do } return oldexport - -- downloadImport takes care of updating the location log + -- importKeys takes care of updating the location log -- for the local repo when keys are downloaded, and also updates -- the location log for the remote for keys that are present 
in it. -- That leaves updating the location log for the remote for keys @@ -283,11 +285,12 @@ buildImportCommit' remote importcommitconfig mtrackingcommit imported@(History t buildImportTrees :: Ref -> Maybe TopFilePath + -> Maybe AddUnlockedMatcher -> Imported -> Annex (History Sha) -buildImportTrees basetree msubdir (ImportedFull imported) = - buildImportTreesGeneric convertImportTree basetree msubdir imported -buildImportTrees basetree msubdir (ImportedDiff (LastImportedTree oldtree) imported) = do +buildImportTrees basetree msubdir addunlockedmatcher (ImportedFull imported) = + buildImportTreesGeneric (convertImportTree addunlockedmatcher) basetree msubdir imported +buildImportTrees basetree msubdir addunlockedmatcher (ImportedDiff (LastImportedTree oldtree) imported) = do importtree <- if null (importableContents imported) then pure oldtree else applydiff @@ -312,7 +315,7 @@ buildImportTrees basetree msubdir (ImportedDiff (LastImportedTree oldtree) impor oldtree mktreeitem (loc, DiffChanged v) = - Just <$> mkImportTreeItem msubdir loc v + Just <$> mkImportTreeItem addunlockedmatcher msubdir loc v mktreeitem (_, DiffRemoved) = pure Nothing @@ -320,17 +323,26 @@ buildImportTrees basetree msubdir (ImportedDiff (LastImportedTree oldtree) impor isremoved (_, v) = v == DiffRemoved -convertImportTree :: Maybe TopFilePath -> [(ImportLocation, Either Sha Key)] -> Annex Tree -convertImportTree msubdir ls = - treeItemsToTree <$> mapM (uncurry $ mkImportTreeItem msubdir) ls +convertImportTree :: Maybe AddUnlockedMatcher -> Maybe TopFilePath -> [(ImportLocation, Either Sha Key)] -> Annex Tree +convertImportTree maddunlockedmatcher msubdir ls = + treeItemsToTree <$> mapM (uncurry $ mkImportTreeItem maddunlockedmatcher msubdir) ls -mkImportTreeItem :: Maybe TopFilePath -> ImportLocation -> Either Sha Key -> Annex TreeItem -mkImportTreeItem msubdir loc v = case v of - Right k -> do - relf <- fromRepo $ fromTopFilePath topf - symlink <- calcRepo $ gitAnnexLink relf k - linksha 
<- hashSymlink symlink - return $ TreeItem treepath (fromTreeItemType TreeSymlink) linksha +mkImportTreeItem :: Maybe AddUnlockedMatcher -> Maybe TopFilePath -> ImportLocation -> Either Sha Key -> Annex TreeItem +mkImportTreeItem maddunlockedmatcher msubdir loc v = case v of + Right k -> case maddunlockedmatcher of + Nothing -> mklink k + Just addunlockedmatcher -> do + objfile <- calcRepo (gitAnnexLocation k) + let mi = MatchingFile FileInfo + { contentFile = objfile + , matchFile = getTopFilePath topf + , matchKey = Just k + } + ifM (checkAddUnlockedMatcher NoLiveUpdate addunlockedmatcher mi) + ( mkpointer k + , mklink k + ) + Left sha -> return $ TreeItem treepath (fromTreeItemType TreeFile) sha where @@ -338,6 +350,13 @@ mkImportTreeItem msubdir loc v = case v of treepath = asTopFilePath lf topf = asTopFilePath $ maybe lf (\sd -> getTopFilePath sd P.</> lf) msubdir + mklink k = do + relf <- fromRepo $ fromTopFilePath topf + symlink <- calcRepo $ gitAnnexLink relf k + linksha <- hashSymlink symlink + return $ TreeItem treepath (fromTreeItemType TreeSymlink) linksha + mkpointer k = TreeItem treepath (fromTreeItemType TreeFile) + <$> hashPointerFile k {- Builds a history of git trees using ContentIdentifiers. - @@ -604,8 +623,8 @@ getLastImportedTree remote = do - generates Keys without downloading. - - Generates either a Key or a git Sha, depending on annex.largefiles. - - But when importcontent is False, it cannot match on annex.largefiles - - (or generate a git Sha), so always generates Keys. + - But when importcontent is False, it cannot generate a git Sha, + - so always generates Keys. - - Supports concurrency when enabled. - diff --git a/CHANGELOG b/CHANGELOG index 0e2f177b0b..11d27b8c86 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -11,6 +11,8 @@ git-annex (10.20241203) UNRELEASED; urgency=medium default unset behavior. * sync: Avoid misleading warning about future preferred content transition when preferred content is set to "". 
+ * Honor annex.addunlocked configuration when importing a tree from a + special remote. -- Joey Hess <id@joeyh.name> Mon, 02 Dec 2024 13:41:31 -0400 diff --git a/Command/Import.hs b/Command/Import.hs index f06543bb7e..c35055927e 100644 --- a/Command/Import.hs +++ b/Command/Import.hs @@ -147,8 +147,10 @@ seek o@(RemoteImportOptions {}) = startConcurrency commandStages $ do (pure Nothing) (Just <$$> inRepo . toTopFilePath . toRawFilePath) (importToSubDir o) + addunlockedmatcher <- addUnlockedMatcher seekRemote r (importToBranch o) subdir (importContent o) (checkGitIgnoreOption o) + addunlockedmatcher (messageOption o) startLocal :: ImportOptions -> AddUnlockedMatcher -> GetFileMatcher -> DuplicateMode -> (RawFilePath, RawFilePath) -> CommandStart @@ -322,8 +324,8 @@ verifyExisting key destfile (yes, no) = do verifyEnoughCopiesToDrop [] key Nothing Nothing needcopies mincopies [] preverified tocheck (const yes) no -seekRemote :: Remote -> Branch -> Maybe TopFilePath -> Bool -> CheckGitIgnore -> [String] -> CommandSeek -seekRemote remote branch msubdir importcontent ci importmessages = do +seekRemote :: Remote -> Branch -> Maybe TopFilePath -> Bool -> CheckGitIgnore -> AddUnlockedMatcher -> [String] -> CommandSeek +seekRemote remote branch msubdir importcontent ci addunlockedmatcher importmessages = do importtreeconfig <- case msubdir of Nothing -> return ImportTree Just subdir -> @@ -337,7 +339,7 @@ seekRemote remote branch msubdir importcontent ci importmessages = do trackingcommit <- fromtrackingbranch Git.Ref.sha cmode <- annexCommitMode <$> Annex.getGitConfig let importcommitconfig = ImportCommitConfig trackingcommit cmode importmessages' - let commitimport = commitRemote remote branch tb trackingcommit importtreeconfig importcommitconfig + let commitimport = commitRemote remote branch tb trackingcommit importtreeconfig importcommitconfig addunlockedmatcher importabletvar <- liftIO $ newTVarIO Nothing void $ includeCommandAction (listContents remote 
importtreeconfig ci importabletvar) @@ -383,10 +385,10 @@ listContents' remote importtreeconfig ci a = , err ] -commitRemote :: Remote -> Branch -> RemoteTrackingBranch -> Maybe Sha -> ImportTreeConfig -> ImportCommitConfig -> Imported -> CommandStart -commitRemote remote branch tb trackingcommit importtreeconfig importcommitconfig imported = +commitRemote :: Remote -> Branch -> RemoteTrackingBranch -> Maybe Sha -> ImportTreeConfig -> ImportCommitConfig -> AddUnlockedMatcher -> Imported -> CommandStart +commitRemote remote branch tb trackingcommit importtreeconfig importcommitconfig addunlockedmatcher imported = starting "update" ai si $ do - importcommit <- buildImportCommit remote importtreeconfig importcommitconfig imported + importcommit <- buildImportCommit remote importtreeconfig importcommitconfig addunlockedmatcher imported next $ updateremotetrackingbranch importcommit where ai = ActionItemOther (Just $ UnquotedString $ fromRef $ fromRemoteTrackingBranch tb) diff --git a/Command/Sync.hs b/Command/Sync.hs index c9436778bd..5b2fa3c380 100644 --- a/Command/Sync.hs (Diff truncated)
comment and close
diff --git a/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct.mdwn b/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct.mdwn index 1afa94d559..0b9012cfdd 100644 --- a/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct.mdwn +++ b/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct.mdwn @@ -65,3 +65,5 @@ disableWildcardExpansion r = r [[!meta author=yoh]] [[!tag projects/openneuro]] + +> [[wontfix|done]] --[[Joey]] diff --git a/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct/comment_2_ec87ff6b645d0ea78fe06d7b59fa89f5._comment b/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct/comment_2_ec87ff6b645d0ea78fe06d7b59fa89f5._comment new file mode 100644 index 0000000000..26aaeb5b99 --- /dev/null +++ b/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct/comment_2_ec87ff6b645d0ea78fe06d7b59fa89f5._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2024-12-18T23:45:32Z" + content=""" +You can accomplish the same thing as that pathspec with +`--include=*.nii.gz` and it works on all git-annex commands and +is a much richer language than git's pathspecs that can do a lot more +besides. + +So, I think it would be redundant to support git's pathspecs, and am going +to close this bug. +"""]]
close since --include works
diff --git a/doc/todo/get__58___allow_for_both_--branch_and_pathspec.mdwn b/doc/todo/get__58___allow_for_both_--branch_and_pathspec.mdwn index e80cfb14bd..9c0156369a 100644 --- a/doc/todo/get__58___allow_for_both_--branch_and_pathspec.mdwn +++ b/doc/todo/get__58___allow_for_both_--branch_and_pathspec.mdwn @@ -6,3 +6,5 @@ yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop$ git annex get --branch e6888f70ed97 git-annex: Can only specify one of file names, --all, --branch, --unused, --failed, --key, or --incomplete ``` + +> [[closing|done]] --[[Joey]] diff --git a/doc/todo/get__58___allow_for_both_--branch_and_pathspec/comment_2_e5864c4befa0689b25d95eb17d0315c9._comment b/doc/todo/get__58___allow_for_both_--branch_and_pathspec/comment_2_e5864c4befa0689b25d95eb17d0315c9._comment new file mode 100644 index 0000000000..500e6dc3b4 --- /dev/null +++ b/doc/todo/get__58___allow_for_both_--branch_and_pathspec/comment_2_e5864c4befa0689b25d95eb17d0315c9._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2024-12-18T23:40:52Z" + content=""" +Aha, --include and --exclude options work with --branch. So: + + git annex get --branch e6888f70ed97099f83a77d5bcf3372a9a75a2b5e^ --include='*/*.nii.gz' + +No need for git's pathspecs when we have [[git-annex-matching-options]]! +"""]]
Added a comment: Sync git-annex metadata subset with S3 metadata.
diff --git a/doc/special_remotes/S3/comment_39_3b4360ac0e30b089533f76d1d9c7eb95._comment b/doc/special_remotes/S3/comment_39_3b4360ac0e30b089533f76d1d9c7eb95._comment new file mode 100644 index 0000000000..77df892242 --- /dev/null +++ b/doc/special_remotes/S3/comment_39_3b4360ac0e30b089533f76d1d9c7eb95._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="Basile.Pinsard" + avatar="http://cdn.libravatar.org/avatar/87e1f73acf277ad0337b90fc0253c62e" + subject="Sync git-annex metadata subset with S3 metadata." + date="2024-12-18T19:34:59Z" + content=""" +How feasible would it be to configure the remote so that a git-annex metadata field gets pushed to S3 along with the objects? +Something like `sync-meta=mymetafield` that would set `x-amz-meta-mymetafield=` to the value of `git-annex metadata -g mymetafield --key theobjectkey` whenever the data is pushed to S3. +Thanks! +"""]]
comments
diff --git a/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct/comment_1_0f281fa20ca777f77b504bc7f2fb2b22._comment b/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct/comment_1_0f281fa20ca777f77b504bc7f2fb2b22._comment new file mode 100644 index 0000000000..c1159e30fc --- /dev/null +++ b/doc/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct/comment_1_0f281fa20ca777f77b504bc7f2fb2b22._comment @@ -0,0 +1,53 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2024-12-18T17:11:33Z" + content=""" +The "did not match any file(s) known to git" message is output by git, not +git-annex. + +I tracked down why git-annex uses --literal-pathspecs. In +[[!commit f35d0bf4b2d7ada897b66620ff94d4068badd90b]], a particular +problem mentioned is that `git-annex add *.jpeg`, in a case where there +are no such files, would add `foo/bar.jpeg` due to git ls-files default +behavior: + + joey@darkstar:~/tmp/a1>git ls-files --others *.jpeg + subdir/bar.jpeg + +Which was very surprising and did not seem desirable for `git-annex add`. +(Let alone for a command like `git-annex drop --force`!) + +Although `git add` does in fact behave that way, which surprised me: + + joey@darkstar:~/tmp/a1>touch subdir/foo.txt + joey@darkstar:~/tmp/a1>git add '*.txt' + joey@darkstar:~/tmp/a1>git status + On branch master + Changes to be committed: + (use "git restore --staged <file>..." to unstage) + new file: subdir/foo.txt + +`git add` is documented to behave that way, as are some other commands +like `git rm` (!). But git-annex is not; its commands are documented to +operate on filenames or paths. So I don't think this is really a bug. + +As to providing a way to enable non-literal pathspecs, since git +has `GIT_GLOB_PATHSPECS` and `GIT_ICASE_PATHSPECS`, checking for those +and removing --literal-pathspecs would be one way. 
But then it risks +unexpected behavior if the git-annex version is too old. So a command-line +option seems maybe better. + +But, I do consider it an implementation detail that git-annex uses +`git ls-files` for some commands. Who knows, there may eventually be a +reason to change that. Making this configurable would lock in use of +ls-files. + +There are also situations where git-annex does not use ls-files, which +would all need to be covered in the documentation when implementing this. +The one that comes to mind is `--batch` which doesn't recurse +trees at all. + +Of course, `git ls-files <pathspec> | git-annex foo --batch` is a way you +can operate on a pathspec without any changes. +"""]] diff --git a/doc/todo/get__58___allow_for_both_--branch_and_pathspec/comment_1_8b79b81418a79a598d14c3a6db1a427f._comment b/doc/todo/get__58___allow_for_both_--branch_and_pathspec/comment_1_8b79b81418a79a598d14c3a6db1a427f._comment new file mode 100644 index 0000000000..5baae9ca69 --- /dev/null +++ b/doc/todo/get__58___allow_for_both_--branch_and_pathspec/comment_1_8b79b81418a79a598d14c3a6db1a427f._comment @@ -0,0 +1,21 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2024-12-18T19:10:50Z" + content=""" +Note that `git-annex` intentionally does not operate on pathspecs, +which is being discussed in +<https://git-annex.branchable.com/bugs/get__47__metadata__47____63____63____63____58___does_not_handle_pathspec_correct/> + +It is possible to use eg `git-annex get --branch foo:subdir/` to operate +on a subdirectory, which is enough in many situations. +But what you're looking for is pathspec style filtering. +I do see the benefit. + +`git ls-tree` also doesn't have a way to filter by pathspec, and that's +what `--branch` uses. So it would require git-annex reimplement git's +pathspecs, which seem complicated and not very well documented. Or there +would need to be a way to pass the paths through some other git command +to handle the pathspec. 
I don't know what git command might be able to be +used to do that. +"""]]
Added a comment: thank you!
diff --git a/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__/comment_2_128ed8907399febc2817520e4ed9434f._comment b/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__/comment_2_128ed8907399febc2817520e4ed9434f._comment new file mode 100644 index 0000000000..6a025dfe9c --- /dev/null +++ b/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__/comment_2_128ed8907399febc2817520e4ed9434f._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="eugen" + avatar="http://cdn.libravatar.org/avatar/7e7e5700d7017735fd00c2dfcd3f91e4" + subject="thank you!" + date="2024-12-18T18:01:03Z" + content=""" +I've just had the chance to test it and yes, that command works. I didn't know about the `--want-get` flag (git-annex-matching-options(1)). Also, the `--fast` flag for `git annex info` will make me run that command more often now :). +Thanks! +"""]]
fix comment display
diff --git a/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment b/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment index 5c6307cb51..2780e6b75b 100644 --- a/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment +++ b/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment @@ -42,4 +42,4 @@ address using such an url, like `git-annex enable-tor` does. Seems pretty close to a workable design to me, but I don't know how well it will match up with these various kinds of P2P networks. -""]] +"""]]
update
diff --git a/doc/thanks/list b/doc/thanks/list index 4dafc97d54..e493dc436b 100644 --- a/doc/thanks/list +++ b/doc/thanks/list @@ -123,3 +123,4 @@ Marco, schodet, oz, Lilia.Nanne, +Dusty Mabe,
Added a comment
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices/comment_3_48b8108fa3fd16ef72c5beeb0765e5c5._comment b/doc/forum/Requesting_of_files_across_disconnected_devices/comment_3_48b8108fa3fd16ef72c5beeb0765e5c5._comment new file mode 100644 index 0000000000..406b747057 --- /dev/null +++ b/doc/forum/Requesting_of_files_across_disconnected_devices/comment_3_48b8108fa3fd16ef72c5beeb0765e5c5._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 3" + date="2024-12-16T22:48:16Z" + content=""" +There is a standard group called \"transfer\" which is meant for this kind of thing: <https://git-annex.branchable.com/preferred_content/standard_groups/>. This is especially applicable if there is a static preferred content expression that can be written for each repository (i.e. no ad-hoc gets, just something more structured). + +To make it more dynamic you could include a match on a metadata tag in a repositories preferred content expression. Requesting a file would then be setting the tag on it (well, and a bunch of syncing in all repositories). +"""]]
Added a comment
diff --git a/doc/forum/clone_and_initialize_with_a_given_uuid/comment_1_828e5d92ec03594170d7ac52d346533d._comment b/doc/forum/clone_and_initialize_with_a_given_uuid/comment_1_828e5d92ec03594170d7ac52d346533d._comment new file mode 100644 index 0000000000..a00f64e35f --- /dev/null +++ b/doc/forum/clone_and_initialize_with_a_given_uuid/comment_1_828e5d92ec03594170d7ac52d346533d._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 1" + date="2024-12-16T22:19:34Z" + content=""" +It _looks_ like you can just set annex.uuid before the first `git annex init` to achieve this: + +``` +git init / git clone +git config annex.uuid 00000000-0000-0000-0000-000000000003 +git annex init +``` + +But I would say that doing so is ill-advised. You can set a description for each repository and give the remotes descriptive names instead. If you use shared UUIDs you will run into an issue if it ever happens that two of those repositories become connected. +"""]]
diff --git a/doc/bugs/Installation_error_on_android.mdwn b/doc/bugs/Installation_error_on_android.mdwn new file mode 100644 index 0000000000..3e1fbc1638 --- /dev/null +++ b/doc/bugs/Installation_error_on_android.mdwn @@ -0,0 +1,76 @@ +### Please describe the problem. + +Following the installation instructions for android (termux), I get an error while sourcing git-annex-install: + +``` +Running on Android.. Tuning for optimal behavior. +sed: can't read /data/data/com.termux/files/home/git-annex.linux/git-remote-annex: No such file or directory +``` + +I can confirm that git-remote-annex is indeed missing in that directory. + +### What steps will reproduce the problem? + +``` +pkg install wget +wget https://git-annex.branchable.com/install/Android/git-annex-install +source git-annex-install +``` + +### What version of git-annex are you using? On what operating system? + +None yet x) and on a freshly updated termux. + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log +~ $ wget https://git-annex.branchable.com/install/Android/git-annex-install +source git-annex-install +--2024-12-16 23:01:13-- https://git-annex.branchable.com/install/Android/git-annex-install +Resolving git-annex.branchable.com (git-annex.branchable.com)... 2600:3c03::f03c:91ff:fedf:c0e5, 66.228.46.55 +Connecting to git-annex.branchable.com (git-annex.branchable.com)|2600:3c03::f03c:91ff:fedf:c0e5|:443... connected. +HTTP request sent, awaiting response... 200 OK +Length: 1470 (1.4K) +Saving to: ‘git-annex-install’ + +git-annex-ins 100% 1.44K --.-KB/s in 0s + +2024-12-16 23:01:14 (194 MB/s) - ‘git-annex-install’ saved [1470/1470] + +Installing dependencies with termux pkg manager... +Checking availability of current mirror: +[*] https://ftp.fau.de/termux/termux-main: ok +Reading package lists... 
Done +Building dependency tree... Done +Reading state information... Done +git is already the newest version (2.47.1). +wget is already the newest version (1.25.0). +tar is already the newest version (1.35). +coreutils is already the newest version (9.5-3). +proot is already the newest version (5.1.107-65). +0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded. +Downloading git-annex... +--2024-12-16 23:01:14-- https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-arm64-ancient.tar.gz +Resolving downloads.kitenet.net (downloads.kitenet.net)... 2600:3c03::f03c:91ff:fe73:b0d2, 66.228.36.95 +Connecting to downloads.kitenet.net (downloads.kitenet.net)|2600:3c03::f03c:91ff:fe73:b0d2|:443... connected. +HTTP request sent, awaiting response... 200 OK +Length: 57553624 (55M) [application/x-gzip] +Saving to: ‘STDOUT’ + +- 100% 54.89M 8.16MB/s in 11s + +2024-12-16 23:01:25 (5.18 MB/s) - written to stdout [57553624/57553624] + +Running on Android.. Tuning for optimal behavior. +sed: can't read /data/data/com.termux/files/home/git-annex.linux/git-remote-annex: No such file or directory + +[Process completed (code 2) - press Enter] + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +
Added a comment
diff --git a/doc/todo/support_--not_--unused/comment_1_1d3e1d89535b72c185cb93c4aeae0ccb._comment b/doc/todo/support_--not_--unused/comment_1_1d3e1d89535b72c185cb93c4aeae0ccb._comment new file mode 100644 index 0000000000..9775eadfa8 --- /dev/null +++ b/doc/todo/support_--not_--unused/comment_1_1d3e1d89535b72c185cb93c4aeae0ccb._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Doable8234" + avatar="http://cdn.libravatar.org/avatar/b0d5fea745f92c3b8cc8ecc3dafa6278" + subject="comment 1" + date="2024-12-16T08:24:31Z" + content=""" +I've absolutely no idea about the relative difficulty of implementing these, but it sounds to me like your second part `It would also perhaps be good to detect when matching options are used that don't make sense, and error out on commands like git-annex find --not or git-annex find -and -(` might actually be more important than the first! +"""]]
Added a comment
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices/comment_2_8ff41c2d22d49feb7ce8af7feaff7914._comment b/doc/forum/Requesting_of_files_across_disconnected_devices/comment_2_8ff41c2d22d49feb7ce8af7feaff7914._comment new file mode 100644 index 0000000000..8a0f9fd7e0 --- /dev/null +++ b/doc/forum/Requesting_of_files_across_disconnected_devices/comment_2_8ff41c2d22d49feb7ce8af7feaff7914._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Doable8234" + avatar="http://cdn.libravatar.org/avatar/b0d5fea745f92c3b8cc8ecc3dafa6278" + subject="comment 2" + date="2024-12-16T08:20:43Z" + content=""" +I've thought about this exact use case, though I never actually used it yet. One simple way to do this could be by using git annex preferred content settings. In the nodes that push out content, all you need to do is set up a cron job for `git annex sync --content`. Now you can make it push content wherever you want by adjusting the preferred content settings. +"""]]
Added a comment
diff --git a/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__/comment_1_a9160519a19f35ce6bb1cc555d7112b7._comment b/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__/comment_1_a9160519a19f35ce6bb1cc555d7112b7._comment new file mode 100644 index 0000000000..ccb0defd56 --- /dev/null +++ b/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__/comment_1_a9160519a19f35ce6bb1cc555d7112b7._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 1" + date="2024-12-15T23:39:33Z" + content=""" +Something like this should get you the answer: `git annex info --fast . --not --in here --and --want-get` (adapted from the example here: <https://git-annex.branchable.com/git-annex-info/>). +"""]]
Added a comment
diff --git a/doc/todo/generic_p2p_socket_transport/comment_4_8e4b4f476284b0a9f77b3ebf158b5c34._comment b/doc/todo/generic_p2p_socket_transport/comment_4_8e4b4f476284b0a9f77b3ebf158b5c34._comment new file mode 100644 index 0000000000..0d0f7ce839 --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_4_8e4b4f476284b0a9f77b3ebf158b5c34._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0" + subject="comment 4" + date="2024-12-15T18:13:00Z" + content=""" +One more thought: the proposed `p2p-annex::foo+<addr>` remote makes one assumption that I don't think holds for all thinkable p2p transports. That assumption is that there is a public address for the server-side that can be trusted to be the expected other side. + +For tor and yggstack this does hold: the public address (onion address of the hidden service for tor and the IPv6 derived from the public key of the yggstack peer (potentially resolved from a .pk.ygg DNS entry like above), respectively) ensures that the server side is who they are expected to be. There is no way for a third-party to pretend that they were the server-side, even if they knew the git remote string, because they would need to have the servers private key to do so. + +This is not the case for fowl: with fowl one would essentially do `fowl <psk> ...` on both sides to create a tunnel between server and client. If the PSK were fully contained in the remote string then a third-party getting hold of that string could pretend to be the server (when the server side is currently not waiting for a connection itself) and steal the auth token from the client. So under the assumption that the remote string is not a secret this would be a problem. 
+ +But this problem can be overcome: with fowl both sides could simply derive the psk from the p2p auth token to establish the connection, essentially like so: `fowl <number derived from auth token>-<auth token> ...`. The git remote string would only need to contain the information to use fowl and some unique identifier for the remote then, so that the right auth token can be taken from .git/annex/creds. + +Likewise, for other p2p transports that don't have stable and secure public addresses, necessary information exchange could also happen over magic-wormhole using the auth tokens, or the auth tokens could be used as PSKs between both sides if that's what the transport needs. This would e.g. apply for a hypothetical transport over webrtc data channels, where some kind of \"SDP\" has to be exchanged between both sides to establish a connection. + +--- + +All that to say: I think `p2p-annex::foo+` would indeed be general enough for many conceivable means of transport, if a re-use of the auth tokens in the above fashion would be acceptable. And I can't think of anything against it, yet. +"""]]
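The suggestion that both fowl peers derive their code from the p2p auth token can be illustrated as follows. This is a hypothetical sketch only. The function name `fowl_code_from_token`, the use of SHA-256 via GNU `sha256sum`, and reducing the hash to a number below 1000 are arbitrary choices, not anything fowl or git-annex defines. The one property that matters is determinism: given the same token from `.git/annex/creds`, client and server compute the same `<number>-<auth token>` code without exchanging anything else.

```sh
#!/bin/sh
# Hypothetical sketch: derive a "<number>-<auth token>" code from a p2p
# auth token so both peers can compute the same fowl code independently.
# Assumes GNU coreutils' sha256sum (on macOS, "shasum -a 256" is similar).
fowl_code_from_token() {
    token=$1
    # First 8 hex digits of SHA-256(token); sha256sum prints "<hex>  -".
    hex=$(printf '%s' "$token" | sha256sum | cut -c1-8)
    # Reduce to a small number (the range below 1000 is arbitrary).
    num=$(( 0x$hex % 1000 ))
    printf '%s-%s\n' "$num" "$token"
}

# Both sides would then pass the result to fowl, along the lines of:
#   fowl "$(fowl_code_from_token "$AUTH_TOKEN")" ...
```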
diff --git a/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__.mdwn b/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__.mdwn new file mode 100644 index 0000000000..f75b843310 --- /dev/null +++ b/doc/forum/Compute_space_required_for_a_git_annex_get_--auto__63__.mdwn @@ -0,0 +1 @@ +Before I run a command that get new content in a repository -- especially with the --auto flag -- is there a way to find out the size of the data to be copied? My case is simple. I'm just using USB sticks/drives. But I never know if the space is enough for the next `get --auto` command...
Added a comment
diff --git a/doc/forum/Unable_to_delete_preferred-content.log/comment_4_f8254b4671a4fae5b51398cddcc190c2._comment b/doc/forum/Unable_to_delete_preferred-content.log/comment_4_f8254b4671a4fae5b51398cddcc190c2._comment new file mode 100644 index 0000000000..6a3dc152e7 --- /dev/null +++ b/doc/forum/Unable_to_delete_preferred-content.log/comment_4_f8254b4671a4fae5b51398cddcc190c2._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="Doable8234" + avatar="http://cdn.libravatar.org/avatar/b0d5fea745f92c3b8cc8ecc3dafa6278" + subject="comment 4" + date="2024-12-14T08:15:22Z" + content=""" +Thanks, Joey. That seems to work based on my testing. Appreciate the quick and precise response! + +Fixing my actual repo will have to wait since one of my nodes is now offline, but hopefully that goes off without a glitch. + +Also just want to say how awesome git annex is. I've been using it for nearly 10 years now and don't see myself ever wanting to stop. +"""]]
Added a comment
diff --git a/doc/todo/generic_p2p_socket_transport/comment_3_0a1208f17265ff77cd3956da22439e4b._comment b/doc/todo/generic_p2p_socket_transport/comment_3_0a1208f17265ff77cd3956da22439e4b._comment new file mode 100644 index 0000000000..1e9b0f03c0 --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_3_0a1208f17265ff77cd3956da22439e4b._comment @@ -0,0 +1,83 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de" + subject="comment 3" + date="2024-12-13T22:02:14Z" + content=""" +Your comment seems to be wrongly formatted. It was shown correctly in the notification mail, but doesn't show up here. + +--- + +Just to document what I have tried out, for completeness: with what is already in place it is possible to connect two repositories over yggstack, it is just very awkward. + +On one system you can do: + +- `sudo mkdir /etc/tor && sudo touch /etc/tor/torrc` (without actually having tor installed) +- `sudo git annex enable-tor $(id -u)` +- `yggstack -genconf > yggstack.conf` +- `echo tor-annex::<pubkey>.pk.ygg:12345` (take the pubkey out of yggstack.conf) +- `socat TCP-LISTEN:12345,fork,reuseaddr UNIX-CONNECT:/var/lib/tor-annex/<uid>_<repo-uuid>/s` +- `yggstack -useconffile yggstack.conf -remote-tcp 12345:127.0.0.1:12345` +- `git annex p2p --gen-addresses` + +On the other system do: + +- `yggstack -autoconf -socks 127.0.0.1:9050` +- `git annex p2p --link` and paste in the generated address when asked (it should have the form `tor-annex::<pubkey>.pk.ygg:12345:<auth-token>`) + +On the server side this simply exposes the p2p socket generated for tor through a different means, and on the client side this works because yggstack can be used similarly enough to tor (doing name resolution through the socks proxy at port 9050 and then connecting the supplied port). 
+ +--- + +I really like your proposal of a `p2p-annex::foo+<whatever>` remote; together with a way to tell remotedaemon to start a process exposing the socket it would make for an easily extendable mechanism. Imagine this: + +Client side: + +- `p2p-annex::foo+<addr>` would start `git-annex-p2p-foo <addr>` and talk to its stdin/stdout. + +Server side: + +- A configuration option `annex.start-p2psocket=true` would instruct remotedaemon to listen on .git/annex/p2psocket (I think a hardcoded location is fine, as there only really needs to be one such socket even with multiple networks, and somewhere under .git/annex is a good location to associate it with the repository and will always be writable by the user). +- A configuration option `annex.expose-p2p-via=foo` that could be supplied zero, one, or multiple times, and each of these configurations would instruct remotedaemon to start the external program git-annex-p2ptransport-foo after the p2p socket is ready (this configuration could also just point to a command to execute, but I thought it might be nice to stay with the theme of commonly prefixed programs). + +With these things in place a third-party package git-annex-p2p-yggstack could provide a simple set of shell scripts to implement transport over yggstack: + +For the server side there would be a `git-annex-p2ptransport-yggstack` along these lines (modulo proper process cleanup of course): + +``` +socat TCP-LISTEN:12345,fork,reuseaddr UNIX-CONNECT:.git/annex/p2psocket & +yggstack -useconffile .git/annex/p2ptransport/yggstack/yggstack.conf -remote-tcp 12345:127.0.0.1:12345 +``` + +and a `git-annex-p2ptransport-enable-yggstack` like this: + +``` +git config --local annex.start-p2psocket true +git config --local --add annex.expose-p2p-via yggstack +if [ ! 
-f .git/annex/p2ptransport/yggstack/yggstack.conf ]; then + yggstack -genconf > .git/annex/p2ptransport/yggstack/yggstack.conf +fi +echo \"p2p-annex::yggstack+<pubkey-from-yggstack.conf>.pk.ygg:12345\" >> .git/annex/creds/p2paddrs +``` + +For the client-side it would provide `git-annex-p2p-yggstack` along these lines: + +``` +yggstack -autoconf -socks 127.0.0.1:1080 +nc -X 5 -x 127.0.0.1:1080 <pubkey>.pk.ygg 12345 +``` + +With that package installed one could then do `git annex p2ptransport enable-yggstack` followed by `git annex p2p --gen-addresses`. A `git annex remotedaemon` would now start everything on the server-side, and the client-side could connect using `git annex p2p --link` with the address from `--gen-addresses`. + +--- + +I think this would be sufficiently flexible for most kinds of p2p transport one could come up with. E.g. a transport over fowl or even plain magic-wormhole (though the transit relay wouldn't appreciate it) could use `p2p-annex::fowl+<code>` where the code is a pre-generated token instead of the usual passphrases used by magic-wormhole. The server side would be a script that repeatedly waits for connections to that code, the client side just connects to it. + +Even for more traditional p2p setups (tinc, wireguard, yggdrasil, etc.) where the transport is pre-set up at the system level this would just work if there was a helper for `p2p-annex::tcpip+<hostname>:<port>` (effectively just netcat again). + +--- + +Configuration, program, and subcommand names etc. are of course open to bike-shedding. Some of the hardcoded ports above should be dynamically chosen, or completely avoided if the transport can do so (yggstack and fowl can't expose unix sockets directly yet, so the digression through the loopback device is needed for now). + +What do you think? +"""]]
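The server-side half of the proposal, where `annex.start-p2psocket` gates a list of `annex.expose-p2p-via` transports, amounts to reading a multi-valued git config setting. A minimal sketch, assuming those proposed option names, which do not exist in git-annex today. The function `p2p_transport_commands` is hypothetical and only prints the `git-annex-p2ptransport-<name>` programs a remotedaemon would start, leaving process management aside.

```sh
#!/bin/sh
# Hypothetical sketch of the proposed server side: annex.start-p2psocket and
# annex.expose-p2p-via are option names proposed in the comment above, not
# existing git-annex settings.
p2p_transport_commands() {
    # Do nothing unless the p2p socket is enabled.
    [ "$(git config --bool annex.start-p2psocket)" = true ] || return 0
    # --get-all prints every value of a multi-valued key, one per line,
    # in the order they appear in the config; it prints nothing if unset.
    git config --get-all annex.expose-p2p-via |
        while IFS= read -r name; do
            printf 'git-annex-p2ptransport-%s\n' "$name"
        done
}

# A remotedaemon-like supervisor could then start each transport:
#   p2p_transport_commands | while IFS= read -r cmd; do "$cmd" & done
```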
document empty expression
diff --git a/doc/git-annex-required.mdwn b/doc/git-annex-required.mdwn index 9cb079a322..876ac37ebc 100644 --- a/doc/git-annex-required.mdwn +++ b/doc/git-annex-required.mdwn @@ -11,10 +11,6 @@ git annex required `repository [expression]` When run with an expression, configures the content that is required to be held in the repository. -For example: - - git annex required . "include=*.mp3 or include=*.ogg" - Without an expression, displays the current required content setting of the repository. @@ -26,6 +22,15 @@ need to be removed with `git annex drop --force`. Also, `git-annex fsck` will warn about required contents that are not present. +For example: + + git annex required here "include=*.mp3 or include=*.ogg" + +To return a repository to the original default behavior, use an empty +value for the expression, eg: + + git-annex required here "" + # OPTIONS * The [[git-annex-common-options]](1) can be used. diff --git a/doc/git-annex-wanted.mdwn b/doc/git-annex-wanted.mdwn index f78aef0fc0..f683d5be46 100644 --- a/doc/git-annex-wanted.mdwn +++ b/doc/git-annex-wanted.mdwn @@ -13,11 +13,16 @@ to be held in the repository. See [[git-annex-preferred-content]](1) For example: - git annex wanted . "include=*.mp3 or include=*.ogg" + git annex wanted here "include=*.mp3 or include=*.ogg" Without an expression, displays the current preferred content setting of the repository. +To return a repository to the original default behavior, use an empty +value for the expression, eg: + + git-annex wanted here "" + # OPTIONS * The [[git-annex-common-options]](1) can be used.
empty preferred content
* Document that setting preferred content to "" is the same as the
default unset behavior.
* sync: Avoid misleading warning about future preferred content
transition when preferred content is set to "".
diff --git a/CHANGELOG b/CHANGELOG index a9e21290ae..0e2f177b0b 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -7,6 +7,10 @@ git-annex (10.20241203) UNRELEASED; urgency=medium * Work around git hash-object --stdin-paths's odd stripping of carriage return from the end of the line (some windows infection), avoiding crashing when the repo contains a filename ending in a carriage return. + * Document that settting preferred content to "" is the same as the + default unset behavior. + * sync: Avoid misleading warning about future preferred content + transition when preferred content is set to "". -- Joey Hess <id@joeyh.name> Mon, 02 Dec 2024 13:41:31 -0400 diff --git a/Command/Sync.hs b/Command/Sync.hs index 1e38211260..c9436778bd 100644 --- a/Command/Sync.hs +++ b/Command/Sync.hs @@ -85,6 +85,7 @@ import Utility.Bloom import Utility.OptParse import Utility.Process.Transcript import Utility.Tuple +import Utility.Matcher import Control.Concurrent.MVar import qualified Data.Map as M @@ -1130,7 +1131,7 @@ warnSyncContentTransition o remotes _ -> do m <- preferredContentMap hereu <- getUUID - when (any (`M.member` m) (hereu:map Remote.uuid remotes)) $ + when (any (preferredcontentconfigured m) (hereu:map Remote.uuid remotes)) $ showwarning where showwarning = earlyWarning $ @@ -1140,6 +1141,8 @@ warnSyncContentTransition o remotes <> " send any content, use --no-content (or -g)" <> " to prepare for that change." <> " (Or you can configure annex.synccontent)" + preferredcontentconfigured m u = + maybe False (not . isEmpty . 
fst) (M.lookup u m) notOnlyAnnex :: SyncOptions -> Annex Bool notOnlyAnnex o = not <$> onlyAnnex o diff --git a/doc/forum/Unable_to_delete_preferred-content.log/comment_3_ef3f7b7c14f44b79785ca9ab5b23ca0f._comment b/doc/forum/Unable_to_delete_preferred-content.log/comment_3_ef3f7b7c14f44b79785ca9ab5b23ca0f._comment new file mode 100644 index 0000000000..4020bb7a08 --- /dev/null +++ b/doc/forum/Unable_to_delete_preferred-content.log/comment_3_ef3f7b7c14f44b79785ca9ab5b23ca0f._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2024-12-13T16:52:18Z" + content=""" +Ah, you're right that this future change in `git-annex sync` behavior is +one way that "anything" is different from not configured. + +It turns out that you can just use `git-annex wanted here ""` to get back +the same behavior as the preferred content being unset. I had forgotten +about that, and it was not really documented anywhere, which I've +corrected. + +Running `git-annex sync` without --content with preferred content set to "" +currently warns about the transition, but that warning is false. I'm fixing +it to not warn in this case. + +As to the heroic measures, .git/annex/index gets merged into whatever is in +the git-annex branch, so you need to delete that file as well as rewriting +the branch. And you need to do this in every single repository that has +received the unwanted change. And since it also auto-merges git-annex +branches from remotes, you probably will want to temporarily remove the +remote tracking branches from git's ref list. +"""]] diff --git a/doc/git-annex-preferred-content.mdwn b/doc/git-annex-preferred-content.mdwn index 908b35a95f..68769484dc 100644 --- a/doc/git-annex-preferred-content.mdwn +++ b/doc/git-annex-preferred-content.mdwn @@ -30,6 +30,9 @@ a file matches, the repository wants to store its content. If it doesn't, the repository wants to drop its content (if there are enough copies elsewhere to allow removing it). 
+An empty preferred content expression is treated the same as preferred +content not being configured. + # EXPRESSIONS * `include=glob` / `exclude=glob`
Added a comment
diff --git a/doc/forum/Unable_to_delete_preferred-content.log/comment_2_7cfd96ab17be33b407514a4d3b5d9f01._comment b/doc/forum/Unable_to_delete_preferred-content.log/comment_2_7cfd96ab17be33b407514a4d3b5d9f01._comment new file mode 100644 index 0000000000..0584c0228b --- /dev/null +++ b/doc/forum/Unable_to_delete_preferred-content.log/comment_2_7cfd96ab17be33b407514a4d3b5d9f01._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="Doable8234" + avatar="http://cdn.libravatar.org/avatar/b0d5fea745f92c3b8cc8ecc3dafa6278" + subject="comment 2" + date="2024-12-13T02:04:39Z" + content=""" +I tried setting it to `anything`, I still get the warning `git-annex sync will change default behavior in the future to send content to repositories that have preferred content configured. If you do not want this to send any content, use --no-content (or -g) to prepare for that change.`. My guess is that this means that git annex will start trying to sync content in the future (and see that there's nothing to do). I'd prefer having it not try to sync content at all. (I do know that setting annex.synccontent to true does exactly that) + +One thing is that I'm a git afficionado, and I DID go through some heroic measures and I'm VERY surprised that this is still happening and can't understand it. I even tried squashing all the commits in the git-annex branch into one, and git-annex STILL somehow managed to bring the preferred-content.log back. + +I can share a script to show what I did if that helps. I'm mostly just looking to understand how this is possible since I just can't figure it out. +"""]]
response
diff --git a/doc/forum/Unable_to_delete_preferred-content.log/comment_1_5bc7da512e2f6f6edb10685cac33de65._comment b/doc/forum/Unable_to_delete_preferred-content.log/comment_1_5bc7da512e2f6f6edb10685cac33de65._comment new file mode 100644 index 0000000000..7fc7c741fb --- /dev/null +++ b/doc/forum/Unable_to_delete_preferred-content.log/comment_1_5bc7da512e2f6f6edb10685cac33de65._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2024-12-12T18:27:08Z" + content=""" +The automatic union merging done to the git-annex branch does not allow +deleting files from it without heroic measures. +Anyway, the current content of the config file remains stored in the git +history even if it gets deleted. + +Just set it to "anything". That, not "standard" is the actual default. +"""]]
comment
diff --git a/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment b/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment new file mode 100644 index 0000000000..5c6307cb51 --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_2_57e4608559b0873617c82d5454cab798._comment @@ -0,0 +1,45 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2024-12-12T17:01:42Z" + content=""" +This was all designed to be generalizable to some degree, but has so far +really only been used for tor. + +Making it generic may be a good idea. Or it may be that there are really +too many complications around how different p2p networks and addresses work +and how authentication is done, that would complicate a generic command, +but that can be transparently handled when implementing support for a +specific p2p transport, as was done for tor. + +Working from the client end, the git remote has an url, which needs to be +identified as a p2p address to use a p2p transport to talk to it. Currently +that is an url starting with "tor-annex:". Like you suggest, +the generic one could be "p2p-annex::<path-to-socket-file>". Or it could be +"p2p-annex::foo+<bar>" which causes git-annex to run a command like +`git-annex-p2p-foo <bar>` and talk to its stdin and stdout. + +That's for outgoing connections. For incoming connections, +for tor, the remotedaemon looks to see if the socket file exists and +if so it accepts connections from it. (That tor socket is not used for +outgoing connections.) It would be easy to generalize this +to additional socket filenames. Eg, a remote with uuid U could use +`.git/annex/p2p/U` as its socket file. + +BTW, that git-annex-p2p-foo command is different from the git remote +helper you suggest, which corresponds to git-remote-tor-annex. 
But, +git-remote-tor-annex would easily generalize to a +git-remote-p2p-annex git remote helper, if there was a generic +p2p-annex url type and a way to connect to it. + +If the P2P protocol's AUTH is provided with an AuthToken, there would need +to be an interface to record the one to use for a given p2p connection. +`git-annex p2p` handles setting up AuthTokens, but its approach may or may +not make sense for a given p2p protocol. It does look like, if there's a +generic way implemented to connect to a given p2p-annex url, `git-annex +p2p` would mostly work. But there would need to be a way to generate an +address using such an url, like `git-annex enable-tor` does. + +Seems pretty close to a workable design to me, but I don't know how well it +will match up with these various kinds of P2P networks. +""]]
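Joey's `p2p-annex::foo+<bar>` form, where git-annex runs `git-annex-p2p-foo <bar>` and talks to its stdin and stdout, comes down to splitting the URL at the first `+`. A minimal sketch of that dispatch, assuming the proposed URL scheme; `p2p_helper_for_url` is a hypothetical name, and it only prints the helper invocation rather than running it.

```sh
#!/bin/sh
# Hypothetical sketch of the proposed client-side dispatch for
# "p2p-annex::<transport>+<address>" remote urls.
p2p_helper_for_url() {
    url=$1
    rest=${url#p2p-annex::}
    transport=${rest%%+*}   # text before the first "+"
    addr=${rest#*+}         # everything after the first "+"
    # Reject other url types, a missing "+", or an empty transport name.
    if [ "$rest" = "$url" ] || [ "$addr" = "$rest" ] || [ -z "$transport" ]; then
        echo "not a p2p-annex::<transport>+<address> url: $url" >&2
        return 1
    fi
    printf 'git-annex-p2p-%s %s\n' "$transport" "$addr"
}

# Example: p2p_helper_for_url 'p2p-annex::yggstack+<pubkey>.pk.ygg:12345'
```

Splitting at the first `+` keeps any further `+` characters inside the address, so each transport remains free to use them in its own address syntax.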
diff --git a/doc/forum/Unable_to_delete_preferred-content.log.mdwn b/doc/forum/Unable_to_delete_preferred-content.log.mdwn index 1671db3214..5d331f09e0 100644 --- a/doc/forum/Unable_to_delete_preferred-content.log.mdwn +++ b/doc/forum/Unable_to_delete_preferred-content.log.mdwn @@ -1,4 +1,4 @@ -I create a preferred content expression for a host by running `git annex wanted . <some expression>` for a test. Now I want to clear this completely. However, if I manually delete the `preferred-content.log`, every time I do a sync it pops back up. +I create a preferred content expression for a host by running `git annex wanted . <some expression>` for a test. Now I want to clear this completely. However, if I manually delete the `preferred-content.log`, every time I do a `git annex add` it pops back up. How can I just delete this completely so it stays deleted? One of the reasons I want to clear this is since `git annex sync` behavior is changing to start doing `sync --content` by default. I know that having it as standard should be effectively the same, but there's gotta be a way to completely undo it.
diff --git a/doc/forum/Unable_to_delete_preferred-content.log.mdwn b/doc/forum/Unable_to_delete_preferred-content.log.mdwn index 395122ddd5..1671db3214 100644 --- a/doc/forum/Unable_to_delete_preferred-content.log.mdwn +++ b/doc/forum/Unable_to_delete_preferred-content.log.mdwn @@ -1,5 +1,5 @@ I create a preferred content expression for a host by running `git annex wanted . <some expression>` for a test. Now I want to clear this completely. However, if I manually delete the `preferred-content.log`, every time I do a sync it pops back up. -How can I just delete this completely so it stays deleted? One of the reasons I want to clear this is since `git annex sync` behavior is changing to start doing `sync --content` by default. +How can I just delete this completely so it stays deleted? One of the reasons I want to clear this is since `git annex sync` behavior is changing to start doing `sync --content` by default. I know that having it as standard should be effectively the same, but there's gotta be a way to completely undo it.
diff --git a/doc/forum/Unable_to_delete_preferred-content.log.mdwn b/doc/forum/Unable_to_delete_preferred-content.log.mdwn new file mode 100644 index 0000000000..395122ddd5 --- /dev/null +++ b/doc/forum/Unable_to_delete_preferred-content.log.mdwn @@ -0,0 +1,5 @@ +I create a preferred content expression for a host by running `git annex wanted . <some expression>` for a test. Now I want to clear this completely. However, if I manually delete the `preferred-content.log`, every time I do a sync it pops back up. + +How can I just delete this completely so it stays deleted? One of the reasons I want to clear this is since `git annex sync` behavior is changing to start doing `sync --content` by default. + +
Added a comment: we use something like that..
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices/comment_1_047baf46f07acec2021d71a742f9e2f2._comment b/doc/forum/Requesting_of_files_across_disconnected_devices/comment_1_047baf46f07acec2021d71a742f9e2f2._comment new file mode 100644 index 0000000000..4fa1414335 --- /dev/null +++ b/doc/forum/Requesting_of_files_across_disconnected_devices/comment_1_047baf46f07acec2021d71a742f9e2f2._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="vince@56a3a35e623d01b9236e911a9caff71eb367399b" + nickname="vince" + avatar="http://cdn.libravatar.org/avatar/e4b490f63f17c7b29f234f65b9b42699" + subject="we use something like that.. " + date="2024-12-08T20:34:35Z" + content=""" +we created Baobáxia, with a python layer on top of g-a that implement something like that: + +https://git-annex.branchable.com/design/requests_routing/ + + +"""]]
diff --git a/doc/forum/clone_and_initialize_with_a_given_uuid.mdwn b/doc/forum/clone_and_initialize_with_a_given_uuid.mdwn new file mode 100644 index 0000000000..0c7b09374e --- /dev/null +++ b/doc/forum/clone_and_initialize_with_a_given_uuid.mdwn @@ -0,0 +1,8 @@ +Ayo.. hi there.. Love git-annex! + +We are using git-annex as a backend for an asynchronous network.. we have multiple repositories in the same machines/nodes and we'd like to use the same uuid to identify these nodes. + +Each time we clone an existing g-a repository it creates a new uuid. Right now we have to reinit with the existing uuid and set the newly created uuid as dead. + +Is it possible to prevent this stillborn? +
diff --git a/doc/todo/generic_p2p_socket_transport.mdwn b/doc/todo/generic_p2p_socket_transport.mdwn index 102edc45ae..6137b73a0b 100644 --- a/doc/todo/generic_p2p_socket_transport.mdwn +++ b/doc/todo/generic_p2p_socket_transport.mdwn @@ -3,7 +3,7 @@ Being able to connect repositories peer to peer is nice, but only having tor as What I am thinking would be nice to have for this is: 1. Something like `git annex enable-p2p-socket`, which would configure the repository such that `git annex remotedaemon` listens on a unix socket somewhere under .git/annex for incoming p2p connections, which would be authenticated using the pairing process from `git annex p2p` just like when using the tor transport. -2. A git remote `p2p-annex::<path-to-socket-file>`, which would connect to the unix socket and speak the p2p protocol with it. +2. A git remote helper `p2p-annex::<path-to-socket-file>`, which would connect to the unix socket and speak the p2p protocol with it. With these two things in place it would be possible to use any transport to connect the socket files on two systems, including yggstack, fowl, or just netcat or socat (though unencrypted communication would be a bad idea).
Added a comment
diff --git a/doc/todo/generic_p2p_socket_transport/comment_1_0eafb233b37668daa473f0415e11813f._comment b/doc/todo/generic_p2p_socket_transport/comment_1_0eafb233b37668daa473f0415e11813f._comment new file mode 100644 index 0000000000..e341ce2158 --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport/comment_1_0eafb233b37668daa473f0415e11813f._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de" + subject="comment 1" + date="2024-12-08T17:27:59Z" + content=""" +I suppose this wouldn't have to communicate over unix sockets either, it could also use stdin/stdout like `git annex shell p2pstdio`, but without skipping authentication, instead using the `git annex p2p` pairing process. Something like socat could then be used to connect those stdin/stdout's to a unix socket, tcp port, or whatever else. +"""]]
diff --git a/doc/todo/generic_p2p_socket_transport.mdwn b/doc/todo/generic_p2p_socket_transport.mdwn new file mode 100644 index 0000000000..102edc45ae --- /dev/null +++ b/doc/todo/generic_p2p_socket_transport.mdwn @@ -0,0 +1,14 @@ +Being able to connect repositories peer to peer is nice, but only having tor as an option is quite limiting, especially considering that tor isn't that suitable for large file transfers. It would be nice if git-annex could be taught to use other transports as well (what I have in mind is [yggstack](https://github.com/yggdrasil-network/yggstack) or [fowl](https://github.com/meejah/fowl) (for fowl I had already opened a todo in the past: <https://git-annex.branchable.com/todo/Peer_to_peer_connection_purely_over_magic-wormhole/>), but there are probably others that could be used as well). + +What I am thinking would be nice to have for this is: + +1. Something like `git annex enable-p2p-socket`, which would configure the repository such that `git annex remotedaemon` listens on a unix socket somewhere under .git/annex for incoming p2p connections, which would be authenticated using the pairing process from `git annex p2p` just like when using the tor transport. +2. A git remote `p2p-annex::<path-to-socket-file>`, which would connect to the unix socket and speak the p2p protocol with it. + +With these two things in place it would be possible to use any transport to connect the socket files on two systems, including yggstack, fowl, or just netcat or socat (though unencrypted communication would be a bad idea). + +My understanding is that the current tor p2p support is essentially a special case of the above, using a socket file in /var/lib/tor-annex and requiring a hidden service configuration in torrc on the server-side, while being limited to onion addresses on the client-side. In that sense this would just be a generalization and I think most of the code to support this is already there, and just needs to be wired differently. 
+ +This should also make it possible to build e.g. a `git annex enable-yggstack` and `yggstack-annex::<pubkey>.pk.ygg` remote in terms of enable-p2p-socket and `p2p-annex::`, even outside of git-annex itself. + +What do you think?
close
diff --git a/doc/todo/copy__47__move_support_for_pushinsteadOf_.mdwn b/doc/todo/copy__47__move_support_for_pushinsteadOf_.mdwn index 48c440f1eb..819ca3c04d 100644 --- a/doc/todo/copy__47__move_support_for_pushinsteadOf_.mdwn +++ b/doc/todo/copy__47__move_support_for_pushinsteadOf_.mdwn @@ -56,3 +56,4 @@ and the use case is quite common for me and in particular for ReproNim/container [[!meta author=yoh]] [[!tag projects/repronim]] +> Calling this [[done]] for now. --[[Joey]]
comments
diff --git a/doc/bugs/Web_app_terribly_slow/comment_3_1850a8f1a1c30ab8665fc5b777a89cf9._comment b/doc/bugs/Web_app_terribly_slow/comment_3_1850a8f1a1c30ab8665fc5b777a89cf9._comment new file mode 100644 index 0000000000..0c2be50e69 --- /dev/null +++ b/doc/bugs/Web_app_terribly_slow/comment_3_1850a8f1a1c30ab8665fc5b777a89cf9._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2024-12-03T18:41:08Z" + content=""" +Tried in chromium, also seeing it block sometimes, though not always, on +page loads. + +This seems rather similar to this old bug involving the same software: +<https://github.com/yesodweb/wai/issues/146> +But not I think quite the same, because that was caused by +a long delay between clicks, and this happens when clicking immediately. + +The browser's network console shows a few ms for all resources to load, +except the page itself which for some reason takes a log time. + +tcpdump shows that chromium does not send a request for that page until +many seconds after the click. The response to the request is immediate. + +Before that point there is only some SYN/ACK traffic. Which looked a bit +weird maybe. +"""]] diff --git a/doc/bugs/Web_app_terribly_slow/comment_4_bbde0af9e43dc14c779fa8e7b3969edd._comment b/doc/bugs/Web_app_terribly_slow/comment_4_bbde0af9e43dc14c779fa8e7b3969edd._comment new file mode 100644 index 0000000000..3459cf5309 --- /dev/null +++ b/doc/bugs/Web_app_terribly_slow/comment_4_bbde0af9e43dc14c779fa8e7b3969edd._comment @@ -0,0 +1,38 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2024-12-03T20:13:35Z" + content=""" +Looking at the network inspector, it seems more likely a given click stalls +when there are multiple long polling requests ongoing at the same time +(which show as "pending"). + +So on a hunch, I disabled javascript in chromium. No more hangs. 
+ +I suppose maybe chromium has a small pool of connections to the web server, +and if all of those get blocked up doing long polling, and if it doesn't +cancel those when navigating away from the page that made the long polling +request, it could block? + +But why wouldn't it cancel them? The requests were made by the page it's +navigating away from. Maybe it cancels them only once it's loaded the new +page. + +I do think that's it though. When I open 2 tabs both to the webapp, +a request in one tab can be stalled, and pressing escape in the other tab, +when it cancels the long polling requests, will unstall it. + +Amazingly, chromium is limited to 6 concurrent connections per server, with +no way to configure it! And the front page of the webapp opens several +long polling connections. + +In firefox, by comparison, the network inspector shows the long polling +connections apparently disappear when navigating to a new page. But, there +is a similar problem, when opening 2 tabs to the webapp, one can stall, +due to the long polling connections open by the other tab. + +So, I think the solution to this will either need to involve some change +to the long polling javascript, if there is some way to make chromium +cancel the request when navigating away... Or it would need to replace the +long polling with something else entirely. +"""]]
Added a comment
diff --git a/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_9_66d2d0212b9ecdc7dafc28ca00008c82._comment b/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_9_66d2d0212b9ecdc7dafc28ca00008c82._comment new file mode 100644 index 0000000000..b40f7495be --- /dev/null +++ b/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_9_66d2d0212b9ecdc7dafc28ca00008c82._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 9" + date="2024-12-03T20:26:10Z" + content=""" +FWIW, I do like separation into a dedicated `annexInsteadOf` and its alignment to `annexUrl <-> pushUrl`! I will try it up soonish. Thank you! +"""]]
annexInsteadOf config
Added config `url.<base>.annexInsteadOf` corresponding to git's
`url.<base>.pushInsteadOf`, to configure the urls to use for accessing the
git-annex repositories on a server without needing to configure
remote.name.annexUrl in each repository.
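For example, the rewrite can be set once in the global git config, the same way `pushInsteadOf` often is. The sketch below assumes a hypothetical server `example.com` where git fetches repositories over https but git-annex should reach them over ssh. The hostname and paths are placeholders. Going by `getAnnexUrl` in the Remote/Git.hs change, an explicitly configured `remote.<name>.annexUrl` is still used in preference to this rewrite.

```sh
# Placeholder host: any remote url starting with https://example.com/git/
# is rewritten to ssh://git@example.com/git/... when git-annex accesses the
# repository. git itself ignores annexInsteadOf, so fetch and push keep
# using the https url.
git config --global url."ssh://git@example.com/git/".annexInsteadOf "https://example.com/git/"
```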
While one use case for this would be rewriting urls to use annex+http,
I decided not to add any kind of special case for that. So while
git-annex p2phttp, when serving multiple repositories, needs an url
of eg "annex+http://example.com/git-annex/" for each of them, rewriting an
url like "https://example.com/git/foo/bar" with this config set to
"https://example.com/git/" will result in eg
"annex+http://example.com/git-annex/foo/bar", which p2phttp does not
support.
That seems better dealt with in either git-annex p2phttp or a http
middleware, rather than complicating the config with a special case for
annex+http.
Anyway, there are other use cases for this that don't involve annex+http.
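The p2phttp limitation above follows from how an insteadOf-style rewrite works. The configured value that prefixes the remote url is cut off and replaced by `<base>`, and the rest of the url is carried over unchanged. The POSIX shell function below illustrates that rule, assuming annexInsteadOf follows git's documented insteadOf behavior where the longest matching prefix wins. The function name `insteadof_rewrite` and its argument layout are made up for this sketch and are not part of git-annex.

```sh
# insteadof_rewrite URL [BASE PREFIX]...
# Each BASE PREFIX pair stands for one url.<BASE>.annexInsteadOf=<PREFIX>
# setting. The pair whose PREFIX is the longest prefix of URL wins and that
# PREFIX is replaced by BASE; with no match, URL is printed unchanged.
# Empty prefixes are ignored in this sketch.
insteadof_rewrite() {
  url=$1
  shift
  best_base=
  best_prefix=
  while [ "$#" -ge 2 ]; do
    base=$1
    prefix=$2
    shift 2
    case $url in
      "$prefix"*)
        # Keep the longest matching prefix seen so far.
        if [ "${#prefix}" -gt "${#best_prefix}" ]; then
          best_base=$base
          best_prefix=$prefix
        fi
        ;;
    esac
  done
  if [ -n "$best_prefix" ]; then
    printf '%s%s\n' "$best_base" "${url#"$best_prefix"}"
  else
    printf '%s\n' "$url"
  fi
}

# The example from the commit message: the repository path foo/bar is
# appended verbatim, giving an url that p2phttp does not serve.
insteadof_rewrite https://example.com/git/foo/bar \
  annex+http://example.com/git-annex/ https://example.com/git/
```

Because the path is copied verbatim, a server that expects a different path layout has to map it itself. This is why the commit leaves that job to p2phttp or an http middleware.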
diff --git a/CHANGELOG b/CHANGELOG index fece6c4def..a9e21290ae 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,5 +1,9 @@ git-annex (10.20241203) UNRELEASED; urgency=medium + * Added config `url.<base>.annexInsteadOf` corresponding to git's + `url.<base>.pushInsteadOf`, to configure the urls to use for accessing + the git-annex repositories on a server without needing to configure + remote.name.annexUrl in each repository. * Work around git hash-object --stdin-paths's odd stripping of carriage return from the end of the line (some windows infection), avoiding crashing when the repo contains a filename ending in a carriage return. diff --git a/Git/Remote.hs b/Git/Remote.hs index 2cc0e2601d..b09aee6643 100644 --- a/Git/Remote.hs +++ b/Git/Remote.hs @@ -89,7 +89,7 @@ remoteLocationIsSshUrl _ = False parseRemoteLocation :: String -> Bool -> Repo -> RemoteLocation parseRemoteLocation s knownurl repo = go where - s' = calcloc s + s' = fromMaybe s $ insteadOfUrl s ".insteadof" $ fullconfig repo go #ifdef mingw32_HOST_OS | dosstyle s' = RemotePath (dospath s') @@ -98,28 +98,6 @@ parseRemoteLocation s knownurl repo = go | urlstyle s' = RemoteUrl s' | knownurl && s' == s = RemoteUrl s' | otherwise = RemotePath s' - -- insteadof config can rewrite remote location - calcloc l - | null insteadofs = l - | otherwise = replacement ++ drop (S.length bestvalue) l - where - replacement = decodeBS $ S.drop (S.length prefix) $ - S.take (S.length bestkey - S.length suffix) bestkey - (bestkey, bestvalue) = - case maximumBy longestvalue insteadofs of - (ConfigKey k, ConfigValue v) -> (k, v) - (ConfigKey k, NoConfigValue) -> (k, mempty) - longestvalue (_, a) (_, b) = compare b a - insteadofs = filterconfig $ \case - (ConfigKey k, ConfigValue v) -> - prefix `S.isPrefixOf` k && - suffix `S.isSuffixOf` k && - v `S.isPrefixOf` encodeBS l - (_, NoConfigValue) -> False - filterconfig f = filter f $ - concatMap splitconfigs $ M.toList $ fullconfig repo - splitconfigs (k, vs) = map (\v -> (k, v)) 
(NE.toList vs) - (prefix, suffix) = ("url." , ".insteadof") -- git supports URIs that contain unescaped characters such as -- spaces. So to test if it's a (git) URI, escape those. urlstyle v = isURI (escapeURIString isUnescapedInURI v) @@ -147,3 +125,26 @@ parseRemoteLocation s knownurl repo = go dosstyle = hasDrive dospath = fromRawFilePath . fromInternalGitPath . toRawFilePath #endif + +insteadOfUrl :: String -> S.ByteString -> RepoFullConfig -> Maybe String +insteadOfUrl u configsuffix fullcfg + | null insteadofs = Nothing + | otherwise = Just $ replacement ++ drop (S.length bestvalue) u + where + replacement = decodeBS $ S.drop (S.length configprefix) $ + S.take (S.length bestkey - S.length configsuffix) bestkey + (bestkey, bestvalue) = + case maximumBy longestvalue insteadofs of + (ConfigKey k, ConfigValue v) -> (k, v) + (ConfigKey k, NoConfigValue) -> (k, mempty) + longestvalue (_, a) (_, b) = compare b a + insteadofs = filterconfig $ \case + (ConfigKey k, ConfigValue v) -> + configprefix `S.isPrefixOf` k && + configsuffix `S.isSuffixOf` k && + v `S.isPrefixOf` encodeBS u + (_, NoConfigValue) -> False + filterconfig f = filter f $ + concatMap splitconfigs $ M.toList fullcfg + splitconfigs (k, vs) = map (\v -> (k, v)) (NE.toList vs) + configprefix = "url." diff --git a/Git/Types.hs b/Git/Types.hs index 18398a040e..b28380bc46 100644 --- a/Git/Types.hs +++ b/Git/Types.hs @@ -41,9 +41,9 @@ data RepoLocation data Repo = Repo { location :: RepoLocation - , config :: M.Map ConfigKey ConfigValue + , config :: RepoConfig -- a given git config key can actually have multiple values - , fullconfig :: M.Map ConfigKey (NE.NonEmpty ConfigValue) + , fullconfig :: RepoFullConfig -- remoteName holds the name used for this repo in some other -- repo's list of remotes, when this repo is such a remote , remoteName :: Maybe RemoteName @@ -60,6 +60,10 @@ data Repo = Repo -- when using this repository. 
, repoPathSpecifiedExplicitly :: Bool } deriving (Show, Eq, Ord) + +type RepoConfig = M.Map ConfigKey ConfigValue + +type RepoFullConfig = M.Map ConfigKey (NE.NonEmpty ConfigValue) newtype ConfigKey = ConfigKey S.ByteString deriving (Ord, Eq) diff --git a/Remote/Git.hs b/Remote/Git.hs index 10d582bd36..d77fce1fd8 100644 --- a/Remote/Git.hs +++ b/Remote/Git.hs @@ -98,8 +98,9 @@ locationField = Accepted "location" list :: Bool -> Annex [Git.Repo] list autoinit = do - c <- fromRepo Git.config - rs <- mapM (tweakurl c) =<< Annex.getGitRemotes + cfg <- fromRepo Git.config + fullcfg <- fromRepo Git.fullconfig + rs <- mapM (tweakurl cfg fullcfg) =<< Annex.getGitRemotes rs' <- mapM (configRead autoinit) (filter (not . isGitRemoteAnnex) rs) proxies <- doQuietAction getProxies if proxies == mempty @@ -108,17 +109,20 @@ list autoinit = do proxied <- listProxied proxies rs' return (proxied++rs') where - tweakurl c r = do + tweakurl cfg fullcfg r = do let n = fromJust $ Git.remoteName r - case getAnnexUrl r c of - Just url | not (isP2PHttpProtocolUrl url) -> + case getAnnexUrl r cfg fullcfg of + Just url | not (isP2PHttpProtocolUrl url) -> inRepo $ \g -> Git.Construct.remoteNamed n $ Git.Construct.fromRemoteLocation url False g _ -> return r -getAnnexUrl :: Git.Repo -> M.Map Git.ConfigKey Git.ConfigValue -> Maybe String -getAnnexUrl r c = Git.fromConfigValue <$> M.lookup (annexUrlConfigKey r) c +getAnnexUrl :: Git.Repo -> Git.RepoConfig -> Git.RepoFullConfig -> Maybe String +getAnnexUrl r cfg fullcfg = + (Git.fromConfigValue <$> M.lookup (annexUrlConfigKey r) cfg) + <|> + annexInsteadOfUrl fullcfg (Git.repoLocation r) annexUrlConfigKey :: Git.Repo -> Git.ConfigKey annexUrlConfigKey r = remoteConfig r "annexurl" diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs index 4b9c546e86..2ab5de3ea6 100644 --- a/Types/GitConfig.hs +++ b/Types/GitConfig.hs @@ -27,6 +27,7 @@ module Types.GitConfig ( proxyInheritedFields, MkRemoteConfigKey, mkRemoteConfigKey, + annexInsteadOfUrl, ) where 
import Common @@ -35,7 +36,7 @@ import qualified Git.Config import qualified Git.Construct import Git.Types import Git.ConfigTypes -import Git.Remote (isRemoteKey, isLegalName, remoteKeyToRemoteName) +import Git.Remote (isRemoteKey, isLegalName, remoteKeyToRemoteName, insteadOfUrl) import Git.Branch (CommitMode(..)) import Git.Quote (QuotePath(..)) import Utility.DataUnits @@ -497,16 +498,14 @@ extractRemoteGitConfig r remotename = do , remoteAnnexClusterGateway = fromMaybe [] $ (mapMaybe (mkClusterUUID . toUUID) . words) <$> getmaybe ClusterGatewayField - , remoteUrl = case Git.Config.getMaybe (mkRemoteConfigKey remotename (remoteGitConfigKey UrlField)) r of - Just (ConfigValue b) - | B.null b -> Nothing - | otherwise -> Just (decodeBS b) - _ -> Nothing + , remoteUrl = getremoteurl , remoteAnnexP2PHttpUrl = case Git.Config.getMaybe (mkRemoteConfigKey remotename (remoteGitConfigKey AnnexUrlField)) r of Just (ConfigValue b) -> parseP2PHttpUrl (decodeBS b) - _ -> Nothing + _ -> parseP2PHttpUrl + =<< annexInsteadOfUrl (fullconfig r) + =<< getremoteurl , remoteAnnexShell = getmaybe ShellField , remoteAnnexSshOptions = getoptions SshOptionsField , remoteAnnexRsyncOptions = getoptions RsyncOptionsField @@ -544,6 +543,11 @@ extractRemoteGitConfig r remotename = do in Git.Config.getMaybe (mkRemoteConfigKey remotename k) r <|> Git.Config.getMaybe (mkAnnexConfigKey k) r getoptions k = fromMaybe [] $ words <$> getmaybe k + getremoteurl = case Git.Config.getMaybe (mkRemoteConfigKey remotename (remoteGitConfigKey UrlField)) r of + Just (ConfigValue b) (Diff truncated)
comments
diff --git a/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_6_041e44a8386f97b2301df2ea77dbb497._comment b/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_6_041e44a8386f97b2301df2ea77dbb497._comment new file mode 100644 index 0000000000..fca2ac009e --- /dev/null +++ b/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_6_041e44a8386f97b2301df2ea77dbb497._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2024-12-03T15:48:52Z" + content=""" +git-annex does not currently use pushurl, and making it start to use it +would be the same kind of potentially breaking change as making it start to +use pushinsteadof. + +I get where you're coming from but just because a lot of people use +pushinstead of that way does not mean that other people don't use it to +redirect pushes to an entirely different clone of the repository. + +[Here](https://github.com/git/git/commit/697f652818f211aa48e3c007f25d6177647980c1) +Junio calls using pushurl that way a "common mistake", so I guess he is +seeing people do that. He does have a good point that with such a +configuration refs/remotes/origin won't (usually) reflect the state of both +repos. +"""]] diff --git a/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_7_d65ac9b362eed6f36e12609efbe8a793._comment b/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_7_d65ac9b362eed6f36e12609efbe8a793._comment new file mode 100644 index 0000000000..e655bb624c --- /dev/null +++ b/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_7_d65ac9b362eed6f36e12609efbe8a793._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2024-12-03T16:06:15Z" + content=""" +If git-annex used pushInsteadOf for sending content to a remote, should it +also use it for dropping content from the remote? Dropping is quite far +from pushing. 
Does it make sense to expect the user to generalize "push" to +"arbitrary write access" when it comes to git-annex's interpretation of +configuration settings that were designed for git? + +Granted, `git-annex push` can drop content from the remote when preferred +content is configured to. +"""]] diff --git a/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_8_8604dfafa32ebb14281c2c866214a920._comment b/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_8_8604dfafa32ebb14281c2c866214a920._comment new file mode 100644 index 0000000000..859b8b8f7f --- /dev/null +++ b/doc/todo/copy__47__move_support_for_pushinsteadOf_/comment_8_8604dfafa32ebb14281c2c866214a920._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2024-12-03T16:15:50Z" + content=""" +Maybe what's really missing is `url.<base>.annexInsteadOf` +corresponding to `url.<base>.pushInsteadOf`. + +The same way `remote.<name>.annexUrl` corresponds to +`remote.<name>.pushUrl`. + +You would need to set 2 configs, but the separation is clear. +And you could do it once in your global git config for whatever +servers you commonly use. + +Another benefit to is that the new `git-annex p2phttp` server +needs annexUrl to be configured to a different url than the git url +when using it. annexInsteadOf would let that be configured a +single time for all urls on a given git server. +"""]]
Added a comment
diff --git a/doc/bugs/Web_app_terribly_slow/comment_2_7af7205599bd1cad6ec9062e122db9dc._comment b/doc/bugs/Web_app_terribly_slow/comment_2_7af7205599bd1cad6ec9062e122db9dc._comment new file mode 100644 index 0000000000..b3b4a095c1 --- /dev/null +++ b/doc/bugs/Web_app_terribly_slow/comment_2_7af7205599bd1cad6ec9062e122db9dc._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="iirekm" + avatar="http://cdn.libravatar.org/avatar/c00a453db590bda7a0120281d0a12f62" + subject="comment 2" + date="2024-12-03T15:43:25Z" + content=""" +Just tested more: +- Brave - slow +- Chrome - slow +- Firefox - works like a dream + +It seems it's something wrong with Chromium-family browser support. There are no JavaScript and network errors in developer console. +I also tried to start git annex webapp with --vebose and --debug flags - nothing there as well. +"""]]
comment
diff --git a/doc/bugs/Web_app_terribly_slow/comment_1_86aabaca367ebbd3de2a0ca9b3b58540._comment b/doc/bugs/Web_app_terribly_slow/comment_1_86aabaca367ebbd3de2a0ca9b3b58540._comment new file mode 100644 index 0000000000..62890bac70 --- /dev/null +++ b/doc/bugs/Web_app_terribly_slow/comment_1_86aabaca367ebbd3de2a0ca9b3b58540._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2024-12-03T15:34:33Z" + content=""" +What browser? +"""]]
diff --git a/doc/bugs/Web_app_terribly_slow.mdwn b/doc/bugs/Web_app_terribly_slow.mdwn new file mode 100644 index 0000000000..01722e32ba --- /dev/null +++ b/doc/bugs/Web_app_terribly_slow.mdwn @@ -0,0 +1,4 @@ +Git Annex web app (started from KDE menu) is so slow that it's unusable, after any click, response time is some 30 seconds. It's so slow even on newly-created empty repository. Killing all git-annex processes doesn't help. + +Version: 10.20241031-1~ndall+1 +OS: KUbuntu 24.04
Added a comment: Maybe explanation
diff --git a/doc/forum/refs__47__heads__47__synced__47__git-annex_receives_from_more_than_one_src/comment_2_f220b11501f60fe4d3e0c1911e85e8e2._comment b/doc/forum/refs__47__heads__47__synced__47__git-annex_receives_from_more_than_one_src/comment_2_f220b11501f60fe4d3e0c1911e85e8e2._comment new file mode 100644 index 0000000000..c0ed7f857f --- /dev/null +++ b/doc/forum/refs__47__heads__47__synced__47__git-annex_receives_from_more_than_one_src/comment_2_f220b11501f60fe4d3e0c1911e85e8e2._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="david@1439a1cab13195a56248b6a8fd98a62028bcba8a" + nickname="david" + avatar="http://cdn.libravatar.org/avatar/22c2d800db6a7699139df604a67cb221" + subject="Maybe explanation" + date="2024-12-02T20:25:34Z" + content=""" +In case anyone else is struggling with this 9 years later, I noticed I got this message by running \"git annex sync\" with the branch \"git-annex\" checked out. + +"""]]
comment
diff --git a/doc/forum/less_paranoid_mode/comment_1_ff638eda708966f151ef12849f69572c._comment b/doc/forum/less_paranoid_mode/comment_1_ff638eda708966f151ef12849f69572c._comment new file mode 100644 index 0000000000..818163920e --- /dev/null +++ b/doc/forum/less_paranoid_mode/comment_1_ff638eda708966f151ef12849f69572c._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2024-12-02T18:00:31Z" + content=""" +This happens to avoid it: + +`git config core.sharedRepository group` + +I do think that preventing accidental deletion, of what +may be the only copy of the file, has probably saved some peoples' data +though. +"""]]
deal with git's CRLF nonsense once again
Work around git hash-object --stdin-paths's odd stripping of carriage
return from the end of the line (some windows infection), avoiding crashing
when the repo contains a filename ending in a carriage return.
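The workaround comes down to a per-filename decision. A name containing a newline or a carriage return cannot be passed safely as one line to `git hash-object --stdin-paths`, so it goes through a separate `git hash-object` invocation that takes the name as an argument. The Git/HashObject.hs change checks for `\r` anywhere in the name, not only at the end, and the sketch below does the same. The function names are hypothetical. For brevity it spawns one process per file, whereas git-annex keeps a single `--stdin-paths` process running and only falls back for the problem names.

```sh
# needs_fallback FILE: succeeds when FILE contains a newline or a carriage
# return, i.e. when it cannot be sent as one line to --stdin-paths.
needs_fallback() {
  nl='
'
  cr=$(printf '\r')
  case $1 in
    *"$nl"* | *"$cr"*) return 0 ;;
    *) return 1 ;;
  esac
}

# hash_path FILE: prints the blob id git computes for FILE.
hash_path() {
  if needs_fallback "$1"; then
    # Slow path: the name is a command-line argument, so git does not
    # parse it as a line and cannot strip a trailing carriage return.
    git hash-object -- "$1"
  else
    # Line-based path, as used with a long-running --stdin-paths process.
    printf '%s\n' "$1" | git hash-object --stdin-paths
  fi
}
```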
diff --git a/CHANGELOG b/CHANGELOG index 77fea4eb73..fece6c4def 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,11 @@ +git-annex (10.20241203) UNRELEASED; urgency=medium + + * Work around git hash-object --stdin-paths's odd stripping of carriage + return from the end of the line (some windows infection), avoiding + crashing when the repo contains a filename ending in a carriage return. + + -- Joey Hess <id@joeyh.name> Mon, 02 Dec 2024 13:41:31 -0400 + git-annex (10.20241202) upstream; urgency=medium * add: Consistently treat files in a dotdir as dotfiles, even diff --git a/Git/HashObject.hs b/Git/HashObject.hs index 1474c5709d..620c095141 100644 --- a/Git/HashObject.hs +++ b/Git/HashObject.hs @@ -48,13 +48,17 @@ hashFile hdl@(HashObjectHandle h _ _) file = do -- So, make the filename absolute, which will work now -- and also if git's behavior later changes. file' <- absPath file - if newline `S.elem` file' + if newline `S.elem` file' || carriagereturn `S.elem` file then hashFile' hdl file else CoProcess.query h (send file') receive where send file' to = S8.hPutStrLn to file' receive from = getSha "hash-object" $ S8.hGetLine from newline = fromIntegral (ord '\n') + -- git strips carriage return from the end of a line, out of some + -- misplaced desire to support windows, so also use the newline + -- fallback for those. + carriagereturn = fromIntegral (ord '\r') {- Runs git hash-object once per call, rather than using a running - one, so is slower. 
But, is able to handle newlines in the filepath, diff --git a/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__.mdwn b/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__.mdwn index 71a559f2eb..e0df1225bc 100644 --- a/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__.mdwn +++ b/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__.mdwn @@ -73,3 +73,5 @@ It succeeds at moving the large `test2` file into `.git/annex/objects` and symli ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) Yes! :) It's helped me manage an unruly mess of files, backups, and backups of backups. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__/comment_1_0a3132101d9cc48afa55d467d1c3e12a._comment b/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__/comment_1_0a3132101d9cc48afa55d467d1c3e12a._comment new file mode 100644 index 0000000000..982073e14d --- /dev/null +++ b/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__/comment_1_0a3132101d9cc48afa55d467d1c3e12a._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2024-12-02T16:41:07Z" + content=""" +Reproduced on linux. This is pretty surprising since `\r` is not a +particularly special character. + +Adding any file not matching largefiles with '\r` in its name will trigger +it, the rest is not needed. + +`git hash-object --stdin-paths` is what is failing. + + printf '/home/joey/tmp/tr/example/Icon3\r\n' | git hash-object --stdin-paths + fatal: could not open '/home/joey/tmp/tr/example/Icon3' for reading: No such file or directory + +So, this is a misbehavior in git, which prevents passing a filename ending +in '\r' into --stdin-paths here. Probably git is removing DOS style CRLF +when it should not. 
I have reported this (and several related bugs) to the +git mailing list so it might get fixed. + +`git cat-file --batch` also has this behavior, and git-annex already works +around it by treating "\r" the same as "\n" and avoiding using the batch +interface for it. (It could use -z, which avoids the problem, but older +git's don't support that option.) + +I've made git-annex treat "\r" as special for git hash-object as well. +"""]]
add news item for git-annex 10.20241202
diff --git a/doc/news/version_10.20240731.mdwn b/doc/news/version_10.20240731.mdwn deleted file mode 100644 index d5d39b9b46..0000000000 --- a/doc/news/version_10.20240731.mdwn +++ /dev/null @@ -1,22 +0,0 @@ -git-annex 10.20240731 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * New HTTP API that is equivilant to the P2P protocol. - * New p2phttp command to serve the HTTP API. - * annex+http and annex+https urls can be configured for - remote.name.annexUrl to use the HTTP API to communicate with a server. - This supports writable repositories, as well as accessing clusters - and proxied remotes over HTTP. - * When a http remote has annex.url set to an annex+http url in - the git config file on the website, it will be copied into - remote.name.annexUrl the first time git-annex uses the remote. - * assistant: Fix a race condition that could cause a pointer file to - get ingested into the annex. - * Avoid potential data loss in unlikely situations where git-annex-shell - or git-annex remotedaemon is killed while locking a key to prevent its - removal. - * When proxying a download from a special remote, avoid unncessary hashing. - * When proxying an upload to a special remote, verify the hash. - * Propagate --force to git-annex transferrer. - * Added a build flag for servant, enabling annex+http urls and - git-annex p2phttp. - * Added a dependency on the haskell clock library. - * Updated stack.yaml to nightly-2024-07-29."""]] \ No newline at end of file diff --git a/doc/news/version_10.20241202.mdwn b/doc/news/version_10.20241202.mdwn new file mode 100644 index 0000000000..0c3b2f2cfc --- /dev/null +++ b/doc/news/version_10.20241202.mdwn @@ -0,0 +1,28 @@ +git-annex 10.20241202 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * add: Consistently treat files in a dotdir as dotfiles, even + when ran inside that dotdir. + * add: When adding a dotfile as a non-large file, mention that it's a + dotfile. 
+ * p2phttp: Added --directory option which serves multiple git-annex + repositories located inside a directory. + * When remote.name.annexUrl is an annex+http(s) url, that + uses the same hostname as remote.name.url, which is itself a http(s) + url, they are assumed to share a username and password. This avoids + unnecessary duplicate password prompts. + * git-remote-annex: Fix a reversion introduced in version 10.20241031 + that broke cloning from a special remote. + * git-remote-annex: Fix cloning from a special remote on a crippled + filesystem. + * git-remote-annex: Fix buggy behavior when annex.stalldetection is + configured. + * git-remote-annex: Require git version 2.31 or newer, since old + ones had a buggy git bundle command. + * S3: Support versioning=yes with a readonly bucket. + (Needs aws-0.24.3) + * S3: Send git-annex or other configured User-Agent. + (Needs aws-0.24.3) + * S3: Fix infinite loop and memory blowup when importing from an + unversioned S3 bucket that is large enough to need pagination. + * S3: Use significantly less memory when importing from a + versioned S3 bucket. + * vpop: Only update state after successful checkout."""]] \ No newline at end of file
forgot to add this comment earlier
diff --git a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files/comment_2_a48aa264a5514b70f8362208f3136dc0._comment b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files/comment_2_a48aa264a5514b70f8362208f3136dc0._comment new file mode 100644 index 0000000000..68fc7c30db --- /dev/null +++ b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files/comment_2_a48aa264a5514b70f8362208f3136dc0._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2024-11-25T16:19:42Z" + content=""" +The reason `git-annex copy --not --unused` behaves that way is that +--unused is not a file matching option. --not, meanwhile, inverts the next +file matching option. So here it does nothing. So that command is the same +as `git-annex copy --unused`! + +Obviously, that's a bit of an excuse, but it's what's going on. I do think +that having `--not --unused` work would be a useful thing. Opened a todo +[[todo/support_--not_--unused]]. +"""]]
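The explanation above says `--not` inverts only the next *file matching* option, and `--unused` is not one. The toy model below shows why `--not --unused` then reduces to `--unused`. It is a hypothetical sketch, not git-annex's option parser: the `Opt` and `Term` types and the `matcherTerms` function are invented for illustration.

```haskell
-- Hypothetical tokens; these are not git-annex's real option types.
data Opt
  = Not              -- the --not option
  | FileMatch String -- a file matching option, e.g. "--in=origin"
  | Other String     -- any other option, e.g. "--unused"
  deriving (Show, Eq)

-- One term of the resulting file matcher.
data Term = Positive String | Negated String
  deriving (Show, Eq)

-- Build the matcher terms from the options, left to right.
matcherTerms :: [Opt] -> [Term]
matcherTerms = go False
  where
    -- Nothing left to invert: a pending --not has no effect.
    go _ [] = []
    go _ (Not : rest) = go True rest
    go neg (FileMatch o : rest) =
      (if neg then Negated o else Positive o) : go False rest
    -- Not a file matching option, so the matcher never sees it and
    -- a pending --not carries on to the next matching option, if any.
    go neg (Other _ : rest) = go neg rest
```

In this model, `matcherTerms [Not, Other "--unused"]` and `matcherTerms [Other "--unused"]` both produce an empty matcher. That is the "does nothing" behavior described in the comment.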
diff --git a/doc/forum/less_paranoid_mode.mdwn b/doc/forum/less_paranoid_mode.mdwn new file mode 100644 index 0000000000..e4fc3a93b1 --- /dev/null +++ b/doc/forum/less_paranoid_mode.mdwn @@ -0,0 +1,28 @@ +Is there a way/configuration setting to avoid directory permission problem in the following scenario: + +```bash +git init repo +cd repo/ +git-annex init +echo test > testfile +git-annex add testfile +cd .. +rm -rf repo/ +``` + +This results in the following error: + +``` +rm: cannot remove 'repo/.git/annex/objects/w8/pv/SHA256E-s5--f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2/SHA256E-s5--f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2': Permission denied +``` + +The same problem is when the repository is moved to a different filesystem (mount point). + +This fixes the problem: + +``` +find repo -type d -exec chmod 755 {} + +rm -rf repo/ +``` + +But my question is: can git-annex do this automatically?
Added a comment: thanks and the followup
diff --git a/doc/forum/Stuck_in_git_annex_view__58___file_name_too_long/comment_3_08642e7f8c863d3b518517be2a5b4ae1._comment b/doc/forum/Stuck_in_git_annex_view__58___file_name_too_long/comment_3_08642e7f8c863d3b518517be2a5b4ae1._comment new file mode 100644 index 0000000000..c6b0e2551d --- /dev/null +++ b/doc/forum/Stuck_in_git_annex_view__58___file_name_too_long/comment_3_08642e7f8c863d3b518517be2a5b4ae1._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="thanks and the followup" + date="2024-11-27T09:08:32Z" + content=""" +Thanks for posting your reasoning and for the fix, Joey! + +> As to whether git-annex should try to detect this and avoid entering such a view, I dunno.. + +It's not that critical when you know how to fix it. Though, it was definitely putting a good amount of stress on me. I was starting to think that I have to re-create this particular \"meta\" repository, and at that moment I didn't have enough time/energy to do it. And even if re-creating it wouldn't be as difficult as I was imagining it back than, I decided to fix it, because if it would have happen in a non-meta repository, and I would having some non-synced changes, it could be very unpleasant. And I could imaging that it might happen to anyone. + +Also, as a side note, I was thinking about creating a FUSE FS which could specifically handle such cases for metadata, e.g. by introducing a \"virtual folder\" in the root, e.g. `long-paths` (don't like the naming, probably it needs more thinking). Such FUSE FS could potentially work along with the git-annex views, preserve the existing folder structure, but only showing files with specific tags/metadata. + +"""]]
diff --git a/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__.mdwn b/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__.mdwn new file mode 100644 index 0000000000..71a559f2eb --- /dev/null +++ b/doc/bugs/hFlush__58___resource_vanished___40__Broken_pipe__41__.mdwn @@ -0,0 +1,75 @@ +### Please describe the problem. + +`git annex add` fails if it encounters a file whose name ends with a carriage return. + +This can happen with any directory that has a custom icon in MacOS - the directory includes an empty file named `Icon\r`. + +First git-annex shows a fatal error, "No such file or directory". Then for any subsequent files being added, it fails with "git-annex: fd:18: hFlush: resource vanished (Broken pipe)". + +The problem only occurs if `config.largefiles` is set. + +### What steps will reproduce the problem? + +[[!format sh """ +# Step 1. +# Create a directory with an empty icon file. +# To type the carriage return (^M), press Ctrl-V and then Return. +mkdir example/ +touch example/Icon^M + +# Step 2. +# Create some test files in the directory. +# This should be a mixture of small files and large files. +echo hello > example/test1 +head -c5000 < /dev/urandom > example/test2 + +# Step 3. +# Set `annex.largefiles` to anything less than the size of the large file above. +git annex config --set annex.largefiles largerthan=4kb + +# Step 4. +# Add the directory to the git annex. +git annex add example/ +"""]] + +### What version of git-annex are you using? On what operating system? + +- git-annex version 10.20241031 +- git version 2.47.1 +- bash version 5.2.37 +- MacOS version 15.1.1 (Sequoia) + +### Please provide any additional information below. 
+ +This happens when I run `git annex add example/`: + +``` +add "example/Icon\r" (non-large file; adding content to git repository) fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory +fatal: could not open '/path/to/repo/example/Icon' for reading: No such file or directory + +git-annex: fd:19: Data.ByteString.hGetLine: end of file +failed +add example/test1 (non-large file; adding content to git repository) +git-annex: fd:18: hFlush: resource vanished (Broken pipe) +failed +add example/test2 + +git-annex: fd:18: hFlush: resource vanished (Broken pipe) +failed +git-annex: fd:18: hClose: resource vanished (Broken pipe) +``` + +It succeeds at moving the large `test2` file into `.git/annex/objects` and symlinking it, but `git status` doesn't show any changes. + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Yes! :) It's helped me manage an unruly mess of files, backups, and backups of backups.
comment
diff --git a/doc/todo/support_--not_--unused.mdwn b/doc/todo/support_--not_--unused.mdwn new file mode 100644 index 0000000000..ed951bdfa7 --- /dev/null +++ b/doc/todo/support_--not_--unused.mdwn @@ -0,0 +1,13 @@ +`git-annex find --not --unused` is currently the same as `git-annex find +--unused` because `--unused` is not a file matching option. This is a bit +confusing. + +And it would be useful to have a way to find all keys +that are not amoung the unused keys. + +Given the implementation of `--not` is tied to file matching options, +it might be best to add a new option like `--used` or `--not-unused`. + +It would also perhaps be good to detect when matching options are used that +don't make sense, and error out on commands like `git-annex find --not` +or `git-annex find -and -(`
git-remote-annex: Fix buggy behavior when annex.stalldetection is configured
Make programPath never return "git-remote-annex" or other known multi-call
program names, which are not git-annex and won't behave like it.
If the git-annex binary gets installed under some entirely other name,
it will still return it.
This change exposed that readProgramFile actually could crash,
which happened before only if getExecutablePath was not absolute
and there was no ~/.config/git-annex/program. So fixed that to catch
exception.
diff --git a/Annex/Path.hs b/Annex/Path.hs index c131ddba0f..d3cca7c503 100644 --- a/Annex/Path.hs +++ b/Annex/Path.hs @@ -1,6 +1,6 @@ {- git-annex program path - - - Copyright 2013-2022 Joey Hess <id@joeyh.name> + - Copyright 2013-2024 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -18,9 +18,11 @@ import Annex.Common import Config.Files import Utility.Env import Annex.PidLock +import CmdLine.Multicall import qualified Annex import System.Environment (getExecutablePath, getArgs, getProgName) +import qualified Data.Map as M {- A fully qualified path to the currently running git-annex program. - @@ -33,23 +35,35 @@ import System.Environment (getExecutablePath, getArgs, getProgName) - getExecutablePath. It sets GIT_ANNEX_DIR to the location of the - standalone build directory, and there are wrapper scripts for git-annex - and git-annex-shell in that directory. + - + - When the currently running program is not git-annex, but is instead eg + - git-annex-shell or git-remote-annex, this finds a git-annex program + - instead. -} programPath :: IO FilePath programPath = go =<< getEnv "GIT_ANNEX_DIR" where go (Just dir) = do - name <- getProgName + name <- reqgitannex <$> getProgName return (dir </> name) go Nothing = do - exe <- getExecutablePath + name <- getProgName + exe <- if isgitannex name + then getExecutablePath + else pure "git-annex" p <- if isAbsolute exe then return exe else fromMaybe exe <$> readProgramFile maybe cannotFindProgram return =<< searchPath p + reqgitannex name + | isgitannex name = name + | otherwise = "git-annex" + isgitannex = flip M.notMember otherMulticallCommands + {- Returns the path for git-annex that is recorded in the programFile. -} readProgramFile :: IO (Maybe FilePath) -readProgramFile = do +readProgramFile = catchDefaultIO Nothing $ do programfile <- programFile headMaybe . 
lines <$> readFile programfile diff --git a/CHANGELOG b/CHANGELOG index bdc8bb07b4..e1957cb91b 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -23,6 +23,8 @@ git-annex (10.20241032) UNRELEASED; urgency=medium unnecessary duplicate password prompts. * git-remote-annex: Require git version 2.31 or newer, since old ones had a buggy git bundle command. + * git-remote-annex: Fix buggy behavior when annex.stalldetection is + configured. * p2phttp: Added --directory option which serves multiple git-annex repositories located inside a directory. diff --git a/CmdLine/Multicall.hs b/CmdLine/Multicall.hs new file mode 100644 index 0000000000..fa25836d37 --- /dev/null +++ b/CmdLine/Multicall.hs @@ -0,0 +1,28 @@ +{- git-annex multicall binary + - + - Copyright 2024 Joey Hess <id@joeyh.name> + - + - Licensed under the GNU AGPL version 3 or higher. + -} + +module CmdLine.Multicall where + +import qualified Data.Map as M + +-- Commands besides git-annex that can be run by the multicall binary. +-- +-- The reason git-annex itself is not included here is because the program +-- can be renamed to any other name than these and will behave the same as +-- git-annex. +data OtherMultiCallCommand + = GitAnnexShell + | GitRemoteAnnex + | GitRemoteTorAnnex + +otherMulticallCommands :: M.Map String OtherMultiCallCommand +otherMulticallCommands = M.fromList + [ ("git-annex-shell", GitAnnexShell) + , ("git-remote-annex", GitRemoteAnnex) + , ("git-remote-tor-annex", GitRemoteTorAnnex) + ] + diff --git a/doc/bugs/tests_started_to_fail_recently.mdwn b/doc/bugs/tests_started_to_fail_recently.mdwn index b4c663f906..ef6582f46d 100644 --- a/doc/bugs/tests_started_to_fail_recently.mdwn +++ b/doc/bugs/tests_started_to_fail_recently.mdwn @@ -75,4 +75,6 @@ although there in first failing was a bit different on OSX Use -p '/git-remote-annex/' to rerun this test only. 
``` -[[!meta title="git-remote-annex clone from special remote fails on OSX"]] +[[!meta title="git-remote-annex clone from special remote fails"]] + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/tests_started_to_fail_recently/comment_8_d5f926736bbad33eeffb0b592536d880._comment b/doc/bugs/tests_started_to_fail_recently/comment_8_d5f926736bbad33eeffb0b592536d880._comment index dcc782c20f..086d72a0b0 100644 --- a/doc/bugs/tests_started_to_fail_recently/comment_8_d5f926736bbad33eeffb0b592536d880._comment +++ b/doc/bugs/tests_started_to_fail_recently/comment_8_d5f926736bbad33eeffb0b592536d880._comment @@ -5,4 +5,6 @@ content=""" And the specific reason these test cases are failing is because they have annex.stalldetection set, which needs to run the transferrer. + +Fixed this. """]] diff --git a/git-annex.cabal b/git-annex.cabal index 83d6b02489..9da241ae77 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -633,9 +633,10 @@ Executable git-annex CmdLine.GitAnnexShell.Checks CmdLine.GitAnnexShell.Fields CmdLine.AnnexSetter - CmdLine.Option + CmdLine.Multicall CmdLine.GitRemoteAnnex CmdLine.GitRemoteTorAnnex + CmdLine.Option CmdLine.Seek CmdLine.Usage Command diff --git a/git-annex.hs b/git-annex.hs index 88117b4508..8f98d77265 100644 --- a/git-annex.hs +++ b/git-annex.hs @@ -10,7 +10,9 @@ import System.Environment (getArgs, getProgName) import System.FilePath import Network.Socket (withSocketsDo) +import qualified Data.Map as M +import CmdLine.Multicall import qualified CmdLine.GitAnnex import qualified CmdLine.GitAnnexShell import qualified CmdLine.GitRemoteAnnex @@ -34,11 +36,15 @@ main = sanitizeTopLevelExceptionMessages $ withSocketsDo $ do #endif run ps =<< getProgName where - run ps n = case takeFileName n of - "git-annex-shell" -> CmdLine.GitAnnexShell.run ps - "git-remote-annex" -> CmdLine.GitRemoteAnnex.run ps - "git-remote-tor-annex" -> CmdLine.GitRemoteTorAnnex.run ps - _ -> CmdLine.GitAnnex.run Test.optParser Test.runner Benchmark.mkGenerator ps + run 
ps n = case M.lookup (takeFileName n) otherMulticallCommands of + Just GitAnnexShell -> CmdLine.GitAnnexShell.run ps + Just GitRemoteAnnex -> CmdLine.GitRemoteAnnex.run ps + Just GitRemoteTorAnnex -> CmdLine.GitRemoteTorAnnex.run ps + Nothing -> CmdLine.GitAnnex.run + Test.optParser + Test.runner + Benchmark.mkGenerator + ps #ifdef mingw32_HOST_OS {- On Windows, if HOME is not set, probe it and set it.
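The following standalone sketch covers the name-based selection that this commit introduces in `git-annex.hs` and `programPath`. It assumes only the `containers` and `filepath` packages that ship with GHC. `dispatch` and `programNameFor` are hypothetical helper names used for illustration and are not functions in git-annex.

```haskell
import qualified Data.Map as M
import System.FilePath (takeFileName)

-- Mirrors the type added in CmdLine/Multicall.hs.
data OtherMultiCallCommand
  = GitAnnexShell
  | GitRemoteAnnex
  | GitRemoteTorAnnex
  deriving (Show, Eq)

otherMulticallCommands :: M.Map String OtherMultiCallCommand
otherMulticallCommands = M.fromList
  [ ("git-annex-shell", GitAnnexShell)
  , ("git-remote-annex", GitRemoteAnnex)
  , ("git-remote-tor-annex", GitRemoteTorAnnex)
  ]

-- Which multicall command an invocation name selects, as in the new
-- case expression in git-annex.hs. Nothing means "behave as git-annex",
-- so a binary renamed to anything else still works.
dispatch :: FilePath -> Maybe OtherMultiCallCommand
dispatch n = M.lookup (takeFileName n) otherMulticallCommands

-- The program name to use when re-running git-annex itself (for
-- example to start the transferrer): keep our own name unless it is
-- one of the other multicall programs, then fall back to "git-annex".
programNameFor :: FilePath -> String
programNameFor invoked
  | M.member name otherMulticallCommands = "git-annex"
  | otherwise = name
  where
    name = takeFileName invoked
```

With this rule, a hardlinked or copied `git-remote-annex` no longer runs `git-remote-annex transferrer`. A binary installed under an unrelated name, such as `my-annex`, still resolves to itself.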
comment
diff --git a/doc/bugs/tests_started_to_fail_recently/comment_8_d5f926736bbad33eeffb0b592536d880._comment b/doc/bugs/tests_started_to_fail_recently/comment_8_d5f926736bbad33eeffb0b592536d880._comment new file mode 100644 index 0000000000..dcc782c20f --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently/comment_8_d5f926736bbad33eeffb0b592536d880._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2024-11-25T15:31:59Z" + content=""" +And the specific reason these test cases are failing is because they have +annex.stalldetection set, which needs to run the transferrer. +"""]]
analysis
diff --git a/doc/bugs/tests_started_to_fail_recently/comment_7_fa068cc0530785bf4845a454c49692a7._comment b/doc/bugs/tests_started_to_fail_recently/comment_7_fa068cc0530785bf4845a454c49692a7._comment new file mode 100644 index 0000000000..6ec2904887 --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently/comment_7_fa068cc0530785bf4845a454c49692a7._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2024-11-25T15:14:20Z" + content=""" + git-annex: remote url not configured for transferrer + +Apparently git is running "git-remote-annex transferrer". + +This must be due to git-remote-annex be running "$0 transferrer" instead +of "git-annex transferrer"! + +In the usual case, when git-remote-annex is a symlink to git-annex, +getExecutablePath returns "git-annex". But, if git-remote-annex is a +hardlink or copy, that returns "git-remote-annex" instead. + +And in the linux standalone tarball and OSX app, it does not use +getExecutablePath, but getProgName so "git-remote-annex" also there. +"""]]
Added a comment: Overriding git folder
diff --git a/doc/tips/using_nested_git_repositories/comment_5_b1c2d9fa5167e01a7dc95c8fd91709a2._comment b/doc/tips/using_nested_git_repositories/comment_5_b1c2d9fa5167e01a7dc95c8fd91709a2._comment new file mode 100644 index 0000000000..1597dfa32c --- /dev/null +++ b/doc/tips/using_nested_git_repositories/comment_5_b1c2d9fa5167e01a7dc95c8fd91709a2._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="aaron" + avatar="http://cdn.libravatar.org/avatar/8a07e2f7af4bbf1bfcb48bbc53e00747" + subject="Overriding git folder" + date="2024-11-25T02:55:25Z" + content=""" +It seems to be that git has gotten smarter and now actively prevents you from adding a `.git` folder (I did this many years ago when before I learned about submodules); I'd like to do something like the following: +```bash +git init --separate-git-dir=.gitannex . +git --git-dir=.gitannex annex init +git clone some_repo # A repo I'm pulling from GitHub/wherever and don't want a submodule of as it's not my personal project +git --git-dir=.gitannex add some_repo +``` + +Essentially, I can override that `.git` folder name, but it still checks for other `.git` folders; is there a way to remove this check? +"""]]
Added a comment: re: How to get a list of all NOT unused files
diff --git a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files/comment_1_6dfcbe06c7ea184e9a8ff3137584b07f._comment b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files/comment_1_6dfcbe06c7ea184e9a8ff3137584b07f._comment new file mode 100644 index 0000000000..14cbe7a44b --- /dev/null +++ b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files/comment_1_6dfcbe06c7ea184e9a8ff3137584b07f._comment @@ -0,0 +1,50 @@ +[[!comment format=mdwn + username="kyle" + avatar="http://cdn.libravatar.org/avatar/7d6e85cde1422ad60607c87fa87c63f3" + subject="re: How to get a list of all NOT unused files" + date="2024-11-25T02:23:25Z" + content=""" +> How to get a list of all NOT unused files + +There may be a simpler way, but one idea: + + * list all unused keys + * list all present keys + * filter out the unused keys from the present keys + +So something like this: + +``` +$ git annex findkeys | sort >present-keys +$ git annex unused --json | jq -r '.\"unused-list\" | to_entries | map(.value) | .[]' | sort >unused-keys +$ comm -2 -3 present-keys unused-keys +``` + +> Those that should be saved are tagged + +If you wanted to focus just on keys referenced from tags, you could +generate a list of those keys with + +``` +$ git rev-list --objects --no-object-names --no-walk --tags | \ + git annex lookupkey --ref --batch | grep -v '^$' +``` + +--- + +After generating a list of keys with either of those approaches, you +could copy them to your new repo with + +``` +git annex copy --to=NEW-REMOTE --batch-keys ... +``` + +For example, the full pipeline for the second approach could be + +``` +$ git rev-list --objects --no-object-names --no-walk --tags | \ + git annex lookupkey --ref --batch | grep -v '^$' | \ + git annex copy --to=NEW-REMOTE --batch-keys +``` + +"""]]
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn index a37dbe2593..338ad68162 100644 --- a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn +++ b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn @@ -1,7 +1,7 @@ Has anyone done something where devices may not have a direct connection to other git-annex devices, but where they can push out a request for a file? Basically, something that allows them to post file requests that other devices then pickup and relay to a shared endpoint that can be populated with the requested file? I'm currently thinking of a situation where a cloud node might be 'expensive' and purges itself once some other devices contain the file; but at a later point in time a mobile device (without the file or a direct connection to devices containing the file) may connect to the cloud node and request the file. ## Possible behavior -The requestor could populate a 'git-annex-requests' file at the root of the repositor with contents similar to the following: +The requestor could populate a 'git-annex-requests' file at the root of the repository with contents similar to the following: ```text file-shasum, requester-id, (optional endpoint1), (optional endpoint2),... ```
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn index 7b043ca033..a37dbe2593 100644 --- a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn +++ b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn @@ -1,4 +1,4 @@ -Has anyone done something where devices may not have a direct connection to other git-annex devices, but where they can push out a request for a file? Basically, something that allows them to post file requests that other devices then pickup and relay to a shared endpoint that can be populated with the requested file? I'm currently thinking of a situation where a cloud node might be 'expensive' and purges itself once some other devices contain the file; but at a later point in time a mobile device (without the file) may only be able to connect to the cloud node and request the file. +Has anyone done something where devices may not have a direct connection to other git-annex devices, but where they can push out a request for a file? Basically, something that allows them to post file requests that other devices then pickup and relay to a shared endpoint that can be populated with the requested file? I'm currently thinking of a situation where a cloud node might be 'expensive' and purges itself once some other devices contain the file; but at a later point in time a mobile device (without the file or a direct connection to devices containing the file) may connect to the cloud node and request the file. ## Possible behavior The requestor could populate a 'git-annex-requests' file at the root of the repositor with contents similar to the following:
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn index 226fc53c07..7b043ca033 100644 --- a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn +++ b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn @@ -1,4 +1,4 @@ -Has anyone done something where devices may not have a direct connection to other git-annex devices that allows them to post file requests that other devices then pickup and relay so that a shared endpoint can be populated with the requested file? I'm currently thinking of something where a cloud node might be 'expensive' and purges itself once some other devices contain the file; but at a later point in time a mobile device may only be able to connect to the cloud node, but wants to request the file. +Has anyone done something where devices may not have a direct connection to other git-annex devices, but where they can push out a request for a file? Basically, something that allows them to post file requests that other devices then pickup and relay to a shared endpoint that can be populated with the requested file? I'm currently thinking of a situation where a cloud node might be 'expensive' and purges itself once some other devices contain the file; but at a later point in time a mobile device (without the file) may only be able to connect to the cloud node and request the file. ## Possible behavior The requestor could populate a 'git-annex-requests' file at the root of the repositor with contents similar to the following:
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn index f136ed2286..226fc53c07 100644 --- a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn +++ b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn @@ -1,14 +1,18 @@ -Has anyone done something where devices may not have a direct connection to other git-annex devices that allows them to post file requests that other devices then pickup and relay so that a shared endpoint can be populated with the requested file? I'm currently thinking of something where a cloud node might be 'expensive' and purges itself once some other devices contain the file; but at a later point in time a mobile device may only be able to connect to the cloud node, but wants to request the file. It could populate a 'git-annex-requests' file with contents similar to the following: +Has anyone done something where devices may not have a direct connection to other git-annex devices that allows them to post file requests that other devices then pickup and relay so that a shared endpoint can be populated with the requested file? I'm currently thinking of something where a cloud node might be 'expensive' and purges itself once some other devices contain the file; but at a later point in time a mobile device may only be able to connect to the cloud node, but wants to request the file. + +## Possible behavior +The requestor could populate a 'git-annex-requests' file at the root of the repositor with contents similar to the following: ```text file-shasum, requester-id, (optional endpoint1), (optional endpoint2),... ``` + This 'git-annex-requests' file would require a minimum of the file-shasum and requester id, with the endpoints helping other devices (containing the desired file) to know where to best push the file (other than guessing/all available remotes). 
So, for the attached diagram, where the mobile laptop attached to the cellphone wants a file from the remote-office nas/server, a flow would look like: laptop updates request file -> syncs file to phone -> syncs file to homelab server -> allows home office computer to sync file -> syncs file to home-office nas/server; the home-office computer gets the file from the nas and pushes it to the homelab server -> the mobile phone downloads the file from the homelab server -> the mobile laptop gets the file from the phone and removes the request from the requests file (which then triggers the reverse propagation of the acknowledgement/removal of the request and allows the devices to proceed with any garbage cleanup). Additionally, a 'git-annex-routing' file could optionally be added that includes netlist details describing routing chains where certain 'static' devices may be able to easily push to each other so that other git-annex clients can make more informed decisions on where to push a file. [[!img git_annex_request.png align="right" size="" alt="Network diagram"]] -[[!img git_annex_request_routing.png align="right" size="" alt="Network diagram showing file routing"]] +[[!img git_annex_request_routing.png align="right" size="" alt="Network diagram showing file routing"]] -Alternate links for images: -Example diagram with mixed network connections: https://imgur.com/gallery/network-diagram-with-mixed-inbound-outbound-connections-3q76OzI -Example diagram with network request across mixed network connections: https://imgur.com/gallery/network-diagram-file-request-across-mixed-inbound-outbound-connections-hev94Kj +## Alternate links for images +Example diagram with mixed network connections: [[https://imgur.com/gallery/network-diagram-with-mixed-inbound-outbound-connections-3q76OzI]] +Example diagram with network request across mixed network connections: [[https://imgur.com/gallery/network-diagram-file-request-across-mixed-inbound-outbound-connections-hev94Kj]]
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn index 322efcca52..f136ed2286 100644 --- a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn +++ b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn @@ -6,7 +6,7 @@ This 'git-annex-requests' file would require a minimum of the file-shasum and re Additionally, a 'git-annex-routing' file could optionally be added that includes netlist details describing routing chains where certain 'static' devices may be able to easily push to each other so that other git-annex clients can make more informed decisions on where to push a file. -[[!img git_annex_request.png align="right" size="" alt="Network diagram"]] +[[!img git_annex_request.png align="right" size="" alt="Network diagram"]] [[!img git_annex_request_routing.png align="right" size="" alt="Network diagram showing file routing"]] Alternate links for images:
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn index aaed669834..322efcca52 100644 --- a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn +++ b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn @@ -10,5 +10,5 @@ Additionally, a 'git-annex-routing' file could optionally be added that includes [[!img git_annex_request_routing.png align="right" size="" alt="Network diagram showing file routing"]] Alternate links for images: -Example diagram with mixed network connections: https://imgur.com/gallery/network-diagram-with-mixed-inbound-outbound-connections-3q76OzI +Example diagram with mixed network connections: https://imgur.com/gallery/network-diagram-with-mixed-inbound-outbound-connections-3q76OzI Example diagram with network request across mixed network connections: https://imgur.com/gallery/network-diagram-file-request-across-mixed-inbound-outbound-connections-hev94Kj
diff --git a/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn new file mode 100644 index 0000000000..aaed669834 --- /dev/null +++ b/doc/forum/Requesting_of_files_across_disconnected_devices.mdwn @@ -0,0 +1,14 @@ +Has anyone done something where devices may not have a direct connection to other git-annex devices that allows them to post file requests that other devices then pickup and relay so that a shared endpoint can be populated with the requested file? I'm currently thinking of something where a cloud node might be 'expensive' and purges itself once some other devices contain the file; but at a later point in time a mobile device may only be able to connect to the cloud node, but wants to request the file. It could populate a 'git-annex-requests' file with contents similar to the following: +```text +file-shasum, requester-id, (optional endpoint1), (optional endpoint2),... +``` +This 'git-annex-requests' file would require a minimum of the file-shasum and requester id, with the endpoints helping other devices (containing the desired file) to know where to best push the file (other than guessing/all available remotes). So, for the attached diagram, where the mobile laptop attached to the cellphone wants a file from the remote-office nas/server, a flow would look like: laptop updates request file -> syncs file to phone -> syncs file to homelab server -> allows home office computer to sync file -> syncs file to home-office nas/server; the home-office computer gets the file from the nas and pushes it to the homelab server -> the mobile phone downloads the file from the homelab server -> the mobile laptop gets the file from the phone and removes the request from the requests file (which then triggers the reverse propagation of the acknowledgement/removal of the request and allows the devices to proceed with any garbage cleanup). 
+ +Additionally, a 'git-annex-routing' file could optionally be added that includes netlist details describing routing chains where certain 'static' devices may be able to easily push to each other so that other git-annex clients can make more informed decisions on where to push a file. + +[[!img git_annex_request.png align="right" size="" alt="Network diagram"]] +[[!img git_annex_request_routing.png align="right" size="" alt="Network diagram showing file routing"]] + +Alternate links for images: +Example diagram with mixed network connections: https://imgur.com/gallery/network-diagram-with-mixed-inbound-outbound-connections-3q76OzI +Example diagram with network request across mixed network connections: https://imgur.com/gallery/network-diagram-file-request-across-mixed-inbound-outbound-connections-hev94Kj
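The request-file idea above can be made concrete with a small parser. This is a minimal Haskell sketch that assumes the comma-separated `file-shasum, requester-id, endpoint...` layout described in the post. The `Request` type and the `parseRequest`/`removeRequest` helpers are hypothetical and are not part of git-annex.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | One entry of the hypothetical 'git-annex-requests' file proposed above.
data Request = Request
  { requestedFile :: String   -- the "file-shasum" field
  , requesterId :: String
  , endpoints :: [String]     -- optional preferred endpoints, in order
  } deriving (Show, Eq)

-- | Split a line on commas (no quoting or escaping is supported).
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, ',' : rest) -> field : splitCommas rest
  (field, _) -> [field]

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Parse "file-shasum, requester-id, endpoint1, endpoint2, ...".
-- The first two fields are mandatory; the endpoints are optional.
parseRequest :: String -> Maybe Request
parseRequest line = case map trim (splitCommas line) of
  (sha : who : eps)
    | not (null sha) && not (null who) ->
        Just (Request sha who (filter (not . null) eps))
  _ -> Nothing

-- | Once the requester has the file, its entry is dropped; the removal then
-- propagates back to the other devices as the post describes.
removeRequest :: String -> String -> [Request] -> [Request]
removeRequest sha who =
  filter (\r -> not (requestedFile r == sha && requesterId r == who))
```

Keeping the endpoints optional lets a relaying device fall back to trying all of its remotes when a request names none.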
show remote name when failing
to help debug strange git behavior on some daily builds
diff --git a/CmdLine/GitRemoteAnnex.hs b/CmdLine/GitRemoteAnnex.hs index 0b5505c52c..ef1801dd94 100644 --- a/CmdLine/GitRemoteAnnex.hs +++ b/CmdLine/GitRemoteAnnex.hs @@ -85,7 +85,7 @@ run (remotename:url:[]) = do case parseSpecialRemoteNameUrl remotename u of Right src -> checkallowed src >>= run' u Left e -> giveup e -run (_remotename:[]) = giveup "remote url not configured" +run (remotename:[]) = giveup $ "remote url not configured for " ++ remotename run _ = giveup "expected remote name and url parameters" run' :: String -> SpecialRemoteConfig -> Annex () diff --git a/doc/bugs/tests_started_to_fail_recently/comment_6_a49f83b0444f119fa04386c9fb3c9566._comment b/doc/bugs/tests_started_to_fail_recently/comment_6_a49f83b0444f119fa04386c9fb3c9566._comment new file mode 100644 index 0000000000..30d5b99428 --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently/comment_6_a49f83b0444f119fa04386c9fb3c9566._comment @@ -0,0 +1,33 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2024-11-21T19:36:58Z" + content=""" +And here's why it's failing still on OSX and that 1 ubuntu "custom-config1" run: + + git clone from special remote (after git push with output: Full remote url: annex::89ddefa4-a04c-11ef-87b5-e880882a4f98?encryption=none&type=directory + git-annex: remote url not configured + Push failed (Failed to upload GITBUNDLE-s3832--89ddefa4-a04c-11ef-87b5-e880882a4f98-b73a51289d659d73054ee531b45825da3061213d32d47aeb998f4abeb591a88d) + warning: helper reported unexpected status of push + warning: helper reported unexpected status of push + Everything up-to-date + +Fascinating. It seems that git-remote-annex has been run twice. The first time +seemed to do something successfully, since it reported the "Full remote url". +Probably that first run is git using it to see what refs are on the remote. + +The second time, git ran git-remote-annex with only 1 argument, rather than the +expected 2. Why would git do that? 
And only in these few situations? + +According to gitremote-helpers: + + Additionally, when a configured remote has remote.<name>.vcs set to <transport>, Git explicitly invokes git remote-<transport> + with <name> as the first argument. If set, the second argument is remote.<name>.url; otherwise, the second argument is omitted. + +But that does not apply. The docs don't seem to give any other reason why +the second argument would be omitted. Although the docs do say it's optional. + +I've improved git-remote-annex output in this situation, so it will show +wha the first parameter is. That might help understand out what git is trying to +do here. +"""]]
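The argument handling changed above can be modelled as a pure function. This is a simplified sketch, not git-annex's actual code: `HelperArgs` and `parseHelperArgs` are illustrative names. It mirrors the three cases in `run`: a remote name plus url, a remote name alone (which now names the remote in the error), and anything else.

```haskell
-- | Arguments git passes to a remote helper: the remote name, then the url.
data HelperArgs = HelperArgs
  { remoteName :: String
  , remoteUrl :: String
  } deriving (Show, Eq)

parseHelperArgs :: [String] -> Either String HelperArgs
parseHelperArgs [name, url] = Right (HelperArgs name url)
-- git may omit the second argument; naming the remote aids debugging
parseHelperArgs [name] = Left ("remote url not configured for " ++ name)
parseHelperArgs _ = Left "expected remote name and url parameters"
```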
comment
diff --git a/doc/todo/p2phttp_serve_multiple_repositories/comment_2_4dbd2fc7fbd0234ac84dced8dd3c5e1b._comment b/doc/todo/p2phttp_serve_multiple_repositories/comment_2_4dbd2fc7fbd0234ac84dced8dd3c5e1b._comment new file mode 100644 index 0000000000..d2f13ce1b5 --- /dev/null +++ b/doc/todo/p2phttp_serve_multiple_repositories/comment_2_4dbd2fc7fbd0234ac84dced8dd3c5e1b._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2024-11-21T19:16:20Z" + content=""" +This is now implemented, use p2phttp --directory to serve all repositories +in that directory. + +There is one known problem, which I'm leaving this todo open for. When a +repository is moved out of the directory, the server should stop serving +it. And it does, eg both git-annex get and git-annex drop will fail, since +the server is not able to access the directory any longer. But, a second +git-annex drop actually hangs at the "locking origin" stage. +(When run with -J2.. with more jobs it takes more than 2) + +It seems that the server is leaking annex workers in this case. Probably +it fails to recover from a crash. + +Luckily, it only affects serving the uuid of the removed repository, other +repositories in the directory will continue to be served without problems +when this happens. +"""]]
p2phttp --directory implementation
Untested, but it compiles, so.
Known problems:
* --jobs is not available to startIO
* Does not notice when new repositories are added to a directory.
* Does not notice when repositories are removed from a directory.
diff --git a/Command/P2PHttp.hs b/Command/P2PHttp.hs index 0c55f09a3b..5246b09302 100644 --- a/Command/P2PHttp.hs +++ b/Command/P2PHttp.hs @@ -16,6 +16,10 @@ import P2P.Http.Server import P2P.Http.Url import qualified P2P.Protocol as P2P import Utility.Env +import Annex.UUID +import qualified Git +import qualified Git.Construct +import qualified Annex import Servant import qualified Network.Wai.Handler.Warp as Warp @@ -25,10 +29,12 @@ import qualified Data.Map as M import Data.String cmd :: Command -cmd = noMessages $ withAnnexOptions [jobsOption] $ - command "p2phttp" SectionPlumbing - "communicate in P2P protocol over http" - paramNothing (seek <$$> optParser) +cmd = withAnnexOptions [jobsOption] $ + noMessages $ dontCheck repoExists $ + noRepo (startIO <$$> optParser) $ + command "p2phttp" SectionPlumbing + "communicate in P2P protocol over http" + paramNothing (startAnnex <$$> optParser) data Options = Options { portOption :: Maybe PortNumber @@ -44,6 +50,7 @@ data Options = Options , wideOpenOption :: Bool , proxyConnectionsOption :: Maybe Integer , clusterJobsOption :: Maybe Int + , directoryOption :: [FilePath] } optParser :: CmdParamsDesc -> Parser Options @@ -100,22 +107,41 @@ optParser _ = Options ( long "clusterjobs" <> metavar paramNumber <> help "number of concurrent node accesses per connection" )) + <*> many (strOption + ( long "directory" <> metavar paramPath + <> help "serve repositories in subdirectories of a directory" + )) -seek :: Options -> CommandSeek -seek o = getAnnexWorkerPool $ \workerpool -> - withP2PConnections workerpool - (fromMaybe 1 $ proxyConnectionsOption o) - (fromMaybe 1 $ clusterJobsOption o) - (go workerpool) - where - go workerpool servinguuids acquireconn = liftIO $ do +startAnnex :: Options -> Annex () +startAnnex o + | null (directoryOption o) = ifM ((/=) NoUUID <$> getUUID) + ( do + authenv <- liftIO getAuthEnv + st <- mkServerState o authenv + liftIO $ runServer o st + -- Run in a git repository that is not a git-annex 
repository. + , liftIO $ startIO o + ) + | otherwise = liftIO $ startIO o + +-- TODO --jobs option only available to startAnnex, not here, need +-- to parse it into Options for this command. +startIO :: Options -> IO () +startIO o + | null (directoryOption o) = + giveup "Use the --directory option to specify which git-annex repositories to serve." + | otherwise = do authenv <- getAuthEnv - st <- mkPerRepoServerState acquireconn workerpool $ - mkGetServerMode authenv o - let mst = P2PHttpServerState - { servedRepos = M.fromList $ - zip servinguuids (repeat st) - } + repos <- findRepos o + sts <- forM repos $ \r -> do + strd <- Annex.new r + Annex.eval strd $ mkServerState o authenv + runServer o (mconcat sts) + +runServer :: Options -> P2PHttpServerState -> IO () +runServer o mst = go `finally` serverShutdownCleanup mst + where + go = do let settings = Warp.setPort port $ Warp.setHost host $ Warp.defaultSettings case (certFileOption o, privateKeyFileOption o) of @@ -125,7 +151,6 @@ seek o = getAnnexWorkerPool $ \workerpool -> certfile (chainFileOption o) privatekeyfile Warp.runTLS tlssettings settings (p2pHttpApp mst) _ -> giveup "You must use both --certfile and --privatekeyfile options to enable HTTPS." 
- port = maybe (fromIntegral defaultP2PHttpProtocolPort) fromIntegral @@ -135,6 +160,14 @@ seek o = getAnnexWorkerPool $ \workerpool -> fromString (bindOption o) +mkServerState :: Options -> M.Map Auth P2P.ServerMode -> Annex P2PHttpServerState +mkServerState o authenv = + getAnnexWorkerPool $ + mkP2PHttpServerState + (mkGetServerMode authenv o) + (fromMaybe 1 $ proxyConnectionsOption o) + (fromMaybe 1 $ clusterJobsOption o) + mkGetServerMode :: M.Map Auth P2P.ServerMode -> Options -> GetServerMode mkGetServerMode _ o _ Nothing | wideOpenOption o = ServerMode @@ -201,3 +234,11 @@ getAuthEnv = do case M.lookup user permmap of Nothing -> (auth, P2P.ServeReadWrite) Just perms -> (auth, perms) + +findRepos :: Options -> IO [Git.Repo] +findRepos o = do + files <- map toRawFilePath . concat + <$> mapM dirContents (directoryOption o) + map Git.Construct.newFrom . catMaybes + <$> mapM Git.Construct.checkForRepo files + diff --git a/P2P/Http/State.hs b/P2P/Http/State.hs index e3fabdd990..3a15b3e902 100644 --- a/P2P/Http/State.hs +++ b/P2P/Http/State.hs @@ -42,11 +42,28 @@ import qualified Data.Map.Strict as M import qualified Data.Set as S import Control.Concurrent.Async import Data.Time.Clock.POSIX +import qualified Data.Semigroup as Sem +import Prelude data P2PHttpServerState = P2PHttpServerState { servedRepos :: M.Map UUID PerRepoServerState + , serverShutdownCleanup :: IO () } +instance Monoid P2PHttpServerState where + mempty = P2PHttpServerState + { servedRepos = mempty + , serverShutdownCleanup = noop + } + +instance Sem.Semigroup P2PHttpServerState where + a <> b = P2PHttpServerState + { servedRepos = servedRepos a <> servedRepos b + , serverShutdownCleanup = do + serverShutdownCleanup a + serverShutdownCleanup b + } + data PerRepoServerState = PerRepoServerState { acquireP2PConnection :: AcquireP2PConnection , annexWorkerPool :: AnnexWorkerPool @@ -213,13 +230,13 @@ type AcquireP2PConnection = ConnectionParams -> IO (Either ConnectionProblem P2PConnectionPair) 
-withP2PConnections - :: AnnexWorkerPool +mkP2PHttpServerState + :: GetServerMode -> ProxyConnectionPoolSize -> ClusterConcurrency - -> ([UUID] -> AcquireP2PConnection -> Annex a) - -> Annex a -withP2PConnections workerpool proxyconnectionpoolsize clusterconcurrency a = do + -> AnnexWorkerPool + -> Annex P2PHttpServerState +mkP2PHttpServerState getservermode proxyconnectionpoolsize clusterconcurrency workerpool = do enableInteractiveBranchAccess myuuid <- getUUID myproxies <- M.lookup myuuid <$> getProxies @@ -233,7 +250,11 @@ withP2PConnections workerpool proxyconnectionpoolsize clusterconcurrency a = do liftIO $ atomically $ putTMVar endv () liftIO $ wait asyncservicer let servinguuids = myuuid : map proxyRemoteUUID (maybe [] S.toList myproxies) - a servinguuids (acquireconn reqv) `finally` endit + st <- liftIO $ mkPerRepoServerState (acquireconn reqv) workerpool getservermode + return $ P2PHttpServerState + { servedRepos = M.fromList $ zip servinguuids (repeat st) + , serverShutdownCleanup = endit + } where acquireconn reqv connparams = do respvar <- newEmptyTMVarIO diff --git a/doc/git-annex-p2phttp.mdwn b/doc/git-annex-p2phttp.mdwn index 821503e533..4dd7869c92 100644 --- a/doc/git-annex-p2phttp.mdwn +++ b/doc/git-annex-p2phttp.mdwn (Diff truncated)
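Combining one server state per repository works because the state forms a monoid: the maps of served UUIDs are unioned and the shutdown actions are sequenced. Below is a simplified Haskell sketch of that idea. It uses a hypothetical `ServerState` with `String` UUIDs in place of the real `P2PHttpServerState`, and `mkRepoState` is an illustrative constructor whose cleanup only logs the UUID.

```haskell
import qualified Data.Map.Strict as M
import Data.IORef

-- | Simplified stand-in for P2PHttpServerState.
data ServerState = ServerState
  { servedRepos :: M.Map String String
  , shutdownCleanup :: IO ()
  }

instance Semigroup ServerState where
  a <> b = ServerState
    { servedRepos = servedRepos a <> servedRepos b   -- left-biased union
    , shutdownCleanup = shutdownCleanup a >> shutdownCleanup b
    }

instance Monoid ServerState where
  mempty = ServerState { servedRepos = M.empty, shutdownCleanup = return () }

-- | Hypothetical per-repository state; its cleanup records the uuid so the
-- shutdown order can be observed.
mkRepoState :: IORef [String] -> String -> ServerState
mkRepoState logRef uuid = ServerState
  { servedRepos = M.singleton uuid ("state for " ++ uuid)
  , shutdownCleanup = modifyIORef logRef (uuid :)
  }
```

With this instance, `mconcat` over the per-repository states yields one state to serve, and running its `shutdownCleanup` once (for example in a `finally`) runs every repository's cleanup in order. `Map`'s `<>` is left-biased, so if two repositories had the same UUID, the first would win.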
in test suite display error from git push that fails to exit nonzero
diff --git a/Test.hs b/Test.hs index 9796608365..7c08d3e948 100644 --- a/Test.hs +++ b/Test.hs @@ -446,10 +446,12 @@ test_git_remote_annex exporttree git_annex "get" [] "get failed" () <- populate git "config" ["remote.foo.url", "annex::"] "git config" - git "push" ["foo", "master"] "git push" - git "push" ["foo", "git-annex"] "git push" + -- git push does not always propagate nonzero exit + -- status from git-remote-annex, so remember the + -- transcript and display it if clone fails + pushtranscript <- testProcess' "git" ["push", "foo", "master", "git-annex"] Nothing (== True) (const True) "git push" git "clone" ["annex::"++diruuid++"?"++intercalate "&" cfg', "clonedir"] - "git clone from special remote" + ("git clone from special remote (after git push with output: " ++ pushtranscript ++ ")") inpath "clonedir" $ git_annex "get" [annexedfile] "get from origin special remote" diruuid="89ddefa4-a04c-11ef-87b5-e880882a4f98" diff --git a/Test/Framework.hs b/Test/Framework.hs index c249e93529..ab6645308f 100644 --- a/Test/Framework.hs +++ b/Test/Framework.hs @@ -73,7 +73,11 @@ import qualified Command.Uninit -- In debug mode, the output is allowed to pass through. -- So the output does not get checked in debug mode. testProcess :: String -> [String] -> Maybe [(String, String)] -> (Bool -> Bool) -> (String -> Bool) -> String -> Assertion -testProcess command params environ expectedret expectedtranscript faildesc = do +testProcess command params environ expectedret expectedtranscript faildesc = + void $ testProcess' command params environ expectedret expectedtranscript faildesc + +testProcess' :: String -> [String] -> Maybe [(String, String)] -> (Bool -> Bool) -> (String -> Bool) -> String -> IO String +testProcess' command params environ expectedret expectedtranscript faildesc = do let p = (proc command params) { env = environ } debug <- testDebug . 
testOptions <$> getTestMode if debug @@ -81,10 +85,12 @@ testProcess command params environ expectedret expectedtranscript faildesc = do ret <- withCreateProcess p $ \_ _ _ pid -> waitForProcess pid (expectedret (ret == ExitSuccess)) @? (faildesc ++ " failed with unexpected exit code") + return "" else do (transcript, ret) <- Utility.Process.Transcript.processTranscript' p Nothing (expectedret ret) @? (faildesc ++ " failed with unexpected exit code (transcript follows)\n" ++ transcript) (expectedtranscript transcript) @? (faildesc ++ " failed with unexpected output (transcript follows)\n" ++ transcript) + return transcript -- Run git. (Do not use to run git-annex as the one being tested -- may not be in path.) diff --git a/doc/bugs/tests_started_to_fail_recently/comment_5_edf45de127e639174893775d41e5a6c5._comment b/doc/bugs/tests_started_to_fail_recently/comment_5_edf45de127e639174893775d41e5a6c5._comment new file mode 100644 index 0000000000..b7eb3e6fda --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently/comment_5_edf45de127e639174893775d41e5a6c5._comment @@ -0,0 +1,32 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2024-11-20T17:42:57Z" + content=""" +My arm64-ancient build failed today in the same way as the OSX build is failing, +so I should be able to debug it there. + + builder@sparrow:~/x/a$ git push d git-annex + Full remote url: annex::f88d4965-fc4f-4dd0-aac2-eaf19c9bcfa5?encryption=none&type=directory + fatal: Refusing to create empty bundle. + Push failed (user error (git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","bundle","create","--quiet","/tmp/GITBUNDLE1049637-0","--stdin"] exited 128)) + warning: helper reported unexpected status of push + Everything up-to-date + builder@sparrow:~/x/a$ echo $? + 0 + +Huh ok, so git-remote-annex is failing to push, which is why clone +later fails. And for whatever reason git doesn't propigate the error, which +is why this is not visible in the transcript. 
+ +That build uses git 2.30.2. That git bundle --stdin was broken and +didn't read refs from stdin at all. Also it had other bugs. I think it's +best not to try to support git-remote-annex with that version of git at +all, given those bugs. + +That probably won't help with the OSX failure, which is with a very new git +version. So I also made the test +suite capture the git push output even when it exits successfully, so it +can display it when the git pull fails. That should show what the problem +is there. +"""]]
comment
diff --git a/doc/todo/p2phttp_serve_multiple_repositories/comment_1_28942d454244ea6df6aabed03b43d8a3._comment b/doc/todo/p2phttp_serve_multiple_repositories/comment_1_28942d454244ea6df6aabed03b43d8a3._comment new file mode 100644 index 0000000000..d67d34a1dc --- /dev/null +++ b/doc/todo/p2phttp_serve_multiple_repositories/comment_1_28942d454244ea6df6aabed03b43d8a3._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2024-11-20T17:41:12Z" + content=""" +I have some early work toward implementing this in the `p2phttp-multi` +branch. +"""]]
document p2phttp --directory
The option is not implemented yet.
diff --git a/CHANGELOG b/CHANGELOG index 249ed77549..76d77db4a3 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -21,6 +21,8 @@ git-annex (10.20241032) UNRELEASED; urgency=medium uses the same hostname as remote.name.url, which is itself a http(s) url, they are assumed to share a username and password. This avoids unnecessary duplicate password prompts. + * p2phttp: Added --directory option which serves all git-annex + repositories located inside a directory. -- Joey Hess <id@joeyh.name> Mon, 11 Nov 2024 12:26:00 -0400 diff --git a/doc/git-annex-p2phttp.mdwn b/doc/git-annex-p2phttp.mdwn index 3d10f62198..821503e533 100644 --- a/doc/git-annex-p2phttp.mdwn +++ b/doc/git-annex-p2phttp.mdwn @@ -12,8 +12,12 @@ This is a HTTP server for the git-annex API. It is the git-annex equivilant of git-http-backend(1), for serving a repository over HTTP with write access for authenticated users. -This does not serve the git repository over HTTP, only the git-annex -API. +This does not serve a git repository over HTTP, only the git-annex +API. + +By default, this serves the git-annex API for the git-annex repository +in the current working directory. It can also serve more than one +repository, see the `--directory` parameter. Typically a remote will have `remote.name.url` set to a http url as usual, and `remote.name.annexUrl` set to an annex+http url such as @@ -35,10 +39,25 @@ convenient way to download the content of any key, by using the path # OPTIONS +* `--directory=path` + + Serve each git-annex repository found in a directory. This does not + recurse into subdirectories. + + This option can be provided more than once to serve serveral directories + full of git-annex repositories. + + New git-annex repositories can be added to the directory, and will be + noticed and served immediately. There is no need to restart the server. + + When a git-annex repository is removed from the directory, the server + will stop serving it as well. 
This may not be immediate, as some files + in the deleted repository may still be open. + * `--jobs=N` `-JN` This or annex.jobs must be set to configure the number of worker - threads that serve connections to the webserver. + threads, per repository served, that serve connections to the webserver. Since the webserver itself also uses one of these threads, this needs to be set to 2 or more. @@ -47,15 +66,15 @@ convenient way to download the content of any key, by using the path * `--proxyconnections=N` - When this command is run in a repository that is configured to act as a - proxy for some of its remotes, this is the maximum number of idle - connections to keep open to proxied remotes. + When serving a repository that is configured to act as a proxy for some + of its remotes, this is the maximum number of idle connections to keep + open to proxied remotes. The default is 1. * `--clusterjobs=N` - When this command is run in a repository that is a gateway for a cluster, + When serving a repository that is a gateway for a cluster, this is the number of concurrent jobs to use to access nodes of the cluster, per connection to the webserver.
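The `--directory` documentation says repositories are found inside a directory without recursing into subdirectories. Here is a minimal sketch of that discovery step. It assumes a deliberately simple repository test (a `.git` entry, or a bare layout with `HEAD` and `objects`); the real code uses `Git.Construct.checkForRepo`, which is more thorough. `findReposIn` and `looksLikeRepo` are hypothetical names.

```haskell
import Control.Monad (filterM)
import Data.List (sort)
import System.Directory
import System.FilePath ((</>))

-- | Repositories directly inside each given directory (no recursion).
findReposIn :: [FilePath] -> IO [FilePath]
findReposIn dirs = sort . concat <$> mapM scan dirs
  where
    scan dir = do
      entries <- listDirectory dir
      filterM looksLikeRepo (map (dir </>) entries)

-- | Simplified check: a non-bare repo has ".git"; a bare one has HEAD + objects.
looksLikeRepo :: FilePath -> IO Bool
looksLikeRepo p = do
  isDir <- doesDirectoryExist p
  if not isDir
    then return False
    else do
      nonBare <- doesPathExist (p </> ".git")
      hasHead <- doesFileExist (p </> "HEAD")
      hasObjects <- doesDirectoryExist (p </> "objects")
      return (nonBare || (hasHead && hasObjects))
```

The results are sorted only to make the output deterministic, since `listDirectory` returns entries in no particular order.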
Added a comment
diff --git a/doc/bugs/tests_started_to_fail_recently/comment_4_989bcca4ecd2a00c509585034d707547._comment b/doc/bugs/tests_started_to_fail_recently/comment_4_989bcca4ecd2a00c509585034d707547._comment new file mode 100644 index 0000000000..459dbf6bb8 --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently/comment_4_989bcca4ecd2a00c509585034d707547._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 4" + date="2024-11-20T00:22:44Z" + content=""" +- re mac: try `joey@datalads-imac2` from `smaug` +- a few times we used https://github.com/mxschmitt/action-tmate to interactively debug on github CI... want us to bolt it on? +"""]]
reuse http url password for p2phttp url when on same host
When remote.name.annexUrl is an annex+http(s) url, that uses the same
hostname as remote.name.url, which is itself a http(s) url, they are
assumed to share a username and password.
This avoids unnecessary duplicate password prompts.
diff --git a/CHANGELOG b/CHANGELOG index 7e523186c6..249ed77549 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -17,6 +17,10 @@ git-annex (10.20241032) UNRELEASED; urgency=medium versioned S3 bucket. * git-remote-annex: Fix cloning from a special remote on a crippled filesystem. + * When remote.name.annexUrl is an annex+http(s) url, that + uses the same hostname as remote.name.url, which is itself a http(s) + url, they are assumed to share a username and password. This avoids + unnecessary duplicate password prompts. -- Joey Hess <id@joeyh.name> Mon, 11 Nov 2024 12:26:00 -0400 diff --git a/P2P/Http/Client.hs b/P2P/Http/Client.hs index d047eca7a0..c708908a19 100644 --- a/P2P/Http/Client.hs +++ b/P2P/Http/Client.hs @@ -37,6 +37,7 @@ import Annex.Concurrent import Utility.Url (BasicAuth(..)) import Utility.HumanTime import Utility.STM +import qualified Git import qualified Git.Credential as Git import Servant hiding (BasicAuthData(..)) @@ -83,8 +84,19 @@ p2pHttpClientVersions -> (String -> Annex a) -> ClientAction a -> Annex (Maybe a) +p2pHttpClientVersions allowedversion rmt fallback clientaction = do + rmtrepo <- getRepo rmt + p2pHttpClientVersions' allowedversion rmt rmtrepo fallback clientaction + +p2pHttpClientVersions' + :: (ProtocolVersion -> Bool) + -> Remote + -> Git.Repo + -> (String -> Annex a) + -> ClientAction a + -> Annex (Maybe a) #ifdef WITH_SERVANT -p2pHttpClientVersions allowedversion rmt fallback clientaction = +p2pHttpClientVersions' allowedversion rmt rmtrepo fallback clientaction = case p2pHttpBaseUrl <$> remoteAnnexP2PHttpUrl (gitconfig rmt) of Nothing -> error "internal" Just baseurl -> do @@ -139,9 +151,13 @@ p2pHttpClientVersions allowedversion rmt fallback clientaction = ++ " " ++ decodeBS (statusMessage (responseStatusCode resp)) - credentialbaseurl = case p2pHttpUrlString <$> remoteAnnexP2PHttpUrl (gitconfig rmt) of + credentialbaseurl = case remoteAnnexP2PHttpUrl (gitconfig rmt) of + Just p2phttpurl + | isP2PHttpSameHost p2phttpurl rmtrepo -> + 
Git.repoLocation rmtrepo + | otherwise -> + p2pHttpUrlString p2phttpurl Nothing -> error "internal" - Just url -> url credauth cred = do ba <- Git.credentialBasicAuth cred @@ -159,7 +175,7 @@ p2pHttpClientVersions allowedversion rmt fallback clientaction = M.insert (Git.CredentialBaseURL credentialbaseurl) cred cc Nothing -> noop #else -p2pHttpClientVersions _ _ fallback () = Just <$> fallback +p2pHttpClientVersions _ _ _ fallback () = Just <$> fallback "This remote uses an annex+http url, but this version of git-annex is not built with support for that." #endif diff --git a/P2P/Http/Url.hs b/P2P/Http/Url.hs index 9e1af2c8dc..b7ec6d22fe 100644 --- a/P2P/Http/Url.hs +++ b/P2P/Http/Url.hs @@ -15,6 +15,9 @@ import Network.URI import System.FilePath.Posix as P import Servant.Client (BaseUrl(..), Scheme(..)) import Text.Read +import Data.Char +import qualified Git +import qualified Git.Url #endif defaultP2PHttpProtocolPort :: Int @@ -79,3 +82,15 @@ unavailableP2PHttpUrl p = p #ifdef WITH_SERVANT { p2pHttpBaseUrl = (p2pHttpBaseUrl p) { baseUrlHost = "!dne!" } } #endif + +#ifdef WITH_SERVANT +-- When a p2phttp url is on the same host as a git repo, which also uses +-- http, the same username+password is assumed to be used for both. +isP2PHttpSameHost :: P2PHttpUrl -> Git.Repo -> Bool +isP2PHttpSameHost u repo + | not (Git.repoIsHttp repo) = False + | otherwise = + Just (map toLower $ baseUrlHost (p2pHttpBaseUrl u)) + == + (map toLower <$> (Git.Url.host repo)) +#endif diff --git a/doc/git-annex-p2phttp.mdwn b/doc/git-annex-p2phttp.mdwn index 802c52d929..3d10f62198 100644 --- a/doc/git-annex-p2phttp.mdwn +++ b/doc/git-annex-p2phttp.mdwn @@ -20,6 +20,12 @@ as usual, and `remote.name.annexUrl` set to an annex+http url such as "annex+http://example.com/git-annex/". The annex+http url is served by this server, and uses port 9417 by default. 
+Note that, when `remote.name.url` and `remote.name.annexUrl` +contain the same hostname, they are assumed by git-annex to +support the same users and passwords. So, git-annex will use +the password for the `remote.name.url` to log into the +`remote.name.annexUrl`. + As well as serving the git-annex HTTP API, this server provides a convenient way to download the content of any key, by using the path "/git-annex/$uuid/$key". For example: diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 178ad146d3..a082c97647 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1569,6 +1569,13 @@ Remotes are configured using these settings in `.git/config`. git operations. This allows using [[git-annex-p2phttp]] to serve a git-annex repository over http. + When this and the `remote.<name>.url` contain the same hostname, + and this is an annex+http(s) url, and that is also a http(s) url, + git-annex assumes that the same username and password can be used + for both urls. When password cacheing is configured, this allows + you to only be prompted once for a password when using both git and + git-annex. See gitcredentials(7) for how to set up password caching. + * `remote.<name>.annex-uuid` git-annex caches UUIDs of remote repositories here. diff --git a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host.mdwn b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host.mdwn index da46f5b7f1..cbef6aebca 100644 --- a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host.mdwn +++ b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host.mdwn @@ -13,3 +13,6 @@ I see some ways to address this: 3. Perhaps most elegantly: make p2phttp support serving multiple repositories, so that repositories could share the same annexurl and therefore share credentials [[!tag projects/INM7]] + +> I have implemented reuse of the remote.name.url password for +> remote.name.annexurl when they are on the same host. [[done]] --[[Joey]]
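The same-host rule can be sketched with plain string handling. This is a simplified illustration, not git-annex's `isP2PHttpSameHost` (which works on parsed URLs): `schemeOf`, `urlHost` and `sameHostCredentials` are hypothetical helpers that ignore IPv6 literals and percent-encoding. As in the diff above, hostnames are compared case-insensitively.

```haskell
import Data.Char (toLower)

-- | Lower-cased scheme of a url, e.g. "annex+https" or "http".
schemeOf :: String -> String
schemeOf = map toLower . takeWhile (/= ':')

-- | Host part of scheme://[userinfo@]host[:port][/path], lower-cased.
urlHost :: String -> Maybe String
urlHost url = case break (== ':') url of
  (_, ':' : '/' : '/' : rest) ->
    let authority = takeWhile (/= '/') rest
        hostPort = case break (== '@') authority of
          (_, '@' : hp) -> hp
          _ -> authority
        host = takeWhile (/= ':') hostPort
    in if null host then Nothing else Just (map toLower host)
  _ -> Nothing

-- | True when an annex+http(s) url and an http(s) git url name the same
-- host, the case where one username and password are assumed to serve both.
sameHostCredentials :: String -> String -> Bool
sameHostCredentials annexUrl gitUrl =
  schemeOf annexUrl `elem` ["annex+http", "annex+https"]
    && schemeOf gitUrl `elem` ["http", "https"]
    && case urlHost annexUrl of
         Just h -> urlHost gitUrl == Just h
         Nothing -> False
```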
update
diff --git a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_7ea1596e9c9c06ef609a8aa6bccefd29._comment b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_7ea1596e9c9c06ef609a8aa6bccefd29._comment similarity index 97% rename from doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_7ea1596e9c9c06ef609a8aa6bccefd29._comment rename to doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_7ea1596e9c9c06ef609a8aa6bccefd29._comment index be11c10dfd..69c350d2a3 100644 --- a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_7ea1596e9c9c06ef609a8aa6bccefd29._comment +++ b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_7ea1596e9c9c06ef609a8aa6bccefd29._comment @@ -1,6 +1,6 @@ [[!comment format=mdwn username="joey" - subject="""comment 3""" + subject="""comment 2""" date="2024-11-19T17:37:01Z" content=""" credential.useHttpPath is the relevant git config for this git-credential diff --git a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_63806afed3ab03308584415506183ced._comment b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_63806afed3ab03308584415506183ced._comment similarity index 54% rename from doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_63806afed3ab03308584415506183ced._comment rename to doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_63806afed3ab03308584415506183ced._comment index adee58f09b..fa24e48a0a 100644 --- a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_63806afed3ab03308584415506183ced._comment +++ b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_63806afed3ab03308584415506183ced._comment @@ -1,6 +1,6 @@ [[!comment format=mdwn username="joey" - subject="""comment 2""" + subject="""comment 3""" date="2024-11-19T17:19:38Z" content=""" Unfortunately, remote.foo.annexUrl is 
not limited to use for p2phttp. It @@ -21,4 +21,20 @@ prompt. So, I think it makes sense to only do this when credential.helper is configured. And when the hostname is the same in both the git url and the p2phttp url. + +Hmm, I can imagine a situation where this behavior could be considered a +security hole. Suppose A and B both have accounts on the same host. A is in +charge of serving the git repositories. B is in charge of serving git-annex +p2phttp. This would make git-annex prompt for a password to +one of user A's git repositories, and send the password to user B. So B +would be able to crack into the git repos. + +That is pretty farfetched. But it begs the question: If the git +repository and p2phttp are on the same host, why would they *ever* need 2 +distinct passwords? If git-annex simply doesn't support that A/B split, +then that security hole can't happen. + +So, git-annex could simply, when the git url and p2phttp url have the same +hostname, request the git credentials for the git url, rather than for the +p2phttp url. """]]
comments
diff --git a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_63806afed3ab03308584415506183ced._comment b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_63806afed3ab03308584415506183ced._comment new file mode 100644 index 0000000000..adee58f09b --- /dev/null +++ b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_2_63806afed3ab03308584415506183ced._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2024-11-19T17:19:38Z" + content=""" +Unfortunately, remote.foo.annexUrl is not limited to use for p2phttp. It +existed before that and could be legitimately set to a http url when +p2phttp is not being used. + +I agree it would be good to try to reuse the credentials of the git url for +p2phttp. That could be done by just querying git credential for the git url +credentials, and trying to use them for the p2phttp url. If they don't work, +use git credential to prompt for the p2phttp url credentials as it does now. + +If the user had credential.helper configured, they would probably already +have the git credentials cached, and if not, this would cache them for +later use, so no harm done asking for them. But if credential.helper was +not configured, there would be an extra and wholly unncessary password +prompt. + +So, I think it makes sense to only do this when credential.helper is +configured. And when the hostname is the same in both the git url +and the p2phttp url. 
+"""]] diff --git a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_7ea1596e9c9c06ef609a8aa6bccefd29._comment b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_7ea1596e9c9c06ef609a8aa6bccefd29._comment new file mode 100644 index 0000000000..be11c10dfd --- /dev/null +++ b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_3_7ea1596e9c9c06ef609a8aa6bccefd29._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2024-11-19T17:37:01Z" + content=""" +credential.useHttpPath is the relevant git config for this git-credential +behavior. + +I think it would be reasonable for git-annex to check if that is false, and +if so, remove the path from the `git credential` request for an annex+http +url. + +But I agree, it would be better, in the vast majority of cases, to have a +single url endpoint that serves multiple repositories. + +And for that matter, if someone is running git-annex p2phttp to serve 2 +different repositories right now, they are probably making the two listen +on different ports and so removing the path wouldn't help. They would have +to be interposing another web server that mapped those ports to paths, like +you have done with forgejo-aneksajo, for the path mangling to help. + +So implementing [[todo/p2phttp_serve_multiple_repositories]] +seems better than adding such path mangling. +"""]]
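`git credential` reads a request as `key=value` lines ending with a blank line. Below is a minimal sketch of the path-stripping idea discussed above. It assumes the url has already been split into protocol, host and path, and `credentialRequest` is a hypothetical helper; the comment itself concludes that serving multiple repositories is preferable to this kind of path mangling.

```haskell
-- | Build the text that `git credential fill` reads on stdin. When
-- useHttpPath is False the path attribute is omitted, so credentials are
-- looked up by protocol and host only.
credentialRequest :: Bool -> String -> String -> String -> String
credentialRequest useHttpPath protocol host path =
  unlines $
    [ "protocol=" ++ protocol
    , "host=" ++ host
    ]
      ++ [ "path=" ++ path | useHttpPath, not (null path) ]
      ++ [""]
```

When `useHttpPath` is false, two repositories on the same host produce identical requests, so a credential helper would return the same stored password for both.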
tag INM7
diff --git a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host.mdwn b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host.mdwn index 075c4b57a3..da46f5b7f1 100644 --- a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host.mdwn +++ b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host.mdwn @@ -11,3 +11,5 @@ I see some ways to address this: 1. Remove the path from the request to `git credential` on git-annex' side 2. Allow `remote.<name>.annexurl` to be set to `http(s)://` URLs in addition to `annex+http(s)://`, exploiting the difference in the `git credential` behavior 3. Perhaps most elegantly: make p2phttp support serving multiple repositories, so that repositories could share the same annexurl and therefore share credentials + +[[!tag projects/INM7]]
comment
diff --git a/doc/bugs/tests_started_to_fail_recently/comment_3_2acec0272bc0f9ad0e706797851c5345._comment b/doc/bugs/tests_started_to_fail_recently/comment_3_2acec0272bc0f9ad0e706797851c5345._comment new file mode 100644 index 0000000000..cf9afe3ee0 --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently/comment_3_2acec0272bc0f9ad0e706797851c5345._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2024-11-19T17:04:13Z" + content=""" +Aha, this test on ubuntu is failing the same way as the OSX test: + +<https://github.com/datalad/git-annex/actions/runs/11905453897/job/33176247387> + +It seems that "custom-config1" only involves an annex.stalldetection +setting, if I am reading the workflow file right. I was not able to +reproduce the failure with that config set though. +"""]]
split git-remote-annex test
diff --git a/Test.hs b/Test.hs index 605bd85fc1..a850cae258 100644 --- a/Test.hs +++ b/Test.hs @@ -282,7 +282,8 @@ repoTests note numparts = map mk $ sep [ testCase "add dup" test_add_dup , testCase "add extras" test_add_extras , testCase "add moved link" test_add_moved - , testCase "git-remote-annex" test_git_remote_annex + , testCase "git-remote-annex" (test_git_remote_annex False) + , testCase "git-remote-annex exporttree" (test_git_remote_annex True) , testCase "readonly remote" test_readonly_remote , testCase "ignore deleted files" test_ignore_deleted_files , testCase "metadata" test_metadata @@ -422,12 +423,14 @@ test_add_extras = intmpclonerepo $ do annexed_present wormannexedfile checkbackend wormannexedfile backendWORM -test_git_remote_annex :: Assertion -test_git_remote_annex = do - testspecialremote [] $ - git_annex "copy" ["--to=foo"] "copy" - testspecialremote ["importtree=yes", "exporttree=yes"] $ - git_annex "export" ["master", "--to=foo"] "export" +test_git_remote_annex :: Bool -> Assertion +test_git_remote_annex exporttree + | exporttree = + testspecialremote ["importtree=yes", "exporttree=yes"] $ + git_annex "export" ["master", "--to=foo"] "export" + | otherwise = + testspecialremote [] $ + git_annex "copy" ["--to=foo"] "copy" where testspecialremote cfg populate = intmpclonerepo $ do let cfg' = ["type=directory", "encryption=none", "directory=dir"] ++ cfg diff --git a/doc/bugs/tests_started_to_fail_recently/comment_2_a58e3ebcd37f866d7154f66da8c01929._comment b/doc/bugs/tests_started_to_fail_recently/comment_2_a58e3ebcd37f866d7154f66da8c01929._comment new file mode 100644 index 0000000000..ecc2322267 --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently/comment_2_a58e3ebcd37f866d7154f66da8c01929._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2024-11-19T16:48:25Z" + content=""" +Re the OSX failure, it seems that somehow the manifest key is not being +found when the test is run on OSX. 
I don't know why. There is nothing in +this code that should be OSX-specific. + +Unfortunately I do not have access to any OSX system to try to investigate +this. The "datalads-mac" I used to use does not seem to exist anymore. + +Of course, this test could be skipped on OSX. + +It does occur to me this could somehow be exposing a deeper problem on OSX +with exporttree special remotes. I have split the failing test in two, so +we'll see if both fail, or only the exporttree one. +"""]]
retitle OSX bug
diff --git a/doc/bugs/tests_started_to_fail_recently.mdwn b/doc/bugs/tests_started_to_fail_recently.mdwn index 77bed9042e..b4c663f906 100644 --- a/doc/bugs/tests_started_to_fail_recently.mdwn +++ b/doc/bugs/tests_started_to_fail_recently.mdwn @@ -74,3 +74,5 @@ although there in first failing was a bit different on OSX Use -p '/git-remote-annex/' to rerun this test only. ``` + +[[!meta title="git-remote-annex clone from special remote fails on OSX"]]
git-remote-annex: Fix cloning from a special remote on a crippled filesystem
Not initializing and so deleting the bundles only causes a little more work
on the first git fetch.
diff --git a/CHANGELOG b/CHANGELOG index 254f914b59..7e523186c6 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -15,6 +15,8 @@ git-annex (10.20241032) UNRELEASED; urgency=medium unversioned S3 bucket that is large enough to need pagination. * S3: Use significantly less memory when importing from a versioned S3 bucket. + * git-remote-annex: Fix cloning from a special remote on a crippled + filesystem. -- Joey Hess <id@joeyh.name> Mon, 11 Nov 2024 12:26:00 -0400 diff --git a/CmdLine/GitRemoteAnnex.hs b/CmdLine/GitRemoteAnnex.hs index 36d2446e4e..89ded7191c 100644 --- a/CmdLine/GitRemoteAnnex.hs +++ b/CmdLine/GitRemoteAnnex.hs @@ -1129,7 +1129,7 @@ specialRemoteFromUrl sab a = withTmpDir "journal" $ \tmpdir -> do -- If the git-annex branch did not exist when this command started, -- it was created empty by this command, and this command has avoided -- making any other commits to it, writing any temporary annex branch --- changes to thre alternateJournal, which can now be discarded. +-- changes to the alternateJournal, which can now be discarded. -- -- If nothing else has written to the branch while this command was running, -- the branch will be deleted. That allows for the git-annex branch that is @@ -1152,6 +1152,11 @@ specialRemoteFromUrl sab a = withTmpDir "journal" $ \tmpdir -> do -- does not contain any hooks. Since initialization installs -- hooks, have to work around that by not initializing, and -- delete the git bundle objects. +-- +-- Similarly, when on a crippled filesystem, doing initialization would +-- involve checking out an adjusted branch. But git clone wants to do its +-- own checkout. So no initialization is done then, and the git bundle +-- objects are deleted. 
cleanupInitialization :: StartAnnexBranch -> FilePath -> Annex () cleanupInitialization sab alternatejournaldir = void $ tryNonAsync $ do liftIO $ mapM_ removeFile =<< dirContents alternatejournaldir @@ -1173,7 +1178,7 @@ cleanupInitialization sab alternatejournaldir = void $ tryNonAsync $ do Nothing -> return () Just _ -> void $ tryNonAsync $ inRepo $ Git.Branch.delete Annex.Branch.fullname - ifM (Annex.Branch.hasSibling <&&> nonbuggygitversion) + ifM (Annex.Branch.hasSibling <&&> nonbuggygitversion <&&> notcrippledfilesystem) ( do autoInitialize' (pure True) startupAnnex remoteList differences <- allDifferences <$> recordedDifferences @@ -1190,6 +1195,8 @@ cleanupInitialization sab alternatejournaldir = void $ tryNonAsync $ do _ -> noop void $ liftIO $ tryIO $ removeDirectory (decodeBS annexobjectdir) + notcrippledfilesystem = not <$> probeCrippledFileSystem + nonbuggygitversion = liftIO $ flip notElem buggygitversions <$> Git.Version.installed buggygitversions = map Git.Version.normalize diff --git a/doc/bugs/tests_started_to_fail_recently/comment_1_c07a23f5d8524ba8f97187ade6eeb441._comment b/doc/bugs/tests_started_to_fail_recently/comment_1_c07a23f5d8524ba8f97187ade6eeb441._comment new file mode 100644 index 0000000000..8f4197908f --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently/comment_1_c07a23f5d8524ba8f97187ade6eeb441._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2024-11-19T16:18:31Z" + content=""" +This is a new test. + +Looks like it's found a legitimate bug in git-remote-annex. When the +filesystem is crippled, the git-annex init checks out an adjusted branch, +which here happens in the middle of git's own checkout and so legitimately +confuses git. + +I can reproduce this on a FAT filesystem, cloning from eg a directory +special remote. Fixed this. + +(The OSX failure is something else.) +"""]]
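The fix in `cleanupInitialization` depends on `<&&>` short-circuiting, so `probeCrippledFileSystem` only runs when the earlier checks pass, and on `ifM` choosing between initializing and deleting the git bundle objects. git-annex has its own versions of these helpers. The definitions below are minimal stand-ins written for this sketch and may differ in detail, and `shouldInitialize` is a hypothetical name that bundles the three checks.

```haskell
-- Stand-ins for the ifM and <&&> helpers used in the diff; these
-- definitions are assumptions written for this sketch.
infixr 3 <&&>

-- Monadic "and": runs the second action only if the first yields True.
(<&&>) :: Monad m => m Bool -> m Bool -> m Bool
ma <&&> mb = do
  a <- ma
  if a then mb else return False

-- Monadic if: picks one of two actions based on a monadic condition.
ifM :: Monad m => m Bool -> (m a, m a) -> m a
ifM cond (thenclause, elseclause) = do
  c <- cond
  if c then thenclause else elseclause

-- The three checks from cleanupInitialization, passed in as actions.
-- The filesystem probe only runs when the first two checks pass.
shouldInitialize :: Monad m => m Bool -> m Bool -> m Bool -> m Bool
shouldInitialize hasSibling nonbuggygitversion notcrippledfilesystem =
  hasSibling <&&> nonbuggygitversion <&&> notcrippledfilesystem
```

Using `Maybe` as the monad makes the short-circuit visible: `shouldInitialize (Just False) Nothing (Just True)` is `Just False`, because the `Nothing` action is never run.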
initial report on failing tests
diff --git a/doc/bugs/tests_started_to_fail_recently.mdwn b/doc/bugs/tests_started_to_fail_recently.mdwn new file mode 100644 index 0000000000..77bed9042e --- /dev/null +++ b/doc/bugs/tests_started_to_fail_recently.mdwn @@ -0,0 +1,76 @@ +### Please describe the problem. + +eg from [this recent run](https://github.com/datalad/git-annex/actions/runs/11875458683/job/33092822672) + +``` +Tests + Repo Tests v10 adjusted unlocked branch + Init Tests + init: OK (0.43s) + add: OK (0.83s) + sop crypto: OK + upgrade: OK (0.52s) + conflict resolution (uncommitted local file): OK (4.99s) + adjusted branch merge regression: OK (1.09s) + describe: OK (0.62s) + fsck (local untrusted): OK (1.60s) + lock --force: OK (2.29s) + drop (untrusted remote): OK (1.69s) + view: OK (0.91s) + git-remote-annex: FAIL (3.01s) + ./Test/Framework.hs:86: + git clone from special remote failed with unexpected exit code (transcript follows) + Cloning into 'clonedir'... + Detected a filesystem without fifo support. + Disabling ssh connection caching. + Detected a crippled filesystem. + Entering an adjusted branch where files are unlocked as this filesystem does not support locked files. + Switched to branch 'adjusted/master(unlocked)' + error: Untracked working tree file 'bar.c' would be overwritten by merge. + fatal: unable to checkout working tree + warning: Clone succeeded, but checkout failed. + You can inspect what was checked out with 'git status' + and retry with 'git restore --source=HEAD :/' + + + Use -p '/git-remote-annex/' to rerun this test only. 
+ +1 out of 12 tests failed (17.99s) +``` + +overall -- seems started to fail about a week ago + +``` + 167 T Nov 17 GitHub Actions datalad/git-annex daily summary: 20 PASSED, 10 FAILED, 1 ABSENT + 238 T Nov 16 GitHub Actions datalad/git-annex daily summary: 20 PASSED, 10 FAILED, 1 ABSENT + 348 T Nov 15 GitHub Actions datalad/git-annex daily summary: 23 PASSED, 7 FAILED, 1 ABSENT + 890 T Nov 14 GitHub Actions datalad/git-annex daily summary: 23 PASSED, 7 FAILED, 1 ABSENT +1676 T Nov 13 GitHub Actions datalad/git-annex daily summary: 22 PASSED, 8 FAILED, 1 ABSENT +2032 T Nov 12 GitHub Actions datalad/git-annex daily summary: 23 PASSED, 7 FAILED, 1 ABSENT +2561 T Nov 11 GitHub Actions datalad/git-annex daily summary: 30 PASSED, 1 ABSENT +``` + +although there in first failing was a bit different on OSX + +``` + Repo Tests v10 locked + Init Tests + init: OK (0.43s) + add: OK (1.17s) + sop crypto: OK + upgrade: OK (0.62s) + conflict resolution (uncommitted local file): OK (5.93s) + adjusted branch merge regression: OK (7.74s) + describe: OK (0.92s) + fsck (local untrusted): OK (1.87s) + lock --force: OK (1.64s) + drop (untrusted remote): OK (1.38s) + view: OK (1.48s) + git-remote-annex: FAIL (2.95s) + ./Test/Framework.hs:86: + git clone from special remote failed with unexpected exit code (transcript follows) + Cloning into 'clonedir'... + git-annex: No git repository found in this remote. + + Use -p '/git-remote-annex/' to rerun this test only. +```
update
diff --git a/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_8_017cf9156e94b1587f1853504d6c2de1._comment b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_8_017cf9156e94b1587f1853504d6c2de1._comment index e6b9800971..a361760605 100644 --- a/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_8_017cf9156e94b1587f1853504d6c2de1._comment +++ b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_8_017cf9156e94b1587f1853504d6c2de1._comment @@ -20,4 +20,7 @@ oddly didn't save any memory. Memory profiling might let this be improved further, but needing 1 gb of memory to import a million changes to files doesn't seem too bad. + +Update: Did some memory profiling, nothing stuck out as badly wrong. +Lists and tuples are using as much memory as anything. """]]
close
diff --git a/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix.mdwn b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix.mdwn index 8f4723e67d..6a9c2349be 100644 --- a/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix.mdwn +++ b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix.mdwn @@ -70,3 +70,6 @@ local repository version: 10 [[!meta author=yoh]] [[!tag projects/dandi]] + +> Calling this [[done]] although memory use improvements still seem +> possible.. --[[Joey]]
comments
diff --git a/doc/bugs/git-annex-import_stalls_and_uses_all_ram_available/comment_4_94b241ec93018adce716ceeed4bffd44._comment b/doc/bugs/git-annex-import_stalls_and_uses_all_ram_available/comment_4_94b241ec93018adce716ceeed4bffd44._comment new file mode 100644 index 0000000000..e1cf41e901 --- /dev/null +++ b/doc/bugs/git-annex-import_stalls_and_uses_all_ram_available/comment_4_94b241ec93018adce716ceeed4bffd44._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2024-11-15T19:29:52Z" + content=""" +FWIW, I've made some improvements that should make it need around 80% less +memory in this case. Which might be enough to let it import. + +Still don't have filtering on preferred contents on the fly though. +"""]] diff --git a/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_8_017cf9156e94b1587f1853504d6c2de1._comment b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_8_017cf9156e94b1587f1853504d6c2de1._comment new file mode 100644 index 0000000000..e6b9800971 --- /dev/null +++ b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_8_017cf9156e94b1587f1853504d6c2de1._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2024-11-15T17:48:08Z" + content=""" +Did same memory optimisation for the versioned case, and the results are +striking! Running the command until it had made 45 API requests, it was +using 592788 kb of memory. Now it uses only 110968 kb. + +Of that, about 78900 kb are used at startup, so it grew 29836 kb. +At that point, it has gathered 23537 changes. So about 1 kb is used per +change. 
That seems a bit more memory than really should be needed, +each change takes about 75 bytes of data, eg: + + "y3RixvrmLvr1oWJ7meEa4vWK6B.C.aad",3340,"dandisets/000003/draft/dandiset.jsonld",2021-09-28 02:12:39 UTC + +I did try some further memory optimisation, making it avoid storing the +same filename repeatedly in memory when gathering versioned changes. Which +oddly didn't save any memory. + +Memory profiling might let this be improved further, but needing 1 gb of +memory to import a million changes to files doesn't seem too bad. +"""]]
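The per-change estimate in comment 8 is simple arithmetic on the figures it quotes (29836 kb of growth over 23537 changes, and about 75 bytes of data per change). Spelling it out gives about 1.27 kb per change, roughly 1.3 GB of growth per million changes, and about 17 times the raw data size. The unit conventions are assumptions: 1 GB is taken as 10^6 kb for the projection, and 1 kb as 1024 bytes for the overhead ratio.

```haskell
-- Figures quoted in comment 8 (memory in kb).
memoryGrowthKb :: Double
memoryGrowthKb = 29836

changesGathered :: Double
changesGathered = 23537

-- Approximate size of the data for one change, in bytes.
bytesPerChangeData :: Double
bytesPerChangeData = 75

-- Average memory used per gathered change.
kbPerChange :: Double
kbPerChange = memoryGrowthKb / changesGathered

-- Projected memory growth in kb for a given number of changes.
projectedKb :: Double -> Double
projectedKb n = kbPerChange * n

-- How many times larger the in-memory cost is than the raw data,
-- assuming 1 kb = 1024 bytes.
overheadFactor :: Double
overheadFactor = kbPerChange * 1024 / bytesPerChangeData
```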
use 20% less memory when listing unversioned S3 bucket
diff --git a/Remote/S3.hs b/Remote/S3.hs index 299f7d7644..36cbedef50 100644 --- a/Remote/S3.hs +++ b/Remote/S3.hs @@ -601,15 +601,29 @@ listImportableContentsS3 hv r info c = { S3.gbMarker = marker , S3.gbPrefix = fileprefix } - continuelistunversioned h (rsp:l) rsp' + l' <- extractFromResourceT $ + extractunversioned rsp + continuelistunversioned h (l':l) rsp' Nothing -> nomore | otherwise = nomore where nomore = return $ - mkImportableContentsUnversioned info (reverse (rsp:l)) + mkImportableContentsUnversioned + (reverse (extractunversioned rsp:l)) + extractunversioned = mapMaybe extractunversioned' . S3.gbrContents + extractunversioned' oi = do + loc <- bucketImportLocation info $ + T.unpack $ S3.objectKey oi + let sz = S3.objectSize oi + let cid = mkS3UnversionedContentIdentifier $ S3.objectETag oi + return (loc, (cid, sz)) + continuelistversioned h l rsp | S3.gbovrIsTruncated rsp = do + let showme x = case x of + S3.DeleteMarker {} -> "delete" + v -> S3.oviKey v rsp' <- sendS3Handle h $ (S3.getBucketObjectVersions (bucket info)) { S3.gbovKeyMarker = S3.gbovrNextKeyMarker rsp @@ -620,18 +634,11 @@ listImportableContentsS3 hv r info c = | otherwise = return $ mkImportableContentsVersioned info (reverse (rsp:l)) -mkImportableContentsUnversioned :: S3Info -> [S3.GetBucketResponse] -> ImportableContents (ContentIdentifier, ByteSize) -mkImportableContentsUnversioned info l = ImportableContents - { importableContents = concatMap (mapMaybe extract . 
S3.gbrContents) l +mkImportableContentsUnversioned :: [[(ImportLocation, (ContentIdentifier, ByteSize))]] -> ImportableContents (ContentIdentifier, ByteSize) +mkImportableContentsUnversioned l = ImportableContents + { importableContents = concat l , importableHistory = [] } - where - extract oi = do - loc <- bucketImportLocation info $ - T.unpack $ S3.objectKey oi - let sz = S3.objectSize oi - let cid = mkS3UnversionedContentIdentifier $ S3.objectETag oi - return (loc, (cid, sz)) mkImportableContentsVersioned :: S3Info -> [S3.GetBucketObjectVersionsResponse] -> ImportableContents (ContentIdentifier, ByteSize) mkImportableContentsVersioned info = build . groupfiles diff --git a/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_7_fe6e9bc5460f9bcd24eb3034a2f45fbc._comment b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_7_fe6e9bc5460f9bcd24eb3034a2f45fbc._comment new file mode 100644 index 0000000000..abeaf7d584 --- /dev/null +++ b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_7_fe6e9bc5460f9bcd24eb3034a2f45fbc._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2024-11-15T17:16:51Z" + content=""" +Trying the same command but with versioning=yes, I have verified that + +* it does not have the same loop forever behavior +* it does use a lot of memory quite quickly + +Going back to the unversioned command, I was able to reduce the memory use +by 20% by processing each result, rather than building up a list of results +and processing at the end. It will be harder to do that in the versioning +case, but I expect it will improve it at least that much, and probably +more, since it will be able to GC all the delete markers. +"""]]
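Comment 7 and the `Remote/S3.hs` change describe the same idea: pull the needed fields out of each listing page as it arrives, rather than keeping every response and extracting at the end. A minimal sketch follows, with a hypothetical `Page` type and `listAll` function standing in for the aws library's response types. It is not that library's API, and it only models the unversioned, marker-based listing.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Hypothetical stand-in for one page of an S3 bucket listing.
data Page = Page
  { pageObjects :: [(String, Integer)]  -- (object key, size)
  , pageNextMarker :: Maybe String      -- Just marker when the listing is truncated
  }

-- Keep only objects under the prefix, as locations relative to it.
-- This plays the role of extractunversioned in the diff.
extract :: String -> Page -> [(String, Integer)]
extract prefix = mapMaybe keep . pageObjects
  where
    keep (key, size) = do
      loc <- stripPrefix prefix key
      return (loc, size)

-- List every page, extracting each one as soon as it arrives instead of
-- accumulating whole pages and extracting at the end.
listAll :: Monad m => String -> (Maybe String -> m Page) -> m [(String, Integer)]
listAll prefix fetch = go Nothing []
  where
    go marker acc = do
      page <- fetch marker
      let l = extract prefix page
      -- Force the list's spine so it no longer refers to the page.
      length l `seq` case pageNextMarker page of
        Just m -> go (Just m) (l : acc)
        Nothing -> return (concat (reverse (l : acc)))
```

Forcing the spine with `length` matters in a lazy language: without it, each unevaluated `extract prefix page` thunk would keep its whole page alive, defeating the point of extracting early.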
diff --git a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn index 45af5b2ee6..fc1e5bb168 100644 --- a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn +++ b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn @@ -15,6 +15,6 @@ However, git annex exits without copying any files, my repo is still empty after I also tried git annex findkeys --not --unused, but it says invalid option --unused :-( -In my example I have multiple repositories that all have part of the files I want, so I cannot just make a repo that has all versions of all files and then `drop --unused`. That also would take too much storage. +In real life I have multiple repositories that all have part of the files I want, so I cannot just make a repo that has all versions of all files and then `drop --unused`. That also would take too much storage. How can I do this?
diff --git a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn index 2c4b1a3289..45af5b2ee6 100644 --- a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn +++ b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn @@ -15,4 +15,6 @@ However, git annex exits without copying any files, my repo is still empty after I also tried git annex findkeys --not --unused, but it says invalid option --unused :-( +In my example I have multiple repositories that all have part of the files I want, so I cannot just make a repo that has all versions of all files and then `drop --unused`. That also would take too much storage. + How can I do this?
diff --git a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn index cdc9b6bf8b..2c4b1a3289 100644 --- a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn +++ b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn @@ -4,10 +4,12 @@ I tried to clone a present repository to an new folder and move there only files But git annex does nothing: +``` git clone my-repo repo-archive cd repo-archive git annex init git annex copy --to=here --not --unused +``` However, git annex exits without copying any files, my repo is still empty afterwards.
Added a comment
diff --git a/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_1_139620857a275559b06fee54a21cbf08._comment b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_1_139620857a275559b06fee54a21cbf08._comment new file mode 100644 index 0000000000..d34274f304 --- /dev/null +++ b/doc/todo/p2phttp__58___reuse_credentials_for_repos_on_one_host/comment_1_139620857a275559b06fee54a21cbf08._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="matrss" + avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de" + subject="comment 1" + date="2024-11-15T08:54:07Z" + content=""" +Just an addendum: in forgejo-aneksajo I've effectively implemented the third option by having one git-annex-p2phttp endpoint for all repositories, peeking at the request to get the repository UUID, starting the p2phttp server for that repository, and then forwarding the request. So, having to enter the credentials for every new repository is no longer a concern there, and <https://git-annex.branchable.com/todo/p2phttp_serve_multiple_repositories/> would address this for standalone p2phttp. + +What might still be nice though is trying to reuse the credentials of standard git operations for p2phttp. In the case of forgejo-aneksajo, git push/pull and annex-p2phttp operations use the same username/password or username/access-token combination for authentication, but git-annex will prompt for them twice due to the different URLs. This might be a bit hacky, but I think this would just work if git-annex allowed plain http(s):// URLs in addition to annex+http(s):// in the annexurl configuration, as the request to git credential would then match that of plain git operations. +"""]]
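Comment 1 describes forgejo-aneksajo's approach: one endpoint for all repositories that reads the repository UUID from each request, starts a p2phttp server for that repository if one is not already running, and forwards the request. A minimal Haskell sketch of just that routing step follows. The names `Handler`, `newServerTable`, and `routeByUUID` are hypothetical, this is not forgejo-aneksajo's code, and how the UUID is extracted from a request is left out.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map.Strict as M

-- Hypothetical: a running per-repository p2phttp server, modeled as a
-- function from request to response.
type Handler = String -> IO String

-- Table of servers that have already been started, keyed by UUID.
newServerTable :: IO (IORef (M.Map String Handler))
newServerTable = newIORef M.empty

-- Route a request to the server for the repository UUID found in it,
-- starting that server the first time the UUID is seen.
-- Not safe if two first requests for the same UUID arrive concurrently.
routeByUUID
  :: IORef (M.Map String Handler)
  -> (String -> IO Handler)  -- starts a server for a UUID
  -> String                  -- repository UUID peeked from the request
  -> String                  -- the request itself
  -> IO String
routeByUUID servers start uuid req = do
  running <- readIORef servers
  handler <- case M.lookup uuid running of
    Just h -> return h
    Nothing -> do
      h <- start uuid
      modifyIORef' servers (M.insert uuid h)
      return h
  handler req
```

Because every repository is reached through the same endpoint url, a credential helper sees the same host and path for all of them, which is why this removes the repeated prompts.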
diff --git a/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn new file mode 100644 index 0000000000..cdc9b6bf8b --- /dev/null +++ b/doc/forum/How_to_get_a_list_of_all_NOT_unused_files.mdwn @@ -0,0 +1,16 @@ +I have a research project where I want to save some but not all versions. Those that should be saved are tagged. I want to create a repository (and archive it) that contains only those files. It is, so to say, the inverse of --unused. + +I tried to clone a present repository to an new folder and move there only files that are referenced by some ref (branch or tag). + +But git annex does nothing: + +git clone my-repo repo-archive +cd repo-archive +git annex init +git annex copy --to=here --not --unused + +However, git annex exits without copying any files, my repo is still empty afterwards. + +I also tried git annex findkeys --not --unused, but it says invalid option --unused :-( + +How can I do this?
fixed
diff --git a/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_6_7cdffb27b1ab45fab71f1de19501f243._comment b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_6_7cdffb27b1ab45fab71f1de19501f243._comment new file mode 100644 index 0000000000..ebe1e01a77 --- /dev/null +++ b/doc/bugs/importtree_from_S3_slows_to_halt_even_with_prefix/comment_6_7cdffb27b1ab45fab71f1de19501f243._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2024-11-14T20:14:29Z" + content=""" +Fixed in [[!commit 4b87669ae229c89eadb4ff88eba927e105c003c4]]. Now it runs +in seconds. + +Note that this bug does not seem to affect S3 remotes that have versioning +enabled. +"""]]