Recent changes to this wiki:
Added a comment
diff --git a/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_8_52510b89aa6a287493f2f77c07b8a682._comment b/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_8_52510b89aa6a287493f2f77c07b8a682._comment new file mode 100644 index 0000000000..3f4bcde57f --- /dev/null +++ b/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_8_52510b89aa6a287493f2f77c07b8a682._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 8" + date="2025-06-22T03:53:19Z" + content=""" +I did spend some time now trying to figure FTW and why git-annex (version from end of last year) says \"non-large file; adding content to git repository\" whenever `check-attr` insists that `largefile` should apply to my huge `.dandi/assets.json`. Only trying newer git-annex I think I got the reason which it finally announced as \"dotfile; adding content to git repository\" and I was able to recover this discussion! + +Re + +> But, .config/ seems to me to perfectly match what dotfiles are, which is files that are configuration that are named with a name starting with a dot in order to keep them from cluttering up ls. + +As far as I know, a leading dot is just a convention for **hidden** and not config, and not even necessarily text files. +Even though `dotfile` files (not folders) are most of the time text files, I would not generalize that to dot-folders: +content of `.cache/` or `.venv/` (created by `uv`) etc are unlikely to be text files to even start with. Those folder names start with dot to signal \"hidden\" not \"text\" or \"small\". + +That is why I retain that it remains confusing and inconsistent to have any special treatment and need for extra configuration (`git annex config --set annex.dotfiles true`) for content of dot-folders. 
I appreciate that such change would likely change behavior but IMHO it might just be \"for the best\". +"""]]
FR for gx-import
diff --git a/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn b/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn new file mode 100644 index 0000000000..b46105c19d --- /dev/null +++ b/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn @@ -0,0 +1,49 @@ +[[!meta author="Spencer"]] + +I have discovered that what is meant by `Any files that are gitignored will not be included in the import, but will be left on the remote.` in the [import doc](https://git-annex.branchable.com/git-annex-import) is `Any files that are [locally] gitignored [relative to the repo's root] will not be included in the import`. This is to say that when importing, only `.gitignore` paths from the root repo are used to exclude paths in the imported tree as if the tree were imported relative to root, regardless if a subtree is specified. This means that the repo gitignores must include ignores as desired to import the correct files from an import tree. + +This makes it challenging to import special remotes into subtrees. Ignores must be written to match the trees' roots but this might lead to clobbering of paths/names which overlap with other trees or the main repo. + +Therefore I suggest that imports to a subtree respect ignores as if the files in the tree were already adjusted to their new destination. +I suspect that annex is listing the tree, comparing the list to ignores, then importing what doesn't match. +So, instead, this would involve listing, moving the list to its subtree path, then comparing to ignores. + +A similar argument could be made for attributes in general. +I haven't done the testing on import attributes (namely `largefiles`), but I would want these to respect subtree paths as well. + +<details> +<summary>Testing Notes</summary> + +I made various .gitignore files in a fresh repo with a tree at `../tree` relative to fresh repo. +The tree had files `a`-`g`. 
+The ignores all began from this template: + +```gitignore +# -- Import into ROOT +# -- tree ignore +a + +# -- root ignore +b + +# -- root ignore in root +root-ignore/c + +# -- relative ignore in relative +d + +# -- root ignore in relative +root-ignore/e + +# -- relative ignore in root +f + +# -- tree ignore relative to root +root-ignore/g +``` + +Then I commented out certain lines for each location. E.g. only try ignoring `a` and `root-ignore/g` in the tree, `b`, `root-ignore/c` in root, and `f` in root. + +Regardless of import or ignore, only `b` and `f` were ignored pertaining to the root `.gitignore` matching these files in the tree, even when the tree was imported to subtree `rel-ignore` or `root-ignore`. + +</details>
diff --git a/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn b/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn index 28abcfc8f1..13c6700f10 100644 --- a/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn +++ b/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn @@ -58,3 +58,4 @@ local repository version: 10 OSX (brew) +[[!meta author="Spencer"]]
import to nonexistent path/branch gives unable to find base tree error
diff --git a/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn b/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn new file mode 100644 index 0000000000..28abcfc8f1 --- /dev/null +++ b/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn @@ -0,0 +1,60 @@ +### Please describe the problem. + +I thought the branch in an import was arbitrary? E.g. `gx import <branch>:<subtree> -f <remote>`. + +While I could understand if it is not arbitrary if it corresponds to an existing local branch, in which case the local branch is taken as a basis for which the import should be based on, I assumed if the branch name did not have a corresponding local branch that import would just base its work on an orphan. However, this fails when importing a subtree and gives `Unable to find base tree for branch <branch>`. + + +### What steps will reproduce the problem? + +Here's an example setup in a fresh repo with no commits to master: + +``` +(base) ➜ repo git:(master) ✗ grv +tree + +(base) ➜ repo git:(master) ✗ gx info tree +uuid: df2c15bd-0d12-4508-99c0-31da0b5e00d6 +description: [tree] +trust: untrusted +remote: tree +cost: 100.0 +type: directory +available: true +directory: /Users/coesite/Documents/Temp/annex-tests/import-which-gitignore/tree +encryption: none +chunking: none +importtree: yes +remote annex keys: 6 +remote annex size: 472 bytes + +(base) ➜ repo git:(master) ✗ la +total 16 +drwxr-xr-x@ 13 coesite staff 416B Jun 20 14:28 .git +-rw-r--r-- 1 coesite staff 239B Jun 20 14:13 .gitignore +-rw-r--r-- 1 coesite staff 1.8K Jun 20 14:24 README.md +drwxr-xr-x 3 coesite staff 96B Jun 20 14:05 rel-ignore +drwxr-xr-x 2 coesite staff 64B Jun 20 14:06 root-ignore + +(base) ➜ repo git:(master) ✗ gx import two:rel-ignore -f tree +git-annex: Unable to find base tree for branch two +``` + +I would have suspected that even though path `rel-ignore` doesn't yet exist on orphan branch `two` that this would still import `tree` to be under that path. + +### What version of git-annex are you using? 
On what operating system? + +``` +git-annex version: 10.20250605 +build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.4 http-client-0.7.19 persistent-sqlite-2.13.3.1 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: darwin aarch64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +OSX (brew) +
Added a comment: A (Mildly) Compelling Reason
diff --git a/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment b/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment new file mode 100644 index 0000000000..e9887c02c4 --- /dev/null +++ b/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="A (Mildly) Compelling Reason" + date="2025-06-19T01:34:17Z" + content=""" +This feature would alleviate one problem I have with annex in that the path stored in annex symlinks depends on the tree a file sits in. +This makes each *`git`* object of a annexed file in a different folder unique. +If annexed files ever move, we now have a fairly useless new git object introduced into the repo. +Not at all a problem for one file but if you have tens of thousands of annexed files and you refactor, you start to notice that. + +Unlocked files don't have this problem because their blobs point agnostically to the annex and key. +But, of course, unlocking large amounts of files mean content copies so that's not great. + +Symlink chains alleviate this because if I have a chain like `.root -> ./` in the root and `.root -> ../.root` in essentially every directory, then annex symlinks become agnostic too. +And on the git side, that's two new objects to add, and only a new tree object when performing a move. + +Again this is only relevant when the number of files becomes massive. +For sense of scale, let's assume a symlink payload is on the order of 100 bytes. +So 10,000 files generates roughly a Mb of git objects, meaning if I had 100,000 files and moved them around once, I'd have 20 Mb of data dedicated to locating these files w/ 10 Mb of what I would deem as waste. 
+Honestly, annex and git slow down appreciably at that scale for other reasons (pull/push/checkout, especially on slower file systems), so I say this is a non-issue by comparison. +For those who had similar concerns, there's your benchmark: 10 Mb of bloat per 100,000 files per move! +"""]]
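The estimate in the comment above can be checked with a few lines of shell arithmetic. This is only a sketch under the comment's own assumption of roughly 100 bytes per symlink blob; the function name `symlink_bloat_bytes` is made up for illustration, and real numbers will vary with path length and git's object compression.

```bash
# Rough size of the extra git blobs created when annexed symlinks are moved.
# Usage: symlink_bloat_bytes FILES [MOVES] [PAYLOAD_BYTES]
symlink_bloat_bytes() {
    local files="$1"
    local moves="${2:-1}"
    local payload="${3:-100}"   # assumed bytes per symlink blob (from the comment)
    echo $(( files * moves * payload ))
}

symlink_bloat_bytes 100000 1    # 100,000 files moved once
```

Under those assumptions, 100,000 files moved once comes to 10,000,000 bytes, and 10,000 files moved once to 1,000,000 bytes.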
Added a comment: Easy Approach
diff --git a/doc/bugs/git_worktree_remove_fails/comment_3_fffd495306d1eb4093f1235c337326cd._comment b/doc/bugs/git_worktree_remove_fails/comment_3_fffd495306d1eb4093f1235c337326cd._comment new file mode 100644 index 0000000000..c012bbf112 --- /dev/null +++ b/doc/bugs/git_worktree_remove_fails/comment_3_fffd495306d1eb4093f1235c337326cd._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="Easy Approach" + date="2025-06-18T04:27:16Z" + content=""" +Here's how to remove a worktree: + +```bash +echo \"gitdir: $(readlink .git)\" > .git0; +rm .git; +mv .git0 .git; +git worktree remove .; +``` + +as done inside the worktree itself. Update paths if you want to remove the worktree from outside of it. +So long as you don't run another `git annex` command after replacing the symlink with a file, `worktree remove` should work! +"""]]
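The first three commands of that recipe could be wrapped in a small guarded helper, sketched below. The function name `gitlink_to_gitfile` and the temporary name `.git0` (borrowed from the comment) are illustrative only, and the sketch assumes, as the comment does, that `.git` in the worktree is currently a symlink. It deliberately stops before `git worktree remove`.

```bash
# Replace a symlinked .git with a plain "gitdir: ..." file, as in the
# recipe above. Hypothetical helper name; not part of git or git-annex.
# Usage: gitlink_to_gitfile [WORKTREE_DIR]
gitlink_to_gitfile() {
    local wt="${1:-.}"
    if [ ! -L "$wt/.git" ]; then
        echo "$wt/.git is not a symlink; leaving it alone" >&2
        return 1
    fi
    local target
    target="$(readlink "$wt/.git")" || return 1
    # Write the replacement next to the symlink first, then swap it in.
    printf 'gitdir: %s\n' "$target" > "$wt/.git0" || return 1
    rm "$wt/.git" && mv "$wt/.git0" "$wt/.git"
}

# Afterwards, from inside the worktree (and before any further git-annex command):
#   gitlink_to_gitfile . && git worktree remove .
```

Refusing to act when `.git` is not a symlink keeps the helper from clobbering an ordinary `.git` file or directory if it is run in the wrong place.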
Added a comment
diff --git a/doc/bugs/git-lfs_special_insists_on_https/comment_2_fdf1d13005f8a32663643773a09cc273._comment b/doc/bugs/git-lfs_special_insists_on_https/comment_2_fdf1d13005f8a32663643773a09cc273._comment new file mode 100644 index 0000000000..3b20ba13bd --- /dev/null +++ b/doc/bugs/git-lfs_special_insists_on_https/comment_2_fdf1d13005f8a32663643773a09cc273._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 2" + date="2025-06-17T10:00:16Z" + content=""" +Sorry to comment on a done bug. It's just that it seems opening a forum thread is going to lose the context. + +Can I ask, why is a git url even required? Isn't that going to require that only a self-hosted git is available anyway... because you aren't going to get the specific configuration to allow git-annex to fetch .git/config for the annex id? + +I thought the git-lfs remote was only for \"blob\" storage, and so API only. Or at least, it being integrated with a git service would have been optional, not mandatory. + +The example I gave with http and non-standard port was based on running the reference implementation https://github.com/git-lfs/lfs-test-server alone. Which works when a git project is configured just local (no remotes) and then the lfs url is set. + +"""]]
thx
diff --git a/doc/forum/Import_-_Changing_Largefiles.mdwn b/doc/forum/Import_-_Changing_Largefiles.mdwn index 88cf543cec..0dfa3769b6 100644 --- a/doc/forum/Import_-_Changing_Largefiles.mdwn +++ b/doc/forum/Import_-_Changing_Largefiles.mdwn @@ -20,3 +20,9 @@ For a better understanding, here is a MWE to reproduce this: 1. Note that all files are still considered large. Maybe there's another way of fixing this that I don't know about, but as far as I know, from this point you have to delete the special remote and redo the above now with the desired `.gitattributes` file staged for files in this external tree to be imported as small. + +[[done]] + +--- + +Conclusion: Don't just delete the imported branch, update it with a commit to force small/large the files as desired.
Added a comment: Solutions
diff --git a/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_1_50b69686e3574a407a334556303a11cb._comment b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_1_50b69686e3574a407a334556303a11cb._comment new file mode 100644 index 0000000000..aceb6d582a --- /dev/null +++ b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_1_50b69686e3574a407a334556303a11cb._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="Solutions" + date="2025-06-16T20:33:03Z" + content=""" +While I agree that being unable to fix symlinks can sometimes be a bit annoying your examples have straightforward solutions using existing tools: + +1. `g add -f <symlink>` - [`gx fix`] - `gx unannex`. \"Unlocks\" and `rm`'s in one go. Does still leave a copy in annex (as did your `git rm --cached`) so you still have to contend with that. +1. `diff` before moving the file. You have to type the relpath anyway to move the file so might as well just type the relpath into diff instead of mv. + +It's unfortunately fairly antithetical to modify any untracked file by `git`. This includes modifying symlink paths. Therefore the existing friction is actually helping new users figure out the proper way of doing things in a git environment IMHO. +"""]]
Added a comment: Ignoring files on directory special remote
diff --git a/doc/special_remotes/directory/comment_23_d6a4a7bd602260051eef1cc1c57bf01a._comment b/doc/special_remotes/directory/comment_23_d6a4a7bd602260051eef1cc1c57bf01a._comment new file mode 100644 index 0000000000..b885217e1d --- /dev/null +++ b/doc/special_remotes/directory/comment_23_d6a4a7bd602260051eef1cc1c57bf01a._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="ruka" + avatar="http://cdn.libravatar.org/avatar/8844137c8ca327cdd49ed692f0a30e02" + subject="Ignoring files on directory special remote" + date="2025-06-15T14:41:28Z" + content=""" +Is there a way for git-annex to completely ignore some files on a directory special remote? I'm managing files on my MP3 player's SD card using a directory special remote, but git-annex also tries to manage the player's database files, which I don't want. If I exclude them from wanted or put them in .gitignore, git-annex tries to delete them on sync or export. +"""]]
diff --git a/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn b/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn new file mode 100644 index 0000000000..f3c37c5327 --- /dev/null +++ b/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn @@ -0,0 +1,45 @@ +### Please describe the problem. + +I am using a directory special remote with "exporttree=yes" and "importtree=yes" to manage my music collection on the SD card for my Tangara. Some filenames produce an "invalid argument" error when git-annex tries to export them to the card even though the filename is perfectly valid for vfat. The main commonality seems to be multiple dots in the filename, though other files with multiple dots work fine. + + +### What steps will reproduce the problem? + +1. Create a repository with a bunch of files that have multiple dots in them in different places +2. Create a directory special remote on a vfat filesystem with "exporttree=yes" and "importtree=yes" and no encryption +3. Attempt to export or sync files to the directory special remote + +### What version of git-annex are you using? On what operating system? + +10.20250605-gb9e3cf8780a04c8b1ac0cf4768c9ec510483477c +Linux Mint + +### Please provide any additional information below. + +[[!format sh """ +$ git annex sync --content +commit +On branch main +nothing to commit, working tree clean +ok +list tangara ok +update refs/remotes/tangara/main ok +unexport tangara Music/Cloudpunk/City of Ghosts/07. Home is Now.mp3 ok +... +unexport tangara .git-annex-tmp-content-SHA256E-s7284686--102594598eea9c5e7fd96ef20e9d5fd0485244716a1b5e95a528ca887a81ae59.mp3 ok +... +export tangara Music/Cloudpunk/City of Ghosts/01 - Bandit Queens.mp3 ok +... +export tangara Music/KIRA/KIRA - The Introduction (Deluxe Edition)/KIRA - The Introduction (Deluxe Edition) - 05 Games (feat. 
Ruby & Gumi).ogg + /media/ciara/F0F5-1E76/Music/KIRA/KIRA - The Introduction (Deluxe Edition)/: openTempFile template KIRA - The Introduction (Deluxe Edition) - 05 Games (feat. : invalid argument (Invalid argument) +failed +export tangara Music/KIRA/KIRA ft. GUMI - Burn Me Down.ogg + /media/ciara/F0F5-1E76/Music/KIRA/: openTempFile template KIRA ft. GUMI - : invalid argument (Invalid argument) +failed +... +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I use it quite successfully to archive media on removable spinning hard drives.
response
diff --git a/doc/forum/special_remote_protocol__58___How_to_identify_exactsize/comment_1_bbc07fed3ef028d5551932853dba3f45._comment b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize/comment_1_bbc07fed3ef028d5551932853dba3f45._comment new file mode 100644 index 0000000000..087d6c6b85 --- /dev/null +++ b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize/comment_1_bbc07fed3ef028d5551932853dba3f45._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-13T12:35:50Z" + content=""" +`git-annex examinekey --format='${bytesize}\n'` + +Or `git-annex examinekey --json` and use the `bytesize` field. + +(You will probably want to use `--batch` to keep a single examinekey +process running, for speed.) + +Note that not all keys have a known size. Usually keys without a known size +were added with eg `git-annex addurl --fast`. Encrypted keys also won't have +a size field. + +Also, when chunking is used with a special remote (without +encryption), each chunk is a key, with its size field set to the total size +of the original key. In that case there is a separate chunk size field, +although the last chunk may be smaller than its chunk size field. +If it would be useful, examinekey could have something added to it to +indicate when a key is a chunk key, and show the chunk size. +"""]]
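`examinekey` remains the authoritative way to get the size. Purely as an illustration of what the "size field" in a key looks like, here is a minimal sketch that reads it straight out of the key name. It assumes the `BACKEND-s<size>...--<name>` layout visible in the keys quoted elsewhere in this feed; `key_size` is a hypothetical helper, not a git-annex command, and it returns failure for keys that carry no size field.

```bash
# Print the size field embedded in a key name; return 1 if there is none.
# Assumes the layout BACKEND-s<size>[-other fields]--<name>.
key_size() {
    local fields="${1%%--*}"        # everything before the first "--"
    local parts f
    IFS=- read -r -a parts <<< "$fields"
    for f in "${parts[@]:1}"; do    # skip the backend name
        if [[ "$f" =~ ^s([0-9]+)$ ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}

key_size 'SHA256E-s7284686--102594598eea9c5e7fd96ef20e9d5fd0485244716a1b5e95a528ca887a81ae59.mp3'
```

For the key above this prints `7284686`. For a chunk key it would report the total size of the original key rather than the chunk's own size, which matches the caveat in the comment.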
response
diff --git a/doc/forum/Import_-_Changing_Largefiles/comment_1_1d560f2c337e5f067f42a3088e686467._comment b/doc/forum/Import_-_Changing_Largefiles/comment_1_1d560f2c337e5f067f42a3088e686467._comment new file mode 100644 index 0000000000..85cf6c1417 --- /dev/null +++ b/doc/forum/Import_-_Changing_Largefiles/comment_1_1d560f2c337e5f067f42a3088e686467._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-13T12:27:35Z" + content=""" +git-annex has to maintain a considerable amount of state about the content +of a special remote in order to efficiently import trees from it, and this +caching is what is preventing the new configuration of annex.largefiles +from being used. + +In particular, git-annex knows the content identifier associated with the +file you imported before. And the key associated with that content +identifier is present in the repository. So it uses the existing content +rather than download it again. + +While it would be possible to either remove enough information from the +git-annex branch to defeat that, or modify git-annex to have a mode where +it redoes expensive work, it seems to me to be easier to just treat this as +a case of an annexed file that you want to change to be stored in git +instead. Since that is a general problem, with a general solution. See +[[tips/largefiles]], "converting annexed to git". +"""]]
revert man page changes
Revert "Linked to discussion on caveat"
This reverts commit 9fe60062a38228594ce8d48bbe1b14532934f22d.
We don't link from man pages to forum discussions. If there is a
problem, it should be fixed, and if there is a wart it should be
documented on the man page in enough detail to understand on its own.
In this case, I don't know that there is any problem at all.
diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn index 71b7dedbb4..e78fa0ac14 100644 --- a/doc/git-annex-import.mdwn +++ b/doc/git-annex-import.mdwn @@ -229,12 +229,10 @@ link, and that symbolic link will be followed. Note that using `--deduplicate` or `--clean-duplicates` with the WORM backend does not look at file content, but filename and mtime. -If `annex.largefiles` is configured (in the current repo's `.gitattributes` file), -and does not match a file, `git annex import` will add the non-large file directly to the git repository, +If annex.largefiles is configured, and does not match a file, `git annex +import` will add the non-large file directly to the git repository, instead of to the annex. -[[Caveat Discussion: Adjusting Largefiles Specification|forum/Import_-_Changing_Largefiles]] - # SEE ALSO [[git-annex]](1)
Added a comment: OK I may have overcomplicated things
diff --git a/doc/forum/Move_part_of_one_repository_into_other/comment_2_6845c2bec20af80f7457386f403e3bb0._comment b/doc/forum/Move_part_of_one_repository_into_other/comment_2_6845c2bec20af80f7457386f403e3bb0._comment new file mode 100644 index 0000000000..5cbe60a430 --- /dev/null +++ b/doc/forum/Move_part_of_one_repository_into_other/comment_2_6845c2bec20af80f7457386f403e3bb0._comment @@ -0,0 +1,29 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="OK I may have overcomplicated things" + date="2025-06-11T21:47:32Z" + content=""" +Turns out, the answer is simple: + +1. `git rm --cached \"B\"` +1. (in `B`): + 1. `git add` + 1. `git remote add tmp.parent <relpath/from/B/root/to/A/root>` + 1. `git annex get` + 1. `git remote remove tmp.parent` + +***if you need just the files moved around*** + +I haven't used metadata so I can't comment on how to move that around but you might have to rely on something akin to my first comment. +In my brief testing, because metadata is stored in the `git-annex` branch on a per-key level, it does in fact require merging of the git-annex branch somehow to transfer. + +In short: `git-annex` can get file content in both an *informed* and *uninformed* way. +If `git-annex` knows about content in a repo because of historic moves/copies-to or merging of `git-annex` branches, +it has *informed* knowledge of what's in certain remotes. +If it does not, then it can still do an *uninformed* query for potential file content. +In this way, e.g. `git annex info` and `git annex list` may show file content as not in a particular remote, +but a `git annex get` or `git annex move` *may actually still work*. + + +"""]]
Added a comment: wait, something's up with the suffix!
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_2_1f276654e46182acc1f55f65c4b95dc1._comment b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_2_1f276654e46182acc1f55f65c4b95dc1._comment new file mode 100644 index 0000000000..cbf370e544 --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_2_1f276654e46182acc1f55f65c4b95dc1._comment @@ -0,0 +1,85 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="wait, something's up with the suffix!" + date="2025-06-11T11:25:03Z" + content=""" +Hm, Just the non-ascii characters can't be the problem. Only with the suff `.pdf` or `.pdff` it fails. Without the suffix or just `.pd` or longer extensions `.pdfff` it works 🤪 + +[[!format bash \"\"\" +yann in yann-desktop-nixos in …/nonascii on main as 🧙 +🐟 ❮ touch 🌸.txt 🎶.txt ★.txt α.txt β.txt δ.txt 乙.txt 山.txt 川.txt 空.txt 愛.txt 心.txt 学.txt 数.txt 詩.txt 韓.txt 北.txt 南.txt 墨.txt 漬.txt 墨漬.txt \"墨漬 \" \"墨漬 Ink\" \"墨漬 Ink Stains\" \"墨漬 Ink Stains.pdf\" \"墨漬 Ink Stains.\" \"墨漬 Ink Stains.p\" \"墨漬 Ink Stains.pd\" \"墨漬 Ink Stains.pdf\" \"墨漬 Ink Stains.pdff\" \"墨漬 Ink Stains.pdfff\" \"墨漬 Ink Stains.pdffff\" +yann in yann-desktop-nixos in …/nonascii on main [?] as 🧙 +🐟 ❯ git annex add --jobs 1 +add α.txt +ok +add β.txt +ok +add δ.txt +ok +add ★.txt +ok +add 乙.txt +ok +add 北.txt +ok +add 南.txt +ok +add 墨.txt +ok +add 墨漬 +ok +add 墨漬 Ink +ok +add 墨漬 Ink Stains +ok +add 墨漬 Ink Stains. 
+ok +add 墨漬 Ink Stains.p +ok +add 墨漬 Ink Stains.pd +ok +add 墨漬 Ink Stains.pdf + +git-annex: createSymbolicLink '.git/annex/objects/7x/w0/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdf/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdf' to '.git/annex/othertmp/.0': already exists (File exists) +failed +add 墨漬 Ink Stains.pdff + +git-annex: createSymbolicLink '.git/annex/objects/7p/22/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdff/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdff' to '.git/annex/othertmp/.0': already exists (File exists) +failed +add 墨漬 Ink Stains.pdfff +ok +add 墨漬 Ink Stains.pdffff +ok +add 墨漬.txt +ok +add 学.txt +ok +add 山.txt +ok +add 川.txt +ok +add 心.txt +ok +add 愛.txt +ok +add 数.txt +ok +add 漬.txt +ok +add 空.txt +ok +add 詩.txt +ok +add 韓.txt +ok +add 🌸.txt +ok +add 🎶.txt +ok +(recording state in git...) +add: 2 failed +yann in yann-desktop-nixos in …/nonascii on main [+?] as 🧙 +❌1 🐟 ❯ +\"\"\"]] +"""]]
Added a comment: confirm, but not all non-ascii characters are a problem
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_1_f33a3cce42b31952562ff2688b8bae8f._comment b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_1_f33a3cce42b31952562ff2688b8bae8f._comment new file mode 100644 index 0000000000..bb09292e9e --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_1_f33a3cce42b31952562ff2688b8bae8f._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="confirm, but not all non-ascii characters are a problem" + date="2025-06-11T11:11:19Z" + content=""" +I can confirm this behaviour. But it's not precisely \"non-ascii characters\" that cause this, emojis and greek letters for example are no problem. + +[[!format bash \"\"\" +🐟 ❯ touch \"墨漬 Ink Stains.pdf\" +🐟 ❯ touch 📝.txt +🐟 ❯ touch σ.txt +🐟 ❯ git annex add +add 📝.txt ok +add 墨漬 Ink Stains.pdf +git-annex: createSymbolicLink '.git/annex/objects/7x/w0/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdf/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdf' to '.git/annex/othertmp/.0': already exists (File exists) +failed +add σ.txt ok +(recording state in git...) +add: 1 failed +\"\"\"]] +"""]]
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn new file mode 100644 index 0000000000..50e3f1ba51 --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn @@ -0,0 +1,52 @@ +### Please describe the problem. + +In a large import, three files (all with non-ascii names) gave the following error: `git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists)` + +I've tried to extract the relevant part of a `strace -f`: + +``` +mkdir(".git/annex/othertmp", 0777) = -1 EEXIST (File exists) +newfstatat(AT_FDCWD, ".git/annex/othertmp", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0 +mkdir(".git/annex/othertmp", 0777) = -1 EEXIST (File exists) +newfstatat(AT_FDCWD, ".git/annex/othertmp", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0 +mkdir(".git/annex/othertmp/.0", 0777) = 0 +unlink(".git/annex/othertmp/.0") = -1 EISDIR (Is a directory) +symlink("../../../../../.git/annex/objects/9w/wJ/SHA256E-s5426861--cdc0664822c9df3ffbf255d160870fc39a6fdd1168b02fc2c9b59cc65bc81c26.pdf/SHA256E-s5426861 +--cdc0664822c9df3ffbf255d160870fc39a6fdd1168b02fc2c9b59cc65bc81c26.pdf", ".git/annex/othertmp/.0") = -1 EEXIST (File exists) +newfstatat(AT_FDCWD, ".git/annex/othertmp/.0", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0 +newfstatat(AT_FDCWD, ".git/annex/othertmp/.0", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 +openat(AT_FDCWD, ".git/annex/othertmp/.0", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 31 +fstat(31, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 +getdents64(31, 0x7f2694001180 /* 2 entries */, 32768) = 48 +getdents64(31, 0x7f2694001180 /* 0 entries */, 32768) = 0 +close(31) = 0 +rmdir(".git/annex/othertmp/.0") = 0 +close(26) = 0 +``` + +### What steps will reproduce the problem? 
+ +``` +touch "墨漬 Ink Stains.pdf" +git annex add "墨漬 Ink Stains.pdf" +``` + +(the file name base64 encoded is `5aKo5rysIEluayBTdGFpbnMucGRm`) + + +### What version of git-annex are you using? On what operating system? + +``` +git-annex version: 10.20250605-gd2dc318a867f571cbc848b5d45e82e153e364e4e +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24 bloomfilter-2.0.1.2 crypton-1.0.0 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.8 http-client-0.7.18 persistent-sqlite-2.13.1.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +``` + +I'm running Arch Linux (kernel 6.15.1-arch1-2). The repo I'm running the commands in is on an ext4 filesystem. + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +git-annex has been brilliant for managing my large media collection across several removable drives, and I'm confident it will continue to scale. This is the first issue I've run into with it. + +
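Because the exact bytes of the file name matter here, one way to reproduce the report without relying on copy and paste is to decode the base64 form given above. This is a small sketch; the variable name `name` is arbitrary, and `base64 -d` assumes a GNU coreutils style `base64`.

```bash
# Recreate the file name from the base64 string quoted in the bug report.
name="$(printf '%s' '5aKo5rysIEluayBTdGFpbnMucGRm' | base64 -d)"
printf '%s\n' "$name"

# To reproduce inside a test repository:
#   touch -- "$name" && git annex add -- "$name"
```

The decoded value is `墨漬 Ink Stains.pdf`, the same name used in the reproduction steps.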
Added a comment
diff --git a/doc/special_remotes/compute/comment_7_8fa94e05ff0e67ae87a13132bfb40b61._comment b/doc/special_remotes/compute/comment_7_8fa94e05ff0e67ae87a13132bfb40b61._comment new file mode 100644 index 0000000000..08854589fc --- /dev/null +++ b/doc/special_remotes/compute/comment_7_8fa94e05ff0e67ae87a13132bfb40b61._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 7" + date="2025-06-10T12:01:14Z" + content=""" +I've realised that... I'm overlooking that the input filename itself is metadata. I have a methodology that I like now. + +As per: `git-annex addcomputed --to=imageconvert foo.jpeg foo.gif`, where foo. is linking metadata, I can just generate a filename (and as I've learnt, path), that links back to the source by retaining it. + +I also see now that there is no need to avoid duplication of pointer files to the same computed file by key. + +The uncomplicated existing approach is more than sufficient. + +"""]]
diff --git a/doc/users/Spencer.mdwn b/doc/users/Spencer.mdwn index b521ab0894..382ce88aa9 100644 --- a/doc/users/Spencer.mdwn +++ b/doc/users/Spencer.mdwn @@ -1,4 +1,4 @@ ---- +[[!meta author="Spencer"]] ## Contributions
diff --git a/doc/forum/Import_-_Changing_Largefiles.mdwn b/doc/forum/Import_-_Changing_Largefiles.mdwn index a49fe3d970..88cf543cec 100644 --- a/doc/forum/Import_-_Changing_Largefiles.mdwn +++ b/doc/forum/Import_-_Changing_Largefiles.mdwn @@ -1,3 +1,5 @@ +[[!meta author="Spencer"]] + # Changing Largefile Specification for Imported Trees If you want files to be large/small *after* already importing a tree from an `importtree` enabled remote, well, it appears you can't.
Added a comment
diff --git a/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment b/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment new file mode 100644 index 0000000000..8606110243 --- /dev/null +++ b/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment @@ -0,0 +1,58 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 6" + date="2025-06-09T13:09:25Z" + content=""" +I'm getting acquainted with this special remote. I cannot praise it enough. It is brilliant. + +This is my first cut git-annex-compute-stripexif: + +[[!format bash \"\"\" +#!/bin/bash + +set -e + +if [ -z \"$1\" ]; then + echo \"Specify the input image file, followed by the output image file.\" >&2 + echo \"Example: foo.jpg foo.gif\" >&2 + exit 1 +fi + +echo REPRODUCIBLE +echo \"INPUT $1\" +read input + +if [ -n \"$input\" ]; then + tf=$(mktemp) + cp \"$input\" \"$tf\" >&2 + exiftool -overwrite_original -ALL= \"$tf\" >&2 + outfile=\"SANSEXIF-\"$(git-annex calckey \"$tf\") +fi +echo \"OUTPUT $outfile\" +read output + +cp -v \"$tf\" \"$outfile\" >&2 +rm -v \"$tf\" >&2 +\"\"\"]] + +Along the way, I've learnt that EXIF metadata isn't the only metadata stored in a jpeg, so the name is now a bit of a misnomer. Also, as it was more proof-of-concept, the target name and location is not well thought out, and there's no preservation of file extension. It's indicative for now. + +The aim is to aid (only) in the identifying two copies of the same jpeg, where only the metadata has been changed (eg. either by adjustments I made by script eons ago, or by apps like Microsoft photoviewer where orientation changes were made via metadata). I say aid only, because it's not going to help if the image is resized, etc. and I understand that. + +To that end, I do have some questions. The first is... 
is it wise (or possible) to try to set metadata on the source files whilst in the script? (since writing this, I have come to understand that the compute script is not run within the working directory, and the implication is that you're not meant to run any git-annex commands) + +Obviously, the idea would be to tag the source file with the computed key. I have already verified that if two copies of a jpeg that differ only by metadata, the computed file and key will be the same. + +But what I found is, if I don't have that option to set metadata, then respectfully, git-annex-findcomputed may have some deficiencies. + +From what I can gather, git-annex-findcomputed will not list the subsequent input file that when added, computes it. Only the first one. + +So trying to post process the computed files to perform the setting of metadata on the source files would likely not work. + +Also, I was curious about what happens if the input file moves within the archive? I haven't tried... but from what I can see, you wouldn't be able to backtrack from the computed file, because you won't know the key of the input file, in turn to go searching for it (eg. git-annex-whereused). + +Is my use case way off base as to why you should use the compute remote? + +"""]]
For one: why would preview show a nonexistent page as an existent link instead of a question mark? For two: why is []() syntax relative to current page but [[|]] syntax is relative to root?
diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn index a0ecc9fa08..71b7dedbb4 100644 --- a/doc/git-annex-import.mdwn +++ b/doc/git-annex-import.mdwn @@ -233,7 +233,7 @@ If `annex.largefiles` is configured (in the current repo's `.gitattributes` file and does not match a file, `git annex import` will add the non-large file directly to the git repository, instead of to the annex. -[Caveat Discussion: Adjusting Largefiles Specification](forum/Import_-_Changing_Largefiles) +[[Caveat Discussion: Adjusting Largefiles Specification|forum/Import_-_Changing_Largefiles]] # SEE ALSO
Linked to discussion on caveat
diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn index e78fa0ac14..a0ecc9fa08 100644 --- a/doc/git-annex-import.mdwn +++ b/doc/git-annex-import.mdwn @@ -229,10 +229,12 @@ link, and that symbolic link will be followed. Note that using `--deduplicate` or `--clean-duplicates` with the WORM backend does not look at file content, but filename and mtime. -If annex.largefiles is configured, and does not match a file, `git annex -import` will add the non-large file directly to the git repository, +If `annex.largefiles` is configured (in the current repo's `.gitattributes` file), +and does not match a file, `git annex import` will add the non-large file directly to the git repository, instead of to the annex. +[Caveat Discussion: Adjusting Largefiles Specification](forum/Import_-_Changing_Largefiles) + # SEE ALSO [[git-annex]](1)
diff --git a/doc/forum/Import_-_Changing_Largefiles.mdwn b/doc/forum/Import_-_Changing_Largefiles.mdwn new file mode 100644 index 0000000000..a49fe3d970 --- /dev/null +++ b/doc/forum/Import_-_Changing_Largefiles.mdwn @@ -0,0 +1,20 @@ +# Changing Largefile Specification for Imported Trees + +If you want files to be large/small *after* already importing a tree from an `importtree` enabled remote, well, it appears you can't. + +I tried removing the imported branch via `git branch -d --remote <tree>/<branch>`. +While this produces a new clean import commit upon running `import` again, it does *not* respect changes to `.gitattributes`. +Instead, `git-annex` seems to hold onto information about which files were large/small in a given special remote. +So, the only way to change what are considered large files and small files is to create a new special remote entirely :/ + +For most people, this should not be too problematic since the history of imported trees isn't too important, but for some diffs on an external tree may be valuable. +Is there any interest in addressing this issue? +For a better understanding, here is a MWE to reproduce this: + +1. Create an `importtree` enabled special remote for a fresh repo without a `.gitattributes` file (or at least one without `annex.largefiles` attributes) +1. Import (e.g. `gx import -f tree main`) from this tree and note that all files are considered large (e.g. `git log --raw tree/main` -> `git show <hash>`) +1. Modify/create a local `.gitattributes` file (and add it to the index) that would specify one of the tree files as small (i.e. `annex.largefiles` does *not* match) +1. Attempt new import, or do `git branch -d --remote tree/main` and perform new import. +1. Note that all files are still considered large. 
+ +Maybe there's another way of fixing this that I don't know about, but as far as I know, from this point you have to delete the special remote and redo the above now with the desired `.gitattributes` file staged for files in this external tree to be imported as small.
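When working through the reproduction steps above, it can help to confirm what the attributes currently say for a path, independently of whatever the import recorded. A minimal sketch, assuming the setting lives in `.gitattributes` (it can also be set via git config, which this does not inspect); `largefiles_attr` is a hypothetical helper name, while `git check-attr` is the stock git command:

```shell
# Hypothetical helper: print the annex.largefiles attribute git resolves
# for one path, in git's "<path>: <attribute>: <value>" format.
largefiles_attr() {
    git check-attr annex.largefiles -- "$1"
}

# Example, inside a repository whose .gitattributes contains
#   *.txt annex.largefiles=nothing
#
#   largefiles_attr notes.txt   # notes.txt: annex.largefiles: nothing
#   largefiles_attr video.mkv   # video.mkv: annex.largefiles: unspecified
```

This only reports what git resolves from the attributes files; it says nothing about how a previous import from the special remote treated the file.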
Added a comment: Now the current branch is pushed first! 🥳
diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment new file mode 100644 index 0000000000..4910adf894 --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment @@ -0,0 +1,57 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Now the current branch is pushed first! 🥳" + date="2025-06-07T09:39:27Z" + content=""" +Thank you very much joey, I can confirm that the current branch is now pushed first and thus used as the default branch of the newly created repo: + +## New version + +[[!format bash \"\"\" +$ git annex version --raw +10.20250605-gb9e3cf8780a04c8b1ac0cf4768c9ec510483477c$ +$ git init repo +Initialized empty Git repository in /home/yann/Downloads/git-annex.linux/repo/.git/ +$ cd repo +$ git annex init +init ok +(recording state in git...) +$ git remote add homelab ssh://.../yann/testrepo +$ touch bla +$ git annex assist +add bla ok +(recording state in git...) +commit (recording state in git...) +ok +pull homelab ok +push homelab ok +$ git remote show homelab | grep HEAD + HEAD branch: main ✅✅✅✅✅✅✅✅✅✅✅✅✅ +\"\"\"]] + +## Old version + +[[!format bash \"\"\" +🐟 ❯ git annex version --raw +10.20250521-g1a9e6bf26b56c39429d4a096bf733e57e5684e1b +🐟 ❯ git init repo2 +Leeres Git-Repository in /home/yann/Downloads/repo2/.git/ initialisiert +🐟 ❯ cd repo2/ +🐟 ❯ git annex init +init ok +(recording state in git...) +🐟 ❯ git remote add homelab ssh://.../yann/testrepo2 +🐟 ❯ touch bla +🐟 ❯ git annex assist +add bla ok +(recording state in git...) +commit (recording state in git...) 
+ok +pull homelab ok +push homelab ok +🐟 ❯ LC_ALL=C.UTF-8 git remote show homelab | grep HEAD + HEAD branch: synced/main ⚠️⚠️⚠️⚠️⚠️ +\"\"\"]] + +"""]]
Special remote protocol: How to identify exact size of a particular key?
diff --git a/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn new file mode 100644 index 0000000000..472d69d1f9 --- /dev/null +++ b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn @@ -0,0 +1,4 @@ +I'm trying to write a special remote in which it would be really helpful to have the exact size for a particular key. I was thinking of something like the special remote asking git-annex `GETKEYINFO <key-id>` and git-annex responding with some useful info (something like a dictionary of useful values, maybe?). + +I considered running something like `git annex info ..` to figure this out, but realized it's a bad idea (it would be very brittle, and it wouldn't work well with chunked/encrypted remotes at all). Does git-annex typically have this info available? It would be helpful even if it only gave responses in specific cases (e.g. no encryption, since that case will presumably be hard to keep track of). +
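One partial workaround for the question above: many git-annex keys carry the content size in the key name itself, as an `-s` field before the `--` separator (for example `SHA256E-s31390--…`). The following is a minimal sketch of reading that field; `key_size` is a hypothetical helper, not part of git-annex or of the special remote protocol, and it simply fails for keys that have no size field (such as some URL keys).

```shell
# Hypothetical helper: print the size recorded in a git-annex key name.
# Returns non-zero when the key has no "-s<bytes>" field.
key_size() {
    fields=${1%%--*}                      # everything before the first "--"
    rest=${fields#*-}                     # drop the backend name
    [ "$rest" != "$fields" ] || return 1  # no fields after the backend
    while [ -n "$rest" ]; do
        f=${rest%%-*}                     # next "-"-separated field
        case "$f" in
            s[0-9]*) printf '%s\n' "${f#s}"; return 0 ;;
        esac
        [ "$rest" != "$f" ] || break      # that was the last field
        rest=${rest#*-}
    done
    return 1
}
```

The size recorded this way is that of the whole object; as the post notes, chunked or encrypted remotes complicate the picture, and this sketch does not attempt to handle them.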
add news item for git-annex 10.20250605
diff --git a/doc/news/version_10.20250115.mdwn b/doc/news/version_10.20250115.mdwn deleted file mode 100644 index c6b56c47d6..0000000000 --- a/doc/news/version_10.20250115.mdwn +++ /dev/null @@ -1,26 +0,0 @@ -git-annex 10.20250115 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Improve handing of ssh connection problems during - remote annex.uuid discovery. - * log: Support --key, as well as --branch and --unused. - * Avoid verification error when addurl --verifiable is used - with an url claimed by a special remote other than the web. - * Fix installation on Android. - * Allow enableremote of an existing webdav special remote that has - read-only access. - * git-remote-annex: Use enableremote rather than initremote. - * Windows: Fix permission denied error when dropping files that - have the readonly attribute set. - * Added freezecontent-annex and thawcontent-annex hooks that - correspond to the git configs annex.freezecontent and - annex.thawcontent. - * Added secure-erase-annex hook that corresponds to the git config - annex.secure-erase-command. - * Added commitmessage-annex hook that corresponds to the git config - annex.commitmessage-command. - * Added http-headers-annex hook that corresponds to the git config - annex.http-headers-command. - * Added git configs annex.post-update-command and annex.pre-commit-command - that correspond to the post-update-annex and pre-commit-annex hooks. - * Added annex.pre-init-command git config and pre-init-annex hook - that is run before git-annex repository initialization. 
- * Linux standalone builds' bundled rsync updated to fix security holes."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250605.mdwn b/doc/news/version_10.20250605.mdwn new file mode 100644 index 0000000000..5a9016e9f5 --- /dev/null +++ b/doc/news/version_10.20250605.mdwn @@ -0,0 +1,19 @@ +git-annex 10.20250605 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * sync: Push the current branch first, rather than a synced branch, + to better support git forges (gitlab, gitea, forgejo, etc.) which + use push-to-create with the first pushed branch becoming the default + branch. + * Added annex.fastcopy and remote.name.annex-fastcopy config setting. + When set, this allows the copy\_file\_range syscall to be used, which + can eg allow for server-side copies on NFS. (For fastest copying, + also disable annex.verify or remote.name.annex-verify.) + * map: Support --json option. + * map: Improve display of remote names. + * When annex.freezecontent-command or annex.thawcontent-command is + configured but fails, prevent initialization. This allows the user to + fix their configuration and avoid crippled filesystem detection + entering an adjusted branch. + * assistant: Avoid hanging at startup when a process has a *.lock file + open in the .git directory. + * Windows: Fix duplicate file bug that could occur when files were + supposed to be moved across devices."""]] \ No newline at end of file
initial report on "fatal: empty filename in tree entry"
diff --git a/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn b/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn new file mode 100644 index 0000000000..f0ffec47fb --- /dev/null +++ b/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn @@ -0,0 +1,46 @@ +### Please describe the problem. + +`import` manages to import something empty (wild idea, didnt check -- might be a aws s3 key ending with `/` and creating empty named file or folder?) which lead to + +``` +$ git annex initremote s3-origin type=S3 importtree=yes encryption=none autoenable=true bucket=aind-benchmark-data fileprefix=mesoscale-anatomy-cell-detection/ public=yes signature=v4 storageclass=STANDARD port=443 signature=anonymous +... + +$ git annex import --from s3-origin master +... +update refs/remotes/s3-origin/master fatal: empty filename in tree entry +ok +(recording state in git...) + +$ git merge --allow-unrelated-histories s3-origin/master +fatal: empty filename in tree entry + +``` + +watchout if to reproduce -- it is about 12GB + +### What steps will reproduce the problem? + + +### What version of git-annex are you using? On what operating system? 
+ +``` +(venv-annex) dandi@drogon:/mnt/backup/dandi/aind-benchmark-data/mesoscale-anatomy-cell-detection$ git annex version +git-annex version: 10.20250521-gafbe7e15b0f44ffa4c597dffc73b7cbdc0d06820 +build flags: Assistant Webapp Pairing Inotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.1 http-client-0.7.19 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +the version from pypi @mih started to build recently + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + + +[[!meta author=yoh]] +[[!tag projects/dandi]]
Added a comment
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_5_4acd238cbe4eee4edf1311172f24555a._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_5_4acd238cbe4eee4edf1311172f24555a._comment new file mode 100644 index 0000000000..960252d8d7 --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_5_4acd238cbe4eee4edf1311172f24555a._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="datawraith" + avatar="http://cdn.libravatar.org/avatar/e36c82a2b6f3150ad14a24eb7eb85826" + subject="comment 5" + date="2025-06-04T19:26:29Z" + content=""" +> What are the versions of git-annex in the VM where it worked vs where it didn't? + +The version on the VM is the same one I reported in the initial post: 10.20250520, installed via Homebrew. git-annex wasn't originally installed on that VM, so I installed it at that version to test it. + +When everything worked at first, I updated the VM to the Bluefin version I was running on my laptop, thinking that might be the problem, and then had the strange results I reported above. + +Since the git-annex installation itself had not changed between when things worked and when they stopped, I started to suspect something like the kernel bug I mentioned (because the Kernel *had* changed). + +I'm now also having trouble reproducing the problem in the VM at all. The files that were failing before are now added without problems again, as are newly created files -- though I had had to shut down and later restart the VM. I wish I had thought of making a full snapshot when I started experimenting, but I didn't. :-/ + +The only machine that exhibits the problem consistently now is my laptop. + +> And, if you can possibly download and unpack the linuxstandalone tarball, and use that to run git-annex in the bad VM, that would be a useful check that the problem does not somehow involve the homebrew build. 
https://git-annex.branchable.com/install/Linux_standalone/ + +With the standalone tarball (`10.20250521-g1a9e6bf26b56c39429d4a096bf733e57e5684e1b`, using the `./runshell`) `addcomputed` works as expected on my laptop -- Unicode characters are shown with the backslash escape, whereas the Homebrew build alone fails by stripping the unicode characters. + +Hm. + +Running the `git-annex` executable from `/home/linuxbrew/.linuxbrew/bin/` inside of the runshell works as well -- it doesn't strip the characters. That might mean that it is not the Homebrew build that is broken, but that something about my environment is simply screwed up. + +"""]]
tag as INM7 because it involves git-annex integration with forgejo
diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn index d1b7dd8e3d..e179f2e768 100644 --- a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn @@ -14,3 +14,5 @@ However, `git annex assist|sync|push` seem to push `git-annex`, `synced/git-anne Of course the solution is to just `git push` manually before `git annex assist`. But `git annex assist` is already such a brilliant command that does it all, and telling people to just run that to "do the git stuff" is very comfortable and easily accepted. Could the current branch be pushed first? Or is there a reason for pushing all the meta-branches first? > [[fixed|done]] --[[Joey]] + +[[!tag projects/INM7]]
sync: push current branch first
sync: Push the current branch first, rather than a synced branch, to better
support git forges (gitlab, gitea, forgejo, etc.) which use push-to-create
with the first pushed branch becoming the default branch.
This takes considerable complication to filter out the warning message about
receive.denyCurrentBranch when pushing to a non-bare repository. Localization
may break the filtering in the future, but it seems like the best way to handle
this. See my comments for the gory details.
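The filtering described in this commit message can be sketched in shell. This is only an illustration of the idea, not the actual implementation (that is the Haskell `filterstderr` in the Command/Sync.hs diff); `filter_push_stderr` is a hypothetical name. Ordinary lines pass through at once. From the first `remote: ` line onward everything is buffered, and the buffer is dropped at the end only if it mentions `receive.denyCurrentBranch`.

```shell
# Hypothetical sketch: filter git push's stderr, read from stdin.
filter_push_stderr() {
    buf=""
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            "remote: "*)
                # Start (or continue) buffering the remote's message.
                buf="${buf}${line}
" ;;
            *)
                if [ -n "$buf" ]; then
                    # Once buffering has started, keep everything together.
                    buf="${buf}${line}
"
                else
                    # Ordinary output is shown right away.
                    printf '%s\n' "$line"
                fi ;;
        esac
    done
    case "$buf" in
        *receive.denyCurrentBranch*) ;;   # suppress the long complaint
        *) printf '%s' "$buf" ;;          # show any other remote message
    esac
}
```

One possible way to wire it up, keeping git's stdout untouched, is `{ git push "$remote" "$branch" 2>&1 1>&3 | filter_push_stderr >&2; } 3>&1`. This loses git's exit status and ignores the carriage returns git uses for progress updates, so it is a sketch of the approach only.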
diff --git a/CHANGELOG b/CHANGELOG index 55ca8b37ef..d9675a47de 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -14,6 +14,10 @@ git-annex (10.20250521) UNRELEASED; urgency=medium When set, this allows the copy_file_range syscall to be used, which can eg allow for server-side copies on NFS. (For fastest copying, also disable annex.verify or remote.name.annex-verify.) + * sync: Push the current branch first, rather than a synced branch, + to better support git forges (gitlab, gitea, forgejo, etc.) which + use push-to-create with the first pushed branch becoming the default + branch. -- Joey Hess <id@joeyh.name> Thu, 22 May 2025 12:43:38 -0400 diff --git a/Command/Sync.hs b/Command/Sync.hs index 2892768b73..02326a390e 100644 --- a/Command/Sync.hs +++ b/Command/Sync.hs @@ -83,7 +83,6 @@ import Types.Availability import qualified Database.Export as Export import Utility.Bloom import Utility.OptParse -import Utility.Process.Transcript import Utility.Tuple import Utility.Matcher @@ -706,20 +705,13 @@ pushRemote o remote (Just branch, _) = do - Git offers no way to tell if a remote is bare or not, so both methods - are tried. - - - The direct push is likely to spew an ugly error message, so its stderr is - - often elided. Since git progress display goes to stderr too, the - - sync push is done first, and actually sends the data. Then the - - direct push is tried, with stderr discarded, to update the branch ref - - on the remote. + - The direct push is done first, because some hosting providers like + - github may treat the first branch pushed to a new repository as the + - default branch for that repository. - - The sync push first sends the synced/master branch, - and then forces the update of the remote synced/git-annex branch. - - - Since some providers like github may treat the first branch sent - - as the default branch, it's better to make that be synced/master than - - synced/git-annex. 
(Although neither is ideal, it's the best that - - can be managed given the constraints on order.) - - - The forcing is necessary if a transition has rewritten the git-annex branch. - Normally any changes to the git-annex branch get pulled and merged before - this push, so this forcing is unlikely to overwrite new data pushed @@ -728,34 +720,59 @@ pushRemote o remote (Just branch, _) = do - But overwriting of data on synced/git-annex can happen, in a race. - The only difference caused by using a forced push in that case is that - the last repository to push wins the race, rather than the first to push. + - + - The git-annex branch is pushed last. This push may fail if the remote + - has other changes in the git-annex branch, and that is not treated as an + - error, since the synced/git-annex branch has been sent already. Since no + - new data is usually sent in this push (due to synced/git-annex already + - having been pushed), it's ok to hide git's output to avoid displaying + - a push error. -} pushBranch :: Remote -> Maybe Git.Branch -> MessageState -> Git.Repo -> IO Bool -pushBranch remote mbranch ms g = directpush `after` annexpush `after` syncpush +pushBranch remote mbranch ms g = do + directpush + annexpush `after` syncpush where - syncpush = flip Git.Command.runBool g $ pushparams $ catMaybes - [ (refspec . origBranch) <$> mbranch - , Just $ Git.Branch.forcePush $ refspec Annex.Branch.name - ] - annexpush = void $ tryIO $ flip Git.Command.runQuiet g $ pushparams - [ Git.fromRef $ Git.Ref.base $ Annex.Branch.name ] directpush = case mbranch of - Nothing -> noop - -- Git prints out an error message when this fails. - -- In the default configuration of receive.denyCurrentBranch, - -- the error message mentions that config setting - -- (and should even if it is localized), and is quite long, - -- and the user was not intending to update the checked out - -- branch, so in that case, avoid displaying the error - -- message. 
Do display other error messages though, - -- including the error displayed when - -- receive.denyCurrentBranch=updateInstead -- the user - -- will want to see that one. Just branch -> do let p = flip Git.Command.gitCreateProcess g $ pushparams [ Git.fromRef $ Git.Ref.base $ origBranch branch ] - (transcript, ok) <- processTranscript' p Nothing - when (not ok && not ("denyCurrentBranch" `isInfixOf` transcript)) $ - hPutStr stderr transcript + let p' = p { std_err = CreatePipe } + bracket (createProcess p') cleanupProcess $ \h -> do + filterstderr [] (stderrHandle h) (processHandle h) + void $ waitForProcess (processHandle h) + Nothing -> noop + + syncpush = flip Git.Command.runBool g $ pushparams $ catMaybes + [ (syncrefspec . origBranch) <$> mbranch + , Just $ Git.Branch.forcePush $ syncrefspec Annex.Branch.name + ] + + annexpush = void $ tryIO $ flip Git.Command.runQuiet g $ pushparams + [ Git.fromRef $ Git.Ref.base $ Annex.Branch.name ] + + -- In the default configuration of receive.denyCurrentBranch, + -- git's stderr message mentions that config setting + -- (and should even if it is localized), and is quite long, + -- and the user was not intending to update the checked out + -- branch, so in that case, avoid displaying the error + -- message. Do display other error messages though, + -- including the error displayed when + -- receive.denyCurrentBranch=updateInstead; the user + -- will want to see that one. Also display progress messages. 
+ filterstderr buf herr pid = hGetLineUntilExitOrEOF pid herr >>= \case + Just l + | "remote: " `isPrefixOf` l || not (null buf)-> + filterstderr (l:buf) herr pid + | otherwise -> do + hPutStrLn stderr l + filterstderr [] herr pid + Nothing -> displaybuf + where + displaybuf = + unless (any ("receive.denyCurrentBranch" `isInfixOf`) buf) $ + mapM_ (hPutStrLn stderr) (reverse buf) + pushparams branches = catMaybes [ Just $ Param "push" , if commandProgressDisabled' ms @@ -763,7 +780,8 @@ pushBranch remote mbranch ms g = directpush `after` annexpush `after` syncpush else Nothing , Just $ Param $ Remote.name remote ] ++ map Param branches - refspec b = concat + + syncrefspec b = concat [ Git.fromRef $ Git.Ref.base b , ":" , Git.fromRef $ Git.Ref.base $ syncBranch b diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn index 848fbfb30d..d1b7dd8e3d 100644 --- a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn @@ -12,3 +12,5 @@ This is very useful as it enables quick creation of repos without going through However, `git annex assist|sync|push` seem to push `git-annex`, `synced/git-annex`, or `synced/<currentbranch>` (in a seemingly random order? 🤔) **before** pushing `<currentbranch>` itself, causing this first pushed branch to become the repository's default branch. A `git clone ssh://me@myserver.com/me/myrepo` will then result in a local repo with e.g. `synced/main` checked out - or worse - `synced/git-annex`, causing a lot of confusion. Accidentally running `git annex assist` again will produce another level of `synced/synced/main` branches and all that fun stuff. (Very fun time during that summer school where I established git-annex + forgejo as data exchange 😉). Of course the solution is to just `git push` manually before `git annex assist`. 
But `git annex assist` is already such a brilliant command that does it all, and telling people to just run that to "do the git stuff" is very comfortable and easily accepted. Could the current branch be pushed first? Or is there a reason for pushing all the meta-branches first? + +> [[fixed|done]] --[[Joey]] diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment index ec22ced3a8..c04fcd917f 100644 --- a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment @@ -8,14 +8,19 @@ Basically: * We don't know if the remote is bare or non-bare. git does not generally provide a way to tell. -* Pushing to the checked out branch of a non-bare repo will complain on stderr. +* Pushing to the checked out branch of a non-bare repo will complain on + stderr, and the overall git push will fail even if other branches were + successfully pushed. But this is a fairly common use case for `git-annex sync`, and that complaint would be unwanted noise. git progress output also goes to stderr, so /dev/null of stderr is not desirable. * So instead push the synced branches, which doesn't have that problem, and lets - git display progress for the main data transfer. -* Then the current branch is pushed, with stderr collected and displayed - after filtering out denyCurrentBranch error messages. + git display progress for the main data transfer. As long as the + synced/master branch is pushed, the overall push part of sync can be + considered to succeed. +* Then the current branch is pushed, with stderr collected and displayed, + unless it contains the denyCurrentBranch warning message. 
A failure of this + push is not treated as an error. Also this was previously considered and partly addressed in [[!commit 1cc7b2661e5ec60f73f04dbe91940d2602df6246]] which made it push @@ -25,6 +30,7 @@ using a version from before that change. At that point I thought this was a github specific problem, mind. I think that to improve this, git-annex would need to run git push of master -with stderr intercepted and the denyCurrentBranch error message filtered out. +with stderr intercepted and the denyCurrentBranch error message filtered +out, but the rest of stderr (progress, etc) still displayed. Which does seem doable. """]] diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_2_922819fd788abf5b8863ab199c6930cb._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_2_922819fd788abf5b8863ab199c6930cb._comment new file mode 100644 index 0000000000..75cb5b6a13 --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_2_922819fd788abf5b8863ab199c6930cb._comment @@ -0,0 +1,164 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-06-04T15:11:21Z" + content=""" (Diff truncated)
comment
diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment new file mode 100644 index 0000000000..ec22ced3a8 --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-04T13:45:18Z" + content=""" +Command/Sync.hs has a big comment on pushBranch about push order considerations. +Basically: + +* We don't know if the remote is bare or non-bare. git does not generally + provide a way to tell. +* Pushing to the checked out branch of a non-bare repo will complain on stderr. + But this is a fairly common use case for `git-annex sync`, and that + complaint would be unwanted noise. git progress output also goes to stderr, + so /dev/null of stderr is not desirable. +* So instead push the synced branches, which doesn't have that problem, and lets + git display progress for the main data transfer. +* Then the current branch is pushed, with stderr collected and displayed + after filtering out denyCurrentBranch error messages. + +Also this was previously considered and partly addressed in +[[!commit 1cc7b2661e5ec60f73f04dbe91940d2602df6246]] which made it push +synced/master before synced/git-annex, to at least avoid the git-annex branch +becoming the default branch. The varying behavior you're seeing may be due to +using a version from before that change. At that point I thought this was a +github specific problem, mind. + +I think that to improve this, git-annex would need to run git push of master +with stderr intercepted and the denyCurrentBranch error message filtered out. +Which does seem doable. +"""]]
comment
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_4_c39de00401ad7f96fde93305e232139a._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_4_c39de00401ad7f96fde93305e232139a._comment new file mode 100644 index 0000000000..e7c549bc52 --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_4_c39de00401ad7f96fde93305e232139a._comment @@ -0,0 +1,33 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-06-03T19:05:35Z" + content=""" +Nice work investigating this. I would not have guessed a kernel bug might +be involved. But I am not convinced one is, either. + +I agree with your analysis of your strace. The filename is getting into +git-annex ok. Then it runs the compute program with the mangled filename. + +I don't see how a kernel bug would cause git-annex to mangle the filename +though. As far as `git-annex addcomputed` is concerned, the filename is +just a parameter to use as input to the computation. Such parameters are +not limited to filenames actually. And so they pass through `git-annex +addcomputed` without being exposed to any kernel syscall that might do +something wrong on a buggy kernel. + +Unless, that is, the haskell `process` library, or indeed the kernel +itself, does something with parameters passed to the compute program. + +(This strace does rule out my theories around `hGetLineUntilExitOrEOF`.) + +---- + +What are the versions of git-annex in the VM where it worked vs +where it didn't? + +And, if you can possibly download and unpack the linuxstandalone tarball, +and use that to run git-annex in the bad VM, that would be a useful check +that the problem does not somehow involve the homebrew build. +<https://git-annex.branchable.com/install/Linux_standalone/> +"""]]
annex.fastcopy
Added annex.fastcopy and remote.name.annex-fastcopy config setting. When
set, this allows the copy_file_range syscall to be used, which can eg allow
for server-side copies on NFS. (For fastest copying, also disable
annex.verify or remote.name.annex-verify.)
This is a simple implementation, that does not handle resuming as well as
it possibly could.
It can be used with both local git remotes (including on NFS), and
directory special remotes. Other types of remotes could in theory also
support it, so I've left the config documented as a general thing.
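As a rough sketch of how the settings named in this commit message might be switched on: the config keys come from the message itself, while the remote name `mynfs` is only a placeholder.

```shell
# Allow copy_file_range-based copies wherever a remote supports it.
git config annex.fastcopy true

# Or enable it for a single remote; "mynfs" is a hypothetical remote name.
git config remote.mynfs.annex-fastcopy true

# The commit message suggests also disabling verification for the
# fastest copies; this trades away the post-copy content check.
git config remote.mynfs.annex-verify false
```

The per-remote form keeps verification on for every other remote.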
diff --git a/Annex/CopyFile.hs b/Annex/CopyFile.hs index 83bc55e42a..9c9baf2e4f 100644 --- a/Annex/CopyFile.hs +++ b/Annex/CopyFile.hs @@ -1,6 +1,6 @@ {- Copying files. - - - Copyright 2011-2022 Joey Hess <id@joeyh.name> + - Copyright 2011-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -10,6 +10,7 @@ module Annex.CopyFile where import Annex.Common +import qualified Annex import Utility.Metered import Utility.CopyFile import Utility.FileMode @@ -77,6 +78,23 @@ tryCopyCoW (CopyCoWTried copycowtried) src dest meterupdate = data CopyMethod = CopiedCoW | Copied +-- Should cp be allowed to copy the file with --reflink=auto? +-- +-- The benefit is that this lets it use the copy_file_range +-- syscall, which is not used with --reflink=always. The drawback is that +-- the IncrementalVerifier is not updated, so verification, if it is done, +-- will need to re-read the whole content of the file. And, interrupted +-- copies are not resumed but are restarted from the beginning. +-- +-- Using this will result in CopiedCoW being returned even in cases +-- where cp fell back to a slow copy. +newtype FastCopy = FastCopy Bool + +getFastCopy :: RemoteGitConfig -> Annex FastCopy +getFastCopy gc = case remoteAnnexFastCopy gc of + False -> FastCopy . annexFastCopy <$> Annex.getGitConfig + True -> return (FastCopy True) + {- Copies from src to dest, updating a meter. Preserves mode and mtime. - Uses copy-on-write if it is supported. If the destination already - exists, an interrupted copy will resume where it left off. @@ -94,38 +112,49 @@ data CopyMethod = CopiedCoW | Copied - (eg when isStableKey is false), and doing this avoids getting a - corrupted file in such cases. 
-} -fileCopier :: CopyCoWTried -> OsPath -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> IO CopyMethod +fileCopier :: CopyCoWTried -> FastCopy -> OsPath -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> IO CopyMethod +fileCopier copycowtried (FastCopy True) src dest meterupdate iv = do + ok <- watchFileSize dest meterupdate $ const $ + copyFileExternal CopyTimeStamps src dest + if ok + then do + maybe noop unableIncrementalVerifier iv + return CopiedCoW + else fileCopier copycowtried (FastCopy False) src dest meterupdate iv #ifdef mingw32_HOST_OS -fileCopier _ src dest meterupdate iv = docopy +fileCopier _ _ src dest meterupdate iv = + fileCopier' src dest meterupdate iv #else -fileCopier copycowtried src dest meterupdate iv = +fileCopier copycowtried _ src dest meterupdate iv = ifM (tryCopyCoW copycowtried src dest meterupdate) ( do maybe noop unableIncrementalVerifier iv return CopiedCoW - , docopy + , fileCopier' src dest meterupdate iv ) #endif - where - docopy = do - -- The file might have had the write bit removed, - -- so make sure we can write to it. - void $ tryIO $ allowWrite dest - F.withBinaryFile src ReadMode $ \hsrc -> - fileContentCopier hsrc dest meterupdate iv +fileCopier' :: OsPath -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> IO CopyMethod +fileCopier' src dest meterupdate iv = do + -- The file might have had the write bit removed, + -- so make sure we can write to it. + void $ tryIO $ allowWrite dest + + F.withBinaryFile src ReadMode $ \hsrc -> + fileContentCopier hsrc dest meterupdate iv - -- Copy src mode and mtime. - mode <- fileMode <$> R.getFileStatus (fromOsPath src) - mtime <- utcTimeToPOSIXSeconds <$> getModificationTime src - let dest' = fromOsPath dest - R.setFileMode dest' mode - touch dest' mtime False + -- Copy src mode and mtime. 
+ mode <- fileMode <$> R.getFileStatus (fromOsPath src) + mtime <- utcTimeToPOSIXSeconds <$> getModificationTime src + let dest' = fromOsPath dest + R.setFileMode dest' mode + touch dest' mtime False - return Copied + return Copied {- Copies content from a handle to a destination file. Does not - use copy-on-write, and does not copy file mode and mtime. + - Updates the IncrementalVerifier with the content it copies. -} fileContentCopier :: Handle -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> IO () fileContentCopier hsrc dest meterupdate iv = diff --git a/CHANGELOG b/CHANGELOG index 9b7ddf6c5e..55ca8b37ef 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -10,6 +10,10 @@ git-annex (10.20250521) UNRELEASED; urgency=medium * map: Improve display of remote names. * Windows: Fix duplicate file bug that could occur when files were supposed to be moved across devices. + * Added annex.fastcopy and remote.name.annex-fastcopy config setting. + When set, this allows the copy_file_range syscall to be used, which + can eg allow for server-side copies on NFS. (For fastest copying, + also disable annex.verify or remote.name.annex-verify.) 
-- Joey Hess <id@joeyh.name> Thu, 22 May 2025 12:43:38 -0400 diff --git a/Remote/Directory.hs b/Remote/Directory.hs index 372a485ba7..5392caafa3 100644 --- a/Remote/Directory.hs +++ b/Remote/Directory.hs @@ -84,11 +84,12 @@ gen r u rc gc rs = do cst <- remoteCost gc c cheapRemoteCost let chunkconfig = getChunkConfig c cow <- liftIO newCopyCoWTried + fastcopy <- getFastCopy gc let ii = IgnoreInodes $ fromMaybe True $ getRemoteConfigValue ignoreinodesField c return $ Just $ specialRemote c - (storeKeyM dir chunkconfig cow) - (retrieveKeyFileM dir chunkconfig cow) + (storeKeyM dir chunkconfig cow fastcopy) + (retrieveKeyFileM dir chunkconfig cow fastcopy) (removeKeyM dir) (checkPresentM dir chunkconfig) Remote @@ -105,8 +106,8 @@ gen r u rc gc rs = do , checkPresent = checkPresentDummy , checkPresentCheap = True , exportActions = ExportActions - { storeExport = storeExportM dir cow - , retrieveExport = retrieveExportM dir cow + { storeExport = storeExportM dir cow fastcopy + , retrieveExport = retrieveExportM dir cow fastcopy , removeExport = removeExportM dir , checkPresentExport = checkPresentExportM dir -- Not needed because removeExportLocation @@ -118,7 +119,7 @@ gen r u rc gc rs = do { listImportableContents = listImportableContentsM ii dir , importKey = Just (importKeyM ii dir) , retrieveExportWithContentIdentifier = retrieveExportWithContentIdentifierM ii dir cow - , storeExportWithContentIdentifier = storeExportWithContentIdentifierM ii dir cow + , storeExportWithContentIdentifier = storeExportWithContentIdentifierM ii dir cow fastcopy , removeExportWithContentIdentifier = removeExportWithContentIdentifierM ii dir -- Not needed because removeExportWithContentIdentifier -- auto-removes empty directories. @@ -189,8 +190,8 @@ storeDir d k = addTrailingPathSeparator $ {- Check if there is enough free disk space in the remote's directory to - store the key. Note that the unencrypted key size is checked. 
-} -storeKeyM :: OsPath -> ChunkConfig -> CopyCoWTried -> Storer -storeKeyM d chunkconfig cow k c m = +storeKeyM :: OsPath -> ChunkConfig -> CopyCoWTried -> FastCopy -> Storer +storeKeyM d chunkconfig cow fastcopy k c m = ifM (checkDiskSpaceDirectory d k) ( do void $ liftIO $ tryIO $ createDirectoryUnder [d] tmpdir @@ -210,7 +211,7 @@ storeKeyM d chunkconfig cow k c m = in byteStorer go k c m NoChunks -> let go _k src p = liftIO $ do - void $ fileCopier cow src tmpf p Nothing + void $ fileCopier cow fastcopy src tmpf p Nothing finalizeStoreGeneric d tmpdir destdir in fileStorer go k c m _ -> @@ -247,12 +248,12 @@ finalizeStoreGeneric d tmp dest = do mapM_ preventWrite =<< dirContents dest preventWrite dest -retrieveKeyFileM :: OsPath -> ChunkConfig -> CopyCoWTried -> Retriever -retrieveKeyFileM d (LegacyChunks _) _ = Legacy.retrieve locations' d -retrieveKeyFileM d NoChunks cow = fileRetriever' $ \dest k p iv -> do +retrieveKeyFileM :: OsPath -> ChunkConfig -> CopyCoWTried -> FastCopy -> Retriever +retrieveKeyFileM d (LegacyChunks _) _ _ = Legacy.retrieve locations' d +retrieveKeyFileM d NoChunks cow fastcopy = fileRetriever' $ \dest k p iv -> do src <- liftIO $ getLocation d k - void $ liftIO $ fileCopier cow src dest p iv -retrieveKeyFileM d _ _ = byteRetriever $ \k sink -> + void $ liftIO $ fileCopier cow fastcopy src dest p iv (Diff truncated)
comment
diff --git a/doc/todo/use_copy__95__file__95__range_for_get_and_copy/comment_3_04c577bff624af761d997e5f9a8d951d._comment b/doc/todo/use_copy__95__file__95__range_for_get_and_copy/comment_3_04c577bff624af761d997e5f9a8d951d._comment new file mode 100644 index 0000000000..db3be71305 --- /dev/null +++ b/doc/todo/use_copy__95__file__95__range_for_get_and_copy/comment_3_04c577bff624af761d997e5f9a8d951d._comment @@ -0,0 +1,56 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-06-03T16:37:31Z" + content=""" +A config setting may be unnecessary. If git-annex tried to use +`copy_file_range` itself, that would fail with EOPNOTSUPP or EXDEV +when not supported. Then git-annex could use `cp --reflink=always` +as a fallback. + +However, `copy_file_range` is not necessarily inexpensive. Depending on the +filesystem it can still need to read and write the whole file. And, rather +than a single syscall copying the whole file, git-annex would need to call +it repeatedly in chunks in order to display a progress bar. But, making a +lot of syscalls against an NFS filesystem would be its own overhead. + +So there seems to be a tradeoff between progress display and efficiency on +NFS. And if the goal is to maximize speed for NFS with server-side copy, +maybe progress bars are not important enough to have in that case? + +Also, it seems likely to me that you would certainly want to turn off +annex.verify along with using `copy_file_range`, which is already a manual +config setting. So a second config setting would be no big deal. + +---- + +As to other filesystems, I found this comment with an overview as of 2022: +<https://github.com/openzfs/zfs/discussions/4237#discussioncomment-3579635> + +For btrfs, it does reflinking, so no benefit to using it over what +git-annex does now. 
+ +Testing on ext4, `cp --reflink=auto` used `copy_file_range` in a copy on +the same filesystem (it tried it cross-filesystem, but it failed and had to +fall back to a regular copy). So does `cp` with no options. On an SSD, +with big enough files (4 gb or so), I did see noticeable performance +improvements. + +If git-annex did `copy_file_range` in chunks on ext4, it could read each +chunk after it was written to the destination file, and get it from the +page cache. But that would still copy the content of the file into user +space. So the savings from using `copy_file_range` with annex.verify set +on ext4 seem like they would only be in avoiding the userspace to kernel +transfer, with the kernel to userspace transfer still needed. + +That also notes that, on NFS, `copy_file_range` can do a CoW copy when the +underlying filesystem supports it. So with NFS on btrfs or zfs, a single +`copy_file_range` call could result in no more work than a reflink, +optimally efficient. If git-annex did `copy_file_range` on each chunk in +order to display a progress bar, that would be a lot of syscalls in flight +over the network, so noticeably slower. + +All of this is making me lean toward a config setting that enables +`copy_file_range`, without progress bars, and that is intended to be +used with annex.verify disabled in order to get optimal performance. +"""]]
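The ext4 observation in the comment above could be checked along these lines. This is only a sketch, assuming GNU `cp` and optionally `strace` are available; the function name and scratch file names are made up for illustration, and the directory from `mktemp -d` may not be on ext4, so what the trace shows can differ from the comment.

```shell
#!/bin/sh
# Copy a scratch file with cp --reflink=auto and report whether the copy
# matches the source. When strace is usable, any copy_file_range calls
# made by cp are printed on stderr.
check_reflink_auto_copy() {
    workdir=$(mktemp -d) || return 1
    src="$workdir/src.bin"
    dst="$workdir/dst.bin"

    # A small 8 MiB test file; the comment used files of around 4 gb.
    dd if=/dev/zero of="$src" bs=1024 count=8192 2>/dev/null

    # Trace only copy_file_range. Fall back to a plain cp when strace is
    # missing or not permitted to trace (eg in some containers).
    if command -v strace >/dev/null 2>&1 &&
       strace -f -e trace=copy_file_range cp --reflink=auto "$src" "$dst"; then
        :
    else
        cp --reflink=auto "$src" "$dst"
    fi

    # Whichever method cp picked, the result should be byte-identical.
    if cmp -s "$src" "$dst"; then
        echo "copy ok"
        status=0
    else
        echo "copy differs"
        status=1
    fi

    rm -rf "$workdir"
    return "$status"
}

check_reflink_auto_copy
```

Swapping `--reflink=auto` for `--reflink=always` in the sketch makes `cp` fail outright on filesystems without reflink support, which is the behaviour the comment contrasts against.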
Added a comment
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_3_70a0b4c7bad79a917fa0b9a8526ec428._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_3_70a0b4c7bad79a917fa0b9a8526ec428._comment new file mode 100644 index 0000000000..844ba7e88d --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_3_70a0b4c7bad79a917fa0b9a8526ec428._comment @@ -0,0 +1,61 @@ +[[!comment format=mdwn + username="datawraith" + avatar="http://cdn.libravatar.org/avatar/e36c82a2b6f3150ad14a24eb7eb85826" + subject="comment 3" + date="2025-06-03T17:05:58Z" + content=""" +Thank you for taking the time to look into this! + +I am indeed using Homebrew on Linux. I'm on [Bluefin](https://projectbluefin.io/), which uses Fedora Silverblue as a base. Software there is generally installed either as Flatpak or via Homebrew because the root image is immutable. + +I see the behavior on both my desktop and laptop, both running a recent Bluefin version (bluefin-dx:latest, based on Fedora Silverblue 42), but it just occurred to me that I could try it in a virtual machine, too. + +When using a slightly older release of Bluefin I had on that VM, everything worked fine, but when I updated to the latest version, the `addcomputed` command started failing. Interestingly it works fine with files that were created before the update -- including with unicode filenames --, but when I create a new file with unicode characters after updating to the latest image, addcomputed fails on those, which seems to indicate this is **likely not a git-annex problem after all**. + +After a bit of research, I found [this](https://www.phoronix.com/news/Linux-Reverts-Special-Char-Uni) Linux problem that broke unicode handling in filenames, but I'm by no means certain that that is the cause of the problem, and if it is, there might be nothing you can do in git-annex to fix it. + +Unless you want to pursue this further, I'm fine with just closing the bug as not applicable. 
+ +--- + +Still, I've added the requested strace log below -- I couldn't see a meaningful difference between the logs that worked, and the ones that failed, other than the failure itself and the missing unicode character escapes. + +Grepping the failure strace log for \"filename\" yields the following: + +``` +14497 execve(\"/usr/sbin/git\", [\"git\", \"annex\", \"addcomputed\", \"--to=passthrough\", \"\303\204 filename with Unic\303\266de ch\303\244ra\"..., \"foo.txt\"], 0x7ffe655d3190 /* 81 vars */) = 0 +14498 execve(\"/home/linuxbrew/.linuxbrew/bin/git-annex\", [\"/home/linuxbrew/.linuxbrew/bin/g\"..., \"addcomputed\", \"--to=passthrough\", \"\303\204 filename with Unic\303\266de ch\303\244ra\"..., \"foo.txt\"], 0x55d77d2df560 /* 82 vars */ <unfinished ...> +14507 execve(\"/usr/libexec/git-core/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */) = -1 ENOENT (No such file or directory) +14507 execve(\"/home/linuxbrew/.linuxbrew/bin/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */) = -1 ENOENT (No such file or directory) +14507 execve(\"/home/linuxbrew/.linuxbrew/sbin/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */) = -1 ENOENT (No such file or directory) +14507 execve(\"/var/home/myusername/.local/bin/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */) = -1 ENOENT (No such file or directory) +14507 execve(\"/var/home/myusername/bin/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */ <unfinished ...> +14507 write(1, \"INPUT filename with Unicde chra\"..., 42) = 42 
+14498 <... read resumed>\"INPUT filename with Unicde chra\"..., 8192) = 42 +14498 write(19, \":./ filename with Unicde chracte\"..., 39 <unfinished ...> +14508 read(0, \":./ filename with Unicde chracte\"..., 4096) = 39 +14508 write(1, \":./ filename with Unicde chracte\"..., 47) = 47 +14498 read(20, \":./ filename with Unicde chracte\"..., 8192) = 47 +14498 write(19, \":./ filename with Unicde chracte\"..., 39) = 39 +14508 <... read resumed>\":./ filename with Unicde chracte\"..., 4096) = 39 +14508 write(1, \":./ filename with Unicde chracte\"..., 47 <unfinished ...> +14498 read(20, \":./ filename with Unicde chracte\"..., 8192) = 47 +``` + +Also with the /tmp/passthrough.log commented out. + +I haven't used strace before, but if I'm reading this right, it looks like the characters get lost as or after git-annex receives them, but before the passthrough script is called. There is a ton of output between the git-annex execve (14498) and the one for the passthrough script (14507), mostly seems to be loading libraries and examining the .git directory. It also loads the git.mo translation files and system locale settings in-between, but there is no obvious point of failure. + +--- + +Interestingly I get the same behavior for the invalid byte sequence example as for the unicode characters: + +``` +git-annex: The computation needs an input file that is not checked into the git repository: invalid +failed +addcomputed: 1 failed +``` + +They are simply stripped. + +"""]]
followup
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_1_2da72486f0e83e74871706757b0badb6._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_1_2da72486f0e83e74871706757b0badb6._comment new file mode 100644 index 0000000000..e2e06b3473 --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_1_2da72486f0e83e74871706757b0badb6._comment @@ -0,0 +1,43 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-02T17:21:37Z" + content=""" +> I'm running on Linux and my locale is de_DE.UTF-8: +> +> git-annex was installed using Homebrew. + +That's unusual. Linux and Homebrew? I just want to check you didn't +typo there and mean to say you're on OSX. + +Tried just now (including the same locale setting) and it does not fail for me: + + joey@darkstar:~/tmp/c>git annex addcomputed --to=passthrough "Ä filename with Unicöde chäracters.txt" fails.txt + addcomputed passthrough + (adding fails.txt...) (checksum...) + ok + (recording state in git...) + +There are 3 possibilities here: + +1. The unicode characters are getting stripped out before git-annex is run, + eg by your interactive shell or by git. +2. git-annex is stripping out valid (or invalid) unicode. +3. 
"read" or "echo" in your git-annex-compute-passthrough script is + for some reason stripping unicode + +The best way to track down which of these is the problem is `strace`, so could you please try this: + + strace -o log -f git annex addcomputed --to=passthrough "Ä filename with Unicöde chäracters.txt" foo.txt + grep "filename with" log + +Here's how that strace looks for me, when the characters are making it through unscathed: + + 2395608 execve("/usr/bin/git", ["git", "annex", "addcomputed", "--to=passthrough", "\303\204 filename with Unic\303\266de ch\303\244ra"..., "fails3.txt"], 0x7ffc44897f00 /* 69 vars */) = 0 + 2395609 execve("/home/joey/bin/git-annex", ["/home/joey/bin/git-annex", "addcomputed", "--to=passthrough", "\303\204 filename with Unic\303\266de ch\303\244ra"..., "fails3.txt"], 0x55c3cfdf27c0 /* 70 vars */ <unfinished ...> + 2395618 execve("/home/joey/bin/git-annex-compute-passthrough", ["git-annex-compute-passthrough", "\303\204 filename with Unic\303\266de ch\303\244ra"..., "fails3.txt"], 0x42000ec610 /* 70 vars */ <unfinished ...> + 2395618 write(1, "INPUT \303\204 filename with Unic\303\266de "..., 48) = 48 + 2395609 read(16, "INPUT \303\204 filename with Unic\303\266de "..., 8192) = 48 + +(I commented out the passthrough.log writing from the script to keep the strace easier to follow.) +"""]] diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_2_a51f8ba4f0b24c21e107bc33db2412ab._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_2_a51f8ba4f0b24c21e107bc33db2412ab._comment new file mode 100644 index 0000000000..b4627aeac7 --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_2_a51f8ba4f0b24c21e107bc33db2412ab._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-06-02T17:54:05Z" + content=""" +I don't see how git-annex could be stripping even invalid unicode here. 
+When it runs the compute program it uses `process` with `CreatePipe`. That +is documented to use the default encoding. git-annex sets the default +encoding in `useFileSystemEncoding`. + +With that said, git-annex is here using `hGetLineUntilExitOrEOF`, and if +`hGetChar` ever failed with an encoding error, it does look like that +would skip over the problem and return the rest of the string. + +It would not hurt to throw in a `fileEncoding` on the compute process's +handles, but I'd really want to be able to reproduce this first. + +I have also tried with filenames that are not valid unicode at all, and +they pass through ok. Eg: + + invalid_byte_sequence=$'\x80\x81' + echo hi > invalid$(printf %s $invalid_byte_sequence) + git-annex add invalid* + git annex addcomputed --to=passthrough invalid* invalidout + cat invalidout + hi +"""]]
Added a comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_9_60cd62cff48ca72cb3b4a89d0313e10c._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_9_60cd62cff48ca72cb3b4a89d0313e10c._comment new file mode 100644 index 0000000000..3755e75ac3 --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_9_60cd62cff48ca72cb3b4a89d0313e10c._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 9" + date="2025-06-02T17:22:37Z" + content=""" +Kai said + +> The repository is located on an NTFS drive. I don't recall whether it +> was cloned using git clone from within WSL or downloaded directly from +> GitHub, but the repository is stored on an NTFS drive and is accessible +> from WSL. I'm not sure if the cloning method is relevant to this issue, + +"""]]
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames.mdwn b/doc/bugs/compute_remote_fails_for_unicode_filenames.mdwn new file mode 100644 index 0000000000..7ad0ccc47a --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames.mdwn @@ -0,0 +1,136 @@ +### Please describe the problem. + +I'm experimenting with the compute special remote by trying to convert FLAC files to .opus. + +Some of the music files have unicode characters in the filename, which leads to an incorrect error message saying that the file is not checked into the repository. + +It is possible that I'm just doing something wrong here, but as far as I can tell, the unicode characters are simply stripped by git-annex. + + +### What steps will reproduce the problem? + +1. Commit a file with unicode characters in the filename to the git repository +2. Invoke a compute remote with that file +3. git annex complains that the file is not checked into the git repository + +### What version of git-annex are you using? On what operating system? + +I'm running on Linux and my locale is de_DE.UTF-8: + +``` +$ locale +LANG=de_DE.UTF-8 +LC_CTYPE="de_DE.UTF-8" +LC_NUMERIC="de_DE.UTF-8" +LC_TIME="de_DE.UTF-8" +LC_COLLATE="de_DE.UTF-8" +LC_MONETARY="de_DE.UTF-8" +LC_MESSAGES="de_DE.UTF-8" +LC_PAPER="de_DE.UTF-8" +LC_NAME="de_DE.UTF-8" +LC_ADDRESS="de_DE.UTF-8" +LC_TELEPHONE="de_DE.UTF-8" +LC_MEASUREMENT="de_DE.UTF-8" +LC_IDENTIFICATION="de_DE.UTF-8" +LC_ALL= +``` + +git-annex was installed using Homebrew. 
+ +``` +git-annex version: 10.20250520 +build flags: Pairing DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.4 http-client-0.7.19 persistent-sqlite-2.13.3.1 torrent-10000.1.3 uuid-1.3.16 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +### Please provide any additional information below. + +Here is a minimal reproduction of the problem: + +[[!format sh """ +$ git init compute-unicode +$ cd compute-unicode +$ touch "A filename without Unicode characters.txt" +$ touch "Ä filename with Unicöde chäracters.txt" +$ git add . +$ git commit -m "Demo" + +[main (Root-Commit) 3655a71] Demo + 2 files changed, 0 insertions(+), 0 deletions(-) + create mode 100644 A filename without Unicode characters.txt + create mode 100644 "\303\204 filename with Unic\303\266de ch\303\244racters.txt" + +$ git annex init + +init ok +(recording state in git...) + +$ git annex initremote passthrough type=compute program=git-annex-compute-passthrough + +initremote passthrough ok +(recording state in git...) 
+ +$ git annex addcomputed --to=passthrough "A filename without Unicode characters.txt" works.txt + +addcomputed passthrough +(adding works.txt...) (checksum...) +ok +(recording state in git...) + +$ git annex addcomputed --to=passthrough "Ä filename with Unicöde chäracters.txt" fails.txt + +addcomputed passthrough + +git-annex: The computation needs an input file that is not checked into the git repository: filename with Unicde chracters.txt +failed +addcomputed: 1 failed +"""]] + +Note how the unicode characters are simply missing in git-annex's message: " filename with Unicde chracters.txt". + +I first thought this was a problem with my script, but it seems that git-annex strips the Unicode characters before invoking it. + +The passthrough-remote looks like this (adapted from the ImageMagick example): + +```sh +#!/bin/sh +set -e + +if [ -z "$1" ] || [ -z "$2" ]; then + echo "Specify the input file, followed by the output file." >&2 + echo "Example: input.txt output.txt" >&2 + exit 1 +fi + +echo "INPUT: $1" > /tmp/passthrough.log +echo "OUTPUT: $2" >> /tmp/passthrough.log + +echo "INPUT $1" +read input +echo "OUTPUT $2" +read output + + +if [ -n "$input" ]; then + cat "$input" > "$output" +fi +``` + +The log file in /tmp/passthrough.log doesn't have the Unicode characters: + +``` +INPUT: filename with Unicde chracters.txt +OUTPUT: fails.txt +``` + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + + +I've been happily managing my important data (as well as things like my music collection) with git-annex for a few years now, with it making sure that everything has several copies on different external storage media. :-)
Suggest pushing current branch before the meta-branches
diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn new file mode 100644 index 0000000000..848fbfb30d --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn @@ -0,0 +1,14 @@ +Many git forges (gitlab, gitea, forgejo, etc.) support [push-to-create](https://forgejo.org/docs/latest/user/push-to-create/#push-to-create) to create repositories upon the first push, e.g. + +[[!format bash """ +# add the (still nonexistent) remote to this local repo +> git remote add myserver ssh://me@myserver.com/me/myrepo +# push to the remote as if it existed (will create it on the remote) +> git push -u myserver +"""]] + +This is very useful as it enables quick creation of repos without going through a tedious GUI. + +However, `git annex assist|sync|push` seem to push `git-annex`, `synced/git-annex`, or `synced/<currentbranch>` (in a seemingly random order? 🤔) **before** pushing `<currentbranch>` itself, causing this first pushed branch to become the repository's default branch. A `git clone ssh://me@myserver.com/me/myrepo` will then result in a local repo with e.g. `synced/main` checked out - or worse - `synced/git-annex`, causing a lot of confusion. Accidentally running `git annex assist` again will produce another level of `synced/synced/main` branches and all that fun stuff. (Very fun time during that summer school where I established git-annex + forgejo as data exchange 😉). + +Of course the solution is to just `git push` manually before `git annex assist`. But `git annex assist` is already such a brilliant command that does it all, and telling people to just run that to "do the git stuff" is very comfortable and easily accepted. Could the current branch be pushed first? Or is there a reason for pushing all the meta-branches first?
caps
diff --git a/doc/install.mdwn b/doc/install.mdwn index 3d8413e389..23f440689d 100644 --- a/doc/install.mdwn +++ b/doc/install.mdwn @@ -23,7 +23,7 @@ detailed instructions | quick install [[OpenBSD]] | `pkg_add git-annex` [[Android]] | **beta** [[Windows]] | **beta** -[[PyPi]] | `uv tool install git-annex` +[[PyPI]] | `uv tool install git-annex` """]] ## Historical builds diff --git a/doc/install/pypi.mdwn b/doc/install/pypi.mdwn index f80805bc22..9987da4dda 100644 --- a/doc/install/pypi.mdwn +++ b/doc/install/pypi.mdwn @@ -1,3 +1,3 @@ -git-annex is packaged in PyPi for ease of use for python users. +git-annex is packaged in PyPI for ease of use for python users. <https://pypi.org/project/git-annex/>
fix PyPI link
markdown link didn't work, use a subpage
diff --git a/doc/install.mdwn b/doc/install.mdwn index 6c5533fd12..3d8413e389 100644 --- a/doc/install.mdwn +++ b/doc/install.mdwn @@ -23,7 +23,7 @@ detailed instructions | quick install [[OpenBSD]] | `pkg_add git-annex` [[Android]] | **beta** [[Windows]] | **beta** -[PyPi][pypi] | `uv tool install git-annex` +[[PyPi]] | `uv tool install git-annex` """]] ## Historical builds @@ -40,5 +40,3 @@ it [[from source|fromsource]]. * [[autobuild overview|builds]] * [[upgrades]] - -[pypi]: https://pypi.org/project/git-annex/ diff --git a/doc/install/pypi.mdwn b/doc/install/pypi.mdwn new file mode 100644 index 0000000000..f80805bc22 --- /dev/null +++ b/doc/install/pypi.mdwn @@ -0,0 +1,3 @@ +git-annex is packaged in PyPi for ease of use for python users. + +<https://pypi.org/project/git-annex/>
add pypi
diff --git a/doc/install.mdwn b/doc/install.mdwn index 3b6a5058f9..6c5533fd12 100644 --- a/doc/install.mdwn +++ b/doc/install.mdwn @@ -23,6 +23,7 @@ detailed instructions | quick install [[OpenBSD]] | `pkg_add git-annex` [[Android]] | **beta** [[Windows]] | **beta** +[PyPi][pypi] | `uv tool install git-annex` """]] ## Historical builds @@ -39,3 +40,5 @@ it [[from source|fromsource]]. * [[autobuild overview|builds]] * [[upgrades]] + +[pypi]: https://pypi.org/project/git-annex/
diff --git a/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn b/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn new file mode 100644 index 0000000000..21a7c72cb6 --- /dev/null +++ b/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn @@ -0,0 +1,42 @@ +### Please describe the problem. + +I have an old git-annex-remote-rclone remote that I'd like to switch over to the builtin rclone variant. I figured maybe a simple `git annex enableremote REMOTE type=rclone` would do it, but that crashes git-annex: + +``` +$ git annex enableremote remote type=rclone +enableremote remote +git-annex: getRemoteConfigValue externaltype found value of unexpected type PassedThrough. This is a bug in git-annex! +CallStack (from HasCallStack): + error, called at ./Annex/SpecialRemote/Config.hs:206:28 in main:Annex.SpecialRemote.Config + getRemoteConfigValue, called at ./Remote/External.hs:931:35 in main:Remote.External +failed +enableremote: 1 failed +``` + +### What steps will reproduce the problem? + +1. Create a remote using `type=external externaltype=rclone` +2. Try to change it to `type=rclone` + +### What version of git-annex are you using? On what operating system? 
+ +``` +git-annex version: 10.20250521-g1a9e6bf26b56c39429d4a096bf733e57e5684e1b +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.6 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 +BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM UR +L GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +Using the standalone amd64 build on Debian 12. + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I use git-annex for "everything". I have somewhere along the lines of 14TiB stored in various git-annex repositories, synced in various degrees to anywhere between 3 and 10 hosts, with repos dating back to 2012. It's awesome.
update
diff --git a/doc/todo/map__58___add_--json/comment_3_f40842222d964ff1a9e0effba5a2e522._comment b/doc/todo/map__58___add_--json/comment_3_f40842222d964ff1a9e0effba5a2e522._comment new file mode 100644 index 0000000000..0cc882d30e --- /dev/null +++ b/doc/todo/map__58___add_--json/comment_3_f40842222d964ff1a9e0effba5a2e522._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-05-29T16:58:29Z" + content=""" +BTW, my first comment incorrectly described map's spidering capabilities +slightly. Suppose you have a remote on host foo, and that repository has +its own remote on host bar. Then map will ssh to foo to dump the git +config, find the additional urls on bar, and try to ssh to bar to get +the git config of the remote of the remote. And this can continue +artibtarily far, but limited of course by what hosts you can ssh to. +Whether that will be enough for your needs, I don't know. +"""]]
adjust json field names
Avoid using "name" for what git-annex otherwise refers to as a
description.
(For the remotes in the map, the "remote" field should be the remote
name, but there is a bug preventing it from being that.)
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
diff --git a/Command/Map.hs b/Command/Map.hs index 0deb5a6029..9ecceea2f0 100644 --- a/Command/Map.hs +++ b/Command/Map.hs @@ -47,7 +47,7 @@ start = startingNoMessage (ActionItemOther Nothing) $ do umap <- uuidDescMap trustmap <- trustMapLoad - + ifM (outputJSONMap rs trustmap umap) ( next $ return True , do @@ -108,8 +108,8 @@ hostname r basehostname :: Git.Repo -> String basehostname r = fromMaybe "" $ headMaybe $ splitc '.' $ hostname r -{- A name to display for a repo. Uses the name from uuid.log if available, - - or the remote name if not. -} +{- A name to display for a repo. Uses the description + - from uuid.log if available, or the remote name if not. -} repoName :: UUIDDescMap -> Git.Repo -> String repoName umap r | repouuid == NoUUID = fallback @@ -307,14 +307,14 @@ outputJSONMap rs trustmap umap = ] mknode (r, remotes) = JSON.object - [ "name" .= packString (repoName umap r) + [ "description" .= packString (repoName umap r) , "uuid" .= mkuuid (getUncachedUUID r) , "url" .= packString (Git.repoLocation r) , "remotes" .= map mkremote (filterdead id remotes) ] mkremote r = JSON.object - [ "name" .= packString (repoName umap r) + [ "remote" .= packString (repoName umap r) , "uuid" .= mkuuid (getUncachedUUID r) , "url" .= packString (Git.repoLocation r) ] diff --git a/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment b/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment index 15691af277..9463fe12f2 100644 --- a/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment +++ b/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment @@ -10,10 +10,10 @@ Example output, after being passed through `jq` to pretty-print it: { "nodes": [ { - "name": "joey@darkstar:~/tmp/mapbench/a", + "description": "joey@darkstar:~/tmp/mapbench/a", "remotes": [ { - "name": "joey@darkstar:~/tmp/mapbench/b", + "remote": "b", "url": "/home/joey/tmp/mapbench/b", "uuid": 
"645d92d8-6461-43c1-b23c-6dd04dc3a015" } @@ -22,10 +22,10 @@ Example output, after being passed through `jq` to pretty-print it: "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325" }, { - "name": "joey@darkstar:~/tmp/mapbench/b", + "description": "joey@darkstar:~/tmp/mapbench/b", "remotes": [ { - "name": "joey@darkstar:~/tmp/mapbench/a", + "remote": "a", "url": "/home/joey/tmp/mapbench/a", "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325" } @@ -34,10 +34,10 @@ Example output, after being passed through `jq` to pretty-print it: "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015" }, { - "name": "unknown", + "description": "unknown", "remotes": [ { - "name": "joey@darkstar:~/tmp/mapbench/b", + "remote": "b", "url": "/home/joey/tmp/mapbench/b", "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015" }
Added a comment: I need help with this too (c.f. submodule refactor)
diff --git a/doc/forum/Move_part_of_one_repository_into_other/comment_1_c07f4c20b187abf7e3c43021f72d672f._comment b/doc/forum/Move_part_of_one_repository_into_other/comment_1_c07f4c20b187abf7e3c43021f72d672f._comment new file mode 100644 index 0000000000..39bebb460c --- /dev/null +++ b/doc/forum/Move_part_of_one_repository_into_other/comment_1_c07f4c20b187abf7e3c43021f72d672f._comment @@ -0,0 +1,57 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="I need help with this too (c.f. submodule refactor)" + date="2025-05-29T03:42:42Z" + content=""" +I do this quite often because I use a monorepo approach with regular refactoring of subtrees into their own submodules. I have yet to find a bulletproof way to do this on the git-annex side. + +The first step is as simple as `git annex unannex` in `A`, or including `--include \"*\"` if pattern matching is easier. + +- On the `git` side, this logs the files as deleted from the main repo (`src`, let's call her). This is ideal so that you have a record for yourself (with a descriptive commit message) of where you've moved your files to. +- On the `git-annex` side, (once you commit), the file data will eventually become \"unused\" - you'll have to do some combination of `git annex push` and `git annex sync [--cleanup]` to ensure all branches really don't reference those files (including remote branches and `synced/*` branches). + +Now the question is: how do we get the data into the new repo (`dst`) and safely drop from `src`? + +- You could add `dst` as a remote of `src` and pull only `dst`'s `git-annex` branch, which (after moving, re-annexing, and committing the unannexed files to `dst`) now shows as having a copy of those files. (**Warning:** this has bad side-effects). +- You could do the opposite but use `dst` to move any (used) files from `src` (**Warning:** this has bad side-effects). 
+- You could add `dst` as a remote and `move` unused files over (requires a clean unused stack already and having to do the push/sync stuff correctly and fully before the files can be released) +- You could do the opposite and \"copy\" the files *to* `src` first *then* move them over to `dst`. (Required because per `dst`'s knowledge, it has no record of `src` having any keys. I find it logical albeit sad that `git-annex` can't dynamically poll local repos' annexes for file content) +- You could forcibly drop the data either by individual key or once it eventually becomes unused (super unsafe and sad) + +### Conclusions + +- Keep a clean unused stack (`git annex unused` gives nothing) as much as you can, and clean it out before testing out any sort of move/drop operations like this. +- Option 4 is the best so far. Following the initial step of `gx unannex` in `src`: + - Add `src` as a remote in `dst`, `mv` files into `dst`, `gx add` files in `dst`, `gx copy` files from `dst` back to `src`, then do `gx move -f <src>` + - This will only move the files known by `dst`. If it so happens that one of these files is actually duplicate data with something you want to also be in `src`, this *will* drop it and leave no record in `src` of where it went (besides your `git` commit message). + +As described, there are still side effects with Option 4, but it's so far the best option I've devised. +Oh, and if you want to keep `src` around as a remote on `dst` to e.g. remind yourself of various relations, make sure you configure it in `.git/config` with: + +- `annex.sync=false`. This skips it when you do a `git annex sync` +- Delete the `remote.fetch` spec, or add `remote.skipFetchAll=true`. This ensures `git fetch` doesn't fetch all the branch and unrelated objects +- (pray there are no more side-effects) + +Now, what happens if a side-effect does happen and it looks like you lost some content and don't know where it went? `git annex whereis` is no help. 
+Instead, you have to extract the key from the now broken symlink and run `find <> -type f -iname \"<KEY>\"`. Easy enough but kind of scary when it happens to you. + +### Side-Effects of Option 1+2: `git-annex` synchronization + +*DON'T DEAD OPEN INSIDE* + +While this is currently the only way to propagate annex key information, it has bad side-effects: + +- Remotes and known repos start to clutter whichever absorbs the others' `git-annex` branch. For me this is a no-go because I have redundant remotes (an exporttree called `dropbox` in my case) +- If you decide to `dead` these remotes or repos and by coincidence the `git-annex` branch is later absorbed in the other direction, chaos ensues (`dead` is propagated, remote annex key history is killed: especially gross for export/importtrees) + - Best way to avoid this is to `dead`, `forget --drop-dead` then `semitrust UUID`. Many steps, potentially undefined condition. Gross. + +## Potential Feature Requests + +Ideally, I would wish `git-annex` could intelligently scan another repo's annex and populate information about what keys it has simply by what keys are objectively in `.git/annex/objects`. This pulls in the information we care about without cluttering additional information relevant only to each respective repo. +Then, presuming you've set up a remote (`dst`) pointing to this repo (`src`) and run `git annex info`, then `src` should have a list of keys that are inside `dst`, and `gx whereis` from `src` will identify the keys inside `dst`, and `drop` will happily do so. + +- Maybe there could be something called an `acquaintance` repo that is not allowed to be synced, pulled, fetched, pushed to. +- Acquaintances are semitrusted because they're still annex-controlled. +- On removing an acquaintance repo, and running `gx forget`, the list of keys is wiped. +"""]]
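The recovery step described in the comment above (take the key from the broken symlink, then search for a file with that name) could look roughly like this in Python. It is a sketch, not a git-annex feature: `key_from_symlink` and `find_key` are hypothetical helper names, and it assumes the usual layout in which an annexed file's symlink target ends in `.../annex/objects/<hash dirs>/<KEY>/<KEY>`. Unlike `find -iname`, the match here is case-sensitive.

```python
import os


def key_from_symlink(path):
    """Return the annex key that a (possibly broken) symlink points at.

    os.readlink only reads the link target, so it works even when the
    target no longer exists.
    """
    return os.path.basename(os.readlink(path))


def find_key(root, key):
    """Yield regular files under root whose file name equals the key."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so only real content files are reported,
            # like `find -type f`.
            if name == key and os.path.isfile(full) and not os.path.islink(full):
                yield full
```

For example, `list(find_key("/path/to/other/repo", key_from_symlink("lost-file")))` lists candidate locations of the content. Both paths are placeholders.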
Added a comment: Not enough information on special remotes
diff --git a/doc/forum/View_special_remote_information__63__/comment_2_e70275d412564f01446ed45c3a31dc7d._comment b/doc/forum/View_special_remote_information__63__/comment_2_e70275d412564f01446ed45c3a31dc7d._comment new file mode 100644 index 0000000000..1149bac8e0 --- /dev/null +++ b/doc/forum/View_special_remote_information__63__/comment_2_e70275d412564f01446ed45c3a31dc7d._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="guez@e17c318e09fc77b4a5be4cd330364e3a41a96971" + nickname="guez" + avatar="http://cdn.libravatar.org/avatar/ffec09075c5b5cd47832649a306d68c3" + subject="Not enough information on special remotes" + date="2025-05-28T21:58:23Z" + content=""" +You say that the command shows the url used for a WebDAV remote, but this does not seem to be the case any longer: + +``` +$ git annex info sdrive +uuid: d17d5946-d126-4a0e-b6c1-232fb34fb461 +description: sdrive +trust: semitrusted +remote annex keys: 1 +remote annex size: 249.11 kilobytes +``` + +I can get a list of special remotes with `git annex enableremote` but how can I get a more detailed list, with all the information on each special remote: the type, the configuration options (encryption or not, etc.), the URLs? +"""]]
map: Support --json option
Sponsored-by: Dartmouth College's OpenNeuro project
diff --git a/CHANGELOG b/CHANGELOG index 789872e8c0..8fb85dfb1b 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -6,6 +6,7 @@ git-annex (10.20250521) UNRELEASED; urgency=medium configured but fails, prevent initialization. This allows the user to fix their configuration and avoid crippled filesystem detection entering an adjusted branch. + * map: Support --json option. -- Joey Hess <id@joeyh.name> Thu, 22 May 2025 12:43:38 -0400 diff --git a/Command/Map.hs b/Command/Map.hs index ce28ca32c3..0deb5a6029 100644 --- a/Command/Map.hs +++ b/Command/Map.hs @@ -9,8 +9,6 @@ module Command.Map where -import qualified Data.Map as M - import Command import qualified Git import qualified Git.Url @@ -25,12 +23,17 @@ import Logs.Trust import Types.TrustLevel import qualified Remote.Helper.Ssh as Ssh import qualified Utility.Dot as Dot +import qualified Messages.JSON as JSON +import Messages.JSON ((.=)) +import Utility.Aeson (packString) + +import qualified Data.Map as M -- a repo and its remotes type RepoRemotes = (Git.Repo, [Git.Repo]) cmd :: Command -cmd = dontCheck repoExists $ +cmd = dontCheck repoExists $ withAnnexOptions [jsonOptions] $ command "map" SectionQuery "generate map of repositories" paramNothing (withParams seek) @@ -45,19 +48,23 @@ start = startingNoMessage (ActionItemOther Nothing) $ do umap <- uuidDescMap trustmap <- trustMapLoad - file <- (</>) - <$> fromRepo gitAnnexDir - <*> pure (literalOsPath "map.dot") - - liftIO $ writeFile (fromOsPath file) (drawMap rs trustmap umap) - next $ - ifM (Annex.getRead Annex.fast) - ( runViewer file [] - , runViewer file - [ ("xdot", [File (fromOsPath file)]) - , ("dot", [Param "-Tx11", File (fromOsPath file)]) - ] - ) + ifM (outputJSONMap rs trustmap umap) + ( next $ return True + , do + file <- (</>) + <$> fromRepo gitAnnexDir + <*> pure (literalOsPath "map.dot") + + liftIO $ writeFile (fromOsPath file) (drawMap rs trustmap umap) + next $ + ifM (Annex.getRead Annex.fast) + ( runViewer file [] + , runViewer file + [ ("xdot", [File 
(fromOsPath file)]) + , ("dot", [Param "-Tx11", File (fromOsPath file)]) + ] + ) + ) runViewer :: OsPath -> [(String, [CommandParam])] -> Annex Bool runViewer file [] = do @@ -198,7 +205,8 @@ same a b {- reads the config of a remote, with progress display -} scan :: Git.Repo -> Annex Git.Repo scan r = do - showStartMessage (StartMessage "map" (ActionItemOther (Just $ UnquotedString $ Git.repoDescribe r)) (SeekInput [])) + unlessM jsonOutputEnabled $ + showStartMessage (StartMessage "map" (ActionItemOther (Just $ UnquotedString $ Git.repoDescribe r)) (SeekInput [])) v <- tryScan r case v of Just r' -> do @@ -269,7 +277,7 @@ tryScan r configlist ok -> return ok - sshnote = do + sshnote = unlessM jsonOutputEnabled $ do showAction "sshing" showOutput @@ -287,3 +295,33 @@ safely a = do case result of Left _ -> return Nothing Right r' -> return $ Just r' + +outputJSONMap :: [RepoRemotes] -> TrustMap -> UUIDDescMap -> Annex Bool +outputJSONMap rs trustmap umap = + showFullJSON $ JSON.AesonObject $ case mapo of + JSON.Object obj -> obj + _ -> error "internal" + where + mapo = JSON.object + [ "nodes" .= map mknode (filterdead fst rs) + ] + + mknode (r, remotes) = JSON.object + [ "name" .= packString (repoName umap r) + , "uuid" .= mkuuid (getUncachedUUID r) + , "url" .= packString (Git.repoLocation r) + , "remotes" .= map mkremote (filterdead id remotes) + ] + + mkremote r = JSON.object + [ "name" .= packString (repoName umap r) + , "uuid" .= mkuuid (getUncachedUUID r) + , "url" .= packString (Git.repoLocation r) + ] + + mkuuid NoUUID = Nothing + mkuuid u = Just $ packString $ fromUUID u + + filterdead f = filter + (\i -> M.lookup (getUncachedUUID (f i)) trustmap /= Just DeadTrusted) + diff --git a/doc/git-annex-map.mdwn b/doc/git-annex-map.mdwn index debfa1c31a..23585fdae2 100644 --- a/doc/git-annex-map.mdwn +++ b/doc/git-annex-map.mdwn @@ -39,6 +39,10 @@ on that host. Don't display the generated Graphviz file, but save it for later use. 
+* `--json` + + Output the map as a JSON object. + * Also the [[git-annex-common-options]](1) can be used. # SEE ALSO diff --git a/doc/todo/map__58___add_--json.mdwn b/doc/todo/map__58___add_--json.mdwn index 58b21a51f2..5b431ecc3e 100644 --- a/doc/todo/map__58___add_--json.mdwn +++ b/doc/todo/map__58___add_--json.mdwn @@ -6,3 +6,5 @@ Please let me know on how feasible that would be, and any other thoughts you hav [[!meta author=yoh]] [[!tag projects/openneuro]] + +> [[done]] --[[Joey]] diff --git a/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment b/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment new file mode 100644 index 0000000000..15691af277 --- /dev/null +++ b/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment @@ -0,0 +1,51 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-05-28T18:11:34Z" + content=""" +I went ahead and implemented `git-annx map --json`. + +Example output, after being passed through `jq` to pretty-print it: + + { + "nodes": [ + { + "name": "joey@darkstar:~/tmp/mapbench/a", + "remotes": [ + { + "name": "joey@darkstar:~/tmp/mapbench/b", + "url": "/home/joey/tmp/mapbench/b", + "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015" + } + ], + "url": "/home/joey/tmp/mapbench/a", + "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325" + }, + { + "name": "joey@darkstar:~/tmp/mapbench/b", + "remotes": [ + { + "name": "joey@darkstar:~/tmp/mapbench/a", + "url": "/home/joey/tmp/mapbench/a", + "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325" + } + ], + "url": "/home/joey/tmp/mapbench/b", + "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015" + }, (Diff truncated)
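A consumer of the `--json` output might walk the `nodes` list as in the Python sketch below. It assumes the field names from the later "adjust json field names" change listed above (`description` on nodes, `remote` on remotes) rather than the `name` field shown in this diff. The helper name `summarize_map` is hypothetical, and the sample data is abbreviated from the example in the comment. Per the commit message, the `remote` field may not yet hold the actual remote name.

```python
import json


def summarize_map(map_json):
    """Return (description, [remote fields]) pairs from map JSON text."""
    data = json.loads(map_json)
    summary = []
    for node in data.get("nodes", []):
        remotes = [r.get("remote") for r in node.get("remotes", [])]
        summary.append((node.get("description"), remotes))
    return summary


# Sample abbreviated from the pretty-printed example output.
SAMPLE = """
{
  "nodes": [
    {
      "description": "joey@darkstar:~/tmp/mapbench/a",
      "remotes": [
        {
          "remote": "b",
          "url": "/home/joey/tmp/mapbench/b",
          "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015"
        }
      ],
      "url": "/home/joey/tmp/mapbench/a",
      "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325"
    },
    {
      "description": "joey@darkstar:~/tmp/mapbench/b",
      "remotes": [
        {
          "remote": "a",
          "url": "/home/joey/tmp/mapbench/a",
          "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325"
        }
      ],
      "url": "/home/joey/tmp/mapbench/b",
      "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015"
    }
  ]
}
"""

if __name__ == "__main__":
    for description, remotes in summarize_map(SAMPLE):
        print(description, "->", ", ".join(remotes))
```

The `.get` calls keep the sketch tolerant of missing fields. In the Haskell code a repository without a UUID is encoded from `Nothing`, so `uuid` should not be assumed to be a string.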
comment
diff --git a/doc/install/rpm_standalone/comment_5_a84ab211e00776b6631929e9a8f4e25e._comment b/doc/install/rpm_standalone/comment_5_a84ab211e00776b6631929e9a8f4e25e._comment new file mode 100644 index 0000000000..88fe775a8c --- /dev/null +++ b/doc/install/rpm_standalone/comment_5_a84ab211e00776b6631929e9a8f4e25e._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: Standalone rpms not available""" + date="2025-05-27T17:04:27Z" + content=""" +There was a problem with the last release, it's available now. +"""]]
prevent initialization with bad freeze/thaw hook configured
When annex.freezecontent-command or annex.thawcontent-command is configured
but fails, prevent initialization.
This allows the user to fix their configuration and avoid crippled
filesystem detection entering an adjusted unlocked branch unexpectedly,
when they had been relying on the hooks working around their filesystem's
infelicities.
In the case of git-remote-annex, a failure of these hooks is taken to mean
the filesystem may be crippled, so it deletes the bundles objects and
avoids initialization. That might mean extra work, but only in this edge
case where the hook is misconfigured. And it keeps the command working
for cloning even despite the misconfiguration.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
diff --git a/Annex/Hook.hs b/Annex/Hook.hs index 086665abce..366678490c 100644 --- a/Annex/Hook.hs +++ b/Annex/Hook.hs @@ -106,44 +106,55 @@ doesAnnexHookExist hook = do runAnnexHook :: Git.Hook -> (GitConfig -> Maybe String) -> Annex () runAnnexHook hook commandcfg = runAnnexHook' hook commandcfg >>= \case - Nothing -> noop - Just failedcommanddesc -> + HookSuccess -> noop + HookFailed failedcommanddesc -> warning $ UnquotedString $ failedcommanddesc ++ " failed" --- Returns Nothing if the hook or GitConfig command succeeded, or a --- description of what failed. -runAnnexHook' :: Git.Hook -> (GitConfig -> Maybe String) -> Annex (Maybe String) +data HookResult + = HookSuccess + | HookFailed String + -- ^ A description of the hook command that failed. + deriving (Eq, Show) + +runAnnexHook' :: Git.Hook -> (GitConfig -> Maybe String) -> Annex HookResult runAnnexHook' hook commandcfg = ifM (doesAnnexHookExist hook) ( runhook , runcommandcfg ) where runhook = ifM (inRepo $ Git.runHook boolSystem hook []) - ( return Nothing + ( return HookSuccess , do h <- fromRepo (Git.hookFile hook) - commandfailed (fromOsPath h) + return $ HookFailed $ fromOsPath h ) runcommandcfg = commandcfg <$> Annex.getGitConfig >>= \case - Nothing -> return Nothing + Nothing -> return HookSuccess Just command -> ifM (liftIO $ boolSystem "sh" [Param "-c", Param command]) - ( return Nothing - , commandfailed $ "git configured command '" ++ command ++ "'" + ( return HookSuccess + , return $ HookFailed $ "git configured command '" ++ command ++ "'" ) - commandfailed c = return $ Just c -runAnnexPathHook :: String -> Git.Hook -> (GitConfig -> Maybe String) -> OsPath -> Annex Bool +runAnnexPathHook :: String -> Git.Hook -> (GitConfig -> Maybe String) -> OsPath -> Annex HookResult runAnnexPathHook pathtoken hook commandcfg p = ifM (doesAnnexHookExist hook) ( runhook , runcommandcfg ) where - runhook = inRepo $ Git.runHook boolSystem hook [ File p' ] + runhook = ifM (inRepo $ Git.runHook boolSystem hook 
[ File p' ]) + ( return HookSuccess + , do + h <- fromRepo (Git.hookFile hook) + return $ HookFailed $ fromOsPath h + ) runcommandcfg = commandcfg <$> Annex.getGitConfig >>= \case - Nothing -> return True - Just basecmd -> liftIO $ - boolSystem "sh" [Param "-c", Param $ gencmd basecmd] + Nothing -> return HookSuccess + Just basecmd -> + ifM (liftIO $ boolSystem "sh" [Param "-c", Param (gencmd basecmd)]) + ( return HookSuccess + , return $ HookFailed $ "git configured command '" ++ basecmd ++ "'" + ) gencmd = massReplace [ (pathtoken, shellEscape p') ] p' = fromOsPath p diff --git a/Annex/Init.hs b/Annex/Init.hs index 81b07b54d1..64c924fd04 100644 --- a/Annex/Init.hs +++ b/Annex/Init.hs @@ -19,6 +19,7 @@ module Annex.Init ( uninitialize, probeCrippledFileSystem, probeCrippledFileSystem', + isCrippledFileSystem, ) where import Annex.Common @@ -75,10 +76,10 @@ data InitializeAllowed = InitializeAllowed checkInitializeAllowed :: (InitializeAllowed -> Annex a) -> Annex a checkInitializeAllowed a = guardSafeToUseRepo $ noAnnexFileContent' >>= \case Nothing -> runAnnexHook' preInitAnnexHook annexPreInitCommand >>= \case - Nothing -> do + HookSuccess -> do checkSqliteWorks a InitializeAllowed - Just failedcommanddesc -> do + HookFailed failedcommanddesc -> do initpreventedby failedcommanddesc notinitialized Just noannexmsg -> do @@ -94,8 +95,8 @@ checkInitializeAllowed a = guardSafeToUseRepo $ noAnnexFileContent' >>= \case initializeAllowed :: Annex Bool initializeAllowed = noAnnexFileContent' >>= \case Nothing -> runAnnexHook' preInitAnnexHook annexPreInitCommand >>= \case - Nothing -> return True - Just _ -> return False + HookSuccess -> return True + HookFailed _ -> return False Just _ -> return False noAnnexFileContent' :: Annex (Maybe String) @@ -288,73 +289,116 @@ isInitialized :: Annex Bool isInitialized = maybe Annex.Branch.hasSibling (const $ return True) =<< getVersion {- A crippled filesystem is one that does not allow making symlinks, - - or removing write 
access from files. -} -probeCrippledFileSystem :: Annex Bool -probeCrippledFileSystem = withEventuallyCleanedOtherTmp $ \tmp -> do - (r, warnings) <- probeCrippledFileSystem' tmp + - or removing write access from files. + - + - This displays messages about problems detected with the filesystem. + - + - If a freeze or thaw hook is configured, but exits nonzero, + - this returns Nothing after displaying a message to the user about the + - problem. Such a hook can in some cases make a filesystem + - that would otherwise be detected as crippled work ok, so this avoids + - a false positive. + -} +probeCrippledFileSystem :: Annex (Maybe Bool) +probeCrippledFileSystem = do + (r, warnings) <- isCrippledFileSystem' + mapM_ (warning . UnquotedString) warnings + return r + +isCrippledFileSystem :: Annex Bool +isCrippledFileSystem = do + (r, _warnings) <- isCrippledFileSystem' + return (fromMaybe True r) + +isCrippledFileSystem' :: Annex (Maybe Bool, [String]) +isCrippledFileSystem' = withEventuallyCleanedOtherTmp $ \tmp -> + probeCrippledFileSystem' tmp (Just (freezeContent' UnShared)) (Just (thawContent' UnShared)) =<< hasFreezeHook - mapM_ (warning . UnquotedString) warnings - return r probeCrippledFileSystem' :: (MonadIO m, MonadCatch m) => OsPath - -> Maybe (OsPath -> m ()) - -> Maybe (OsPath -> m ()) + -> Maybe (OsPath -> m HookResult) + -> Maybe (OsPath -> m HookResult) -> Bool - -> m (Bool, [String]) + -> m (Maybe Bool, [String]) #ifdef mingw32_HOST_OS -probeCrippledFileSystem' _ _ _ _ = return (True, []) +probeCrippledFileSystem' _ _ _ _ = return (Just True, []) #else probeCrippledFileSystem' tmp freezecontent thawcontent hasfreezehook = do let f = tmp </> literalOsPath "gaprobe" liftIO $ F.writeFile' f "" - r <- probe f - void $ tryNonAsync $ (fromMaybe (liftIO . 
allowWrite) thawcontent) f + r <- freezethaw f probe liftIO $ removeFile f return r where - probe f = catchDefaultIO (True, []) $ do + fallbackfreezecontent f = do + liftIO $ preventWrite f + return HookSuccess + + fallbackthawcontent f = do + liftIO $ allowWrite f + return HookSuccess + + freezethaw f cont = + (fromMaybe fallbackfreezecontent freezecontent) f >>= \case + HookFailed failedcommanddesc -> + return (Nothing, [hookfailed failedcommanddesc]) + HookSuccess -> do + r <- cont f + tryNonAsync ((fromMaybe fallbackthawcontent thawcontent) f) + >>= return . \case + Right (HookFailed failedcommanddesc) -> + let (_, warnings) = r + in (Nothing, hookfailed failedcommanddesc : warnings) + _ -> r + + hookfailed failedcommanddesc = "Failed to run " ++ failedcommanddesc + ++ ". Unable to initialize until this is fixed." + + probe f = catchDefaultIO (Just True, []) $ do let f2 = f <> literalOsPath "2" (Diff truncated)
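The control flow that this change adds around the freeze and thaw hooks can be modelled outside Haskell. The Python sketch below is an illustration only, not git-annex code. `HookResult`, `freeze_thaw`, and `is_crippled` are stand-ins that mirror `HookResult`, `freezethaw`, and `isCrippledFileSystem`. The probe is passed in as a callable because the diff is truncated before its body, and the file path argument is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class HookResult:
    """Outcome of a hook: ok, or failed with a description of the command."""
    ok: bool
    failed_desc: str = ""


def hook_failed(desc):
    return "Failed to run " + desc + ". Unable to initialize until this is fixed."


def freeze_thaw(freeze, thaw, probe):
    """Return (crippled, warnings); crippled is None when a hook failed."""
    frozen = freeze()
    if not frozen.ok:
        # Freeze hook failed: the probe is never run.
        return None, [hook_failed(frozen.failed_desc)]
    crippled, warnings = probe()
    try:
        thawed = thaw()
    except Exception:
        # An exception while thawing leaves the probe result untouched.
        return crippled, warnings
    if not thawed.ok:
        # Thaw hook failed: keep the probe's warnings, drop its verdict.
        return None, [hook_failed(thawed.failed_desc)] + warnings
    return crippled, warnings


def is_crippled(probe_result):
    """Treat an undetermined result (hook failure) as crippled."""
    crippled, _warnings = probe_result
    return True if crippled is None else crippled
```

The three-valued result is the point of the change: a failing hook yields "unknown" rather than "crippled", which lets initialization stop with a message instead of entering an adjusted unlocked branch.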
Added a comment: Standalone rpms not available
diff --git a/doc/install/rpm_standalone/comment_4_a80fa98172357a2d20a160c186e9372d._comment b/doc/install/rpm_standalone/comment_4_a80fa98172357a2d20a160c186e9372d._comment new file mode 100644 index 0000000000..08ac12a8e0 --- /dev/null +++ b/doc/install/rpm_standalone/comment_4_a80fa98172357a2d20a160c186e9372d._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="zhunting" + avatar="http://cdn.libravatar.org/avatar/1439e56826a7befaefc79f66eef9d835" + subject="Standalone rpms not available" + date="2025-05-27T15:55:31Z" + content=""" +Hello, are the RPMs no longer being published, it doesn't seem to be available anymore. +"""]]
comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_8_c1b434d222c81514590461fa9fd23c01._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_8_c1b434d222c81514590461fa9fd23c01._comment new file mode 100644 index 0000000000..66eca71e0c --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_8_c1b434d222c81514590461fa9fd23c01._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2025-05-27T14:14:56Z" + content=""" +Was the repository on the NTFS drive or on the WSL side (ext4 or whatever)? +"""]]
Added a comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_7_4b0662ed3467cd46c3332d52ad30cef4._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_7_4b0662ed3467cd46c3332d52ad30cef4._comment new file mode 100644 index 0000000000..43e189ac52 --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_7_4b0662ed3467cd46c3332d52ad30cef4._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 7" + date="2025-05-27T10:57:18Z" + content=""" +FTR, Yukai reported that it was \"WSL (Ubuntu 22.04, x86_64) on Windows 10. Git version was 2.34.1, git-annex version was 10.20230407-1~ndall+1.\". So definitely not a trivial/typical setup ;) I do not remember when it was that I have tried git-annex under WSL. +"""]]
diff --git a/doc/bugs/proot_info__58___vpid_1__58___terminated_with_signal_4.mdwn b/doc/bugs/proot_info__58___vpid_1__58___terminated_with_signal_4.mdwn new file mode 100644 index 0000000000..9bad084bec --- /dev/null +++ b/doc/bugs/proot_info__58___vpid_1__58___terminated_with_signal_4.mdwn @@ -0,0 +1,46 @@ +### Please describe the problem. +I get this error when I try to run git annex sync on my amazon fire tablet. +`proot info: vpid 1: terminated with signal 4` + +``` +$ cat /proc/cpuinfo +processor : 0 +Processor : ARMv7 Processor rev 3 (v7l) +model name : ARMv7 Processor rev 3 (v7l) +BogoMIPS : 32.19 +Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm aes pmull sha1 sha2 crc32 +CPU implementer : 0x41 +CPU architecture: 7 +CPU variant : 0x0 +CPU part : 0xd03 +CPU revision : 3 + +processor : 1 +Processor : ARMv7 Processor rev 3 (v7l) +model name : ARMv7 Processor rev 3 (v7l) +BogoMIPS : 26.00 +Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm aes pmull sha1 sha2 crc32 +CPU implementer : 0x41 +CPU architecture: 7 +CPU variant : 0x0 +CPU part : 0xd03 +CPU revision : 3 + +Hardware : MT8163 +Revision : 0000 +Serial : 84b2d1e8651995fc +``` + +`uname -m` reports `armv7l` + +If I do `proot-distro login debian` then I can use the very same git-annex.linux and it works but it's slow (idk why). If I try to use git annex from termux then it fails with that error. Not that it matters, I'm using termux from here https://sourceforge.net/projects/android-ports-for-gnu-emacs/files/termux/ which is "a version of the Termux terminal emulator signed with +Emacs's signing keys" so that Emacs can use the termux binaries or something. + +### What version of git-annex are you using? On what operating system? +Android 9. LineageOS. + +### Have you had any luck using git-annex before? 
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +So, I've been trying to uss it for years. Kind of like how I tried using Emacs and went bankrupt twice before things clicked. I was so hung up on the symlinks. Then I finally understood some things, thanks Gemini, and now I'm using annex.addunlocked and annex.thin and I feel really nerdy and cool. I would love to get this working on my tablet too. + +Thanks!
Added a comment
diff --git a/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_3_eaa0748eed9398ba59d49b3387ac4a82._comment b/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_3_eaa0748eed9398ba59d49b3387ac4a82._comment new file mode 100644 index 0000000000..413b654eb7 --- /dev/null +++ b/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_3_eaa0748eed9398ba59d49b3387ac4a82._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="pierreay" + avatar="http://cdn.libravatar.org/avatar/c1c640f9f581daaf2d9dedff2b84b614" + subject="comment 3" + date="2025-05-26T19:38:41Z" + content=""" +Thank you @joey and @mak for the hints. +I was unable (even with strace) for chase down the particular system call that cause the issue, since it is highly random and generated by my shell prompt. +However, mak seems right about the git_status module of starship. I increased my timeout from 500ms (default) to 1000ms (and will try slightly larger value if needed). +If this mitigate the issue correctly, I will not comment anymore! But I suspect it to work. ;) +"""]]
diff --git a/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__.mdwn b/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__.mdwn new file mode 100644 index 0000000000..abaa2f63d3 --- /dev/null +++ b/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__.mdwn @@ -0,0 +1,3 @@ +The webapp showed "(metadata only)" behind a repository. Running `git annex upgrade` in the repositories didn't change that. I had top to "upgrade" the repository under "edit" in the webapp to fix that. + +What did upgrading in the webapp do, that running `git annex upgrade` did not?
Added a comment
diff --git a/doc/forum/tell_assistant_to_wait_5_mins_before_commiting__63__/comment_4_26bd8ac09928b5d9bf5eec2f254a8520._comment b/doc/forum/tell_assistant_to_wait_5_mins_before_commiting__63__/comment_4_26bd8ac09928b5d9bf5eec2f254a8520._comment new file mode 100644 index 0000000000..c7738312d5 --- /dev/null +++ b/doc/forum/tell_assistant_to_wait_5_mins_before_commiting__63__/comment_4_26bd8ac09928b5d9bf5eec2f254a8520._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jnkl" + avatar="http://cdn.libravatar.org/avatar/2ab576f3bf2e0d96b1ee935bb7f33dbe" + subject="comment 4" + date="2025-05-26T15:30:01Z" + content=""" +Thank you very much! +"""]]
dup
diff --git a/doc/todo/fail_on_encfs_more_informatively_or_do_not_fail_.mdwn b/doc/todo/fail_on_encfs_more_informatively_or_do_not_fail_.mdwn index 22722b5a53..d8923e8ba3 100644 --- a/doc/todo/fail_on_encfs_more_informatively_or_do_not_fail_.mdwn +++ b/doc/todo/fail_on_encfs_more_informatively_or_do_not_fail_.mdwn @@ -58,3 +58,5 @@ git-annex: cannot determine uuid for origin (perhaps you need to run "git annex ``` which is simply due to the fact that git-annex does not only unable to parse, it is unable to connect. But if so, IMHO ideally it should avoid claiming anything about git annex installation there. + +> Closing as duplicate of the other post, which did get though. [[done]] --[[Joey]]
close dup todo
diff --git a/doc/todo/Recent_remote_activities/comment_3_f7d67710fc0d880335220a3d9e3ec11d._comment b/doc/todo/Recent_remote_activities/comment_3_f7d67710fc0d880335220a3d9e3ec11d._comment new file mode 100644 index 0000000000..b497e0ed0d --- /dev/null +++ b/doc/todo/Recent_remote_activities/comment_3_f7d67710fc0d880335220a3d9e3ec11d._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-05-23T19:44:03Z" + content=""" +Basically the same todo previously: [[todo/show_time_of_last_interaction_with_a_repo]] + +I'll close that one in favor of this new one. The old one did have some +ideas about using groups to manually track activity, and a way to use +`git-annex expire` to list recently fsked repos. +"""]] diff --git a/doc/todo/show_time_of_last_interaction_with_a_repo.mdwn b/doc/todo/show_time_of_last_interaction_with_a_repo.mdwn index 315fdf91f8..57e49d1b4f 100644 --- a/doc/todo/show_time_of_last_interaction_with_a_repo.mdwn +++ b/doc/todo/show_time_of_last_interaction_with_a_repo.mdwn @@ -1 +1,4 @@ When [[`git-annex-info`|git-annex-info]] lists repos, it can be unclear which ones are still "active". It would help if the info command showed the time of last interaction for each repo. Seems like the code to determine that already exists in [[`git-annex-expire`|git-annex-expire]]? + +> Closing as a duplicate, since there is a newer todo +> [[show_time_of_last_interaction_with_a_repo]]. --[[Joey]]
correction
diff --git a/doc/todo/migration_to_VURL_by_default.mdwn b/doc/todo/migration_to_VURL_by_default.mdwn index 326967b90d..69db574059 100644 --- a/doc/todo/migration_to_VURL_by_default.mdwn +++ b/doc/todo/migration_to_VURL_by_default.mdwn @@ -15,9 +15,10 @@ transferring the content between repositories that it's not possible to verify it. > This would need a way to migrate from URL key to VURL key. -> Currently, `git-annex migrate` of an URL key defaults to using the -> default hashing backend. And adding `--backend=VURL` does not work. -> --[[Joey]] +> +> > Oh, I was wrong, that does exist already, just `git-annex migrate +> > --backend=VURL` works for URL keys. (Content must be present of course +> > or no migration is done). --[[Joey]] Of course if users want to continue to use their existing URL keys and not be able to verify content, that's fine. Users can also choose to use
Added a comment
diff --git a/doc/forum/Archive_group_with_special_repositories/comment_1_7581154b7208df9444e78e0e701eebe5._comment b/doc/forum/Archive_group_with_special_repositories/comment_1_7581154b7208df9444e78e0e701eebe5._comment new file mode 100644 index 0000000000..1d39ddf147 --- /dev/null +++ b/doc/forum/Archive_group_with_special_repositories/comment_1_7581154b7208df9444e78e0e701eebe5._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://tokariew.id.fedoraproject.org/" + nickname="tokariew" + avatar="http://cdn.libravatar.org/avatar/fcff1d07fd8c44bf9004540658358a6b" + subject="comment 1" + date="2025-05-23T11:05:27Z" + content=""" +Reorganized my idea, made git annex repo inside of Pictures folder, but set `annex.addunlocked = true` +I avoid symlinks, and COW filesystem don't care about duplicates +"""]]
comment
diff --git a/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_2_e9eb7c0ac4d1a87f3808a08f960e466d._comment b/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_2_e9eb7c0ac4d1a87f3808a08f960e466d._comment new file mode 100644 index 0000000000..e757c5669b --- /dev/null +++ b/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_2_e9eb7c0ac4d1a87f3808a08f960e466d._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-05-22T19:20:15Z" + content=""" +Filed a bug report on git, with a testcase that does not need git-annex: + +<https://lore.kernel.org/git/aC90kn2mE93DCJEH@kitenet.net/T/#u> +"""]]
git bug
diff --git a/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_1_cb16c170d49b628cca0c76d8843b9f52._comment b/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_1_cb16c170d49b628cca0c76d8843b9f52._comment new file mode 100644 index 0000000000..2be1bd3099 --- /dev/null +++ b/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_1_cb16c170d49b628cca0c76d8843b9f52._comment @@ -0,0 +1,49 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-05-22T17:52:10Z" + content=""" +A simplified test case, which works on any filesystem, not only crippled +filesystems: + + #!/bin/sh + git init r + cd r + git annex init + git commit -m initial --allow-empty + git-annex adjust --unlock + touch emptyfile + git annex add emptyfile + git diff + +The adjusted branch is not even needed. `git-annex add emptyfile` +followed by `git-annex unlock emptyfile` has the same result. + +In this case, `git diff` is running the `git-annex smudge --clean` +filter every time. Which IIRC is a bug of some kind with git when +smudging empty files. + +I've verified that `git-annex smudge --clean` behaves corretly. +It outputs the same annex link that was already staged. So git diff is +choosing for whatever reason to ignore what it output, and using "" +as the content of the file instead. + +So, I think this is a git bug, which git-annex cannot work around. + +See also [[bugs/Empty_files_make_git_status_slow]] which is about +the repeated and unncessary running of the smudge filter on empty files. +There I hypothesize that git treats 0 size in the index as an indication that it +doesn't know about the file, so generally mishandles empty files. + +And see also [[bugs/resolvemerge_fails_when_unlocked_empty_files_exist]] +where I identified a related git bug, where an empty unlocked file causes +git to crash with an internal error, and reported it to the git developers. +Unfortunately, nobody ever responded to my bug report. 
+ +Perhaps the thing to do is for git-annex to refuse to store an empty file +as an unlocked file. It could still use annex symlinks for locked empty files, +but unlocking would necessarily switch to an empty file stored in git +the usual way. Unfortunately, that would make reverse adjusting an unlocked +branch not know if the file was intended to be annexed or not. Also, it doesn't +help for any repositories that already contain unlocked empty files. +"""]]
Added a comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_6_2f226e2bb6fcb1040ba8e645603607a1._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_6_2f226e2bb6fcb1040ba8e645603607a1._comment new file mode 100644 index 0000000000..972b2b20e8 --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_6_2f226e2bb6fcb1040ba8e645603607a1._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 6" + date="2025-05-22T18:23:36Z" + content=""" +Somewhat unrelated and I feel like I might have even proposed smth like that -- wouldn't it be useful if git-annex did add its version and potentially filesystem detail (if cheaply known) within its commit message to `git-annex` branch? unless `annex forgotten` later (and forgetting could summarize all the versions and filesystems used to that point), could have been useful here, or not? + +FWIW, sent a few related questions on versions etc to the author of the commit which introduced that file. +"""]]
comment
diff --git a/doc/bugs/git-annex_add_behaves_differently_from_git_on_ACL/comment_1_5cff2db1646582f9e945bcf705b45c63._comment b/doc/bugs/git-annex_add_behaves_differently_from_git_on_ACL/comment_1_5cff2db1646582f9e945bcf705b45c63._comment new file mode 100644 index 0000000000..388f3f7eca --- /dev/null +++ b/doc/bugs/git-annex_add_behaves_differently_from_git_on_ACL/comment_1_5cff2db1646582f9e945bcf705b45c63._comment @@ -0,0 +1,46 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-05-22T17:07:32Z" + content=""" +> (datalad) [f006rq8@discovery-01 ds-perms]$ ls -l by-git-* +> -rwxrwx---+ 1 f006rq8 rc-DBIC 5 Oct 16 15:05 by-git-add +> -rwxrwx---+ 1 f006rq8 rc-DBIC 5 Oct 16 15:05 by-git-annex-add + +git-annex is seeing these files as executable for the same reason that `ls` +displays them as having `x` set. `stat()` is getting populated with values +based on the ACLs. + +I was able to reproduce that with `setfacl -m user::rwx-`, +run on a regular ext4 filesystem. Doing that to a file makes `ls` +display the owner x bit, as well as "+". + +But then, `git add` added the file as executable too. +So `git add` and `git-annex add` are behaving the same for me with ACLs. + + joey@darkstar:~/tmp/acl>touch foo + joey@darkstar:~/tmp/acl>touch bar + joey@darkstar:~/tmp/acl>setfacl -m user::rwx- foo + joey@darkstar:~/tmp/acl>setfacl -m user::rwx- bar + joey@darkstar:~/tmp/acl>git config 'annex.largefiles' 'nothing' + joey@darkstar:~/tmp/acl>git add foo + joey@darkstar:~/tmp/acl>git-annex add bar + joey@darkstar:~/tmp/acl>git diff --cached + diff --git a/foo b/foo + new file mode 100755 + index 0000000..e69de29 + diff --git a/bar b/bar + new file mode 100755 + index 0000000..e69de29 + +My guess is that something about your specific ACLs or your filesystem +is making git behave differently. Perhaps it's using a different variant +of the stat syscall which behaves differently than the stat git-annex does +in your specific situation somehow. 
+ +With the x acl set, and without the x bit manually set, I am able to actually +execute the files. So it seems to me, if git chose to add the file without +the exeucte bit set, that would be a bug in git? After all, if I have a build +system that relies on executing a file that I can execute it, checking the file +into git and cloning should let me execute the file in the clone. +"""]]
comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_5_fb58126317c23c1710b3eb50102c3bd5._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_5_fb58126317c23c1710b3eb50102c3bd5._comment new file mode 100644 index 0000000000..d1b9636e48 --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_5_fb58126317c23c1710b3eb50102c3bd5._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-05-22T16:59:42Z" + content=""" +So as far as I know this bug can only happen if something causes git to +lose the symlink bit. Which would be a git bug, or perhaps some misbehavior +on a fileystem like FAT? + +Since git-annex's behavior is to stage a change that fixes the file to be a +proper annex pointer file, a user who encounters whatever this is only has +to make a commit to get out of the weird situation. + +Unless we have a repeatable way for that to happen, that is not a git bug, +it's hard for me to justify making git-annex slow in order to deal with it +better. +"""]]
assistant: Avoid startup hang on active *.lock file
Avoid hanging at startup when a process has a *.lock file open in the .git
directory.
The goal is to repair stale locks, not wait for all active locks to be
closed. This was causing problems for a non-git process that has its own
lock file in a subdir of .git/.
If .git/index.lock is a non-stale lock, this does let the assistant start
up regardless. Commits by the assistant will then fail, until the process
locking the index finishes. This is not a problem, because the same
behavior could already happen if the assistant is started and then another
process locks the index.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
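
The decision described above can be sketched as a small shell function. This is only an illustration of the logic in the commit message, not git-annex's code (the real change is in Haskell, in Assistant/Repair.hs). The function name `lock_action` and its three arguments are hypothetical: whether some process holds the lock file open, and the file's size before and after the waiting period.

```shell
#!/bin/sh
# Hypothetical sketch of the stale-lock decision; not part of git-annex.
# Usage: lock_action HELD SIZE_BEFORE SIZE_AFTER
#   HELD         "yes" if some process has the lock file open, else "no"
#   SIZE_BEFORE  size of the lock file when first seen
#   SIZE_AFTER   size of the lock file after the waiting period
lock_action() {
    held=$1
    before=$2
    after=$3
    if [ "$held" = yes ]; then
        # An active lock is not stale; leave it alone and do not wait on it.
        echo skip
    elif [ "$before" = "$after" ]; then
        # Nobody holds it and it did not change while waiting: treat as stale.
        echo remove
    else
        # It changed while waiting, so check it again later.
        echo recheck
    fi
}
```

In this sketch, an actively held lock is simply skipped, which corresponds to the assistant no longer waiting at startup for such locks to be closed.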
diff --git a/Assistant/Repair.hs b/Assistant/Repair.hs index c024f93e6f..1dd549d694 100644 --- a/Assistant/Repair.hs +++ b/Assistant/Repair.hs @@ -147,17 +147,10 @@ repairStaleLocks lockfiles = go =<< getsizes <$> getFileSize lf getsizes = liftIO $ catMaybes <$> mapM getsize lockfiles go [] = return () - go l = ifM (liftIO $ null <$> Lsof.query ("--" : map (fromOsPath . fst) l)) - ( do - waitforit "to check stale git lock file" - l' <- getsizes - if l' == l - then liftIO $ mapM_ (removeWhenExistsWith removeFile . fst) l - else go l' - , do - waitforit "for git lock file writer" - go =<< getsizes - ) - waitforit why = do - debug ["Waiting for 60 seconds", why] + go l = whenM (liftIO $ null <$> Lsof.query ("--" : map (fromOsPath . fst) l)) $ do + debug ["Waiting for 60 seconds to check stale git lock file"] liftIO $ threadDelaySeconds $ Seconds 60 + l' <- getsizes + if l' == l + then liftIO $ mapM_ (removeWhenExistsWith removeFile . fst) l + else go l' diff --git a/CHANGELOG b/CHANGELOG index 3cdecf1b1d..8f5771b5f7 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,10 @@ +git-annex (10.20250521) UNRELEASED; urgency=medium + + * assistant: Avoid hanging at startup when a process has a *.lock file + open in the .git directory. 
+ + -- Joey Hess <id@joeyh.name> Thu, 22 May 2025 12:43:38 -0400 + git-annex (10.20250520) upstream; urgency=medium * Preferred content now supports "balanced=groupname:lackingcopies" diff --git a/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__.mdwn b/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__.mdwn index dbf3f81227..aed7f2c8c1 100644 --- a/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__.mdwn +++ b/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__.mdwn @@ -87,3 +87,5 @@ Please advise [[!meta author=yoh]] [[!tag projects/repronim]] + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__/comment_4_73717884bf2129c55a894e7f1fff490c._comment b/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__/comment_4_73717884bf2129c55a894e7f1fff490c._comment new file mode 100644 index 0000000000..a231d0de41 --- /dev/null +++ b/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__/comment_4_73717884bf2129c55a894e7f1fff490c._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-05-22T16:19:35Z" + content=""" +It's treating `*.lock` as git lock files. Any other filename won't have the +problem. + +[[!commit 635c9a1549f28992b6ae370f6e8687170c971525]] has a rationalle for +that, that git has other lock files than index.lock. It does seem to me to +be doubtful that any other stale git lock than index.lock would cause +significant trouble to the assistant. + +But this code is supposed to deal with stale locks. This lock is not +stale; it has a process holding it open. So the assistant has no +reason to wait on it. I've removed the wait loop. +"""]]
diff --git a/doc/forum/Archive_group_with_special_repositories.mdwn b/doc/forum/Archive_group_with_special_repositories.mdwn new file mode 100644 index 0000000000..e5a95a323c --- /dev/null +++ b/doc/forum/Archive_group_with_special_repositories.mdwn @@ -0,0 +1,39 @@ +How can i achieve that my main remote is exported to bunch of WORM disk? +I tried making special remote directories for each of them on loopback device, but when exporting data it fail. And export the same files to each remote, rather than files not archived. + +What i did + + - Create git annex repo, then imported pictures from special remote directory, and merged it to main branch + +``` +git annex initremote main-pictures type=directory directory=../Pictures/ encryption=none importtree=yes exporttree=yes +git annex import --from main-pictures master --no-content +git annex merge --allow-unrelated-histories main-pictures/master +``` + + this step was done so i don't have symlinks on my current folder structures + + - I created 3 repositories as such on which i want to export data + +``` +git annex initremote BD2 type=directory directory=/run/media/tokariew/BD2/ exporttree=yes encryption=none +git annex wanted BD2 standard +git annex group BD2 archive +``` + - Then i tried to export to all 3 remotes data + +``` + git annex export master -t BD2 -f main-pictures +``` + + Each fail after copying ~25GB (size of loopback) + +``` +(recording state in git...) +export: 28555 failed +``` + + - I was thinking that this is ok, and with export to next device it will export files which are not in previous remotes. + + +What should i do to archive my pictures and have single copy on BD/dvd?
Added a comment
diff --git a/doc/forum/Fill_remotes_sequentially/comment_5_de87c0fd5ddb95c5d0e7c0c9c6b89460._comment b/doc/forum/Fill_remotes_sequentially/comment_5_de87c0fd5ddb95c5d0e7c0c9c6b89460._comment new file mode 100644 index 0000000000..037e64d4ff --- /dev/null +++ b/doc/forum/Fill_remotes_sequentially/comment_5_de87c0fd5ddb95c5d0e7c0c9c6b89460._comment @@ -0,0 +1,37 @@ +[[!comment format=mdwn + username="stv0g" + avatar="http://cdn.libravatar.org/avatar/6faa6cc783a165b25fc1c8f3154ba449" + subject="comment 5" + date="2025-05-21T23:40:28Z" + content=""" +Thanks a lot @joey & @nobodyinperson for your input :) + +> I think this is the same system that there will be a talk about at Distribits 2025? I have been looking forward to that talk. + +Yes exactly :) I am still working on the code. But having a deadline is sometimes helpful :D + +> Relatedly, I wonder about sequential reading when a big git-annex get is run. Do you have some solution for that in mind? + +I am using the approach proposed by you in this post: https://git-annex.branchable.com/forum/Storing_copies_on_LTO_tapes__63__/ +As you noted, this is quite similar to how Glacier is handled. + +And yes, it would also allow batching together multiple `git-annex get` into a single sequential pass over the tape. +I would like to also support batching together objects originating from multiple git-annex repos. + +But this would make it pretty difficult to track the available capacity per tape cartridge as multiple git-annex repos would contribute (or even other non git-annex files). + +LTO tapes are a bit special, as they are append-only. The available capacity will only decreases when new objects are added. +The only option to regain capacity is by erasing the tape. If this happens, I am marking the git-annex remote as dead and initialize a new fresh remote. + +I now realized, that I can use this fact to detect the first EOT (end of tape) error for each tape and then update its preferred content expression.. 
+ +> Another approach would be to configure remote.<name>.annex-cost-command with a command that gives a low cost to the tape in the drive, and a high cost to other tapes. + +Oh that sounds really interesting. But how is this related to the `GETCOST` & `GETAVAILABILITY` messages of the external special remote protocol? + +It seems like that the remote's cost could be a way to define the order in which the remotes are filled? + +Its a lot to digest. I will start testing and playing around with your ideas. + +Thanks :) +"""]]
comment
diff --git a/doc/todo/remove_webapp/comment_2_2e6df80c2f58e4aa79191065d4f4dd76._comment b/doc/todo/remove_webapp/comment_2_2e6df80c2f58e4aa79191065d4f4dd76._comment new file mode 100644 index 0000000000..a2c8f94bb3 --- /dev/null +++ b/doc/todo/remove_webapp/comment_2_2e6df80c2f58e4aa79191065d4f4dd76._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-05-21T18:16:32Z" + content=""" +As part of removing the webapp, I patched Alerts out of the assistant in +[[!commit 33cf88c8b8962a7f5d3b3caada95890d5f4d377e]]. + +It did occur to me that logging the text of the Alert might make the +assistant's log more useful. That commit would be an easy starting point to +adding such logging. + +I don't think it solves [[todo/Recent_remote_activities]] though because +it would only show activity by the assistant, not by other commands, and +not activity that happened in other clones of the repository. +"""]]
comment
diff --git a/doc/forum/Fill_remotes_sequentially/comment_3_74a9fa412d7eb07818c2c031019dd03a._comment b/doc/forum/Fill_remotes_sequentially/comment_3_74a9fa412d7eb07818c2c031019dd03a._comment new file mode 100644 index 0000000000..27aca24f16 --- /dev/null +++ b/doc/forum/Fill_remotes_sequentially/comment_3_74a9fa412d7eb07818c2c031019dd03a._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-05-21T17:06:51Z" + content=""" +As to the ordering, at first I thought it would make sense for it to +pick the most full repository that still has space for a file. + +But: Suppose that the files being processed alternate between large, and +small. The fullest tape is too full for any of the large files, but it can +hold all the small files. The second fullest tape has plenty of room. +In this case, it would constantly switch back and forth between the two tapes. + +sizebalanced picks the least full repository. That's not what we want +either, clearly, since it alternates between repositories frequently when +they're near the same size. + +The optimal solution is for git-annex to remember what repository was used +to store the last file, and can just use that repository again. Unless it's +full, in which case it can pick any repository that still has space. And +then it will continue to use that new repository for subsequent files. + +That memory would necessarily be local to a repository in front of these +tape remotes. (Eg, a cluster gateway). If there were multiple repositories +that were all writing to the same tape remotes, they would each have their +own memory, and chaos would ensue. + +Needing a memory makes me a bit dubious about putting this in a preferred +content expression. But in your specific case, I guess it would work. 
+"""]] diff --git a/doc/forum/Fill_remotes_sequentially/comment_4_c9e1698e33432ae7cb61cd13706717b3._comment b/doc/forum/Fill_remotes_sequentially/comment_4_c9e1698e33432ae7cb61cd13706717b3._comment new file mode 100644 index 0000000000..3fdf6d9e38 --- /dev/null +++ b/doc/forum/Fill_remotes_sequentially/comment_4_c9e1698e33432ae7cb61cd13706717b3._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-05-21T18:05:27Z" + content=""" +Another approach would be to configure `remote.<name>.annex-cost-command` +with a command that gives a low cost to the tape in the drive, and a high +cost to other tapes. + +But git-annex only checks the cost once at startup. It would need to check +it again after each file. Which could be a new configuration setting. You +would need to make the cost command efficent enough that running it once per +file is not too slow. + +With this approach, the standard archive group preferred content +would probably suffice. +"""]]
comment
diff --git a/doc/forum/Fill_remotes_sequentially/comment_2_ad18dd206fc6b8a6cc3e11cd4d13a351._comment b/doc/forum/Fill_remotes_sequentially/comment_2_ad18dd206fc6b8a6cc3e11cd4d13a351._comment new file mode 100644 index 0000000000..2f3af14153 --- /dev/null +++ b/doc/forum/Fill_remotes_sequentially/comment_2_ad18dd206fc6b8a6cc3e11cd4d13a351._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-05-21T16:39:53Z" + content=""" +I think this is the same system that there will be a talk about at +Distribits 2025? I have been looking forward to that talk. + +@nobodyinperson seems on the right track with the `sequential=tape:1` idea. +And it seems fairly easy to implement using the same building blocks as +`sizebalanced`. + +Relatedly, I wonder about sequential reading when a big `git-annex get` +is run. Do you have some solution for that in mind? I could imagine doing +something similar to Amazon Glacier, where the first get of a file fails, +but is queued for later retrival from tape, allowing multiple requests to +be ordered more efficiently. +"""]]
make it a clickable url
diff --git a/doc/forum/Fill_remotes_sequentially/comment_1_eb69c2ab5c64683ab36ef26a45dd32f5._comment b/doc/forum/Fill_remotes_sequentially/comment_1_eb69c2ab5c64683ab36ef26a45dd32f5._comment index 8c92b6244c..86d8223368 100644 --- a/doc/forum/Fill_remotes_sequentially/comment_1_eb69c2ab5c64683ab36ef26a45dd32f5._comment +++ b/doc/forum/Fill_remotes_sequentially/comment_1_eb69c2ab5c64683ab36ef26a45dd32f5._comment @@ -6,5 +6,5 @@ content=""" It sounds like *maybe* the new `sizebalanced=tape:1` expression could help here? 🤔 But if I understand correctly, it would try to fill the tapes up equally, which is not what you want. There would need to be something like `sequential=tape:1`, which doesn't want to balance the annexes in terms of size, but just in order. But what order? 🤔 Ordered by descending filled annex size? That would be what you need I think. -https://git-annex.branchable.com/git-annex-preferred-content/#:~:text=sizebalanced=groupname +[git-annex-preferred-content](https://git-annex.branchable.com/git-annex-preferred-content/#:~:text=sizebalanced=groupname) """]]
Added a comment: Maybe the new sizebalanced= feature?
diff --git a/doc/forum/Fill_remotes_sequentially/comment_1_eb69c2ab5c64683ab36ef26a45dd32f5._comment b/doc/forum/Fill_remotes_sequentially/comment_1_eb69c2ab5c64683ab36ef26a45dd32f5._comment new file mode 100644 index 0000000000..8c92b6244c --- /dev/null +++ b/doc/forum/Fill_remotes_sequentially/comment_1_eb69c2ab5c64683ab36ef26a45dd32f5._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Maybe the new sizebalanced= feature?" + date="2025-05-21T14:04:50Z" + content=""" +It sounds like *maybe* the new `sizebalanced=tape:1` expression could help here? 🤔 But if I understand correctly, it would try to fill the tapes up equally, which is not what you want. There would need to be something like `sequential=tape:1`, which doesn't want to balance the annexes in terms of size, but just in order. But what order? 🤔 Ordered by descending filled annex size? That would be what you need I think. + +https://git-annex.branchable.com/git-annex-preferred-content/#:~:text=sizebalanced=groupname +"""]]
diff --git a/doc/forum/Fill_remotes_sequentially.mdwn b/doc/forum/Fill_remotes_sequentially.mdwn new file mode 100644 index 0000000000..05b7c110a1 --- /dev/null +++ b/doc/forum/Fill_remotes_sequentially.mdwn @@ -0,0 +1,9 @@ +I am currently working on a new special remote for storing git-annex objects on tape media. +In my setup every tape cartridge is tracked by git-annex as a dedicated special remote. +All these remotes are part of a new `tape` group. + +I would like to use a preferred content expression similar to the `archive` standard group: `(not copies=tape:1) or approxlackingcopies=1`. + +However, with many tapes (remotes) matching this expression, I would like to choose only one of them as the target (and always the same one) until it is full. + +This is necessary, as I need to avoid frequently swapping cartridges from the tape drive to minimize wear.
Added a comment: Valid reasons to retire the webapp, how about
diff --git a/doc/todo/remove_webapp/comment_1_445fef0c0c9ca2c54e76bb66bafbf214._comment b/doc/todo/remove_webapp/comment_1_445fef0c0c9ca2c54e76bb66bafbf214._comment new file mode 100644 index 0000000000..68e7fdfc6f --- /dev/null +++ b/doc/todo/remove_webapp/comment_1_445fef0c0c9ca2c54e76bb66bafbf214._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Valid reasons to retire the webapp, how about " + date="2025-05-21T08:22:28Z" + content=""" +These are all valid reasons to retire the webapp. The webapp lacks many features that it would need to be really useful. Also, creating new repos or adding existing repos in the webapp is not as straightforward as it should be to make it similar in usability to e.g. syncthing. + +I do still use it for shared family folders on my and their machines. It's nice to have something to tell people to click on, then something happens and they can see if syncing works or does anything. `git annex info` is not quite the same, though it shows active transfers. + +What I would love to see as a replacement for the webapp is a command like `git annex assistant-status` that outputs, maybe as JSON or human-readable text, what the assistant currently does (pulling, merging, pushing to which remote, downloading, uploading, etc.), all the stuff that was nicely visible in the webapp. (Does this exist already? 🤔) + +Furthermore, a command like `git annex activity` that goes arbitrarily far back in time and statically (non-live) lists recent activities like: + +- yesterday 23:32: remote1 downloaded 5 files (45MB) +- today 10:45: you modified file `document.txt` (10MB) +- today 10:46: you uploaded file `document.txt` (from today 10:45) to remote1, remote2 and remote3 +- today 12:35: Fred McGitFace modified file `document.txt` (12MB) and uploaded to remote2 +- ... 
+ +Basically a human-readable (or as JSON), chronological log of things that happened in the repo. This is a superpower of git-annex: all this information is available as far back as one wants, we just don't have a way to access it nicely. `git log` and `git annex log` exist, but they are too specific, too broad or a bit hard to parse on their own. For example: + +- `git annex activity --since=\"2 weeks ago\" --include='*.doc'` would list things (who committed, which remote received it, etc.) that happened in the last two weeks to *.doc files +- `git annex activity --only-annex --in=remote2` would list recent annex operations (in the `git-annex` branch only) of remote2 +- `git annex activity --only-changes --largerthan=10MB` would list recent file changes (additions, modifications, deletions, etc., in `git log` only) + +This `git annex assistant-log` and `git annex activity` would be a very nice feature to showcase git-annex's power (which other file syncing tool can do this? 🤔) and also solve [[todo/Recent_remote_activities]]. +"""]]
todo
diff --git a/doc/todo/remove_webapp.mdwn b/doc/todo/remove_webapp.mdwn new file mode 100644 index 0000000000..14223a1706 --- /dev/null +++ b/doc/todo/remove_webapp.mdwn @@ -0,0 +1,43 @@ +I am considering removing the `git-annex webapp`. Your feedback is +appreciated if you still use it. --[[Joey]] + +The assistant would be retained, so existing setups that were configured +with the webapp would keep working, although users of those would need to +replace any use of the webapp to control them with command-line use. + +The webapp has been only minimally maintained for about 10 years. There +have been no new features, and while it amazingly continues to work, it +doesn't address many of the changes in git-annex. For example, there's no +way to configure exporttree special remotes in the webapp. + +I think the webapp is barely used by git-annex users. The point of it was +to make git-annex easy enough to set up to reach a larger user base. That +necessarily meant building something that aspired to be more like dropbox +than git. That never really happened. git-annex found its own user bases +that appreciate its actual strengths, and who have helped build it in the +directions where more and more people find it useful. + +Keeping the webapp in git-annex has a price. It has a complex and +annoying dependency chain. (See [[ditch_yesod]].) +It uses template haskell, which makes build times slow, and makes +building use a lot more memory. + +The webapp also has some security exposure that stock git-annex does not +have. Beyond the business of connecting to the webapp securely, the adhoc +network protocol used by the webapp's pairing interface is baked into the +assistant even when the webapp is not being used. And is not otherwise used +in git-annex, and has had at least one security issue in the past. + +The git-annex binary also ends up significantly larger due to containing +the webapp. 
And removing it deletes 28 thousand lines of code from +git-annex, including embedded code copies of bootstrap and jquery. + +---- + +The `removewebapp` branch has a working patch to remove the webapp. + +Documentation that mentions the webapp, including doc/git-annex-webapp.mdwn +still would need to be updated. + +Also annex.autoupgrade needs to be updated, one of the options was webapp +specific. Maybe upgrades are out of scope for the assistant too?
Added a comment
diff --git a/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__/comment_3_21962c77c93a14a3d1eaf3658f556ee8._comment b/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__/comment_3_21962c77c93a14a3d1eaf3658f556ee8._comment new file mode 100644 index 0000000000..e80e645db9 --- /dev/null +++ b/doc/bugs/assistant_does_not_commit_anything__44___waiting__63__/comment_3_21962c77c93a14a3d1eaf3658f556ee8._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 3" + date="2025-05-20T15:00:07Z" + content=""" +I see, in my case I have no git lock files but rather a lock file for our process: + +``` +reprostim@reproiner:/data/reprostim$ ls -ld .git/*.lock +-rw-r--r-- 1 reprostim reprostim 0 May 8 13:51 .git/reprostim-videocapture.lock + +``` + +which I guess `git-annex` treats as a git lock file. Is there a way to make the two play nicely without me coming up with some alternative location which is to be ignored by git but local to this repository? Maybe only files known to be `git` lock files should be considered? Or maybe placing it under `.git/reprostim-videocapture/lock` would be satisfactory? (I do not want to interrupt ATM - doing useful stuff) + +After all both of them \"are not git\" (in that they both also use `.git/` space for their own needs) ;-) + + +"""]]
add news item for git-annex 10.20250520
diff --git a/doc/news/version_10.20241202.mdwn b/doc/news/version_10.20241202.mdwn deleted file mode 100644 index 0c3b2f2cfc..0000000000 --- a/doc/news/version_10.20241202.mdwn +++ /dev/null @@ -1,28 +0,0 @@ -git-annex 10.20241202 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * add: Consistently treat files in a dotdir as dotfiles, even - when ran inside that dotdir. - * add: When adding a dotfile as a non-large file, mention that it's a - dotfile. - * p2phttp: Added --directory option which serves multiple git-annex - repositories located inside a directory. - * When remote.name.annexUrl is an annex+http(s) url, that - uses the same hostname as remote.name.url, which is itself a http(s) - url, they are assumed to share a username and password. This avoids - unnecessary duplicate password prompts. - * git-remote-annex: Fix a reversion introduced in version 10.20241031 - that broke cloning from a special remote. - * git-remote-annex: Fix cloning from a special remote on a crippled - filesystem. - * git-remote-annex: Fix buggy behavior when annex.stalldetection is - configured. - * git-remote-annex: Require git version 2.31 or newer, since old - ones had a buggy git bundle command. - * S3: Support versioning=yes with a readonly bucket. - (Needs aws-0.24.3) - * S3: Send git-annex or other configured User-Agent. - (Needs aws-0.24.3) - * S3: Fix infinite loop and memory blowup when importing from an - unversioned S3 bucket that is large enough to need pagination. - * S3: Use significantly less memory when importing from a - versioned S3 bucket. 
- * vpop: Only update state after successful checkout."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250102.mdwn b/doc/news/version_10.20250102.mdwn deleted file mode 100644 index bc80c217d8..0000000000 --- a/doc/news/version_10.20250102.mdwn +++ /dev/null @@ -1,20 +0,0 @@ -git-annex 10.20250102 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Added config `url.<base>.annexInsteadOf` corresponding to git's - `url.<base>.pushInsteadOf`, to configure the urls to use for accessing - the git-annex repositories on a server without needing to configure - remote.name.annexUrl in each repository. - * Work around git hash-object --stdin-paths's odd stripping of carriage - return from the end of the line (some windows infection), avoiding - crashing when the repo contains a filename ending in a carriage return. - * Document that setting preferred content to "" is the same as the - default unset behavior. - * sync: Avoid misleading warning about future preferred content - transition when preferred content is set to "". - * Honor annex.addunlocked configuration when importing a tree from a - special remote. - * Removed the i386ancient standalone tarball build for linux, which - was increasingly unable to support new git-annex features. - * Removed support for building with ghc older than 9.0.2, - and with older versions of haskell libraries than are in current Debian - stable. - * stack.yaml: Update to lts-23.2."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250520.mdwn b/doc/news/version_10.20250520.mdwn new file mode 100644 index 0000000000..07a4e9c893 --- /dev/null +++ b/doc/news/version_10.20250520.mdwn @@ -0,0 +1,12 @@ +git-annex 10.20250520 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * Preferred content now supports "balanced=groupname:lackingcopies" + to make files be evenly balanced among as many repositories as are + needed to satisfy numcopies. 
+ * map: Fix buggy handling of remotes that are bare git repositories + accessed via ssh. + * map: Avoid looping forever with mutually recursive paths between + repositories accessed via ssh. + * whereused: Fix bug that could find matches from grafts + in remote git-annex branches. + * Windows: Fix bug that can cause git status to show annexed files as + modified when built with OsPath."""]] \ No newline at end of file
close
diff --git a/doc/todo/compute_special_remote.mdwn b/doc/todo/compute_special_remote.mdwn index 5b9fa5bca3..6803e9b466 100644 --- a/doc/todo/compute_special_remote.mdwn +++ b/doc/todo/compute_special_remote.mdwn @@ -62,3 +62,6 @@ I believe that no particular handling of annex key that are declared inputs to c We would need a way for users to indicate that they trust a particular compute introduction or the entity that provided it. Even if git-annex does not implement tooling for that, it would be good to settle on a concept that can be interpreted/implemented by such special remotes. [[!tag projects/INM7]] + +> [[done]], with [[compute_special_remote_remaining_todos]] having some +> more things that could be done to improve this. --[[Joey]]
close
diff --git a/doc/bugs/performance_regression__63___init_takes_times_more.mdwn b/doc/bugs/performance_regression__63___init_takes_times_more.mdwn index 349fd48937..83a152c34f 100644 --- a/doc/bugs/performance_regression__63___init_takes_times_more.mdwn +++ b/doc/bugs/performance_regression__63___init_takes_times_more.mdwn @@ -55,3 +55,9 @@ Since difference is quite substantial I have decided to file this issue. [[!meta author=yoh]] [[!tag projects/dandi]] +> I have decided to close this, it seems plausible that the additional +> overhead of reconcileStaged that I was measuring is not something that +> can be eliminated. Measurement error also is possible. Since I made +> several optimisations and persistent-sqlite got optimised as well, and +> since the scan was moved to not happen at init time, this should no +> longer be a problem. [[done]] --[[Joey]]
close
diff --git a/doc/bugs/too_aggressive_in_claiming___34__Transfer_stalled__34____63__.mdwn b/doc/bugs/too_aggressive_in_claiming___34__Transfer_stalled__34____63__.mdwn index af10cc024d..21aeb27a94 100644 --- a/doc/bugs/too_aggressive_in_claiming___34__Transfer_stalled__34____63__.mdwn +++ b/doc/bugs/too_aggressive_in_claiming___34__Transfer_stalled__34____63__.mdwn @@ -71,3 +71,5 @@ first was a year old version, then tried with bleeding edge 10.20231227+git24-gd [[!meta author=yoh]] [[!tag projects/dandi]] + +> Looks like I probably addressed this, so closing. [[done]] --[[Joey]]
Added a comment: git-annex and starship
diff --git a/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_2_5f2183eaf48075baf3abc2d302fc828d._comment b/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_2_5f2183eaf48075baf3abc2d302fc828d._comment new file mode 100644 index 0000000000..b9096e28f8 --- /dev/null +++ b/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_2_5f2183eaf48075baf3abc2d302fc828d._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="mak" + avatar="http://cdn.libravatar.org/avatar/22a5cc617809fed3c309300a28e8adc4" + subject="git-annex and starship" + date="2025-05-20T04:06:38Z" + content=""" +I arrived here because I was having the same issue. At least in my case, the issue seems to stem from the `git_status` module in starship. I know this is not the best solution, but I was able to stop this issue by increasing the value of `command_timeout` in my starship config. Another potential solution might be to use `gitoxide` for checking the status of the repository with starship (my assumption here is that `gitoxide` might be faster than regular git for checking the status of the repository). +"""]]
Added a comment
diff --git a/doc/forum/Scalability_Issues/comment_2_76087a3733f97655a825c96a2345db36._comment b/doc/forum/Scalability_Issues/comment_2_76087a3733f97655a825c96a2345db36._comment new file mode 100644 index 0000000000..57fb6d240a --- /dev/null +++ b/doc/forum/Scalability_Issues/comment_2_76087a3733f97655a825c96a2345db36._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="cznug" + avatar="http://cdn.libravatar.org/avatar/fc76e1657886a3bb6f2905c554d0f80c" + subject="comment 2" + date="2025-05-16T08:29:33Z" + content=""" +Thanks a lot **joey** for your help. + +I gave it another try without setting the metadata and by using v4 index. + +Instead of directly adding all files to the gateway repository, I distributed the files equally across the 16 nodes to make use of their resources. +On each node I added the file portion to a git-annex repository in order to merge them later via the gateway repository. +Adding the files on each node worked very well using the `--jobs=\"cpus\"` flag. + +However, once I tried to merge all 16 repos using `git-annex sync --no-content --allow-unrelated-histories --jobs=\"cpus\"` all of the nodes crashed due to out-of-memory during this step: + +`remote: (merging synced/git-annex bigserver/git-annex into git-annex...)` + +I assume that you are right and that I simply have too many files. + +Unfortunately, I currently cannot spend more time on investigating the issues. +Thanks again for your help. :) +"""]]
diff --git a/doc/bugs/Re-Adjust_Loses_Commits.mdwn b/doc/bugs/Re-Adjust_Loses_Commits.mdwn new file mode 100644 index 0000000000..bc0f236553 --- /dev/null +++ b/doc/bugs/Re-Adjust_Loses_Commits.mdwn @@ -0,0 +1,80 @@ +Hey there, + +I have come across a condition that leads to the "loss" of commits. + +### Please describe the problem. +After merging some branch into an adjusted branch, git annex can no longer sync the adjusted branch with the main branch. `git annex sync` prints `unable to propagate merge commit Ref "XXX" back to Ref "refs/heads/main"`, however the exit code does not indicate failure. + +Based on this statement from the adjust man page: +> Re-running this command with the same options while inside the adjusted branch will update the adjusted branch as necessary (eg for --hide-missing and --unlock-present), and will also propagate commits back to the original branch. + +I re-adjusted the branch. However, this printed the same warning, but reset the adjusted branch back to main, leading to the loss of all commits only present in the adjusted branch. + +### What steps will reproduce the problem? +The sequence of commands below reproduces the issue. Create a fresh git-annex repository and create some commit in it. Then create a new branch and adding a commit there. Switch back to the main branch, and adjust it. Merge the new branch into the adjusted branch. Create some more commits in the adjusted branch. Try to sync the adjusted branch with the main branch. `git annex sync` fails, while `git annex adjust` leads to the loss of the second and third commit. 
+ +```sh +# Setup Repo +mkdir test && cd test +git init +git annex init + +# Add first data +echo "Some first data" > 01 +git annex add 01 +git commit -m "Add first data" + +# Create adjusted branch +git annex adjust --hide-missing --unlock + +# Branch off main and add data +git switch main +git switch -c new +echo "Some second data" > 02 +git annex add 02 +git commit -m "Add second data" + +# Merge new branch into adjusted +git switch "adjusted/main(hidemissing-unlocked)" +git merge new --no-edit + +# Add more data to adjusted +echo "Some third data" > 03 +git annex add 03 +git commit -m "Add third data" + +# Try to sync adjusted with main +# This reports: unable to propagate merge commit Ref "XXX" back to Ref "refs/heads/main" +git annex sync + +# Try to sync by re-adjusting +# Also reports "unable to propagate", but additionally resets "adjusted/main(hidemissing-unlocked)" to the very first commit, losing the two subsequent commits +git annex adjust --hide-missing --unlock +``` + +### What version of git-annex are you using? On what operating system? 
+ +Version 10.20250416-X on Linux: + +```sh + git annex version +git-annex version: 10.20250416-gb22a72cd9444071e86a46cc1eb8799e7d085b49d +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.8 http-client-0.7.17 persistent-sqlite-2.13.1.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +``` + +But it also occurred on a macOS system, of which I do not have the version at hand. + +### Please provide any additional information below. +Do you know how I can recover the lost commits? + +### Have you had any luck using git-annex before? +Git-annex is such a great piece of software, thanks for creating it. I use it to manage my photography archive. I have a main branch that contains all my images and that I sync and backup across various devices. When adding new images to the archive I create a branch just containing those new images. After culling those photos, I merge it into main. Git-annex does a perfect jobs with this. 
+But now I started using some SW that cannot deal with symlinks, so I use an adjusted branch of main. Merging the new import branch into the adjusted branch leads to the described issue. + +Many thanks and have a great day!
Added a comment: How to mitigate not finding git-annex-shell on MacOS remote?
diff --git a/doc/forum/not_finding_git-annex-shell_on_remote/comment_6_90bdc29882c09a7e002c3cfd80ba8bfc._comment b/doc/forum/not_finding_git-annex-shell_on_remote/comment_6_90bdc29882c09a7e002c3cfd80ba8bfc._comment new file mode 100644 index 0000000000..d8e02d92b2 --- /dev/null +++ b/doc/forum/not_finding_git-annex-shell_on_remote/comment_6_90bdc29882c09a7e002c3cfd80ba8bfc._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="liam" + avatar="http://cdn.libravatar.org/avatar/5cb416d010c1d3f0ca677dd7f6c822ea" + subject="How to mitigate not finding git-annex-shell on MacOS remote?" + date="2025-05-15T02:35:31Z" + content=""" +As a solution to my own question, for anyone who stumbles upon this with the same problem... +You don't have to set the path on the remote machine to get this working. + +On the local machine, simply do: `git config remote.macbook-remote.annex-shell /usr/local/bin/git-annex-shell` +Then do: `git annex enableremote macbook-remote` +Finally, `git annex sync macbook-remote` to sync it with the remote one. + +See: [[https://git-annex.branchable.com/tips/get_git-annex-shell_into_PATH/]] + +I don't think this really fixed whatever weirdness is going on with the remote, but thankfully the tip works. +"""]]
Added a comment: How to mitigate not finding git-annex-shell on MacOS remote?
diff --git a/doc/forum/not_finding_git-annex-shell_on_remote/comment_5_9592297b8d22822997eb09a4953d8f64._comment b/doc/forum/not_finding_git-annex-shell_on_remote/comment_5_9592297b8d22822997eb09a4953d8f64._comment new file mode 100644 index 0000000000..c116a80e89 --- /dev/null +++ b/doc/forum/not_finding_git-annex-shell_on_remote/comment_5_9592297b8d22822997eb09a4953d8f64._comment @@ -0,0 +1,40 @@ +[[!comment format=mdwn + username="liam" + avatar="http://cdn.libravatar.org/avatar/5cb416d010c1d3f0ca677dd7f6c822ea" + subject="How to mitigate not finding git-annex-shell on MacOS remote?" + date="2025-05-15T02:08:58Z" + content=""" +I have a similar issue on MacOS with git-annex installed via homebrew. + +I'm trying to do `git annex sync macbook-remote` which is setup as a remote pointing to `ssh://macbook-remote:/Users/me/annex`. + +I get the messages: + +``` +Unable to parse git config from macbook-remote +Remote macbook-remote does not have git-annex installed; setting annex-ignore +This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. +Once you have fixed the git-annex installation, run: git annex enableremote macbook-remote +``` + +I have confirmed git-annex is installed on the remote machine, and `which git-annex` and `which git-annex-shell` both show the binaries in the `/usr/local/bin/` directory. + +I tried to fix it by setting the path in `~/.zprofile` of the remote machine: +`PATH=$PATH:/usr/local/bin/git-annex-shell` +Then I ran `git annex enableremote macbook-remote` on the local machine. +This doesn't seem to work. It gives the same error. +I also tried changing the path in the `.zshrc` even though from my understanding I should be setting the `.zprofile` one. +Am I doing it wrong? + +What confuses me is why there is a message about parsing the git config. +The message is not clear which git config it is talking about. 
+Does this mean there is an issue with the `~/.gitconfig` file? +Maybe it is referring to the `~/annex/.git/config` file instead? +Maybe the shell issue is not the only problem here. + +Toggling verbosity / debug with the `--verbose --debug` flags doesn't seem to give any extra information to identify which file it's having problems parsing. + +Any insight is appreciated. + +Thanks +"""]]
Added a comment: How to install man pages for git-annex?
diff --git a/doc/install/OSX/Homebrew/comment_4_b727faa6cb65a9d0b13fc901ef41881c._comment b/doc/install/OSX/Homebrew/comment_4_b727faa6cb65a9d0b13fc901ef41881c._comment new file mode 100644 index 0000000000..7cd54feb90 --- /dev/null +++ b/doc/install/OSX/Homebrew/comment_4_b727faa6cb65a9d0b13fc901ef41881c._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="liam" + avatar="http://cdn.libravatar.org/avatar/5cb416d010c1d3f0ca677dd7f6c822ea" + subject="How to install man pages for git-annex?" + date="2025-05-15T01:10:48Z" + content=""" +Did anyone figure out how to get man pages to show for git-annex on mac with homebrew? +I have installed on several macs via `brew install git-annex`. +I don't seem to have any man pages for git-annex on my system. + +Thanks +"""]]
response
diff --git a/doc/forum/Scalability_Issues/comment_1_50d812abc8d3531aa5311e362b684575._comment b/doc/forum/Scalability_Issues/comment_1_50d812abc8d3531aa5311e362b684575._comment new file mode 100644 index 0000000000..1113a5d5d2 --- /dev/null +++ b/doc/forum/Scalability_Issues/comment_1_50d812abc8d3531aa5311e362b684575._comment @@ -0,0 +1,37 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-05-14T18:05:28Z" + content=""" +24 million files is a lot of files for a git repository, and is past the +threshold where git generally slows down, in my experience. + +With that said, the git-annex cluster feature is fairly new, you're the +first person I've heard from using it at scale, and if it has its own +scalability problems, I'd like to address that. And I generally think it's +cool you're using a cluster for such a large repo! + +Have you looked at the "balanced" preferred content expression? It is +designed for the cluster use case and picks nodes so content gets evenly +balanced among them, without needing the overhead of setting metadata. + +The reason your pre-commit-annex hook script is slow is that running +`git-annex metadata` has to update the `.git/annex/index` file, which +you'll probably find is quite a large file. And git updates index files, +by default, by rewriting the whole file. + +Needing to rewrite the index file is also probably a lot of the slow +down of "(recording state in git...)". + +There are ways to make git update index files more efficiently, eg +switching to v4 index files. Enabling split index mode can also help +in cases where the index file is being written repeatedly. Do bear +in mind that you would want to make these changes both to `.git/index` +and to `.git/annex/index`. + +Your pre-commit-annex hook is running `git-annex metadata` once per file, +so the index gets updated once per file. 
+Rather than running `git-annex metadata` in a loop, that can also +be sped up by using `--batch` and feed in JSON, and it will only +need to update the index once. +"""]]
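The batching suggestion in the response above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual hook: it assumes the simplified request shape `{"file": ..., "fields": {name: [values]}}` described in the `git-annex metadata` man page for `--batch --json`, and the helper names `batch_lines` and `set_metadata`, as well as the `node` field, are hypothetical.

```python
import json
import subprocess


def batch_lines(assignments):
    """Build one JSON request line per file.

    `assignments` maps a file path to a dict of field name -> list of values.
    The request shape is an assumption taken from the git-annex-metadata man page.
    """
    lines = []
    for path, fields in assignments.items():
        request = {
            "file": path,
            "fields": {name: list(values) for name, values in fields.items()},
        }
        lines.append(json.dumps(request))
    return lines


def set_metadata(assignments, repo="."):
    """Send all requests to a single `git-annex metadata --batch --json` process,
    instead of starting one git-annex process per file."""
    payload = "\n".join(batch_lines(assignments)) + "\n"
    result = subprocess.run(
        ["git-annex", "metadata", "--batch", "--json"],
        input=payload,
        text=True,
        capture_output=True,
        cwd=repo,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical example: tag two files with the node that should hold them.
    for line in batch_lines({"a.bin": {"node": ["node01"]}, "b.bin": {"node": ["node02"]}}):
        print(line)
```

A hook could call `set_metadata` once with all staged files rather than looping over them, which is the point of the suggestion: a single process and, per the response above, a single index update.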