Recent changes to this wiki:
diff --git a/doc/forum/Old_files_being_pushed_to_transfer_repository.mdwn b/doc/forum/Old_files_being_pushed_to_transfer_repository.mdwn
new file mode 100644
index 0000000000..f48589ba29
--- /dev/null
+++ b/doc/forum/Old_files_being_pushed_to_transfer_repository.mdwn
@@ -0,0 +1,15 @@
+Hello,
+
+could someone please help me understand why old files are being pushed to a `transfer`-type repository on `git annex sync --content`? The command is executed from a `client`-type repository. The goal is to have new photos downloaded from `transfer`, not the other way around.
+
+With `--explain` I am seeing:
+
+```
+[ 20210428_164158.jpg matches preferred content: standard[TRUE] ]
+copy 20210428_164158.jpg (to transfer_repository...)
+ok
+```
+
+`transfer_repository` is in the `transfer` group and its wanted expression is set to `standard`, hence it *should* not want the old files (there is a client repo that already has them).
+
+Thanks, jose
diff --git a/doc/bugs/git-annex_tests_fail_with_git_2.50.0.mdwn b/doc/bugs/git-annex_tests_fail_with_git_2.50.0.mdwn new file mode 100644 index 0000000000..0fdae22ca5 --- /dev/null +++ b/doc/bugs/git-annex_tests_fail_with_git_2.50.0.mdwn @@ -0,0 +1,76 @@ +### Please describe the problem. + +The git-annex package in Guix fails the test suite since git was updated to 2.50.0 (from 2.49.0). Since this was a simple version bump I assume it is not guix-specific. + + +### What steps will reproduce the problem? + +Run git-annex' tests with git==2.50.0 installed. + + +### What version of git-annex are you using? On what operating system? + +10.20250605 using Guix + + +### Please provide any additional information below. + +Log excerpt from one of the failing tests: +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + +Tests + Repo Tests v10 unlocked + Init Tests + init: OK (0.15s) + add: OK (0.50s) + preferred content: OK (2.07s) + conflict resolution (removed file): FAIL (2.31s) + ./Test/Framework.hs:92: + sync failed with unexpected exit code (transcript follows) + commit + [master 11f1990] git-annex in tmprepo2 + 1 file changed, 1 insertion(+), 1 deletion(-) + ok + merge synced/master + fatal: stash failed + failed + pull r1 + From ../tmprepo1 + 7080688..2926879 git-annex -> r1/git-annex + b7efeee..c72d963 master -> r1/master + b7efeee..c72d963 synced/master -> r1/synced/master + + CONFLICT (modify/delete): conflictor deleted in refs/remotes/r1/master and modified in HEAD. Version HEAD of conflictor left in tree. + Automatic merge failed; fix conflicts and then commit the result. + (recording state in git...) + + Merge conflict was automatically resolved; you may want to examine the result. + [master 873369e] git-annex automatic merge conflict fix + + Already up to date. + ok + (recording state in git...) 
+ push r1 + remote: (merging synced/git-annex into git-annex...) + To ../tmprepo1 + 7080688..ed3fd17 git-annex -> synced/git-annex + c72d963..873369e master -> synced/master + ok + sync: 1 failed + + Use -p '/conflict resolution (removed file)/' to rerun this test only. + find: OK (0.87s) + edit (no pre-commit): OK (0.70s) + magic: OK (0.75s) + +1 out of 7 tests failed (7.35s) + +# End of transcript or log. +"""]] +All failing tests seem to be related to conflict resolution. + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Yes, it's awesome!
diff --git a/doc/bugs/Windows_FTBFS_ATM.mdwn b/doc/bugs/Windows_FTBFS_ATM.mdwn new file mode 100644 index 0000000000..017ab60155 --- /dev/null +++ b/doc/bugs/Windows_FTBFS_ATM.mdwn @@ -0,0 +1,20 @@ +### Please describe the problem. + + +``` +[741 of 750] Compiling Assistant.DeleteRemote + +Error: [S-7282] + Stack failed to execute the build plan. + + While executing the build plan, Stack encountered the error: + + [S-7011] + While building package git-annex-10.20250605 (scroll up to its section to see the error) + using: + D:\a\git-annex\git-annex\.stack-work\dist\8850645a\setup\setup --verbose=1 --builddir=.stack-work\dist\8850645a build exe:git-annex --ghc-options "" + Process exited with code: ExitFailure 1 +Error: Process completed with exit code 1. +``` + +[https://github.com/datalad/git-annex/actions/runs/15867028000/job/44735747787](https://github.com/datalad/git-annex/actions/runs/15867028000/job/44735747787)
Added a comment: We don’t need a 'git annex lock' after a 'git annex add', right?
diff --git a/doc/git-annex-unlock/comment_12_201fdec69ebe9fe1afa24e742e3a7dac._comment b/doc/git-annex-unlock/comment_12_201fdec69ebe9fe1afa24e742e3a7dac._comment new file mode 100644 index 0000000000..ceae23686f --- /dev/null +++ b/doc/git-annex-unlock/comment_12_201fdec69ebe9fe1afa24e742e3a7dac._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="Cletip" + avatar="http://cdn.libravatar.org/avatar/246315cf8f80f70f9b400c24df275806" + subject="We don’t need a 'git annex lock' after a 'git annex add', right?" + date="2025-06-25T22:12:57Z" + content=""" +Hi there! + +I'm learning how to use `unlock`, and I'm relatively new to `git-annex`. + +I think I’ve understood the concept behind this feature, but I don’t get why, in the example, the `git annex lock` command is used after `git annex add`: + +```bash +git annex unlock photo.jpg +gimp photo.jpg +git annex add photo.jpg +git annex lock photo.jpg <------ This one +git commit -m \"redeye removal\" +``` + +Isn’t the file already locked by the `git annex add` ? + +Thank you in advance. +"""]]
Work around git 2.50 bug that caused it to crash when there is a merge conflict with an unlocked annexed file
This fixes several test suite failures with git 2.50.
See the bug report for the full, gory details.
diff --git a/CHANGELOG b/CHANGELOG
index bf58959309..8af3c1889a 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -6,6 +6,8 @@ git-annex (10.20250606) UNRELEASED; urgency=medium
     filesystems like VFAT that don't support such filenames.
   * webapp: Rename "Upgrade Repository" to "Convert Repository"
     to avoid confusion with git-annex upgrade.
+  * Work around git 2.50 bug that caused it to crash when there is a merge
+    conflict with an unlocked annexed file.
 
  -- Joey Hess <id@joeyh.name>  Mon, 23 Jun 2025 11:11:29 -0400
 
diff --git a/Database/Keys.hs b/Database/Keys.hs
index cc3f189b99..22962e1372 100644
--- a/Database/Keys.hs
+++ b/Database/Keys.hs
@@ -1,6 +1,6 @@
 {- Sqlite database of information about Keys
  -
- - Copyright 2015-2022 Joey Hess <id@joeyh.name>
+ - Copyright 2015-2025 Joey Hess <id@joeyh.name>
  -
  - Licensed under the GNU AGPL version 3 or higher.
  -}
@@ -260,7 +260,7 @@ isInodeKnown i s = or <$> runReaderIO ContentTable
  - is an associated file.
  -}
 reconcileStaged :: Bool -> H.DbQueue -> Annex DbTablesChanged
-reconcileStaged dbisnew qh = ifM isBareRepo
+reconcileStaged dbisnew qh = ifM notneeded
     ( return mempty
     , do
         gitindex <- inRepo currentIndexFile
@@ -335,6 +335,14 @@ reconcileStaged dbisnew qh = ifM isBareRepo
     getindextree = inRepo $ \r -> writeTreeQuiet $ r
         { gitGlobalOpts = gitGlobalOpts r ++ bypassSmudgeConfig }
 
+    notneeded = isBareRepo
+        -- Avoid doing anything when run by the
+        -- smudge clean filter. When that happens in a conflicted
+        -- merge situation, running git write-tree
+        -- here would cause git merge to fail with an internal
+        -- error. This works around that bug in git.
+        <||> Annex.getState Annex.insmudgecleanfilter
+
     diff old new =
         -- Avoid running smudge clean filter, since we want the
         -- raw output, and it would block trying to access the
diff --git a/doc/bugs/test_suite_fail_with_git_2.50.mdwn b/doc/bugs/test_suite_fail_with_git_2.50.mdwn
index 0838279883..b8940169b6 100644
--- a/doc/bugs/test_suite_fail_with_git_2.50.mdwn
+++ b/doc/bugs/test_suite_fail_with_git_2.50.mdwn
@@ -82,3 +82,15 @@ So, something that reconcileStaged does is making git unhappy when it
 runs the smudge clean filter while creating a stash. It seems logical that
 the problem would involve the index file, which `reconcileStaged` touches,
 and which gets updated when stashing..
+
+> Made reconcileStaged run `git write-tree` but not do anything else, and
+> that is sufficient to make git stash fail. This must be a bug in git,
+> `git write-tree` should be able to be run at any time, even if it exits
+> 1 due to the index being in conflict. Having `git write-tree` affect
+> another process that was already running is not good behavior for git.
+> Since `git write-tree` does need to sometimes update the index,
+> this feels like lacking locking.
+>
+> I have worked around this by making reconcileStaged avoid doing anything
+> when called by the smudge clean filter. Which I don't think will cause
+> any other problems, fingers crossed. [[done]] --[[Joey]]
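The guard added to `reconcileStaged` above can be illustrated in isolation. The following is only a minimal sketch, not git-annex's code: the `St` record, `notNeeded`, and `reconcile` are hypothetical stand-ins for the Annex state and the real function, and `<||>` is redefined locally as a short-circuiting monadic "or" so the example is self-contained.

```haskell
import Control.Monad (unless)

-- Hypothetical stand-in for the Annex state consulted by the real code.
data St = St { isBare :: Bool, inSmudgeCleanFilter :: Bool }

-- Short-circuiting monadic "or": the second action only runs when the
-- first one returns False.
(<||>) :: Monad m => m Bool -> m Bool -> m Bool
a <||> b = a >>= \r -> if r then return True else b
infixr 2 <||>

-- True when the expensive reconciliation should be skipped entirely.
notNeeded :: St -> IO Bool
notNeeded st = return (isBare st) <||> return (inSmudgeCleanFilter st)

-- Runs the expensive action unless it is not needed, and reports
-- whether it actually ran.
reconcile :: St -> IO () -> IO Bool
reconcile st expensive = do
    skip <- notNeeded st
    unless skip expensive
    return (not skip)

main :: IO ()
main = do
    -- Inside the smudge clean filter: the action is skipped.
    ran1 <- reconcile (St False True) (putStrLn "would run git write-tree")
    print ran1
    -- Normal invocation: the action runs.
    ran2 <- reconcile (St False False) (putStrLn "would run git write-tree")
    print ran2
```

The point of the design is that the check happens before anything touches the index, so a process started by `git merge` or `git stash` never calls `git write-tree` underneath it.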
bug
diff --git a/doc/bugs/test_suite_fail_with_git_2.50.mdwn b/doc/bugs/test_suite_fail_with_git_2.50.mdwn new file mode 100644 index 0000000000..0838279883 --- /dev/null +++ b/doc/bugs/test_suite_fail_with_git_2.50.mdwn @@ -0,0 +1,84 @@ +With git 2.50, there are several test suite failures, all involving +`git-annex sync` run in a situation with a conflict. + +git pull or merge fails with "fatal: stash failed" + +git has a known bug with the same symptom, that affected older versions too, +and was reported upstream with a test case, but never fixed. +See [[bugs/resolvemerge_fails_when_unlocked_empty_files_exist]]. + +That bug only affects unlocked empty annexed files. +The failing parts of the test suite use unlocked files, but not empty +files. + +To run one of the failing tests, without running the rest of the test +suite: + + git-annex test -p '$0=="Tests.Repo Tests v10 unlocked.conflict resolution"' + +The failure is somewhat intermittent. + +Comparing with the same test case run with git 2.47.2, with `GIT_TRACE=1` +and `--test-debug`, the new git has this: + + [2025-06-25 10:55:00.363063271] (Utility.Process) process [3665269] call: git ["--git-dir=.git","--work-tree=.","-c","merge.directoryRenames=false","--literal-pathspecs","-c","annex.debug=true","merge","--no-edit","refs/remotes/r2/master"] + ... 
+ 10:55:00.492909 run-command.c:673 trace: run_command: git stash create + 10:55:00.492924 run-command.c:765 trace: start_command: /usr/lib/git-core/git stash create + 10:55:00.494550 git.c:476 trace: built-in: git stash create + 10:55:00.495283 run-command.c:673 trace: run_command: 'git-annex smudge --clean -- '\''conflictor'\''' + 10:55:00.495304 run-command.c:765 trace: start_command: /bin/sh -c 'git-annex smudge --clean -- '\''conflictor'\''' 'git-annex smudge --clean -- '\''conflictor'\''' + 10:55:00.510128 git.c:476 trace: built-in: git config --null --list + [2025-06-25 10:55:00.514560172] (Utility.Process) process [3665339] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"] + 10:55:00.514833 git.c:476 trace: built-in: git cat-file '--batch-check=%(objectname) %(objecttype) %(objectsize)' + 10:55:00.517739 git.c:476 trace: built-in: git cat-file --batch + [2025-06-25 10:55:00.519817243] (Utility.Process) process [3665342] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"] + [2025-06-25 10:55:00.524719227] (Utility.Process) process [3665344] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"] + [2025-06-25 10:55:00.525956861] (Utility.Process) process [3665344] done ExitFailure 128 + [2025-06-25 10:55:00.526033972] (Database.Keys) reconcileStaged start (in conflict) + [2025-06-25 10:55:00.529125922] (Utility.Process) process [3665346] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/annex/last-index"] + 10:55:00.530711 git.c:476 trace: built-in: git show-ref --hash refs/annex/last-index + [2025-06-25 10:55:00.531264053] (Utility.Process) process [3665346] done ExitSuccess + [2025-06-25 10:55:00.531808824] (Utility.Process) process [3665347] chat: git 
["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"] + [2025-06-25 10:55:00.534102268] (Utility.Process) process [3665348] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] + 10:55:00.534260 git.c:476 trace: built-in: git cat-file '--batch-check=%(objectname) %(objecttype) %(objectsize)' --buffer + [2025-06-25 10:55:00.537040154] (Utility.Process) process [3665349] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","-c","diff.external=","diff","0fbd88293710b6a33cda68626d317e6ee0e991d5","--staged","--raw","-z","--no-abbrev","-G/annex/objects/","--no-renames","--ignore-submodules=all","--no-textconv","--no-ext-diff"] + 10:55:00.537633 git.c:476 trace: built-in: git cat-file '--batch=%(objectname) %(objecttype) %(objectsize)' --buffer + 10:55:00.537839 git.c:476 trace: built-in: git diff 0fbd88293710b6a33cda68626d317e6ee0e991d5 --staged --raw -z --no-abbrev-G/annex/objects/ --no-renames --ignore-submodules=all --no-textconv --no-ext-diff + [2025-06-25 10:55:00.539717016] (Utility.Process) process [3665349] done ExitSuccess + [2025-06-25 10:55:00.540326407] (Utility.Process) process [3665348] done ExitSuccess + [2025-06-25 10:55:00.540404379] (Utility.Process) process [3665347] done ExitSuccess + [2025-06-25 10:55:00.540459121] (Database.Keys) reconcileStaged end + [2025-06-25 10:55:00.541662584] (Utility.Process) process [3665342] done ExitSuccess + [2025-06-25 10:55:00.542107582] (Utility.Process) process [3665339] done ExitSuccess + fatal: stash failed + [2025-06-25 10:55:00.555752855] (Utility.Process) process [3665269] done ExitFailure 128 + +The old git has: + + 10:52:18.580330 run-command.c:666 trace: run_command: git stash create + 10:52:18.580348 run-command.c:758 trace: start_command: 
/usr/lib/git-core/git stash create + 10:52:18.581939 git.c:479 trace: built-in: git stash create + 10:52:18.583737 run-command.c:666 trace: run_command: 'git-annex smudge -- '\''conflictor'\''' + 10:52:18.583751 run-command.c:758 trace: start_command: /bin/sh -c 'git-annex smudge -- '\''conflictor'\''' 'git-annex smudge -- '\''conflictor'\''' + 10:52:18.589604 git.c:479 trace: built-in: git config --null --list + Auto-merging conflictor + +So, `git merge` has changed to run the smudge clean hook, on the conflicted +file. + +It seemed possible that git is feeding in a different version of the file +than the one in the working tree, which would make git-annex's smudge clean +filter use the working tree version of the file, which would not be good. +(Why that would cause git to explode I don't know.) So, I instrumented +`git-annex smudge`, and verified that in each case where it uses the +content of the file on disk, that is the same as the file content that was +provided to it on stdin. So that seems to rule out this theory. + +Noticing that `reconcileStaged` is getting run in the new log and not in +the old log, I replaced it with a noop. **That avoids the problem.** + +So, something that reconcileStaged does is making git unhappy when it +runs the smudge clean filter while creating a stash. It seems logical that +the problem would involve the index file, which `reconcileStaged` touches, +and which gets updated when stashing..
Added a comment
diff --git a/doc/bugs/adb_fchown_error/comment_5_068d43f41f5c0f7537b3f7d9e4f3c8ad._comment b/doc/bugs/adb_fchown_error/comment_5_068d43f41f5c0f7537b3f7d9e4f3c8ad._comment
new file mode 100644
index 0000000000..235b6f8354
--- /dev/null
+++ b/doc/bugs/adb_fchown_error/comment_5_068d43f41f5c0f7537b3f7d9e4f3c8ad._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="xentac"
+ avatar="http://cdn.libravatar.org/avatar/773b6c7b0dc34f10b66aa46d2730a5b3"
+ subject="comment 5"
+ date="2025-06-24T20:54:16Z"
+ content="""
+To close the loop (for anyone searching for this later), I had to set the setgid bit (`chmod g+s <directory>`) on the directories created by git annex so that `adb push` could successfully fchown them. Before that, it was able to write the contents of the file (the percentage went up and I could see the file being written from an `adb shell`), but once adb tried to `fchown` it would fail and delete the file from the filesystem. Now my files are uploading successfully again.
+"""]]
diff --git a/doc/forum/Confirming_my_preferred_content_understanding.mdwn b/doc/forum/Confirming_my_preferred_content_understanding.mdwn new file mode 100644 index 0000000000..67ca20d911 --- /dev/null +++ b/doc/forum/Confirming_my_preferred_content_understanding.mdwn @@ -0,0 +1,14 @@ +I've been using git-annex for a little while now to keep two copies of my data backed up across three and soon to be four external hard drives. I recently came across the preferred content expressions and realized that they could make my life a lot easier, but I want to make sure I understand them correctly first. + +As I mentioned, I want two copies of my data stored across several external hard drives. Right now I keep track of this manually and I don't try to balance the data between drives. From what I understand, I can add the repository from each drive to the archive group. And then call + +``` + git annex groupwanted archive (not (copies=archive:2 and balanced=archive:lackingcopies)) + git annex wanted here groupwanted +``` + +And then when I sync the repositories it will try and balance the data between themselves while ensuring there are at least two copies of the data. I can also easily add hard drives to the system by cloning a repository on them and then adding it to the group. + +Do I have the correct understanding here or am I missing something? Are there any suggestions or advice you have for this setup? Is using `fullybalanced` or `sizebalanced` better? + +
webapp: Rename "Upgrade Repository" to "Convert Repository"
To avoid confusion with git-annex upgrade.
Sponsored-by: Graham Spencer on Patreon
diff --git a/Assistant/WebApp/Configurators/Edit.hs b/Assistant/WebApp/Configurators/Edit.hs index 4103f6bccb..b0a9d7f4f1 100644 --- a/Assistant/WebApp/Configurators/Edit.hs +++ b/Assistant/WebApp/Configurators/Edit.hs @@ -291,9 +291,9 @@ encrypted using gpg key: |] getRepoEncryption _ _ = return () -- local repo -getUpgradeRepositoryR :: RepoId -> Handler () -getUpgradeRepositoryR (RepoUUID _) = redirect DashboardR -getUpgradeRepositoryR r = go =<< liftAnnex (repoIdRemote r) +getConvertRepositoryR :: RepoId -> Handler () +getConvertRepositoryR (RepoUUID _) = redirect DashboardR +getConvertRepositoryR r = go =<< liftAnnex (repoIdRemote r) where go Nothing = redirect DashboardR go (Just rmt) = do diff --git a/Assistant/WebApp/routes b/Assistant/WebApp/routes index 8774228081..526a8741bc 100644 --- a/Assistant/WebApp/routes +++ b/Assistant/WebApp/routes @@ -36,7 +36,7 @@ /config/repository/edit/new/cloud/#UUID EditNewCloudRepositoryR GET POST /config/repository/sync/disable/#RepoId DisableSyncR GET /config/repository/sync/enable/#RepoId EnableSyncR GET -/config/repository/upgrade/#RepoId UpgradeRepositoryR GET +/config/repository/convert/#RepoId ConvertRepositoryR GET /config/repository/add/drive AddDriveR GET POST /config/repository/add/drive/confirm/#RemovableDrive ConfirmAddDriveR GET diff --git a/CHANGELOG b/CHANGELOG index 2ce4f60831..bf58959309 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -4,6 +4,8 @@ git-annex (10.20250606) UNRELEASED; urgency=medium which can happen with eg a S3 bucket. * Avoid a problem with temp file names ending in whitespace on filesystems like VFAT that don't support such filenames. + * webapp: Rename "Upgrade Repository" to "Convert Repository" + to avoid confusion with git-annex upgrade. 
-- Joey Hess <id@joeyh.name> Mon, 23 Jun 2025 11:11:29 -0400 diff --git a/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__/comment_1_9fcb53dcfaa815e872ea5a84612d214b._comment b/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__/comment_1_9fcb53dcfaa815e872ea5a84612d214b._comment new file mode 100644 index 0000000000..7b4abeb09d --- /dev/null +++ b/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__/comment_1_9fcb53dcfaa815e872ea5a84612d214b._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-24T16:27:05Z" + content=""" +This sets `remote.<name>.annex-ignore` to false in the git config. + +You would normally only see this if the remote server didn't have +git-annex-shell installed so was only a git repository with no annex. +If it then got installed, so you were able to use git-annex to store +files there, this makes git-annex try again to use git-annex-shell in +that repository. + +The naming is unfortunate since it's not related to upgrading the +repository version at all. I'll rename it to "Convert Repository". +"""]] diff --git a/templates/configurators/edit/nonannexremote.hamlet b/templates/configurators/edit/nonannexremote.hamlet index acfa10ec66..98be4763ef 100644 --- a/templates/configurators/edit/nonannexremote.hamlet +++ b/templates/configurators/edit/nonannexremote.hamlet @@ -8,11 +8,11 @@ $if sshrepo <p> If this repository's ssh server has git-annex installed, you can # - upgrade this repository to a full git annex, which will store the + convert this repository to a full git annex, which will store the contents of your files, not only their metadata. <p> - <a .btn .btn-default href="@{UpgradeRepositoryR r}"> - Upgrade Repository + <a .btn .btn-default href="@{ConvertRepositoryR r}"> + Convert Repository <h2> Repository information <p>
comment
diff --git a/doc/forum/Fill_remotes_sequentially/comment_6_6f128a1a28298bb5662fa32e9b350946._comment b/doc/forum/Fill_remotes_sequentially/comment_6_6f128a1a28298bb5662fa32e9b350946._comment new file mode 100644 index 0000000000..b73f832b44 --- /dev/null +++ b/doc/forum/Fill_remotes_sequentially/comment_6_6f128a1a28298bb5662fa32e9b350946._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2025-06-24T16:13:56Z" + content=""" +> It seems like that the remote's cost could be a way to define the order in which the remotes are filled? + +Yes because git-annex always tries to use the lowest cost remote first when +storing or retrieving a file. So using eg the standard archive group +preferred content setting, it would store a file on the lowest cost remote, +and then the other remotes would no longer want a copy of the file since it +was already stored to one. + +`GETCOST` defines the default cost for a special remote. So rather than +configuring `remote.<name>.annex-cost-command`, your special remote could +check if the expected tape is currently in the drive, and return a lower cost. + +If the cost needs to change while git-annex is running, due eg to a tape +being swapped, it could re-query `GETCOST` after every file. Which would +be less expensive than running a cost command. I think that a config +setting to make it do that is a feasible change to make to git-annex. + +(Ignore `GETAVAILABILITY`, it's barely used and only by the assistant.) +"""]]
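The "lowest cost remote first" behaviour described in the comment above can be sketched as follows. This is an illustration under stated assumptions, not git-annex's implementation: the `Remote` record, `byCost`, and `pickRemote` are hypothetical names, the cost values are made up, and the `wants` predicate stands in for a preferred content check.

```haskell
import Data.List (sortOn)

-- Hypothetical, simplified remote: a name plus a numeric cost.
data Remote = Remote { remoteName :: String, remoteCost :: Double }
    deriving (Show, Eq)

-- Order remotes so that the cheapest one is tried first.
byCost :: [Remote] -> [Remote]
byCost = sortOn remoteCost

-- Pick the cheapest remote that still wants the file.
-- 'wants' is a stand-in for a preferred content check.
pickRemote :: (Remote -> Bool) -> [Remote] -> Maybe Remote
pickRemote wants rs = case filter wants (byCost rs) of
    (r:_) -> Just r
    []    -> Nothing

main :: IO ()
main = do
    let tapes = [Remote "tape2" 250, Remote "tape1" 100, Remote "tape3" 300]
    -- Every tape wants the file: the cheapest one is chosen.
    print (fmap remoteName (pickRemote (const True) tapes))
    -- tape1 no longer wants it (say, it is full): the next cheapest is chosen.
    print (fmap remoteName (pickRemote ((/= "tape1") . remoteName) tapes))
```

Lowering the cost of whichever tape is currently in the drive would move it to the front of this ordering, which is the effect the comment suggests a special remote could achieve through `GETCOST`.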
comment
diff --git a/doc/bugs/adb_fchown_error/comment_4_cd37d2fe9767162cbc616ef67b1d349f._comment b/doc/bugs/adb_fchown_error/comment_4_cd37d2fe9767162cbc616ef67b1d349f._comment new file mode 100644 index 0000000000..9d109afc94 --- /dev/null +++ b/doc/bugs/adb_fchown_error/comment_4_cd37d2fe9767162cbc616ef67b1d349f._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-06-24T15:12:38Z" + content=""" +Might be that an Android upgrade changed permissions.. +"""]]
response
diff --git a/doc/todo/Relative_Ignores_for_Relative_Imports/comment_1_2c7dd6b757c9bddd71d7cbc9f0eb2b2d._comment b/doc/todo/Relative_Ignores_for_Relative_Imports/comment_1_2c7dd6b757c9bddd71d7cbc9f0eb2b2d._comment new file mode 100644 index 0000000000..e5ba5ac957 --- /dev/null +++ b/doc/todo/Relative_Ignores_for_Relative_Imports/comment_1_2c7dd6b757c9bddd71d7cbc9f0eb2b2d._comment @@ -0,0 +1,55 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-24T14:55:03Z" + content=""" +As far as I can see, this works correctly already. + + joey@darkstar:~/tmp/bench5/r>cat .gitignore + foo/*.c + joey@darkstar:~/tmp/bench5/r>git-annex initremote d type=directory importtree=yes encryption=none directory=../d + joey@darkstar:~/tmp/bench5/r>echo hi > ../d/y.c + joey@darkstar:~/tmp/bench5/r>echo hi2 > ../d/y.d + joey@darkstar:~/tmp/bench5/r>git-annex import master:foo --from d + list d ok + import d y.d + ok + update refs/remotes/d/master ok + (recording state in git...) + joey@darkstar:~/tmp/bench5/r>git merge d/master + Merge made by the 'ort' strategy. + foo/y.d | 1 + + 1 file changed, 1 insertion(+) + create mode 120000 foo/y.d + +So the .gitignore of `foo/*.c` applied when importing from `d` into a `foo` +subdirectory, with the `.c` file not being imported, and other files being +imported. When I import from `d` into a different subdirectory, the +.gitignore does not match, and those files are imported: + + joey@darkstar:~/tmp/bench5/r>git-annex import master:bar --from d + list d ok + import d x.c + ok + import d y.c + ok + update refs/remotes/d/master ok + +> Therefore I suggest that imports to a subtree respect ignores as if the files in the tree were already adjusted to their new +> destination. + +The above shows this does happen. Also I can confirm it by inspection of +the code, particularly `notIgnoredImportLocation` adds the ImportSubTree +location to the filepath. + +> A similar argument could be made for attributes in general. 
+> I haven't done the testing on import attributes (namely `largefiles`), but I would want these to respect subtree paths as well. + +Based on my inspection of the code, it already does. + +---- + +So, I suspect I am either misunderstanding what you are trying to do, or you are +confused. In either case, it would be helpful if you show a complete +example of what you're trying to do. +"""]]
Added a comment
diff --git a/doc/bugs/adb_fchown_error/comment_3_db5e2fb33fa8f1b219fd16dee4c39a24._comment b/doc/bugs/adb_fchown_error/comment_3_db5e2fb33fa8f1b219fd16dee4c39a24._comment new file mode 100644 index 0000000000..155cfba26f --- /dev/null +++ b/doc/bugs/adb_fchown_error/comment_3_db5e2fb33fa8f1b219fd16dee4c39a24._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="xentac" + avatar="http://cdn.libravatar.org/avatar/773b6c7b0dc34f10b66aa46d2730a5b3" + subject="comment 3" + date="2025-06-23T18:24:47Z" + content=""" +The weirdest part is this actually did work previously. A few months ago I was able to get files into that directory. I'll start looking into it as an adb problem. +"""]]
comment
diff --git a/doc/special_remotes/directory/comment_24_4e65af70ce85cbd1d1d6efeb24462673._comment b/doc/special_remotes/directory/comment_24_4e65af70ce85cbd1d1d6efeb24462673._comment new file mode 100644 index 0000000000..5d7bc1acef --- /dev/null +++ b/doc/special_remotes/directory/comment_24_4e65af70ce85cbd1d1d6efeb24462673._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: Ignoring files on directory special remote""" + date="2025-06-23T17:32:23Z" + content=""" +If you exclude them from wanted or gitignore them before ever importing +from the special remote, it won't delete them. But if you already imported +a tree containing the files, and then exclude them, and then export a tree, +git-annex will see that the old tree contained the file, and the new tree +did not, and so will delete the file. +"""]]
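The behaviour described in the comment above amounts to a difference between two trees. The sketch below is a simplification that models a tree as a set of paths; `Tree` and `toDelete` are hypothetical names, and the real export logic tracks considerably more than this.

```haskell
import qualified Data.Set as Set

-- Simplified model: a tree is just the set of file paths it contains.
type Tree = Set.Set FilePath

-- Files present in the previously imported or exported tree but absent
-- from the new one. These are the files an export would remove.
toDelete :: Tree -> Tree -> Tree
toDelete old new = Set.difference old new

main :: IO ()
main = do
    let exported = Set.fromList ["a.jpg"]
    -- notes.tmp was imported once, then excluded before exporting:
    -- the old tree has it, the new tree does not, so it gets deleted.
    print (Set.toList (toDelete (Set.fromList ["a.jpg", "notes.tmp"]) exported))
    -- notes.tmp was excluded before any import: it was never in a
    -- tracked tree, so nothing is deleted.
    print (Set.toList (toDelete (Set.fromList ["a.jpg"]) exported))
```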
comment
diff --git a/doc/bugs/openTempfile_invalid_argument_on_sd_card/comment_1_89e96808f0274df3d6144546a2761d57._comment b/doc/bugs/openTempfile_invalid_argument_on_sd_card/comment_1_89e96808f0274df3d6144546a2761d57._comment
new file mode 100644
index 0000000000..c643020e83
--- /dev/null
+++ b/doc/bugs/openTempfile_invalid_argument_on_sd_card/comment_1_89e96808f0274df3d6144546a2761d57._comment
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 1"""
+ date="2025-06-23T17:11:25Z"
+ content="""
+Reproduced this, and found the filename it's using with strace:
+
+    joey@darkstar:~/tmp/annex>touch '/home/joey/mnt/Music/KIRA/KIRA - The Introduction (Deluxe Edition)/KIRA - The Introduction (Deluxe Edition) - 05 Games (feat566f1-3-1a21815. '
+    touch: cannot touch '/home/joey/mnt/Music/KIRA/KIRA - The Introduction (Deluxe Edition)/KIRA - The Introduction (Deluxe Edition) - 05 Games (feat566f1-3-1a21815. ': Invalid argument
+
+The problem is the trailing space in the filename, which VFAT does not support:
+
+    joey@darkstar:~/tmp/annex>touch '/home/joey/mnt/Music/KIRA/foo '
+    touch: cannot touch '/home/joey/mnt/Music/KIRA/foo ': Invalid argument
+
+There was already a similar workaround that avoids a filename
+ending with ".", so I made it also check for whitespace.
+"""]]
prevent relatedTemplate from truncating a filename to end in whitespace
Avoid a problem with temp file names ending in whitespace on filesystems
like VFAT that don't support such filenames.
See a6eb7d73398fc1729faec663c1822023e540643b previously for the same but
with "."
At some point relatedTemplate is more bother than it's worth and it would
be simpler to just use "temp" as the basename of all temp files. We seem to
be approaching that point, since my interest in absurd ancient filesystem
limitations is limited.
Sponsored-by: unqueued on Patreon
diff --git a/CHANGELOG b/CHANGELOG
index 2b2823d998..2ce4f60831 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -2,6 +2,8 @@ git-annex (10.20250606) UNRELEASED; urgency=medium
 
   * Skip and warn when a tree import includes empty filenames,
     which can happen with eg a S3 bucket.
+  * Avoid a problem with temp file names ending in whitespace on
+    filesystems like VFAT that don't support such filenames.
 
  -- Joey Hess <id@joeyh.name>  Mon, 23 Jun 2025 11:11:29 -0400
 
diff --git a/Utility/Tmp.hs b/Utility/Tmp.hs
index f8be5b29c0..f373ca6c1c 100644
--- a/Utility/Tmp.hs
+++ b/Utility/Tmp.hs
@@ -117,13 +117,15 @@ relatedTemplate' :: RawFilePath -> RawFilePath
 relatedTemplate' f
     | len > templateAddedLength =
         {- Some filesystems like FAT have issues with filenames
-         - ending in ".", so avoid truncating a filename to end
-         - that way. -}
-        B.dropWhileEnd (== dot) $
+         - ending in ".", and others like VFAT don't allow a
+         - filename to end with trailing whitespace, so avoid
+         - truncating a filename to end that way. -}
+        B.dropWhileEnd disallowed $
             truncateFilePath (len - templateAddedLength) f
     | otherwise = f
   where
     len = B.length f
+    disallowed c = c == dot || isSpace (chr (fromIntegral c))
     dot = fromIntegral (ord '.')
 #else
 -- Avoids a test suite failure on windows, reason unknown, but
diff --git a/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn b/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn
index f3c37c5327..2763ae3a17 100644
--- a/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn
+++ b/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn
@@ -43,3 +43,5 @@ failed
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 
 I use it quite successfully to archive media on removable spinning hard drives.
+
+> [[fixed|done]] --[[Joey]]
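The effect of the `Utility/Tmp.hs` change above can be tried out on its own. This is a minimal sketch rather than the real `relatedTemplate'`: `truncateBytes` is a hypothetical stand-in for git-annex's `truncateFilePath` (it cuts at a byte offset and ignores multi-byte characters), and `safeTruncate` is an illustrative name. It assumes a `bytestring` version that provides `dropWhileEnd`.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Char (chr, isSpace)
import Data.Word (Word8)

-- Hypothetical stand-in for truncateFilePath: plain byte truncation.
truncateBytes :: Int -> B.ByteString -> B.ByteString
truncateBytes = B.take

-- Bytes that some filesystems (FAT, VFAT) reject at the end of a name:
-- a dot or any whitespace character.
disallowedAtEnd :: Word8 -> Bool
disallowedAtEnd c = c == dot || isSpace (chr (fromIntegral c))
  where
    dot = fromIntegral (fromEnum '.')

-- Truncate, then strip any disallowed bytes the cut left at the end.
safeTruncate :: Int -> B.ByteString -> B.ByteString
safeTruncate n = B.dropWhileEnd disallowedAtEnd . truncateBytes n

main :: IO ()
main = do
    -- Cutting at 13 bytes leaves "Games (feat. ", which ends in a dot
    -- followed by a space; both are stripped.
    B8.putStrLn (safeTruncate 13 (B8.pack "Games (feat. Someone)"))
    -- A cut that ends on an ordinary character is left alone.
    B8.putStrLn (safeTruncate 5 (B8.pack "Games (feat. Someone)"))
```

Stripping after truncation, rather than before, matters because the problematic trailing character is usually produced by the cut itself, as in the bug report's filename.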
Skip and warn when a tree import includes empty filenames
Which can happen with eg a S3 bucket.
Sponsored-by: Dartmouth College's DANDI project
diff --git a/Annex/Import.hs b/Annex/Import.hs index 2d2526a544..b9c1b74e87 100644 --- a/Annex/Import.hs +++ b/Annex/Import.hs @@ -1,6 +1,6 @@ {- git-annex import from remotes - - - Copyright 2019-2024 Joey Hess <id@joeyh.name> + - Copyright 2019-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -64,6 +64,7 @@ import qualified Utility.Matcher import qualified Database.Export as Export import qualified Database.ContentIdentifier as CIDDb import qualified Logs.ContentIdentifier as CIDLog +import qualified Utility.OsString as OS import Backend.Utilities import Control.Concurrent.STM @@ -1048,6 +1049,10 @@ pruneImportMatcher = Utility.Matcher.pruneMatcher matchNeedsKey - write a git tree that contains that, git will complain and refuse to - check it out. - + - Filters out any paths that contain an empty filename, because git cannot + - represent an empty filename in a tree, but some special remotes do + - support empty filenames. + - - Filters out new things not matching the FileMatcher or that are - gitignored. However, files that are already in git get imported - regardless. (Similar to how git add behaves on gitignored files.) @@ -1094,19 +1099,35 @@ getImportableContents r importtreeconfig ci matcher = do wanted dbhandle (loc, (_cid, sz)) | ingitdir = pure False + | OS.null (fromImportLocation loc) = do + warning $ UnquotedString "Cannot import a file with an empty filename" + return False + | isdirectory = do + warning $ UnquotedString "Cannot import a file with a name that appears to be a directory: " + <> QuotedPath (fromImportLocation loc) + return False | otherwise = isknown <||> (matches <&&> notignored) where -- Checks, from least to most expensive. 
#ifdef mingw32_HOST_OS - ingitdir = ".git" `elem` Posix.splitDirectories (fromOsPath (fromImportLocation loc)) + ingitdir = ".git" `elem` Posix.splitDirectories loc' #else ingitdir = literalOsPath ".git" `elem` splitDirectories (fromImportLocation loc) +#endif +#ifdef mingw32_HOST_OS + isdirectory = Posix.dropFileName loc' == loc' +#else + isdirectory = dropFileName (fromImportLocation loc) == fromImportLocation loc #endif matches = matchesImportLocation matcher loc sz isknown = isKnownImportLocation dbhandle loc notignored = notIgnoredImportLocation importtreeconfig ci loc - + +#ifdef mingw32_HOST_OS + loc' = fromOsPath (fromImportLocation loc) +#endif + wantedunder dbhandle root (loc, v) = wanted dbhandle (importableContentsChunkFullLocation root loc, v) diff --git a/CHANGELOG b/CHANGELOG index a2036e20fd..2b2823d998 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,10 @@ +git-annex (10.20250606) UNRELEASED; urgency=medium + + * Skip and warn when a tree import includes empty filenames, + which can happen with eg a S3 bucket. 
+ + -- Joey Hess <id@joeyh.name> Mon, 23 Jun 2025 11:11:29 -0400 + git-annex (10.20250605) upstream; urgency=medium * sync: Push the current branch first, rather than a synced branch, diff --git a/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn b/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn index f0ffec47fb..497f4bdbb5 100644 --- a/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn +++ b/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn @@ -44,3 +44,5 @@ the version from pypi @mih started to build recently [[!meta author=yoh]] [[!tag projects/dandi]] + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__/comment_1_85ea14723ade98ed24658ee13b42814f._comment b/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__/comment_1_85ea14723ade98ed24658ee13b42814f._comment new file mode 100644 index 0000000000..e61e32ea65 --- /dev/null +++ b/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__/comment_1_85ea14723ade98ed24658ee13b42814f._comment @@ -0,0 +1,29 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-23T14:32:13Z" + content=""" +Your hypothesis is right, it's items in the bucket with names ending in "/". 
+ +After fixing git-annex to skip and warn about those, it looks like this: + + list s3-origin + Cannot import a file with a name that appears to be a directory: models/smartspim_production_models/ + + Cannot import a file with a name that appears to be a directory: models/smartspim_production_models/model_2_12202024/ + + Cannot import a file with a name that appears to be a directory: point_annotations/ + + Cannot import a file with a name that appears to be a directory: point_annotations/06-21-2024/ + ok + +Note that "models/smartspim_production_models/config.json" is a file in the +bucket located "inside" the first path. So this is not a case of an empty +directory being somehow stored to a S3 bucket as a file, but of something else. +I have not looked at the contents of these objects, as I would likely not +understand them anyway. + +I couldn't think of a better method than to warn and skip them. Any name mangling +would take a name that could be used by some other file. And not warning risks the user +being surprised when all the data in the bucket does not get imported. +"""]]
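The skip-and-warn behaviour described above can be sketched as a shell filter. This is a simplified illustration, not the git-annex implementation: the function name `importable_name` is hypothetical, and the "looks like a directory" test is approximated as "ends in a slash", which is what the `dropFileName` comparison in the patch amounts to for these bucket names. The warning texts are the ones from the diff.

```sh
# Hypothetical filter, not part of git-annex: returns 0 when a remote
# item name could be imported, and warns and returns 1 otherwise.
importable_name() {
    case "$1" in
        '')
            echo "Cannot import a file with an empty filename" >&2
            return 1 ;;
        */)
            echo "Cannot import a file with a name that appears to be a directory: $1" >&2
            return 1 ;;
        *)
            return 0 ;;
    esac
}

# Names taken from the comment above.
for name in 'models/smartspim_production_models/' \
            'models/smartspim_production_models/config.json'; do
    if importable_name "$name"; then
        echo "import $name"
    fi
done
```

Skipping rather than renaming keeps the imported tree free of invented names that could collide with a real file in the bucket, which is the trade-off the comment describes.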
response
diff --git a/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_9_022c07b37308d43a1afb1999d5379c99._comment b/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_9_022c07b37308d43a1afb1999d5379c99._comment new file mode 100644 index 0000000000..f4f5a87f52 --- /dev/null +++ b/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_9_022c07b37308d43a1afb1999d5379c99._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 9""" + date="2025-06-23T14:20:44Z" + content=""" +(Note that this is a bug that has already been closed.) + +While yes, leading dot just means "hide it from ls", people *do* have a +legitimate complaint when `git-annex add` annexes .gitattributes or a file +like that. Since we don't have any other general semantic information about +config files besides leading dot, this seems to me to be the best that can +be done to avoid what would otherwise be a common complaint, and turn it +into an uncommon complaint. + +The only other good approach seems to be the git-lfs approach, of requiring +that the user configure explicitly which files they consider large, with eg +`git lfs track "*.iso"` +"""]]
not a bug in git-annex
diff --git a/doc/bugs/adb_fchown_error.mdwn b/doc/bugs/adb_fchown_error.mdwn index 6e4bc2f399..c9b1a2e948 100644 --- a/doc/bugs/adb_fchown_error.mdwn +++ b/doc/bugs/adb_fchown_error.mdwn @@ -43,3 +43,5 @@ failed ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) I use git-annex to back up photos, videos, and all sorts of stuff! It's really cool! + +> not a bug in git-annex, [[closed|done]] --[[Joey]] diff --git a/doc/bugs/adb_fchown_error/comment_2_de03d805b4557869fa465af4fdf2ff4b._comment b/doc/bugs/adb_fchown_error/comment_2_de03d805b4557869fa465af4fdf2ff4b._comment new file mode 100644 index 0000000000..559e58a5c0 --- /dev/null +++ b/doc/bugs/adb_fchown_error/comment_2_de03d805b4557869fa465af4fdf2ff4b._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-06-23T14:11:10Z" + content=""" +This is the `adb push` command itself failing. Since git-annex has to use +that command with an adb special remote, I don't see how this could be +fixed in git-annex. + +It seems likely that the Android device is configured to allow adb to read +files in `Android/data/org.opencpn.opencpn` but not write to files there. +You might be able to change the permissions with root access. +"""]]
response
diff --git a/doc/forum/change_special_remote__39__s_config_parameter/comment_1_a309dbb7922ef3375644a26ae29110a2._comment b/doc/forum/change_special_remote__39__s_config_parameter/comment_1_a309dbb7922ef3375644a26ae29110a2._comment new file mode 100644 index 0000000000..625431f094 --- /dev/null +++ b/doc/forum/change_special_remote__39__s_config_parameter/comment_1_a309dbb7922ef3375644a26ae29110a2._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-23T14:07:38Z" + content=""" +This is the solution: + + git annex enableremote annexA rsyncurl=host:/new/path + +The configremote command changes the configuration of a remote that does +not have to be enabled for use at all, and is currently only used to change +the autoenable=true configuration. For changing other configuration the +enableremote command is the thing to use. +"""]]
Added a comment
diff --git a/doc/bugs/adb_fchown_error/comment_1_c65c4b511538e81a4ea9f4536359c6d8._comment b/doc/bugs/adb_fchown_error/comment_1_c65c4b511538e81a4ea9f4536359c6d8._comment new file mode 100644 index 0000000000..c544c03386 --- /dev/null +++ b/doc/bugs/adb_fchown_error/comment_1_c65c4b511538e81a4ea9f4536359c6d8._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="xentac" + avatar="http://cdn.libravatar.org/avatar/773b6c7b0dc34f10b66aa46d2730a5b3" + subject="comment 1" + date="2025-06-23T08:50:15Z" + content=""" +The same files were able to be copied over to my pixel 8a no problem. +"""]]
diff --git a/doc/bugs/adb_fchown_error.mdwn b/doc/bugs/adb_fchown_error.mdwn new file mode 100644 index 0000000000..6e4bc2f399 --- /dev/null +++ b/doc/bugs/adb_fchown_error.mdwn @@ -0,0 +1,45 @@ +### Please describe the problem. +When I try to use an adb special remote with exporttree and importtree set on my samsung tab active5, it's able to transfer the files but then fails trying to fchown them, then deletes the files so a sync never completes. + +### What steps will reproduce the problem? +Set up an adb remote using a command like this: `git annex initremote tablet1 type=adb androiddirectory=/sdcard encryption=none exporttree=yes importtree=yes androidserial=R52X6055Z2K`. Then only track certain directories like this: `git annex wanted tablet1 "include=Android/data/org.opencpn.opencpn/files/Charts/* or include=Audiobooks/* or present"`. Finally sync some data like this: `git annex sync --content tablet1`. + +### What version of git-annex are you using? On what operating system? +``` +git-annex version: 10.20241031-1~ndall+1 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso 
borg rclone hook external +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +Ubuntu 24.04.2 LTS + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +Error logs look like this + +``` +export tablet1 Android/data/org.opencpn.opencpn/files/Charts/ChartLocker/South Pacific/Fiji Navionics/PLEASE_README_TheChartLocker_2024-05-27.pdf +adb: error: failed to copy '.git/annex/objects/56/XF/SHA256E-s741697--c5b4490a3cb04d0d5127c9294e7f337582a97d0f40e709b8144b76fda3732378.pdf/SHA256E-s741697--c5b4490a3cb04d0d5127c9294e7f337582a97d0f40e709b8144b76fda3732378.pdf' to '/sdcard/Android/data/org.opencpn.opencpn/files/Charts/ChartLocker/South Pacific/Fiji Navionics/PLEASE_README_TheChartLocker_2024-05-27.pdf': remote fchown() failed uid: 0 gid: 0: Operation not permitted +.git/annex/objects/56/XF/SHA256E-s741697--c5b4490a3cb04d0d...file pushed, 0 skipped. 41.7 MB/s (741697 bytes in 0.017s) + + adb failed to store file +failed +``` + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I use git-annex to back up photos, videos, and all sorts of stuff! It's really cool!
diff --git a/doc/forum/change_special_remote__39__s_config_parameter.mdwn b/doc/forum/change_special_remote__39__s_config_parameter.mdwn new file mode 100644 index 0000000000..c810a94f25 --- /dev/null +++ b/doc/forum/change_special_remote__39__s_config_parameter.mdwn @@ -0,0 +1,49 @@ +Hey there, + +I have a couple of special remotes I use as storage. In one of them I had to +relocate the path of the directory. + +## Setup + +It was first initialized with something along the lines of + +```sh +git annex initremote annexA type=rsync rsyncurl=host:/old/path encryption=none +``` + +Enabling it on other repositories worked without any additional +parameters, since all have access to the same host names through ssh. + +Some time later I tweaked the `/old/path` to `/new/path`. For an existing repo +I could handle this via changing the necessary value in `.git/config`. + +## Problem + +Much later I was setting up a new repository on a new host and I did + +```sh +git annex enableremote annexA +``` + +Since I forgot the path tweak I did weeks ago, I couldn't figure out why +none of the files were getting found, and `fsck --from annexA` was failing. + +As soon as I remembered the path change I fixed the issue but I was curious if +I can somehow change the default url for this remote permanently for new repos. + +## Attempts + +I tried the following to no avail + +```sh +$ git annex configremote annexA rsyncurl=host:/new/path +configremote annexA +git-annex: Cannot change field "rsyncurl" with this command. Use git-annex enableremote instead. +failed +configremote: 1 failed +``` + +And I can't see the `/old/path` in `git annex vicfg` to change. + +Thanks in advance, +C.
Added a comment
diff --git a/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_8_52510b89aa6a287493f2f77c07b8a682._comment b/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_8_52510b89aa6a287493f2f77c07b8a682._comment new file mode 100644 index 0000000000..3f4bcde57f --- /dev/null +++ b/doc/bugs/add__58___inconsistently_treats_files_in_dotdirs_as_dotfiles/comment_8_52510b89aa6a287493f2f77c07b8a682._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 8" + date="2025-06-22T03:53:19Z" + content=""" +I did spend some time now trying to figure FTW and why git-annex (version from end of last year) says \"non-large file; adding content to git repository\" whenever `check-attr` insists that `largefile` should apply to my huge `.dandi/assets.json`. Only trying newer git-annex I think I got the reason which it finally announced as \"dotfile; adding content to git repository\" and I was able to recover this discussion! + +Re + +> But, .config/ seems to me to perfectly match what dotfiles are, which is files that are configuration that are named with a name starting with a dot in order to keep them from cluttering up ls. + +As far as I know, a leading dot is just a convention for **hidden** and not config, and not even necessarily text files. +Even though `dotfile` files (not folders) are text files most of the time, I would not generalize that to dot-folders: +content of `.cache/` or `.venv/` (created by `uv`) etc are unlikely to be text files to even start with. Those folder names start with dot to signal \"hidden\" not \"text\" or \"small\". + +That is why I retain that it remains confusing and inconsistent to have any special treatment and need for extra configuration (`git annex config --set annex.dotfiles true`) for content of dot-folders. 
I appreciate that such change would likely change behavior but IMHO it might just be \"for the best\". +"""]]
FR for gx-import
diff --git a/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn b/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn new file mode 100644 index 0000000000..b46105c19d --- /dev/null +++ b/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn @@ -0,0 +1,49 @@ +[[!meta author="Spencer"]] + +I have discovered that what is meant by `Any files that are gitignored will not be included in the import, but will be left on the remote.` in the [import doc](https://git-annex.branchable.com/git-annex-import) is `Any files that are [locally] gitignored [relative to the repo's root] will not be included in the import`. This is to say that when importing, only `.gitignore` paths from the root repo are used to exclude paths in the imported tree as if the tree were imported relative to root, regardless if a subtree is specified. This means that the repo gitignores must include ignores as desired to import the correct files from an import tree. + +This makes it challenging to import special remotes into subtrees. Ignores must be written to match the trees' roots but this might lead to clobbering of paths/names which overlap with other trees or the main repo. + +Therefore I suggest that imports to a subtree respect ignores as if the files in the tree were already adjusted to their new destination. +I suspect that annex is listing the tree, comparing the list to ignores, then importing what doesn't match. +So, instead, this would involve listing, moving the list to its subtree path, then comparing to ignores. + +A similar argument could be made for attributes in general. +I haven't done the testing on import attributes (namely `largefiles`), but I would want these to respect subtree paths as well. + +<details> +<summary>Testing Notes</summary> + +I made various .gitignore files in a fresh repo with a tree at `../tree` relative to fresh repo. +The tree had files `a`-`g`. 
+The ignores all began from this template: + +```gitignore +# -- Import into ROOT +# -- tree ignore +a + +# -- root ignore +b + +# -- root ignore in root +root-ignore/c + +# -- relative ignore in relative +d + +# -- root ignore in relative +root-ignore/e + +# -- relative ignore in root +f + +# -- tree ignore relative to root +root-ignore/g +``` + +Then I commented out certain lines for each location. E.g. only try ignoring `a` and `root-ignore/g` in the tree, `b`, `root-ignore/c` in root, and `f` in root. + +Regardless of import or ignore, only `b` and `f` were ignored pertaining to the root `.gitignore` matching these files in the tree, even when the tree was imported to subtree `rel-ignore` or `root-ignore`. + +</details>
diff --git a/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn b/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn index 28abcfc8f1..13c6700f10 100644 --- a/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn +++ b/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn @@ -58,3 +58,4 @@ local repository version: 10 OSX (brew) +[[!meta author="Spencer"]]
import to nonexistent path/branch gives unable to find base tree error
diff --git a/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn b/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn new file mode 100644 index 0000000000..28abcfc8f1 --- /dev/null +++ b/doc/bugs/Import_Subtree_to_New_Branch_Fails.mdwn @@ -0,0 +1,60 @@ +### Please describe the problem. + +I thought the branch in an import was arbitrary? E.g. `gx import <branch>:<subtree> -f <remote>`. + +While I could understand if it is not arbitrary if it corresponds to an existing local branch, in which case the local branch is taken as a basis for which the import should be based on, I assumed if the branch name did not have a corresponding local branch that import would just base its work on an orphan. However, this fails when importing a subtree and gives `Unable to find base tree for branch <branch>`. + + +### What steps will reproduce the problem? + +Here's an example setup in a fresh repo with no commits to master: + +``` +(base) ➜ repo git:(master) ✗ grv +tree + +(base) ➜ repo git:(master) ✗ gx info tree +uuid: df2c15bd-0d12-4508-99c0-31da0b5e00d6 +description: [tree] +trust: untrusted +remote: tree +cost: 100.0 +type: directory +available: true +directory: /Users/coesite/Documents/Temp/annex-tests/import-which-gitignore/tree +encryption: none +chunking: none +importtree: yes +remote annex keys: 6 +remote annex size: 472 bytes + +(base) ➜ repo git:(master) ✗ la +total 16 +drwxr-xr-x@ 13 coesite staff 416B Jun 20 14:28 .git +-rw-r--r-- 1 coesite staff 239B Jun 20 14:13 .gitignore +-rw-r--r-- 1 coesite staff 1.8K Jun 20 14:24 README.md +drwxr-xr-x 3 coesite staff 96B Jun 20 14:05 rel-ignore +drwxr-xr-x 2 coesite staff 64B Jun 20 14:06 root-ignore + +(base) ➜ repo git:(master) ✗ gx import two:rel-ignore -f tree +git-annex: Unable to find base tree for branch two +``` + +I would have suspected that even though path `rel-ignore` doesn't yet exist on orphan branch `two` that this would still import `tree` to be under that path. + +### What version of git-annex are you using? 
On what operating system? + +``` +git-annex version: 10.20250605 +build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.4 http-client-0.7.19 persistent-sqlite-2.13.3.1 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: darwin aarch64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +OSX (brew) +
Added a comment: A (Mildly) Compelling Reason
diff --git a/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment b/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment new file mode 100644 index 0000000000..e9887c02c4 --- /dev/null +++ b/doc/todo/symlinks_to_symlinks_to_the_annex/comment_6_743a6d9f8f4061f429a844024ba1208f._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="A (Mildly) Compelling Reason" + date="2025-06-19T01:34:17Z" + content=""" +This feature would alleviate one problem I have with annex in that the path stored in annex symlinks depends on the tree a file sits in. +This makes each *`git`* object of a annexed file in a different folder unique. +If annexed files ever move, we now have a fairly useless new git object introduced into the repo. +Not at all a problem for one file but if you have tens of thousands of annexed files and you refactor, you start to notice that. + +Unlocked files don't have this problem because their blobs point agnostically to the annex and key. +But, of course, unlocking large amounts of files mean content copies so that's not great. + +Symlink chains alleviate this because if I have a chain like `.root -> ./` in the root and `.root -> ../.root` in essentially every directory, then annex symlinks become agnostic too. +And on the git side, that's two new objects to add, and only a new tree object when performing a move. + +Again this is only relevant when the number of files becomes massive. +For sense of scale, let's assume a symlink payload is on the order of 100 bytes. +So 10,000 files generates roughly a Mb of git objects, meaning if I had 100,000 files and moved them around once, I'd have 20 Mb of data dedicated to locating these files w/ 10 Mb of what I would deem as waste. 
+Honestly, annex and git slow down appreciably at that scale for other reasons (pull/push/checkout, especially on slower file systems), so I say this is a non-issue by comparison. +For those who had similar concerns, there's your benchmark: 1 MB of bloat per 10,000 files per move! +"""]]
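The estimate in this comment is plain multiplication, and a short shell sketch makes the scale easy to check. The 100-byte symlink payload is the comment's own order-of-magnitude assumption, not a measured value, and the function name `symlink_bloat_bytes` is hypothetical.

```sh
# Hypothetical helper: bytes of new git blob data created when a number
# of annexed symlinks are moved once, given an assumed payload size.
symlink_bloat_bytes() {
    files=$1
    bytes_per_symlink=$2
    echo "$(( files * bytes_per_symlink ))"
}

symlink_bloat_bytes 10000 100     # prints 1000000  (about 1 MB)
symlink_bloat_bytes 100000 100    # prints 10000000 (about 10 MB)
```

Under that assumption, one move costs about 1 MB per 10,000 files, or about 10 MB for the 100,000-file case mentioned in the comment.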
Added a comment: Easy Approach
diff --git a/doc/bugs/git_worktree_remove_fails/comment_3_fffd495306d1eb4093f1235c337326cd._comment b/doc/bugs/git_worktree_remove_fails/comment_3_fffd495306d1eb4093f1235c337326cd._comment new file mode 100644 index 0000000000..c012bbf112 --- /dev/null +++ b/doc/bugs/git_worktree_remove_fails/comment_3_fffd495306d1eb4093f1235c337326cd._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="Easy Approach" + date="2025-06-18T04:27:16Z" + content=""" +Here's how to remove a worktree: + +```bash +echo \"gitdir: $(readlink .git)\" > .git0; +rm .git; +mv .git0 .git; +git worktree remove .; +``` + +as done inside the worktree itself. Update paths if you want to remove the worktree from outside of it. +So long as you don't run another `git annex` command after replacing the symlink with a file, `worktree remove` should work! +"""]]
Added a comment
diff --git a/doc/bugs/git-lfs_special_insists_on_https/comment_2_fdf1d13005f8a32663643773a09cc273._comment b/doc/bugs/git-lfs_special_insists_on_https/comment_2_fdf1d13005f8a32663643773a09cc273._comment new file mode 100644 index 0000000000..3b20ba13bd --- /dev/null +++ b/doc/bugs/git-lfs_special_insists_on_https/comment_2_fdf1d13005f8a32663643773a09cc273._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 2" + date="2025-06-17T10:00:16Z" + content=""" +Sorry to comment on a done bug. It's just that it seems opening a forum thread is going to lose the context. + +Can I ask, why is a git url even required? Isn't that going to require that only a self-hosted git is available anyway... because you aren't going to get the specific configuration to allow git-annex to fetch .git/config for the annex id? + +I thought the git-lfs remote was only for \"blob\" storage, and so API only. Or at least, it being integrated with a git service would have been optional, not mandatory. + +The example I gave with http and non-standard port was based on running the reference implementation https://github.com/git-lfs/lfs-test-server alone. Which works when a git project is configured just local (no remotes) and then the lfs url is set. + +"""]]
thx
diff --git a/doc/forum/Import_-_Changing_Largefiles.mdwn b/doc/forum/Import_-_Changing_Largefiles.mdwn index 88cf543cec..0dfa3769b6 100644 --- a/doc/forum/Import_-_Changing_Largefiles.mdwn +++ b/doc/forum/Import_-_Changing_Largefiles.mdwn @@ -20,3 +20,9 @@ For a better understanding, here is a MWE to reproduce this: 1. Note that all files are still considered large. Maybe there's another way of fixing this that I don't know about, but as far as I know, from this point you have to delete the special remote and redo the above now with the desired `.gitattributes` file staged for files in this external tree to be imported as small. + +[[done]] + +--- + +Conclusion: Don't just delete the imported branch, update it with a commit to force small/large the files as desired.
Added a comment: Solutions
diff --git a/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_1_50b69686e3574a407a334556303a11cb._comment b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_1_50b69686e3574a407a334556303a11cb._comment new file mode 100644 index 0000000000..aceb6d582a --- /dev/null +++ b/doc/todo/Allow_to_unlock_and_fix___40__now__41___non-checked-in_items/comment_1_50b69686e3574a407a334556303a11cb._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="Solutions" + date="2025-06-16T20:33:03Z" + content=""" +While I agree that being unable to fix symlinks can sometimes be a bit annoying your examples have straightforward solutions using existing tools: + +1. `g add -f <symlink>` - [`gx fix`] - `gx unannex`. \"Unlocks\" and `rm`'s in one go. Does still leave a copy in annex (as did your `git rm --cached`) so you still have to contend with that. +1. `diff` before moving the file. You have to type the relpath anyway to move the file so might as well just type the relpath into diff instead of mv. + +It's unfortunately fairly antithetical to modify any untracked file by `git`. This includes modifying symlink paths. Therefore the existing friction is actually helping new users figure out the proper way of doing things in a git environment IMHO. +"""]]
Added a comment: Ignoring files on directory special remote
diff --git a/doc/special_remotes/directory/comment_23_d6a4a7bd602260051eef1cc1c57bf01a._comment b/doc/special_remotes/directory/comment_23_d6a4a7bd602260051eef1cc1c57bf01a._comment new file mode 100644 index 0000000000..b885217e1d --- /dev/null +++ b/doc/special_remotes/directory/comment_23_d6a4a7bd602260051eef1cc1c57bf01a._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="ruka" + avatar="http://cdn.libravatar.org/avatar/8844137c8ca327cdd49ed692f0a30e02" + subject="Ignoring files on directory special remote" + date="2025-06-15T14:41:28Z" + content=""" +Is there a way for git-annex to completely ignore some files on a directory special remote? I'm managing files on my MP3 player's SD card using a directory special remote, but git-annex also tries to manage the player's database files, which I don't want. If I exclude them from wanted or put them in .gitignore, git-annex tries to delete them on sync or export. +"""]]
diff --git a/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn b/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn new file mode 100644 index 0000000000..f3c37c5327 --- /dev/null +++ b/doc/bugs/openTempfile_invalid_argument_on_sd_card.mdwn @@ -0,0 +1,45 @@ +### Please describe the problem. + +I am using a directory special remote with "exporttree=yes" and "importtree=yes" to manage my music collection on the SD card for my Tangara. Some filenames produce an "invalid argument" error when git-annex tries to export them to the card even though the filename is perfectly valid for vfat. The main commonality seems to be multiple dots in the filename, though other files with multiple dots work fine. + + +### What steps will reproduce the problem? + +1. Create a repository with a bunch of files that have multiple dots in them in different places +2. Create a directory special remote on a vfat filesystem with "exporttree=yes" and "importtree=yes" and no encryption +3. Attempt to export or sync files to the directory special remote + +### What version of git-annex are you using? On what operating system? + +10.20250605-gb9e3cf8780a04c8b1ac0cf4768c9ec510483477c +Linux Mint + +### Please provide any additional information below. + +[[!format sh """ +$ git annex sync --content +commit +On branch main +nothing to commit, working tree clean +ok +list tangara ok +update refs/remotes/tangara/main ok +unexport tangara Music/Cloudpunk/City of Ghosts/07. Home is Now.mp3 ok +... +unexport tangara .git-annex-tmp-content-SHA256E-s7284686--102594598eea9c5e7fd96ef20e9d5fd0485244716a1b5e95a528ca887a81ae59.mp3 ok +... +export tangara Music/Cloudpunk/City of Ghosts/01 - Bandit Queens.mp3 ok +... +export tangara Music/KIRA/KIRA - The Introduction (Deluxe Edition)/KIRA - The Introduction (Deluxe Edition) - 05 Games (feat. 
Ruby & Gumi).ogg + /media/ciara/F0F5-1E76/Music/KIRA/KIRA - The Introduction (Deluxe Edition)/: openTempFile template KIRA - The Introduction (Deluxe Edition) - 05 Games (feat. : invalid argument (Invalid argument) +failed +export tangara Music/KIRA/KIRA ft. GUMI - Burn Me Down.ogg + /media/ciara/F0F5-1E76/Music/KIRA/: openTempFile template KIRA ft. GUMI - : invalid argument (Invalid argument) +failed +... +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I use it quite successfully to archive media on removable spinning hard drives.
response
diff --git a/doc/forum/special_remote_protocol__58___How_to_identify_exactsize/comment_1_bbc07fed3ef028d5551932853dba3f45._comment b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize/comment_1_bbc07fed3ef028d5551932853dba3f45._comment new file mode 100644 index 0000000000..087d6c6b85 --- /dev/null +++ b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize/comment_1_bbc07fed3ef028d5551932853dba3f45._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-13T12:35:50Z" + content=""" +`git-annex examinekey --format='${bytesize}\n'` + +Or `git-annex examinekey --json` and use the `bytesize` field. + +(You will probably want to use `--batch` to keep a single examinekey +process running, for speed.) + +Note that not all keys have a known size. Usually keys without a known size +were added with eg `git-annex addurl --fast`. Encrypted keys also won't have +a size field. + +Also, when chunking is used with a special remote (without +encryption), each chunk is a key, with its size field set to the total size +of the original key. In that case there is a separate chunk size field, +although the last chunk may be smaller than its chunk size field. +If it would be useful, examinekey could have something added to it to +indicate when a key is a chunk key, and show the chunk size. +"""]]
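For a special remote that wants to avoid spawning a process at all, the size fields described in the comment above can also be read straight out of the key name. The following is only a sketch, not part of git-annex: it assumes the key layout `BACKEND-sSIZE-mMTIME-SCHUNKSIZE-CCHUNKNUM--NAME` (every field after the backend being optional), and the helper names `key_field`, `key_size` and `key_chunk_size` are made up for illustration. `git-annex examinekey` remains the supported interface.

```sh
# key_field KEY LETTER: print the numeric value of one field of a key name,
# or nothing if the key does not carry that field.
# Assumed layout: BACKEND[-sSIZE][-mMTIME][-SCHUNKSIZE][-CCHUNKNUM]--NAME
key_field() {
    fields=${1%%--*}                  # everything before the first "--"
    case $fields in
        *-*) fields=${fields#*-} ;;   # drop the backend name
        *) return 0 ;;                # no fields at all, e.g. "URL--..."
    esac
    old_ifs=$IFS
    IFS=-
    for f in $fields; do
        case $f in
            "$2"[0-9]*) printf '%s\n' "${f#"$2"}" ;;
        esac
    done
    IFS=$old_ifs
}

# Hypothetical convenience wrappers around key_field.
key_size()       { key_field "$1" s; }   # total size of the original key
key_chunk_size() { key_field "$1" S; }   # chunk size, only present on chunk keys

key_size 'SHA256E-s7284686--102594598eea9c5e7fd96ef20e9d5fd0485244716a1b5e95a528ca887a81ae59.mp3'
```

As the comment notes, an empty result is expected for keys added with `addurl --fast` and for encrypted keys, so a caller has to treat "no size" as a normal case rather than an error.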
response
diff --git a/doc/forum/Import_-_Changing_Largefiles/comment_1_1d560f2c337e5f067f42a3088e686467._comment b/doc/forum/Import_-_Changing_Largefiles/comment_1_1d560f2c337e5f067f42a3088e686467._comment new file mode 100644 index 0000000000..85cf6c1417 --- /dev/null +++ b/doc/forum/Import_-_Changing_Largefiles/comment_1_1d560f2c337e5f067f42a3088e686467._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-13T12:27:35Z" + content=""" +git-annex has to maintain a considerable amount of state about the content +of a special remote in order to efficiently import trees from it, and this +caching is what is preventing the new configuration of annex.largefiles +from being used. + +In particular, git-annex knows the content identifier associated with the +file you imported before. And the key associated with that content +identifier is present in the repository. So it uses the existing content +rather than download it again. + +While it would be possible to either remove enough information from the +git-annex branch to defeat that, or modify git-annex to have a mode where +it redoes expensive work, it seems to me to be easier to just treat this as +a case of an annexed file that you want to change to be stored in git +instead. Since that is a general problem, with a general solution. See +[[tips/largefiles]], "converting annexed to git". +"""]]
revert man page changes
Revert "Linked to discussion on caveat"
This reverts commit 9fe60062a38228594ce8d48bbe1b14532934f22d.
We don't link from man pages to forum discussions. If there is a
problem, it should be fixed, and if there is a wart it should be
documented on the man page in enough detail to understand on its own.
In this case, I don't know that there is any problem at all.
diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn index 71b7dedbb4..e78fa0ac14 100644 --- a/doc/git-annex-import.mdwn +++ b/doc/git-annex-import.mdwn @@ -229,12 +229,10 @@ link, and that symbolic link will be followed. Note that using `--deduplicate` or `--clean-duplicates` with the WORM backend does not look at file content, but filename and mtime. -If `annex.largefiles` is configured (in the current repo's `.gitattributes` file), -and does not match a file, `git annex import` will add the non-large file directly to the git repository, +If annex.largefiles is configured, and does not match a file, `git annex +import` will add the non-large file directly to the git repository, instead of to the annex. -[[Caveat Discussion: Adjusting Largefiles Specification|forum/Import_-_Changing_Largefiles]] - # SEE ALSO [[git-annex]](1)
Added a comment: OK I may have overcomplicated things
diff --git a/doc/forum/Move_part_of_one_repository_into_other/comment_2_6845c2bec20af80f7457386f403e3bb0._comment b/doc/forum/Move_part_of_one_repository_into_other/comment_2_6845c2bec20af80f7457386f403e3bb0._comment new file mode 100644 index 0000000000..5cbe60a430 --- /dev/null +++ b/doc/forum/Move_part_of_one_repository_into_other/comment_2_6845c2bec20af80f7457386f403e3bb0._comment @@ -0,0 +1,29 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="OK I may have overcomplicated things" + date="2025-06-11T21:47:32Z" + content=""" +Turns out, the answer is simple: + +1. `git rm --cached \"B\"` +1. (in `B`): + 1. `git add` + 1. `git remote add tmp.parent <relpath/from/B/root/to/A/root>` + 1. `git annex get` + 1. `git remote remove tmp.parent` + +***if you need just the files moved around*** + +I haven't used metadata so I can't comment on how to move that around but you might have to rely on something akin to my first comment. +In my brief testing, because metadata is stored in the `git-annex` branch on a per-key level, it does in fact require merging of the git-annex branch somehow to transfer. + +In short: `git-annex` can get file content in both an *informed* and *uninformed* way. +If `git-annex` knows about content in a repo because of historic moves/copies-to or merging of `git-annex` branches, +it has *informed* knowledge of what's in certain remotes. +If it does not, then it can still do an *uninformed* query for potential file content. +In this way, e.g. `git annex info` and `git annex list` may show file content as not in a particular remote, +but a `git annex get` or `git annex move` *may actually still work*. + + +"""]]
Added a comment: wait, something's up with the suffix!
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_2_1f276654e46182acc1f55f65c4b95dc1._comment b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_2_1f276654e46182acc1f55f65c4b95dc1._comment new file mode 100644 index 0000000000..cbf370e544 --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_2_1f276654e46182acc1f55f65c4b95dc1._comment @@ -0,0 +1,85 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="wait, something's up with the suffix!" + date="2025-06-11T11:25:03Z" + content=""" +Hm, Just the non-ascii characters can't be the problem. Only with the suff `.pdf` or `.pdff` it fails. Without the suffix or just `.pd` or longer extensions `.pdfff` it works 🤪 + +[[!format bash \"\"\" +yann in yann-desktop-nixos in …/nonascii on main as 🧙 +🐟 ❮ touch 🌸.txt 🎶.txt ★.txt α.txt β.txt δ.txt 乙.txt 山.txt 川.txt 空.txt 愛.txt 心.txt 学.txt 数.txt 詩.txt 韓.txt 北.txt 南.txt 墨.txt 漬.txt 墨漬.txt \"墨漬 \" \"墨漬 Ink\" \"墨漬 Ink Stains\" \"墨漬 Ink Stains.pdf\" \"墨漬 Ink Stains.\" \"墨漬 Ink Stains.p\" \"墨漬 Ink Stains.pd\" \"墨漬 Ink Stains.pdf\" \"墨漬 Ink Stains.pdff\" \"墨漬 Ink Stains.pdfff\" \"墨漬 Ink Stains.pdffff\" +yann in yann-desktop-nixos in …/nonascii on main [?] as 🧙 +🐟 ❯ git annex add --jobs 1 +add α.txt +ok +add β.txt +ok +add δ.txt +ok +add ★.txt +ok +add 乙.txt +ok +add 北.txt +ok +add 南.txt +ok +add 墨.txt +ok +add 墨漬 +ok +add 墨漬 Ink +ok +add 墨漬 Ink Stains +ok +add 墨漬 Ink Stains. 
+ok +add 墨漬 Ink Stains.p +ok +add 墨漬 Ink Stains.pd +ok +add 墨漬 Ink Stains.pdf + +git-annex: createSymbolicLink '.git/annex/objects/7x/w0/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdf/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdf' to '.git/annex/othertmp/.0': already exists (File exists) +failed +add 墨漬 Ink Stains.pdff + +git-annex: createSymbolicLink '.git/annex/objects/7p/22/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdff/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdff' to '.git/annex/othertmp/.0': already exists (File exists) +failed +add 墨漬 Ink Stains.pdfff +ok +add 墨漬 Ink Stains.pdffff +ok +add 墨漬.txt +ok +add 学.txt +ok +add 山.txt +ok +add 川.txt +ok +add 心.txt +ok +add 愛.txt +ok +add 数.txt +ok +add 漬.txt +ok +add 空.txt +ok +add 詩.txt +ok +add 韓.txt +ok +add 🌸.txt +ok +add 🎶.txt +ok +(recording state in git...) +add: 2 failed +yann in yann-desktop-nixos in …/nonascii on main [+?] as 🧙 +❌1 🐟 ❯ +\"\"\"]] +"""]]
Added a comment: confirm, but not all non-ascii characters are a problem
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_1_f33a3cce42b31952562ff2688b8bae8f._comment b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_1_f33a3cce42b31952562ff2688b8bae8f._comment new file mode 100644 index 0000000000..bb09292e9e --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_1_f33a3cce42b31952562ff2688b8bae8f._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="confirm, but not all non-ascii characters are a problem" + date="2025-06-11T11:11:19Z" + content=""" +I can confirm this behaviour. But it's not precisely \"non-ascii characters\" that cause this, emojis and greek letters for example are no problem. + +[[!format bash \"\"\" +🐟 ❯ touch \"墨漬 Ink Stains.pdf\" +🐟 ❯ touch 📝.txt +🐟 ❯ touch σ.txt +🐟 ❯ git annex add +add 📝.txt ok +add 墨漬 Ink Stains.pdf +git-annex: createSymbolicLink '.git/annex/objects/7x/w0/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdf/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.pdf' to '.git/annex/othertmp/.0': already exists (File exists) +failed +add σ.txt ok +(recording state in git...) +add: 1 failed +\"\"\"]] +"""]]
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn new file mode 100644 index 0000000000..50e3f1ba51 --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn @@ -0,0 +1,52 @@ +### Please describe the problem. + +In a large import, three files (all with non-ascii names) gave the following error: `git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists)` + +I've tried to extract the relevant part of a `strace -f`: + +``` +mkdir(".git/annex/othertmp", 0777) = -1 EEXIST (File exists) +newfstatat(AT_FDCWD, ".git/annex/othertmp", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0 +mkdir(".git/annex/othertmp", 0777) = -1 EEXIST (File exists) +newfstatat(AT_FDCWD, ".git/annex/othertmp", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0 +mkdir(".git/annex/othertmp/.0", 0777) = 0 +unlink(".git/annex/othertmp/.0") = -1 EISDIR (Is a directory) +symlink("../../../../../.git/annex/objects/9w/wJ/SHA256E-s5426861--cdc0664822c9df3ffbf255d160870fc39a6fdd1168b02fc2c9b59cc65bc81c26.pdf/SHA256E-s5426861 +--cdc0664822c9df3ffbf255d160870fc39a6fdd1168b02fc2c9b59cc65bc81c26.pdf", ".git/annex/othertmp/.0") = -1 EEXIST (File exists) +newfstatat(AT_FDCWD, ".git/annex/othertmp/.0", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0 +newfstatat(AT_FDCWD, ".git/annex/othertmp/.0", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 +openat(AT_FDCWD, ".git/annex/othertmp/.0", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 31 +fstat(31, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 +getdents64(31, 0x7f2694001180 /* 2 entries */, 32768) = 48 +getdents64(31, 0x7f2694001180 /* 0 entries */, 32768) = 0 +close(31) = 0 +rmdir(".git/annex/othertmp/.0") = 0 +close(26) = 0 +``` + +### What steps will reproduce the problem? 
+ +``` +touch "墨漬 Ink Stains.pdf" +git annex add "墨漬 Ink Stains.pdf" +``` + +(the file name base64 encoded is `5aKo5rysIEluayBTdGFpbnMucGRm`) + + +### What version of git-annex are you using? On what operating system? + +``` +git-annex version: 10.20250605-gd2dc318a867f571cbc848b5d45e82e153e364e4e +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24 bloomfilter-2.0.1.2 crypton-1.0.0 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.8 http-client-0.7.18 persistent-sqlite-2.13.1.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +``` + +I'm running Arch Linux (kernel 6.15.1-arch1-2). The repo I'm running the commands in is on an ext4 filesystem. + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +git-annex has been brilliant for managing my large media collection across several removable drives, and I'm confident it will continue to scale. This is the first issue I've run into with it. + +
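Since the report gives the file name in base64, the exact bytes can be recreated without relying on how a terminal or input method encodes the characters. This is a minimal sketch that assumes a `base64` supporting `-d` (as in GNU coreutils); the `scratch` directory is only there so nothing in a real repository is touched. Whether byte-level differences such as Unicode normalization play any role in the bug is not established by the report.

```sh
# Decode the exact file name given in the report (base64 of its UTF-8 bytes).
name=$(printf '%s' '5aKo5rysIEluayBTdGFpbnMucGRm' | base64 -d)

# Create the empty file in a scratch directory.
scratch=$(mktemp -d)
touch "$scratch/$name"

# Dump the name byte by byte, to compare with a name typed by hand.
printf '%s' "$name" | od -An -tx1
```

Running `git annex add` on that file inside a git-annex repository should then match the reproduction steps in the report.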
Added a comment
diff --git a/doc/special_remotes/compute/comment_7_8fa94e05ff0e67ae87a13132bfb40b61._comment b/doc/special_remotes/compute/comment_7_8fa94e05ff0e67ae87a13132bfb40b61._comment new file mode 100644 index 0000000000..08854589fc --- /dev/null +++ b/doc/special_remotes/compute/comment_7_8fa94e05ff0e67ae87a13132bfb40b61._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 7" + date="2025-06-10T12:01:14Z" + content=""" +I've realised that... I'm overlooking that the input filename itself is metadata. I have a methodology that I like now. + +As per: `git-annex addcomputed --to=imageconvert foo.jpeg foo.gif`, where foo. is linking metadata, I can just generate a filename (and as I've learnt, path), that links back to the source by retaining it. + +I also see now that there is no need to avoid duplication of pointer files to the same computed file by key. + +The uncomplicated existing approach is more than sufficient. + +"""]]
diff --git a/doc/users/Spencer.mdwn b/doc/users/Spencer.mdwn index b521ab0894..382ce88aa9 100644 --- a/doc/users/Spencer.mdwn +++ b/doc/users/Spencer.mdwn @@ -1,4 +1,4 @@ ---- +[[!meta author="Spencer"]] ## Contributions
diff --git a/doc/forum/Import_-_Changing_Largefiles.mdwn b/doc/forum/Import_-_Changing_Largefiles.mdwn index a49fe3d970..88cf543cec 100644 --- a/doc/forum/Import_-_Changing_Largefiles.mdwn +++ b/doc/forum/Import_-_Changing_Largefiles.mdwn @@ -1,3 +1,5 @@ +[[!meta author="Spencer"]] + # Changing Largefile Specification for Imported Trees If you want files to be large/small *after* already importing a tree from an `importtree` enabled remote, well, it appears you can't.
Added a comment
diff --git a/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment b/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment new file mode 100644 index 0000000000..8606110243 --- /dev/null +++ b/doc/special_remotes/compute/comment_6_0785f4683f0e7f9848aced2357ca1ec0._comment @@ -0,0 +1,58 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 6" + date="2025-06-09T13:09:25Z" + content=""" +I'm getting acquainted with this special remote. I cannot praise it enough. It is brilliant. + +This is my first cut git-annex-compute-stripexif: + +[[!format bash \"\"\" +#!/bin/bash + +set -e + +if [ -z \"$1\" ]; then + echo \"Specify the input image file, followed by the output image file.\" >&2 + echo \"Example: foo.jpg foo.gif\" >&2 + exit 1 +fi + +echo REPRODUCIBLE +echo \"INPUT $1\" +read input + +if [ -n \"$input\" ]; then + tf=$(mktemp) + cp \"$input\" \"$tf\" >&2 + exiftool -overwrite_original -ALL= \"$tf\" >&2 + outfile=\"SANSEXIF-\"$(git-annex calckey \"$tf\") +fi +echo \"OUTPUT $outfile\" +read output + +cp -v \"$tf\" \"$outfile\" >&2 +rm -v \"$tf\" >&2 +\"\"\"]] + +Along the way, I've learnt that EXIF metadata isn't the only metadata stored in a jpeg, so the name is now a bit of a misnomer. Also, as it was more proof-of-concept, the target name and location is not well thought out, and there's no preservation of file extension. It's indicative for now. + +The aim is to aid (only) in the identifying two copies of the same jpeg, where only the metadata has been changed (eg. either by adjustments I made by script eons ago, or by apps like Microsoft photoviewer where orientation changes were made via metadata). I say aid only, because it's not going to help if the image is resized, etc. and I understand that. + +To that end, I do have some questions. The first is... 
is it wise (or possible) to try to set metadata on the source files whilst in the script? (since writing this, I have come to understand that the compute script is not run within the working directory, and the implication is that you're not meant to run any git-annex commands) + +Obviously, the idea would be to tag the source file with the computed key. I have already verified that if two copies of a jpeg that differ only by metadata, the computed file and key will be the same. + +But what I found is, if I don't have that option to set metadata, then respectfully, git-annex-findcomputed may have some deficiencies. + +From what I can gather, git-annex-findcomputed will not list the subsequent input file that when added, computes it. Only the first one. + +So trying to post process the computed files to perform the setting of metadata on the source files would likely not work. + +Also, I was curious about what happens if the input file moves within the archive? I haven't tried... but from what I can see, you wouldn't be able to backtrack from the computed file, because you won't know the key of the input file, in turn to go searching for it (eg. git-annex-whereused). + +Is my use case way off base as to why you should use the compute remote? + +"""]]
For one: why would preview show a nonexistent page as an existent link instead of a question mark? For two: why is []() syntax relative to current page but [[|]] syntax is relative to root?
diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn index a0ecc9fa08..71b7dedbb4 100644 --- a/doc/git-annex-import.mdwn +++ b/doc/git-annex-import.mdwn @@ -233,7 +233,7 @@ If `annex.largefiles` is configured (in the current repo's `.gitattributes` file and does not match a file, `git annex import` will add the non-large file directly to the git repository, instead of to the annex. -[Caveat Discussion: Adjusting Largefiles Specification](forum/Import_-_Changing_Largefiles) +[[Caveat Discussion: Adjusting Largefiles Specification|forum/Import_-_Changing_Largefiles]] # SEE ALSO
Linked to discussion on caveat
diff --git a/doc/git-annex-import.mdwn b/doc/git-annex-import.mdwn index e78fa0ac14..a0ecc9fa08 100644 --- a/doc/git-annex-import.mdwn +++ b/doc/git-annex-import.mdwn @@ -229,10 +229,12 @@ link, and that symbolic link will be followed. Note that using `--deduplicate` or `--clean-duplicates` with the WORM backend does not look at file content, but filename and mtime. -If annex.largefiles is configured, and does not match a file, `git annex -import` will add the non-large file directly to the git repository, +If `annex.largefiles` is configured (in the current repo's `.gitattributes` file), +and does not match a file, `git annex import` will add the non-large file directly to the git repository, instead of to the annex. +[Caveat Discussion: Adjusting Largefiles Specification](forum/Import_-_Changing_Largefiles) + # SEE ALSO [[git-annex]](1)
diff --git a/doc/forum/Import_-_Changing_Largefiles.mdwn b/doc/forum/Import_-_Changing_Largefiles.mdwn new file mode 100644 index 0000000000..a49fe3d970 --- /dev/null +++ b/doc/forum/Import_-_Changing_Largefiles.mdwn @@ -0,0 +1,20 @@ +# Changing Largefile Specification for Imported Trees + +If you want files to be large/small *after* already importing a tree from an `importtree` enabled remote, well, it appears you can't. + +I tried removing the imported branch via `git branch -d --remote <tree>/<branch>`. +While this produces a new clean import commit upon running `import` again, it does *not* respect changes to `.gitattributes`. +Instead, `git-annex` seems to hold onto information about which files were large/small in a given special remote. +So, the only way to change what are considered large files and small files is to create a new special remote entirely :/ + +For most people, this should not be too problematic since the history of imported trees isn't too important, but for some diffs on an external tree may be valuable. +Is there any interest in addressing this issue? +For a better understanding, here is a MWE to reproduce this: + +1. Create an `importtree` enabled special remote for a fresh repo without a `.gitattributes` file (or at least one without `annex.largefiles` attributes) +1. Import (e.g. `gx import -f tree main`) from this tree and note that all files are considered large (e.g. `git log --raw tree/main` -> `git show <hash>`) +1. Modify/create a local `.gitattributes` file (and add it to the index) that would specify one of the tree files as small (i.e. `annex.largefiles` does *not* match) +1. Attempt new import, or do `git branch -d --remote tree/main` and perform new import. +1. Note that all files are still considered large. 
+ +Maybe there's another way of fixing this that I don't know about, but as far as I know, from this point you have to delete the special remote and redo the above now with the desired `.gitattributes` file staged for files in this external tree to be imported as small.
Added a comment: Now the current branch is pushed first! 🥳
diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment new file mode 100644 index 0000000000..4910adf894 --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_5_3f06a7f454747ec9359dd4c45b12a563._comment @@ -0,0 +1,57 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Now the current branch is pushed first! 🥳" + date="2025-06-07T09:39:27Z" + content=""" +Thank you very much joey, I can confirm that the current branch is now pushed first and thus used as the default branch of the newly created repo: + +## New version + +[[!format bash \"\"\" +$ git annex version --raw +10.20250605-gb9e3cf8780a04c8b1ac0cf4768c9ec510483477c$ +$ git init repo +Initialized empty Git repository in /home/yann/Downloads/git-annex.linux/repo/.git/ +$ cd repo +$ git annex init +init ok +(recording state in git...) +$ git remote add homelab ssh://.../yann/testrepo +$ touch bla +$ git annex assist +add bla ok +(recording state in git...) +commit (recording state in git...) +ok +pull homelab ok +push homelab ok +$ git remote show homelab | grep HEAD + HEAD branch: main ✅✅✅✅✅✅✅✅✅✅✅✅✅ +\"\"\"]] + +## Old version + +[[!format bash \"\"\" +🐟 ❯ git annex version --raw +10.20250521-g1a9e6bf26b56c39429d4a096bf733e57e5684e1b +🐟 ❯ git init repo2 +Leeres Git-Repository in /home/yann/Downloads/repo2/.git/ initialisiert +🐟 ❯ cd repo2/ +🐟 ❯ git annex init +init ok +(recording state in git...) +🐟 ❯ git remote add homelab ssh://.../yann/testrepo2 +🐟 ❯ touch bla +🐟 ❯ git annex assist +add bla ok +(recording state in git...) +commit (recording state in git...) 
+ok +pull homelab ok +push homelab ok +🐟 ❯ LC_ALL=C.UTF-8 git remote show homelab | grep HEAD + HEAD branch: synced/main ⚠️⚠️⚠️⚠️⚠️ +\"\"\"]] + +"""]]
Special remote protocol: How to identify exact size of a particular key?
diff --git a/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn new file mode 100644 index 0000000000..472d69d1f9 --- /dev/null +++ b/doc/forum/special_remote_protocol__58___How_to_identify_exactsize.mdwn @@ -0,0 +1,4 @@ +I'm trying to write a special remote protocol in which it would be really helpful to have the exact size for a particular key. I was thinking of something like the special remote asking git-annex `GETKEYINFO <key-id>` and git annex responds with some useful info (Something like a dictionary of useful values maybe?) + +I considered doing something like `git annex info ..` to figure this out but realized it's a bad idea(That'll be very brittle, plus it won't work well with chunked/encrypted remotes at all). Does git annex typically have this info available? It would even be helpful if it only gives responses in specific cases (eg: no encryption since it'll presumably be hard to keep track of that case) +
add news item for git-annex 10.20250605
diff --git a/doc/news/version_10.20250115.mdwn b/doc/news/version_10.20250115.mdwn deleted file mode 100644 index c6b56c47d6..0000000000 --- a/doc/news/version_10.20250115.mdwn +++ /dev/null @@ -1,26 +0,0 @@ -git-annex 10.20250115 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Improve handing of ssh connection problems during - remote annex.uuid discovery. - * log: Support --key, as well as --branch and --unused. - * Avoid verification error when addurl --verifiable is used - with an url claimed by a special remote other than the web. - * Fix installation on Android. - * Allow enableremote of an existing webdav special remote that has - read-only access. - * git-remote-annex: Use enableremote rather than initremote. - * Windows: Fix permission denied error when dropping files that - have the readonly attribute set. - * Added freezecontent-annex and thawcontent-annex hooks that - correspond to the git configs annex.freezecontent and - annex.thawcontent. - * Added secure-erase-annex hook that corresponds to the git config - annex.secure-erase-command. - * Added commitmessage-annex hook that corresponds to the git config - annex.commitmessage-command. - * Added http-headers-annex hook that corresponds to the git config - annex.http-headers-command. - * Added git configs annex.post-update-command and annex.pre-commit-command - that correspond to the post-update-annex and pre-commit-annex hooks. - * Added annex.pre-init-command git config and pre-init-annex hook - that is run before git-annex repository initialization. 
- * Linux standalone builds' bundled rsync updated to fix security holes."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250605.mdwn b/doc/news/version_10.20250605.mdwn new file mode 100644 index 0000000000..5a9016e9f5 --- /dev/null +++ b/doc/news/version_10.20250605.mdwn @@ -0,0 +1,19 @@ +git-annex 10.20250605 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * sync: Push the current branch first, rather than a synced branch, + to better support git forges (gitlab, gitea, forgejo, etc.) which + use push-to-create with the first pushed branch becoming the default + branch. + * Added annex.fastcopy and remote.name.annex-fastcopy config setting. + When set, this allows the copy\_file\_range syscall to be used, which + can eg allow for server-side copies on NFS. (For fastest copying, + also disable annex.verify or remote.name.annex-verify.) + * map: Support --json option. + * map: Improve display of remote names. + * When annex.freezecontent-command or annex.thawcontent-command is + configured but fails, prevent initialization. This allows the user to + fix their configuration and avoid crippled filesystem detection + entering an adjusted branch. + * assistant: Avoid hanging at startup when a process has a *.lock file + open in the .git directory. + * Windows: Fix duplicate file bug that could occur when files were + supposed to be moved across devices."""]] \ No newline at end of file
initial report on "fatal: empty filename in tree entry"
diff --git a/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn b/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn new file mode 100644 index 0000000000..f0ffec47fb --- /dev/null +++ b/doc/bugs/import__58_____34__fatal__58___empty_filename_in_tree_entry__34__.mdwn @@ -0,0 +1,46 @@ +### Please describe the problem. + +`import` manages to import something empty (wild idea, didn't check -- might be an AWS S3 key ending with `/` and creating an empty-named file or folder?) which led to + +``` +$ git annex initremote s3-origin type=S3 importtree=yes encryption=none autoenable=true bucket=aind-benchmark-data fileprefix=mesoscale-anatomy-cell-detection/ public=yes signature=v4 storageclass=STANDARD port=443 signature=anonymous +... + +$ git annex import --from s3-origin master +... +update refs/remotes/s3-origin/master fatal: empty filename in tree entry +ok +(recording state in git...) + +$ git merge --allow-unrelated-histories s3-origin/master +fatal: empty filename in tree entry + +``` + +Watch out if you try to reproduce this -- it is about 12GB + +### What steps will reproduce the problem? + + +### What version of git-annex are you using? On what operating system? 
+ +``` +(venv-annex) dandi@drogon:/mnt/backup/dandi/aind-benchmark-data/mesoscale-anatomy-cell-detection$ git annex version +git-annex version: 10.20250521-gafbe7e15b0f44ffa4c597dffc73b7cbdc0d06820 +build flags: Assistant Webapp Pairing Inotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.1 http-client-0.7.19 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +the version from pypi @mih started to build recently + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + + +[[!meta author=yoh]] +[[!tag projects/dandi]]
Added a comment
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_5_4acd238cbe4eee4edf1311172f24555a._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_5_4acd238cbe4eee4edf1311172f24555a._comment new file mode 100644 index 0000000000..960252d8d7 --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_5_4acd238cbe4eee4edf1311172f24555a._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="datawraith" + avatar="http://cdn.libravatar.org/avatar/e36c82a2b6f3150ad14a24eb7eb85826" + subject="comment 5" + date="2025-06-04T19:26:29Z" + content=""" +> What are the versions of git-annex in the VM where it worked vs where it didn't? + +The version on the VM is the same one I reported in the initial post: 10.20250520, installed via Homebrew. git-annex wasn't originally installed on that VM, so I installed it at that version to test it. + +When everything worked at first, I updated the VM to the Bluefin version I was running on my laptop, thinking that might be the problem, and then had the strange results I reported above. + +Since the git-annex installation itself had not changed between when things worked and when they stopped, I started to suspect something like the kernel bug I mentioned (because the Kernel *had* changed). + +I'm now also having trouble reproducing the problem in the VM at all. The files that were failing before are now added without problems again, as are newly created files -- though I had had to shut down and later restart the VM. I wish I had thought of making a full snapshot when I started experimenting, but I didn't. :-/ + +The only machine that exhibits the problem consistently now is my laptop. + +> And, if you can possibly download and unpack the linuxstandalone tarball, and use that to run git-annex in the bad VM, that would be a useful check that the problem does not somehow involve the homebrew build. 
https://git-annex.branchable.com/install/Linux_standalone/ + +With the standalone tarball (`10.20250521-g1a9e6bf26b56c39429d4a096bf733e57e5684e1b`, using the `./runshell`) `addcomputed` works as expected on my laptop -- Unicode characters are shown with the backslash escape, whereas the Homebrew build alone fails by stripping the unicode characters. + +Hm. + +Running the `git-annex` executable from `/home/linuxbrew/.linuxbrew/bin/` inside of the runshell works as well -- it doesn't strip the characters. That might mean that it is not the Homebrew build that is broken, but that something about my environment is simply screwed up. + +"""]]
tag as INM7 because it involves git-annex integration with forgejo
diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn index d1b7dd8e3d..e179f2e768 100644 --- a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn @@ -14,3 +14,5 @@ However, `git annex assist|sync|push` seem to push `git-annex`, `synced/git-anne Of course the solution is to just `git push` manually before `git annex assist`. But `git annex assist` is already such a brilliant command that does it all, and telling people to just run that to "do the git stuff" is very comfortable and easily accepted. Could the current branch be pushed first? Or is there a reason for pushing all the meta-branches first? > [[fixed|done]] --[[Joey]] + +[[!tag projects/INM7]]
sync: push current branch first
sync: Push the current branch first, rather than a synced branch, to better
support git forges (gitlab, gitea, forgejo, etc.) which use push-to-create
with the first pushed branch becoming the default branch.
With considerable complication to filter out warning message about
receive.denyCurrentBranch when pushing to a non-bare repository. Localization
may break it in the future, but it seems like the best way to handle this. See
my comments for the gory details.
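
The stderr filtering mentioned above can be sketched as a pure function over the lines git writes to stderr. This is a simplified illustration, not the code in Command/Sync.hs (which streams lines from a process handle); the name `filterPushStderr` and the sample transcript are made up for the example.

```haskell
import Data.List (isInfixOf, isPrefixOf)

-- | Pure sketch of the filtering applied to the direct push.
-- Lines seen before git starts relaying "remote: " output (progress
-- and the like) are passed through unchanged. From the first
-- "remote: " line onward everything is held back, and the held lines
-- are shown only if none of them mentions receive.denyCurrentBranch.
filterPushStderr :: [String] -> [String]
filterPushStderr ls = passedThrough ++ shownHeld
  where
    (passedThrough, held) = break ("remote: " `isPrefixOf`) ls
    shownHeld
      | any ("receive.denyCurrentBranch" `isInfixOf`) held = []
      | otherwise = held

-- Hypothetical transcript for illustration; not real git output.
sampleStderr :: [String]
sampleStderr =
  [ "progress line written before any remote output"
  , "remote: placeholder message mentioning receive.denyCurrentBranch"
  , "placeholder trailing error line"
  ]

main :: IO ()
main = mapM_ putStrLn (filterPushStderr sampleStderr)
```

With the sample transcript only the first line is printed: once a `remote: ` line has been seen, later lines are held along with it, so the trailing error line is suppressed together with the denyCurrentBranch message.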
diff --git a/CHANGELOG b/CHANGELOG index 55ca8b37ef..d9675a47de 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -14,6 +14,10 @@ git-annex (10.20250521) UNRELEASED; urgency=medium When set, this allows the copy_file_range syscall to be used, which can eg allow for server-side copies on NFS. (For fastest copying, also disable annex.verify or remote.name.annex-verify.) + * sync: Push the current branch first, rather than a synced branch, + to better support git forges (gitlab, gitea, forgejo, etc.) which + use push-to-create with the first pushed branch becoming the default + branch. -- Joey Hess <id@joeyh.name> Thu, 22 May 2025 12:43:38 -0400 diff --git a/Command/Sync.hs b/Command/Sync.hs index 2892768b73..02326a390e 100644 --- a/Command/Sync.hs +++ b/Command/Sync.hs @@ -83,7 +83,6 @@ import Types.Availability import qualified Database.Export as Export import Utility.Bloom import Utility.OptParse -import Utility.Process.Transcript import Utility.Tuple import Utility.Matcher @@ -706,20 +705,13 @@ pushRemote o remote (Just branch, _) = do - Git offers no way to tell if a remote is bare or not, so both methods - are tried. - - - The direct push is likely to spew an ugly error message, so its stderr is - - often elided. Since git progress display goes to stderr too, the - - sync push is done first, and actually sends the data. Then the - - direct push is tried, with stderr discarded, to update the branch ref - - on the remote. + - The direct push is done first, because some hosting providers like + - github may treat the first branch pushed to a new repository as the + - default branch for that repository. - - The sync push first sends the synced/master branch, - and then forces the update of the remote synced/git-annex branch. - - - Since some providers like github may treat the first branch sent - - as the default branch, it's better to make that be synced/master than - - synced/git-annex. 
(Although neither is ideal, it's the best that - - can be managed given the constraints on order.) - - - The forcing is necessary if a transition has rewritten the git-annex branch. - Normally any changes to the git-annex branch get pulled and merged before - this push, so this forcing is unlikely to overwrite new data pushed @@ -728,34 +720,59 @@ pushRemote o remote (Just branch, _) = do - But overwriting of data on synced/git-annex can happen, in a race. - The only difference caused by using a forced push in that case is that - the last repository to push wins the race, rather than the first to push. + - + - The git-annex branch is pushed last. This push may fail if the remote + - has other changes in the git-annex branch, and that is not treated as an + - error, since the synced/git-annex branch has been sent already. Since no + - new data is usually sent in this push (due to synced/git-annex already + - having been pushed), it's ok to hide git's output to avoid displaying + - a push error. -} pushBranch :: Remote -> Maybe Git.Branch -> MessageState -> Git.Repo -> IO Bool -pushBranch remote mbranch ms g = directpush `after` annexpush `after` syncpush +pushBranch remote mbranch ms g = do + directpush + annexpush `after` syncpush where - syncpush = flip Git.Command.runBool g $ pushparams $ catMaybes - [ (refspec . origBranch) <$> mbranch - , Just $ Git.Branch.forcePush $ refspec Annex.Branch.name - ] - annexpush = void $ tryIO $ flip Git.Command.runQuiet g $ pushparams - [ Git.fromRef $ Git.Ref.base $ Annex.Branch.name ] directpush = case mbranch of - Nothing -> noop - -- Git prints out an error message when this fails. - -- In the default configuration of receive.denyCurrentBranch, - -- the error message mentions that config setting - -- (and should even if it is localized), and is quite long, - -- and the user was not intending to update the checked out - -- branch, so in that case, avoid displaying the error - -- message. 
Do display other error messages though, - -- including the error displayed when - -- receive.denyCurrentBranch=updateInstead -- the user - -- will want to see that one. Just branch -> do let p = flip Git.Command.gitCreateProcess g $ pushparams [ Git.fromRef $ Git.Ref.base $ origBranch branch ] - (transcript, ok) <- processTranscript' p Nothing - when (not ok && not ("denyCurrentBranch" `isInfixOf` transcript)) $ - hPutStr stderr transcript + let p' = p { std_err = CreatePipe } + bracket (createProcess p') cleanupProcess $ \h -> do + filterstderr [] (stderrHandle h) (processHandle h) + void $ waitForProcess (processHandle h) + Nothing -> noop + + syncpush = flip Git.Command.runBool g $ pushparams $ catMaybes + [ (syncrefspec . origBranch) <$> mbranch + , Just $ Git.Branch.forcePush $ syncrefspec Annex.Branch.name + ] + + annexpush = void $ tryIO $ flip Git.Command.runQuiet g $ pushparams + [ Git.fromRef $ Git.Ref.base $ Annex.Branch.name ] + + -- In the default configuration of receive.denyCurrentBranch, + -- git's stderr message mentions that config setting + -- (and should even if it is localized), and is quite long, + -- and the user was not intending to update the checked out + -- branch, so in that case, avoid displaying the error + -- message. Do display other error messages though, + -- including the error displayed when + -- receive.denyCurrentBranch=updateInstead; the user + -- will want to see that one. Also display progress messages. 
+ filterstderr buf herr pid = hGetLineUntilExitOrEOF pid herr >>= \case + Just l + | "remote: " `isPrefixOf` l || not (null buf)-> + filterstderr (l:buf) herr pid + | otherwise -> do + hPutStrLn stderr l + filterstderr [] herr pid + Nothing -> displaybuf + where + displaybuf = + unless (any ("receive.denyCurrentBranch" `isInfixOf`) buf) $ + mapM_ (hPutStrLn stderr) (reverse buf) + pushparams branches = catMaybes [ Just $ Param "push" , if commandProgressDisabled' ms @@ -763,7 +780,8 @@ pushBranch remote mbranch ms g = directpush `after` annexpush `after` syncpush else Nothing , Just $ Param $ Remote.name remote ] ++ map Param branches - refspec b = concat + + syncrefspec b = concat [ Git.fromRef $ Git.Ref.base b , ":" , Git.fromRef $ Git.Ref.base $ syncBranch b diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn index 848fbfb30d..d1b7dd8e3d 100644 --- a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn @@ -12,3 +12,5 @@ This is very useful as it enables quick creation of repos without going through However, `git annex assist|sync|push` seem to push `git-annex`, `synced/git-annex`, or `synced/<currentbranch>` (in a seemingly random order? 🤔) **before** pushing `<currentbranch>` itself, causing this first pushed branch to become the repository's default branch. A `git clone ssh://me@myserver.com/me/myrepo` will then result in a local repo with e.g. `synced/main` checked out - or worse - `synced/git-annex`, causing a lot of confusion. Accidentally running `git annex assist` again will produce another level of `synced/synced/main` branches and all that fun stuff. (Very fun time during that summer school where I established git-annex + forgejo as data exchange 😉). Of course the solution is to just `git push` manually before `git annex assist`. 
But `git annex assist` is already such a brilliant command that does it all, and telling people to just run that to "do the git stuff" is very comfortable and easily accepted. Could the current branch be pushed first? Or is there a reason for pushing all the meta-branches first? + +> [[fixed|done]] --[[Joey]] diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment index ec22ced3a8..c04fcd917f 100644 --- a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment @@ -8,14 +8,19 @@ Basically: * We don't know if the remote is bare or non-bare. git does not generally provide a way to tell. -* Pushing to the checked out branch of a non-bare repo will complain on stderr. +* Pushing to the checked out branch of a non-bare repo will complain on + stderr, and the overall git push will fail even if other branches were + successfully pushed. But this is a fairly common use case for `git-annex sync`, and that complaint would be unwanted noise. git progress output also goes to stderr, so /dev/null of stderr is not desirable. * So instead push the synced branches, which doesn't have that problem, and lets - git display progress for the main data transfer. -* Then the current branch is pushed, with stderr collected and displayed - after filtering out denyCurrentBranch error messages. + git display progress for the main data transfer. As long as the + synced/master branch is pushed, the overall push part of sync can be + considered to succeed. +* Then the current branch is pushed, with stderr collected and displayed, + unless it contains the denyCurrentBranch warning message. 
A failure of this + push is not treated as an error. Also this was previously considered and partly addressed in [[!commit 1cc7b2661e5ec60f73f04dbe91940d2602df6246]] which made it push @@ -25,6 +30,7 @@ using a version from before that change. At that point I thought this was a github specific problem, mind. I think that to improve this, git-annex would need to run git push of master -with stderr intercepted and the denyCurrentBranch error message filtered out. +with stderr intercepted and the denyCurrentBranch error message filtered +out, but the rest of stderr (progress, etc) still displayed. Which does seem doable. """]] diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_2_922819fd788abf5b8863ab199c6930cb._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_2_922819fd788abf5b8863ab199c6930cb._comment new file mode 100644 index 0000000000..75cb5b6a13 --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_2_922819fd788abf5b8863ab199c6930cb._comment @@ -0,0 +1,164 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-06-04T15:11:21Z" + content=""" (Diff truncated)
comment
diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment new file mode 100644 index 0000000000..ec22ced3a8 --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex/comment_1_feb218bccfe9d47fd5faac4c702b4c2f._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-04T13:45:18Z" + content=""" +Command/Sync.hs has a big comment on pushBranch about push order considerations. +Basically: + +* We don't know if the remote is bare or non-bare. git does not generally + provide a way to tell. +* Pushing to the checked out branch of a non-bare repo will complain on stderr. + But this is a fairly common use case for `git-annex sync`, and that + complaint would be unwanted noise. git progress output also goes to stderr, + so /dev/null of stderr is not desirable. +* So instead push the synced branches, which doesn't have that problem, and lets + git display progress for the main data transfer. +* Then the current branch is pushed, with stderr collected and displayed + after filtering out denyCurrentBranch error messages. + +Also this was previously considered and partly addressed in +[[!commit 1cc7b2661e5ec60f73f04dbe91940d2602df6246]] which made it push +synced/master before synced/git-annex, to at least avoid the git-annex branch +becoming the default branch. The varying behavior you're seeing may be due to +using a version from before that change. At that point I thought this was a +github specific problem, mind. + +I think that to improve this, git-annex would need to run git push of master +with stderr intercepted and the denyCurrentBranch error message filtered out. +Which does seem doable. +"""]]
comment
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_4_c39de00401ad7f96fde93305e232139a._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_4_c39de00401ad7f96fde93305e232139a._comment new file mode 100644 index 0000000000..e7c549bc52 --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_4_c39de00401ad7f96fde93305e232139a._comment @@ -0,0 +1,33 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-06-03T19:05:35Z" + content=""" +Nice work investigating this. I would not have guessed a kernel bug might +be involved. But I am not convinced one is, either. + +I agree with your analysis of your strace. The filename is getting into +git-annex ok. Then it runs the compute program with the mangled filename. + +I don't see how a kernel bug would cause git-annex to mangle the filename +though. As far as `git-annex addcomputed` is concerned, the filename is +just a parameter to use as input to the computation. Such parameters are +not limited to filenames actually. And so they pass through `git-annex +addcomputed` without being exposed to any kernel syscall that might do +something wrong on a buggy kernel. + +Unless, that is, the haskell `process` library, or indeed the kernel +itself, does something with parameters passed to the compute program. + +(This strace does rule out my theories around `hGetLineUntilExitOrEOF`.) + +---- + +What are the versions of git-annex in the VM where it worked vs +where it didn't? + +And, if you can possibly download and unpack the linuxstandalone tarball, +and use that to run git-annex in the bad VM, that would be a useful check +that the problem does not somehow involve the homebrew build. +<https://git-annex.branchable.com/install/Linux_standalone/> +"""]]
annex.fastcopy
Added annex.fastcopy and remote.name.annex-fastcopy config setting. When
set, this allows the copy_file_range syscall to be used, which can eg allow
for server-side copies on NFS. (For fastest copying, also disable
annex.verify or remote.name.annex-verify.)
This is a simple implementation, that does not handle resuming as well as
it possibly could.
It can be used with both local git remotes (including on NFS), and
directory special remotes. Other types of remotes could in theory also
support it, so I've left the config documented as a general thing.
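
As a rough sketch of how the setting described above appears to be resolved and used, the following models the fallback order with stand-in actions. It assumes, based on the accompanying diff, that the per-remote setting enables fast copy when true and otherwise defers to the global one, and it leaves out the Windows case, where no copy-on-write attempt is made. The names `effectiveFastCopy` and `copyWithFallback`, and the three action parameters, are hypothetical.

```haskell
-- | Which strategy ended up producing the copy.
data CopyMethod = CopiedCoW | Copied
  deriving (Eq, Show)

-- | Assumed resolution rule: a true per-remote setting wins,
-- otherwise the global setting applies.
effectiveFastCopy :: Bool -> Bool -> Bool
effectiveFastCopy remoteSetting globalSetting = remoteSetting || globalSetting

-- | Sketch of the fallback order. Each action is a stand-in:
--   tryFast  - external cp, which may use copy_file_range
--   tryCoW   - a reflink-only copy attempt
--   slowCopy - plain read/write copy that can feed an incremental verifier
-- A successful fast copy is reported as CopiedCoW even if cp itself
-- fell back to a slow copy internally.
copyWithFallback :: Bool -> IO Bool -> IO Bool -> IO () -> IO CopyMethod
copyWithFallback fastCopyEnabled tryFast tryCoW slowCopy
  | fastCopyEnabled = do
      ok <- tryFast
      if ok
        then return CopiedCoW
        else copyWithFallback False tryFast tryCoW slowCopy
  | otherwise = do
      ok <- tryCoW
      if ok
        then return CopiedCoW
        else slowCopy >> return Copied

main :: IO ()
main = do
  -- Both quick strategies fail here, so the slow path runs.
  method <- copyWithFallback (effectiveFastCopy True False)
              (return False) (return False) (putStrLn "slow copy")
  print method
```

The design point this illustrates is that enabling fast copy only adds a first attempt in front of the existing chain; a failed fast copy still ends in the same copy-on-write or plain copy as before.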
diff --git a/Annex/CopyFile.hs b/Annex/CopyFile.hs index 83bc55e42a..9c9baf2e4f 100644 --- a/Annex/CopyFile.hs +++ b/Annex/CopyFile.hs @@ -1,6 +1,6 @@ {- Copying files. - - - Copyright 2011-2022 Joey Hess <id@joeyh.name> + - Copyright 2011-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -10,6 +10,7 @@ module Annex.CopyFile where import Annex.Common +import qualified Annex import Utility.Metered import Utility.CopyFile import Utility.FileMode @@ -77,6 +78,23 @@ tryCopyCoW (CopyCoWTried copycowtried) src dest meterupdate = data CopyMethod = CopiedCoW | Copied +-- Should cp be allowed to copy the file with --reflink=auto? +-- +-- The benefit is that this lets it use the copy_file_range +-- syscall, which is not used with --reflink=always. The drawback is that +-- the IncrementalVerifier is not updated, so verification, if it is done, +-- will need to re-read the whole content of the file. And, interrupted +-- copies are not resumed but are restarted from the beginning. +-- +-- Using this will result in CopiedCow being returned even in cases +-- where cp fell back to a slow copy. +newtype FastCopy = FastCopy Bool + +getFastCopy :: RemoteGitConfig -> Annex FastCopy +getFastCopy gc = case remoteAnnexFastCopy gc of + False -> FastCopy . annexFastCopy <$> Annex.getGitConfig + True -> return (FastCopy True) + {- Copies from src to dest, updating a meter. Preserves mode and mtime. - Uses copy-on-write if it is supported. If the the destination already - exists, an interrupted copy will resume where it left off. @@ -94,38 +112,49 @@ data CopyMethod = CopiedCoW | Copied - (eg when isStableKey is false), and doing this avoids getting a - corrupted file in such cases. 
-} -fileCopier :: CopyCoWTried -> OsPath -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> IO CopyMethod +fileCopier :: CopyCoWTried -> FastCopy -> OsPath -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> IO CopyMethod +fileCopier copycowtried (FastCopy True) src dest meterupdate iv = do + ok <- watchFileSize dest meterupdate $ const $ + copyFileExternal CopyTimeStamps src dest + if ok + then do + maybe noop unableIncrementalVerifier iv + return CopiedCoW + else fileCopier copycowtried (FastCopy False) src dest meterupdate iv #ifdef mingw32_HOST_OS -fileCopier _ src dest meterupdate iv = docopy +fileCopier _ _ src dest meterupdate iv = + fileCopier' src dest meterupdate iv #else -fileCopier copycowtried src dest meterupdate iv = +fileCopier copycowtried _ src dest meterupdate iv = ifM (tryCopyCoW copycowtried src dest meterupdate) ( do maybe noop unableIncrementalVerifier iv return CopiedCoW - , docopy + , fileCopier' src dest meterupdate iv ) #endif - where - docopy = do - -- The file might have had the write bit removed, - -- so make sure we can write to it. - void $ tryIO $ allowWrite dest - F.withBinaryFile src ReadMode $ \hsrc -> - fileContentCopier hsrc dest meterupdate iv +fileCopier' :: OsPath -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> IO CopyMethod +fileCopier' src dest meterupdate iv = do + -- The file might have had the write bit removed, + -- so make sure we can write to it. + void $ tryIO $ allowWrite dest + + F.withBinaryFile src ReadMode $ \hsrc -> + fileContentCopier hsrc dest meterupdate iv - -- Copy src mode and mtime. - mode <- fileMode <$> R.getFileStatus (fromOsPath src) - mtime <- utcTimeToPOSIXSeconds <$> getModificationTime src - let dest' = fromOsPath dest - R.setFileMode dest' mode - touch dest' mtime False + -- Copy src mode and mtime. 
+ mode <- fileMode <$> R.getFileStatus (fromOsPath src) + mtime <- utcTimeToPOSIXSeconds <$> getModificationTime src + let dest' = fromOsPath dest + R.setFileMode dest' mode + touch dest' mtime False - return Copied + return Copied {- Copies content from a handle to a destination file. Does not - use copy-on-write, and does not copy file mode and mtime. + - Updates the IncementalVerifier with the content it copies. -} fileContentCopier :: Handle -> OsPath -> MeterUpdate -> Maybe IncrementalVerifier -> IO () fileContentCopier hsrc dest meterupdate iv = diff --git a/CHANGELOG b/CHANGELOG index 9b7ddf6c5e..55ca8b37ef 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -10,6 +10,10 @@ git-annex (10.20250521) UNRELEASED; urgency=medium * map: Improve display of remote names. * Windows: Fix duplicate file bug that could occur when files were supposed to be moved across devices. + * Added annex.fastcopy and remote.name.annex-fastcopy config setting. + When set, this allows the copy_file_range syscall to be used, which + can eg allow for server-side copies on NFS. (For fastest copying, + also disable annex.verify or remote.name.annex-verify.) 
-- Joey Hess <id@joeyh.name> Thu, 22 May 2025 12:43:38 -0400 diff --git a/Remote/Directory.hs b/Remote/Directory.hs index 372a485ba7..5392caafa3 100644 --- a/Remote/Directory.hs +++ b/Remote/Directory.hs @@ -84,11 +84,12 @@ gen r u rc gc rs = do cst <- remoteCost gc c cheapRemoteCost let chunkconfig = getChunkConfig c cow <- liftIO newCopyCoWTried + fastcopy <- getFastCopy gc let ii = IgnoreInodes $ fromMaybe True $ getRemoteConfigValue ignoreinodesField c return $ Just $ specialRemote c - (storeKeyM dir chunkconfig cow) - (retrieveKeyFileM dir chunkconfig cow) + (storeKeyM dir chunkconfig cow fastcopy) + (retrieveKeyFileM dir chunkconfig cow fastcopy) (removeKeyM dir) (checkPresentM dir chunkconfig) Remote @@ -105,8 +106,8 @@ gen r u rc gc rs = do , checkPresent = checkPresentDummy , checkPresentCheap = True , exportActions = ExportActions - { storeExport = storeExportM dir cow - , retrieveExport = retrieveExportM dir cow + { storeExport = storeExportM dir cow fastcopy + , retrieveExport = retrieveExportM dir cow fastcopy , removeExport = removeExportM dir , checkPresentExport = checkPresentExportM dir -- Not needed because removeExportLocation @@ -118,7 +119,7 @@ gen r u rc gc rs = do { listImportableContents = listImportableContentsM ii dir , importKey = Just (importKeyM ii dir) , retrieveExportWithContentIdentifier = retrieveExportWithContentIdentifierM ii dir cow - , storeExportWithContentIdentifier = storeExportWithContentIdentifierM ii dir cow + , storeExportWithContentIdentifier = storeExportWithContentIdentifierM ii dir cow fastcopy , removeExportWithContentIdentifier = removeExportWithContentIdentifierM ii dir -- Not needed because removeExportWithContentIdentifier -- auto-removes empty directories. @@ -189,8 +190,8 @@ storeDir d k = addTrailingPathSeparator $ {- Check if there is enough free disk space in the remote's directory to - store the key. Note that the unencrypted key size is checked. 
-} -storeKeyM :: OsPath -> ChunkConfig -> CopyCoWTried -> Storer -storeKeyM d chunkconfig cow k c m = +storeKeyM :: OsPath -> ChunkConfig -> CopyCoWTried -> FastCopy -> Storer +storeKeyM d chunkconfig cow fastcopy k c m = ifM (checkDiskSpaceDirectory d k) ( do void $ liftIO $ tryIO $ createDirectoryUnder [d] tmpdir @@ -210,7 +211,7 @@ storeKeyM d chunkconfig cow k c m = in byteStorer go k c m NoChunks -> let go _k src p = liftIO $ do - void $ fileCopier cow src tmpf p Nothing + void $ fileCopier cow fastcopy src tmpf p Nothing finalizeStoreGeneric d tmpdir destdir in fileStorer go k c m _ -> @@ -247,12 +248,12 @@ finalizeStoreGeneric d tmp dest = do mapM_ preventWrite =<< dirContents dest preventWrite dest -retrieveKeyFileM :: OsPath -> ChunkConfig -> CopyCoWTried -> Retriever -retrieveKeyFileM d (LegacyChunks _) _ = Legacy.retrieve locations' d -retrieveKeyFileM d NoChunks cow = fileRetriever' $ \dest k p iv -> do +retrieveKeyFileM :: OsPath -> ChunkConfig -> CopyCoWTried -> FastCopy -> Retriever +retrieveKeyFileM d (LegacyChunks _) _ _ = Legacy.retrieve locations' d +retrieveKeyFileM d NoChunks cow fastcopy = fileRetriever' $ \dest k p iv -> do src <- liftIO $ getLocation d k - void $ liftIO $ fileCopier cow src dest p iv -retrieveKeyFileM d _ _ = byteRetriever $ \k sink -> + void $ liftIO $ fileCopier cow fastcopy src dest p iv (Diff truncated)
comment
diff --git a/doc/todo/use_copy__95__file__95__range_for_get_and_copy/comment_3_04c577bff624af761d997e5f9a8d951d._comment b/doc/todo/use_copy__95__file__95__range_for_get_and_copy/comment_3_04c577bff624af761d997e5f9a8d951d._comment new file mode 100644 index 0000000000..db3be71305 --- /dev/null +++ b/doc/todo/use_copy__95__file__95__range_for_get_and_copy/comment_3_04c577bff624af761d997e5f9a8d951d._comment @@ -0,0 +1,56 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-06-03T16:37:31Z" + content=""" +A config setting may be unnecessary. If git-annex tried to use +`copy_file_range` itself, that would fail with EOPNOTSUPP or EXDEV +when not supported. Then git-annex could use `cp --reflink=always` +as a fallback. + +However, `copy_file_range` is not necessarily inexpensive. Depending on the +filesystem it can still need to read and write the whole file. And, rather +than a single syscall copying the whole file, git-annex would need to call +it repeatedly in chunks in order to display a progress bar. But, making a +lot of syscalls against a NFS filesystem would be its own overhead. + +So there seems to be a tradeoff between progress display and efficiency on +NFS. And if the goal is to maximize speed for NFS with server-side copy, +maybe progress bars are not important enough to have in that case? + +Also, it seems likely to me that you would certainly want to turn off +annex.verify along with using `copy_file_range`, which is already a manual +config setting. So a second config setting would be no big deal. + +---- + +As to other filesystems, I found this comment with an overview as of 2022: +<https://github.com/openzfs/zfs/discussions/4237#discussioncomment-3579635> + +For btrfs, it does reflinking, so no benefit to using it over what +git-annex does now. 
+ +Testing on ext4, `cp --reflink=auto` used `copy_file_range` in a copy on +the same filesystem (it tried it cross-filesystem, but it failed and had to +fall back to a regular copy). So does `cp` with no options. On a SSD, +with big enough files (4 gb or so), I did see noticeable performance +improvements. + +If git-annex did `copy_file_range` in chunks on ext4, it could read each +chunk after it was written to the destination file, and get it from the +page cache. But that would still copy the content of the file into user +space. So the savings from using `copy_file_range` with annex.verify set +on ext4 seem like they would only be in avoiding the userspace to kernel +transfer, with the kernel to userspace transfer still needed. + +That also notes that, on NFS, `copy_file_range` can do a CoW copy when the +underlying filesystem supports it. So with NFS on btrfs or zfs, a single +`copy_file_range` call could result in no more work than a reflink, +optimally efficient. If git-annex did `copy_file_range` on each chunk in +order to display a progress bar, that would be a lot of syscalls in flight +over the network, so noticeably slower. + +All of this is making me lean toward a config setting that enables +`copy_file_range`, without progress bars, and that is intended to be +used with annex.verify disabled in order to get optimal performance. +"""]]
Added a comment
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_3_70a0b4c7bad79a917fa0b9a8526ec428._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_3_70a0b4c7bad79a917fa0b9a8526ec428._comment new file mode 100644 index 0000000000..844ba7e88d --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_3_70a0b4c7bad79a917fa0b9a8526ec428._comment @@ -0,0 +1,61 @@ +[[!comment format=mdwn + username="datawraith" + avatar="http://cdn.libravatar.org/avatar/e36c82a2b6f3150ad14a24eb7eb85826" + subject="comment 3" + date="2025-06-03T17:05:58Z" + content=""" +Thank you for taking the time to look into this! + +I am indeed using Homebrew on Linux. I'm on [Bluefin](https://projectbluefin.io/), which uses Fedora Silverblue as a base. Software there is generally installed either as Flatpak or via Homebrew because the root image is immutable. + +I see the behavior on both my desktop and laptop, both running a recent Bluefin version (bluefin-dx:latest, based on Fedora Silverblue 42), but it just occurred to me that I could try it in a virtual machine, too. + +When using a slightly older release of Bluefin I had on that VM, everything worked fine, but when I updated to the latest version, the `addcomputed` command started failing. Interestingly it works fine with files that were created before the update -- including with unicode filenames --, but when I create a new file with unicode characters after updating to the latest image, addcomputed fails on those, which seems to indicate this is **likely not a git-annex problem after all**. + +After a bit of research, I found [this](https://www.phoronix.com/news/Linux-Reverts-Special-Char-Uni) Linux problem that broke unicode handling in filenames, but I'm by no means certain that that is the cause of the problem, and if it is, there might be nothing you can do in git-annex to fix it. + +Unless you want to pursue this further, I'm fine with just closing the bug as not applicable. 
+ +--- + +Still, I've added the requested strace log below -- I couldn't see a meaningful difference between the logs that worked, and the ones that failed, other than the failure itself and the missing unicode character escapes. + +Grepping the failure strace log for \"filename\" yields the following: + +``` +14497 execve(\"/usr/sbin/git\", [\"git\", \"annex\", \"addcomputed\", \"--to=passthrough\", \"\303\204 filename with Unic\303\266de ch\303\244ra\"..., \"foo.txt\"], 0x7ffe655d3190 /* 81 vars */) = 0 +14498 execve(\"/home/linuxbrew/.linuxbrew/bin/git-annex\", [\"/home/linuxbrew/.linuxbrew/bin/g\"..., \"addcomputed\", \"--to=passthrough\", \"\303\204 filename with Unic\303\266de ch\303\244ra\"..., \"foo.txt\"], 0x55d77d2df560 /* 82 vars */ <unfinished ...> +14507 execve(\"/usr/libexec/git-core/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */) = -1 ENOENT (No such file or directory) +14507 execve(\"/home/linuxbrew/.linuxbrew/bin/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */) = -1 ENOENT (No such file or directory) +14507 execve(\"/home/linuxbrew/.linuxbrew/sbin/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */) = -1 ENOENT (No such file or directory) +14507 execve(\"/var/home/myusername/.local/bin/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */) = -1 ENOENT (No such file or directory) +14507 execve(\"/var/home/myusername/bin/git-annex-compute-passthrough\", [\"git-annex-compute-passthrough\", \" filename with Unicde chracters.\"..., \"foo.txt\"], 0x4200427cb0 /* 82 vars */ <unfinished ...> +14507 write(1, \"INPUT filename with Unicde chra\"..., 42) = 42 
+14498 <... read resumed>\"INPUT filename with Unicde chra\"..., 8192) = 42 +14498 write(19, \":./ filename with Unicde chracte\"..., 39 <unfinished ...> +14508 read(0, \":./ filename with Unicde chracte\"..., 4096) = 39 +14508 write(1, \":./ filename with Unicde chracte\"..., 47) = 47 +14498 read(20, \":./ filename with Unicde chracte\"..., 8192) = 47 +14498 write(19, \":./ filename with Unicde chracte\"..., 39) = 39 +14508 <... read resumed>\":./ filename with Unicde chracte\"..., 4096) = 39 +14508 write(1, \":./ filename with Unicde chracte\"..., 47 <unfinished ...> +14498 read(20, \":./ filename with Unicde chracte\"..., 8192) = 47 +``` + +Also with the /tmp/passthrough.log commented out. + +I haven't used strace before, but if I'm reading this right, it looks like the characters get lost as or after git-annex receives them, but before the passthrough script is called. There is a ton of output between the git-annex execve (14498) and the one for the passthrough script (14507), mostly seems to be loading libraries and examining the .git directory. It also loads the git.mo translation files and system locale settings in-between, but there is no obvious point of failure. + +--- + +Interestingly I get the same behavior for the invalid byte sequence example as for the unicode characters: + +``` +git-annex: The computation needs an input file that is not checked into the git repository: invalid +failed +addcomputed: 1 failed +``` + +They are simply stripped. + +"""]]
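When comparing strace excerpts like the ones above, it can help to decode the octal escapes rather than read them by eye. The following is a minimal sketch, not part of strace or git-annex: the helper name `decode_strace_string` is hypothetical, and it assumes the quoted arguments contain only `\NNN` octal escapes and are UTF-8 encoded, as in the excerpts.

```python
import re

def decode_strace_string(s: str) -> str:
    """Turn strace-style \\NNN octal escapes back into text, assuming UTF-8."""
    raw = re.sub(
        rb"\\([0-7]{3})",
        lambda m: bytes([int(m.group(1), 8)]),  # one escape -> one byte
        s.encode("utf-8"),
    )
    return raw.decode("utf-8", errors="replace")

# Argument as seen in the execve of git-annex itself.
working = r"\303\204 filename with Unic\303\266de ch\303\244ra"
# Argument as seen in the execve attempts for the compute program.
failing = " filename with Unicde chracters."

print(decode_strace_string(working))
print(decode_strace_string(failing))
```

Under those assumptions `\303\204` decodes to `Ä`, so an argument that shows no escapes at all really has lost the bytes rather than merely being displayed differently.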
followup
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_1_2da72486f0e83e74871706757b0badb6._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_1_2da72486f0e83e74871706757b0badb6._comment new file mode 100644 index 0000000000..e2e06b3473 --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_1_2da72486f0e83e74871706757b0badb6._comment @@ -0,0 +1,43 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-06-02T17:21:37Z" + content=""" +> I'm running on Linux and my locale is de_DE.UTF-8: +> +> git-annex was installed using Homebrew. + +That's unusual. Linux and Homebrew? I just want to check you didn't +typo there and mean to say you're on OSX. + +Tried just now (including the same locale setting) and it does not fail for me: + + joey@darkstar:~/tmp/c>git annex addcomputed --to=passthrough "Ä filename with Unicöde chäracters.txt" fails.txt + addcomputed passthrough + (adding fails.txt...) (checksum...) + ok + (recording state in git...) + +There are 3 possibilities here: + +1. The unicode characters are getting stripped out before git-annex is run, + eg by your interactive shell or by git. +2. git-annex is stripping out valid (or invalid) unicode. +3. 
"read" or "echo" in your git-annex-compute-passthrough script is + for some reason stripping unicode + +The best way to track down which of these is the problem is `strace`, so could you please try this: + + strace -o log -f git annex addcomputed --to=passthrough "Ä filename with Unicöde chäracters.txt" foo.txt + grep "filename with" log + +Here's how that strace looks for me, when the characters are making it through unscathed: + + 2395608 execve("/usr/bin/git", ["git", "annex", "addcomputed", "--to=passthrough", "\303\204 filename with Unic\303\266de ch\303\244ra"..., "fails3.txt"], 0x7ffc44897f00 /* 69 vars */) = 0 + 2395609 execve("/home/joey/bin/git-annex", ["/home/joey/bin/git-annex", "addcomputed", "--to=passthrough", "\303\204 filename with Unic\303\266de ch\303\244ra"..., "fails3.txt"], 0x55c3cfdf27c0 /* 70 vars */ <unfinished ...> + 2395618 execve("/home/joey/bin/git-annex-compute-passthrough", ["git-annex-compute-passthrough", "\303\204 filename with Unic\303\266de ch\303\244ra"..., "fails3.txt"], 0x42000ec610 /* 70 vars */ <unfinished ...> + 2395618 write(1, "INPUT \303\204 filename with Unic\303\266de "..., 48) = 48 + 2395609 read(16, "INPUT \303\204 filename with Unic\303\266de "..., 8192) = 48 + +(I commented out the passthrough.log writing from the script to keep the strace easier to follow.) +"""]] diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_2_a51f8ba4f0b24c21e107bc33db2412ab._comment b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_2_a51f8ba4f0b24c21e107bc33db2412ab._comment new file mode 100644 index 0000000000..b4627aeac7 --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames/comment_2_a51f8ba4f0b24c21e107bc33db2412ab._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-06-02T17:54:05Z" + content=""" +I don't see how git-annex could be stripping even invalid unicode here. 
+When it runs the compute program it uses `process` with `CreatePipe`. That +is documented to use the default encoding. git-annex sets the default +encoding in `useFileSystemEncoding`. + +With that said, git-annex is here using `hGetLineUntilExitOrEOF`, and if +`hGetChar` ever failed with an encoding error, it does look like that +would skip over the problem and return the rest of the string. + +It would not hurt to throw in a `fileEncoding` on the compute process's +handles, but I'd really want to be able to reproduce this first. + +I have also tried with filenames that are not valid unicode at all, and +they pass through ok. Eg: + + invalid_byte_sequence=$'\x80\x81' + echo hi > invalid$(printf %s $invalid_byte_sequence) + git-annex add invalid* + git annex addcomputed --to=passthrough invalid* invalidout + cat invalidout + hi +"""]]
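The "skip over the problem and return the rest of the string" behaviour hypothesised above can be illustrated outside of git-annex. This is only an analogy in Python, not git-annex's Haskell code path: it assumes a reader whose encoding cannot represent the incoming bytes and which silently drops whatever fails to decode.

```python
# Filename from the bug report, encoded the way a UTF-8 locale would pass it.
name = "Ä filename with Unicöde chäracters.txt"
wire_bytes = name.encode("utf-8")

# A reader that drops undecodable bytes instead of failing loudly.
# Here the reader is (wrongly) assumed to be in an ASCII-only encoding.
lossy = wire_bytes.decode("ascii", errors="ignore")

# A reader using the matching encoding round-trips the name intact.
intact = wire_bytes.decode("utf-8")

print(repr(lossy))
print(repr(intact))
```

The lossy result has the same shape as the name in the reported error message, which is consistent with the bytes being dropped by some decoder along the way but does not by itself show which component is responsible.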
Added a comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_9_60cd62cff48ca72cb3b4a89d0313e10c._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_9_60cd62cff48ca72cb3b4a89d0313e10c._comment new file mode 100644 index 0000000000..3755e75ac3 --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_9_60cd62cff48ca72cb3b4a89d0313e10c._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 9" + date="2025-06-02T17:22:37Z" + content=""" +Kai said + +> The repository is located on an NTFS drive. I don't recall whether it +> was cloned using git clone from within WSL or downloaded directly from +> GitHub, but the repository is stored on an NTFS drive and is accessible +> from WSL. I'm not sure if the cloning method is relevant to this issue, + +"""]]
diff --git a/doc/bugs/compute_remote_fails_for_unicode_filenames.mdwn b/doc/bugs/compute_remote_fails_for_unicode_filenames.mdwn new file mode 100644 index 0000000000..7ad0ccc47a --- /dev/null +++ b/doc/bugs/compute_remote_fails_for_unicode_filenames.mdwn @@ -0,0 +1,136 @@ +### Please describe the problem. + +I'm experimenting with the compute special remote by trying to convert FLAC files to .opus. + +Some of the music files have unicode characters in the filename, which leads to an incorrect error message saying that the file is not checked into the repository. + +It is possible that I'm just doing something wrong here, but as far as I can tell, the unicode characters are simply stripped by git-annex. + + +### What steps will reproduce the problem? + +1. Commit a file with unicode characters in the filename to the git repository +2. Invoke a compute remote with that file +3. git annex complains that the file is not checked into the git repository + +### What version of git-annex are you using? On what operating system? + +I'm running on Linux and my locale is de_DE.UTF-8: + +``` +$ locale +LANG=de_DE.UTF-8 +LC_CTYPE="de_DE.UTF-8" +LC_NUMERIC="de_DE.UTF-8" +LC_TIME="de_DE.UTF-8" +LC_COLLATE="de_DE.UTF-8" +LC_MONETARY="de_DE.UTF-8" +LC_MESSAGES="de_DE.UTF-8" +LC_PAPER="de_DE.UTF-8" +LC_NAME="de_DE.UTF-8" +LC_ADDRESS="de_DE.UTF-8" +LC_TELEPHONE="de_DE.UTF-8" +LC_MEASUREMENT="de_DE.UTF-8" +LC_IDENTIFICATION="de_DE.UTF-8" +LC_ALL= +``` + +git-annex was installed using Homebrew. 
+ +``` +git-annex version: 10.20250520 +build flags: Pairing DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.4 http-client-0.7.19 persistent-sqlite-2.13.3.1 torrent-10000.1.3 uuid-1.3.16 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +### Please provide any additional information below. + +Here is a minimal reproduction of the problem: + +[[!format sh """ +$ git init compute-unicode +$ cd compute-unicode +$ touch "A filename without Unicode characters.txt" +$ touch "Ä filename with Unicöde chäracters.txt" +$ git add . +$ git commit -m "Demo" + +[main (Root-Commit) 3655a71] Demo + 2 files changed, 0 insertions(+), 0 deletions(-) + create mode 100644 A filename without Unicode characters.txt + create mode 100644 "\303\204 filename with Unic\303\266de ch\303\244racters.txt" + +$ git annex init + +init ok +(recording state in git...) + +$ git annex initremote passthrough type=compute program=git-annex-compute-passthrough + +initremote passthrough ok +(recording state in git...) 
+ +$ git annex addcomputed --to=passthrough "A filename without Unicode characters.txt" works.txt + +addcomputed passthrough +(adding works.txt...) (checksum...) +ok +(recording state in git...) + +$ git annex addcomputed --to=passthrough "Ä filename with Unicöde chäracters.txt" fails.txt + +addcomputed passthrough + +git-annex: The computation needs an input file that is not checked into the git repository: filename with Unicde chracters.txt +failed +addcomputed: 1 failed +"""]] + +Note how the unicode characters are simply missing in git-annex's message: " filename with Unicde chracters.txt". + +I first thought this was a problem with my script, but it seems that git-annex strips the Unicode characters before invoking it. + +The passthrough-remote looks like this (adapted from the ImageMagick example): + +```sh +#!/bin/sh +set -e + +if [ -z "$1" ] || [ -z "$2" ]; then + echo "Specify the input file, followed by the output file." >&2 + echo "Example: input.txt output.txt" >&2 + exit 1 +fi + +echo "INPUT: $1" > /tmp/passthrough.log +echo "OUTPUT: $2" >> /tmp/passthrough.log + +echo "INPUT $1" +read input +echo "OUTPUT $2" +read output + + +if [ -n "$input" ]; then + cat "$input" > "$output" +fi +``` + +The log file in /tmp/passthrough.log doesn't have the Unicode characters: + +``` +INPUT: filename with Unicde chracters.txt +OUTPUT: fails.txt +``` + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + + +I've been happily managing my important data (as well as things like my music collection) with git-annex for a few years now, with it making sure that everything has several copies on different external storage media. :-)
Suggest pushing current branch before the meta-branches
diff --git a/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn new file mode 100644 index 0000000000..848fbfb30d --- /dev/null +++ b/doc/todo/Pushing_current_branch_before_synced__42___or_git-annex.mdwn @@ -0,0 +1,14 @@ +Many git forges (gitlab, gitea, forgejo, etc.) support [push-to-create](https://forgejo.org/docs/latest/user/push-to-create/#push-to-create) to create repositories upon the first push, e.g. + +[[!format bash """ +# add the (still nonexistent) remote to this local repo +> git remote add myserver ssh://me@myserver.com/me/myrepo +# push to the remote as if it existed (will create it on the remote) +> git push -u myserver +"""]] + +This is very useful as it enables quick creation of repos without going through a tedious GUI. + +However, `git annex assist|sync|push` seem to push `git-annex`, `synced/git-annex`, or `synced/<currentbranch>` (in a seemingly random order? 🤔) **before** pushing `<currentbranch>` itself, causing this first pushed branch to become the repository's default branch. A `git clone ssh://me@myserver.com/me/myrepo` will then result in a local repo with e.g. `synced/main` checked out - or worse - `synced/git-annex`, causing a lot of confusion. Accidentally running `git annex assist` again will produce another level of `synced/synced/main` branches and all that fun stuff. (Very fun time during that summer school where I established git-annex + forgejo as data exchange 😉). + +Of course the solution is to just `git push` manually before `git annex assist`. But `git annex assist` is already such a brilliant command that does it all, and telling people to just run that to "do the git stuff" is very comfortable and easily accepted. Could the current branch be pushed first? Or is there a reason for pushing all the meta-branches first?
caps
diff --git a/doc/install.mdwn b/doc/install.mdwn index 3d8413e389..23f440689d 100644 --- a/doc/install.mdwn +++ b/doc/install.mdwn @@ -23,7 +23,7 @@ detailed instructions | quick install [[OpenBSD]] | `pkg_add git-annex` [[Android]] | **beta** [[Windows]] | **beta** -[[PyPi]] | `uv tool install git-annex` +[[PyPI]] | `uv tool install git-annex` """]] ## Historical builds diff --git a/doc/install/pypi.mdwn b/doc/install/pypi.mdwn index f80805bc22..9987da4dda 100644 --- a/doc/install/pypi.mdwn +++ b/doc/install/pypi.mdwn @@ -1,3 +1,3 @@ -git-annex is packaged in PyPi for ease of use for python users. +git-annex is packaged in PyPI for ease of use for python users. <https://pypi.org/project/git-annex/>
fix pypi link
markdown link didn't work, use a subpage
diff --git a/doc/install.mdwn b/doc/install.mdwn index 6c5533fd12..3d8413e389 100644 --- a/doc/install.mdwn +++ b/doc/install.mdwn @@ -23,7 +23,7 @@ detailed instructions | quick install [[OpenBSD]] | `pkg_add git-annex` [[Android]] | **beta** [[Windows]] | **beta** -[PyPi][pypi] | `uv tool install git-annex` +[[PyPi]] | `uv tool install git-annex` """]] ## Historical builds @@ -40,5 +40,3 @@ it [[from source|fromsource]]. * [[autobuild overview|builds]] * [[upgrades]] - -[pypi]: https://pypi.org/project/git-annex/ diff --git a/doc/install/pypi.mdwn b/doc/install/pypi.mdwn new file mode 100644 index 0000000000..f80805bc22 --- /dev/null +++ b/doc/install/pypi.mdwn @@ -0,0 +1,3 @@ +git-annex is packaged in PyPi for ease of use for python users. + +<https://pypi.org/project/git-annex/>
add pypi
diff --git a/doc/install.mdwn b/doc/install.mdwn index 3b6a5058f9..6c5533fd12 100644 --- a/doc/install.mdwn +++ b/doc/install.mdwn @@ -23,6 +23,7 @@ detailed instructions | quick install [[OpenBSD]] | `pkg_add git-annex` [[Android]] | **beta** [[Windows]] | **beta** +[PyPi][pypi] | `uv tool install git-annex` """]] ## Historical builds @@ -39,3 +40,5 @@ it [[from source|fromsource]]. * [[autobuild overview|builds]] * [[upgrades]] + +[pypi]: https://pypi.org/project/git-annex/
diff --git a/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn b/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn new file mode 100644 index 0000000000..21a7c72cb6 --- /dev/null +++ b/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn @@ -0,0 +1,42 @@ +### Please describe the problem. + +I have an old git-annex-remote-rclone remote that I'd like to switch over to the builtin rclone variant. I figured maybe a simple `git annex enableremote REMOTE type=rclone` would do it, but that crashes git-annex: + +``` +$ git annex enableremote remote type=rclone +enableremote remote +git-annex: getRemoteConfigValue externaltype found value of unexpected type PassedThrough. This is a bug in git-annex! +CallStack (from HasCallStack): + error, called at ./Annex/SpecialRemote/Config.hs:206:28 in main:Annex.SpecialRemote.Config + getRemoteConfigValue, called at ./Remote/External.hs:931:35 in main:Remote.External +failed +enableremote: 1 failed +``` + +### What steps will reproduce the problem? + +1. Create a remote using `type=external externaltype=rclone` +2. Try to change it to `type=rclone` + +### What version of git-annex are you using? On what operating system? 
+ +``` +git-annex version: 10.20250521-g1a9e6bf26b56c39429d4a096bf733e57e5684e1b +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.34 DAV-1.3.4 feed-1.3.2.1 ghc-9.6.6 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 +BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM UR +L GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +``` + +Using the standalone amd64 build on Debian 12. + + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +I use git-annex for "everything". I have somewhere along the lines of 14TiB stored in various git-annex repositories, synced in various degrees to anywhere between 3 and 10 hosts, with repos dating back to 2012. It's awesome.
update
diff --git a/doc/todo/map__58___add_--json/comment_3_f40842222d964ff1a9e0effba5a2e522._comment b/doc/todo/map__58___add_--json/comment_3_f40842222d964ff1a9e0effba5a2e522._comment new file mode 100644 index 0000000000..0cc882d30e --- /dev/null +++ b/doc/todo/map__58___add_--json/comment_3_f40842222d964ff1a9e0effba5a2e522._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-05-29T16:58:29Z" + content=""" +BTW, my first comment incorrectly described map's spidering capabilities +slightly. Suppose you have a remote on host foo, and that repository has +its own remote on host bar. Then map will ssh to foo to dump the git +config, find the additional urls on bar, and try to ssh to bar to get +the git config of the remote of the remote. And this can continue +arbitrarily far, but limited of course by what hosts you can ssh to. +Whether that will be enough for your needs, I don't know. +"""]]
adjust json field names
Avoid using "name" for what git-annex otherwise refers to as a
description.
(For the remotes in the map, the "remote" field should be the remote
name, but there is a bug preventing it from being that.)
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
diff --git a/Command/Map.hs b/Command/Map.hs index 0deb5a6029..9ecceea2f0 100644 --- a/Command/Map.hs +++ b/Command/Map.hs @@ -47,7 +47,7 @@ start = startingNoMessage (ActionItemOther Nothing) $ do umap <- uuidDescMap trustmap <- trustMapLoad - + ifM (outputJSONMap rs trustmap umap) ( next $ return True , do @@ -108,8 +108,8 @@ hostname r basehostname :: Git.Repo -> String basehostname r = fromMaybe "" $ headMaybe $ splitc '.' $ hostname r -{- A name to display for a repo. Uses the name from uuid.log if available, - - or the remote name if not. -} +{- A name to display for a repo. Uses the description + - from uuid.log if available, or the remote name if not. -} repoName :: UUIDDescMap -> Git.Repo -> String repoName umap r | repouuid == NoUUID = fallback @@ -307,14 +307,14 @@ outputJSONMap rs trustmap umap = ] mknode (r, remotes) = JSON.object - [ "name" .= packString (repoName umap r) + [ "description" .= packString (repoName umap r) , "uuid" .= mkuuid (getUncachedUUID r) , "url" .= packString (Git.repoLocation r) , "remotes" .= map mkremote (filterdead id remotes) ] mkremote r = JSON.object - [ "name" .= packString (repoName umap r) + [ "remote" .= packString (repoName umap r) , "uuid" .= mkuuid (getUncachedUUID r) , "url" .= packString (Git.repoLocation r) ] diff --git a/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment b/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment index 15691af277..9463fe12f2 100644 --- a/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment +++ b/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment @@ -10,10 +10,10 @@ Example output, after being passed through `jq` to pretty-print it: { "nodes": [ { - "name": "joey@darkstar:~/tmp/mapbench/a", + "description": "joey@darkstar:~/tmp/mapbench/a", "remotes": [ { - "name": "joey@darkstar:~/tmp/mapbench/b", + "remote": "b", "url": "/home/joey/tmp/mapbench/b", "uuid": 
"645d92d8-6461-43c1-b23c-6dd04dc3a015" } @@ -22,10 +22,10 @@ Example output, after being passed through `jq` to pretty-print it: "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325" }, { - "name": "joey@darkstar:~/tmp/mapbench/b", + "description": "joey@darkstar:~/tmp/mapbench/b", "remotes": [ { - "name": "joey@darkstar:~/tmp/mapbench/a", + "remote": "a", "url": "/home/joey/tmp/mapbench/a", "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325" } @@ -34,10 +34,10 @@ Example output, after being passed through `jq` to pretty-print it: "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015" }, { - "name": "unknown", + "description": "unknown", "remotes": [ { - "name": "joey@darkstar:~/tmp/mapbench/b", + "remote": "b", "url": "/home/joey/tmp/mapbench/b", "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015" }
Added a comment: I need help with this too (c.f. submodule refactor)
diff --git a/doc/forum/Move_part_of_one_repository_into_other/comment_1_c07f4c20b187abf7e3c43021f72d672f._comment b/doc/forum/Move_part_of_one_repository_into_other/comment_1_c07f4c20b187abf7e3c43021f72d672f._comment new file mode 100644 index 0000000000..39bebb460c --- /dev/null +++ b/doc/forum/Move_part_of_one_repository_into_other/comment_1_c07f4c20b187abf7e3c43021f72d672f._comment @@ -0,0 +1,57 @@ +[[!comment format=mdwn + username="Spencer" + avatar="http://cdn.libravatar.org/avatar/2e0829f36a68480155e09d0883794a55" + subject="I need help with this too (c.f. submodule refactor)" + date="2025-05-29T03:42:42Z" + content=""" +I do this quite often because I use a monorepo approach with regular refactoring of subtrees into their own submodules. I have yet to find a bulletproof way to do this on the git-annex side. + +The first step is as simple as `git annex unannex` in `A`, or including `--include \"*\"` if pattern matching is easier. + +- On the `git` side, this logs the files as deleted from the main repo (`src`, let's call her). This is ideal so that you have a record for yourself (with a descriptive commit message) of where you've moved your files to. +- On the `git-annex` side, (once you commit), the file data will eventually become \"unused\" - you'll have to do some combination of `git annex push` and `git annex sync [--cleanup]` to ensure all branches really don't reference those files (including remote branches and `synced/*` branches). + +Now the question is: how do we get the data into the new repo (`dst`) and safely drop from `src`? + +- You could add `dst` as a remote of `src` and pull only `dst`'s `git-annex` branch, which (after moving, re-annexing, and committing the unannexed files to `dst`) now shows as having a copy of those files. (**Warning:** this has bad side-effects). +- You could do the opposite but use `dst` to move any (used) files from `src` (**Warning:** this has bad side-effects). 
+- You could add `dst` as a remote and `move` unused files over (requires a clean unused stack already and having to do the push/sync stuff correctly and fully before the files can be released) +- You could do the opposite and \"copy\" the files *to* `src` first *then* move them over to `dst`. (Required because per `dst`'s knowledge, it has no record of `src` having any keys. I find it logical albeit sad that `git-annex` can't dynamically poll local repos' annexes for file content) +- You could forcibly drop the data either by individual key or once it eventually becomes unused (super unsafe and sad) + +### Conclusions + +- Keep a clean unused stack (`git annex unused` gives nothing) as much as you can, and clean it out before testing out any sort of move/drop operations like this. +- Option 4 is the best so far. Following the initial step of `gx unannex` in `src`: + - Add `src` as a remote in `dst`, `mv` files into `dst`, `gx add` files in `dst`, `gx copy` files from `dst` back to `src`, then do `gx move -f <src>` + - This will only move the files known by `dst`. If it so happens that one of these files is actually duplicate data with something you want to also be in `src`, this *will* drop it and leave no record in `src` of where it went (besides your `git` commit message). + +As described, there are still side effects with Option 4, but it's so far the best option I've devised. +Oh, and if you want to keep `src` around as a remote on `dst` to e.g. remind yourself of various relations, make sure you configure it in `.git/config` with: + +- `annex.sync=false`. This skips it when you do a `git annex sync` +- Delete the `remote.fetch` spec, or add `remote.skipFetchAll=true`. This ensures `git fetch` doesn't fetch all the branch and unrelated objects +- (pray there are no more side-effects) + +Now, what happens if a side-effect does happen and it looks like you lost some content and don't know where it went? `git annex whereis` is no help. 
+Instead, you have to extract the key from the now broken symlink and run `find <> -type f -iname \"<KEY>\"`. Easy enough but kind of scary when it happens to you. + +### Side-Effects of Option 1+2: `git-annex` synchronization + +*DON'T DEAD OPEN INSIDE* + +While this is currently the only way to propagate annex key information, it has bad side-effects: + +- Remotes and known repos start to clutter whichever absorbs the others' `git-annex` branch. For me this is a no-go because I have redundant remotes (an exporttree called `dropbox` in my case) +- If you decide to `dead` these remotes or repos and by coincidence the `git-annex` branch is later absorbed in the other direction, chaos ensues (`dead` is propagated, remote annex key history is killed: especially gross for export/importtrees) + - Best way to avoid this is to `dead`, `forget --drop-dead` then `semitrust UUID`. Many steps, potentially undefined condition. Gross. + +## Potential Feature Requests + +Ideally, I would wish `git-annex` could intelligently scan another repo's annex and populate information about what keys it has simply by what keys are objectively in `.git/annex/objects`. This pulls in the information we care about without cluttering additional information relevant only to each respective repo. +Then, presuming you've set up a remote (`dst`) pointing to this repo (`src`) and run `git annex info`, then `src` should have a list of keys that are inside `dst`, and `gx whereis` from `src` will identify the keys inside `dst`, and `drop` will happily do so. + +- Maybe there could be something called an `acquaintance` repo that is not allowed to be synced, pulled, fetched, pushed to. +- Acquaintances are semitrusted because they're still annex-controlled. +- On removing an acquaintance repo, and running `gx forget`, the list of keys is wiped. +"""]]
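The recovery step described in the comment above (read the key out of the broken symlink, then search for a file with that name) can be sketched as follows. This is a rough illustration under stated assumptions: the helper names `key_from_symlink` and `find_key` are hypothetical, the key is taken to be the last path component of the link target, and the search matches the name exactly instead of case-insensitively as `find -iname` would.

```python
import os
import tempfile

def key_from_symlink(path: str) -> str:
    """Return the last component of a symlink's target (works if dangling)."""
    target = os.readlink(path)  # reads the link itself; target need not exist
    return os.path.basename(target)

def find_key(root: str, key: str):
    """Yield regular files under root whose name is exactly the key."""
    for dirpath, _dirs, files in os.walk(root):
        if key in files:
            candidate = os.path.join(dirpath, key)
            if os.path.isfile(candidate) and not os.path.islink(candidate):
                yield candidate

def demo() -> None:
    # Made-up key and layout, purely for illustration.
    key = "SHA256E-s3--0000example.txt"
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        dst_obj = os.path.join(root, "dst", ".git", "annex", "objects", "aa", "bb", key)
        os.makedirs(src)
        os.makedirs(dst_obj)
        with open(os.path.join(dst_obj, key), "w") as fh:
            fh.write("hi\n")
        # Dangling symlink in src: its content lives only in dst.
        link = os.path.join(src, "file.txt")
        os.symlink(os.path.join(".git", "annex", "objects", "Xx", "Yy", key, key), link)
        found = key_from_symlink(link)
        print(found)
        for hit in find_key(root, found):
            print(os.path.relpath(hit, root))

demo()
```

This only locates the content on disk; it does not update any location tracking, so `git annex whereis` would still be unaware of the copy.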
Added a comment: Not enough information on special remotes
diff --git a/doc/forum/View_special_remote_information__63__/comment_2_e70275d412564f01446ed45c3a31dc7d._comment b/doc/forum/View_special_remote_information__63__/comment_2_e70275d412564f01446ed45c3a31dc7d._comment new file mode 100644 index 0000000000..1149bac8e0 --- /dev/null +++ b/doc/forum/View_special_remote_information__63__/comment_2_e70275d412564f01446ed45c3a31dc7d._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="guez@e17c318e09fc77b4a5be4cd330364e3a41a96971" + nickname="guez" + avatar="http://cdn.libravatar.org/avatar/ffec09075c5b5cd47832649a306d68c3" + subject="Not enough information on special remotes" + date="2025-05-28T21:58:23Z" + content=""" +You say that the command shows the url used for a WebDAV remote, but this does not seem to be the case any longer: + +``` +$ git annex info sdrive +uuid: d17d5946-d126-4a0e-b6c1-232fb34fb461 +description: sdrive +trust: semitrusted +remote annex keys: 1 +remote annex size: 249.11 kilobytes +``` + +I can get a list of special remotes with `git annex enableremote` but how can I get a more detailed list, with all the information on each special remote: the type, the configuration options (encryption or not, etc.), the URLs? +"""]]
map: Support --json option
Sponsored-by: Dartmouth College's OpenNeuro project
diff --git a/CHANGELOG b/CHANGELOG index 789872e8c0..8fb85dfb1b 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -6,6 +6,7 @@ git-annex (10.20250521) UNRELEASED; urgency=medium configured but fails, prevent initialization. This allows the user to fix their configuration and avoid crippled filesystem detection entering an adjusted branch. + * map: Support --json option. -- Joey Hess <id@joeyh.name> Thu, 22 May 2025 12:43:38 -0400 diff --git a/Command/Map.hs b/Command/Map.hs index ce28ca32c3..0deb5a6029 100644 --- a/Command/Map.hs +++ b/Command/Map.hs @@ -9,8 +9,6 @@ module Command.Map where -import qualified Data.Map as M - import Command import qualified Git import qualified Git.Url @@ -25,12 +23,17 @@ import Logs.Trust import Types.TrustLevel import qualified Remote.Helper.Ssh as Ssh import qualified Utility.Dot as Dot +import qualified Messages.JSON as JSON +import Messages.JSON ((.=)) +import Utility.Aeson (packString) + +import qualified Data.Map as M -- a repo and its remotes type RepoRemotes = (Git.Repo, [Git.Repo]) cmd :: Command -cmd = dontCheck repoExists $ +cmd = dontCheck repoExists $ withAnnexOptions [jsonOptions] $ command "map" SectionQuery "generate map of repositories" paramNothing (withParams seek) @@ -45,19 +48,23 @@ start = startingNoMessage (ActionItemOther Nothing) $ do umap <- uuidDescMap trustmap <- trustMapLoad - file <- (</>) - <$> fromRepo gitAnnexDir - <*> pure (literalOsPath "map.dot") - - liftIO $ writeFile (fromOsPath file) (drawMap rs trustmap umap) - next $ - ifM (Annex.getRead Annex.fast) - ( runViewer file [] - , runViewer file - [ ("xdot", [File (fromOsPath file)]) - , ("dot", [Param "-Tx11", File (fromOsPath file)]) - ] - ) + ifM (outputJSONMap rs trustmap umap) + ( next $ return True + , do + file <- (</>) + <$> fromRepo gitAnnexDir + <*> pure (literalOsPath "map.dot") + + liftIO $ writeFile (fromOsPath file) (drawMap rs trustmap umap) + next $ + ifM (Annex.getRead Annex.fast) + ( runViewer file [] + , runViewer file + [ ("xdot", [File 
(fromOsPath file)]) + , ("dot", [Param "-Tx11", File (fromOsPath file)]) + ] + ) + ) runViewer :: OsPath -> [(String, [CommandParam])] -> Annex Bool runViewer file [] = do @@ -198,7 +205,8 @@ same a b {- reads the config of a remote, with progress display -} scan :: Git.Repo -> Annex Git.Repo scan r = do - showStartMessage (StartMessage "map" (ActionItemOther (Just $ UnquotedString $ Git.repoDescribe r)) (SeekInput [])) + unlessM jsonOutputEnabled $ + showStartMessage (StartMessage "map" (ActionItemOther (Just $ UnquotedString $ Git.repoDescribe r)) (SeekInput [])) v <- tryScan r case v of Just r' -> do @@ -269,7 +277,7 @@ tryScan r configlist ok -> return ok - sshnote = do + sshnote = unlessM jsonOutputEnabled $ do showAction "sshing" showOutput @@ -287,3 +295,33 @@ safely a = do case result of Left _ -> return Nothing Right r' -> return $ Just r' + +outputJSONMap :: [RepoRemotes] -> TrustMap -> UUIDDescMap -> Annex Bool +outputJSONMap rs trustmap umap = + showFullJSON $ JSON.AesonObject $ case mapo of + JSON.Object obj -> obj + _ -> error "internal" + where + mapo = JSON.object + [ "nodes" .= map mknode (filterdead fst rs) + ] + + mknode (r, remotes) = JSON.object + [ "name" .= packString (repoName umap r) + , "uuid" .= mkuuid (getUncachedUUID r) + , "url" .= packString (Git.repoLocation r) + , "remotes" .= map mkremote (filterdead id remotes) + ] + + mkremote r = JSON.object + [ "name" .= packString (repoName umap r) + , "uuid" .= mkuuid (getUncachedUUID r) + , "url" .= packString (Git.repoLocation r) + ] + + mkuuid NoUUID = Nothing + mkuuid u = Just $ packString $ fromUUID u + + filterdead f = filter + (\i -> M.lookup (getUncachedUUID (f i)) trustmap /= Just DeadTrusted) + diff --git a/doc/git-annex-map.mdwn b/doc/git-annex-map.mdwn index debfa1c31a..23585fdae2 100644 --- a/doc/git-annex-map.mdwn +++ b/doc/git-annex-map.mdwn @@ -39,6 +39,10 @@ on that host. Don't display the generated Graphviz file, but save it for later use. 
+* `--json` + + Output the map as a JSON object. + * Also the [[git-annex-common-options]](1) can be used. # SEE ALSO diff --git a/doc/todo/map__58___add_--json.mdwn b/doc/todo/map__58___add_--json.mdwn index 58b21a51f2..5b431ecc3e 100644 --- a/doc/todo/map__58___add_--json.mdwn +++ b/doc/todo/map__58___add_--json.mdwn @@ -6,3 +6,5 @@ Please let me know on how feasible that would be, and any other thoughts you hav [[!meta author=yoh]] [[!tag projects/openneuro]] + +> [[done]] --[[Joey]] diff --git a/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment b/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment new file mode 100644 index 0000000000..15691af277 --- /dev/null +++ b/doc/todo/map__58___add_--json/comment_2_e9d4957cf5a3d5c015108cb1805df7da._comment @@ -0,0 +1,51 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-05-28T18:11:34Z" + content=""" +I went ahead and implemented `git-annex map --json`. + +Example output, after being passed through `jq` to pretty-print it: + + { + "nodes": [ + { + "name": "joey@darkstar:~/tmp/mapbench/a", + "remotes": [ + { + "name": "joey@darkstar:~/tmp/mapbench/b", + "url": "/home/joey/tmp/mapbench/b", + "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015" + } + ], + "url": "/home/joey/tmp/mapbench/a", + "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325" + }, + { + "name": "joey@darkstar:~/tmp/mapbench/b", + "remotes": [ + { + "name": "joey@darkstar:~/tmp/mapbench/a", + "url": "/home/joey/tmp/mapbench/a", + "uuid": "3f34e4c2-dd19-433a-ab04-9fd4be959325" + } + ], + "url": "/home/joey/tmp/mapbench/b", + "uuid": "645d92d8-6461-43c1-b23c-6dd04dc3a015" + }, (Diff truncated)
comment
diff --git a/doc/install/rpm_standalone/comment_5_a84ab211e00776b6631929e9a8f4e25e._comment b/doc/install/rpm_standalone/comment_5_a84ab211e00776b6631929e9a8f4e25e._comment new file mode 100644 index 0000000000..88fe775a8c --- /dev/null +++ b/doc/install/rpm_standalone/comment_5_a84ab211e00776b6631929e9a8f4e25e._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: Standalone rpms not available""" + date="2025-05-27T17:04:27Z" + content=""" +There was a problem with the last release, it's available now. +"""]]
prevent initialization with bad freeze/thaw hook configured
When annex.freezecontent-command or annex.thawcontent-command is configured
but fails, prevent initialization.
This allows the user to fix their configuration and avoid crippled
filesystem detection entering an adjusted unlocked branch unexpectedly,
when they had been relying on the hooks working around their filesystem's
infelicities.
In the case of git-remote-annex, a failure of these hooks is taken to mean
the filesystem may be crippled, so it deletes the bundle objects and
avoids initialization. That might mean extra work, but only in this edge
case where the hook is misconfigured. And it keeps the command working
for cloning even despite the misconfiguration.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
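
The three-way outcome described above (crippled, not crippled, or undetermined because a hook failed) can be sketched as a small pure model. This is only an illustration under stated assumptions: `HookResult` mirrors the type added in the diff, while `probe` and `assumeCrippled` are hypothetical stand-ins that take the hook outcomes as plain values instead of running anything, and the sketch drops the probe's own warnings for brevity.

```haskell
module Main where

import Data.Maybe (fromMaybe)

-- Outcome of running a freeze or thaw hook (mirrors HookResult in the diff).
data HookResult
    = HookSuccess
    | HookFailed String -- description of the hook command that failed
    deriving (Eq, Show)

-- Hypothetical pure stand-in for the probe. Nothing means the result is
-- undetermined because a hook failed, so initialization should be prevented.
probe :: HookResult -> HookResult -> Bool -> (Maybe Bool, [String])
probe freeze thaw crippled = case freeze of
    HookFailed desc -> (Nothing, [hookFailed desc])
    HookSuccess -> case thaw of
        HookFailed desc -> (Nothing, [hookFailed desc])
        HookSuccess -> (Just crippled, [])
  where
    hookFailed desc = "Failed to run " ++ desc
        ++ ". Unable to initialize until this is fixed."

-- Hypothetical helper: treat an undetermined result as crippled, in the way
-- isCrippledFileSystem does in the diff with fromMaybe True.
assumeCrippled :: (Maybe Bool, [String]) -> Bool
assumeCrippled (r, _) = fromMaybe True r

main :: IO ()
main = do
    print (probe HookSuccess HookSuccess False)
    print (probe (HookFailed "git configured command 'false'") HookSuccess False)
    print (assumeCrippled (probe HookSuccess (HookFailed "thaw hook") False))
```

Returning `Maybe Bool` rather than `Bool` lets callers tell "the filesystem is crippled" apart from "could not tell, because the hook is misconfigured", which appears to be what allows initialization to be refused without entering an adjusted unlocked branch.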
diff --git a/Annex/Hook.hs b/Annex/Hook.hs index 086665abce..366678490c 100644 --- a/Annex/Hook.hs +++ b/Annex/Hook.hs @@ -106,44 +106,55 @@ doesAnnexHookExist hook = do runAnnexHook :: Git.Hook -> (GitConfig -> Maybe String) -> Annex () runAnnexHook hook commandcfg = runAnnexHook' hook commandcfg >>= \case - Nothing -> noop - Just failedcommanddesc -> + HookSuccess -> noop + HookFailed failedcommanddesc -> warning $ UnquotedString $ failedcommanddesc ++ " failed" --- Returns Nothing if the hook or GitConfig command succeeded, or a --- description of what failed. -runAnnexHook' :: Git.Hook -> (GitConfig -> Maybe String) -> Annex (Maybe String) +data HookResult + = HookSuccess + | HookFailed String + -- ^ A description of the hook command that failed. + deriving (Eq, Show) + +runAnnexHook' :: Git.Hook -> (GitConfig -> Maybe String) -> Annex HookResult runAnnexHook' hook commandcfg = ifM (doesAnnexHookExist hook) ( runhook , runcommandcfg ) where runhook = ifM (inRepo $ Git.runHook boolSystem hook []) - ( return Nothing + ( return HookSuccess , do h <- fromRepo (Git.hookFile hook) - commandfailed (fromOsPath h) + return $ HookFailed $ fromOsPath h ) runcommandcfg = commandcfg <$> Annex.getGitConfig >>= \case - Nothing -> return Nothing + Nothing -> return HookSuccess Just command -> ifM (liftIO $ boolSystem "sh" [Param "-c", Param command]) - ( return Nothing - , commandfailed $ "git configured command '" ++ command ++ "'" + ( return HookSuccess + , return $ HookFailed $ "git configured command '" ++ command ++ "'" ) - commandfailed c = return $ Just c -runAnnexPathHook :: String -> Git.Hook -> (GitConfig -> Maybe String) -> OsPath -> Annex Bool +runAnnexPathHook :: String -> Git.Hook -> (GitConfig -> Maybe String) -> OsPath -> Annex HookResult runAnnexPathHook pathtoken hook commandcfg p = ifM (doesAnnexHookExist hook) ( runhook , runcommandcfg ) where - runhook = inRepo $ Git.runHook boolSystem hook [ File p' ] + runhook = ifM (inRepo $ Git.runHook boolSystem hook 
[ File p' ]) + ( return HookSuccess + , do + h <- fromRepo (Git.hookFile hook) + return $ HookFailed $ fromOsPath h + ) runcommandcfg = commandcfg <$> Annex.getGitConfig >>= \case - Nothing -> return True - Just basecmd -> liftIO $ - boolSystem "sh" [Param "-c", Param $ gencmd basecmd] + Nothing -> return HookSuccess + Just basecmd -> + ifM (liftIO $ boolSystem "sh" [Param "-c", Param (gencmd basecmd)]) + ( return HookSuccess + , return $ HookFailed $ "git configured command '" ++ basecmd ++ "'" + ) gencmd = massReplace [ (pathtoken, shellEscape p') ] p' = fromOsPath p diff --git a/Annex/Init.hs b/Annex/Init.hs index 81b07b54d1..64c924fd04 100644 --- a/Annex/Init.hs +++ b/Annex/Init.hs @@ -19,6 +19,7 @@ module Annex.Init ( uninitialize, probeCrippledFileSystem, probeCrippledFileSystem', + isCrippledFileSystem, ) where import Annex.Common @@ -75,10 +76,10 @@ data InitializeAllowed = InitializeAllowed checkInitializeAllowed :: (InitializeAllowed -> Annex a) -> Annex a checkInitializeAllowed a = guardSafeToUseRepo $ noAnnexFileContent' >>= \case Nothing -> runAnnexHook' preInitAnnexHook annexPreInitCommand >>= \case - Nothing -> do + HookSuccess -> do checkSqliteWorks a InitializeAllowed - Just failedcommanddesc -> do + HookFailed failedcommanddesc -> do initpreventedby failedcommanddesc notinitialized Just noannexmsg -> do @@ -94,8 +95,8 @@ checkInitializeAllowed a = guardSafeToUseRepo $ noAnnexFileContent' >>= \case initializeAllowed :: Annex Bool initializeAllowed = noAnnexFileContent' >>= \case Nothing -> runAnnexHook' preInitAnnexHook annexPreInitCommand >>= \case - Nothing -> return True - Just _ -> return False + HookSuccess -> return True + HookFailed _ -> return False Just _ -> return False noAnnexFileContent' :: Annex (Maybe String) @@ -288,73 +289,116 @@ isInitialized :: Annex Bool isInitialized = maybe Annex.Branch.hasSibling (const $ return True) =<< getVersion {- A crippled filesystem is one that does not allow making symlinks, - - or removing write 
access from files. -} -probeCrippledFileSystem :: Annex Bool -probeCrippledFileSystem = withEventuallyCleanedOtherTmp $ \tmp -> do - (r, warnings) <- probeCrippledFileSystem' tmp + - or removing write access from files. + - + - This displays messages about problems detected with the filesystem. + - + - If a freeze or thaw hook is configured, but exits nonzero, + - this returns Nothing after displaying a message to the user about the + - problem. Such a hook can in some cases make a filesystem + - that would otherwise be detected as crippled work ok, so this avoids + - a false positive. + -} +probeCrippledFileSystem :: Annex (Maybe Bool) +probeCrippledFileSystem = do + (r, warnings) <- isCrippledFileSystem' + mapM_ (warning . UnquotedString) warnings + return r + +isCrippledFileSystem :: Annex Bool +isCrippledFileSystem = do + (r, _warnings) <- isCrippledFileSystem' + return (fromMaybe True r) + +isCrippledFileSystem' :: Annex (Maybe Bool, [String]) +isCrippledFileSystem' = withEventuallyCleanedOtherTmp $ \tmp -> + probeCrippledFileSystem' tmp (Just (freezeContent' UnShared)) (Just (thawContent' UnShared)) =<< hasFreezeHook - mapM_ (warning . UnquotedString) warnings - return r probeCrippledFileSystem' :: (MonadIO m, MonadCatch m) => OsPath - -> Maybe (OsPath -> m ()) - -> Maybe (OsPath -> m ()) + -> Maybe (OsPath -> m HookResult) + -> Maybe (OsPath -> m HookResult) -> Bool - -> m (Bool, [String]) + -> m (Maybe Bool, [String]) #ifdef mingw32_HOST_OS -probeCrippledFileSystem' _ _ _ _ = return (True, []) +probeCrippledFileSystem' _ _ _ _ = return (Just True, []) #else probeCrippledFileSystem' tmp freezecontent thawcontent hasfreezehook = do let f = tmp </> literalOsPath "gaprobe" liftIO $ F.writeFile' f "" - r <- probe f - void $ tryNonAsync $ (fromMaybe (liftIO . 
allowWrite) thawcontent) f + r <- freezethaw f probe liftIO $ removeFile f return r where - probe f = catchDefaultIO (True, []) $ do + fallbackfreezecontent f = do + liftIO $ preventWrite f + return HookSuccess + + fallbackthawcontent f = do + liftIO $ allowWrite f + return HookSuccess + + freezethaw f cont = + (fromMaybe fallbackfreezecontent freezecontent) f >>= \case + HookFailed failedcommanddesc -> + return (Nothing, [hookfailed failedcommanddesc]) + HookSuccess -> do + r <- cont f + tryNonAsync ((fromMaybe fallbackthawcontent thawcontent) f) + >>= return . \case + Right (HookFailed failedcommanddesc) -> + let (_, warnings) = r + in (Nothing, hookfailed failedcommanddesc : warnings) + _ -> r + + hookfailed failedcommanddesc = "Failed to run " ++ failedcommanddesc + ++ ". Unable to initialize until this is fixed." + + probe f = catchDefaultIO (Just True, []) $ do let f2 = f <> literalOsPath "2" (Diff truncated)
Added a comment: Standalone rpms not available
diff --git a/doc/install/rpm_standalone/comment_4_a80fa98172357a2d20a160c186e9372d._comment b/doc/install/rpm_standalone/comment_4_a80fa98172357a2d20a160c186e9372d._comment new file mode 100644 index 0000000000..08ac12a8e0 --- /dev/null +++ b/doc/install/rpm_standalone/comment_4_a80fa98172357a2d20a160c186e9372d._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="zhunting" + avatar="http://cdn.libravatar.org/avatar/1439e56826a7befaefc79f66eef9d835" + subject="Standalone rpms not available" + date="2025-05-27T15:55:31Z" + content=""" +Hello, are the RPMs no longer being published, it doesn't seem to be available anymore. +"""]]
comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_8_c1b434d222c81514590461fa9fd23c01._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_8_c1b434d222c81514590461fa9fd23c01._comment new file mode 100644 index 0000000000..66eca71e0c --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_8_c1b434d222c81514590461fa9fd23c01._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2025-05-27T14:14:56Z" + content=""" +Was the repository on the NTFS drive or on the WSL side (ext4 or whatever)? +"""]]
Added a comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_7_4b0662ed3467cd46c3332d52ad30cef4._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_7_4b0662ed3467cd46c3332d52ad30cef4._comment new file mode 100644 index 0000000000..43e189ac52 --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_7_4b0662ed3467cd46c3332d52ad30cef4._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 7" + date="2025-05-27T10:57:18Z" + content=""" +FTR, Yukai reported that it was \"WSL (Ubuntu 22.04, x86_64) on Windows 10. Git version was 2.34.1, git-annex version was 10.20230407-1~ndall+1.\". So definitely not a trivial/typical setup ;) I do not remember when it was that I have tried git-annex under WSL. +"""]]
diff --git a/doc/bugs/proot_info__58___vpid_1__58___terminated_with_signal_4.mdwn b/doc/bugs/proot_info__58___vpid_1__58___terminated_with_signal_4.mdwn new file mode 100644 index 0000000000..9bad084bec --- /dev/null +++ b/doc/bugs/proot_info__58___vpid_1__58___terminated_with_signal_4.mdwn @@ -0,0 +1,46 @@ +### Please describe the problem. +I get this error when I try to run git annex sync on my amazon fire tablet. +`proot info: vpid 1: terminated with signal 4` + +``` +$ cat /proc/cpuinfo +processor : 0 +Processor : ARMv7 Processor rev 3 (v7l) +model name : ARMv7 Processor rev 3 (v7l) +BogoMIPS : 32.19 +Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm aes pmull sha1 sha2 crc32 +CPU implementer : 0x41 +CPU architecture: 7 +CPU variant : 0x0 +CPU part : 0xd03 +CPU revision : 3 + +processor : 1 +Processor : ARMv7 Processor rev 3 (v7l) +model name : ARMv7 Processor rev 3 (v7l) +BogoMIPS : 26.00 +Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm aes pmull sha1 sha2 crc32 +CPU implementer : 0x41 +CPU architecture: 7 +CPU variant : 0x0 +CPU part : 0xd03 +CPU revision : 3 + +Hardware : MT8163 +Revision : 0000 +Serial : 84b2d1e8651995fc +``` + +`uname -m` reports `armv7l` + +If I do `proot-distro login debian` then I can use the very same git-annex.linux and it works but it's slow (idk why). If I try to use git annex from termux then it fails with that error. Not that it matters, I'm using termux from here https://sourceforge.net/projects/android-ports-for-gnu-emacs/files/termux/ which is "a version of the Termux terminal emulator signed with +Emacs's signing keys" so that Emacs can use the termux binaries or something. + +### What version of git-annex are you using? On what operating system? +Android 9. LineageOS. + +### Have you had any luck using git-annex before? 
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +So, I've been trying to use it for years. Kind of like how I tried using Emacs and went bankrupt twice before things clicked. I was so hung up on the symlinks. Then I finally understood some things, thanks Gemini, and now I'm using annex.addunlocked and annex.thin and I feel really nerdy and cool. I would love to get this working on my tablet too. + +Thanks!
Added a comment
diff --git a/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_3_eaa0748eed9398ba59d49b3387ac4a82._comment b/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_3_eaa0748eed9398ba59d49b3387ac4a82._comment new file mode 100644 index 0000000000..413b654eb7 --- /dev/null +++ b/doc/bugs/git-annex__58_____60__stdout__62____58___hPutBuf__58___resource_vanished/comment_3_eaa0748eed9398ba59d49b3387ac4a82._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="pierreay" + avatar="http://cdn.libravatar.org/avatar/c1c640f9f581daaf2d9dedff2b84b614" + subject="comment 3" + date="2025-05-26T19:38:41Z" + content=""" +Thank you @joey and @mak for the hints. +I was unable (even with strace) to chase down the particular system call that causes the issue, since it is highly random and generated by my shell prompt. +However, mak seems right about the git_status module of starship. I increased my timeout from 500ms (default) to 1000ms (and will try a slightly larger value if needed). +If this mitigates the issue correctly, I will not comment anymore! But I suspect it will work. ;) +"""]]
diff --git a/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__.mdwn b/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__.mdwn new file mode 100644 index 0000000000..abaa2f63d3 --- /dev/null +++ b/doc/forum/What_does___34__upgrade__34___in_the_webapp_do__63__.mdwn @@ -0,0 +1,3 @@ +The webapp showed "(metadata only)" behind a repository. Running `git annex upgrade` in the repositories didn't change that. I had to "upgrade" the repository under "edit" in the webapp to fix that. + +What did upgrading in the webapp do, that running `git annex upgrade` did not?
Added a comment
diff --git a/doc/forum/tell_assistant_to_wait_5_mins_before_commiting__63__/comment_4_26bd8ac09928b5d9bf5eec2f254a8520._comment b/doc/forum/tell_assistant_to_wait_5_mins_before_commiting__63__/comment_4_26bd8ac09928b5d9bf5eec2f254a8520._comment new file mode 100644 index 0000000000..c7738312d5 --- /dev/null +++ b/doc/forum/tell_assistant_to_wait_5_mins_before_commiting__63__/comment_4_26bd8ac09928b5d9bf5eec2f254a8520._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jnkl" + avatar="http://cdn.libravatar.org/avatar/2ab576f3bf2e0d96b1ee935bb7f33dbe" + subject="comment 4" + date="2025-05-26T15:30:01Z" + content=""" +Thank you very much! +"""]]
dup
diff --git a/doc/todo/fail_on_encfs_more_informatively_or_do_not_fail_.mdwn b/doc/todo/fail_on_encfs_more_informatively_or_do_not_fail_.mdwn index 22722b5a53..d8923e8ba3 100644 --- a/doc/todo/fail_on_encfs_more_informatively_or_do_not_fail_.mdwn +++ b/doc/todo/fail_on_encfs_more_informatively_or_do_not_fail_.mdwn @@ -58,3 +58,5 @@ git-annex: cannot determine uuid for origin (perhaps you need to run "git annex ``` which is simply due to the fact that git-annex is not only unable to parse, it is unable to connect. But if so, IMHO ideally it should avoid claiming anything about git annex installation there. + +> Closing as duplicate of the other post, which did get through. [[done]] --[[Joey]]
close dup todo
diff --git a/doc/todo/Recent_remote_activities/comment_3_f7d67710fc0d880335220a3d9e3ec11d._comment b/doc/todo/Recent_remote_activities/comment_3_f7d67710fc0d880335220a3d9e3ec11d._comment new file mode 100644 index 0000000000..b497e0ed0d --- /dev/null +++ b/doc/todo/Recent_remote_activities/comment_3_f7d67710fc0d880335220a3d9e3ec11d._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-05-23T19:44:03Z" + content=""" +Basically the same todo previously: [[todo/show_time_of_last_interaction_with_a_repo]] + +I'll close that one in favor of this new one. The old one did have some +ideas about using groups to manually track activity, and a way to use +`git-annex expire` to list recently fsked repos. +"""]] diff --git a/doc/todo/show_time_of_last_interaction_with_a_repo.mdwn b/doc/todo/show_time_of_last_interaction_with_a_repo.mdwn index 315fdf91f8..57e49d1b4f 100644 --- a/doc/todo/show_time_of_last_interaction_with_a_repo.mdwn +++ b/doc/todo/show_time_of_last_interaction_with_a_repo.mdwn @@ -1 +1,4 @@ When [[`git-annex-info`|git-annex-info]] lists repos, it can be unclear which ones are still "active". It would help if the info command showed the time of last interaction for each repo. Seems like the code to determine that already exists in [[`git-annex-expire`|git-annex-expire]]? + +> Closing as a duplicate, since there is a newer todo +> [[show_time_of_last_interaction_with_a_repo]]. --[[Joey]]
correction
diff --git a/doc/todo/migration_to_VURL_by_default.mdwn b/doc/todo/migration_to_VURL_by_default.mdwn index 326967b90d..69db574059 100644 --- a/doc/todo/migration_to_VURL_by_default.mdwn +++ b/doc/todo/migration_to_VURL_by_default.mdwn @@ -15,9 +15,10 @@ transferring the content between repositories that it's not possible to verify it. > This would need a way to migrate from URL key to VURL key. -> Currently, `git-annex migrate` of an URL key defaults to using the -> default hashing backend. And adding `--backend=VURL` does not work. -> --[[Joey]] +> +> > Oh, I was wrong, that does exist already, just `git-annex migrate +> > --backend=VURL` works for URL keys. (Content must be present of course +> > or no migration is done). --[[Joey]] Of course if users want to continue to use their existing URL keys and not be able to verify content, that's fine. Users can also choose to use
Added a comment
diff --git a/doc/forum/Archive_group_with_special_repositories/comment_1_7581154b7208df9444e78e0e701eebe5._comment b/doc/forum/Archive_group_with_special_repositories/comment_1_7581154b7208df9444e78e0e701eebe5._comment new file mode 100644 index 0000000000..1d39ddf147 --- /dev/null +++ b/doc/forum/Archive_group_with_special_repositories/comment_1_7581154b7208df9444e78e0e701eebe5._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://tokariew.id.fedoraproject.org/" + nickname="tokariew" + avatar="http://cdn.libravatar.org/avatar/fcff1d07fd8c44bf9004540658358a6b" + subject="comment 1" + date="2025-05-23T11:05:27Z" + content=""" +Reorganized my idea, made git annex repo inside of Pictures folder, but set `annex.addunlocked = true` +I avoid symlinks, and COW filesystem don't care about duplicates +"""]]
comment
diff --git a/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_2_e9eb7c0ac4d1a87f3808a08f960e466d._comment b/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_2_e9eb7c0ac4d1a87f3808a08f960e466d._comment new file mode 100644 index 0000000000..e757c5669b --- /dev/null +++ b/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_2_e9eb7c0ac4d1a87f3808a08f960e466d._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-05-22T19:20:15Z" + content=""" +Filed a bug report on git, with a testcase that does not need git-annex: + +<https://lore.kernel.org/git/aC90kn2mE93DCJEH@kitenet.net/T/#u> +"""]]
git bug
diff --git a/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_1_cb16c170d49b628cca0c76d8843b9f52._comment b/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_1_cb16c170d49b628cca0c76d8843b9f52._comment new file mode 100644 index 0000000000..2be1bd3099 --- /dev/null +++ b/doc/bugs/git_diff_in_adj_unlock_reports_diff_for_empty_file/comment_1_cb16c170d49b628cca0c76d8843b9f52._comment @@ -0,0 +1,49 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-05-22T17:52:10Z" + content=""" +A simplified test case, which works on any filesystem, not only crippled +filesystems: + + #!/bin/sh + git init r + cd r + git annex init + git commit -m initial --allow-empty + git-annex adjust --unlock + touch emptyfile + git annex add emptyfile + git diff + +The adjusted branch is not even needed. `git-annex add emptyfile` +followed by `git-annex unlock emptyfile` has the same result. + +In this case, `git diff` is running the `git-annex smudge --clean` +filter every time. Which IIRC is a bug of some kind with git when +smudging empty files. + +I've verified that `git-annex smudge --clean` behaves correctly. +It outputs the same annex link that was already staged. So git diff is +choosing for whatever reason to ignore what it output, and using "" +as the content of the file instead. + +So, I think this is a git bug, which git-annex cannot work around. + +See also [[bugs/Empty_files_make_git_status_slow]] which is about +the repeated and unnecessary running of the smudge filter on empty files. +There I hypothesize that git treats 0 size in the index as an indication that it +doesn't know about the file, so generally mishandles empty files. + +And see also [[bugs/resolvemerge_fails_when_unlocked_empty_files_exist]] +where I identified a related git bug, where an empty unlocked file causes +git to crash with an internal error, and reported it to the git developers. +Unfortunately, nobody ever responded to my bug report. 
+ +Perhaps the thing to do is for git-annex to refuse to store an empty file +as an unlocked file. It could still use annex symlinks for locked empty files, +but unlocking would necessarily switch to an empty file stored in git +the usual way. Unfortunately, that would make reverse adjusting an unlocked +branch not know if the file was intended to be annexed or not. Also, it doesn't +help for any repositories that already contain unlocked empty files. +"""]]
Added a comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_6_2f226e2bb6fcb1040ba8e645603607a1._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_6_2f226e2bb6fcb1040ba8e645603607a1._comment new file mode 100644 index 0000000000..972b2b20e8 --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_6_2f226e2bb6fcb1040ba8e645603607a1._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 6" + date="2025-05-22T18:23:36Z" + content=""" +Somewhat unrelated and I feel like I might have even proposed smth like that -- wouldn't it be useful if git-annex did add its version and potentially filesystem detail (if cheaply known) within its commit message to `git-annex` branch? unless `annex forgotten` later (and forgetting could summarize all the versions and filesystems used to that point), could have been useful here, or not? + +FWIW, sent a few related questions on versions etc to the author of the commit which introduced that file. +"""]]
comment
diff --git a/doc/bugs/git-annex_add_behaves_differently_from_git_on_ACL/comment_1_5cff2db1646582f9e945bcf705b45c63._comment b/doc/bugs/git-annex_add_behaves_differently_from_git_on_ACL/comment_1_5cff2db1646582f9e945bcf705b45c63._comment new file mode 100644 index 0000000000..388f3f7eca --- /dev/null +++ b/doc/bugs/git-annex_add_behaves_differently_from_git_on_ACL/comment_1_5cff2db1646582f9e945bcf705b45c63._comment @@ -0,0 +1,46 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-05-22T17:07:32Z" + content=""" +> (datalad) [f006rq8@discovery-01 ds-perms]$ ls -l by-git-* +> -rwxrwx---+ 1 f006rq8 rc-DBIC 5 Oct 16 15:05 by-git-add +> -rwxrwx---+ 1 f006rq8 rc-DBIC 5 Oct 16 15:05 by-git-annex-add + +git-annex is seeing these files as executable for the same reason that `ls` +displays them as having `x` set. `stat()` is getting populated with values +based on the ACLs. + +I was able to reproduce that with `setfacl -m user::rwx-`, +run on a regular ext4 filesystem. Doing that to a file makes `ls` +display the owner x bit, as well as "+". + +But then, `git add` added the file as executable too. +So `git add` and `git-annex add` are behaving the same for me with ACLs. + + joey@darkstar:~/tmp/acl>touch foo + joey@darkstar:~/tmp/acl>touch bar + joey@darkstar:~/tmp/acl>setfacl -m user::rwx- foo + joey@darkstar:~/tmp/acl>setfacl -m user::rwx- bar + joey@darkstar:~/tmp/acl>git config 'annex.largefiles' 'nothing' + joey@darkstar:~/tmp/acl>git add foo + joey@darkstar:~/tmp/acl>git-annex add bar + joey@darkstar:~/tmp/acl>git diff --cached + diff --git a/foo b/foo + new file mode 100755 + index 0000000..e69de29 + diff --git a/bar b/bar + new file mode 100755 + index 0000000..e69de29 + +My guess is that something about your specific ACLs or your filesystem +is making git behave differently. Perhaps it's using a different variant +of the stat syscall which behaves differently than the stat git-annex does +in your specific situation somehow. 
+ +With the x acl set, and without the x bit manually set, I am able to actually +execute the files. So it seems to me, if git chose to add the file without +the execute bit set, that would be a bug in git? After all, if I have a build +system that relies on executing a file that I can execute, checking the file +into git and cloning should let me execute the file in the clone. +"""]]
comment
diff --git a/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_5_fb58126317c23c1710b3eb50102c3bd5._comment b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_5_fb58126317c23c1710b3eb50102c3bd5._comment new file mode 100644 index 0000000000..d1b9636e48 --- /dev/null +++ b/doc/bugs/keeps_trying_to_commit_file_unlocked/comment_5_fb58126317c23c1710b3eb50102c3bd5._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-05-22T16:59:42Z" + content=""" +So as far as I know this bug can only happen if something causes git to +lose the symlink bit. Which would be a git bug, or perhaps some misbehavior +on a filesystem like FAT? + +Since git-annex's behavior is to stage a change that fixes the file to be a +proper annex pointer file, a user who encounters whatever this is only has +to make a commit to get out of the weird situation. + +Unless we have a repeatable way for that to happen, that is not a git bug, +it's hard for me to justify making git-annex slow in order to deal with it +better. +"""]]