Recent comments posted to this site:
When trying to change a remote to the new rclone special remote (from type=external externaltype=rclone), I encountered this:
$ git annex enableremote halde-pcloud type=rclone
enableremote halde-pcloud
git-annex: getRemoteConfigValue externaltype found value of unexpected type PassedThrough. This is a bug in git-annex!
CallStack (from HasCallStack):
error, called at ./Annex/SpecialRemote/Config.hs:192:28 in main:Annex.SpecialRemote.Config
getRemoteConfigValue, called at ./Remote/External.hs:920:35 in main:Remote.External
failed
enableremote: 1 failed
(The reason I tried it this way is that I didn't want to lose the encrypted files (encryption=shared).)
but maybe it is actually a separate issue with unlocked mode, since it does drop the file:
reprostim@reproiner:/data/reprostim$ find .git/annex -iname *377.mkv
.git/annex/objects/Qp/XF/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
.git/annex/objects/Qp/XF/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
reprostim@reproiner:/data/reprostim$ git annex drop Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
drop Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv (locking rolando...) ok
(recording state in git...)
reprostim@reproiner:/data/reprostim$ find .git/annex -iname *377.mkv
reprostim@reproiner:/data/reprostim$ cat Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
/annex/objects/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
but then when I get it, it does not actually copy into the tree:
reprostim@reproiner:/data/reprostim$ git annex get --json Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
{"command":"get","error-messages":[],"file":"Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv","input":["Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv"],"key":"MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv","note":"from rolando...","success":true}
reprostim@reproiner:/data/reprostim$ cat Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
/annex/objects/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
reprostim@reproiner:/data/reprostim$ find .git/annex -iname *377.mkv
.git/annex/objects/Qp/XF/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
.git/annex/objects/Qp/XF/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
reprostim@reproiner:/data/reprostim$ cat .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[annex]
uuid = 9806a90e-4cdd-48cb-b03d-7a113663fce7
version = 10
addunlocked = false
[filter "annex"]
smudge = git-annex smudge -- %f
clean = git-annex smudge --clean -- %f
process = git-annex filter-process
[remote "rolando"]
url = bids@rolando.cns.dartmouth.edu:VIDS/
fetch = +refs/heads/*:refs/remotes/rolando/*
annex-uuid = 285d851e-77a8-4d31-b24c-fa72deb4d3cc
[branch "master"]
remote = rolando
merge = refs/heads/master
reprostim@reproiner:/data/reprostim$ git annex version
git-annex version: 10.20240831-1~ndall+1
I got back to this issue, since it persists even after upgrading git-annex to 10.20240831-1~ndall+1 and trying on a sample file which I guess was screwed up:
reprostim@reproiner:/data/reprostim$ git annex get --json Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
reprostim@reproiner:/data/reprostim$ git annex find --in here
Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
reprostim@reproiner:/data/reprostim$ ls -lL Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
-rw-r--r-- 2 reprostim reprostim 72 Sep 5 10:42 Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
reprostim@reproiner:/data/reprostim$ cat Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
/annex/objects/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
so, I need to figure out how to actually get that key/file here.
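One way to check which state a worktree file is in is to look at its first bytes. This is a minimal sketch, assuming the pointer format seen in the cat output above (a file whose content starts with /annex/objects/). The name is_annex_pointer is a hypothetical helper, not a git-annex command.

```shell
#!/bin/sh
# is_annex_pointer FILE
# Returns 0 if FILE looks like an unpopulated git-annex pointer file.
# Assumption: a pointer file starts with the literal prefix "/annex/objects/",
# as in the cat output shown in this thread.
is_annex_pointer() {
    file=$1
    [ -f "$file" ] || return 1
    # Read only the first 15 bytes (the length of the prefix); strip NUL
    # bytes so binary content does not upset the command substitution.
    prefix=$(head -c 15 "$file" | tr -d '\000')
    [ "$prefix" = "/annex/objects/" ]
}

# Example use (path is whatever file you want to inspect):
#   is_annex_pointer "$path" && echo "still a pointer, content not populated"
```

Run against the .mkv file above, a check like this would report it as a pointer both before and after the get, which matches the 72-byte size shown by ls -lL.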
The reason for cp --reflink=always is that git-annex wants cp to fail when reflinks are not supported and the copy is going to be slow. git-annex then falls back to copying the file itself, which allows an interrupted copy of a large file to be resumed, rather than restarted from the beginning as cp would do when it's not making a reflink.
So, at first it seemed to me that the solution will need to involve git-annex using copy_file_range itself.
But, git-annex would like to checksum the file as it's copying it (unless annex.verify is not set), in order to avoid needing to re-read it to hash it after the fact, which would double the disk IO in many cases. Using copy_file_range by default would prevent git-annex from doing that.
So it needs to either be probed, or be a config setting. And whichever way git-annex determines it, it may as well use cp --reflink=auto then, rather than using copy_file_range itself.
I'd certainly rather avoid a config setting if I can. But if this is specific to NFS on ZFS, I don't know what would be a good way to probe for that? Or is this happening on NFS when not on ZFS as well?
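To illustrate the trade-off described above, here is a rough shell sketch of hashing a file while copying it, so the source is read only once. It is not how git-annex implements this; copy_and_hash is a hypothetical name, and SHA-256 via sha256sum is just an assumed stand-in for whatever hash the key uses.

```shell
#!/bin/sh
# copy_and_hash SRC DST
# Copies SRC to DST and prints the SHA-256 hex digest of the copied bytes.
# tee writes the stream to DST while passing the same bytes on to sha256sum,
# so the data is read from SRC a single time.
copy_and_hash() {
    src=$1
    dst=$2
    tee "$dst" < "$src" | sha256sum | cut -d ' ' -f 1
}

# Example use:
#   digest=$(copy_and_hash big.mkv /mnt/target/big.mkv)
```

A kernel-side copy (copy_file_range or a reflink) never passes the bytes through the copying process, which is why the hash would then need a separate read of the file.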
appendonly=yes for the special directory remote would likely help in my scenario.
I have old readonly backup media, say something like
tapeA1/apples.txt
tapeA2/apples.txt
tapeB1/earth.svg
tapeB2/earth.svg
I use git-annex special directory remotes to be able to navigate the directory tree that lives on those media (e.g. to decide whether, and from which media, I need to copy a file). I added the remotes like so (they are too big to import with content):
git annex initremote tapeA1 type=directory directory=/tapes/tapeA1 encryption=none importtree=yes
git annex import master:tapeA1 --from tapeA1 --no-content
git annex merge --allow-unrelated-histories tapeA1/main
At some point I may buy new hardware and recreate those backup media as proper git-annex remotes, but wouldn't it be great to keep the existing backups as long as they show no sign of bitrot and together hold enough copies?
However, git-annex fsck behaves unexpectedly: it seems I cannot force-trust these remotes, nor does --numcopies=0 --mincopies=0 have the desired effect.
Concretely, when calling git annex fsck --from=tapeA1 --numcopies=0 --mincopies=0 --trust=tapeA1 --force, for every file that is still intact on tapeA1, git-annex fsck reports a failure as follows:
fsck tapeA1/apples.txt
Only these untrusted locations may have copies of tapeA1/apples.txt
abc-def-ghi -- [tapeA1]
Back it up to trusted locations with git-annex copy.
failed
while I'd be happy to (semi)trust tapeA1 or to accept no copies whatsoever. So it seems fsck ignores --trust=tapeA1 --force and/or --numcopies=0 --mincopies=0, which are common git-annex options that should work for fsck?
Ideally, I would be able to (semi)trust my readonly tape remotes (which likely should be behind a --force as it may lead to data loss in classical directory remote settings). Then I can use git-annex to index those tapes, but also to monitor their health via fsck (so that over the years I can replace the tapes that are showing signs of corruption).
As for the corruption, I emulated bitrot on a test directory remote, which then leads to a fsck failure as follows:
fsck tapeB2/earth.svg
verification of content failed
(checksum...)
tapeB2/earth.svg: Bad file content; failed to drop fromtapeB2: dropping content from this remote is not supported because it is configured with importtree=yes
This suffices to detect tapes that should be replaced, and it's kinda expected that files cannot be dropped.
Somehow fsck does not work as I would expect -- am I misunderstanding the numcopies/mincopies arguments here? Is there really no way to force-trust a directory remote, which to me seems appropriate in this case? Is there another way to achieve what I have in mind with git-annex?
Thanks for this great piece of software. I also use the assistant in another day-to-day use case and it's simply great!
I'm also getting list borg failed when I run git annex sync borg. In my case, syncing succeeds after creating the first borg archive but fails when the borg repo contains a second archive.
I'm running:
- git-annex 10.20240731
- borg 1.4.0
- NixOS 24.11.20240821.c374d94 (Vicuna)
To reproduce this problem:
borg init --encryption=keyfile /path/to/borgrepo
git annex initremote borg type=borg borgrepo=/path/to/borgrepo
borg create /path/to/borgrepo::archive1 `pwd`
git annex sync borg
git annex add newfile
borg create /path/to/borgrepo::archive2 `pwd`
git annex sync borg
From the debug output of the first git-annex sync run, the only ExitFailure line is:
[2024-08-28 19:13:31.056388087] (Utility.Process) process [79595] done ExitFailure 1
ok
And the first appearance of process 79595:
[2024-08-28 19:13:31.011783181] (utility.process) process [79595] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit","-a","-m","git-annex in user@nixos:~/sandbox/gr"]
Only once, after running the command a second time, I got the following additional lines:
[2024-08-28 19:48:41.942245332] (Utility.Process) process [122585] read: borg ["list","--format","{size}{NUL}{path}{NUL}{extra}{NUL}","/home/user/sandbox/br::archive2",""]
...
borg list: error: argument PATH: Empty strings are not accepted as paths.
[2024-08-28 19:48:42.296294751] (Utility.Process) process [122585] done ExitFailure 2
I have set LANG=C and run git annex enableremote borg subdir= as suggested in this thread, to no avail.
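The debug output above shows borg list being called with an empty string as its final PATH argument, which is what borg rejects. Purely as an illustration of the kind of guard that avoids this (this is not git-annex's code, and build_list_args is a hypothetical helper), a wrapper could append the path only when it is non-empty:

```shell
#!/bin/sh
# build_list_args ARCHIVE [SUBDIR]
# Prints, one per line, the arguments for a `borg list` call like the one in
# the debug output. Hypothetical helper for illustration only.
# The trailing PATH argument is added only when SUBDIR is non-empty, so an
# empty subdir never turns into an empty-string PATH.
build_list_args() {
    archive=$1
    subdir=${2:-}
    printf '%s\n' list --format '{size}{NUL}{path}{NUL}{extra}{NUL}' "$archive"
    if [ -n "$subdir" ]; then
        printf '%s\n' "$subdir"
    fi
}

# Example use:
#   build_list_args /home/user/sandbox/br::archive2 ""
```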
Thanks in advance for your help! I have used and loved git-annex for years and am very thankful for the work Joey and others have put into it. I'm planning to buy a git-annex backpack soon.
Fixed that.
It kind of seems like metadata could have an option to get the metadata for a specific ref as well, but since it already has --branch which takes a branch ref, adding a --ref which takes a file ref seems confusing. Maybe --fileref? There are a decent number of other commands that also use parseKeyOptions to support --branch/--key/--all that would also get the new option if it were implemented.
Here are a few pointers for switching from git-annex-remote-rclone (old helper program) to rclone gitannex (rclone's builtin support):
- rcloneprefix (directory relative to the rclone remote (rclone term here)) and rclonelayout (layout of the git-annex content therein). If you set it up just like in git-annex-remote-rclone's README, those are git-annex and lower.
- git remote rename my_rclone_remote my_rclone_remote.old; git annex renameremote my_rclone_remote my_rclone_remote.old
- git annex initremote my_rclone_remote --sameas=my_rclone_remote.old type=rclone rcloneremotename=my_rclone_remote rcloneprefix=git-annex rclonelayout=lower
It might be possible to just change the type of the remote, but at the time I'm writing this that didn't work, so I renamed the old remote and created a new one, with --sameas to not lose any encryption settings.