Recent comments posted to this site:

Here are a few pointers for switching from git-annex-remote-rclone (old helper program) to rclone gitannex (rclone's builtin support):

  1. Figure out rcloneprefix (the directory relative to the rclone remote, "remote" in rclone's sense) and rclonelayout (the layout of the git-annex content in that directory). If you set things up just like in git-annex-remote-rclone's README, those are git-annex and lower.
  2. Update rclone and git-annex
  3. Rename the old remote: git remote rename my_rclone_remote my_rclone_remote.old; git annex renameremote my_rclone_remote my_rclone_remote.old
  4. Create a new remote, copying the encryption settings: git annex initremote my_rclone_remote --sameas=my_rclone_remote.old type=rclone rcloneremotename=my_rclone_remote rcloneprefix=git-annex rclonelayout=lower

It might be possible to just change the type of the remote, but at the time of writing that did not work for me. So I renamed the old remote and created a new one, using --sameas so as not to lose any encryption settings.
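
For reference, steps 3 and 4 above can be sketched as a small dry-run script. It only prints the commands so they can be reviewed before running anything. The function name is made up for illustration, and it assumes (as in the example above) that the rclone remote has the same name as the git-annex remote.

```shell
#!/bin/sh
# Dry run: print the commands from steps 3 and 4 instead of executing them.
# Arguments: remote name, rcloneprefix, rclonelayout.
# Assumption: the rclone remote is named like the git-annex remote.
print_rclone_migration() {
    name=$1
    prefix=$2
    layout=$3
    old="$name.old"
    echo "git remote rename $name $old"
    echo "git annex renameremote $name $old"
    echo "git annex initremote $name --sameas=$old type=rclone rcloneremotename=$name rcloneprefix=$prefix rclonelayout=$layout"
}

# Values from git-annex-remote-rclone's README setup, as mentioned in step 1.
print_rclone_migration my_rclone_remote git-annex lower
```

Once the printed commands look right for your setup, they can be run by hand one at a time.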

Comment by mike Thu Sep 12 15:40:24 2024
Funny, playing around with my own forgejo-aneksajo instance, I thought about exactly that 😀 Being able to encrypt only the annex but keeping the repo open would be cool.
Comment by nobodyinperson Thu Sep 12 14:51:20 2024

When trying to change a remote to the new rclone special remote (from type=external externaltype=rclone), I encountered this:

$ git annex enableremote halde-pcloud type=rclone 
enableremote halde-pcloud 
git-annex: getRemoteConfigValue externaltype found value of unexpected type PassedThrough. This is a bug in git-annex!
CallStack (from HasCallStack):
  error, called at ./Annex/SpecialRemote/Config.hs:192:28 in main:Annex.SpecialRemote.Config
  getRemoteConfigValue, called at ./Remote/External.hs:920:35 in main:Remote.External
failed
enableremote: 1 failed

(The reason I tried it this way is that I didn't want to lose the encrypted files (encryption=shared))

Comment by mike Thu Sep 12 05:22:18 2024

But maybe it is actually a separate issue with unlocked mode, since it does drop the file:

reprostim@reproiner:/data/reprostim$ find .git/annex -iname *377.mkv
.git/annex/objects/Qp/XF/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
.git/annex/objects/Qp/XF/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
reprostim@reproiner:/data/reprostim$ git annex drop Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
drop Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv (locking rolando...) ok
(recording state in git...)
reprostim@reproiner:/data/reprostim$ find .git/annex -iname *377.mkv
reprostim@reproiner:/data/reprostim$ cat Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
/annex/objects/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv

but then when I get it, it does not actually copy into the tree:

reprostim@reproiner:/data/reprostim$ git annex get --json Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
{"command":"get","error-messages":[],"file":"Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv","input":["Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv"],"key":"MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv","note":"from rolando...","success":true}
reprostim@reproiner:/data/reprostim$ cat Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
/annex/objects/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
reprostim@reproiner:/data/reprostim$ find .git/annex -iname *377.mkv
.git/annex/objects/Qp/XF/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
.git/annex/objects/Qp/XF/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv
reprostim@reproiner:/data/reprostim$ cat .git/config
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[annex]
    uuid = 9806a90e-4cdd-48cb-b03d-7a113663fce7
    version = 10
    addunlocked = false
[filter "annex"]
    smudge = git-annex smudge -- %f
    clean = git-annex smudge --clean -- %f
    process = git-annex filter-process
[remote "rolando"]
    url = bids@rolando.cns.dartmouth.edu:VIDS/
    fetch = +refs/heads/*:refs/remotes/rolando/*
    annex-uuid = 285d851e-77a8-4d31-b24c-fa72deb4d3cc
[branch "master"]
    remote = rolando
    merge = refs/heads/master

reprostim@reproiner:/data/reprostim$ git annex version
git-annex version: 10.20240831-1~ndall+1
Comment by yarikoptic Thu Sep 5 14:52:51 2024

I got back to this issue, since it persists even after upgrading git-annex to 10.20240831-1~ndall+1. Trying again on a sample file which I guess was screwed up:

reprostim@reproiner:/data/reprostim$ git annex get --json Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv

reprostim@reproiner:/data/reprostim$ git annex find --in here
Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv

reprostim@reproiner:/data/reprostim$ ls -lL Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
-rw-r--r-- 2 reprostim reprostim 72 Sep  5 10:42 Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv

reprostim@reproiner:/data/reprostim$ cat Videos/2024/08/2024.08.30-11.31.56.000--2024.08.30-11.48.03.377.mkv
/annex/objects/MD5E-s20610854--4fa8311cf5fc0ea247dca2b0ae556bab.377.mkv

so, I need to figure out how to actually get that key/file here.
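
The 72-byte size reported by ls above matches the pointer line itself (the /annex/objects/ prefix, the key, and a newline), so the work tree file is still a pointer even though the object is present under .git/annex/objects. A minimal sketch for spotting such files, assuming only that a pointer file starts with /annex/objects/ as in the output above; the function name is hypothetical:

```shell
#!/bin/sh
# Succeed if the file starts with the git-annex pointer prefix
# "/annex/objects/" (15 bytes), i.e. the content is not in the work tree.
# NUL bytes are stripped so binary files do not trigger shell warnings;
# a file with a NUL in its first 15 bytes then simply cannot match.
is_annex_pointer() {
    prefix=$(head -c 15 "$1" 2>/dev/null | tr -d '\000')
    [ "$prefix" = "/annex/objects/" ]
}

# Report every file given on the command line that is still a pointer.
for f in "$@"; do
    if is_annex_pointer "$f"; then
        echo "$f: still a pointer file"
    fi
done
```

This only inspects the work tree file; it does not say why the content was not populated.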

Comment by yarikoptic Thu Sep 5 14:49:06 2024

The reason for --reflink=always is that git-annex wants cp to fail when reflinks are not supported and the copy is going to be slow. Then it falls back to copying the file itself, which allows an interrupted copy of a large file to be resumed, rather than restarted from the beginning as cp would do when it's not making a reflink.

So, at first it seemed to me that the solution will need to involve git-annex using copy_file_range itself.

But, git-annex would like to checksum the file as it's copying it (unless annex.verify is not set), in order to avoid needing to re-read it to hash it after the fact, which would double the disk IO in many cases. Using copy_file_range by default would prevent git-annex from doing that.

So it needs to either be probed, or be a config setting. And whichever way git-annex determines it, it may as well use cp --reflink=auto then rather than using copy_file_range itself.

I'd certainly rather avoid a config setting if I can. But if this is specific to NFS on ZFS, I don't know what would be a good way to probe for that? Or is this happening on NFS when not on ZFS as well?
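
To illustrate the trade-off described above, here is a rough shell sketch of the "reflink or fall back" idea. It is not git-annex's actual code (which is Haskell and also handles resuming); the function name is invented, and cksum merely stands in for whatever hash the key's backend uses.

```shell
#!/bin/sh
# Try a reflink-only copy first. If the filesystem (or this cp) cannot
# reflink, fall back to a plain copy that checksums the data as it is
# written, so the source is read only once.
copy_reflink_or_checksum() {
    src=$1
    dst=$2
    if cp --reflink=always "$src" "$dst" 2>/dev/null; then
        echo "reflinked"
    else
        # tee writes the destination while cksum consumes the same stream.
        sum=$(tee "$dst" < "$src" | cksum)
        echo "copied, cksum: $sum"
    fi
}

# Usage (paths are placeholders):
#   copy_reflink_or_checksum /path/to/source /path/to/destination
```

With --reflink=auto or copy_file_range, the copy happens inside cp or the kernel, so there is no stream for the checksum to hook into, which is the cost being weighed here.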

Comment by joey Thu Sep 5 12:56:27 2024
PS
If I am understanding the documentation of the borg special remote, then having something like appendonly=yes for the special directory remote would likely help in my scenario.
Comment by tapesafer Wed Sep 4 15:48:01 2024

I have old readonly backup media, say something like

  • tapeA1/apples.txt
  • tapeA2/apples.txt
  • tapeB1/earth.svg
  • tapeB2/earth.svg

I use git-annex special directory remotes to be able to navigate the directory tree that lives on those media (e.g. to decide whether I need to dig out a medium, and which one, to copy a file I need). I added the remotes like so (they are too big to import with content):

git annex initremote tapeA1 type=directory directory=/tapes/tapeA1 encryption=none importtree=yes
git annex import master:tapeA1 --from tapeA1 --no-content
git annex merge --allow-unrelated-histories tapeA1/main

At some point I may buy new hardware and recreate those backup media as proper git-annex remotes, but wouldn't it be great to keep the existing backups as long as they show no sign of bitrot and together hold enough copies?

However, git-annex fsck behaves unexpectedly: it seems I cannot force-trust these remotes, nor does --numcopies=0 --mincopies=0 have the desired effect.

Concretely, when calling git annex fsck --from=tapeA1 --numcopies=0 --mincopies=0 --trust=tapeA1 --force, for every file that is still intact on tapeA1, git-annex fsck reports a failure as follows:

fsck tapeA1/apples.txt
  Only these untrusted locations may have copies of tapeA1/apples.txt
        abc-def-ghi -- [tapeA1]
  Back it up to trusted locations with git-annex copy.
failed

while I'd be happy to (semi)trust tapeA1 or to accept no copies whatsoever. So fsck ignores --trust=tapeA1 --force and/or --numcopies=0 --mincopies=0 which are common git-annex options that should work for fsck?

Ideally, I would be able to (semi)trust my readonly tape remotes (which likely should be behind a --force as it may lead to data loss in classical directory remote settings). Then I can use git-annex to index those tapes, but also to monitor their health via fsck (so I can over the years replace the tapes that are showing signs of corruption).

As for the corruption, I emulated bitrot on a test directory remote, which then leads to a fsck failure as follows:

fsck tapeB2/earth.svg
  verification of content failed
(checksum...) 
  tapeB2/earth.svg: Bad file content; failed to drop fromtapeB2: dropping content from this remote is not supported because it is configured with importtree=yes

This suffices to detect tapes that should be replaced, and it's kinda expected that files cannot be dropped.

Somehow fsck does not work as I would expect -- am I misunderstanding the numcopies/mincopies arguments here? Is there really no way to force-trust a directory remote, which to me seems appropriate in this case? Is there another way to achieve what I have in mind with git-annex?

Thanks for this great piece of software – I also use the assistant in another day-to-day use case and it's simply great!

Comment by tapesafer Wed Sep 4 14:50:16 2024

I'm also getting list borg failed when I run git annex sync borg. In my case, syncing succeeds after creating the first borg archive but fails when the borg repo contains a second archive.

I'm running:

  • git-annex 10.20240731
  • borg 1.4.0
  • NixOS 24.11.20240821.c374d94 (Vicuna)

To reproduce this problem:

borg init --encryption=keyfile /path/to/borgrepo
git annex initremote borg type=borg borgrepo=/path/to/borgrepo
borg create /path/to/borgrepo::archive1 `pwd`
git annex sync borg
git annex add newfile
borg create /path/to/borgrepo::archive2 `pwd`
git annex sync borg

In the debug output from the first run of git-annex sync, this is the only ExitFailure line:

[2024-08-28 19:13:31.056388087] (Utility.Process) process [79595] done ExitFailure 1
ok

And the first appearance of process 79595:

[2024-08-28 19:13:31.011783181] (Utility.Process) process [79595] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit","-a","-m","git-annex in user@nixos:~/sandbox/gr"]

Only once, after running the command a second time, I got the following additional lines:

[2024-08-28 19:48:41.942245332] (Utility.Process) process [122585] read: borg ["list","--format","{size}{NUL}{path}{NUL}{extra}{NUL}","/home/user/sandbox/br::archive2",""]
...
borg list: error: argument PATH: Empty strings are not accepted as paths.
[2024-08-28 19:48:42.296294751] (Utility.Process) process [122585] done ExitFailure 2

I have set LANG=C and git annex enableremote borg subdir= as suggested in this thread to no avail.
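
The debug line above shows borg list being called with a trailing "" argument, which is what borg rejects. As a side note on why that happens in general: an empty string is still passed as an argument. The sketch below is plain shell, not git-annex's code, and the helper name and archive string are placeholders; whether this is the cause of the sync failure is not established here.

```shell
#!/bin/sh
# Print how many arguments a command would receive.
count_args() { echo "$#"; }

subdir=""   # what an empty subdir= setting amounts to

# A quoted empty variable is still passed as one (empty) argument.
count_args list "repo::archive2" "$subdir"             # prints 3

# ${var:+...} expands to nothing at all when var is empty or unset.
count_args list "repo::archive2" ${subdir:+"$subdir"}  # prints 2
```

So a caller that wants "no path" has to leave the argument out entirely rather than pass an empty one.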

Thanks in advance for your help! I have used and loved git-annex for years and am very thankful for the work Joey and others have put into it. I'm planning to buy a git-annex backpack soon.

Comment by Rick Tue Sep 3 19:40:57 2024

Fixed that.

It kind of seems like metadata could have an option to get the metadata for a specific ref as well, but since it already has --branch which takes a branch ref, adding a --ref which takes a file ref seems confusing. Maybe --fileref? There are a decent number of other commands that also use parseKeyOptions to support --branch/--key/--all that would also get the new option if it were implemented.

Comment by joey Fri Aug 30 14:47:41 2024