I use a backup service which mirrors the contents of directories to The Cloud™. One of the dirs I make it mirror is a directory special remote with encryption and chunking which I move data into.
Now I would like to make GA aware of the second copy that the mirroring service creates of that special remote.
I use BTRFS as the underlying FS, so my idea is to store the special remote inside a subvolume and then snapshot it before sending it off to mirror. (The same could probably be done with hardlinks, as special remotes are read- and unlink-only.) This way I've got a local directory which represents the state of the remote mirror once the service's tool is done uploading it.
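A minimal sketch of that snapshot step, assuming hypothetical paths (`/srv/annex-remote` for the subvolume and `/srv/annex-remote.mirror` for the snapshot). When to take the snapshot relative to the upload is left out here; the function is only defined, not called.

```shell
#!/bin/sh
# Hypothetical paths; adjust to the actual subvolume layout.
ANNEX_SUBVOL=/srv/annex-remote        # subvolume holding the directory special remote
MIRROR_SNAP=/srv/annex-remote.mirror  # read-only snapshot handed to the backup tool

refresh_mirror_snapshot() {
    # Drop the previous snapshot, if any, then take a fresh read-only one.
    if [ -d "$MIRROR_SNAP" ]; then
        btrfs subvolume delete "$MIRROR_SNAP" || return 1
    fi
    btrfs subvolume snapshot -r "$ANNEX_SUBVOL" "$MIRROR_SNAP"
}
```

Calling `refresh_mirror_snapshot` (typically as root) before starting the upload would leave `$MIRROR_SNAP` as the directory the mirroring tool reads from.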
My idea was to then let GA know about it in the form of a second directory special remote to give me the correct numcopies count.
The problem I have run into here is that there does not seem to be a way to get GA to "import" pre-existing state of a directory special remote. I just can't get it to recognise the existence of the keys inside it.
I could probably hack up something that could query the mirrored keys directly and maybe make a special remote out of that, but the same problem would apply here as it's still a special remote that changes outside of GA's control.
Is there a way of doing what I want that I may have overlooked? Is there a better way of making GA aware of this external copy perhaps?
Thanks, - Atemu
Hi, I tested it with encryption=shared and it works, but not with chunking. When creating the 2nd remote, you have to look up the cipher of the first one with git cat-file blob git-annex:remote.log and pass it to initremote with the cipher= parameter. Finally, after externally copying everything from the 1st remote to the 2nd remote, you have to run git annex fsck --fast --all --from=remote2 to make git-annex aware of the copies.
I think it should be possible to get this to work with chunking, if you have git-annex version 8.20201103 or newer, and if you configure the second special remote with the same chunk size.
git-annex records state about a special remote's chunks, and that state is not available for the second special remote. That used to prevent accessing chunks when the information was not available, but that version made it fall back to trying chunks of the configured chunk size.
See the bug report that resulted in that change for details: [...] information in git-annex does not try chunks
Oh also this only works with keys that have a recorded size. Which is most of them, but git-annex addurl --fast adds keys without a recorded size.
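Putting the steps from the replies above together, a rough sketch could look like the following. The remote names `remote1` and `remote2`, the path `/mnt/mirror-snapshot`, and the helper names `remote_param` and `setup_mirror_remote` are assumptions for illustration. It uses encryption=shared as in the tested reply, and chunk=50MiB only as an example value that has to match the first remote; per the reply above, chunking needs git-annex 8.20201103 or newer. Nothing is executed until `setup_mirror_remote` is called.

```shell
#!/bin/sh
# Assumed names: remote1 (existing remote), remote2 (the mirror copy),
# /mnt/mirror-snapshot (where the externally made copy lives).

# Read remote.log-style lines on stdin and print the value of field $2
# for every line whose name= field equals $1.
remote_param() {
    awk -v name="$1" -v key="$2" '
        {
            match_name = 0; val = ""
            for (i = 2; i <= NF; i++) {
                if ($i == "name=" name) match_name = 1
                if (index($i, key "=") == 1) val = substr($i, length(key) + 2)
            }
            if (match_name && val != "") print val
        }'
}

setup_mirror_remote() {
    # Look up the cipher of the first remote. If several lines match,
    # this simply takes the last one, which may not be the newest entry.
    cipher=$(git cat-file blob git-annex:remote.log \
        | remote_param remote1 cipher | tail -n 1)
    [ -n "$cipher" ] || { echo "no cipher found for remote1" >&2; return 1; }

    # Create the second remote with the same settings and the same cipher.
    git annex initremote remote2 type=directory \
        directory=/mnt/mirror-snapshot \
        encryption=shared chunk=50MiB cipher="$cipher" || return 1

    # ... copy the content of remote1 into place externally here ...

    # Make git-annex aware of the copies that now exist in remote2.
    git annex fsck --fast --all --from=remote2
}
```

The lookup is split out into `remote_param` so it can be checked on sample lines without touching a real repository.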
An alternative you might consider is to use the --sameas flag to initremote when setting up the second remote. Then git-annex would consider the two remotes as one repository, which means it only considers them to be one copy, but also it can retrieve content from either.
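As a sketch of that alternative, assuming an existing remote called `cloud`, a new name `cloud-snapshot` and the path `/mnt/mirror-snapshot` (all hypothetical). As far as I understand the initremote documentation, encryption settings are inherited with --sameas, while other settings such as the chunk size have to be repeated.

```shell
#!/bin/sh
# Hypothetical names: "cloud" is the existing remote,
# "cloud-snapshot" is the second way of reaching the same content.
init_sameas_remote() {
    # No encryption parameters here: with --sameas they are taken
    # from the existing remote. The chunk size is given again.
    git annex initremote cloud-snapshot --sameas=cloud \
        type=directory directory=/mnt/mirror-snapshot chunk=50MiB
}
```

After calling `init_sameas_remote`, both remotes would share one UUID, so they count as a single copy.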
If git-annex only had a way to treat a repository as more than 1 copy, that would do just what you want. I do think there might be the possibility to add such a feature, but it would need some thought. See: repositories that count as more than one copy
I used the exact same settings for the second special remote as the first one: type=directory chunk=50MiB encryption=hybrid mac=HMACSHA256.
GA was 8.20200810 though, because my server machine is built from the stable Nixpkgs channel; I will test that again with the most recent version tomorrow.
--sameas won't help here; the special remotes are accessible via the same FS (the second is just a btrfs snapshot of the first) and they'd still only count as one copy. That's the same situation I have right now.
Counting it as two copies would work, but there is a large delay between having moved the files to the special remote and them actually being mirrored (residential internet upload), which means the numcopies of somewhat newly added files wouldn't be correct. It'd be a step up though.
It worked! Thank you so much you two!
The cipher was indeed different for some reason, what could cause that?