I thought I was getting the hang of annex, but have run into a bit of a problem. I could use some help ensuring that everything ends up in the right place.
Specifically, it seems like sync does a lot more than push and pull files: it might actually try to drop things from remotes (at least with the -a option).
I have two machines I work on, and should only have active content on them. I have two special remotes (S3/wasabi) that should have everything that's ever been annexed, including old versions of files.
If I run git annex sync -a -A then it will pull all versions locally as well. So I think I may have to separate the get and copy commands?
Here's what I'm doing so far:
git annex config --set annex.synccontent false
git annex config --set annex.synconlyannex true
git annex config --set annex.autocommit false
git annex group wasabi-east wasabi
git annex group wasabi-west wasabi
git annex groupwanted wasabi anything
git annex required wasabi-east groupwanted
git annex required wasabi-west groupwanted
git annex group machine1 active
git annex group machine2 active
git annex groupwanted active anything
# from machine1
git annex sync -a origin machine2 wasabi-east wasabi-west
git annex get -a
# list the remotes as separate words so the loop runs once per remote
for remote in wasabi-east wasabi-west
do
    git annex copy -A --not --in "$remote" -t "$remote"
done
I think that's what I need to do? I don't think I can use git annex sync -a -A wasabi-east wasabi-west because I don't want to pull old versions to my local machine.
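A note on loops like the one above: the shell treats a quoted list as a single word, so each remote has to be written as its own word for the body to run once per remote. A minimal sketch that only counts iterations (count_iterations is a hypothetical helper for illustration, and no git annex command is run):

```shell
# Count how many times a for-loop body runs for a given word list.
count_iterations() {
    n=0
    for remote in "$@"; do
        n=$((n + 1))
    done
    echo "$n"
}

# One quoted string is a single word, so the body would run once,
# with remote set to "wasabi-east wasabi-west".
quoted=$(count_iterations "wasabi-east wasabi-west")

# Two separate words give one iteration per remote.
separate=$(count_iterations wasabi-east wasabi-west)

echo "quoted: $quoted, separate: $separate"
```

With the quoted form, git annex copy would be handed the whole string as one remote name, which does not match either remote.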
The "unused" preferred content expression is probably what you're looking for.
As for your second problem, add your client repositories to a group as well and make them only want "present", "approxlackingcopies=1", or something along those lines. That will stop sync --content from trying to pull down everything.
sync --content will certainly remove files from a repository when the preferred content settings for that repository indicate it should not contain that content.
When you use sync --all, a preferred content setting like "include=*" or "exclude=*" will only ever match files in the current working tree, not past versions of files.
So, if the remote has such a preferred content expression, sync --all --content will remove the past versions of files from it.
The way to avoid this behavior is to use a preferred content expression that does not match on the filename. Eg, "anything". Or don't set a preferred content expression in the first place.
Hrm… as you can see in my post, I AM using “anything” as the wanted content. So I would expect all of the remotes (wasabi and machines) to get all of the file versions. But that’s not happening. It’s behaving more like “used” would.
I will try “anything or unused” despite the fact that it seems like “or unused” should be unnecessary.
I think I've set up what you wanted. And I think en route I started to understand the problem you were having.
Both of the wasabis want all content whether it's unused or not. So I left their preferred and required content settings unchanged since that's the default. ("anything" would have the same effect). To make the local repository want to not hang onto unused content I used:
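The command itself is missing from this copy of the thread. As a sketch only, one setting that would have the described effect is a preferred content expression that excludes unused keys; the exact expression used in the original reply is an assumption here:

```shell
# Assumption: run inside the local work repository ("." means this repository).
# The repository then stops wanting keys that git annex unused has found
# to be unused, so sync --content --all may drop them locally.
git annex wanted . "not unused"
```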
With that, git-annex sync --content --all wasabi-east would copy an unused key to wasabi-east. But then it would drop it from the local repository. So a subsequent sync with wasabi-west was not able to send a copy to there, because it's already been removed from the local repo. I think perhaps that is the problem you were having?
A workaround is to sync with both at once, like you have been doing:
But if you forget to sync with both at the same time, or if one of them is unreachable, you can end up with only one of them having a copy. Not great.
To avoid that problem, I set numcopies to force there to be 2 copies at all times:
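The command is not shown in this copy of the thread. The global numcopies setting is normally changed with git annex numcopies, so presumably it was something like:

```shell
# Require at least 2 known copies of each key before git-annex
# will let any repository drop its copy.
git annex numcopies 2
```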
Now syncing with one of the wasabi remotes keeps the unused content locally present:
Once the content reaches the other wasabi remote too, it can drop the local copy:
But you also have two repositories that you work in. That complicates things a bit. Let's bring repository bar into the picture:
Now, when there's a file that is on foo and bar and wasabi-east, syncing with wasabi-west will copy it to there. But what if the file gets deleted, and we sync with wasabi-east before syncing with wasabi-west?
It dropped it because there are enough copies that it could. Now a sync with wasabi-west from bar won't send the content to it, since bar no longer has it. Since foo still has a copy, syncing with wasabi-west on foo will move it to there still. But this is perhaps suboptimal.
A better configuration that avoids that problem, but is more complicated, follows:
This forces the work repositories to hang onto unused keys until they reach all the collector repositories.
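The configuration itself did not survive in this copy of the thread. A hedged sketch of one way to get the behavior just described, assuming the repository names from the question (machine1, machine2, wasabi-east, wasabi-west) and its existing wasabi group standing in for the collector repositories; the expression is a guess at the idea, not the original reply's settings:

```shell
# Assumption: names and the "wasabi" group come from the question's setup;
# the group commands are repeated so the sketch stands alone.
git annex group wasabi-east wasabi
git annex group wasabi-west wasabi

# A work repository keeps wanting a key while it is still in use, or while
# some repository in the wasabi group is not yet known to have it.
git annex wanted machine1 "not unused or not inallgroup=wasabi"
git annex wanted machine2 "not unused or not inallgroup=wasabi"
```

Under this expression, an unused key only becomes droppable from a work repository once git-annex believes both wasabi remotes hold it.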
Thanks for looking into this, and explaining. Your final configuration makes sense to me... I can't say I fully understand why, but I need -a, otherwise the local repo will get all versions, leaving me with a bunch of unused keys. So my command is git annex sync -a -A wasabi-east wasabi-west. I took the other machine out of the sync because sometimes it's offline and I don't want to wait around for an SSH timeout.
I have tested that if I force drop a key from wasabi-east, that sync command will get the key from wasabi-west, and then copy it to wasabi-east. It doesn't automatically drop it locally, I have to do git annex drop --unused - but that's not a big deal.
I would just like to be confident that the wasabi remotes are a lockbox for any keys added to my annex. So... I think it works? I'll keep using it and report back if I run into any weirdness.