Does anybody know if it's possible to rsync files into an annex folder without constantly overwriting already annexed files? I know that the -L option will dereference links on the source side, but I don't see any option for doing the same on the target side.
I have two machines -- alpha and beta. Alpha doesn't know anything about git annex, but knows how to rsync. I have a folder on alpha that I regularly sync over to beta with:
rsync -av folder/ beta:/.../annex/folder
That way I know that I can eventually safely delete any material in the folder on alpha because it's been backed up/archived to my annex. The problem is, every time I add the files into the annex on beta, they get replaced with symlinks, which wouldn't be a problem except for the fact that now the next time I run the rsync command all of the files get retransmitted because they don't match the symlinks over on beta. This doesn't cause a problem on the git annex side, but it significantly slows down the rsync process.
Is this a case for an rsync remote? (I haven't really figured out special remotes yet.) Or is there a typical workflow on the git annex side that I could be using to fix this (like import rather than add)?
Thanks!
Right after posting this last night I came across this forum entry, which led me to the tip on how to create a cache annex, which eventually led me to a little more detail on the annex.hardlink and annex.thin options.
It sounds like they may kind of be what I'm looking for, but I'm not sure how to find more details (doc) on how to use them or specifically what they do. They aren't listed in supported options when I run (on v6.2):
I googled git annex.hardlink and git annex.thin, but those just point me to bugs or forums or tips that refer to the settings.
Are these annex-wide settings? (That seems to be the case.) Is it possible to apply them at a folder level? Am I maybe just missing the point of lock/unlock?
I'll keep looking and run some experiments on my own.
Thanks again!
The options are listed in the man git-annex output (which is also available at https://git-annex.branchable.com/git-annex/).
As for the rsync special remote, I think conceptually that's a good fit. You could set importtree=yes with the special remote and ingest changes with git annex import on beta's side. However, the rsync special remote doesn't support importtree yet.
https://git-annex.branchable.com/todo/import_tree_from_rsync_special_remote/
In your followup comment, you mention unlocked files. That would get you around the link problem. You could call rsync with --checksum to limit what is transferred, though that might be expensive depending on how big your files are.
As for annex.hardlink and annex.thin, you can set them at the repository level in the repo's .git/config.
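To make the rsync suggestion concrete, here is a rough sketch of the command from the question with --checksum added. The function name sync_stage is made up for illustration, and the paths (including the elided "...") are copied from the original post as-is. It only helps once the files on beta are unlocked regular files rather than symlinks.

```shell
# Hypothetical wrapper, to be run on alpha. Paths are the ones from the
# question; the "..." is elided there and has to be filled in by hand.
sync_stage() {
    # -c/--checksum compares file contents instead of size and mtime,
    # so both sides read every file in full on each run.
    rsync -av --checksum folder/ beta:/.../annex/folder
}
```

Calling sync_stage after defining it runs the transfer. Without --checksum, rsync falls back to its default size and modification-time comparison.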
Oh boy -- or should I say, oh "man"... Now I feel like a bit of an idiot for not checking the actual high level man pages... Thanks for that tip.
I did some experimenting and noticed something I had missed -- although my binary version is v6.20180227, my annexes were all still using the v5 repository format. That came as a bit of a surprise. Once I upgraded my annex to v6 the annex.thin setting started working.
As for rsync, I had tried the -c (--checksum) option, but it wasn't dereferencing the links on the target side, so the files still registered as different (at least I think I tried this, but I may go back and check again, because I was doing a lot of different things...). Never mind, I just checked my history and I never actually tried it -- I had been using to try to confirm that the contents of the two folders matched so that I'd know I could safely delete my local copy, but that didn't work because of the links. I didn't add the -L option until I reversed the direction and ran the command from the server side with alpha as the target.
Thanks for all the help -- this should work well enough for my stage folder issue, but it also solves a separate problem that I'd been struggling with: making my photos available to a self-hosted photo webserver tool that I was trying out (photoprism). It can't currently handle symlinks and my local drive was getting filled up with all the extra copies of my photos directory tree!
Just to clarify: My comment was in the context of unlocked files (in v6+ repos). In that case, symlinks aren't used: the content is kept in the working tree (and a pointer file is tracked by git).
Also, since it sounds like you may want all files to be unlocked, you might want to look into git annex adjust --unlock to enter an adjusted branch with all files in an unlocked state.
FWIW, if you don't need importing for this use case, I think using git annex export with an rsync special remote configured with exporttree=yes would work well.
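A minimal sketch of both suggestions, assuming a remote named alpha-stage and a branch named master (both are placeholders, as are the function names). The rsync URL is passed in as an argument because the real path isn't given in this thread.

```shell
# Hypothetical helpers; run from inside the annex repository.

# Switch to an adjusted branch where every file is unlocked.
enter_unlocked_branch() {
    git annex adjust --unlock
}

# One-time setup of an rsync special remote that holds an exported tree.
# $1 is the rsync URL (host:/path); "alpha-stage" is an assumed name.
setup_export_remote() {
    git annex initremote alpha-stage type=rsync \
        rsyncurl="$1" encryption=none exporttree=yes
}

# Publish the current contents of the branch to that remote.
# "master" is an assumption; use whatever branch you track.
export_tree() {
    git annex export master --to alpha-stage
}
```

Note that export pushes a tree out from the annex to the remote, so it fits the case where the annex is the source of truth rather than the staging folder on alpha.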