I have two copies of a large collection of photos on two computers. I have been using rsync to keep them synchronized, but I would like to use git annex instead. I have finished initializing one repo, and now it's time to initialize the other repo before I can sync them up. But it seems redundant for the other repo to compute its own hashes when I know that the two copies are identical. Can't I just copy the keys and tell git annex to assume that any file it sees has already been hashed?
As a start, I cloned the .git from the first machine to the second, but that wasn't enough: I now have an empty git-annex repo on the second machine. What else is missing?
git init; git annex init; git annex add .; git commit
then add the remotes in both directions and "sync". I'd say that method, or any similar set of steps, is the typical way to handle this.
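For concreteness, a rough sketch of those steps on the second machine, assuming the photos live at /data/photos on both machines and they can reach each other over ssh (the hostnames, paths, and repository description are placeholders):

# On the second machine, in the existing copy of the photos:
cd /data/photos
git init
git annex init "machine-b"
git annex add .        # hashes every file and moves it into the annex
git commit -m "add photo collection"

# Point the repositories at each other (run the matching 'git remote add'
# on the first machine too), then let git-annex reconcile everything:
git remote add machine-a ssh://machine-a/data/photos
git annex sync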
Sure, everything gets hashed twice. This is unlikely to waste enough time to make it worthwhile to develop a hack that only hashes once.
If you really want to develop such a hack, the plumbing command that you can use to make it happen is git annex setkey. So, you'd add all the files to the first repository, and then use git-annex find --format='${key} ${file}\n' to list all the files and the keys that resulted from hashing them. Then, in the second repository, you'd use that list to run git annex setkey and force the files into the annex without hashing them.

This will probably turn out to be slower than just re-hashing the files would be, since you'll have to run git annex setkey once per file. Adding a --batch option that reads from stdin would probably be called for to get it fast enough to bother with, although passing -c annex.alwayscommit=false might speed it up enough.
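A rough sketch of what that could look like, assuming the second repository is the clone of the first machine's .git described in the question, that the identical files are present in its working tree, and that the key list travels in a file such as /tmp/keys.txt (the loop and the clean-up steps at the end are illustrative, not a tested recipe):

# On the first machine: list every annexed file with its key.
git annex find --format='${key} ${file}\n' > /tmp/keys.txt

# On the second machine: feed that list to setkey, one file at a time.
# annex.alwayscommit=false avoids a git-annex branch commit per file;
# keys contain no spaces, so 'read' can split the key from the filename.
while read -r key file; do
    git -c annex.alwayscommit=false annex setkey "$key" "$file"
done < /tmp/keys.txt

# setkey moves each file's content into .git/annex/objects; checking out
# the tree puts the annex symlinks back in place, and the next git-annex
# command that commits (e.g. git annex merge) flushes the journalled
# git-annex branch updates.
git checkout -- .
git annex merge

Even with the per-file commits suppressed, starting a git-annex process once per file carries enough overhead that plain re-hashing may still win.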
In my super-heavy use case, the second hashing of the files is dwarfed by the 45-minute wait for git to update .git/index, so I would agree with this.