I have a remote repository that I have yet to add as a remote into git-annex, but which already contains data that I want git-annex to manage. I already know what the SHA256 hashes and sizes of all the files are, and I can arrange them to match what git-annex will expect.
Is there a way that I can tell git-annex about the presence of the data, to save me having to download and re-upload everything, in a way that is safe? What I want seems to be similar to "git annex reinject" but for special remotes (and I'll take care of the renaming), but I don't see anything in the manpage that looks likely.
I can quite easily create and commit the symlinks with correctly predicted names in my master branch. Will git-annex treat these correctly?
If you also have the files present locally, you can simply do

    git annex copy --fast --to remote

git-annex copy will first check to see if the remote has the file; seeing that it does, it will update the location log.

Another option, if you have shell access on the remote, is to simply set up a git repository there, move the files into it and git annex add them, and merge that into your local repository.

There is not currently any way to set the location tracking information to tell git-annex that a file has appeared on a remote. Of course, you can modify the git-annex branch manually to do so. See internals.
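To give an idea of what a manual edit involves, here is a minimal Python sketch of the two things a location log entry needs: the path of the log file on the git-annex branch, and the line to append to it. It assumes the layout described in internals (a two-level directory taken from the md5 of the key, and lines of the form "timestamp 1 UUID"), and it assumes a key with no characters that need escaping in file names, which holds for plain SHA256 keys. The function names are made up for illustration.

```python
import hashlib
import time


def location_log_path(key: str) -> str:
    """Path of the location log for `key` on the git-annex branch.

    Assumes the lower-case hash directory layout: the first three and the
    next three hex characters of the md5 of the key.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:3]}/{digest[3:6]}/{key}.log"


def location_log_line(uuid: str, present: bool = True, timestamp=None) -> str:
    """One location log line: timestamp, presence flag (1 or 0), repository UUID."""
    if timestamp is None:
        timestamp = time.time()
    return f"{timestamp:.6f}s {1 if present else 0} {uuid}"
```

The computed directory can be cross-checked against git-annex itself with git annex examinekey and its hashdirlower format variable before anything is committed to the git-annex branch.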
Thanks to Joey for pointing me in the right direction. I got this working now.
There are approximately three steps:
1. Obtain a mapping of git-annex key to friendly name, and rename all entries in the special remote to their git-annex keys.
2. Create and commit symlinks in the master branch (or wherever you want them).
3. Add location tracking entries to the git-annex branch for all entries.

First, I created an "index" file describing the contents of my special remote, in the form "KEY NAME" where KEY is the git-annex key (I used SHA256) and NAME is the name I want to use for each file.
Step 1: Map and rename
In my case I was "importing" a ddar remote, so I wrote a quick script (https://github.com/basak/ddar/blob/master/contrib/git-annex-convert.py) to generate this index as well as rename all ddar archive members to their git-annex keys instead.
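For remotes other than ddar, the index can be built from the sizes and hashes that are already known. The sketch below is not the linked script; it only shows how a key for the plain SHA256 backend is put together (SHA256-s, the size in bytes, two dashes, the hex digest) and how one "KEY NAME" index line is formed. The function names are hypothetical, and the SHA256E backend, which appends the file extension to the key, is not covered.

```python
import re


def sha256_key(size: int, sha256_hex: str) -> str:
    """Build a git-annex key for the plain SHA256 backend."""
    digest = sha256_hex.lower()
    if not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("expected a 64-character hex SHA256 digest")
    if size < 0:
        raise ValueError("size must not be negative")
    return f"SHA256-s{size}--{digest}"


def index_line(size: int, sha256_hex: str, name: str) -> str:
    """One line of the index file, in the form "KEY NAME"."""
    if "\n" in name:
        raise ValueError("names containing newlines cannot be stored in the index")
    return f"{sha256_key(size, sha256_hex)} {name}"
```

Because a key never contains a space, a reader of the index can split each line at the first space and still handle names that contain spaces.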
Step 2: Create and commit symlinks
Then, I created symlinks in my master branch using:
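What follows is not the original command, only a sketch of one way to get the same result with current git-annex: the fromkey plumbing command reads "KEY FILE" pairs from stdin when none are given on the command line, which matches the index format, and --force lets it link to keys whose content is not present locally. The helper names, the index path and the repo argument are assumptions for illustration.

```python
import subprocess


def parse_index(lines):
    """Yield (key, name) pairs from "KEY NAME" lines, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Keys contain no spaces, so split at the first space only.
        key, name = line.split(" ", 1)
        yield key, name


def link_keys(index_path: str, repo: str = ".") -> list:
    """Create annex symlinks for every index entry via `git annex fromkey`."""
    with open(index_path, encoding="utf-8") as handle:
        pairs = list(parse_index(handle))
    batch = "".join(f"{key} {name}\n" for key, name in pairs)
    # --force is needed because the content is only on the special remote.
    subprocess.run(
        ["git", "annex", "fromkey", "--force"],
        input=batch, text=True, cwd=repo, check=True,
    )
    return [name for _, name in pairs]
```

Calling link_keys("index") from a script run inside the repository should leave the new symlinks ready to be committed to master in the usual way.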
Step 3: Add to location tracking
First I set three variables:
Then I added the entries from index into location tracking as follows:

Verifying
    git-annex fsck --from ddar --fast

checks that the keys expected in the special remote can be found (replace the special remote name as needed).

Skipping --fast will download all data to verify it. I didn't do this - instead I just sampled one entry which seemed to be OK.
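For step 3, newer git-annex releases also have a plumbing command, git annex setpresentkey KEY UUID 1, which records that a key is present in a repository without editing the git-annex branch by hand. The sketch below is an alternative to the approach described above, not a reconstruction of it. It assumes the special remote has already been set up so that its UUID is stored in the remote.NAME.annex-uuid git config value, and the helper names are hypothetical.

```python
import subprocess


def remote_uuid(remote: str, repo: str = ".") -> str:
    """Look up the UUID git-annex recorded for a configured remote."""
    result = subprocess.run(
        ["git", "config", f"remote.{remote}.annex-uuid"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def setpresentkey_command(key: str, uuid: str, present: bool = True) -> list:
    """Argument list for one `git annex setpresentkey` call."""
    return ["git", "annex", "setpresentkey", key, uuid, "1" if present else "0"]


def mark_present(keys, remote: str, repo: str = ".") -> None:
    """Record every key as present on `remote`, one git-annex call per key."""
    uuid = remote_uuid(remote, repo)
    for key in keys:
        subprocess.run(setpresentkey_command(key, uuid), cwd=repo, check=True)
```

One call per key is slow for a large index but keeps the sketch simple. Afterwards, the fsck --from check described above still applies.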