I have a bunch of files sitting on an external drive that I want to track with git-annex. I have a repository on my laptop, but I don't actually have enough free disk space on the laptop to import the files. Really, I just want the file content to live in a special remote. I was thinking the following workflow could be useful:
cd ~/my-laptop-repo
git-annex import --to=s3-remote /mnt/usb-drive/myfiles
The proposed --to=remote option would add the files to my repo as import normally does, but it wouldn't ever keep the content in the repo; the only copy would then sit in s3-remote. As little disk space as possible would be staged temporarily in ~/my-laptop-repo.
Perhaps the easiest option would be to import a file normally, but then immediately do a move to s3-remote? Ideally, though, for larger files we would want to stream them directly from /mnt/usb-drive/myfiles to s3-remote without ever staging them in ~/my-laptop-repo.
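Concretely, that interim import-then-move workflow might look like this (a sketch, untested, using the paths and the s3-remote name from the example above):

```shell
cd ~/my-laptop-repo
# Stage the files into the repo (temporarily uses local disk space)
git annex import /mnt/usb-drive/myfiles
git commit -m 'import files from usb drive'
# Push the content to the special remote and free the local copies
git annex move --to=s3-remote .
```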
git annex import --to <remote> as well. (Same with git annex move --from remote-a --to remote-b, actually.) Import tree + move --from --to would be equivalent functionality to import --to: the directory you want to import from would be set up as a directory special remote, and import tree used on it.
However, it seems to me that if I wanted to accomplish this today, I'd simply make a clone of my git-annex repository onto the external drive, move the files I want into that clone (a cheap operation, as it's on the same drive), and run git annex copy --to s3-remote from there.
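Spelled out, that clone-based workflow might look like the following (a sketch; the clone path, description string, and the enableremote step are assumptions):

```shell
# Clone the laptop repo onto the external drive
git clone ~/my-laptop-repo /mnt/usb-drive/annex-clone
cd /mnt/usb-drive/annex-clone
git annex init 'usb drive clone'
# Make the existing S3 special remote usable from the clone
git annex enableremote s3-remote
# Moving files within the same drive is cheap
mv /mnt/usb-drive/myfiles/* .
git annex add .
git commit -m 'add files from usb drive'
# Upload straight from the drive; nothing touches the laptop's disk
git annex copy --to s3-remote .
```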
Since git-annex does now support importtree from directory special remotes, you can almost get what you said you want: first, set up the external drive's directory as a directory special remote with importtree enabled.
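The setup step would be along these lines (the remote name usb-drive matches the import command below; encryption=none is an assumption):

```shell
# Register the drive's directory as an importtree-capable special remote
git annex initremote usb-drive \
    type=directory directory=/mnt/usb-drive/myfiles \
    importtree=yes encryption=none
```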
Then git annex import master --from usb-drive will import the files into a usb-drive/master branch that you can merge, and you can run it repeatedly to import new and changed files from the directory. So then you have the files sitting in a special remote like you wanted, namely the directory special remote on the USB drive. The only problem is that importing the files also copies them into the git-annex repo, so you'd have to drop the files again, assuming you had the disk space for them all to begin with.
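That import-then-drop sequence might look like this (a sketch, assuming the usb-drive remote described above and enough temporary local disk space):

```shell
git annex import master --from usb-drive
git merge usb-drive/master
# Send the content to S3, then reclaim local disk space;
# drop succeeds because copies remain on usb-drive and s3-remote
git annex copy --to s3-remote .
git annex drop .
```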
I wonder: if it were possible to import the files without adding their content to the repo you ran the import from, leaving them on the special remote, would that meet your use case? That seems like something it would be possible to add.
It would still probably have to copy the file into the local repo, in order to hash it, and then just delete the content from the local repo. Of course when the file is in a directory on the local system, that's not strictly necessary; it could do the hashing of the file in place. But that would need an extension to the special remote API to hash a file.
But like I said in my other comment, I'd just clone my git-annex repo onto the drive and add the files to the repo there. Avoids all this complication. You'd need to provide a good justification for why you can't do that for me to pursue this any further.
(As far as adding a --to switch to import, transitive transfers discusses this kind of thing, and some issues with implementing that.)
It's now possible to run git-annex import --no-content --from a-directory-special-remote. So then you have the files sitting on the external drive without filling up the git repository. And git-annex copy --from foo --to bar is also supported, so you can use that to send the files over to S3 or wherever. So, I think this todo can be closed.
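Putting those two pieces together, the originally requested workflow becomes something like the following (a sketch using the remote names from this thread):

```shell
# Record the files in git without retrieving their content
git annex import master --from usb-drive --no-content
git merge usb-drive/master
# Transfer content from the drive to S3 without keeping a local copy
git annex copy --from usb-drive --to s3-remote .
```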