The old git annex import /dir interface should be removed, in favor of the new import from special remote interface, which can be used with a directory special remote to do the same kind of operation.

There have always been complaints about the old interface being surprising and/or not doing quite what some users want. Tried to find a principled way to address some of that with the "duplicate" options, but users just complain they're confusing (which they certainly are) and don't quite do what they want.

The fundamental mistake that the old interface made is that it conflated copying content into the repository, deleting content from the directory, and updating the working tree. The new interface decouples all three, only doing the first, and updating a tracking branch. The user is then free to merge the tracking branch as-is, or otherwise modify it before merging. There are some options to manipulate the tracking branch in commonly wanted ways, which just boil down to git branch manipulation. Less common desires can be handled using all of git's facilities. As for deleting from the directory, that's an export of a branch, which can just be an empty branch if they want to delete everything, or again they can use all of git to construct the branch with the changes they desire.

So while it's not been used as much as the old interface, I think the new interface will be much more flexible to the varied needs of users. What's less clear is whether it can support every way that the old interface can be used.

Of course the first pain point is that the user has to set up a directory special remote. Which may be annoying if they are importing from a variety of different directories ad-hoc.
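For reference, the setup step is a single initremote call. A minimal sketch follows; the function name setup_directory_remote, the remote name and the path are only illustrative, and encryption=none is an assumption that suits a plain local directory.

```shell
# Hypothetical helper: create a directory special remote that can be used
# with both git-annex import and git-annex export.
# Usage: setup_directory_remote <remote-name> <path-to-directory>
setup_directory_remote() {
    name=$1
    dir=$2
    # exporttree=yes and importtree=yes make the remote hold a plain tree of
    # files rather than git-annex's usual key/value layout.
    git annex initremote "$name" type=directory directory="$dir" \
        encryption=none exporttree=yes importtree=yes
}
```

For example, setup_directory_remote directory /mnt/incoming would create the remote called "directory" that the commands further down assume.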

Another likely pain point is ad-hoc importing of individual files or files matched by wildcard. The new interface is much more about importing whole trees, perhaps configured by preferred content settings.

This is now addressed; --fast import from directory special remotes followed by git-annex get of the files that are wanted. --Joey

By --fast do you actually mean --no-content because I can't seem to find --fast documented in the manual page of git-annex-import? Otherwise, what would a --fast operation mean in this context? --jkniiv

Err, yes, I meant --no-content. --Joey
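A rough sketch of that workflow, assuming a directory special remote has already been set up. The function name import_then_get is made up for illustration, and the first merge of a tracking branch may additionally need --allow-unrelated-histories.

```shell
# Hypothetical helper: import only the file tree, merge it, then fetch the
# content of just the files that are wanted.
# Usage: import_then_get <remote> <branch> <file>...
import_then_get() {
    remote=$1
    branch=$2
    shift 2
    # --no-content records the files in the tracking branch without copying
    # their content into the repository.
    git annex import "$branch" --from "$remote" --no-content &&
    git merge "$remote/$branch" &&
    # Only the named files have their content retrieved.
    git annex get "$@"
}
```

Something like import_then_get directory master photos/*.jpg would then cover the wildcard case, since the shell expands the glob against the merged working tree.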

Another pain point is that to remove files from an export, the user has to create trees that lack the files they want to remove. drop from export remote will resolve that.

One approach would be to make the old interface be implemented using the new interface, and paper over the cracks, by eg setting up a directory special remote automatically.

Or, the old interface could warn when used, linking to some documentation about how to accomplish the same tasks with the new interface.

Either could be done incrementally, eg start with the most common import cases, convert to the new interface, and keep others using the old interface.


Just a small plea. 🙏 I'd rather see the old functionality spared, although in a gutted form as a command called git-annex-ingest (as the name was suggested by Ilya Shlyakhter in change git-annex-import not to delete original files by default). It would have none of the presumably confusing --*duplicate* options. It would still move files into the annex thus saving a lot of disk space when you're in a tight spot; as I see it for instance the "import with no options" case as suggested below would pretty much double the disk space requirements during the import. Also ingesting/importing huge files by moving instead of copying them into the annex saves a lot of time, too. --jkniiv

Let's consider the cases:

import with no options

This deletes all files from the directory. So, the equivalent would be:

git-annex import master --from directory
git merge directory/master
git-annex export $emptytree --to directory

Actually the git merge is not quite right because on subsequent runs, that would merge the empty tree, so deleting from master the files imported before. So what's really needed is to diff from the empty tree to directory/master, and create a tree that has all new/changed files, and merge that into master. Which is something git-annex could pretty easily do, although so could a simple script.
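Such a script could look roughly like the following, using only git plumbing. This is a sketch, not how git-annex does or would do it: overlay_import is a made-up name, and it relies on the directory having been emptied by the previous export, so that every file in the imported tree counts as new or changed.

```shell
# Hypothetical helper: lay every file of an imported tree on top of a target
# branch, never treating a file that is absent from the import as a deletion.
# Prints the id of a new commit whose parent is the target branch.
# Usage: overlay_import <imported-treeish> <target-branch>
overlay_import() {
    imported=$1
    target=$2
    tmpdir=$(mktemp -d) || return 1
    (
        # Work in a scratch index so the real index is left alone.
        GIT_INDEX_FILE=$tmpdir/index
        export GIT_INDEX_FILE
        git read-tree "$target" &&
        # Add or replace entries for everything in the imported tree.
        git ls-tree -r "$imported" | git update-index --index-info &&
        tree=$(git write-tree) &&
        git commit-tree "$tree" -p "$target" -m "import new and changed files"
    )
    status=$?
    rm -rf "$tmpdir"
    return $status
}
```

With master checked out, git merge --ff-only "$(overlay_import directory/master master)" then brings in the new files while leaving earlier imports in place.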

import --duplicate

This is the simplest one, it's just:

git-annex import master --from directory
git merge directory/master

import --deduplicate

This needs to delete duplicate files from the special remote, and import the other files. The easiest way to do that seems to be to first implement drop from export remote. That will allow first an import --from --no-content, followed by iterating over the new/changed files in the import, and dropping any that use keys whose content is present anywhere other than in the directory special remote.

And following that, the same kind of creation of a tree that has all (remaining) new/changed files, and merging that into master.

import --skip-duplicates

This will be very similar in implementation to --deduplicate. But avoid dropping any files from the directory special remote. Just build a tree of files that are new/changed and whose content is not present anywhere else, and merge that.

import --clean-duplicates

git-annex import master --from directory

Followed by iterating over the new/changed files in the import, and if they have content anywhere else, dropping from the directory special remote. Needs drop from export remote to be implemented.

import --reinject-duplicates

This takes care of most of it:

git-annex import master --from directory

Then all that's needed is to iterate over new/changed files, and filter out any that use a key that is present anywhere else. Create a tree from that, and merge it into master.
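The "create a tree from that" step can be sketched with plain git, leaving the decision about which keys are present elsewhere to whatever produces the list of paths. The name prune_tree is hypothetical, and it needs to run inside a working tree.

```shell
# Hypothetical helper: read paths to filter out on stdin (one per line) and
# print the id of a tree equal to <treeish> minus those paths.
# Usage: list_of_duplicate_paths | prune_tree <treeish>
prune_tree() {
    treeish=$1
    tmpdir=$(mktemp -d) || return 1
    (
        # Work in a scratch index so the real index is left alone.
        GIT_INDEX_FILE=$tmpdir/index
        export GIT_INDEX_FILE
        git read-tree "$treeish" &&
        # Drop the listed paths from the scratch index only.
        git update-index --force-remove --stdin &&
        git write-tree
    )
    status=$?
    rm -rf "$tmpdir"
    return $status
}
```

The resulting tree can be committed and merged into master the same way as in the other cases.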

import --include= (and other file matching options)

Using these options will add another pass over the imported tree, that filters out files that don't match.


This makes existing files be overwritten by imported files. But with import tree, there's a tree that gets merged by git, and so regular git conflict handling can be used. Which is a better interface than just always overwriting. If desired though, this could be implemented by merging with the "theirs" conflict resolution strategy.
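In git terms that would be the theirs option of the default merge strategy, rather than a separate strategy. A small sketch, with merge_prefer_imported being a made-up name:

```shell
# Hypothetical helper: merge a tracking branch, resolving any conflicting
# hunks in favour of the imported side.
# Usage: merge_prefer_imported <tracking-branch>
merge_prefer_imported() {
    # -X theirs only affects conflicts; non-conflicting local changes
    # are still kept by the merge.
    git merge --no-edit -X theirs "$1"
}
```

So merge_prefer_imported directory/master approximates the old overwriting behaviour, while a plain git merge leaves conflicts for the user to resolve.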