The old git annex import /dir
interface should be removed, in favor of
the new import from special remote interface, which can be used with a
directory special remote to do the same kind of operation.
There have always been complaints about the old interface being surprising and/or not doing quite what some users want. Tried to find a principled way to address some of that with the "duplicate" options, but users just complain they're confusing (which they certianly are) and don't quite do what they want.
The fundamental mistake that the old interface made is it conflated copying content into the repository, deleting content from the directory, and updating the working tree. The new interface decouples all 3, only doing the first, and updating a tracking branch. The user is then free to merge the tracking branch as-is, or otherwise modify before merging. There are some options to manipulate the tracking branch in commonly wanted ways, which just boil down do git branch manipulation. Less common desires can be handled using all of git's facilities. As for deleting from the directory, that's an export of a branch, which can just be an empty branch if they want to delete everything, or again they can use all of git to construct the branch with the changes they desire.
So while it's not been used as much as the old interface, I think the new interface will be much more flexible to the varied needs of users. What's less clear is if it can well support every way that the old interface can be used.
Of course the first pain point is that the user has to set up a directory special remote. Which may be annoying if they are importing from a variety of different directories ad-hoc.
Another likely pain point is ad-hoc importing of individual files or files matched by wildcard. The new interface is much more about importing whole trees, perhaps configured by preferred content settings
This is now addressed;
--fast
import from directory special remotes followed bygit-annex get
of the files that are wanted. --JoeyBy
--fast
do you actually mean--no-content
because I can't seem to find--fast
documented in the manual page of git-annex-import? Otherwise, what would a--fast
operation mean in this context? --jkniivErr, yes, I meant
--no-content
. --Joey
Another pain point is that to remove files from an export, the user has to create trees that lack the files they want to remove. drop from export remote will resolve that.
One approach would be to make the old interface be implemented using the new interface, and paper over the cracks, by eg setting up a directory special remote automatically.
Or, the old interface could warn when used, linking to some documentation about how to accomplish the same tasks with the new interface.
Either could be done incrementally, eg start with the most common import cases, convert to the new interface, and keep others using the old interface.
--Joey
Just a small plea. 🙏 I'd rather see the the old functionality spared, although in a gutted form as a command called
git-annex-ingest
(as the name was suggested by ?Ilya Shlyakhter in change git-annex-import not to delete original files by default). It would have none of the presumably confusing--*duplicate*
options. It would still move files into the annex thus saving a lot of disk space when you're in a tight spot; as I see it for instance the "import with no options" case as suggested below would pretty much double the disk space requirements during the import. Also ingesting/importing huge files by moving instead of copying them into the annex saves a lot of time, too. --jkniiv
Let's consider the cases:
import with no options
This deletes all files from the directory. So, the equivilant would be:
git-annex import master --from directory
git merge directory/master
git-annex export $emptytree --to directory
Actually the git merge is not quite right because on subseqent runs, that would merge the empty tree, so deleting from master the files imported before. So what's really needed is to diff from the empty tree to directory/master, and create a tree that has all new/changed files, and merge that into master. Which is something git-annex could pretty easily do, although so could a simple script.
import --duplicate
This is the simplest one, it's just:
git-annex import master --from directory
git merge directory/master
import --deduplicate
This needs to deletes duplicate files from the special remote, and import the other files. The easiest way to do that seems to be to first implement drop from export remote. That will allow first an import --from --fast, followed by iterating over the new/changed files in the import, and dropping any that use keys whose content is present anywhere other than in the directory special remote.
And following that, the same kind of creation of a tree that has all (remaining) new/changed files, and mergeing that into mater.
import --skip-duplicates
This will be very similar in implementation to --deduplicate. But avoid dropping any files from the directory special remote. Just build a tree of files that are new/changed and whose content is not present anywhere else, and merge that.
import --clean-duplicates
git-annex import master --from directory
Followed by interating over the new/changed files in the import, and if they have content anywhere else, dropping from the directory special remote. Needs drop from export remote to be implemented.
import --reinject-duplicates
This takes care of most of it:
git-annex import master --from directory
Then all that's needed is to iterate over new/changed files, and filter out any that use a key that is present anywhere else. Create a tree from that, and merge it into master.
import --include= (and other file matching options)
Using these options will add another pass over the imported tree, that filters out files that don't match.
--force
This makes existing files be overwritten by imported files. But with import tree, there's a tree that gets merged by git, and so regular git conflict handling can be used. Which is a better interface than just always overwriting. If desired though, this could be implemented by merging with the "theirs" conflict resolution strategy.
@Joey Please, I'd rather see the legacy import functionality still saved as a sub-command that would have a new name than deleted altogether. You could de-emphasize this command in the documentation and rather consider it like a plumbing command on the level of
git-annex-reinject
. See my inline comment for further justification.