Well, maybe there is already a way, but I could not find one via search or by looking at drop --help.
In our repositories/workflows we quite often encounter cases where multiple subfolders might contain the same (in content, and thus linking to the same key) file. At times it is desired to drop content in specific folders while still retaining annexed content for other folders in the tree (for further processing etc).
git annex drop path1 would drop the key path1 points to, regardless of whether another path2 within the tree points to the same key and might still be "needed". So what I am looking for is some option for drop (can't even come up with a good name for it, something like --not-used-elsewhere?) so that it would not drop keys which are used in the tree by files outside the paths provided to the drop command.
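To make the request concrete, here is a minimal Python sketch of the check such an option would need. It is only a model: the `tree` mapping from paths to keys and the function `droppable_keys` are hypothetical names for illustration, not anything git-annex provides, and it assumes that a key used only by the listed paths may be dropped.

```python
from collections import defaultdict


def droppable_keys(tree, drop_paths):
    """Return the keys referenced only by files under drop_paths.

    tree: dict mapping file path -> key (a hypothetical model of the work tree).
    drop_paths: paths given to the hypothetical `drop --not-used-elsewhere`.
    """
    def selected(path):
        # A file is selected if it is one of the given paths or lies below one.
        return any(path == p or path.startswith(p.rstrip("/") + "/")
                   for p in drop_paths)

    # Invert the tree: which files use each key?
    users = defaultdict(set)
    for path, key in tree.items():
        users[key].add(path)

    # A key is droppable only when every file using it was selected.
    return {key for key, paths in users.items()
            if all(selected(p) for p in paths)}


tree = {
    "path1/a.dat": "KEY-1",
    "path2/a.dat": "KEY-1",  # same content as path1/a.dat
    "path1/b.dat": "KEY-2",
}
print(sorted(droppable_keys(tree, ["path1"])))  # KEY-1 is kept for path2
```

The sketch needs the complete key-to-files mapping for the whole tree, not just for the paths being dropped.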
Problem is git-annex does not keep track of the information it would need in order to do this. Same problem as in ?indeterminite preferred content state for duplicated file.
Unlike that bug, I think it's actually rather ambiguous whether the user wants the file to be dropped in this case. Obviously you want it not to be, the way your file tree is arranged, but others could rely on the current behavior.
Here's one way: Imagine a repo storing music. It has directories for albums, and also directories containing playlists, which are copies of files from albums. If I was in a mood for Brazilian music, but have gotten over it for now, I might want to drop Brazilian_playlist (which got very long in my travels there) to free up some space. If it refused to drop files because the same files were also in the corresponding album directories, I would wonder why git-annex had gotten broken.
But the --not-used-elsewhere switch seems reasonable, if the needed info was available. I suppose git-annex could scan the index for changes and update state when this switch was used. Could be slow to update that state though.
+1 for drop --not-used-elsewhere. Would be good if "elsewhere" included linked worktrees. For unlocked files, could just look at the hardlink count to the content file? (But it would be odd if that only worked for unlocked files.)
Or there could be options such as --includeifany=glob (true for a key if any file using that key matches glob), --includeifall=glob (true for a key if all files using that key match glob), and similarly --excludeifany/all. Then use drop --includeifall=path/*.
I have a plan over in ?indeterminite preferred content state for duplicated file and will be working on it over there.
Linked worktrees seems out of scope for this though.
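The any/all semantics proposed above for --includeifany and --includeifall can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and Python's `fnmatch` (whose `*` also matches `/`) stands in for whatever glob rules git-annex would actually apply.

```python
from fnmatch import fnmatch


def include_if_any(files_for_key, glob):
    """True if at least one file using the key matches the glob."""
    return any(fnmatch(f, glob) for f in files_for_key)


def include_if_all(files_for_key, glob):
    """True if the key has files and every one of them matches the glob."""
    return bool(files_for_key) and all(fnmatch(f, glob) for f in files_for_key)


# Two files with the same content, hence the same key (hypothetical names).
files = ["path/song.mp3", "other/song.mp3"]
print(include_if_any(files, "path/*"))
print(include_if_all(files, "path/*"))
```

Under this model, drop --includeifall=path/* would skip the key above, because one of its files lives outside path/.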
On the --includeifany=glob idea, that seems to suggest a preferred content expression like includeifany=, analogous to how --include matches include=.
I'm feeling a bit cautious about adding a preferred content expression for this brand new capability.
Also, unfortunately, it turned out not to be possible to prevent the associated files db from sometimes having stale filenames in it (see c1b50282118520350d5328153fceedac2b8d8ed5). All current uses of the associated files db deal with that by checking the list of associated files to see if all of them are in the HEAD tree. A preferred content expression would also have to deal with that, and that risks slowing down evaluation of preferred content expressions generally.
So I think it's best to not add a preferred content expression, at least until there's a use case and this has had some time to soak.
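The stale-filename check described above can be modelled roughly as follows. This is a sketch under assumptions, not the actual git-annex code: `associated` stands for the filenames the db records for a key, `head_tree` for a path-to-key view of HEAD, and both names, along with the function, are hypothetical.

```python
def live_associated_files(associated, head_tree, key):
    """Filter out stale entries from a key's recorded filenames.

    associated: filenames recorded for the key (may contain stale entries).
    head_tree: dict mapping path -> key for files in HEAD (assumed model).
    """
    # Keep a filename only if HEAD still has it pointing at the same key.
    return [f for f in associated if head_tree.get(f) == key]


associated = ["foo", "old/foo"]            # "old/foo" is a stale entry
head_tree = {"foo": "KEY-1", "bar": "KEY-2"}
print(live_associated_files(associated, head_tree, "KEY-1"))
```

The sketch does one lookup per recorded filename, and a preferred content expression would repeat that for every key it evaluates.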
What happens if I run git annex drop --not-used-elsewhere foo bar and foo and bar have the same content? The content is not used except in the files I listed, so it could be argued that --not-used-elsewhere does not apply, and it should drop it. But of course, that becomes a problem when dropping large directory trees.
This also makes me think that not-used-elsewhere is too broad: maybe I only want to avoid dropping content shared by files in bardir while dropping foodir, and the option does not allow that.
So I do think @Ilya's on to something with his suggestion.
git annex drop foo --excludeifany=bar does not have the ambiguity. I guess it's also useful for querying, potentially. E.g., if I have an inbox and an outbox and think perhaps some things from the inbox are things I've already dealt with before, I could find such files with the same kind of option.
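As an illustration of that inbox/outbox query, here is a small Python model of "files under one directory whose content also appears under another". The `tree` mapping, the key names, and the function `same_content_elsewhere` are all hypothetical; this only sketches the matching logic, not a git-annex interface.

```python
def same_content_elsewhere(tree, search_prefix, other_prefix):
    """List files under search_prefix whose key is also used under other_prefix.

    tree: dict mapping file path -> key (hypothetical model of the work tree).
    """
    # Collect every key used by some file under the other directory.
    other_keys = {key for path, key in tree.items()
                  if path.startswith(other_prefix)}
    # Report the searched files that share one of those keys.
    return sorted(path for path, key in tree.items()
                  if path.startswith(search_prefix) and key in other_keys)


tree = {
    "inbox/report.pdf": "KEY-1",
    "inbox/new.pdf": "KEY-2",
    "outbox/report-final.pdf": "KEY-1",  # same content as inbox/report.pdf
}
print(same_content_elsewhere(tree, "inbox/", "outbox/"))
```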
Thinking about preferred content expression some more, there should not be much reason to use this in one. includesamecontent=foo would have the same effect as include=foo, because when it operates on bar, it already checks if foo is the same content and is preferred content, and if so avoids dropping it. And when getting files, the effect would also be the same, because include=foo makes it get foo, same as includesamecontent=foo would.