Add preferred content expression and matching option to match the file extension incorporated into a *E key, e.g. keyext=.mp3. This would help address the limitation that include=*.mp3 does not work with --all or --unused.
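To make the proposal concrete, here is a minimal Python sketch of what matching on a key's extension might involve. It assumes the usual key layout of dash-separated fields, then `--`, then the key name, and it assumes that only backends whose name ends in `E` embed an extension. The function names `key_extension` and `matches_keyext` are hypothetical, and the exact comparison used for `keyext=` is a guess at the semantics, not anything git-annex implements.

```python
from typing import Optional


def key_extension(key: str) -> Optional[str]:
    """Return the extension embedded in an extension-preserving key, else None."""
    fields, sep, name = key.partition("--")
    if not sep:
        return None  # not of the form FIELDS--NAME
    backend = fields.split("-", 1)[0]
    if not backend.endswith("E"):
        return None  # assumption: only *E backends embed an extension
    dot = name.find(".")
    return name[dot:] if dot != -1 else None


def matches_keyext(key: str, wanted: str) -> bool:
    """Hypothetical keyext= test: exact comparison against the key's extension."""
    return key_extension(key) == wanted
```

Since this only looks at the key itself, it would work without any filename, which is why it could be used with --all or --unused.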
I don't like the idea much. Keys with extensions are really a hack to work around the limitations of certain programs, and adding this feature would make it harder to deprecate them, which is a long term goal. Though admittedly not a likely one to be achieved entirely, it certainly seems possible right now for git-annex to eventually default to a non-extension backend. And adding this feature would work against that.
But.. After recent changes, it would be perfectly possible for include= to match on filenames in the current branch when used with --all etc. I'll copy some thoughts on it from https://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/:
I'm feeling a bit cautious about adding a preferred content expression for this brand new capability.
And also unfortunately, it turned out not to be possible to prevent the associated files db from sometimes having stale filenames in it (see c1b50282118520350d5328153fceedac2b8d8ed5). Which all current uses of the associated files db deal with by checking the list of associated files to see if all of them are in HEAD tree. A preferred content expression would also have to deal with that, and that risks slowing down evaluation of preferred content expressions generally.
So I think it's best to not add a preferred content expression, at least until there's a use case and this has had some time to soak.
Although as far as slowdown goes, that would only need to affect cases where --all/--key/--unused is used, and preferred content matches on include=/exclude=. Which is a combination no one should be using now, since it doesn't match on anything. So a slowdown is not a problem.
When those options are not used, include=/exclude= could skip looking at the associated files db. Consider

    git-annex get --auto .

If the preferred content expression matches foo and not bar, and they have the same content, it will not get bar, but will get foo, which has the desired result. And it's ok for git annex get --auto bar to not get the content of foo; the user didn't request it. And in the case of git-annex drop --auto bar, the preferred content expression not matching bar doesn't matter, because dropping checks if other files using the content are preferred content and so would skip dropping bar in order not to drop foo.

So this would only need a way for preferred content matching to know if an option like --all was used, and check the keys database then. I think it could probably just check for providedFilePath == Nothing
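The fallback described above can be sketched roughly as follows. This is a Python illustration only, not git-annex's Haskell code: `include_matches` and the `associated_files` lookup are hypothetical names, `None` stands in for `Nothing`, and the handling of stale filenames in the associated files db is left out entirely.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, Optional


def include_matches(
    glob: str,
    provided_file_path: Optional[str],
    key: str,
    associated_files: Callable[[str], Iterable[str]],
) -> bool:
    """Sketch of include= matching with a fallback for --all/--key/--unused."""
    if provided_file_path is not None:
        # Normal case: a file is being operated on, so match it directly
        # and never consult the associated files lookup.
        return fnmatchcase(provided_file_path, glob)
    # No file path was provided (e.g. --all), so match against the
    # filenames currently associated with the key instead.
    return any(fnmatchcase(f, glob) for f in associated_files(key))
```

The point of the early return is that the lookup cost is only paid when no file path is available, which matches the argument that existing uses would not be slowed down.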
I think the ability of --all to operate on all files regardless of branch is important. I often have many branches, and have files on the current branch that I've overwritten but may want to revert. It's useful to e.g. copy the content of all *.bam files ever created to a given S3 remote, or to generate a report of how much total space is used by content with different extensions.

As for keys with extensions being a hack to work around the limitations of certain programs: some programs still have those limitations. Worse, it's hard to know which do. So please don't deprecate. But also, even if started as a hack, letting keys encode metadata like file type -- and then efficiently matching on it -- is separately useful.
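As an illustration of the per-extension space report mentioned above, here is a rough Python sketch that works from a list of keys alone. It assumes keys carry an optional `s<bytes>` size field before the `--` separator, and that everything from the first `.` in the key name is the extension. `size_by_extension` is a hypothetical helper, and how the list of keys is obtained is left open.

```python
from collections import defaultdict
from typing import Dict, Iterable


def size_by_extension(keys: Iterable[str]) -> Dict[str, int]:
    """Sum the sizes recorded in keys, grouped by the extension in the key name."""
    totals: Dict[str, int] = defaultdict(int)
    for key in keys:
        fields, sep, name = key.partition("--")
        if not sep:
            continue  # not of the form FIELDS--NAME
        size = None
        for field in fields.split("-")[1:]:  # skip the backend name
            if field.startswith("s") and field[1:].isdigit():
                size = int(field[1:])
                break
        if size is None:
            continue  # key does not record a size
        dot = name.find(".")
        ext = name[dot:] if dot != -1 else ""  # "" groups keys without an extension
        totals[ext] += size
    return dict(totals)
```

Keys without an extension all end up in one bucket, which is part of why encoding the file type in the key is useful for this kind of report.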