Consider the case where file `foo` uses key K, and file `archive/bar` uses the same key K.
Using standard client preferred content settings, `git annex drop --auto` will want to drop `archive/bar`, but `git annex get --auto` will want to get `foo`. `git annex sync --content` will do both operations, getting and then dropping the key. Running these commands repeatedly churns unnecessarily.
In the preferred content expressions for standard groups, the only place this bug can be triggered involves archive directories of repositories in the client group. A file both in the archive directory and in another directory has indeterminate status.
Fixing this needs a map from key to files using it. Then, when checking preferred content of `archive/bar`, it can see that `foo` also uses the key. Since `foo` is wanted, it should not drop the key, even though `archive/bar` is not wanted.
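To make the idea concrete, here is a minimal sketch of the key-to-files check in Python. It is not git-annex's implementation: `wanted`, `build_key_map` and `should_drop` are hypothetical names, and the `wanted` rule is a simplified stand-in for the client group's expression (it only looks at whether the file lives under an archive directory, and ignores whether an archive repository already holds a copy).

```python
from collections import defaultdict

def wanted(path):
    """Toy stand-in for a client-style rule (an assumption, not the real
    expression): files under a directory named 'archive' are not wanted."""
    directories = path.split("/")[:-1]
    return "archive" not in directories

def build_key_map(file_keys):
    """Invert a {path: key} mapping into {key: set of paths using it}."""
    key_to_files = defaultdict(set)
    for path, key in file_keys.items():
        key_to_files[key].add(path)
    return key_to_files

def should_drop(path, file_keys, key_to_files):
    """Drop the content only if no file sharing this key is wanted."""
    key = file_keys[path]
    return not any(wanted(p) for p in key_to_files[key])

# The scenario from the report, plus one archive-only key for contrast.
file_keys = {"foo": "K", "archive/bar": "K", "archive/baz": "K2"}
key_to_files = build_key_map(file_keys)
```

With this check, `should_drop("archive/bar", file_keys, key_to_files)` is false because `foo` shares key K and is wanted, while `archive/baz` can still be dropped since nothing else uses K2. Deciding per key rather than per file is what stops the get and drop passes from disagreeing.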
Such a map exists, in the keys database, but only in v7 mode repositories. So, this seems solvable in v7 repositories, but not in v5. Also, the associated files map may not be accurate at all times, so that's a wrinkle to using it for this. Also, only unlocked files get into the associated files map. --Joey
I ran into this exact situation while trying to split content across multiple repos by directory, and found a couple of files with many duplicate keys throughout the repo. I worked around it by applying a tag to the offending key, and including that tag as part of the preferred content expression:
This would result in a file possibly being included in a repo where it wouldn't be otherwise (if it wasn't in one of the include directories to begin with), but in my case, the common files were few and small (2, both less than 100K).