git-annex tries to ensure that the configured number of copies of your
data always exist, and leaves it up to you to use commands like
git annex get and git annex drop
to move the content to the repositories you want
to contain it. But often, it can be good to have more fine-grained
control over which content is wanted by which repositories. Configuring
this allows the git-annex assistant, as well as git annex get --auto,
git annex drop --auto, git annex sync --content, etc., to do smarter things.
Preferred content settings can be edited using git annex vicfg,
or viewed and set at the command line with git annex wanted.
Each repository can have its own settings, and other repositories will
try to honor those settings when interacting with it.
(So there's no local .git/config for preferred content settings.)
The idea is that you write an expression that files are matched against. If a file matches, the repository wants to store its content. If it doesn't, the repository wants to drop its content (if there are enough copies elsewhere to allow removing it).
writing expressions
quickstart
Rather than writing your own preferred content expression, you can use several standard ones included in git-annex that are tuned to cover different common use cases.
You do this by putting a repository in a group, and simply setting its preferred content to "standard" to match whatever is standard for that group. See standard groups for a list.
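As a minimal sketch of the quickstart, the two steps (joining a group, then setting the preferred content to "standard") can be wrapped in a small shell function. The function name use_standard_group is hypothetical and only for illustration; the git annex group and git annex wanted commands are the ones git-annex provides.

```shell
# Sketch: put a repository into a standard group and let it use that
# group's built-in preferred content expression.
# use_standard_group is a hypothetical helper name, not part of git-annex.
#   $1: repository name or description, or "." for the local repository
#   $2: a standard group name, e.g. "client", "backup" or "manual"
use_standard_group() {
    git annex group "$1" "$2" &&
        git annex wanted "$1" standard
}

# Example (run inside a git-annex repository):
#   use_standard_group . client
```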
See the man page git-annex-preferred-content for details on the syntax of preferred content expressions.
An example:
include=*.mp3 and (not largerthan=100mb) and exclude=old/*
Because the terms are joined with "and", a file must satisfy all of them: this makes .mp3 files that are not larger than 100 MB preferred content, and excludes everything under the "old" directory.
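For illustration, the example expression could be applied to the local repository with git annex wanted. This is only a sketch: the function name set_example_wanted is made up here, and the point is mainly the quoting, since the expression contains characters the shell would otherwise interpret.

```shell
# Sketch: set the example expression on the local repository (".").
# set_example_wanted is a hypothetical helper name.
set_example_wanted() {
    # Single quotes keep the shell from expanding "*" or interpreting
    # the parentheses before git-annex sees the expression.
    git annex wanted . 'include=*.mp3 and (not largerthan=100mb) and exclude=old/*' &&
        # With no expression, git annex wanted shows the current setting.
        git annex wanted .
}
```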
upgrades
It's important that all clones of a repository can understand one another's preferred content expressions, especially when using the git-annex assistant. So using newly added keywords can cause a problem if an older version of git-annex is in use elsewhere.
Before git-annex version 5.20140320, when git-annex saw a keyword it did not understand, it defaulted to assuming all files were preferred content. From version 5.20140320, git-annex has a nicer fallback behavior: When it is unable to parse a preferred content expression, it assumes all files that are currently present are preferred content.
Here are recent changes to preferred content expressions, and the version they were added in.
- "balanced=", "fullybalanced=" 10.20240831
- "securehash" 6.20170228
- "nothing" 6.201600202
- "anything" 5.20150616
- "standard" 5.20140314
  (only when used in a more complicated expression; "standard" by itself has been supported for a long time)
- "groupwanted=" 5.20140314
- "metadata=" 5.20140221
- "lackingcopies=", "approxlackingcopies=", "unused=" 5.20140127
- "inpreferreddir=" 4.20130501
- "metadata=field<number" etc 6.20160227
How do the preferred content settings interact with the numcopies setting?
I could not figure it out. E.g. a case I do not understand:
I have a preferred setting evaluating to true and still
does nothing, if the number of copies produced would surpass the numcopies setting.
Thx
Built a new copy of git-annex yesterday. I have a "client" on my macbook, and two "backup"s, one on an external HD, one on an ssh git remote.
git annex get --auto works beautifully!
It doesn't seem to work for copying content to a place where it's needed, though.
If I drop a file from my "backup" USB drive, and then go back to my macbook and do a "git annex sync" and "git annex copy --to=usbdrive --auto" it does not send the file out to the USB drive, even though by preferred content settings, the USB drive should "want" the file because it's a backup drive and it wants all content.
Similarly, if I add a new file on my macbook and then do a "git annex copy --to=usbdrive --auto" it does not get copied to the USB drive.
Is this missing functionality, or should the preferred content setting for remotes only affect the assistant?
Is there a way to change these definitions for a given annex?
ie: in this repo make "client" mean
The expressions used for "standard" are built in.
But, you can use "groupwanted" instead, see documentation above.
Is there a way to drop only the files that are located in an "archive" directory? I want to drop all files when calling
if I move them to the archive. But I want to keep the files that are outside of the archive, even if they are already present in other repos. As far as I have seen and tested, as soon as I have the files in another repo, all files get dropped, including the ones outside the archive directory. Or do I have to increase "numcopies" in order to circumvent the "(not copies=semitrusted+:1)" case?
drop --auto will only drop files that are not preferred content. I'd need to know what preferred content expression you're using to say more.
Could you clarify how the default (empty) preferred content setting works? Is it equivalent to anything, or more magical than that?
(Asking because I noticed that annex find --in . --want-drop in such a repo matches basically all files, even the ones that annex sync --content just retrieved. Likewise, it retrieves files while annex find --not --in . --want-get lists nothing. I'm fine with the sync behavior here, but somewhat worried that a future annex sync --content would actually decide to drop everything... I'm running 6.20160126 with numcopies=1.)
Also, what happens if I use standard for a repo that's not in any group?
@grawity, the empty (or unset) preferred content setting causes a default behavior to be used. What the default is varies depending on what you're doing. For example, if you're running git annex drop --auto, it defaults to dropping everything when there's no preferred content setting. On the other hand, git annex sync --content defaults to getting everything. So, this is different than using "anything".
If you use "standard" and the repo is not in exactly one of the standard groups, it behaves the same as if you'd given it an unparseable preferred content setting. It will want all files it has and none it doesn't have.
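To see what a repository currently wants, the commands quoted in the question above can be collected into one place. This is only a sketch; show_wanted_state is a hypothetical helper name, and the output naturally depends on the repository it is run in.

```shell
# Sketch: show the local repository's preferred content setting and the
# files it would currently get or drop. show_wanted_state is a
# hypothetical helper name; run it inside a git-annex repository.
show_wanted_state() {
    echo "preferred content expression (empty if unset):"
    git annex wanted .
    echo "files not present here that this repository wants to get:"
    git annex find --not --in . --want-get
    echo "files present here that this repository wants to drop:"
    git annex find --in . --want-drop
}
```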
Is there any way to set a default preferred content setting -- either used when a new clone is made or whenever a repo doesn't specify one?
I've got an annex that has a couple servers with all the content, and several clients[1] -- which I create more often and more manually -- that just want the content I pick. Basically every time I set up another client, I run git annex sync --content, am surprised to see a bunch of get ... lines, go kill the sync, set group and preferred content to be manual/standard, and run the sync again. It'd be handy if I could set up the repo in advance to just configure that by default. (I guess I could make an alias that does like git clone $server/$repo && cd $repo && git annex wanted . standard && git annex group . manual, but it'd be nice if I could just do the git clone I'm used to and it would all work.)
[1] AIUI, the "client" group means "get every file referenced in HEAD, unless it's in archive/, and skip older versions"? I guess that makes sense for like a software project with some media assets. I've mostly used git-annex for situations where most files aren't being actively worked with and clients only have a few of them, which is where it seems to really shine over GitLFS. I've always been vaguely surprised by how the client group works as a result. Any sense of how commonly people use it for different use cases? It is excellent for the sparse checkout case though.
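The alias idea from the comment above might look roughly like the following shell function. It is a sketch, not a git-annex feature: clone_manual is a hypothetical name, and the commands are the ones given in the comment, run in the same order.

```shell
# Sketch of the workaround described above: clone, then immediately mark
# the new clone as "manual" so a later sync does not fetch everything.
# clone_manual is a hypothetical helper name.
#   $1: URL of the repository to clone
#   $2: directory to clone into
# Note: like the alias in the comment, this leaves the calling shell
# inside the new clone.
clone_manual() {
    git clone "$1" "$2" &&
        cd "$2" &&
        git annex wanted . standard &&
        git annex group . manual
}

# Example:
#   clone_manual "$server/$repo" "$repo"
```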
git annex assist directly after a git clone, wondering why I'm getting a million files shoved into my face, CTRL+C'ing it, being left with a weird unclean work tree for the download-aborted unlocked files, so I have to git restore . again, then configuring git annex wanted present before I continue.