Normally commands like git annex add always add files to the annex, while git add adds files to git.
Let's suppose you're developing a video game, written in C. You have source code, and some large game assets. You want to ensure the source code is stored in git -- that's what git's for! And you want to store the game assets in the git annex -- to avoid bloating your git repos with possibly enormous files, but still version control them.
You could take care to use git annex add after changes to the assets, but it would be easy to slip up and git commit -a (which runs git add), checking your large assets into git. Configuring annex.largefiles saves you the bother of keeping things straight when adding files.
Once you've told git-annex what files are large, both git annex add and git add/git commit -a will add the large files to the annex and the small files to git.
Other commands that use the annex.largefiles configuration include git annex import, git annex addurl, git annex importfeed, and the assistant.
examples
For example, let's add only files larger than 100 kb to the annex, and never *.c or *.h source code files.
git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'
That is a local configuration, so it will only apply to your clone of the repository. To set a default that will apply to all clones, unless overridden, do this instead:
git annex config --set annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'
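To make the logic of that expression concrete, here is a rough shell sketch of the same decision. The is_large helper and the file names are hypothetical. It only imitates the rule (it is not how git-annex evaluates expressions), and it assumes 100kb is read as 100,000 bytes.

```shell
# Hypothetical helper, for illustration only: imitates
# 'largerthan=100kb and not (include=*.c or include=*.h)'.
# Assumption: 100kb means 100,000 bytes.
is_large() {
    case "$1" in
        *.c|*.h) return 1 ;;             # source files are never large
    esac
    size=$(( $(wc -c < "$1") ))          # file size in bytes
    [ "$size" -gt 100000 ]               # large only if over the threshold
}

# Try it on a few made-up files in a scratch directory.
tmp=$(mktemp -d)
head -c 200000 /dev/zero > "$tmp/texture.bin"
head -c 200000 /dev/zero > "$tmp/generated.c"
head -c 10 /dev/zero > "$tmp/notes.txt"
for f in "$tmp"/*; do
    if is_large "$f"; then
        echo "annex: ${f##*/}"
    else
        echo "git:   ${f##*/}"
    fi
done
```

Only the big non-source file ends up on the annex side; the big generated .c file stays in git because the exclusion is checked regardless of size.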
There's one other way to configure the same thing: you can put this in the .gitattributes file:
* annex.largefiles=largerthan=100kb
*.c annex.largefiles=nothing
*.h annex.largefiles=nothing
The syntax in .gitattributes is a bit different, because .gitattributes matches files itself, and the values of attributes cannot contain spaces. So using .gitattributes for this is not recommended (but it does work for older versions of git-annex, where the git annex config setting does not). Any .gitattributes setting overrides the git annex config setting, but will be overridden by the git config setting.
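Since a git config setting wins over both other places, it can be worth checking whether one is set and which file it comes from. A small sketch using git's own config --show-origin option (available in git 2.8 and newer); the scratch repository exists only for the demonstration:

```shell
# Scratch repository just for this demonstration.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config annex.largefiles anything

# Print the effective value together with the file that defines it.
# An unexpected file here (for example a global config) would explain
# why .gitattributes or git annex config settings seem to be ignored.
origin=$(git -C "$repo" config --show-origin --get annex.largefiles)
echo "$origin"
```

In your own repository, running git config --show-origin --get annex.largefiles is enough; if it prints nothing, no git config setting is in the way.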
Another example. If you wanted git add to put all files in the annex in your local repository:
git config annex.largefiles anything
Or in all clones:
git annex config --set annex.largefiles anything
syntax
See git-annex-matching-expression for details about the syntax.
gitattributes format
Here's that example .gitattributes again:
* annex.largefiles=largerthan=100kb
*.c annex.largefiles=nothing
*.h annex.largefiles=nothing
The way that works is, *.c and *.h files have the annex.largefiles attribute set to "nothing", and so those files are never treated as large files. All other files use the other value, which checks the file size.
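One way to see which value git resolves for a given path is git check-attr, which works on path names even if the files do not exist yet. The sketch below uses a scratch repository and made-up file names (main.c, level1.dat) purely for illustration:

```shell
# Scratch repository with the example .gitattributes from above.
repo=$(mktemp -d)
git init -q "$repo"
printf '%s\n' \
    '* annex.largefiles=largerthan=100kb' \
    '*.c annex.largefiles=nothing' \
    '*.h annex.largefiles=nothing' > "$repo/.gitattributes"

# Ask git which annex.largefiles value applies to each path.
# Later matching lines in .gitattributes override earlier ones.
c_attr=$(git -C "$repo" check-attr annex.largefiles -- main.c)
dat_attr=$(git -C "$repo" check-attr annex.largefiles -- level1.dat)
echo "$c_attr"     # main.c: annex.largefiles: nothing
echo "$dat_attr"   # level1.dat: annex.largefiles: largerthan=100kb
```

Note that this only shows what git sees in .gitattributes; a git config setting of annex.largefiles would still take precedence.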
Since git attribute values cannot contain whitespace, when you need a more complicated annex.largefiles expression, you can instead parenthesize the terms of the annex.largefiles attribute. For example, this is the same as the git config shown earlier, shoehorned into a single git attribute:
* annex.largefiles=(largerthan=100kb)and(not((include=*.c)or(include=*.h)))
It's generally a better idea to use git annex config instead.
temporarily override
If you've set up an annex.largefiles configuration but want to force a file to be stored in the annex, you can temporarily override the configuration like this:
git annex add --force-large smallfile
converting git to annexed
When you have a file that is currently stored in git, and you want to convert that to be stored in the annex, here's how to accomplish that:
git rm --cached file
git annex add --force-large file
git commit file
This first removes the file from git's index cache, and then adds it back using git-annex. You can modify the file before the git-annex add step, perhaps replacing it with new larger content that necessitates git-annex.
The --force-large option needs git-annex version 7.20200202.7 or newer.
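If a script needs to know whether the installed git-annex is new enough for this option, a version comparison along these lines may help. The version_at_least helper is hypothetical, relies on sort -V (GNU coreutils and compatible implementations), and the sample version string is only an example value:

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Any "-g<hash>" build suffix on $1 is dropped before comparing.
version_at_least() {
    have="${1%%-*}"
    need="$2"
    # The required version must sort first (or be equal) in version order.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# In a real setup the installed version could be read with something like:
#   have=$(git annex version | sed -n 's/^git-annex version: //p')
installed="6.20170302-gb35a50cca"   # example value only
if version_at_least "$installed" 7.20200202.7; then
    echo "--force-large is available"
else
    echo "git-annex is too old for --force-large"
fi
```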
converting annexed to git
When you have a file that is currently stored in the annex, and you want to convert that to be stored in git, here's how to accomplish that:
git annex unlock file
git rm --cached file
git annex add --force-small file
git commit file
You can modify the file after unlocking it and before adding it to git. And this is probably a good idea if it was really a big file, so that you can replace its content with something smaller.
The --force-small option needs git-annex version 7.20200202.7 or newer.
I use version 5.20140412, and I've tried annex.largefiles in .gitattributes, but it doesn't work. Every file is added to .git/annex/objects, including the ones that are excluded in .gitattributes.
I've just started using git-annex so maybe I'm doing something wrong...
I have the same problem: annex.largefiles is ignored by "git add" when set in .gitattributes, although git check-attr does list it.
But it works when set with git config annex.largefiles.
git annex version 6.20160126
The first version to support largefiles in .gitattributes was 6.20160211, so both the above commenters just have too old a version.
Hello
it took me some time to figure out how to exclude directories matching a specific structure within the .gitattributes file:
Maybe it helps someone else. (In case this way is the intended way)
Hi guys!
sigh
Currently I am pulling my hair, maybe anybody here can clear things up a bit. I tried to setup a brand new mixed content repo with git-annex but it bluntly ignores my .gitattributes and annexes everything. When I set largefiles in config everything is fine and restrictions are applied right, in .gitattributes even a "* annex.largefiles=nothing" has no effect. All attributes show up right with git check-attr, I double checked. Same thing with a newly initialized minimal example repo.
I tried git-annex as distributed by openSUSE and the current stand-alone-package (in case it's a distribution bug), too. So no clues here, too.
Output of git annex version:
git-annex version: 6.20170302-gb35a50cca build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify ConcurrentOutput TorrentParser MagicMime Feeds Quvi key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
System: OpenSUSE Tumbleweed Linux 4.9.3-1-default #1 SMP PREEMPT Thu Jan 12 11:32:53 UTC 2017 (2c7dfab) x86_64 x86_64 x86_64 GNU/Linux
Any ideas? After trying around for hours I am somewhat flabbergasted. Did I miss some config or build option to enable support for .gitattributes?
Kind regards
Jörn
@joern.mankiewicz, you need to file a bug report with enough information to reproduce your problem.
annex.largefiles in .gitattributes works fine:
Note that if annex.largefiles is set in git config (including global git config), it overrides the .gitattributes setting. So a reasonable guess would be that you set it in the git config.
Thanks, joey.
Your last comment put me on the right track. The problem was not in the repository, but an old stale global .gitconfig in my homedir. I had only checked $XDG_CONFIG_HOME/git/config, where my global git config currently resides, and totally forgot about this old config. Stupid me!
was my savior here as it clearly indicated that there is indeed a (unintended) config setting and where to find the file. So i can strongly recommend anybody experiencing strange behavior to try this one-liner. It might have saved me hours of time.
Thanks for your help!
Cheers
Jörn
Hi, from a technical point of view, are there any drawbacks/limitations to adopting a workflow where everyone in the project uses "git annex add" and relies on the annex.largefiles settings, instead of having to use the separate commands? I would use repo v5, as repo v6 still seems to need work and I don't need its features. I would just like to avoid the human error of people mistakenly using regular git add for big files. I understand that repo v6 would allow that, but I don't like its default behavior of using unlocked mode when I add things with git add (it would properly annex the files, but in unlocked mode these files would occupy space in the working copy, and I don't want that). Thanks.
@davicastro yes, using git-annex add for adding both kinds of files is the workflow this is about. Other than git add features like --interactive I see no need to ever use git add once you have this set up.
IMHO it's somewhat user-unfriendly and error-prone to have to remember a sequence of three commands to convert an unannexed file to annexed, or vice-versa. So it would be nice if there were git-annex commands to do this in one go.
In fact, I would expect git annex add to handle adding to the annex, and git annex unannex to do the opposite. Are there good reasons why they should not be made to do that? Even if there are, aren't those subcommands already doing very similar things? In which case maybe the solution would be to add an extra option to each to enable this behaviour? For example --force or something like that in order to tell it to ignore annex.largefiles (although for add that is already used to allow adding ignored files, so we probably shouldn't conflate the two cases under one option).
If unannex is to reverse add, then IMHO unadd or undo-add seem like more logical choices. Renaming to one of these would free up unannex for the conversion use case. But I appreciate that this is potentially confusing for users already used to unannex's current behaviour, so I expect the idea will be rejected.
I have a repo where I have run: git annex config --set annex.largefiles '(mimeencoding=binary and not mimetype=inode/symlink) or mimeencoding=unknown-8bit'
My git-annex package is built with support for MIME detection. It works for other files. But I have noticed that for certain files, git add does not add them to the annex. I have verified the MIME type and encoding with file so that I could check git annex matchexpression, and it matches. What might be happening here?
@lh two possibilities are:
When git add is run, git-annex does some additional checks to detect things like a file that was checked into git before but has been renamed, and will avoid annexing such a file even when annex.largefiles says to.
git add is finding some other git-annex version that behaves differently. Seems unlikely, but possible.
git annex add with a different annex.largefiles that caused some files to be added (though not committed) via git rather than git-annex? That's roughly the scenario I was in.
I thought I had the same problem as lh (git add not respecting the largefiles config), but when I tried to make a minimal example I noticed that git add does add files to the annex, it just doesn't print the progress message that git annex add usually prints.
Is there any way to get it to do that? It would help newbs like me know that largefiles is indeed working, and for files that are actually large it can be helpful to see the progress.
@imlew git add runs git-annex as a filter, so it cannot output progress displays or other messages on stdout at that point. It would have to output to stderr, but outputting non-errors to stderr is not great either.