git-annex is designed for scalability. The key points are:
Arbitrarily large files can be managed. The only constraint on file size are how large a file your filesystem can hold.
While git-annex does checksum files by default, there is a WORM backend available that avoids the checksumming overhead, so you can add new, enormous files, very fast. This also allows it to be used on systems with very slow disk IO.
Memory usage should be constant. This is a "should", because there can sometimes be leaks (and this is one of haskell's weak spots), but git-annex is designed so that it does not need to hold all the details about your repository in memory.
Many files can be managed. The limiting factor is git's own limitations in scaling to repositories with a lot of files, and as git improves this will improve. Scaling to hundreds of thousands of files is not a problem, scaling beyond that and git will start to get slow.
To some degree, git-annex works around inefficiencies in git; for example it batches input sent to certain git commands that are slow when run in an enormous repository.
It can use as much, or as little bandwidth as is available. In particular, any interrupted file transfer can be resumed by git-annex.
scalability tips
If the files are so big that checksumming becomes a bottleneck, consider using the WORM backend. You can always
git annex migrate
files to a checksumming backend later on.If you're adding a huge number of files at once (hundreds of thousands), you'll soon notice that git-annex periodically stops and say "Recording state in git" while it runs a
git add
command that becomes increasingly expensive. Consider adjusting theannex.queuesize
to a higher value, at the expense of it using more memory.See also: Repositories with large number of files
What is the default value of :
Without knowing the default value it is impossible to increase it thank you
annex.queuesize default is documented on the man page. Currently 10240 commands.