As the title says, how big can a git-annex repo be?
My use case is this: I have a few external hard disks (2TB, 3TB, etc.) and each of them has a bunch of files. Ideally I would like to keep all of the data organized in a single git-annex repo. However, there are probably a handful of really big files and a LARGE number of really small files, and I doubt it's a good idea to put them all in together.
What should be my biggest concern? I assume file size is not a problem, and I should either tar/zip smaller files into much bigger ones and annex that, or split it up into multiple small annex repositories.
Any suggestions? What do the rest of you with similar amounts of data do?
It can actually get really big and still be okay to use, if you follow the tips page for annexes with lots of files.
Mine is still reasonably responsive at 10 million files! Iterating over it all (git annex info, deduping) is really painful, but it would be just as painful without git-annex.
Depending on the total size of the small files, you might consider a mixed repo, with the small files checked into git normally, and the larger files annexed.
The advantage is that you then don't need to use git-annex commands to manage the many small files. This will probably be faster: for example, you won't need to run git annex get on a ton of small files, which avoids a lot of overhead.
Of course, if you have gigabytes of small files, that will result in a git repo gigabytes in size, and you will start to run into some of the scalability problems that git-annex addresses.
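To judge whether a mixed repo is reasonable, it helps to first measure how many bytes the small files add up to. Here is a rough sketch, not a definitive recipe. The function names small_files_bytes and setup_mixed_repo, and the 100kb threshold in the usage note, are made up for illustration. The second function assumes a git-annex version in which git annex add honours the annex.largefiles setting and adds non-matching files to plain git; check the largefiles documentation for your version before relying on it.

```shell
#!/bin/sh

# Print the total size in bytes of all regular files under a directory
# that are smaller than a given limit (in bytes), skipping any .git
# directory. This reads every small file once, so it can be slow on
# millions of files.
small_files_bytes() {
    dir=$1
    limit=$2
    find "$dir" -path '*/.git' -prune -o \
        -type f -size "-${limit}c" -exec cat {} + | wc -c | tr -d ' '
}

# Hypothetical helper: inside an already initialised git-annex repo,
# mark files above a size threshold (e.g. "100kb") as annex material,
# then add everything. Files below the threshold are assumed to be
# added to git as ordinary files.
setup_mixed_repo() {
    threshold=$1
    git config annex.largefiles "largerthan=${threshold}"
    git annex add .
}
```

For example, small_files_bytes /mnt/disk1 102400 prints the combined size of all files under 100 KiB on that disk (the path is a placeholder). If the result is a few hundred megabytes, a mixed repo set up with setup_mixed_repo 100kb is probably fine; if it is many gigabytes, annexing everything or archiving the small files first is likely the safer choice.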