I just created a new annex by running the following:
- git init
- git annex init
- git annex add .
- git commit -m "Added files"
- git annex status
I see the following:
local annex keys: 224
local annex size: 41 gigabytes
known annex keys: 235
known annex size: 49 gigabytes
bloom filter size: 16 mebibytes (0% full)
backend usage:
SHA256: 459
Why is there an 8 gigabyte difference here? What/where are those files? What is a bloom filter?
Those are duplicate files.
See http://git-annex.branchable.com/tips/finding_duplicate_files/ for how to easily display them.
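The linked tip uses git-annex itself. As a standalone illustration of the same idea, here is a minimal sketch that groups files by SHA-256 digest (the hash the SHA256 backend bases its keys on) and reports every digest shared by more than one path. It assumes all annexed content is present locally: annexed files are symlinks into `.git/annex/objects`, and `open()` follows them, but a symlink whose content is missing would raise an error. `find_duplicates` and `sha256_of` are hypothetical helper names. Rehashing 41 GB is slow, which is one reason to prefer the git-annex approach, since the hash is already recorded in each key.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map each digest shared by two or more files to the sorted list of their paths."""
    by_digest = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own metadata directory (including the annex object store).
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[sha256_of(path)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates(".")` from the top of the repository would return one entry per set of identical files.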
The local keys are files whose content is locally present.
The known keys are annexed files in the current branch, whose content may or may not be present.
Justin is correct: if you have the same file in the tree twice, it will be counted twice as known keys. Since git-annex deduplicates content, only one local key is needed to store it.
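The counting described above can be modelled in a few lines. This is a simplified sketch, not git-annex's code: it assumes, as the answer says, that known totals are taken once per annexed file in the branch while local totals are taken once per stored key. `Key` and `annex_counts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for an annex key: content digest plus size in bytes."""
    digest: str
    size: int


def annex_counts(tree, present):
    """Simplified model of the local/known totals.

    tree:    maps each annexed file path in the branch to its Key
    present: set of Keys whose content is stored locally
    Known totals count one entry per file, so identical content checked in
    under two paths is counted twice. Local totals count each stored key once.
    """
    return {
        "local keys": len(present),
        "local size": sum(k.size for k in present),
        "known keys": len(tree),
        "known size": sum(k.size for k in tree.values()),
    }


# Example: the same 8,000-byte file checked in under two paths.
movie = Key("aaa", 8_000)
notes = Key("bbb", 10)
tree = {"a/movie.mkv": movie, "b/movie.mkv": movie, "notes.txt": notes}
print(annex_counts(tree, set(tree.values())))
# {'local keys': 2, 'local size': 8010, 'known keys': 3, 'known size': 16010}
```

The known size exceeds the local size by exactly one extra copy of the duplicated file, which mirrors the 8 gigabyte gap in the question.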
The Bloom filter is a technical implementation detail that allows the potentially expensive status scan to run in constant space. You can read about it on Wikipedia if you're interested.
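A Bloom filter is a fixed-size bit array plus several hash functions. Adding an item sets a few bits, and a lookup checks those same bits. It can report false positives but never false negatives, and its memory use does not grow with the number of items, which is what keeps the scan in constant space. The following is a minimal Python sketch of the data structure, not git-annex's implementation. The class name, the parameters, and the trick of deriving all bit positions from one SHA-256 digest (double hashing) are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: memory use does not grow with the items added."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest via double hashing:
        # position_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, so never 0
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

    def fill_ratio(self):
        """Fraction of bits currently set."""
        set_bits = sum(bin(b).count("1") for b in self.bits)
        return set_bits / self.num_bits
```

`fill_ratio` is only a rough analogue of the "(0% full)" figure in the status output; git-annex may compute that figure differently.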