While at the DerbyCon security conference, I got to thinking about verifying objects that git-annex downloads from remotes. This can be expensive for big files, so git-annex has never done it at download time, instead deferring it to fsck time. But, that is a divergence from git, which always verifies checksums of objects it receives. So, it violates least surprise for git-annex to not verify checksums too. And this could weaken security in some use cases.
So, today I changed that. Now whenever git-annex accepts an object into .git/annex/objects, it first verifies its checksum and size. I did add a setting to disable that and get back the old behavior (git config annex.verify false), and there's also a per-remote setting if you want to verify content from some remotes but not others.
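Concretely, a quick sketch of the configuration. The global setting is the one named above; the per-remote spelling shown is an assumption based on git-annex's usual remote.&lt;name&gt;.annex-* naming, so check the docs for your version, and "nas" is just a placeholder remote name:

    # Skip download-time verification everywhere (the old behavior):
    git config annex.verify false
    # Assumed per-remote form, following the remote.<name>.annex-* naming;
    # "nas" is a placeholder for a remote whose content you already trust:
    git config remote.nas.annex-verify false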
From the comments:

How about an annex.verify-threshold setting, to verify all files below a certain size? This could default to 500MB.
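That setting doesn't exist, but the idea is easy to approximate in a wrapper today. A minimal sketch, not part of git-annex; the script name, the threshold, and the GNU/BSD stat fallback are all illustrative:

    #!/bin/sh
    # get-verified: fetch a file, then fsck it only if it is small enough.
    # Approximates the proposed annex.verify-threshold in user space.
    threshold=$((500 * 1024 * 1024))    # 500MB, the suggested default
    file="$1"
    git annex get "$file" || exit 1
    # -L dereferences the annex symlink; try GNU stat, then BSD stat.
    size=$(stat -Lc %s "$file" 2>/dev/null || stat -Lf %z "$file")
    if [ "$size" -lt "$threshold" ]; then
        git annex fsck "$file"
    fi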
I have git annex get $file && git annex fsck $file in all my scripts already, because I had run into an issue where a bad file got replicated everywhere, and the only good copy I had was dropped before I realized. This would have caught it. Thanks!
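A related pattern that would have prevented that scenario is to verify the local copy before fanning it out. A minimal sketch; the file path and the remote name "backup" are hypothetical:

    # Only replicate a file after its checksum has verified locally.
    f="path/to/large/file"               # hypothetical path
    git annex get "$f" && \
        git annex fsck "$f" && \
        git annex copy "$f" --to backup  # "backup" is a placeholder remote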
I do the same with plain git, setting fetch.fsckObjects, receive.fsckObjects and transfer.fsckObjects to true, to catch potential errors caused by a bad disk, memory corruption, or transfer errors over the network. I'd rather wait a bit longer while copying, or especially moving, files than end up with a single corrupted bit in the only copy of a 4 gig file. Thanks!
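For anyone wanting the same safety net, these are stock git settings; transfer.fsckObjects serves as the default when the other two are unset:

    # Check every object git receives, in both directions:
    git config --global transfer.fsckObjects true
    # Or set the two directions explicitly:
    git config --global fetch.fsckObjects true
    git config --global receive.fsckObjects true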