Git uses SHA1, which is becoming increasingly broken. Using git-annex and signed commits, we can work around the weaknesses of SHA1 and let anyone who clones a repository verify that the data they receive is the same data that was originally committed to it.
This is recommended if you are storing any kind of binary files in a git repository.
Configuring git-annex
You need git-annex 6.20170228 or newer. Upgrade if you don't have it.
git-annex can use many types of backends and not all of them are secure. So, you need to configure git-annex to only use cryptographically secure hashes.
git annex config --set annex.securehashesonly true
Each new clone of the repository will then inherit that configuration. But, any existing clones will not, so this should be run in them:
git config annex.securehashesonly true
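To see whether an existing clone still needs that command, its local git configuration can be inspected. The following is a minimal sketch, not part of git-annex: the helper name has_securehashes is made up for illustration, and the scratch repository only stands in for one of your existing clones. It looks at the clone's local git config only, which is what the command above sets.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the clone at DIR has
# annex.securehashesonly enabled in its local git config.
has_securehashes() {
    [ "$(git -C "$1" config --local --get --bool annex.securehashesonly 2>/dev/null)" = "true" ]
}

# A scratch repository stands in for an existing clone.
clone=$(mktemp -d)
git init --quiet "$clone"

if has_securehashes "$clone"; then before=set; else before=unset; fi

# The same command as above, pointed at the clone with -C.
git -C "$clone" config annex.securehashesonly true

if has_securehashes "$clone"; then after=set; else after=unset; fi
echo "before: $before, after: $after"
```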
Signed commits
It's important that all commits to the git repository are signed.
Use git commit --gpg-sign, or enable the commit.gpgSign configuration.
Use git log --show-signature to check the signatures of commits.
If the signature is valid, it guarantees that all annexed files
have the same content that was originally committed.
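Since every commit needs a signature, it can help to scan the history for commits that lack a good one. The sketch below uses git's %G? log format placeholder, which prints G for a good signature and N for no signature. The function name list_not_good is hypothetical, and the scratch repository with one deliberately unsigned commit exists only so the example can run anywhere.

```shell
#!/bin/sh
# Hypothetical helper: list commits reachable from REV (default HEAD) in the
# repository at DIR whose signature status is anything other than "G" (good).
list_not_good() {
    git -C "$1" log --format='%H %G?' "${2:-HEAD}" | awk '$2 != "G" { print }'
}

# Scratch repository containing a single commit made without a signature.
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" -c user.name=Example -c user.email=example@example.invalid \
    -c commit.gpgSign=false commit --quiet --allow-empty -m "unsigned commit"

# Each output line is "<commit id> <status letter>".
flagged=$(list_not_good "$repo")
echo "$flagged"
```

Note that this strict filter also flags commits whose signature is valid but made by a key you have not marked as trusted; whether to accept those is a policy choice.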
Why is this more secure than git alone?
SHA1 collisions exist now, and can be produced using a common-prefix attack. See https://shattered.io/. Let's assume that a chosen-prefix attack against SHA1 will also become feasible. However, a full preimage attack still seems unlikely, so we won't consider such attacks in the analysis below.
The reason that git-annex can work around git's problematic use of SHA1 is that git-annex uses other, stronger hashes of the contents of annexed files. For example, an annexed file may be a symlink to ".git/annex/objects/Ab/Cd/SHA256--eb45a55eb8756646e244e6c5f47349294568d58a9321244f4ee09a163da23a27".
Such a symlink is stored as a git blob object. The SHA1s of the git blobs are listed in a git tree object, and the git commit object contains the SHA1 of the tree. Finally, the commit object is gpg signed.
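To make the first link of that chain concrete, here is a small sketch of how the blob id for a symlink target is derived. Git hashes a blob as the SHA1 of the header "blob <length>", a NUL byte, and then the content, so the id covers the whole symlink target including the strong hash inside it. The target below is the example path from above; the sha1sum tool (GNU coreutils) is an assumption about your system.

```shell
#!/bin/sh
# The example symlink target from the text above.
target='.git/annex/objects/Ab/Cd/SHA256--eb45a55eb8756646e244e6c5f47349294568d58a9321244f4ee09a163da23a27'

# Compute the blob id by hand: SHA1 over "blob <length>", NUL, content.
# The target is plain ASCII, so its character count equals its byte length.
manual=$(printf 'blob %s\0%s' "${#target}" "$target" | sha1sum | cut -d' ' -f1)

# Ask git for the id it would assign to a blob with the same content.
viagit=$(printf '%s' "$target" | git hash-object --stdin)

echo "manual: $manual"
echo "git:    $viagit"
```

Both lines should print the same id, which is the value a tree object records for this symlink.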
So, by checking the signature of a commit (git log --show-signature),
you can verify that this is the same commit that was originally made
to the repository. As far as the git developers know, there is no way
to produce multiple colliding git tree objects (at least not without
creating files with spectacularly ugly and long names), so you
know that the tree object pointed to by the signed commit is the original one.
Now, what about the blob objects that the tree lists? If these blobs were regular git files, a SHA1 collision could mean your git repository does not contain the same file that was originally committed, and the signed commit would not help.
But, if the blob object is a git-annex symlink target, it has to contain the strong hash of the file content. If a SHA1 collision swaps in some other blob object, it will need to contain the strong hash of a different file's content. The current common-prefix attack cannot do that.
A chosen-prefix attack could make blobs containing two different strong hashes have the same SHA1, but it would need to include additional data after the hash to do it. Since git-annex version 6.20170224, there is no place for an attacker to put such data in a git-annex symlink target. (See "sha1 collision embedding in git-annex keys" for details of how this was prevented.)
So, we have a SHA1 chain from the gpg signature to the git-annex symlink target,
and at no point in the chain is a SHA1 collision attack feasible.
Finally, git-annex verifies the strong hash when transferring
the content of a file into the repository (and git annex fsck verifies it
too), and so the content that the symlink is pointing to must be the same
content that was originally committed.
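The last step of the argument can be illustrated with a simplified check. This is only a sketch of the idea, not how git-annex implements it: it uses the key format shown in the example symlink above (real git-annex keys can carry extra fields), the function name verify_key is hypothetical, and sha256sum from GNU coreutils is assumed to be available. In a real repository, git annex fsck does this verification for you.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE's SHA-256 matches the hash that
# follows the "--" in KEY (simplified key format: SHA256--<hex digest>).
verify_key() {
    expected=${1##*--}
    actual=$(sha256sum "$2" | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}

# Create some content and derive its key, as if it had just been committed.
workdir=$(mktemp -d)
printf 'original content\n' > "$workdir/data"
key="SHA256--$(sha256sum "$workdir/data" | cut -d' ' -f1)"

if verify_key "$key" "$workdir/data"; then before=ok; else before=mismatch; fi

# Swap in different content; the key recorded in the symlink no longer matches.
printf 'tampered content\n' > "$workdir/data"
if verify_key "$key" "$workdir/data"; then after=ok; else after=mismatch; fi

echo "before: $before, after: $after"
```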
It seems git-annex needs some way to verify that all blobs match its expected format for this security to be strong. My histories have tons of huge binary blobs in them from when I tried upgrading to v6 before it was stable and got a lot of data committed raw. I guess I need to rewrite my history to contain only verified blobs?
It's really too bad git hasn't upgraded to a newer hashing function by now. They should make a plan to do that at least once a decade.
To clarify, this only prevents SHA1 collision attacks from causing problems with annexed files. Files checked into the git repository itself are still vulnerable to collision attacks.
With git annex webapp, the automated commits seemed to bypass my normal global git config and not add a signature. Are we still concerned about this? Well, git has a workaround for SHA1's insecurity and will eventually change hashes. There are plenty of other reasons to want to sign git commits, certainly.
The webapp bypasses gpg signing because it commits automatically and potentially frequently, and depending on how gpg handles password prompting, that could flood the user with repeated password prompts. But you can change this default with the annex.allowsign configuration.
(Commits to the git-annex branch are also not signed by default, for similar reasons. Also, the risks of SHA1 collisions involving the git-annex branch seem small to nonexistent, since that branch only records bookkeeping information git-annex cares about, and a small amount of configuration. git-annex does not use data from that branch in any way that would let an untrusted person who modified the branch do anything malicious.)
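For anyone who wants signatures on those automatic commits too, a minimal sketch of the two settings involved follows. It assumes annex.allowsign is a boolean that works together with commit.gpgSign, and that your gpg setup can sign without prompting each time (for example through an agent). The scratch repository is only there so the sketch runs anywhere; in practice, run the two git config commands inside your own clone.

```shell
#!/bin/sh
# Scratch repository standing in for your own clone.
repo=$(mktemp -d)
git init --quiet "$repo"

# Ask git to sign commits, and let git-annex's automatic commits be signed too.
git -C "$repo" config commit.gpgSign true
git -C "$repo" config annex.allowsign true

# Read the settings back to confirm they are in place.
gpgsign=$(git -C "$repo" config --get --bool commit.gpgSign)
allowsign=$(git -C "$repo" config --get --bool annex.allowsign)
echo "commit.gpgSign=$gpgsign annex.allowsign=$allowsign"
```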