Some git-annex backends allow embedding enough data in the names of keys that it could be used for a SHA1 collision attack. So, a signed git commit could point to a tree with such a key in it, and the blob for the key could have two versions with the same SHA1.

All issues below are done --Joey

Users who want to use git-annex with signed commits to mitigate git's own SHA1 insecurities would like at least a way to disable the insecure git-annex backends:

  • WORM can contain fairly arbitrary data in a key name
  • URL too (also, of course, URLs download arbitrary data from the web, so a signed git commit pointing at URL keys doesn't have any security even w/o SHA1 collisions)
  • SHA1 and MD5 backends are insecure because there can be colliding versions of the data they point to.

There could be a config setting, which would prevent git-annex from using keys with such insecure backends. A user who checks git commit signatures could enable the config setting when they initially clone their repository. This should prevent any file contents using insecure backends from being downloaded into the repository. (Even git-annex-shell recvkey would refuse data using such a key, since it would fail parsing the key.) The user would thus know that any file contents in their repository match the files in signed git commits.

Enabling the config setting in a repository that already contains file contents would be a mistake, because it might contain insecure keys. And since git-annex would skip over such files, git annex fsck cannot warn about such a mistake.

Perhaps, then, the config setting should be turned on by git annex init? Or, we can document this gotcha.

I've done some groundwork for this, but making git-annex not accept insecure keys into the repo at all requires changing file2key, which is a pure function that's used in eg, instances for serialization.

So, how to make it vary depending on git config? Can't. Alternative would be to add lots of checks everywhere a key is read from disk or network, which feels like it would be a hard security boundary to manage.

It doesn't really matter if content under an insecure key is in the repo, as long as there's not a signed commit referencing such a key. So, we could say, this is up to the user constucting a signed commit, to not put such keys in the commit.

Or, we could use the pre-commit hook, and when the config setting disallows insecure keys, make it reject commits that contain them. But, if a past commit added a file using an insecure key, and the current commit does not touch it, should it be rejected? Rejecting it would then require a somewhat expensive look at the tree being committed.

The user might be merging a branch from someone else; there seems no git hook that can sanity check a fast-forward merge.

Perhaps leave it up to the person making signed commits to get it right, and make git annex fsck warn about such keys? That seems reasonable. --Joey

Rather than preventing SHA1/URL/WORM Keys, could put checks in Annex.Content.moveAnnex to prevent SHA1/URL/WORM objects reaching the repository. That would make moveAnnex a security boundary, which is is not currently. Would need to audid to check if anything else populates .git/annex/objects.

Annex.Transfer.runTransfer could also check for disallowed objects, not as a security boundary, but to prevent accidental expensive transfers that would fail at the moveAnnex stage.

As to how to enable this, it may make sense to use git-annex-config but only read the value from the git-annex branch when initializing the repository, and cache it in git-config.

This way, a repository can be created and configured not to allow SHA1/URL/WORM, and all clones will inherit this configuration.

Users can also set it in git-config on a per repository basis.

If the git-annex-config setting is changed, existing clone's won't change their behavior, although new ones will. That's a mixed blessing; it makes it harder to switch an existing repo to disallowing SHA1/URL/WORM, but an accidental/malicious re-enabling won't affect clones made while it was disabled.

This is done now.

Could a repository be configured to either always disallow SHA1/URL/WORM, or always allow them, and then not let that be changed? Maybe -- Look through all the history of the git-annex branch from the earliest commit forward. The first value stored in git-annex/disableinsecurehashes (eg 0 or 1) is the value to use; any later changes are ignored. That would be a little slow, but only needs to be done at init time. It might be possible to fool this though. Create a new empty branch, with an old date, make a commit enabling insecure hashes, and merge it into git-annex branch HEAD. It now looks as if insecure hashes were disabled earliest.

Well, annex.securehashesonly is implemented now. It currently needs to be set in each clone that cares about it. --Joey


A few other potential problems:

  • A symlink target like .git/annex/objects/XX/YY/SHA256--foo might be able to be manipulated to add collision data in the path. For example, .git/annex/objects/collisiondata/../XX/YY/SHA256--foo

    I think this is not a valid attack, because at least on linux, such a symlink won't be followed, unless the .git/annex/objects/collisiondata directory exists.

  • *E backends could embed sha1 collision data in a long filename extension in a key.

    The recent SHA1 common-prefix attack could be used to exploit this; the result would be two keys that have the same SHA1.

    This can be fixed by limiting the length of an extension allowed in such a key to the longest such extension git-annex has ever supported (probably < 20 bytes or so), which would be less than the size of the data needed for current SHA1 collision attacks. Now done; git-annex refuses to use keys with super long extensions.

  • It might be possible to embed colliding data in a specially constructed key name with an extra field in it, eg "SHA256-cXXXXXXXXXXXXXXX-...". Need to review the code and see if such extra fields are allowed.

    Update: All fields are numeric, but could contain arbitrary data after the number. Could have been used in a common-prefix attack. This has been fixed; git-annex refuses to parse such fields, so it won't work with files that try to exploit this.

  • A symlink target like .git/annex/objects/XX/YY/SHA256--foo might be able to be manipulated to add collision data in the path. For example, .git/annex/objects/collisiondata/../XX/YY/SHA256--foo

    I think this is not a valid attack, because at least on linux, such a symlink won't be followed, unless the .git/annex/objects/collisiondata directory exists.