I recently discovered (thanks to Paul Wise) the Meow hash. The TL;DR is that it's a fast non-crypto hash which might be useful for git-annex. Here's their intro, quoted from the website:

The Meow hash is a high-speed hash function named after the character Meow in Meow the Infinite. We developed the hash function at Molly Rocket for use in the asset pipeline of 1935.

Because we have to process hundreds of gigabytes of art assets to build game packages, we wanted a fast, non-cryptographic hash for use in change detection and deduplication. We had been using a cryptographic hash (SHA-1), but it was unnecessarily slowing things down.

To our surprise, we found a lack of published, well-optimized, large-data hash functions. Most hash work seems to focus on small input sizes (for things like dictionary lookup) or on cryptographic quality. We wanted the fastest possible hash that would be collision-free in practice (like SHA-1 was), and we didn't need any cryptographic security.

We ended up creating Meow to fill this niche.

I don't have an immediate use case for this right now, but I think it could be useful to speed up checks on larger files. The license is a little weird but seems close enough to a BSD license to be acceptable.

I know it might sound like a conflict of interest, but I swear I am not bringing this up only as an oblique feline reference. ;) -- anarcat