I said I was going to stop with the ByteString conversion, but then I looked at profiling, and I knew I couldn't stop there -- conversion between String and ByteString had became a major cost center.
So today, converted all the code that reads and parses symlinks and pointer files to ByteString, now ByteString is used all the way from disk to Key. Also put in some caching, so git-annex does not need to re-serialize a Key that it's just deserialized from a ByteString.
There's still some ByteString to String conversion when generating FilePaths; to avoid that will need an equivilant of System.FilePath that operates on RawFilePath, and I don't think there is one yet? But the profiling does show improvement, it's more and more dominated by IO operations that can't be sped up, and less by slow code.
This really does feel like a stopping place now.
Updated benchmarks (compared to last git-annex release):
find on 10000 files, none present... 8% speedup
whereis on 1000 files............... 12% speedup
info on dir with 1000 files......... 7% speedup
local get ; drop of 1000 files...... 4% speedup
setting metadata in 1000 files...... 8% speedup
getting metadata from 1000 files.... 7% speedup
finding a single file out of 1000 that has a given metadata value... 8% speedup