Two entire days spent making a branch where git-annex uses ByteString instead of String, especially for filepaths. I commented out all the commands except for find, but it still took thousands of lines of patches to get it to compile.
The result: git-annex find is between 28% and 66% faster when using ByteString. The files just fly by!
It's going to be a long, long road to finish this, but it's good to have a start, and know it will be worth it. "optimize by converting String to ByteString" is the tracking page for this going forward.
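For anyone wondering why the representation matters so much: a Haskell String is a linked list of Char, while a ByteString is a packed array of bytes, so even a simple check on a filepath touches far less memory. Here is a minimal sketch of the same filter written both ways. This is not git-annex's actual code; the function names `hasExtString` and `hasExtBytes` and the sample paths are made up for illustration, and it assumes only the `bytestring` package that ships with GHC.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (isSuffixOf)

-- String version: the filepath is a linked list of Char,
-- so the suffix check walks list cells one at a time.
hasExtString :: String -> FilePath -> Bool
hasExtString ext path = ext `isSuffixOf` path

-- ByteString version: the filepath is a packed byte array,
-- so the suffix check compares contiguous bytes.
hasExtBytes :: B.ByteString -> B.ByteString -> Bool
hasExtBytes ext path = ext `B.isSuffixOf` path

main :: IO ()
main = do
  -- Hypothetical sample paths, for illustration only.
  let paths = ["a/b.txt", "c.jpg", "d/e/f.txt"] :: [FilePath]
      ext = ".txt"
  -- Both filters select the same files; only the representation differs.
  print (filter (hasExtString ext) paths)
  print (filter (hasExtBytes (B.pack ext)) (map B.pack paths))
```

Note that `B.pack` here is itself a String-to-ByteString conversion, which is exactly the kind of cost a real conversion has to remove by keeping paths as ByteString from end to end. That is why a change like this tends to spread through a whole code base.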
A 66% performance improvement is an amazing number! I take it this will be especially good for repositories with a large number of files? If so, this could make my life MUCH better!
I wonder if this connects with the problems gorzen identified in Python 3 about POSIX paths... Does Haskell have similar problems with non-Unicode filenames?
In any case, I thank you for this awesome work...
This is great.
One other potential for speedup is fixing issues with parallel operations. My current fix is to use -J1, giving up a potential 96X speedup. There may also be additional parallel possibilities.