This page exists to collect profiling data about git-annex, so it can be referred to later. If you have a specific instance where git-annex seems unnecessarily slow, please file a bug report about it.
Built git-annex with profiling, using
stack build --profile
(For reproducibility, running git-annex in a clone of the git-annex repo https://github.com/RichiH/conference_proceedings with rev 2797a49023fc24aff6fcaec55421572e1eddcfa2 checked out. It has 9496 annexed objects.)
Profiling git-annex find +RTS -p:
This is interesting!
Fully 40% of CPU time and allocations are in list (really String) processing, and the details of the profiling report show that spanList, startsWith, and join are all coming from calls to replace in keyFile and fileKey. Both functions nest several calls to replace, so perhaps that could be unwound into a single pass, and/or a ByteString could be used to do it more efficiently.
12% of run time is spent calculating the md5 hashes for the hash directories for .git/annex/objects. Data.Hash.MD5 is from missingh, and it is probably a quite unoptimised version. Switching to the version in cryptonite would probably speed it up a lot.
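To illustrate the single-pass idea for keyFile and fileKey, here is a minimal sketch. The escaping table and the names escapeSketch and unescapeSketch are assumptions made for illustration; they are not taken from the profile and may not match what git-annex actually does.

```haskell
-- Hypothetical escaping table, for illustration only:
--   '&' -> "&a", '%' -> "&s", ':' -> "&c", '/' -> "%"

-- Escape in one traversal: each input character is mapped independently,
-- so no intermediate strings are built by repeated replace calls.
escapeSketch :: String -> String
escapeSketch = concatMap esc
  where
    esc '&' = "&a"
    esc '%' = "&s"
    esc ':' = "&c"
    esc '/' = "%"
    esc c   = [c]

-- Inverse of escapeSketch, also a single traversal.
unescapeSketch :: String -> String
unescapeSketch ('&' : 'a' : cs) = '&' : unescapeSketch cs
unescapeSketch ('&' : 's' : cs) = '%' : unescapeSketch cs
unescapeSketch ('&' : 'c' : cs) = ':' : unescapeSketch cs
unescapeSketch ('%' : cs)       = '/' : unescapeSketch cs
unescapeSketch (c : cs)         = c : unescapeSketch cs
unescapeSketch []               = []
```

Because every character is handled exactly once, the order-of-replacement concerns that come with nested replace calls do not arise. The same structure should carry over to ByteString by swapping concatMap for a ByteString builder.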
Instead of profiling git annex copy --to remote, I profiled git annex find --not --in web, which needs to do the same kind of location log lookup.
The adjustGitEnv overhead is a surprise! It seems it is getting called once per file, and allocating a new copy of the environment each time. Call stack: withIndex calls withIndexFile calls addGitEnv calls adjustGitEnv. Looks like simply making gitEnv be cached at startup would avoid most of the adjustGitEnv slowdown.
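A rough sketch of what caching the environment at startup could look like follows. RepoSketch, newRepoSketch, and addEnvSketch are hypothetical names invented for this example; the real git-annex types and functions differ.

```haskell
import System.Environment (getEnvironment)

-- Hypothetical stand-in for the repository state; it carries an
-- environment that was read from the process exactly once.
newtype RepoSketch = RepoSketch { cachedEnv :: [(String, String)] }

-- Read the process environment a single time, at startup.
newRepoSketch :: IO RepoSketch
newRepoSketch = RepoSketch <$> getEnvironment

-- Set or override one variable in the cached environment.
-- This is pure: it never re-reads the process environment.
addEnvSketch :: String -> String -> RepoSketch -> RepoSketch
addEnvSketch var val r =
    r { cachedEnv = (var, val) : filter ((/= var) . fst) (cachedEnv r) }
```

With this shape, a per-file operation that needs an adjusted environment (for example one that sets GIT_INDEX_FILE) only pays for a list update, not for a fresh copy of the whole process environment.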
(The catchIO overhead is a false reading; the detailed profile shows that all its time and allocations are inherited. getAnnexLinkTarget is running catchIO in the expensive case, so readSymbolicLink is the actual expensive bit.)
The parsePOSIXTime comes from reading location logs. It's implemented using a generic Data.Time.Format.parseTime, which uses a format string "%s%Qs". A custom parser that splits into seconds and picoseconds and simply reads both numbers might be more efficient.
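As a sketch of the custom parser idea, the following splits the input into a seconds part and an optional fractional part and reads both as plain numbers. The name parsePOSIXTimeSketch is hypothetical, and the code assumes non-negative timestamps with a trailing literal s, as the format string "%s%Qs" suggests.

```haskell
import Control.Monad (guard)
import Data.Char (isDigit)
import Data.Ratio ((%))
import Data.Time.Clock.POSIX (POSIXTime)

-- Parse strings such as "1500000000.25s" or "1500000000s".
-- Negative timestamps are not handled in this sketch.
parsePOSIXTimeSketch :: String -> Maybe POSIXTime
parsePOSIXTimeSketch str = do
    -- Whole seconds: a non-empty run of digits.
    let (secs, rest) = span isDigit str
    guard (not (null secs))
    -- Optional fraction: a '.' followed by a non-empty run of digits.
    (frac, end) <- case rest of
        '.' : r -> do
            let (f, e) = span isDigit r
            guard (not (null f))
            Just (f, e)
        _ -> Just ("", rest)
    -- The literal 's' from the format string must end the input.
    guard (end == "s")
    let whole = read secs :: Integer
        fracPart
            | null frac = 0
            | otherwise = read frac % (10 ^ length frac)
    return (fromRational (fromInteger whole + fracPart))
```

The fraction is converted through an exact Rational, so no precision is lost before the value becomes a POSIXTime. Whether this is actually faster than the generic parser would need to be confirmed by profiling.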
catObjectDetails.receive is implemented using mostly String and could probably be sped up by being converted to use ByteString.
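For illustration, here is one way the header line that git cat-file --batch prints (object name, type, and size, separated by spaces) could be parsed with ByteString instead of String. HeaderSketch and parseHeaderSketch are hypothetical names, and this is not the actual catObjectDetails code.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical record for a parsed "<sha> <type> <size>" header line.
data HeaderSketch = HeaderSketch
    { hSha  :: B.ByteString
    , hType :: B.ByteString
    , hSize :: Int
    } deriving (Eq, Show)

-- Parse one header line without converting it to String.
-- Anything that is not exactly three fields with a numeric size,
-- such as a "<object> missing" response, yields Nothing.
parseHeaderSketch :: B.ByteString -> Maybe HeaderSketch
parseHeaderSketch line = case B.words line of
    [sha, ty, sz] -> case B.readInt sz of
        Just (n, rest) | B.null rest -> Just (HeaderSketch sha ty n)
        _ -> Nothing
    _ -> Nothing
```

B.words and B.readInt work directly on the underlying bytes, which avoids allocating a linked list of Char for every header.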
After all that, profiling git-annex find:
And git-annex find --not --in web:
So, quite a large speedup overall!
This leaves md5 still unoptimised at 10-28% of CPU use. I looked at switching it to cryptohash's implementation, but it would require quite a lot of bit-banging math to pull the used values out of the ByteString containing the md5sum.
Switched from MissingH to cryptonite for md5. It did move md5 out of the top CPU spot but the overall runtime didn't change much. Memory allocations did go down by a good amount.
Updated profiles:
After switching many internal types to ByteString.
(Note that stack build --profile built this with -O, not -O2, so it's not as fast as it ought to be, but the cost centers are probably fairly accurate still.)
Notice that the percent of time inAnnex' went up from 14.1% to 31.6%. That and getAnnexLinkTarget are the meat of the IO, so it's good for them to get a higher percent of the CPU, to the extent they're IO bound. It seems like getAnnexLinkTarget also lost a lot of non-IO overhead.
There are still some overheads from conversion to and from ByteString, but the above does seem like a good improvement.
Notice that allocations dropped by 1/3rd!
Otherwise, not a large change here.
After caching serialized Keys.
Runtime improved by 5% or so, and getAnnexLinkTarget moved up, otherwise not a lot of change. keyFile is looking like an optimization target, although its percent of the runtime actually reduced. However that's specific to this repo which has a lot of URL keys that contain '/' and so need to be escaped.
Ditto.
Updated profiling. git-annex find is now ByteString end-to-end! Note the massive reduction in alloc, and improved runtime.
Update after some recent optimisations involving seekFilteredKeys.