Spent rather too long today tracking down a memory leak in git annex unused.
Actually, it was three memory leaks. One of them was a regression introduced
while otherwise improving a function to not be partial. Another only
happened in very rare circumstances. The third, which took several more
hours of staring at the code, turned out to simply be an unnecessary use of an
accumulating list. Feel like I should have seen that one sooner, but then I
am under the weather and was running profiles in a daze for several hours.
In the end, git-annex unused went from needing 1 GB of memory to 150 MB
in my big repo.
One advantage of all the profiling, though, was that I noticed that the split
function was allocating a lot of memory, and seemed generally inefficient. This
has to do with it splitting on a string; splitting on a single character
can run twice as fast and churn the GC quite a bit less. So I wrote up a
specialized version of that, and git-annex now uses it extensively, so
it may run up to 50% faster in some cases. Seems like Haskell libraries
with a split function should perhaps use the more optimal version
when splitting on a single character, and I'm going to file bugs to that
effect.
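To give an idea of what splitting on a single character can look like, here is a minimal sketch built on `break` from the Prelude. The name `splitOnChar` is made up for illustration, and this is not necessarily the exact code that went into git-annex. The point is that each step only compares one element against the separator, rather than checking for a whole delimiter string at every position.

```haskell
-- A minimal sketch; splitOnChar is a hypothetical name, not a library function.
-- Splits a list on every occurrence of a single separator element.
-- Empty fields are kept, so an empty input yields one empty field.
splitOnChar :: Eq a => a -> [a] -> [[a]]
splitOnChar sep s = case break (== sep) s of
    -- Found a separator: emit the field before it, continue after it.
    (field, _sep : rest) -> field : splitOnChar sep rest
    -- No separator left: the remainder is the final field.
    (field, []) -> [field]
```

For example, `splitOnChar ',' "a,b,,c"` gives `["a","b","","c"]`. A general split function could dispatch to something like this whenever its delimiter has length one.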
Today's work was sponsored by Jake Vosloo on Patreon.