Spent rather too long today tracking down a memory leak in
git-annex unused.
Actually, it was three memory leaks; one of them was a regression introduced
while otherwise improving a function to not be partial. Another only
happened in very rare circumstances. The third, which took several more
hours staring at the code, turned out to simply be an unnecessary use of an
accumulating list. Feel like I should have seen that one sooner, but then I
am under the weather and was running profiles in a daze for several hours.
In the end,
git-annex unused went from needing 1 GB of memory to 150 MB
in my big repo.
One advantage of all the profiling, though, was that I noticed that the split
function was allocating a lot of memory, and seemed generally inefficient. This
has to do with it splitting on a string; splitting on a single character
can run twice as fast and churn the GC quite a bit less, so I wrote up a
specialized version of that, and it's used extensively in git-annex now, so
it may run up to 50% faster in some cases. It seems that Haskell libraries'
split functions should perhaps use the faster version
when splitting on a single character, and I'm going to file bugs to that effect.
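
To illustrate the idea, a single-character split can be sketched roughly as below. This is only a minimal sketch and not necessarily the exact code in git-annex; the name `splitc` and the use of `break` are assumptions made for the example. The point is that each step only compares one element against the separator, rather than matching a multi-character delimiter at every position.

```haskell
-- Hypothetical helper, named splitc for this example only.
-- Splits a list on a single separator element. Adjacent separators
-- yield empty chunks, and an empty input yields one empty chunk.
splitc :: Eq c => c -> [c] -> [[c]]
splitc c s = case break (== c) s of
    -- No separator left: the remainder is the final chunk.
    (chunk, []) -> [chunk]
    -- Found a separator: keep the chunk, drop the separator, recurse.
    (chunk, _sep:rest) -> chunk : splitc c rest
```

Something like `splitc ':' "a:b::c"` would then give `["a","b","","c"]`. Since the result is produced lazily, one chunk at a time, a consumer can process each piece without the whole list of chunks being built up first.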
Today's work was sponsored by Jake Vosloo on Patreon.