I noticed something odd when inspecting the history of the git-annex branch today. Apparently, the branch had some merge conflicts during sync that involved two alternative location tracking entries for one and the same remote. The entries differed only in their timestamps, and the union merge kept both, so I now have .log files in the git-annex branch that contain duplicate entries like this:
1404838274.151066s 1 a2401cfd-1f58-4441-a2b3-d9bef06220ad
1406978406.24838s 1 a2401cfd-1f58-4441-a2b3-d9bef06220ad
The UUID here is that of my local repository.
The duplication also occurred in the uuid.log:
4316c3dc-5b6d-46eb-b780-948c717b7be5 server timestamp=1404839228.113473s
4316c3dc-5b6d-46eb-b780-948c717b7be5 server timestamp=1404847241.863051s
Is this something to be concerned about? The situation somehow arose in relation to unannexing a bunch of files and rebasing the master branch.
This is perfectly normal. The next time that file in the git-annex branch is updated for any reason, git-annex will automatically compress the two entries down to a single one. In the meantime, it has no difficulty working out which entry is more recent. This is basically why it's called a log file.
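To illustrate the idea, here is a minimal Python sketch of how duplicate location log entries could be collapsed to the newest one per UUID. It is not git-annex's actual code (git-annex is written in Haskell); the helper names `parse_timestamp` and `compact_location_log` are made up for this example, and the line format is assumed to be the "timestamp status uuid" layout seen in the log excerpt quoted earlier.

```python
from decimal import Decimal


def parse_timestamp(field):
    # The timestamps in the quoted log end in "s"; strip it before parsing.
    # Decimal avoids losing sub-second digits to float rounding.
    return Decimal(field.rstrip("s"))


def compact_location_log(lines):
    """Keep only the newest entry for each UUID (hypothetical helper)."""
    newest = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        timestamp, _status, uuid = line.split()[:3]
        ts = parse_timestamp(timestamp)
        # Replace the stored entry only if this one is more recent.
        if uuid not in newest or ts > newest[uuid][0]:
            newest[uuid] = (ts, line.strip())
    # Return the surviving entries, oldest first.
    return [entry for _, entry in sorted(newest.values())]


log = [
    "1404838274.151066s 1 a2401cfd-1f58-4441-a2b3-d9bef06220ad",
    "1406978406.24838s 1 a2401cfd-1f58-4441-a2b3-d9bef06220ad",
]
compacted = compact_location_log(log)
print(compacted)
```

With the two entries from the question as input, only the entry carrying the later timestamp remains.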
It would be possible to make the union merge code compress as it merges, but this would slow down union merging somewhat and make it a conceptually more complicated operation. Also, whether or not the old entry is present in the file, git will be storing a copy of that old entry in its history, so keeping it doesn't actually tend to make the git repository any larger. For more on this, see https://joeyh.name/blog/entry/databranches/
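As a rough illustration of why both entries survive, a line-based union merge can be sketched in Python as follows. This is a simplified model, not git-annex's implementation; the function name `union_merge` is hypothetical, and each side of the merge is assumed to be a plain list of log lines.

```python
def union_merge(ours, theirs):
    """Line-wise union of two versions of a log file, with no compaction."""
    merged = list(ours)
    seen = set(ours)
    for line in theirs:
        # Only byte-identical lines are treated as duplicates, so two
        # entries that differ just in their timestamp are both kept.
        if line not in seen:
            merged.append(line)
            seen.add(line)
    return merged


ours = ["4316c3dc-5b6d-46eb-b780-948c717b7be5 server timestamp=1404839228.113473s"]
theirs = ["4316c3dc-5b6d-46eb-b780-948c717b7be5 server timestamp=1404847241.863051s"]
merged = union_merge(ours, theirs)
print(len(merged))
```

Because the merge never has to parse the lines, it stays cheap and simple; working out which entry wins is left to whatever reads or next rewrites the file.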
Thanks for the info, Joey! Since git tracks the history anyway, this should not increase space consumption by much.
Perhaps it would be useful to have something like «git annex gc» that can clean up these things manually in some situations, e.g. to compact everything before doing a «git annex forget».