git merge watch_
My cursor has been mentally poised here all day, but I've been reluctant to merge watch into master. It seems solid, but is it correct? I was able to think up a lot of races it'd be subject to, and deal with them, but did I find them all?
Perhaps I need to do some automated fuzz testing to reassure myself. I looked into using genbackupdata to that end. It's not quite what I need, but could be moved in that direction. Or I could write my own fuzz tester, but it seems better to use someone else's, because a) laziness and b) they're less likely to have the same blind spots I do.
My reluctance to merge isn't helped by the known bugs with files that are either already open before git annex watch starts, or are opened by two processes at once, and confuse it into annexing the still-open file when one process closes it.
I've been thinking about just running lsof on every file as it's being annexed to check for that, but in the end, lsof is too slow. Since its check involves trawling through all of /proc, it takes it a good half a second to check a file, and adding some 50 seconds to the time it takes to process 100 files is just not acceptable.
But an option that could work is to run lsof after a bunch of new files have been annexed. It can check a lot of files nearly as fast as a single one. In the rare case that an annexed file is indeed still open, it could be moved back out of the annex. Then when its remaining writer finally closes it, another inotify event would re-annex it.
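To make the batched check concrete, here is a rough sketch in Python. It is not git-annex's actual code. The function names (run_lsof, parse_open_names, still_open) are made up for illustration, and it assumes lsof reports each open file under its resolved absolute path, which is why the comparison goes through os.path.realpath.

```python
import os
import subprocess
from typing import Callable, Iterable, List, Set


def run_lsof(paths: List[str]) -> str:
    """Run lsof once over all paths and return its field-format output."""
    # -F n selects machine-readable output that includes file names
    # (lines starting with 'n'). lsof exits non-zero when a listed file
    # has no open handles, so the exit status is deliberately ignored.
    result = subprocess.run(
        ["lsof", "-F", "n", "--"] + paths,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=False,
    )
    return result.stdout


def parse_open_names(lsof_output: str) -> Set[str]:
    """Collect the file names from lsof's 'n' field lines."""
    return {line[1:] for line in lsof_output.splitlines() if line.startswith("n")}


def still_open(
    paths: Iterable[str],
    lsof: Callable[[List[str]], str] = run_lsof,
) -> List[str]:
    """Return the subset of paths that some process still has open.

    All paths go to a single lsof invocation, so the cost of scanning
    /proc is paid once per batch rather than once per file.
    """
    batch = list(paths)
    if not batch:
        return []
    open_names = parse_open_names(lsof(batch))
    # Assumption: lsof names files by their resolved absolute path.
    return [p for p in batch if os.path.realpath(p) in open_names]
```

Anything still_open returns would be moved back out of the annex and left for the next inotify event; that step is specific to git-annex and is not sketched here.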
Wasn't there some filesystem functionality that could tell you the number of open file handles on a certain file? I thought this was tracked per-file too. Or maybe I'm just confusing it with the number of hard links (which stat can tell you). Anyway, something to look into.
Corner case, but if the other program finishes writing while you are annexing and your check shows no open files, you are left with a bad checksum on a correct file. This "broken" file will propagate, and the next round of fsck will show that all copies are "bad".
Without verifying if this is viable, could you set the file RO and thus block future writes before starting to annex?
@wichert All this inotify stuff is entirely Linux specific AFAIK anyway, so it's fine for workarounds to limitations in inotify functionality to also be Linux specific.
@dieter I think you're thinking of hard links; filesystems don't track the number of open file handles afaik.
@Jimmy, I'm planning to get watch going on FreeBSD (and hopefully that will also cover OSX), after merging it.
@Richard, the file is set RO while it's being annexed, so any lsof would come after that point.