Git annex is really amazing software, and this cute little scenario is actually Linux's fault. But it's a nasty situation nonetheless, and worth passing on.
...
Create a client repository on your laptop and two backup repositories on external USB drives. Keep all repositories mounted and connected. Now drop 60GB of data spread over 30,000 files into your annex, and watch git annex assistant start adding and syncing. So far, so good.
Now wait a few hours, and watch some kernel crypto code—probably ecryptfs—fall over with a segfault.
Since your laptop and USB drives are all running ext4, the sudden kernel panic will leave you with hundreds of 0-length files. Because git annex assistant was busily adding and syncing files, those 0-length files are spread randomly throughout all your git repositories (typically in .git/objects) and throughout all the associated annexes. Unfortunately, because git annex assistant generates tons of commits, this is pretty much unrecoverable using standard git tools unless you're willing to get deep into the repositories' internals.
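To get a feel for the extent of the damage, a minimal sketch along these lines can list the 0-length files under a repository. The function names `find_empty_files` and `count_empty_files` are hypothetical helpers, not part of git or git-annex, and the sketch assumes the usual layout where loose git objects live under .git/objects.

```shell
#!/bin/sh
# Hypothetical helpers (not part of git or git-annex).

# Print every zero-length regular file below the directory given as $1,
# e.g. /path/to/repo/.git/objects.
find_empty_files() {
    find "$1" -type f -size 0 -print
}

# Print only the number of zero-length files below $1.
count_empty_files() {
    find_empty_files "$1" | wc -l | tr -d ' '
}
```

For example, `count_empty_files /path/to/repo/.git/objects` gives a quick damage count. A 0-length file under .git/objects is a strong sign of trouble, but in the annex an empty file may be legitimate if you annexed a file that really was empty, so treat hits there as candidates only.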
So what should you do, if you want to add tens of thousands of files and there's some risk of a kernel panic or of accidentally bumping a USB cable? Here's my recommendation to limit the damage:
1. Use the command line if possible.
2. Add all your files with your remotes offline.
3. Run git gc on your central repository, just on general principles.
4. Mount one repository at a time.
5. Sync the pure git data first, and then make sure that all disk I/O is flushed (sync; sleep 10 is a good approximation).
6. Use git annex copy --to to move the annex data.
7. Unmount the USB repository cleanly and move on to the next one.
If you do bump a USB cable in the middle of step (6), then:
- Run 'git annex fsck' to clean up any garbage files.
- Try another 'git annex copy --to' where you left off.
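The per-remote part of the procedure and the recovery path might be scripted roughly as follows. This is only a sketch under some assumptions: `run`, `DRY_RUN`, `push_to_remote` and `resume_copy` are hypothetical names, the remote name is whatever you gave the USB repository, and using `git annex sync REMOTE` for the "pure git data" step is my guess at the intended command (depending on your git-annex version and configuration, sync may also transfer file contents).

```shell
#!/bin/sh
# Hypothetical helpers sketching the one-remote-at-a-time procedure.
# Run them from inside the client repository.

# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Push to one mounted remote: git data first, then flush, then annex data.
# $1 is the remote name (an assumption, e.g. "usb1").
push_to_remote() {
    remote=$1
    # Sync the git data (assumed command for this step).
    run git annex sync "$remote" || return 1
    # Flush pending disk I/O and give the drives a moment to settle.
    run sync || return 1
    run sleep 10 || return 1
    # Only now move the bulky annex data.
    run git annex copy --to "$remote" .
}

# Recovery after an interrupted copy: fsck, then retry the copy.
resume_copy() {
    remote=$1
    run git annex fsck || return 1
    run git annex copy --to "$remote" .
}
```

Setting `DRY_RUN=1` before calling `push_to_remote usb1` prints the commands instead of running them, which is a cheap way to check the order before touching real data. Mounting and unmounting the drive are deliberately left out, since they depend on your system.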
Wiser minds than I are encouraged to suggest optimizations for the recovery steps.
The theory behind these steps is to only do one thing at a time, and to expose as few remotes as possible to a power failure or crash. A secondary goal is to make sure that pure git operations complete very quickly, limiting the risk that they will be interrupted, because they're the hardest operations to recover from after a crash.
Well, it's not like my machine kernel panics on a regular basis or anything.
This is the first time I ever saw the kernel encryption code do this. I'm running a boring stock install of Ubuntu 12.04 LTS that was preloaded by ZaReason, and I'm using the ecryptfs home directory encryption supplied by Ubuntu. So in this case, "stop using kernel features that crash" means "stop using Ubuntu on supported hardware."
The underlying problem is that ext4 allocates file contents lazily and out-of-order, and it may wait a surprisingly long time before actually flushing data to disk:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781
http://linux.bihlman.com/2010/learn-linux-help/how-to-solve-zero-length-file-problem-in-linuxs-ext4-file-system/
The problem is that git annex assistant and git don't sync the disks all that often, and that the assistant can generate huge amounts of complicated disk I/O across multiple volumes for hours on end. All you need to do is go through the walkthrough, create the specified repositories including one on a USB drive, and throw 50GB of data into the annex directory. The assistant will happily grind away overnight, and if anything prevents ext4 from flushing data, there's a good chance you'll wind up with multiple corrupted repositories with hundreds of git fsck errors.
There are some potential workarounds:
- Get git and git annex assistant to call sync or fsync more aggressively on local volumes.
- Make sure git's data is in a known-good state before trying to copy the annex files. The annex files are much easier to recover than git's state.
Anyway, I doubt this is really fixable. And it's not really git annex's fault, in any case. But I'm really glad I had recent backups of all my data last night, which allowed me to checksum everything and start from scratch....
Leaving aside this incident, git annex is one of the nicest pieces of open source software I've seen in a long time, and it's clearly going to change how I use my computer. And thank you for posting the crowd-funding campaign so we can say "Thanks!"
git config core.fsyncobjectfiles true
This will make git fsync all the data it writes. Whether it's a good default, I don't know.
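Since this is a per-repository setting unless you set it globally, it would need to be turned on in the client repository and in each USB repository. A small sketch, where `enable_object_fsync` is a hypothetical helper name and the path argument is whatever repository you want to change:

```shell
#!/bin/sh
# Hypothetical helper: turn on core.fsyncobjectfiles in the repository at $1
# and print the resulting value so you can confirm it took effect.
enable_object_fsync() {
    (
        cd "$1" || exit 1
        git config core.fsyncobjectfiles true || exit 1
        git config --get core.fsyncobjectfiles
    )
}
```

Calling `enable_object_fsync /path/to/repo` should print `true`. Newer Git releases document a more general `core.fsync` setting and mark `core.fsyncObjectFiles` as deprecated, so it is worth checking `git help config` for the version you actually run.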
Oh, I've had that type of crash from ecryptfs many times over many different versions of Ubuntu, also before I even encountered git-annex.
Git-annex might provoke this issue, but it is by no means the only thing that can crash your system when running ecryptfs.
I highly suggest you try LUKS for a stable experience.
What a good idea! I've turned on core.fsyncobjectfiles. This should definitely help avoid the worst damage when ext4 filesystems get unmounted while dirty. Thank you very much for addressing this issue so quickly, even though it has more to do with ext4 than git annex.
develop: Thank you for the suggestion to use LUKS; that's good to know.
If the string " ecryptfs " (with surrounding spaces) appears in /etc/mtab, then there's at least one ecryptfs volume on the system. More specific parsing will tell you where it's mounted.
Ubuntu released new kernels fixing the worst ecryptfs bugs in 12.04 LTS. The current version is OK, but perhaps not exactly great. Some older versions are public menaces with awful data corruption bugs.
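The /etc/mtab check mentioned earlier in this thread could look something like the sketch below. `has_ecryptfs` and `ecryptfs_mounts` are hypothetical helper names; both take an optional file argument (defaulting to /etc/mtab) so they can be tried against a copy of the file.

```shell
#!/bin/sh
# Hypothetical helpers for spotting ecryptfs mounts in an mtab-style file.
# Each line has the fields: device, mount point, filesystem type, options, ...

# Succeed if the string " ecryptfs " (with surrounding spaces) appears.
has_ecryptfs() {
    grep -q ' ecryptfs ' "${1:-/etc/mtab}"
}

# Print the mount point of every entry whose filesystem type is ecryptfs.
# Mount points containing spaces appear in escaped form (e.g. \040).
ecryptfs_mounts() {
    awk '$3 == "ecryptfs" { print $2 }' "${1:-/etc/mtab}"
}
```

A script could then do something like `if has_ecryptfs; then ecryptfs_mounts; fi` to warn when a repository path sits under one of the printed mount points.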