Hi,
I use git-annex 3.20120123 on a Debian testing amd64 machine with software RAID6 and LVM2 on it. I needed to move the whole /home directory to another LV (the new LV is on an encrypted PV; the old LV is encrypted but not properly aligned, since I'm changing from an encrypted /home only to everything encrypted except /boot), so I used rsync -aAXH to copy from a read-only-mounted /home to a new LV mounted on /mnt/home_2.

After the move was complete I ran git annex fsck on my (4 TB of) data. The fsck finds some files bad and moves them to the ..../bad directory. So far so good; this is how it should be, right? But then: I have a file with the sha1sums of all my files, so I checked the 'bad' file against that. It was OK. Then I computed the SHA-256 of the file, which is the checksum git annex fsck uses. It was OK, too. So how did it happen that the file was marked as bad? Am I missing something here? Could it be related to the hardware (HDDs) and silent data corruption? Or is it an undesirable effect of rsync? Or maybe fsck is at fault here?
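The double-check described above can be sketched like this; a throwaway file stands in for the file fsck flagged, and my-checksums.sha1 is a hypothetical name for the stored list of sha1sums:

```shell
#!/bin/sh
# Sketch only: a temporary file stands in for the flagged file, and
# my-checksums.sha1 stands in for the list recorded before the move.
tmp=$(mktemp -d)
cd "$tmp"
echo 'some content' > flagged-file
sha1sum flagged-file > my-checksums.sha1   # pretend this was recorded earlier
# 1. Check the flagged file against the stored SHA-1 list:
sha1sum -c my-checksums.sha1
# 2. Recompute the SHA-256, the checksum git annex fsck verifies:
sha256sum flagged-file
cd / && rm -rf "$tmp"
```

If both checks pass, the content on disk matches what was recorded before the move.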
Any ideas?
Well, it should only move files to .git/annex/bad/ if their filesize is wrong, or their checksum is wrong.

You can try moving a file out of .git/annex/bad/ and re-running fsck to see if it fails it again. (And if it does, paste in a log!)

To do that: suppose you have a file .git/annex/bad/SHA256-s33--5dc45521382f1c7974d9dbfcff1246370404b952 and you know that the file foobar was supposed to have that content (you can check that foobar is a symlink to that SHA value). Then reinject it:

git annex reinject .git/annex/bad/SHA256-s33--5dc45521382f1c7974d9dbfcff1246370404b952 foobar
Thanks, joey, but I still do not know why a file that has been (and is) OK according to separate SHA-1 and SHA-256 checks was marked 'bad' by fsck and moved to .git/annex/bad. What could be the reason for that? Could rsync have caused it? I know too little about the internal workings of git-annex to answer this question.

But one thing I know for certain: false positives should not happen unless something is actually wrong with the file. Otherwise, if it is unreliable and I have to check everything twice, it is useless. I might as well just keep checksums of all the files and do all the checks by hand...
All that git annex fsck does is checksum the file and move it away if the checksum fails.
If bad data was somehow read from the disk that one time, what you describe could occur. I cannot think of any other way it could happen.
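The check fsck performs can be sketched like this, assuming the SHA256 backend: a key like SHA256-s33--5dc4... records the expected file size (after the s) and the expected SHA-256 hash, and fsck recomputes both from the file on disk:

```shell
#!/bin/sh
# Sketch only: build a key for a throwaway file, then re-verify it the
# way fsck would (size check plus checksum check).
tmp=$(mktemp)
printf 'hello annex' > "$tmp"
size=$(wc -c < "$tmp" | tr -d ' ')
hash=$(sha256sum "$tmp" | cut -d' ' -f1)
key="SHA256-s${size}--${hash}"
echo "$key"
# Pull the expected size and hash back out of the key:
want_size=${key#SHA256-s}; want_size=${want_size%%--*}
want_hash=${key##*--}
# The verification fsck performs:
if [ "$(wc -c < "$tmp" | tr -d ' ')" = "$want_size" ] &&
   [ "$(sha256sum "$tmp" | cut -d' ' -f1)" = "$want_hash" ]
then echo "ok"
else echo "bad: would be moved to .git/annex/bad/"
fi
rm -f "$tmp"
```

A single misread from disk during that one checksum pass would make the comparison fail even though the stored content is fine.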
OK, thanks. I was just wondering, since there are symlinks in git(-annex), and perhaps hard links too, whether the issue might have been caused by rsync. I will keep an eye on that, run checks with my own checksums and with fsck from time to time, and see what happens. I will post my results here, but a whole run (fsck or checksumming) takes almost 2 days, so I will not do it too often...

The symlinks are in the git repository. So if rsync damaged one, git would see the change. And nothing that happens to the symlinks can affect fsck.
git-annex does not use hard links at all.
fsck corrects mangled file permissions. It is possible to screw up the permissions so badly that it cannot see the files at all (ie, chmod 000 on a file under .git/annex/objects), but then fsck will complain and give up, not move the files to bad. So I don't see how a botched rsync could result in fsck moving a file with correct content to bad.