Situation/trouble

I have a set of big repos. Each full replica manages about 172,000 annexed files plus a number of small regular files in git history, for a total of about 1.8TB. At the filesystem level, find reports about 924,000 entries (directories, files, symlinks).

That worked rather well for a while, except that a number of operations are slow (even outside git-annex, e.g. a plain find takes more than an hour the first time). Also, the git part got rather heavy. Hundreds of megabytes that should have been annexed were committed as regular files and vice versa.

The whole setup even survived some catastrophes rather well, for example 6 months ago when the first 1.5GB of one hard drive was accidentally overwritten. fsck with an alternate superblock fixed the lower level, while git annex repair fixed the rest nicely.

Last night, though, the git parts got corrupted and I'm struggling to get things back to a sane state. git log shows only recent history, then fails. Various attempts with git annex repair have failed so far. I'll try again after adding a new local "bare" git clone of the server as a remote for git annex repair to use.

Storage media and filesystems seem sane, still. Software has been unchanged for a long time:

client runs Ubuntu 16.04 with locally compiled git-annex version: 6.20161001-gade6ab4
server runs with locally compiled (in a Debian unstable chroot) git-annex version: 6.20161011-g3135d35 .

Considered solution

I'm considering:

recreating a new set of replicas
each replica on same filesystem as old one
recreated only from the checked-out tree and the .git/annex/objects tree
without copying data (re-reading on import is okay, but no room on a 2TB disk for duplicating 1.8TB)
non-constraint: this will lose detailed history which is an inconvenience we can live with.
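
The "without copying data" constraint rests on `cp -al` creating hard links rather than duplicating file contents. Here is a minimal sketch to confirm that on a given filesystem, assuming GNU coreutils (`cp -l`, `stat -c`) and a scratch directory from `mktemp`; the directory and file names are made up for illustration.

```shell
# Sketch: check that cp -al hard-links files instead of duplicating their data.
# Assumes GNU cp and GNU stat; all paths below are hypothetical scratch paths.
demo=$(mktemp -d)
mkdir -p "$demo/old/sub"
echo "payload" > "$demo/old/sub/file"

# -a preserves the tree and attributes, -l links files instead of copying them.
cp -al "$demo/old" "$demo/new"

# Both names should resolve to the same inode, with a link count of 2.
old_inode=$(stat -c %i "$demo/old/sub/file")
new_inode=$(stat -c %i "$demo/new/sub/file")
link_count=$(stat -c %h "$demo/new/sub/file")
echo "old inode: $old_inode, new inode: $new_inode, link count: $link_count"

rm -rf "$demo"
```

Hard links only work within a single filesystem, which is why each new replica has to live on the same filesystem as the old one.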

Solution, practically

(1) Assuming `git annex fsck` can take into account objects manually placed into `.git/annex/objects`

mkdir $newrepo
cd $newrepo
git init
git annex init
cp -al $oldrepo/* $newrepo/ # ignores .git and other .* (dotfiles)
mkdir -p $newrepo/.git/annex/objects # may not exist yet in a fresh repo
cp -al $oldrepo/.git/annex/objects/* $newrepo/.git/annex/objects/
git annex fsck # will this find and use result of cp ?
# git annex fsck will also tell if some checked-out files lack their annexed data
git remote add ...(other replicas)...
git annex sync ...(other replicas)...
git annex unused # will tell if some files don't appear in checked-out tree?
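
As noted in the comment on the first `cp -al` line, the `*` glob skips dotfiles, so top-level files such as `.gitignore` or `.gitattributes` would be left behind. If those matter, one possible variant, sketched here under the assumption of GNU `find` and GNU `cp` (for `-t`), hard-links every top-level entry except `.git`. The scratch layout stands in for `$oldrepo` and `$newrepo` and is purely illustrative.

```shell
# Hypothetical stand-ins for $oldrepo and $newrepo, built in a scratch directory.
scratch=$(mktemp -d)
oldrepo="$scratch/old"
newrepo="$scratch/new"
mkdir -p "$oldrepo/.git" "$oldrepo/docs" "$newrepo"
echo "example" > "$oldrepo/.gitattributes"
echo "hello" > "$oldrepo/docs/readme"
echo "placeholder" > "$oldrepo/.git/HEAD"

# Hard-link every top-level entry, dotfiles included, except .git itself.
find "$oldrepo" -mindepth 1 -maxdepth 1 ! -name .git \
    -exec cp -al -t "$newrepo" {} +

# Record what arrived before cleaning up.
[ -e "$newrepo/.gitattributes" ] && attrs_copied=yes || attrs_copied=no
[ -e "$newrepo/docs/readme" ] && docs_copied=yes || docs_copied=no
[ -e "$newrepo/.git" ] && git_copied=yes || git_copied=no
echo "dotfile copied: $attrs_copied, docs copied: $docs_copied, .git copied: $git_copied"

rm -rf "$scratch"
```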

(2) Else...

mkdir $newrepo
cd $newrepo
git init
git annex init
cp -al $oldrepo/* $newrepo/ # ignores .git and other .* (dotfiles)
cp -al $oldrepo/.git/annex/objects ${newrepo}.objectdup # hard-links the whole tree under a new name
git annex reinject --known ${newrepo}.objectdup  # will that perform a copy? It must not in this case.
# or something like find "${newrepo}.objectdup" -type f -exec git annex reinject --known {} \;
git annex fsck # will tell if some checked-out files lack their annexed data
git remote add ...(other replicas)...
git annex sync ...(other replicas)...
# if some files in $oldrepo/.git/annex/objects/* don't appear in the checked-out tree, they won't be picked up by reinject and will remain in ${newrepo}.objectdup
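
Independently of git-annex, the two mismatches discussed above (objects with no file in the checked-out tree, and checked-out symlinks with no object) can be estimated with plain filesystem tools. This is only a sketch: it assumes GNU `find` (for `-printf`), symlink-based annexed files whose link target ends in the key name, and keys without newlines. The key names and scratch layout are invented for the demonstration.

```shell
# Build a small mock repo layout (hypothetical keys, not real git-annex keys).
scratch=$(mktemp -d)
repo="$scratch/repo"
obj="$repo/.git/annex/objects"
mkdir -p "$obj/aa/bb/KEY-linked" "$obj/cc/dd/KEY-orphan"
echo 1 > "$obj/aa/bb/KEY-linked/KEY-linked"
echo 2 > "$obj/cc/dd/KEY-orphan/KEY-orphan"
ln -s ".git/annex/objects/aa/bb/KEY-linked/KEY-linked" "$repo/1"
ln -s ".git/annex/objects/ee/ff/KEY-missing/KEY-missing" "$repo/2"

# Keys referenced by symlinks in the work tree (the .git directory is skipped).
find "$repo" -path "$repo/.git" -prune -o -type l -printf '%l\n' \
    | sed 's|.*/||' | sort -u > "$scratch/referenced"
# Keys whose content is actually present in the object store.
find "$obj" -type f -printf '%f\n' | sort -u > "$scratch/present"

# Present but not referenced: would not be picked up from the checked-out tree.
orphans=$(comm -13 "$scratch/referenced" "$scratch/present")
# Referenced but not present: symlinks that would stay broken.
missing=$(comm -23 "$scratch/referenced" "$scratch/present")
echo "orphan objects: $orphans"
echo "missing objects: $missing"

rm -rf "$scratch"
```

On a real replica, `repo` would point at the old repository instead of the mock layout. The result is only a quick cross-check and does not verify checksums the way `git annex fsck` does.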

Questions

No one wins when a lot of time is spent on dead ends. :-) Before I spend time testing if solutions 1 and 2 can work, is there any caveat to mention?

For example, perhaps one must clone from a common empty ancestor instead of creating independent annexes and then syncing them?

What else? Is the whole approach sane? Doomed? Anything simpler/better?

Thanks a lot.


comment 1

My solution is very roundabout but preserves a lot of information. It did involve buying another drive (and exclusively using v5 indirect mode!).

I create a new repository (on the new drive) into which I import all the contents of the "keyfiles" (contents of .git/annex/objects). Then I create another repository with the filelinks (symlinks pointing to .git/annex/objects). After adding the keyfiles repository as a remote, I can see which content is present and valid, which got corrupted, which is missing, etc.

Then I can move the valid content from this recovery annex into a proper annex and try to repair or find the corrupted or missing content.

Comment by CandyAngel — Fri Jul 21 09:25:25 2017


Indeed git annex fsck can take into account objects manually placed into .git/annex/objects

Thanks @CandyAngel. This is similar to what I'm doing and somewhat validates it. I'm trying to repair on the same filesystem without a long recopy. I don't understand why your solution is specific to v5 indirect mode.

Indeed, `git annex fsck` can take into account objects manually placed into `.git/annex/objects`.

Let's create a repo:

mkdir repo1
cd repo1/
ls -al
git init
git annex init repo1
ls -a
ls -al
echo 1 > 1
git annex add 1
git annex sync

git annex fsck
git annex repair --verbose

Everything is fine.

Let's say this repo has its git structures broken and we rebuild it from the checked-out tree and .git/annex/objects. We'll lose tree history and location tracking history but recover the content.

cd ..
mkdir repo2
cd repo2
git init
git annex init repo2

OK, we have an empty repo. Let's import the tree.

cp -al ../repo1/* .
ls -al
git annex add .
git annex repair --verbose

Running git fsck ...
No problems found.
ok

Notice that git annex repair does not care about annexed objects, only history data.

git annex fsck

But fsck notices the missing objects.

fsck 1
  ** No known copies exist of 1
failed
(recording state in git...)
git-annex: fsck: 1 failed

So, in a sense, git annex fsck and git annex repair operate on nearly independent things.

Let's get annexed objects back.

ls -al # red shows symlink is broken
cp -al ../repo1/.git/annex/objects/ .git/annex/
find .git/annex/objects/
ls -al # grey shows symlink is okay
git annex fsck

Hooray, fsck notices that objects are back.

fsck 1 (fixing location log) (checksum...) ok
(recording state in git...)

Conclusion

I can use approach (1). Extra benefit: it will notice if some files got corrupted on the filesystem.

Approach (2) would mean that any file corrupted on the filesystem would be accepted as the correct content, and I'd prefer to avoid that.

Comment by stephane-gourichon-lpad — Fri Jul 28 05:03:30 2017
