a special remote (encrypted rsync) that got copied to long ago (not sure when; there are old files that already have sizes in their unencrypted file names) seems to use the aa/bb/GPGHMACSHA1-- format instead of aaa/bbb/GPGHMACSHA1--.
git annex fsck over such files produces very irritating output:
fsck L1100423.JPG (gpg) (checking …remote…...)
rsync: change_dir "…somewhere…/0a0/8cd/GPGHMACSHA1--91234b770b34eeff811d09c97ce94bb2398b3d72" failed: No such file or directory (2)
sent 8 bytes received 12 bytes 40.00 bytes/sec
total size is 0 speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1536) [Receiver=3.0.9]
rsync failed -- run git annex again to resume file transfer
GPGHMACSHA1--91234b770b34eeff811d09c97ce94bb2398b3d72 3922730 100% 623.81kB/s 0:00:06 (xfer#1, to-check=0/1)
sent 30 bytes received 3923328 bytes 523114.40 bytes/sec
total size is 3922730 speedup is 1.00
(checksum...) ok
(observed with debian's git-annex 3.20121017).
while this does output an "ok" at the end and a zero exit status, having such messages in an fsck is highly irritating.
i see two ways to enhance the situation:
- silence the "not found" error when the file is found in another location
- a way to rename the files in the remote (i guess the aaa/bbb part can be derived from the file name; in that case, that could even be done w/o network interaction).
the same problem also shows up with git annex get: again, it says "ok", but the "no such file or directory" / "rsync failed" is visually more prominent.
The new hash directory tree is generated in a simple to explain way. Take the md5sum of the key and the first 3 characters are the first directory, and the next 3 characters are the second directory.
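As a rough illustration of the new scheme just described, here is a minimal Python sketch. The function name `hashdir_lower` is only an illustrative choice, and it assumes the key is hashed as its plain UTF-8 file name (for example the `GPGHMACSHA1--…` name seen in the fsck output above).

```python
import hashlib


def hashdir_lower(key: str) -> str:
    """Return the new-style 'aaa/bbb' directory for a key.

    Assumption: the md5sum is taken over the key name encoded as UTF-8.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # First 3 hex characters are the first directory, next 3 the second.
    return f"{digest[:3]}/{digest[3:6]}"


if __name__ == "__main__":
    print(hashdir_lower("GPGHMACSHA1--91234b770b34eeff811d09c97ce94bb2398b3d72"))
```

Since this needs nothing but the key name, it can be computed without any network interaction.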
The old hash directory tree is rather harder to explain. It takes the md5sum of the key, but rather than a string, represents it as 4 32bit words. Only the first word is used. It is converted into a string by the same mechanism that would be used to encode a normal md5sum value into a string, but where that would normally encode the bits using the 16 characters 0-9a-f, this instead uses the 32 characters "0123456789zqjxkmvwgpfZQJXKMVWGPF". The first 2 letters of the resulting string are the first directory, and the second 2 are the second directory.
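For comparison, a hedged Python sketch of the old scheme. Several details here are assumptions that the description above does not spell out: that the first 32-bit word is read little-endian, that each letter is taken from 5 bits at a 6-bit stride, and that the letters are swapped pairwise. The name `hashdir_mixed` is illustrative too. Treat this as a starting point and check its output against the Haskell implementation (or against directories that actually exist in the remote) before relying on it.

```python
import hashlib
import struct

# The 32-character alphabet quoted in the description above.
ALPHABET = "0123456789zqjxkmvwgpfZQJXKMVWGPF"


def hashdir_mixed(key: str) -> str:
    """Return an old-style 'aa/bb' directory for a key (unverified sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Assumption: only the first 32-bit word is used, read little-endian.
    (first_word,) = struct.unpack("<I", digest[:4])
    # Assumption: 5 bits per letter, taken at a stride of 6 bits.
    letters = [ALPHABET[(first_word >> (6 * i)) & 31] for i in range(4)]
    # Assumption: the letters are swapped pairwise before use.
    return f"{letters[1]}{letters[0]}/{letters[3]}{letters[2]}"
```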
There's probably a 1:1 mapping between this special md5 encoding and a regular md5 encoding. But it's certainly easier just to use the existing Haskell implementation of the hash. The following program, which needs to be built inside a git-annex source tree, reads keys on stdin, and outputs their old hash directory tree values, and their new values on stdout.
i've successfully applied this monster to migrate my repository (as always with such expressions, use it only if you know what it does, and have a backup):
when executed in an encrypted git annex object directory, it takes all two-letter directories, executes a python expression on them (in case of failure, printing the file name it failed on), and doesn't continue searching there (-prune avoids error messages about moved-away directories).
the python expression itself generates the hash described above, generates the required directories (put awkwardly in an a if b else c expression to avoid ifs (which wouldn't fit in a single line) and because python still doesn't have a proper mkdir-p function), and moves the found object there. (nb: using the system's mkdir -p would trigger another bug).
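for readers who prefer something longer than a one-liner, the same idea can be sketched as a small python script. this is not the expression used above, only an illustration under assumptions: objects are laid out as aa/bb/KEY under the object directory, the new location is aaa/bbb/KEY with the md5sum taken over the key name, and `migrate` / `new_hashdir` are made-up names. it defaults to a dry run; as before, use it only if you know what it does, and have a backup.

```python
import hashlib
import os
import shutil


def new_hashdir(key):
    """New-style directory: first 3 and next 3 hex chars of md5(key)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(digest[:3], digest[3:6])


def migrate(root, dry_run=True):
    """Move every aa/bb/KEY entry under root to aaa/bbb/KEY.

    Returns the list of (source, destination) pairs that were (or, in a
    dry run, would be) moved. Existing destinations are never overwritten.
    """
    moved = []
    for first in sorted(os.listdir(root)):
        first_path = os.path.join(root, first)
        if len(first) != 2 or not os.path.isdir(first_path):
            continue  # only old-style two-letter directories
        for second in sorted(os.listdir(first_path)):
            second_path = os.path.join(first_path, second)
            if len(second) != 2 or not os.path.isdir(second_path):
                continue
            for key in sorted(os.listdir(second_path)):
                src = os.path.join(second_path, key)
                dst = os.path.join(root, new_hashdir(key), key)
                if os.path.exists(dst):
                    continue  # already present in the new layout
                moved.append((src, dst))
                if not dry_run:
                    # exist_ok=True (python 3.2+) gives mkdir -p behaviour
                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                    shutil.move(src, dst)
    return moved
```

calling `migrate(".")` inside the object directory only lists the planned moves; `migrate(".", dry_run=False)` performs them. the emptied two-letter directories are left in place and can be removed afterwards.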