using old remote format generates irritating output

a special remote (encrypted rsync) that was copied to long ago (not sure when; there are old files that already have sizes in their unencrypted file names) seems to use the aa/bb/GPGHMACSHA1-- format instead of aaa/bbb/GPGHMACSHA1--. running git annex fsck over such files produces very irritating output:

fsck L1100423.JPG (gpg) (checking …remote…...) rsync: change_dir "…somewhere…/0a0/8cd/GPGHMACSHA1--91234b770b34eeff811d09c97ce94bb2398b3d72" failed: No such file or directory (2)



sent 8 bytes  received 12 bytes  40.00 bytes/sec
total size is 0  speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1536) [Receiver=3.0.9]

  rsync failed -- run git annex again to resume file transfer

GPGHMACSHA1--91234b770b34eeff811d09c97ce94bb2398b3d72
     3922730 100%  623.81kB/s    0:00:06 (xfer#1, to-check=0/1)

sent 30 bytes  received 3923328 bytes  523114.40 bytes/sec
total size is 3922730  speedup is 1.00
(checksum...) ok

(observed with debian's git-annex 3.20121017).

while this does output an "ok" at the end and returns a zero exit status, having such messages in an fsck is highly irritating.

i see two ways to enhance the situation:

silence the "not found" error when the file is found in another location
a way to rename the files in the remote (i guess the aaa/bbb part can be derived from the file name; in that case, that could even be done w/o network interaction).


also affects `git annex get`

the same problem also shows up with git annex get:

get …filename… (from prometheus...) 
rsync: change_dir "/home/shared/photos/encrypted_storage/63e/50b/GPGHMACSHA1--b83e8aaf05918ae2fc81652368f9d4068f938625" failed: No such file or directory (2)

sent 8 bytes  received 12 bytes  8.00 bytes/sec
total size is 0  speedup is 0.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1536) [Receiver=3.0.9]

  rsync failed -- run git annex again to resume file transfer

GPGHMACSHA1--b83e8aaf05918ae2fc81652368f9d4068f938625
      513214 100%   95.68kB/s    0:00:05 (xfer#1, to-check=0/1)

sent 30 bytes  received 513396 bytes  44645.74 bytes/sec
total size is 513214  speedup is 1.00
ok

again, it says "ok", but the "no such file or directory" / "rsync failed" is visually more prominent.

Comment by chrysn — Mon Nov 12 23:11:36 2012


comment 2

So, I don't know how to suppress this message without causing worse problems, like suppressing real error messages, and even password prompts.

Comment by joeyh.name — Tue Nov 13 17:27:31 2012


comment 3

how about renaming the stored files, then? if you give me a pointer on how the directory names are generated (some hash of the file name?), i can write a script that does the migration. i suppose that's just a relic from another naming scheme, isn't it?

Comment by chrysn — Wed Nov 14 07:31:11 2012


comment 4

The new hash directory tree is generated in a simple to explain way. Take the md5sum of the key and the first 3 characters are the first directory, and the next 3 characters are the second directory.
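As a rough illustration of the new scheme just described, here is a minimal Python sketch. The function name `hash_dir_lower` is borrowed from the Haskell `hashDirLower` purely for readability and is not part of any Python API, and the sketch assumes the key is hashed as its UTF-8 bytes. For comparison, the sample run further down in this comment reports `2b1/ba3/` for the key `WORM--foo`.

```python
import hashlib


def hash_dir_lower(key):
    """Return the new-style hash directory for a key, e.g. 'abc/def/'.

    Assumption: the md5sum is taken over the key's UTF-8 bytes.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # The first 3 hex characters name the first directory,
    # the next 3 name the second directory.
    return digest[:3] + "/" + digest[3:6] + "/"


if __name__ == "__main__":
    print(hash_dir_lower("WORM--foo"))
```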

The old hash directory tree is rather harder to explain. It takes the md5sum of the key, but rather than a string, represents it as 4 32bit words. Only the first word is used. It is converted into a string by the same mechanism that would be used to encode a normal md5sum value into a string, but where that would normally encode the bits using the 16 characters 0-9a-f, this instead uses the 32 characters "0123456789zqjxkmvwgpfZQJXKMVWGPF". The first 2 letters of the resulting string are the first directory, and the second 2 are the second directory.

There's probably a 1:1 mapping between this special md5 encoding and a regular md5 encoding. But it's certainly easier just to use the existing Haskell implementation of the hash. The following program, which needs to be built inside a git-annex source tree, reads a key on stdin, and outputs its old hash directory tree value and its new value on stdout.

import Locations
import Types.Key
import Utility.Misc

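-- Only the first line of stdin is parsed as a key.
main :: IO ()
main = interact $ \s -> case file2key $ firstLine s of
        Nothing -> "bad key\n"
        -- Print the old (mixed-case) hash directory, then the new (lowercase) one.
        Just k -> hashDirMixed k ++ " " ++ hashDirLower k ++ "\n"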

joey@gnu:~/src/git-annex>ghc --make convert.hs
joey@gnu:~/src/git-annex>echo WORM--foo | ./convert
jq/8w/ 2b1/ba3/

Comment by joeyh.name — Wed Nov 14 17:31:38 2012


single-line migration

i've successfully applied this monster to migrate my repository (as always with such expressions, use it only if you know what it does, and have a backup):

find . -path './??/??/*' -type d \( -exec python -c 'import sys, hashlib, os; hash = hashlib.md5(sys.argv[1][8:]).hexdigest(); h1 = hash[:3]; h2 = hash[3:6]; os.mkdir(h1) if not os.path.exists(h1) else None; os.mkdir(h1+"/"+h2) if not os.path.exists(h1+"/"+h2) else None; os.rename(sys.argv[1], h1+"/"+h2+"/"+sys.argv[1][8:])' '{}' ';' -o -print \) -prune

when executed in an encrypted git annex object directory, it takes all two-letter directories, executes a python expression on them (in case of failure, printing the file name it failed on), and doesn't continue searching there (-prune avoids error messages about moved-away directories).

the python expression itself generates the hash described above, generates the required directories (put awkwardly in an a if b else c expression to avoid ifs (which wouldn't fit in a single line) and because python still doesn't have a proper mkdir-p function), and moves the found object there. (nb: using the system's mkdir -p would trigger another bug).
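For readers who would rather not run a one-liner, the same steps can be sketched as a small multi-line script. This is only an illustration under some assumptions: it targets Python 3 (so the key name is encoded before hashing, and `mkdir(parents=True, exist_ok=True)` stands in for the awkward conditional), and the names `migrate`, `new_hash_dir` and the `dry_run` parameter are made up for this sketch. It defaults to a dry run that only reports the planned moves.

```python
import hashlib
import os
from pathlib import Path


def new_hash_dir(key):
    """New-style directory: first 3 and next 3 hex chars of md5(key)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(digest[:3], digest[3:6])


def migrate(root, dry_run=True):
    """Move object directories from aa/bb/KEY to aaa/bbb/KEY under root.

    Returns a list of (old, new) paths relative to root. Nothing is
    touched on disk while dry_run is True.
    """
    base = Path(root)
    moves = []
    # Collect candidates first so renames do not disturb the directory walk.
    # "??/??/*" only matches the old two-letter layout.
    for old in sorted(base.glob("??/??/*")):
        if not old.is_dir():
            continue
        new = base / new_hash_dir(old.name) / old.name
        moves.append((str(old.relative_to(base)), str(new.relative_to(base))))
        if not dry_run:
            new.parent.mkdir(parents=True, exist_ok=True)
            os.rename(old, new)
    return moves


if __name__ == "__main__":
    # Run from inside the encrypted object directory; review the output
    # before calling migrate(".", dry_run=False).
    for old, new in migrate(".", dry_run=True):
        print(old, "->", new)
```

As with the one-liner, use it only with a backup at hand. Emptied two-letter directories are left behind and can be removed afterwards.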

Comment by chrysn — Thu Nov 22 03:14:45 2012
