Please describe the problem.
I had to migrate a special remote using hybrid encryption from an Amazon Cloud Drive-hosted rclone remote to a Google Drive-hosted rclone remote. Now I am testing it by trying to fsck a particular file.
I've left all of the special remote settings on the git annex side the same, and just changed my .rclone.conf
to have "amazondrive", the rclone remote name, actually point to the new Google Drive location.
When I try to fsck my file, git annex generates an HMAC key, successfully downloads it from the remote, but then fails to decrypt it. After that, it looks like it retries: it generates a different HMAC key, which it can't find the file for.
I've compared the files in Amazon and in Google Drive and they are the same.
Using the independent decryption script, modified to use the SHA512 MAC that my repo uses, and to not use every matching line from remote.log
, I can generate the second HMAC key that git annex appears to fall back to, but not the first one. I think that this means I can't generate the decryption key either; I've had no luck manually decrypting the file with the characters after the first 256 of the decrypted shared key as the GPG passphrase.
How is Git Annex generating that first HMAC, if not by the process used in the decryption script? Why is it able to get an HMAC that exists but not a valid decryption key? What is the thinking behind trying multiple HMAC keys for the same file?
What steps will reproduce the problem?
Have an encrypted, hybrid Git Annex repo in an Amazon Drive rclone remote. Copy all the files over to Google Drive, and make a Google Drive rclone remote in rclone.conf with the same name that the Amazon Drive one had. Try to retrieve files from the remote with Git Annex.
What version of git-annex are you using? On what operating system?
6.20170525+gitge1cf095ae-1~ndall+1
on Ubuntu 16.04
Please provide any additional information below.
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
$ git annex fsck info.txt --from=amazon --debug
[2017-06-03 11:24:31.368563623] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2017-06-03 11:24:31.373358653] process done ExitSuccess
[2017-06-03 11:24:31.373520685] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2017-06-03 11:24:31.379319755] process done ExitSuccess
[2017-06-03 11:24:31.382954929] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2017-06-03 11:24:31.386196477] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2017-06-03 11:24:31.392852081] read: git ["config","--null","--list"]
[2017-06-03 11:24:31.397686571] process done ExitSuccess
[2017-06-03 11:24:31.400187312] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","info.txt"]
[2017-06-03 11:24:31.424951599] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","check-attr","-z","--stdin","annex.backend","annex.numcopies","annex.largefiles","--"]
[2017-06-03 11:24:31.425544622] read: git ["--version"]
[2017-06-03 11:24:31.428009953] process done ExitSuccess
fsck info.txt [2017-06-03 11:24:31.479360829] chat: gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
[2017-06-03 11:24:31.503040759] process done ExitSuccess
[2017-06-03 11:24:31.506638987] chat: /home/anovak/bin/git-annex-remote-rclone []
[2017-06-03 11:24:31.510094668] git-annex-remote-rclone[1] --> VERSION 1
[2017-06-03 11:24:31.510281634] git-annex-remote-rclone[1] <-- PREPARE
[2017-06-03 11:24:31.511006098] git-annex-remote-rclone[1] --> GETCONFIG prefix
[2017-06-03 11:24:31.511148645] git-annex-remote-rclone[1] <-- VALUE data/annex-rclone
[2017-06-03 11:24:31.515487261] git-annex-remote-rclone[1] --> GETCONFIG target
[2017-06-03 11:24:31.515770189] git-annex-remote-rclone[1] <-- VALUE amazondrive
[2017-06-03 11:24:31.519503001] git-annex-remote-rclone[1] --> GETCONFIG rclone_layout
[2017-06-03 11:24:31.519666709] git-annex-remote-rclone[1] <-- VALUE lower
[2017-06-03 11:24:31.523726589] git-annex-remote-rclone[1] --> PREPARE-SUCCESS
[2017-06-03 11:24:31.523902589] git-annex-remote-rclone[1] <-- CHECKPRESENT GPGHMACSHA512--4354571263cd241ac8300e0bd8fc3643c184b922500c3d9318b7ce590684e1de820d41c0b245f1595e0bd38b2844a5151192981b4f9ab41bda135d5f7a184d16
[2017-06-03 11:24:31.530946791] git-annex-remote-rclone[1] --> DIRHASH-LOWER GPGHMACSHA512--4354571263cd241ac8300e0bd8fc3643c184b922500c3d9318b7ce590684e1de820d41c0b245f1595e0bd38b2844a5151192981b4f9ab41bda135d5f7a184d16
[2017-06-03 11:24:31.531236912] git-annex-remote-rclone[1] <-- VALUE b95/785/
Total objects: 1 Total size: 119 Bytes (119 Bytes) 2017/06/03 11:24:42 Transferred: 0 Bytes (0 Bytes/s) Errors: 0 Checks: 0 Transferred: 0 Elapsed time: 10.4s
[2017-06-03 11:24:42.04576211] git-annex-remote-rclone[1] --> CHECKPRESENT-SUCCESS GPGHMACSHA512--4354571263cd241ac8300e0bd8fc3643c184b922500c3d9318b7ce590684e1de820d41c0b245f1595e0bd38b2844a5151192981b4f9ab41bda135d5f7a184d16
[2017-06-03 11:24:42.049286934] git-annex-remote-rclone[1] <-- TRANSFER RETRIEVE GPGHMACSHA512--4354571263cd241ac8300e0bd8fc3643c184b922500c3d9318b7ce590684e1de820d41c0b245f1595e0bd38b2844a5151192981b4f9ab41bda135d5f7a184d16 .git/annex/tmp/GPGHMACSHA512--4354571263cd241ac8300e0bd8fc3643c184b922500c3d9318b7ce590684e1de820d41c0b245f1595e0bd38b2844a5151192981b4f9ab41bda135d5f7a184d16
[2017-06-03 11:24:42.050298028] git-annex-remote-rclone[1] --> DIRHASH-LOWER GPGHMACSHA512--4354571263cd241ac8300e0bd8fc3643c184b922500c3d9318b7ce590684e1de820d41c0b245f1595e0bd38b2844a5151192981b4f9ab41bda135d5f7a184d16
[2017-06-03 11:24:42.05045573] git-annex-remote-rclone[1] <-- VALUE b95/785/
2017/06/03 11:24:52 Local file system at /tmp/tmp.ITXaCoES4i: Waiting for checks to finish
2017/06/03 11:24:52 Local file system at /tmp/tmp.ITXaCoES4i: Waiting for transfers to finish
2017/06/03 11:24:54
Transferred: 119 Bytes (9 Bytes/s)
Errors: 0
Checks: 0
Transferred: 1
Elapsed time: 12s
[2017-06-03 11:24:54.083003717] git-annex-remote-rclone[1] --> TRANSFER-SUCCESS RETRIEVE GPGHMACSHA512--4354571263cd241ac8300e0bd8fc3643c184b922500c3d9318b7ce590684e1de820d41c0b245f1595e0bd38b2844a5151192981b4f9ab41bda135d5f7a184d16
[2017-06-03 11:24:54.083554696] chat: gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--batch","--passphrase-fd","22","--decrypt"]
gpg: decryption failed: bad key
[2017-06-03 11:24:54.086768586] process done ExitFailure 2
[2017-06-03 11:24:54.088931923] git-annex-remote-rclone[1] <-- TRANSFER RETRIEVE GPGHMACSHA512--7439fc103f30635cd914af0ce14370aef2f9a81f415ba7a652bc20454c90893bcd5701ebbed855f6a0285fce413e5c039c15a7049738cf519e2719c9d485ce28 .git/annex/tmp/GPGHMACSHA512--7439fc103f30635cd914af0ce14370aef2f9a81f415ba7a652bc20454c90893bcd5701ebbed855f6a0285fce413e5c039c15a7049738cf519e2719c9d485ce28
[2017-06-03 11:24:54.089806756] git-annex-remote-rclone[1] --> DIRHASH-LOWER GPGHMACSHA512--7439fc103f30635cd914af0ce14370aef2f9a81f415ba7a652bc20454c90893bcd5701ebbed855f6a0285fce413e5c039c15a7049738cf519e2719c9d485ce28
[2017-06-03 11:24:54.090050153] git-annex-remote-rclone[1] <-- VALUE b47/e54/
2017/06/03 11:25:03 Local file system at /tmp/tmp.5lapDbIoZy: Waiting for checks to finish
2017/06/03 11:25:03 Local file system at /tmp/tmp.5lapDbIoZy: Waiting for transfers to finish
2017/06/03 11:25:03 Attempt 1/3 failed with 0 errors and: directory not found
2017/06/03 11:25:03 Local file system at /tmp/tmp.5lapDbIoZy: Waiting for checks to finish
2017/06/03 11:25:03 Local file system at /tmp/tmp.5lapDbIoZy: Waiting for transfers to finish
2017/06/03 11:25:03 Attempt 2/3 failed with 0 errors and: directory not found
2017/06/03 11:25:04 Local file system at /tmp/tmp.5lapDbIoZy: Waiting for checks to finish
2017/06/03 11:25:04 Local file system at /tmp/tmp.5lapDbIoZy: Waiting for transfers to finish
2017/06/03 11:25:04 Attempt 3/3 failed with 0 errors and: directory not found
2017/06/03 11:25:04 Failed to copy: directory not found
[2017-06-03 11:25:04.479940212] git-annex-remote-rclone[1] --> TRANSFER-FAILURE RETRIEVE GPGHMACSHA512--7439fc103f30635cd914af0ce14370aef2f9a81f415ba7a652bc20454c90893bcd5701ebbed855f6a0285fce413e5c039c15a7049738cf519e2719c9d485ce28
failed to download file from remote
failed
[2017-06-03 11:25:04.492702185] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","hash-object","-w","--stdin-paths","--no-filters"]
[2017-06-03 11:25:04.494054942] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-index","-z","--index-info"]
[2017-06-03 11:25:04.672344301] process done ExitSuccess
[2017-06-03 11:25:04.67255166] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2017-06-03 11:25:04.68070068] process done ExitSuccess
(recording state in git...)
[2017-06-03 11:25:04.680983817] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","write-tree"]
[2017-06-03 11:25:04.900456464] process done ExitSuccess
[2017-06-03 11:25:04.900643429] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit-tree","a8fc26b91dab1ffac0499d3e21c0f691858d3a6a","--no-gpg-sign","-p","refs/heads/git-annex"]
[2017-06-03 11:25:04.965122443] process done ExitSuccess
[2017-06-03 11:25:04.965288386] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-ref","refs/heads/git-annex","c0b297184826849d8a24c899faa9f4f47c1f8e53"]
[2017-06-03 11:25:04.970235879] process done ExitSuccess
[2017-06-03 11:25:04.971493023] process done ExitSuccess
git-annex: fsck: 1 failed
# End of transcript or log.
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yeah, it works great most of the time, and it was working great on the amazon remote before I tried to pull these migration shenanigans.
Closing this since there has been no response for information for 5 years. done --Joey
Hello,
I have no idea why git-annex fails to decrypt your file, but as for the two different HMAC keys, I guess you have chunking enabled on that remote (at least when you uploaded the file) and that first HMAC key is the right key for the 1st chunk of your file. That decryption script does not take chunks into account, so you were only able to generate the second HMAC key, which should be the right key if the file was uploaded without chunking enabled. I've just posted a modified version of the script that supports chunks.
Your remote has chunking enabled, so git-annex first tries generating a HMAC for a chunked key. When decrypting the content fails for some reason, it falls back to trying the HMAC for an unchunked key. This is done because chunking can be enabled after some content has been uploaded to a remote, so it always tries the unchunked location just in case.
It looks like gpg is successfully decrypting the hybrid encryption key that's embedded in your git repository. That is the first, successful call tp gpg in your log.
The "bad key" error then comes when gpg is asked to use the hybrid encryption key to decrypt the content. This seems to indicate it's not using the same hybrid encryption key that was used to encrypt it.
The fact that it was able to generate the right HMAC key to download the content though, indicates that it did get the right hybrid encryption key (since half of that key is used to generate the HMAC).
So hmm, I don't understand what is going on.
Are you able to retrieve the same file successfully when the rclone.conf is configured to use the Amazon Cloud Drive? (Assuming that the content of the file is still present over there.)
Unfortunately even though the content is still over at Amazon, ACD can no longer be accessed through rclone, so I can't get Git Annex to go over there and download it.
With the new chunk-supporting decryption script (which I further modified to actually use the timestamps on the log entries), I am able to generate the right key for my test file. I was also able to generate the key for and decrypt another test file, and then testing with git annex again shows that I can successfully fsck other file in the remote.
I think what is happening is that my small file I was testing with somehow became corrupted or was modified while on Amazon's servers. I downloaded from Amazon before the transfer and from Google after and did a diff, and both files are identical, so I think I moved over corrupt data.
It looks like git annex is just successfully doing its job and identifying some data corruption here. The bug can probably be closed.
"I think what is happening is that my small file I was testing with somehow became corrupted or was modified while on Amazon's servers."
That was also kind of my guess. It's hard to imagine how the way that a file is downloaded from The Cloud changes how git-annex decrypts it. As long as the content is the same, the decrpytion step should behave identially no matter where the file is downloaded from.
But, multiple small files getting corrupted seems like it must have a cause other than a bit flip. Perhaps something about how they were transferred between the two clouds corrupted them..
I suppose there could also be a bug in git-annex or rclone that somehow corrupts uploads of small files. Perhaps something to do with chunking.. What does
git annex info theremote --fast
say about its configuration?What is the range of sizes of small files that you've found to be corrupted? Is there a cut-off point after which all larger files are not corrupted? Are any small files not corrupted?