Recent comments posted to this site:

If I understand this right, this feature should allow us to say to git: “Hey, from now on whenever I git add a *.png file, add it to git-annex instead!”

How about saying to git-annex: “Hey, whenever I git-annex add a file which is not *.png, add it to git instead! Or at least leave it unadded so that I can decide later.” Is it possible now? If not, would it be reasonable to add such a feature?

Comment by tomekwi Sat Apr 25 09:13:31 2015

I also obtain the expected result if a file is thought to be present, but isn't.

> git annex setpresentkey `git annex lookupkey notpresent` be992080-b1db-11e1-8f79-1b10bb4092ef 1
setpresentkey SHA256E-s37--2f9b7d77d43f49b59fb00148bc1b3d31a887ba717c988be55b9377d403a91f53 ok

> git annex fsck --debug -f cloud --fast notpresent
[2015-04-25 09:24:25 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2015-04-25 09:24:25 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2015-04-25 09:24:25 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..0547dfd2d61ff9a24a08ff97cf4984bebbd4f0f1","-n1","--pretty=%H"]
[2015-04-25 09:24:25 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..b43028d651236ce59a3e47240bead91cdbfc37ea","-n1","--pretty=%H"]
[2015-04-25 09:24:25 BST] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2015-04-25 09:24:25 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","notpresent"]
fsck notpresent [2015-04-25 09:24:25 BST] chat: gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
(checking cloud...) [2015-04-25 09:24:25 BST] String to sign: "HEAD\n\n\nSat, 25 Apr 2015 08:24:25 GMT\n/BUCKET/GPGHMACSHA1--e46ce4a11bc47622fb40affac818d6128bcd94bd"
[2015-04-25 09:24:25 BST] Host: "BUCKET.s3-ap-southeast-2.amazonaws.com"
[2015-04-25 09:24:25 BST] Path: "/GPGHMACSHA1--e46ce4a11bc47622fb40affac818d6128bcd94bd"
[2015-04-25 09:24:25 BST] Query string: ""
[2015-04-25 09:24:25 BST] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
[2015-04-25 09:24:25 BST] Response header 'x-amz-request-id': 'D562150974717AB1'
[2015-04-25 09:24:25 BST] Response header 'x-amz-id-2': 'Geq6BKC3Sg1rUuhgOHE7fOa5fq+L5ecShidW0ktI/ri3zNXKudhK5O5qT2qmUraJP6BCzDFuj1Q='
[2015-04-25 09:24:25 BST] Response header 'Content-Type': 'application/xml'
[2015-04-25 09:24:25 BST] Response header 'Transfer-Encoding': 'chunked'
[2015-04-25 09:24:25 BST] Response header 'Date': 'Sat, 25 Apr 2015 08:24:24 GMT'
[2015-04-25 09:24:25 BST] Response header 'Server': 'AmazonS3'
[2015-04-25 09:24:25 BST] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
(fixing location log) 
  ** Based on the location log, notpresent
  ** was expected to be present, but its content is missing.
failed

That leaves only one case: when the file isn't thought to be in cloud, but is. For completeness:

> git annex copy --to cloud notpresent
copy notpresent (checking cloud...) (to cloud...) 
ok                      
(recording state in git...)
> git annex setpresentkey `git annex lookupkey notpresent` be992080-b1db-11e1-8f79-1b10bb4092ef 0
setpresentkey SHA256E-s37--2f9b7d77d43f49b59fb00148bc1b3d31a887ba717c988be55b9377d403a91f53 ok

> git annex fsck --debug -f cloud --fast notpresent
[2015-04-25 09:26:33 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2015-04-25 09:26:33 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2015-04-25 09:26:33 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..dca379d2631cd39bd205ccb7d6c192faea7c05c5","-n1","--pretty=%H"]
[2015-04-25 09:26:33 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..b43028d651236ce59a3e47240bead91cdbfc37ea","-n1","--pretty=%H"]
[2015-04-25 09:26:33 BST] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2015-04-25 09:26:33 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","notpresent"]
fsck notpresent [2015-04-25 09:26:33 BST] chat: gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
(checking cloud...) [2015-04-25 09:26:33 BST] String to sign: "HEAD\n\n\nSat, 25 Apr 2015 08:26:33 GMT\n/BUCKET/GPGHMACSHA1--e46ce4a11bc47622fb40affac818d6128bcd94bd"
[2015-04-25 09:26:33 BST] Host: "BUCKET.s3-ap-southeast-2.amazonaws.com"
[2015-04-25 09:26:33 BST] Path: "/GPGHMACSHA1--e46ce4a11bc47622fb40affac818d6128bcd94bd"
[2015-04-25 09:26:33 BST] Query string: ""
[2015-04-25 09:26:34 BST] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2015-04-25 09:26:34 BST] Response header 'x-amz-id-2': '4Ti/62fBMzjW0woyrX5C++tQUw4uV97bbowjSiCkUNI6X2bAt+JCKbRYvZf/Is1QSY6SI2Aqgv4='
[2015-04-25 09:26:34 BST] Response header 'x-amz-request-id': '9311809D4C8485FD'
[2015-04-25 09:26:34 BST] Response header 'Date': 'Sat, 25 Apr 2015 08:26:35 GMT'
[2015-04-25 09:26:34 BST] Response header 'Last-Modified': 'Sat, 25 Apr 2015 08:26:22 GMT'
[2015-04-25 09:26:34 BST] Response header 'ETag': '"c5c3c0f720110210e73c7bf962d76390"'
[2015-04-25 09:26:34 BST] Response header 'Accept-Ranges': 'bytes'
[2015-04-25 09:26:34 BST] Response header 'Content-Type': 'binary/octet-stream'
[2015-04-25 09:26:34 BST] Response header 'Content-Length': '99'
[2015-04-25 09:26:34 BST] Response header 'Server': 'AmazonS3'
[2015-04-25 09:26:34 BST] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>

  failed to download file from remote
(fixing location log) failed
[2015-04-25 09:26:34 BST] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","hash-object","-w","--stdin-paths","--no-filters"]
[2015-04-25 09:26:34 BST] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-index","-z","--index-info"]
[2015-04-25 09:26:34 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
(recording state in git...)
[2015-04-25 09:26:34 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","write-tree"]
[2015-04-25 09:26:34 BST] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","commit-tree","83cec04d148757f98565eacda236d6e9dbd48678","--no-gpg-sign","-p","refs/heads/git-annex"]
[2015-04-25 09:26:34 BST] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","update-ref","refs/heads/git-annex","31d4a714f6977197029faf23b099ea32a298be59"]
git-annex: fsck: 1 failed

This correctly determines that the file is present, and updates the location log. But I don't understand why the message "failed to download file from remote" is shown (it also appears when a file is present and thought to be present). A fast fsck shouldn't be trying to download the file. Also, I don't think this is specific to S3; I expect any remote will behave the same way.

Comment by Walter Sat Apr 25 08:36:25 2015

Sorry, I should have provided this output too; it shows what happens when I do a non-fast fsck. Below that is the output for an fsck of a file that isn't in the remote. Basically, they both work. The case of a file not present with --fast also works (it gets a 404 response). But fscking a file that is there with --fast gets a 200 response for the HEAD, then decides the file didn't get downloaded properly (it shouldn't download it at all), and reports a failure. It should see the 200 response and report OK.
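
A minimal sketch of the behaviour described above, written in Python purely for illustration (git-annex itself is written in Haskell). The names `FastCheck` and `expected_fast_fsck` are hypothetical, and treating any status other than 200 or 404 as an error is an assumption; this is not git-annex's actual code.

```python
from typing import NamedTuple


class FastCheck(NamedTuple):
    """Hypothetical summary of what a fast fsck of one file should conclude."""
    present: bool           # is the content actually in the remote?
    fix_location_log: bool  # does the location log disagree with reality?
    outcome: str            # "ok" or "failed", as printed at the end of fsck


def expected_fast_fsck(head_status: int, believed_present: bool) -> FastCheck:
    """Decide the result from the HEAD response alone, without downloading.

    head_status is the HTTP status of the HEAD request; believed_present is
    what the location log says about the remote.
    """
    if head_status == 200:
        present = True
    elif head_status == 404:
        present = False
    else:
        # Assumption: anything else is an error the caller must handle.
        raise ValueError(f"unexpected HEAD status: {head_status}")

    # The location log needs fixing whenever it disagrees with the remote.
    fix_location_log = present != believed_present

    # Only content that was expected to be there but is missing is a failure.
    outcome = "failed" if (believed_present and not present) else "ok"
    return FastCheck(present, fix_location_log, outcome)
```

With this rule, a 200 for a file that wasn't thought to be in the remote fixes the location log and reports ok, while a 404 for a file that was thought to be there fixes the log and reports failed.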

I guess this should have been a bug report instead of a todo.

> git annex fsck -f  cloud file --debug 
[2015-04-25 08:52:51 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2015-04-25 08:52:51 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2015-04-25 08:52:51 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..7f6b0b58ef362edd43fc89d8ef641e18cfebcb4a","-n1","--pretty=%H"]
[2015-04-25 08:52:51 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..19ca78351e854273ccb2b6a83fbaf7e2ed9b32da","-n1","--pretty=%H"]
[2015-04-25 08:52:51 BST] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2015-04-25 08:52:51 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","file"]
[2015-04-25 08:52:51 BST] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","check-attr","-z","--stdin","annex.backend","annex.numcopies","--"]
[2015-04-25 08:52:51 BST] read: git ["--version"]
fsck file [2015-04-25 08:52:51 BST] chat: gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
(checking cloud...) [2015-04-25 08:52:51 BST] String to sign: "HEAD\n\n\nSat, 25 Apr 2015 07:52:51 GMT\n/BUCKET/GPGHMACSHA1--6e7e880f80de44ddd845c6241198622b9102eaa1"
[2015-04-25 08:52:51 BST] Host: "BUCKET.s3-ap-southeast-2.amazonaws.com"
[2015-04-25 08:52:51 BST] Path: "/GPGHMACSHA1--6e7e880f80de44ddd845c6241198622b9102eaa1"
[2015-04-25 08:52:51 BST] Query string: ""
[2015-04-25 08:52:52 BST] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2015-04-25 08:52:52 BST] Response header 'x-amz-id-2': 'mLGNeVBzsS7BusAEsDpIyECSpmErjO0HLA/G04svlIgIwsD+K8FpquTvtuA/UoIJK5FrJV0geCE='
[2015-04-25 08:52:52 BST] Response header 'x-amz-request-id': '2E977E4D5EC072F6'
[2015-04-25 08:52:52 BST] Response header 'Date': 'Sat, 25 Apr 2015 07:52:53 GMT'
[2015-04-25 08:52:52 BST] Response header 'Last-Modified': 'Sun, 02 Nov 2014 05:42:48 GMT'
[2015-04-25 08:52:52 BST] Response header 'ETag': '"3bd1b766a68a305ba0495af36b353a07"'
[2015-04-25 08:52:52 BST] Response header 'Accept-Ranges': 'bytes'
[2015-04-25 08:52:52 BST] Response header 'Content-Type': ''
[2015-04-25 08:52:52 BST] Response header 'Content-Length': '775647'
[2015-04-25 08:52:52 BST] Response header 'Server': 'AmazonS3'
[2015-04-25 08:52:52 BST] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>

[2015-04-25 08:52:52 BST] String to sign: "GET\n\n\nSat, 25 Apr 2015 07:52:52 GMT\n/BUCKET/GPGHMACSHA1--6e7e880f80de44ddd845c6241198622b9102eaa1"
[2015-04-25 08:52:52 BST] Host: "BUCKET.s3-ap-southeast-2.amazonaws.com"
[2015-04-25 08:52:52 BST] Path: "/GPGHMACSHA1--6e7e880f80de44ddd845c6241198622b9102eaa1"
[2015-04-25 08:52:52 BST] Query string: ""
[2015-04-25 08:52:53 BST] Response status: Status {statusCode = 200, statusMessage = "OK"}
[2015-04-25 08:52:53 BST] Response header 'x-amz-id-2': 'QufZ3GyBdogXO8nVnqmJGU5mKZ7+I4DnU95aBUhy04f4158CGAIlp8vHrnGAMDVgLnLuM2TA70A='
[2015-04-25 08:52:53 BST] Response header 'x-amz-request-id': 'A4EBAB4DD9E11352'
[2015-04-25 08:52:53 BST] Response header 'Date': 'Sat, 25 Apr 2015 07:52:54 GMT'
[2015-04-25 08:52:53 BST] Response header 'Last-Modified': 'Sun, 02 Nov 2014 05:42:48 GMT'
[2015-04-25 08:52:53 BST] Response header 'ETag': '"3bd1b766a68a305ba0495af36b353a07"'
[2015-04-25 08:52:53 BST] Response header 'Accept-Ranges': 'bytes'
[2015-04-25 08:52:53 BST] Response header 'Content-Type': ''
[2015-04-25 08:52:53 BST] Response header 'Content-Length': '775647'
[2015-04-25 08:52:53 BST] Response header 'Server': 'AmazonS3'
[2015-04-25 08:52:53 BST] Response metadata: S3: request ID=A4EBAB4DD9E11352, x-amz-id-2=QufZ3GyBdogXO8nVnqmJGU5mKZ7+I4DnU95aBUhy04f4158CGAIlp8vHrnGAMDVgLnLuM2TA70A=
74%         189.4KB/s 1s[2015-04-25 08:52:56 BST] chat: gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--batch","--passphrase-fd","15","--decrypt"]
(checksum...)           
ok

In contrast, here is the output for a file that isn't in the remote

> git annex fsck -f  cloud notpresent --debug
git annex fsck -f  cloud notpresent --debug --numcopies 1
[2015-04-25 09:00:34 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2015-04-25 09:00:34 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2015-04-25 09:00:34 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..071d29cd21384f0ca129c76442c95c705b4ddc7b","-n1","--pretty=%H"]
[2015-04-25 09:00:34 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..7f6b0b58ef362edd43fc89d8ef641e18cfebcb4a","-n1","--pretty=%H"]
[2015-04-25 09:00:34 BST] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2015-04-25 09:00:34 BST] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","notpresent"]
fsck notpresent [2015-04-25 09:00:34 BST] chat: gpg ["--batch","--no-tty","--use-agent","--quiet","--trust-model","always","--decrypt"]
(checking cloud...) [2015-04-25 09:00:35 BST] String to sign: "HEAD\n\n\nSat, 25 Apr 2015 08:00:35 GMT\n/BUCKET/GPGHMACSHA1--e46ce4a11bc47622fb40affac818d6128bcd94bd"
[2015-04-25 09:00:35 BST] Host: "BUCKET.s3-ap-southeast-2.amazonaws.com"
[2015-04-25 09:00:35 BST] Path: "/GPGHMACSHA1--e46ce4a11bc47622fb40affac818d6128bcd94bd"
[2015-04-25 09:00:35 BST] Query string: ""
[2015-04-25 09:00:35 BST] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
[2015-04-25 09:00:35 BST] Response header 'x-amz-request-id': 'AFA9934844CD547C'
[2015-04-25 09:00:35 BST] Response header 'x-amz-id-2': 'sDLFvcFj1pBh4Dhar/nxGGneN2ZP9XXPlI7GHyzuO1XiyW94b52pypel/1uSeFouWl8dXo4xOjc='
[2015-04-25 09:00:35 BST] Response header 'Content-Type': 'application/xml'
[2015-04-25 09:00:35 BST] Response header 'Transfer-Encoding': 'chunked'
[2015-04-25 09:00:35 BST] Response header 'Date': 'Sat, 25 Apr 2015 08:00:34 GMT'
[2015-04-25 09:00:35 BST] Response header 'Server': 'AmazonS3'
[2015-04-25 09:00:35 BST] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
ok

Comment by Walter Sat Apr 25 08:07:35 2015

It's HEADing the file, you can see it in the transcript.

Appears the error message could be better though.

Comment by joey Sat Apr 25 01:28:34 2015

Investigating further, when I create a bucket with the AWS library in ap-southeast-2, s3cmd info shows it is located there.

When I create a bucket with hS3 in ap-southeast-2, I get this interesting output:

joey@darkstar:~>s3cmd info s3://s3-43302240-076c-4420-8099-f2ef0b517e5f
s3://s3-43302240-076c-4420-8099-f2ef0b517e5f/ (bucket):
   Location:  ap-southeast-2
WARNING: Redirected to: s3-43302240-076c-4420-8099-f2ef0b517e5f.s3-ap-southeast-2.amazonaws.com
   Expiration Rule: none
   policy: none
   ACL:       joeyhess: FULL_CONTROL

So, it's apparently in the datacenter I asked for when making it, but here's a redirect again.

Comment by joey Sat Apr 25 01:18:51 2015

Playing around with it, I also can't reproduce it (using new or old versions of git-annex; it may be, as you allude, a problem in an old version of the s3 library).

Anyway, I'm happy that it's working now.

Comment by Walter Fri Apr 24 20:13:31 2015

The old behavior was accidental, and was never documented. When you use such undocumented behaviors, you're taking the risk of bugfixes breaking things. It's not fair to call that a "regression". If I had to worry about every bugfix breaking users who relied on the old buggy behavior in some way, I could just stop working now.

It might be reasonable to add the "expand wildcards rather than letting the shell do it" feature, as an option. Of course, it would need to be tested for every git-annex command, and problems like the one that caused this bug to be noticed in the first place would have to be dealt with for every git-annex command. Using --include and --exclude, which already work, seems pretty reasonable instead.
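
To illustrate the idea of filtering by glob inside the program rather than relying on shell expansion, here is a rough Python sketch. It assumes an include glob keeps only matching paths and an exclude glob drops matching paths; the helper `select_files` is hypothetical and is not how git-annex implements --include and --exclude.

```python
from fnmatch import fnmatchcase


def select_files(paths, include=None, exclude=None):
    """Filter paths by optional include/exclude globs (illustrative only).

    The globs are matched against the whole path string, so "*" also
    matches across "/" here.
    """
    selected = []
    for path in paths:
        # Skip anything that does not match the include glob, if one is given.
        if include is not None and not fnmatchcase(path, include):
            continue
        # Skip anything that matches the exclude glob, if one is given.
        if exclude is not None and fnmatchcase(path, exclude):
            continue
        selected.append(path)
    return selected
```

Because the program receives the pattern as a literal string, the pattern has to be quoted on the command line so the shell doesn't expand it first.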

Comment by joey Fri Apr 24 17:26:01 2015

I don't want to complicate git-annex more with configurable names for programs, and glacier is not at all special in this regard; any program could be installed under any name. We pick non-conflicting names to avoid integration nightmares. Pick a name and I'll use it.

Comment by joey Fri Apr 24 17:23:10 2015

I was able to fully reproduce this bug! I installed the old version of git-annex that used the S3 library, and made a remote:

joey@darkstar:~/tmp/rrold>git annex initremote S3 type=S3 encryption=none datacenter=ap-southeast-1
initremote S3 (checking bucket...) (creating bucket in ap-southeast-1...) ok
joey@darkstar:~/tmp/rrold>git annex move me --to S3
move me (checking S3...) (to S3...) 
ok                      

Retrieval then failed using current git-annex.

Also, a remote made with the old git-annex with datacenter=ap-southeast-2 fails with the new git-annex.

Hypothesis: Either the new or the old S3 library must be confusing ap-southeast-1 with ap-southeast-2. My guess is the old library was just creating and using buckets in the wrong place, at least when told to use ap-southeast-*.


I cannot reproduce anything about "the upload failed, but git annex thought it succeeded", nor do I see any indications in comments 11 or 12 that git-annex's location log is failing in any way. The sequence of commands in comment 11 ends with the get failing, as it should, since the remote has been switched to a different datacenter. I don't understand what you're seeing in comment #12 at all; it seems to just show it getting a file successfully.

Comment by joey Fri Apr 24 16:33:02 2015

glacier-cli would be a rather silly name to put in /usr/bin. How about glcr, as suggested here?

Comment by adamspiers Fri Apr 24 15:55:29 2015