Please describe the problem.
Uploading a 21GB file to an S3 special remote fails. It generally fails somewhere between 3% and 15%. I am using the new chunking feature, with chunks set to 25MiB.
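For reference, chunking on the remote was set with something like the following (the remote name matches the "s3" used below and the 25MiB value matches the description above; any encryption or bucket options are left out here):
$ git annex enableremote s3 chunk=25MiB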
What steps will reproduce the problem?
$ git annex copy my-big-file.tar.bz --to s3
copy my-big-file.tar.bz (gpg) (checking s3...) (to s3...)
13% 863.8KB/s 6h0m
ErrorClosed
failed
git-annex: copy: 1 failed
What version of git-annex are you using? On what operating system?
Running on Arch Linux.
git-annex version: 5.20140818-g10bf03a
build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV Inotify DBus DesktopNotify XMPP DNS Feeds Quvi TDFA CryptoHash
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 WORM URL
remote types: git gcrypt S3 bup directory rsync web webdav tahoe glacier ddar hook external
local repository version: 5
supported repository version: 5
upgrade supported from repository versions: 0 1 2 4
Please provide any additional information below.
If I fire up the web app and open the log, the end looks like this:
...
3% 857.3KB/s 6h46m
3% 857.3KB/s 6h46m
3% 857.3KB/s 6h46m
3% 857.4KB/s 6h46m
3% 857.4KB/s 6h46m
3% 857.5KB/s 6h46m
3% 857.5KB/s 6h46m
3% 857.6KB/s 6h46m
3% 857.6KB/s 6h46m
3% 857.6KB/s 6h46m
3% 857.7KB/s 6h46m
3% 857.7KB/s 6h46m
3% 857.8KB/s 6h46m
3% 857.8KB/s 6h46m
3% 857.8KB/s 6h46m
3% 857.9KB/s 6h46m
3% 857.9KB/s 6h46m
3% 858.0KB/s 6h46m
3% 858.0KB/s 6h46m
3% 858.1KB/s 6h46m
3% 858.1KB/s 6h45m
3% 858.1KB/s 6h45m
mux_client_request_session: read from master failed: Broken pipe
This is using the old hS3 library, so each chunk is sent using a new HTTP connection. It seems that the connection is being closed by S3 partway through the upload of a chunk.
It may be that the new aws library somehow avoids this problem, so a git-annex built with the s3-aws branch merged in may help with this bug. OTOH, that new branch makes a single HTTP connection be reused for all the chunks in a file, so it might also make things worse.
If you're using the new chunking system, git-annex should support resuming the upload to S3. Next time you try to send the file, it should find the chunks that were successfully sent, and resume at the chunk where it failed.
Supporting this even for encrypted uploads was a major benefit of the new chunking system, so I hope it works...?
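Concretely, retrying should just be a matter of re-running the same command from the report above; if resuming works as intended, chunks that already made it to S3 should be skipped:
$ git annex copy my-big-file.tar.bz --to s3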
The s3-aws branch? How do I determine if I'm using the new or old chunking?
Run git show git-annex:remote.log, find the line for the UUID of the remote, and see if it has chunk= (new chunking) or chunksize= (old chunking).
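For example, assuming the remote is the one named "s3" above and your grep supports -o, something like this pulls out just the relevant setting:
$ git show git-annex:remote.log | grep -o 'chunk[a-z]*=[^ ]*'
If that prints chunk=..., the remote is using the new chunking; chunksize=... means the old style.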
I am experiencing similar behavior on Ubuntu Trusty (x86_64) using a prebuilt Linux release.
Copying files to S3 consistently fails both from the command line and via the assistant.
Two files (out of several hundred) have succeeded.
Any ideas?
I need to know if the S3 remote is configured to use the new style chunking feature, and what size chunks it is configured to use. I have already explained how to check that in this thread.
I also need to know if retrying the upload after it fails lets it resume where it left off.
It is running the new style chunking (chunk=1MiB).
It does not appear to resume when it tries again. If I try copying a file to the remote from the command line, it always starts at 0% and dies at some point before 100% even if it has tried to copy that file before.
When it resumes, it will start at 0% but jump forward to the resume point pretty quickly, after verifying which chunks have already been sent. If any full chunk gets transferred, I'd expect it to resume. With smaller files it may not be very obvious that this is happening.
I have been running git annex testremote against S3 special remotes today, and have not managed to reproduce this problem (using either the old S3 or the new AWS libraries). It could be anything, including a problem with your network or the network between you and the S3 endpoint. Have you tried using a different S3 region?
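For what it's worth, two things that are easy to try (the s3test remote name, the EU datacenter, and encryption=none below are only examples; pick whatever matches your setup):
$ git annex initremote s3test type=S3 encryption=none datacenter=EU chunk=25MiB
$ git annex testremote s3test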