I set up a new git annex repo with an S3 remote. Uploading small files works fine, but the process fails on larger files (>1 GB) with the following error.
copy prosper/loaninfo.p (checking s3...) (to s3...)
99% 10.7MB/s 0s
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "RequestTimeout", s3ErrorMessage = "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.", s3ErrorResource = Nothing, s3ErrorHostId = Just "< a base64 encoded string>", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
I tried these different options while setting up the remote, but nothing worked: partsize=1GiB, partsize=400MiB, chunk=100MiB.
What am I doing wrong? Should I try an even smaller chunk size?
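For reference, the remote was set up roughly like this (the bucket name and encryption setting here are placeholders, not my exact values):

git annex initremote s3 type=S3 encryption=none bucket=my-annex-bucket chunk=100MiB
git annex copy prosper/loaninfo.p --to s3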
Googling for that message suggests it's pretty common for large file uploads among different AWS implementations in different programming languages. The error message is coming from AWS, not git-annex.
If this is happening with a single file transfer, I'm pretty sure git-annex is not keeping the S3 connection idle. (If one file transfer succeeded, and then a later one in the same git-annex run failed, that might indicate that the S3 connection was being reused and timed out in between.)
Based on things like https://github.com/aws/aws-cli/issues/401, this seems to come down to a network connection problem, especially on residential internet connections, for example the link getting saturated by something else so that the transfer stalls out.
I think that finding a chunk size that works is your best bet. That will let uploads be resumed more or less where they left off.
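If the remote is currently unchunked, or the chunk size is too large, you should be able to change it with enableremote rather than re-creating the remote. Something like this, with 50MiB only as an example value to try:

git annex enableremote s3 chunk=50MiB
git annex copy prosper/loaninfo.p --to s3

Smaller chunks mean more requests, but each individual request finishes sooner, so a stalled connection loses less progress.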
It might make sense for git-annex to retry an upload that fails this way, but imagine if it were a non-chunked 1 GB file and it failed partway through every time. That would waste a lot of bandwidth.