Please describe the problem.
While downloading content from Dropbox (via the regular URLs associated with the files, so the "web" remote), some downloads intermittently fail with HTTP status code 500 (internal server error, with no explicit reason given), which causes git-annex get to fail as well if that was the only source to try for the file.
What steps will reproduce the problem?
This is all happening with the http://datasets.datalad.org/workshops/nih-2017/ds000114/derivatives/freesurfer/ repository (e.g. on the https://dl.dropboxusercontent.com/s/sn4et1e3d2run9g/rh.aparc.dktatlas.annot?dl=0 url), but for testing wget/curl I just created http://www.onerussian.com/tmp/errors/500, which always returns 500.
What version of git-annex are you using? On what operating system?
6.20180316+gitg308f3ecf6
Please provide any additional information below.
Here are options for wget and curl which could help us out here:
    # to make wget retry
    > wget --retry-on-http-error=500 http://www.onerussian.com/tmp/errors/500

    # to make curl retry: it just needs --retry 10, which then treats 5xx errors as transient
    > curl --retry 10 http://www.onerussian.com/tmp/errors/500
Could git-annex add those options to curl/wget invocations for more robust access to the web remote?
Note that with --retry 10, curl will back off 10 times, with the wait doubling each time; starting from one second, that is 1+2+4+...+512 = 1023 seconds of waiting alone, so it could get stuck for 20+ minutes. That seems too long, but the right number of retries seems to depend on how overloaded the http server is; you may need a number that would otherwise be excessive in order to get a high enough probability of success.
Also worth noting that http has status codes such as 503 that are intended to be used when the client should wait and retry; 500 is not such a code.
If this is done at the wget/curl level, it will also need to be done when using the http-client library (which does not currently retry on any status code, AFAICS).
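For instance, here is a minimal sketch of retrying at that level, hand-rolled around http-client's httpLbs (the httpLbsRetry wrapper and its retry-on-5xx, doubling-delay policy are hypothetical, just to illustrate what the library currently leaves to the caller):

    import Network.HTTP.Client
    import Network.HTTP.Types.Status (statusCode)
    import Control.Concurrent (threadDelay)
    import qualified Data.ByteString.Lazy as L

    -- Hypothetical wrapper, not an http-client API: re-issue the request
    -- while the server answers with a 5xx status, doubling the delay
    -- between attempts.
    httpLbsRetry :: Int -> Int -> Request -> Manager -> IO (Response L.ByteString)
    httpLbsRetry retries delaySecs req mgr = do
        resp <- httpLbs req mgr
        if statusCode (responseStatus resp) >= 500 && retries > 0
            then do
                threadDelay (delaySecs * 1000000)
                httpLbsRetry (retries - 1) (delaySecs * 2) req mgr
            else return resp

    main :: IO ()
    main = do
        mgr <- newManager defaultManagerSettings
        req <- parseRequest "http://www.onerussian.com/tmp/errors/500"
        resp <- httpLbsRetry 3 1 req mgr
        print (responseStatus resp)

Against the always-500 test URL above, this makes four attempts, waiting 1, 2, and 4 seconds in between, before handing back the 500 response.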
And, it could just as easily be an S3 or webdav server that is throwing the http retry codes, and the libraries for those will have their own retrying behavior. (And it could even be an ssh server or other non-http protocol whose connections fail intermittently.)
Putting all this together, I'm wondering if the http level is the right place to put this retrying. It's not a matter of complying with the http spec; it seems to need user configuration in order to handle their particular use case.
git-annex already does generic retrying as long as some data was received, to recover from broken connections. That could be extended to support a config option that enables a number of retries.
I've implemented remote.<name>.annex-retry, annex.retry, remote.<name>.annex-retry-delay, and annex.retry-delay configs, so you can have full control over retry behavior for remotes.

Since these are fully generic, not at the HTTP level, they'll make any and all transfer failures be retried, no matter why the transfer failed. Which could be a good thing or a bad thing.
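For illustration, here is a minimal sketch of the kind of fully generic retry loop such configs could drive (retryTransfer is a hypothetical name, and the doubling delay is an assumption for illustration, not necessarily the exact backoff git-annex uses):

    import Control.Concurrent (threadDelay)

    -- Hypothetical sketch, not git-annex's actual code: the transfer
    -- action returns True on success, and any failure is retried,
    -- whatever the cause, with the delay doubling each time (an
    -- assumption).
    retryTransfer :: Int -> Int -> IO Bool -> IO Bool
    retryTransfer retries delaySecs transfer = do
        ok <- transfer
        if ok || retries <= 0
            then return ok
            else do
                threadDelay (delaySecs * 1000000)
                retryTransfer (retries - 1) (delaySecs * 2) transfer

    main :: IO ()
    main = do
        -- Simulate a transfer that always fails, like the always-500 URL above.
        ok <- retryTransfer 3 1 (putStrLn "transfer failed" >> return False)
        print ok

With that, e.g. git config annex.retry 3 would make the failing dropbox downloads above be reattempted a few times before git-annex get gives up.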