projects/datalad/bugs-done/fails to verify presence via http while wget fetches it just fine (yoh)
http://git-annex.branchable.com/projects/datalad/bugs-done/fails_to_verify_presence_via_http_while_wget_fetches_it_just_fine/
git-annex, ikiwiki, 2023-01-05T17:30:31Z
comment 1 (joey), 2023-01-05T17:30:31Z, 2017-08-15T17:28:20Z
http://git-annex.branchable.com/projects/datalad/bugs-done/fails_to_verify_presence_via_http_while_wget_fetches_it_just_fine/comment_1_fa6649208f1882a6bb412ba40cf57fec/
<p>The normal reason for this to happen is if the size of the file
on the website has changed. git-annex checks the reported size and if it
differs from the versioned file, it knows that the website no longer
contains the same file.</p>
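<p>To illustrate the size check described above, here is a minimal Python sketch. It is not git-annex's actual code (git-annex is written in Haskell); the function name <code>size_matches</code> and the choice to return <code>None</code> when the server reports no usable size are assumptions made for this example.</p>

```python
from typing import Optional


def size_matches(recorded_size: int, content_length: Optional[str]) -> Optional[bool]:
    """Compare a recorded file size with a Content-Length header value.

    Returns True or False when the server reported a usable size, and
    None when it did not (missing or malformed header), since no
    comparison is possible in that case.
    """
    if content_length is None:
        return None
    try:
        reported = int(content_length.strip())
    except ValueError:
        # Header present but not a number: treat it as "size unknown".
        return None
    return reported == recorded_size
```

<p>A result of <code>False</code> corresponds to the "website no longer contains the same file" case; what to do with <code>None</code> is left to the caller in this sketch.</p>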
<p>In this case, it seems to be a CGI program generating a zip file, and the
program actually generated two different zip files when I hit it twice with
wget. (So if git-annex actually did drop the only copy of the version you
downloaded, you'd not be able to download it again. Not that git-annex can know
that; this kind of thing is why trusting the web is not a good idea.) They did
have the same size, but it looks like the web server is not sending a size
header anyway.</p>
<p>The actual problem is that the web server takes a long time to answer a HEAD request
for this URL. It takes 35 seconds before curl is able to HEAD it. I suspect
it's generating the 300 MB zip file before it gets around to finishing
the HEAD request. Not the greatest server behavior, all around.</p>
<p>That breaks http-client due to its default 30 second timeout. So I will remove that timeout.</p>
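<p>The effect of a client-side timeout on a slow HEAD request can be sketched as follows. This uses Python's standard <code>urllib.request</code> rather than the Haskell http-client library mentioned above, so it only illustrates the idea; the function name <code>head_content_length</code> is hypothetical.</p>

```python
import urllib.request
from typing import Optional


def head_content_length(url: str, timeout: Optional[float] = None) -> Optional[str]:
    """Send a HEAD request and return the Content-Length header, if any.

    timeout=None means wait indefinitely for the server; a number is a
    limit in seconds on each blocking socket operation, and exceeding it
    raises an OSError subclass (a timeout error).
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Returns None when the server sends no Content-Length header.
        return response.headers.get("Content-Length")
```

<p>With <code>timeout=30</code>, a server that needs 35 seconds to answer the HEAD request would cause a timeout error here; with <code>timeout=None</code> the call simply waits until the server responds.</p>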