projects/datalad/bugs-done/regression - yt: prefix for "regular" urls
yoh
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/
git-annex
ikiwiki
2023-01-05T17:30:31Z
comment 1
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/comment_1_56cd6e167c86f4419ac0971922f2e5d1/
joey
2017-12-06T15:51:03Z
<p>Reproduced by putting "Lots of abouts" in a file served at http://localhost/about.txt,
and running <code>git annex addurl --file</code> with an existing file and that url.</p>
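<p>For anyone retrying this, here is a minimal Python sketch of the server side of the reproduction. It serves the same "Lots of abouts" text as a plain-text (non-HTML) file. The names <code>AboutHandler</code> and <code>serve_about</code> are hypothetical. The sketch also binds 127.0.0.1 on a free port rather than port 80 on localhost, so its URL differs from the one above.</p>

```python
import http.server
import threading

ABOUT_BODY = b"Lots of abouts"


class AboutHandler(http.server.BaseHTTPRequestHandler):
    """Serve ABOUT_BODY as a plain-text (non-HTML) file at /about.txt."""

    def do_GET(self):
        if self.path != "/about.txt":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(ABOUT_BODY)))
        self.end_headers()
        self.wfile.write(ABOUT_BODY)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


def serve_about(port: int = 0):
    """Start the server on a background thread and return (server, url).

    port=0 lets the OS pick a free port.
    """
    server = http.server.HTTPServer(("127.0.0.1", port), AboutHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, real_port = server.server_address[:2]
    return server, f"http://{host}:{real_port}/about.txt"
```

<p>From an interactive session, call <code>server, url = serve_about()</code>. Then run <code>git annex addurl --file</code> with an existing annexed file and that url, and call <code>server.shutdown()</code> when done. Because the file is not an HTML page, the recorded url should not get a <code>yt:</code> prefix.</p>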
comment 2
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/comment_2_1cebe1385d2b8a244afc259a78f72ada/
joey
2017-12-06T16:02:55Z
comment 2
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/comment_2_ca0bf1079acc89a4a79e55f8f58403c0/
yarikoptic
2017-12-08T02:32:15Z
<p>Thanks for the fix, but it seems to be a bit incomplete: in <code>--fast</code> mode the url still has the <code>yt:</code> prefix, and that is reflected in the key as well:</p>
<div class="highlight-sh"><pre class="hl">$<span class="hl opt">></span> <span class="hl kwc">ls</span> <span class="hl kwb">-l</span>
total <span class="hl num">4</span>
lrwxrwxrwx <span class="hl num">1</span> yoh yoh <span class="hl num">110</span> Dec <span class="hl num">7 21</span><span class="hl opt">:</span><span class="hl num">27 1</span><span class="hl kwb">-copy</span>.dat <span class="hl opt">-></span> .git<span class="hl opt">/</span>annex<span class="hl opt">/</span>objects<span class="hl opt">/</span>gw<span class="hl opt">/</span>pw<span class="hl opt">/</span>URL--yt<span class="hl opt">&</span>chttp<span class="hl opt">&</span>c<span class="hl opt">%%</span><span class="hl num">127.0.0.1</span><span class="hl opt">&</span>c34337<span class="hl opt">%</span>d1<span class="hl opt">%</span><span class="hl num">1</span>.dat<span class="hl opt">/</span>URL--yt<span class="hl opt">&</span>chttp<span class="hl opt">&</span>c<span class="hl opt">%%</span><span class="hl num">127.0.0.1</span><span class="hl opt">&</span>c34337<span class="hl opt">%</span>d1<span class="hl opt">%</span><span class="hl num">1</span>.dat
$<span class="hl opt">></span> git annex <span class="hl kwc">whereis</span> <span class="hl num">1</span><span class="hl kwb">-copy</span>.dat
<span class="hl kwc">whereis</span> <span class="hl num">1</span><span class="hl kwb">-copy</span>.dat <span class="hl opt">(</span><span class="hl num">1</span> copy<span class="hl opt">)</span>
<span class="hl num">00000000</span><span class="hl kwb">-0000-0000-0000-000000000001 --</span> web
web<span class="hl opt">:</span> yt<span class="hl opt">:</span>http<span class="hl opt">://</span><span class="hl num">127.0.0.1</span><span class="hl opt">:</span><span class="hl num">34337</span><span class="hl opt">/</span>d<span class="hl num">1</span><span class="hl opt">/</span><span class="hl num">1</span>.dat
ok
$<span class="hl opt">></span> git annex version
git-annex version<span class="hl opt">:</span> <span class="hl num">6.20171206</span><span class="hl opt">+</span>gitgc6e4bc0a2-1~ndall<span class="hl opt">+</span><span class="hl num">1</span>
</pre></div>
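<p>The listing shows that the prefix ends up in the key itself. The Python sketch below spots this from the symlink target by undoing the key-file escaping. The example above confirms two of the rules: <code>%</code> stands for <code>/</code> and <code>&amp;c</code> stands for <code>:</code>. The other two, <code>&amp;s</code> for <code>%</code> and <code>&amp;a</code> for <code>&amp;</code>, come from my recollection of git-annex and should be treated as an assumption. The function names are hypothetical.</p>

```python
def key_from_filename(name: str) -> str:
    """Undo the escaping applied when a key becomes a file name.

    Assumed scheme: '/' -> '%', ':' -> '&c', '%' -> '&s', '&' -> '&a'.
    Decoding runs the replacements in reverse order.
    """
    return (name.replace("%", "/")
                .replace("&c", ":")
                .replace("&s", "%")
                .replace("&a", "&"))


def url_from_url_key(key: str):
    """Return the URL embedded in a URL-backend key, or None for other keys."""
    fields, sep, url = key.partition("--")
    if not sep or not fields.startswith("URL"):
        return None
    return url


name = "URL--yt&chttp&c%%127.0.0.1&c34337%d1%1.dat"
url = url_from_url_key(key_from_filename(name))
print(url)                    # yt:http://127.0.0.1:34337/d1/1.dat
print(url.startswith("yt:"))  # True: the regression shows up in the key
```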
comment 4
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/comment_4_f05e1047c5d22dddba21a9faa3397f0d/
joey
2017-12-08T18:48:46Z
<p>That one happened with <code>git annex addurl --fast $url</code>, which is a different code
path. I had to add an html page check to youtubeDlFileName to fix it.</p>
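<p>As described, the fix only considers youtube-dl for urls that serve an HTML page. The Python sketch below shows one rough way to write such a gate. It is not git-annex's actual check (git-annex is written in Haskell). The byte-prefix heuristic and the function names are assumptions made for illustration.</p>

```python
UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_html(prefix: bytes) -> bool:
    """Crude check: does the start of a response body look like an HTML page?"""
    if prefix.startswith(UTF8_BOM):
        prefix = prefix[len(UTF8_BOM):]
    head = prefix.lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


def should_try_youtube_dl(body_prefix: bytes) -> bool:
    """Gate for youtube-dl: only an HTML page can be a video page.

    A plain file such as about.txt should instead be fetched directly and
    recorded as an ordinary web URL, with no yt: prefix.
    """
    return looks_like_html(body_prefix)


print(should_try_youtube_dl(b"Lots of abouts"))           # False
print(should_try_youtube_dl(b"<!DOCTYPE html>\n<html>"))  # True
```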
comment 5
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/comment_5_811ba6ec07f6f30c701201178cdb00ac/
yarikoptic
2017-12-11T05:27:15Z
<p>OK, tested with 6.20171208+gitg01f78e877-1~ndall+1: the --fast mode issue is indeed resolved. Thanks!
The remaining test failures all involve our special remote (datalad-archives), which works with urls, so I thought they might be related. I will try to distill more info tomorrow, unless you beat me to figuring out where that regression could be (on a quick look I didn't spot yt: anywhere, so it is probably a different issue).</p>
comment 6
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/comment_6_2cd156baab77dc3b3c248a2368b59040/
yarikoptic
2017-12-11T14:40:26Z
<p>OK, a first bit of information: the interaction with the special remote has changed. Here is the diff between the old and new runs (sorry, the content was also changing, so the keys differ as well):</p>
<div class="highlight-sh"><pre class="hl">$<span class="hl opt">></span> <span class="hl kwc">diff</span> <span class="hl kwb">-Nar -u6</span> <span class="hl opt">/</span>tmp<span class="hl opt">/</span>datalad_temp_tree_check_basic_scenarioVEOIhb<span class="hl opt">/</span>.git<span class="hl opt">/</span>bin<span class="hl opt">/</span>git-annex-remote-datalad-archive <span class="hl opt">/</span>tmp<span class="hl opt">/</span>datalad_temp_tree_check_basic_scenarioIfsLo<span class="hl num">7</span><span class="hl opt">/</span>.git<span class="hl opt">/</span>bin<span class="hl opt">/</span>git-annex-remote-datalad-archive
<span class="hl kwb">---</span> <span class="hl opt">/</span>tmp<span class="hl opt">/</span>datalad_temp_tree_check_basic_scenarioVEOIhb<span class="hl opt">/</span>.git<span class="hl opt">/</span>bin<span class="hl opt">/</span>git-annex-remote-datalad-archive <span class="hl num">2017</span><span class="hl kwb">-12-11</span> <span class="hl num">08</span><span class="hl opt">:</span><span class="hl num">56</span><span class="hl opt">:</span><span class="hl num">38.381298365</span> <span class="hl kwb">-0500</span>
<span class="hl opt">+++ /</span>tmp<span class="hl opt">/</span>datalad_temp_tree_check_basic_scenarioIfsLo<span class="hl num">7</span><span class="hl opt">/</span>.git<span class="hl opt">/</span>bin<span class="hl opt">/</span>git-annex-remote-datalad-archive <span class="hl num">2017</span><span class="hl kwb">-12-11</span> <span class="hl num">08</span><span class="hl opt">:</span><span class="hl num">56</span><span class="hl opt">:</span><span class="hl num">14.885677071</span> <span class="hl kwb">-0500</span>
@@ <span class="hl kwb">-48</span><span class="hl opt">,</span><span class="hl num">66</span> <span class="hl opt">+</span><span class="hl num">48</span><span class="hl opt">,</span><span class="hl num">72</span> @@
<span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'add', '--debug', '--json', 'simple.txt']</span>
<span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'commit', '-m', 'Added the load file']</span>
<span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'lookupkey', '--debug', 'a.tar.gz']</span>
<span class="hl opt">-</span><span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'addurl', '--debug', '--relaxed', '--file=simple.txt', 'dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+']</span>
<span class="hl opt">+</span><span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'addurl', '--debug', '--relaxed', '--file=simple.txt', 'dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+']</span>
<span class="hl slc">#send VERSION 1</span>
<span class="hl slc">#recv PREPARE</span>
<span class="hl slc">#send PREPARE-SUCCESS</span>
<span class="hl slc">#recv GETCOST</span>
<span class="hl slc">#send COST 500</span>
<span class="hl slc">#recv GETAVAILABILITY</span>
<span class="hl slc">#send AVAILABILITY LOCAL</span>
<span class="hl opt">-</span><span class="hl slc">#recv CLAIMURL dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl opt">-</span><span class="hl slc">#send DEBUG Claiming url 'dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+'</span>
<span class="hl opt">+</span><span class="hl slc">#recv CLAIMURL dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl opt">+</span><span class="hl slc">#send DEBUG Claiming url 'dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+'</span>
<span class="hl slc">#send CLAIMURL-SUCCESS</span>
<span class="hl opt">-</span><span class="hl slc">#recv CHECKURL dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl opt">+</span><span class="hl slc">#recv CHECKURL dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl slc">#send CHECKURL-CONTENTS UNKNOWN</span>
<span class="hl slc">#recv </span>
<span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'drop', '--debug', '--json', 'simple.txt']</span>
<span class="hl slc">#send VERSION 1</span>
<span class="hl slc">#recv PREPARE</span>
<span class="hl slc">#send PREPARE-SUCCESS</span>
<span class="hl slc">#recv CHECKPRESENT SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt</span>
<span class="hl slc">#send GETURLS SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt dl+archive:</span>
<span class="hl opt">-</span><span class="hl slc">#recv VALUE dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl opt">+</span><span class="hl slc">#recv VALUE dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl slc">#recv VALUE </span>
<span class="hl slc">#send CHECKPRESENT-SUCCESS SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt</span>
<span class="hl slc">#recv </span>
<span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'whereis', '--debug', '--json', 'simple.txt']</span>
<span class="hl slc">#send VERSION 1</span>
<span class="hl slc">#recv PREPARE</span>
<span class="hl slc">#send PREPARE-SUCCESS</span>
<span class="hl slc">#recv WHEREIS SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt</span>
<span class="hl slc">#send WHEREIS-FAILURE</span>
<span class="hl opt">-</span><span class="hl slc">#recv CLAIMURL dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl opt">-</span><span class="hl slc">#send DEBUG Claiming url 'dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+'</span>
<span class="hl opt">-</span><span class="hl slc">#send CLAIMURL-SUCCESS</span>
<span class="hl slc">#recv </span>
<span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'find', '--debug', '--json', '--not', '--in', 'here', 'simple.txt']</span>
<span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'get', '--debug', '--json', '--json-progress', 'simple.txt']</span>
<span class="hl slc">#send VERSION 1</span>
<span class="hl slc">#recv PREPARE</span>
<span class="hl slc">#send PREPARE-SUCCESS</span>
<span class="hl slc">#recv TRANSFER RETRIEVE SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt .git/annex/tmp/SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt</span>
<span class="hl slc">#send GETURLS SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt dl+archive:</span>
<span class="hl opt">-</span><span class="hl slc">#recv VALUE dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl opt">+</span><span class="hl slc">#recv VALUE dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl slc">#recv VALUE </span>
<span class="hl slc">#send TRANSFER-SUCCESS RETRIEVE SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt</span>
<span class="hl slc">#recv </span>
<span class="hl opt">-</span><span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'rmurl', '--debug', 'simple.txt', 'dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+']</span>
<span class="hl opt">+</span><span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'rmurl', '--debug', 'simple.txt', 'dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+']</span>
<span class="hl opt">+</span><span class="hl slc">#send VERSION 1</span>
<span class="hl opt">+</span><span class="hl slc">#recv PREPARE</span>
<span class="hl opt">+</span><span class="hl slc">#send PREPARE-SUCCESS</span>
<span class="hl opt">+</span><span class="hl slc">#recv CLAIMURL dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl opt">+</span><span class="hl slc">#send DEBUG Claiming url 'dl+archive:SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+'</span>
<span class="hl opt">+</span><span class="hl slc">#send CLAIMURL-SUCCESS</span>
<span class="hl opt">+</span><span class="hl slc">#recv </span>
<span class="hl opt">+</span>
<span class="hl opt">+</span><span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'drop', '--debug', '--json', 'simple.txt']</span>
send VERSION <span class="hl num">1</span>
recv PREPARE
send PREPARE-SUCCESS
<span class="hl kwb">-recv</span> CLAIMURL dl<span class="hl opt">+</span>archive<span class="hl opt">:</span>SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.<span class="hl kwc">tar</span>.gz<span class="hl slc">#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl kwb">-send</span> DEBUG Claiming url <span class="hl str">'dl+archive:SHA256E-s172--70cf6dd95738e5d3672a7139a2785b0a979f0f7955d0f6da0d94cc03c84a63b7.tar.gz#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+'</span>
<span class="hl kwb">-send</span> CLAIMURL-SUCCESS
<span class="hl opt">+</span>recv CHECKPRESENT SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt
<span class="hl opt">+</span>send GETURLS SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt dl<span class="hl opt">+</span>archive<span class="hl opt">:</span>
<span class="hl opt">+</span>recv VALUE dl<span class="hl opt">+</span>archive<span class="hl opt">:</span>SHA256E-s173--db0a9680f8d15578de8e4a5b5c1e87f36d9372d6118fb24c1c60f390e71ad3c1.<span class="hl kwc">tar</span>.gz<span class="hl slc">#path=a/d/+%22%27%3Ba%26b%26cd+%60%7C+</span>
<span class="hl opt">+</span>recv VALUE
<span class="hl opt">+</span>send CHECKPRESENT-SUCCESS SHA256E-s3--a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3.txt
recv
<span class="hl opt">-</span>
<span class="hl opt">-</span><span class="hl slc">### ['git', '-c', 'receive.autogc=0', '-c', 'gc.auto=0', 'annex', 'drop', '--debug', '--json', 'simple.txt']</span>
</pre></div>
<p><a href="https://github.com/datalad/datalad/blob/master/datalad/customremotes/tests/test_archives.py#L86">Our test</a> verifies that git-annex refuses to drop the content once we remove the dl+archive url for the key, and that check now fails.</p>
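<p>For context, a special remote that stores content behind urls can answer CHECKPRESENT by asking git-annex for the key's urls. This is the GETURLS / VALUE exchange visible in the diff. Below is a minimal Python sketch of that logic. It assumes "present" simply means that at least one dl+archive: url is still recorded. The real datalad-archives remote does more than this, and the function names here are hypothetical.</p>

```python
import io
from typing import List, TextIO

ARCHIVE_PREFIX = "dl+archive:"  # URL prefix this remote is assumed to own


def get_urls(key: str, prefix: str, inp: TextIO, out: TextIO) -> List[str]:
    """Send GETURLS and collect the URLs git-annex replies with.

    git-annex sends one "VALUE <url>" line per URL and ends the list
    with a bare "VALUE" line.
    """
    out.write(f"GETURLS {key} {prefix}\n")
    out.flush()
    urls = []
    while True:
        line = inp.readline()
        if not line:
            break  # EOF: git-annex went away
        line = line.rstrip("\n")
        if line.rstrip() == "VALUE":
            break  # end of the URL list
        if not line.startswith("VALUE "):
            raise ValueError(f"unexpected reply to GETURLS: {line!r}")
        urls.append(line[len("VALUE "):])
    return urls


def check_present(key: str, inp: TextIO, out: TextIO) -> None:
    """Answer CHECKPRESENT: present iff some archive URL is still recorded."""
    urls = get_urls(key, ARCHIVE_PREFIX, inp, out)
    status = "SUCCESS" if urls else "FAILURE"
    out.write(f"CHECKPRESENT-{status} {key}\n")
    out.flush()


# Simulate git-annex's replies with in-memory streams.
replies = io.StringIO("VALUE dl+archive:KEY.tar.gz#path=a/file\nVALUE\n")
sent = io.StringIO()
check_present("SHA256E-s3--example.txt", replies, sent)
print(sent.getvalue(), end="")
```

<p>When no dl+archive: url is left, the sketch answers CHECKPRESENT-FAILURE. In this test's setup, that answer is what should make git-annex refuse the drop. This only works if git-annex reports exactly the urls the test added and removed.</p>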
comment 7
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/comment_7_9f3f7c04524063960f1451c90948a41f/
joey
2017-12-11T17:12:41Z
<p>The diff shows that <code>git annex whereis</code> used to send CLAIMURL to the external special remote,
and no longer does.</p>
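<p>CLAIMURL is how git-annex asks an external special remote whether it handles a given url. The remote replies CLAIMURL-SUCCESS or CLAIMURL-FAILURE, as seen in the old run above. Below is a minimal Python sketch of the remote's side. It assumes the remote claims only <code>dl+archive:</code> urls, and the function name is hypothetical.</p>

```python
CLAIMED_PREFIX = "dl+archive:"  # assumption: the remote only handles these URLs


def handle_claimurl(request: str) -> str:
    """Return the reply to a "CLAIMURL <url>" line from git-annex."""
    command, _, url = request.rstrip("\n").partition(" ")
    if command != "CLAIMURL":
        raise ValueError(f"not a CLAIMURL request: {request!r}")
    if url.startswith(CLAIMED_PREFIX):
        return "CLAIMURL-SUCCESS"
    return "CLAIMURL-FAILURE"


print(handle_claimurl("CLAIMURL dl+archive:KEY.tar.gz#path=a/file"))  # CLAIMURL-SUCCESS
print(handle_claimurl("CLAIMURL ipfs:dummy"))                          # CLAIMURL-FAILURE
```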
<p>Reproduction recipe using git-annex-remote-ipfs:</p>
<pre><code>git annex initremote ipfs type=external externaltype=ipfs encryption=none
date > somefile
git annex add somefile
git annex addurl --debug --relaxed ipfs:dummy --file somefile
</code></pre>
<p>This results in <code>git annex whereis somefile</code> saying it's present in ipfs,
but not listing the ipfs url for it. And again, whereis does not send
CLAIMURL.</p>
<p>And, in log.web, I see why:</p>
<pre><code>+1513013502.312530881s 1 ipfs:dummy
</code></pre>
<p>That is not an OtherDownloader url; it lacks the ":" prefix.</p>
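<p>Below is a Python sketch that reads such a line and tells the downloader apart by its prefix, following the convention discussed in this thread. A leading <code>:</code> marks an OtherDownloader url claimed by a special remote, and <code>yt:</code> marks a youtube-dl url. Several details are assumptions: the field layout (timestamp, 1/0 status, url), treating the leading <code>+</code> as diff output rather than part of the log, and all the names.</p>

```python
from typing import NamedTuple, Tuple


class WebLogEntry(NamedTuple):
    timestamp: str
    present: bool
    url: str
    downloader: str  # "other", "youtube-dl" or "web" (labels made up here)


def split_downloader(recorded: str) -> Tuple[str, str]:
    """Split a recorded URL into (downloader, url) by its prefix."""
    head, sep, rest = recorded.partition(":")
    if sep and head == "":
        return "other", rest       # ":ipfs:dummy": claimed by a special remote
    if sep and head == "yt":
        return "youtube-dl", rest  # "yt:http://...": fetched via youtube-dl
    return "web", recorded         # anything else is a plain web URL


def parse_web_log_line(line: str) -> WebLogEntry:
    """Parse "<timestamp> <1|0> <url>" from log.web."""
    line = line.rstrip("\n")
    if line.startswith("+"):
        line = line[1:]  # assumed to be diff output, not part of the log
    timestamp, status, recorded = line.split(" ", 2)
    downloader, url = split_downloader(recorded)
    return WebLogEntry(timestamp, status == "1", url, downloader)


# Reports downloader='web' for the buggy line shown above.
print(parse_web_log_line("+1513013502.312530881s 1 ipfs:dummy"))
```

<p>Recorded correctly, the line would carry <code>:ipfs:dummy</code>, which the sketch classifies as "other".</p>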
<p>This seems to be particular to the addurl --relaxed --file code path;
letting addurl add a new file does result in an OtherDownloader url
being recorded.</p>
<p>I guess the test suite then fails because the url it removes is not
the one git-annex recorded, so git-annex still thinks the content is available at the wrongly
recorded url, and dropping succeeds.</p>
comment 8
http://git-annex.branchable.com/projects/datalad/bugs-done/regression_-_yt__58___prefix_for___34__regular__34___urls/comment_8_55b111079ecb9732a42d3486160e77b5/
joey
2017-12-11T17:43:30Z
<p>Above problem fixed in [[!commit bd7f8be121a5cd310ffbc32c6020326ef437a151]].</p>
<p>Good thing you have such a good test suite.</p>