bugs/git-annex drop fails to access file:/// target URL on Windowsgit-annexhttp://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/git-annexikiwiki2023-03-27T17:58:28Zseems just ignore errors while adding urls to "unsupported" urlshttp://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_1_f0d30a953f072f8d9a929a4a6ba69914/EbvxpTI_xP9Aod7Mg4cwGhgjrCrdM5s- [me.yahoo.com/a]2016-03-12T16:58:09Z2015-08-19T21:59:09Z
<p>Actually, the situation is somewhat similar on Linux as well, in the sense that git-annex manages to addurl a file using e.g. a file:///./data url (not sure that is even legit) without complaining, but the resulting file has the wrong content (it is empty):</p>
<div class="highlight-sh"><pre class="hl"><span class="hl opt">%</span> mkdir XXX
<span class="hl opt">%</span> <span class="hl kwb">cd</span> XXX
<span class="hl opt">%</span> git init<span class="hl opt">;</span> git annex init
Initialized empty Git repository <span class="hl kwa">in</span> <span class="hl opt">/</span>tmp<span class="hl opt">/</span>XXX<span class="hl opt">/</span>.git<span class="hl opt">/</span>
init ok
<span class="hl opt">(</span>recording state <span class="hl kwa">in</span> git...<span class="hl opt">)</span>
<span class="hl opt">%</span> <span class="hl kwb">echo</span> <span class="hl num">123</span> <span class="hl opt">></span> data
<span class="hl opt">%</span> git annex addurl <span class="hl kwb">--file</span><span class="hl opt">=</span>annexed <span class="hl kwc">file</span><span class="hl opt">:///</span>.<span class="hl opt">/</span>data
addurl annexed <span class="hl opt">(</span>downloading <span class="hl kwc">file</span><span class="hl opt">:///</span>.<span class="hl opt">/</span>data ...<span class="hl opt">)</span>
ok
<span class="hl opt">(</span>recording state <span class="hl kwa">in</span> git...<span class="hl opt">)</span>
<span class="hl opt">%</span> <span class="hl kwc">cat</span> annexed
<span class="hl opt">%</span> git annex drop annexed
drop annexed <span class="hl opt">(</span>checking <span class="hl kwc">file</span><span class="hl opt">:///</span>.<span class="hl opt">/</span>data...<span class="hl opt">) (</span>unsafe<span class="hl opt">)</span>
Could only verify the existence of <span class="hl num">0</span> out of <span class="hl num">1</span> necessary copies
Rather than dropping this <span class="hl kwc">file</span><span class="hl opt">,</span> try using<span class="hl opt">:</span> git annex move
<span class="hl opt">(</span>Use <span class="hl kwb">--force</span> to override this check<span class="hl opt">,</span> or adjust numcopies.<span class="hl opt">)</span>
failed
git-annex<span class="hl opt">:</span> drop<span class="hl opt">:</span> <span class="hl num">1</span> failed
<span class="hl opt">%</span> git annex addurl <span class="hl kwb">--file</span><span class="hl opt">=</span>annexedfull <span class="hl kwc">file</span><span class="hl opt">://</span><span class="hl kwd">$PWD</span><span class="hl opt">/</span>data
addurl annexedfull <span class="hl opt">(</span>downloading <span class="hl kwc">file</span><span class="hl opt">:///</span>tmp<span class="hl opt">/</span>XXX<span class="hl opt">/</span>data ...<span class="hl opt">)</span>
<span class="hl slc">######################################################################## 100.0%</span>
ok
<span class="hl opt">(</span>recording state <span class="hl kwa">in</span> git...<span class="hl opt">)</span>
<span class="hl opt">%</span> <span class="hl kwc">cat</span> annexedfull
<span class="hl num">123</span>
<span class="hl opt">%</span> git annex drop annexedfull
drop annexedfull <span class="hl opt">(</span>checking <span class="hl kwc">file</span><span class="hl opt">:///</span>tmp<span class="hl opt">/</span>XXX<span class="hl opt">/</span>data...<span class="hl opt">)</span> ok
<span class="hl opt">(</span>recording state <span class="hl kwa">in</span> git...<span class="hl opt">)</span>
<span class="hl opt">%</span> annex version
zsh<span class="hl opt">:</span> <span class="hl kwb">command</span> not found<span class="hl opt">:</span> annex
<span class="hl opt">%</span> git annex version
git-annex version<span class="hl opt">:</span> <span class="hl num">5.20150812</span><span class="hl kwb">-2</span>
</pre></div>
comment 2http://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_2_504ea07f798838710cdbf6133135c815/joey2016-03-12T16:58:09Z2015-09-09T16:36:13Z
<p>I can't reproduce that behavior on linux.</p>
<pre><code>joey@darkstar:~/tmp/xx>git annex addurl --file=annexed file:///./data
addurl annexed (downloading file:///./data ...)
curl: (37) Couldn't open file /data
</code></pre>
<p>Here, curl seems to be doing the right thing; the url is not relative; it's
for <code>/./data</code>, which doesn't exist.</p>
<p>Relative <code>file:</code> urls shouldn't be valid at all, I think?</p>
<hr />
<p>To check whether a file: url exists, git-annex parses the url and stats
the file itself. The first screenshot
shows this check for file: url existence failing on Windows for
the url <code>file:///C:/tmp/test/test.dat</code>.</p>
<p>I guess this might come down to problems with parsing file: urls on
Windows; seems especially complicated by drive letters. git-annex and curl
seem to parse this url in different ways.</p>
<p>Checking how that url parses, the uriScheme is "file:" and the uriPath is
"/C:/tmp/test/test.dat". So, it seems clear why it fails to stat that file.</p>
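<p>For illustration only, the same split can be reproduced with Python's <code>urllib.parse</code>. This is not the URI library git-annex uses, so treat it as an analogy; note that Python reports the scheme without the trailing colon.</p>

```python
from urllib.parse import urlsplit

# Split the url from the report into its generic URI components.
parts = urlsplit("file:///C:/tmp/test/test.dat")

print(parts.scheme)  # file
print(parts.netloc)  # empty string: the authority between "//" and "/" is empty
print(parts.path)    # /C:/tmp/test/test.dat
```

<p>The leading slash stays in the path, and a stat of <code>/C:/tmp/test/test.dat</code> cannot succeed on Windows.</p>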
<p>Is there actually a valid way to produce a file: url that refers to a drive
letter? curl seems to think so, since it found the file when <code>git annex
addurl</code> ran it. I don't know if the above parse is valid, but it's not
git-annex's code doing the parse, but the URI parsing library.</p>
<p>(Possibly related bug report:
<a href="http://git-annex.branchable.com/bugs/git_annex_test_fails_when_run_through_powershell/">http://git-annex.branchable.com/bugs/git_annex_test_fails_when_run_through_powershell/</a>
)</p>
file:/// must be forced to file:// on windowshttp://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_3_a9bd50ebe805afefab103da31b88cf89/mih2023-01-27T09:19:10Z2023-01-27T09:19:10Z
<p>I just want to leave a note that the <code>file:</code> URL scheme handling on Windows is still problematic. A file URL (e.g. generated by Python's <code>pathlib.Path.as_uri()</code>) like the following does not work:</p>
<pre><code>file:///C:/DLTMP/myarchive.zip
</code></pre>
<p>It is parsed to the path <code>/C:/DLTMP/myarchive.zip</code> (as reported by OP), which is invalid.</p>
<p>A workaround is to replace <code>file:///</code> with <code>file://</code> (remove one slash). This makes git-annex accept the URL and parse it to the correct path <code>C:/DLTMP/myarchive.zip</code>.</p>
<p>However, such a "double-slash" URL is invalid. <a href="https://en.wikipedia.org/wiki/File_URI_scheme">Wikipedia</a> has a dedicated bit on "How many slashes?" which states</p>
<blockquote><p>file://path (i.e. two slashes, without a hostname) is never correct, but is often used</p></blockquote>
comment 4http://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_4_f2bd555ca36b42dac6a213e7c947f1f9/joey2023-03-13T19:41:14Z2023-03-13T18:54:33Z
<p>Wikipedia seems to be wrong about that.
I took a quick look at <a href="https://datatracker.ietf.org/doc/html/rfc8089#appendix-E.2">https://datatracker.ietf.org/doc/html/rfc8089#appendix-E.2</a>
and it says that "file:c:/path/to/file" is a valid URI on Windows. And it will be
parsed ok by git-annex. So you could just use those.</p>
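<p>A minimal sketch of producing that short form from a Windows path, assuming the url is generated in Python as in the previous comment. The helper name <code>minimal_file_uri</code> is made up for this example; only <code>PureWindowsPath.as_uri()</code> is a real library call.</p>

```python
from pathlib import PureWindowsPath

# Hypothetical helper: build the short "file:c:/..." form from a Windows path.
def minimal_file_uri(windows_path):
    # as_uri() yields e.g. file:///C:/DLTMP/myarchive.zip for a drive-letter path.
    uri = PureWindowsPath(windows_path).as_uri()
    prefix = "file:///"
    # Drop the empty authority and the slash in front of the drive letter.
    if uri.startswith(prefix):
        return "file:" + uri[len(prefix):]
    # Anything else (e.g. a UNC path with a host) is returned unchanged.
    return uri

print(minimal_file_uri("C:/DLTMP/myarchive.zip"))  # file:C:/DLTMP/myarchive.zip
```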
<p>The RFC does say that "file:///c:/path/to/file" should also be supported.
(Though I don't understand its reference to the "path-absolute" rule.)</p>
<p>I don't know if network-uri could be made to support that, it seems that
it would have to handle windows and non-windows differently. Because on linux,
"file:///c:/path/to/file" should parse to a path "/c:/path/to/file",
which is after all a valid path if you choose to have a <code>/c:</code> directory!</p>
<p>But network-uri is a pure uri parser and it does not seem right for it to parse
the same uri two different ways depending on the OS it's running on. And it doesn't
special-case handling of file urls at all, it only implements RFC3986.
We could try opening an issue at <a href="https://github.com/haskell/network-uri/issues">https://github.com/haskell/network-uri/issues</a>
and find out what its maintainer thinks.</p>
<p>I suppose that git-annex, when running on Windows, could do this in every place
where it parses a url:</p>
<ol>
<li>Check if it's a file: url</li>
<li>If the path starts with "/DRIVE:", remove the leading "/"</li>
</ol>
<p>Yugh.</p>
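<p>The two steps above can be sketched as follows. This is a Python illustration under stated assumptions, not git-annex's actual code: the function name <code>file_url_path</code>, the <code>on_windows</code> flag and the regular expression are all invented for the example, and percent-decoding of the path is left out.</p>

```python
import re
from urllib.parse import urlsplit

# A leading "/" followed by a drive letter and ":" (e.g. "/C:").
DRIVE_PREFIX = re.compile(r"^/[A-Za-z]:")

def file_url_path(url, on_windows):
    """Return the local path for a file: url, or None for other schemes."""
    parts = urlsplit(url)
    # Step 1: only file: urls are handled here.
    if parts.scheme != "file":
        return None
    path = parts.path
    # Step 2: on Windows, turn "/C:/dir" into "C:/dir".
    if on_windows and DRIVE_PREFIX.match(path):
        path = path[1:]
    return path

print(file_url_path("file:///C:/tmp/test/test.dat", True))   # C:/tmp/test/test.dat
print(file_url_path("file:///C:/tmp/test/test.dat", False))  # /C:/tmp/test/test.dat
```

<p>Keeping the OS check outside the parser means the same url still parses identically everywhere, and only the later path lookup differs on Windows.</p>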
comment 5http://git-annex.branchable.com/bugs/git-annex_drop_fails_to_access_file__58____47____47____47___target_URL_on_Windows/comment_5_e00d70ba6b88d9cf60fcb183f7ea4980/joey2023-03-27T17:58:28Z2023-03-27T17:57:52Z
Ok, put in that ugly fix.