Please describe the problem.
If I addurl a file pointing to file:///C:/... it seems to work just fine, but git-annex then refuses to drop it, stating that it can't verify its presence.
Please see the two screenshots (sorry for not cutting/pasting here; it was already painful enough -- debugging under VirtualBox over remote VNC through an internet connection which for some reason drops for me quite often):
http://www.onerussian.com/tmp/gkrellShoot_08-19-15_220150.png http://www.onerussian.com/tmp/gkrellShoot_08-19-15_184052.png
As the screenshots show, wget is apparently clueless about file:// targets on Windows (while curl handles them fine) -- may be related ;-)
What steps will reproduce the problem?
addurl some file:///C:/ url under Windows pointing to an existing file, then try to drop the annexed file
What version of git-annex are you using? On what operating system?
Windows, build 20150805 or so
Please provide any additional information below.
Actually, the situation is somewhat similar on Linux, in the sense that annex manages to addurl a file using e.g. a file:///./data url (not sure that's even legit) without complaint, but the resulting file contains the wrong content (empty):
I can't reproduce that behavior on linux.
Here, curl seems to be doing the right thing; the url is not relative, it's for "/./data", which doesn't exist. Relative "file:" urls shouldn't be valid at all, I think?

For checking if a "file:" url exists, git-annex parses the url and stats the file itself. The first screenshot shows this check for "file:" url existence failing on Windows for the url "file:///C:/tmp/test/test.dat".
I guess this might come down to problems with parsing file: urls on Windows; seems especially complicated by drive letters. git-annex and curl seem to parse this url in different ways.
Checking how that url parses, the uriScheme is "file:" and the uriPath is "/C:/tmp/test/test.dat". So, it seems clear why it fails to stat that file.
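git-annex does the parse with the Haskell network-uri library, but any generic RFC 3986 parser splits these urls the same way; Python's urllib.parse makes both problem cases from this thread easy to see:

```python
from urllib.parse import urlparse

# A generic RFC 3986 parser keeps the leading slash in the path component,
# so naively stat()ing that path fails on Windows, where the real path
# starts at the drive letter.
windows_url = urlparse("file:///C:/tmp/test/test.dat")
print(windows_url.scheme)  # file
print(windows_url.path)    # /C:/tmp/test/test.dat

# The linux case from above parses to a path that doesn't exist:
print(urlparse("file:///./data").path)  # /./data
```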
Is there actually a valid way to produce a "file:" url that refers to a drive letter? curl seems to think so, since it found the file when "git annex addurl" ran it. I don't know if the above parse is valid, but it's not git-annex's code doing the parse; it's the URI parsing library.

(Possibly related bug report: http://git-annex.branchable.com/bugs/git_annex_test_fails_when_run_through_powershell/ )
I just want to leave a note that the "file:" URL scheme handling on Windows is still problematic. A file URL (e.g. generated by Python's pathlib.as_uri()) like the following does not work: it is parsed to the path "/C:/DLTMP/myarchive.zip" (as reported by OP), which is invalid.

A workaround is to replace "file:///" with "file://" (remove one slash). This makes git-annex accept the URL and parse it to the correct path "C:/DLTMP/myarchive.zip".

However, such a "double-slash" URL is invalid. Wikipedia has a dedicated bit on "How many slashes?" which states
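The fix-up that the workaround achieves can be sketched in Python: parse the url, then strip the spurious leading slash in front of a drive letter. The helper name windows_file_url_to_path is hypothetical, not anything git-annex provides:

```python
import re
from urllib.parse import urlparse

def windows_file_url_to_path(url):
    """Hypothetical helper: parse a file: url and drop the spurious
    leading slash that a generic parser leaves before a drive letter."""
    path = urlparse(url).path
    # "/C:/DLTMP/myarchive.zip" -> "C:/DLTMP/myarchive.zip"
    if re.match(r"^/[A-Za-z]:", path):
        path = path[1:]
    return path

print(windows_file_url_to_path("file:///C:/DLTMP/myarchive.zip"))
# C:/DLTMP/myarchive.zip
```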
Wikipedia seems to be wrong about that. I took a quick look at https://datatracker.ietf.org/doc/html/rfc8089#appendix-E.2 and it says that "file:c:/path/to/file" is a valid URI on Windows. And it will be parsed ok by git-annex. So you could just use those.
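A quick way to see why that short form parses cleanly: with no "//" after the scheme, a generic parser treats the whole remainder as the path, so the drive letter lands where Windows expects it. Illustrated here with Python's urllib.parse (git-annex itself uses the Haskell network-uri library):

```python
from urllib.parse import urlparse

# RFC 8089 allows a file URI with no authority component at all.
# Without "//" there is no netloc, and no leading slash is prepended.
short = urlparse("file:c:/path/to/file")
print(short.scheme)  # file
print(short.path)    # c:/path/to/file
```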
The RFC does say that "file:///c:/path/to/file" should also be supported. (Though I don't understand its reference to the "path-absolute" rule.)
I don't know if network-uri could be made to support that; it seems it would have to handle windows and non-windows differently. Because on linux, "file:///c:/path/to/file" should parse to a path "/c:/path/to/file", which is after all a valid path if you choose to have a "/c:" directory!

But network-uri is a pure uri parser, and it does not seem right for it to parse the same uri two different ways depending on the OS it's running on. And it doesn't special-case handling of file urls at all; it only implements RFC 3986. We could try opening an issue at https://github.com/haskell/network-uri/issues and find out what its maintainer thinks.
I suppose that git-annex, when running on windows, could, in every place after it parses an url:
Yugh.