(Sorry for the uninformative title, but I had to work within the character limit.)
Please describe the problem.
git-annex metadata does nothing on Windows if invoked while git-annex addurl is in progress on other files.
What steps will reproduce the problem?
On Windows (but not on Linux or macOS, where everything works fine):
- Start git-annex addurl in batch mode
- Feed it two or more URLs
- After reading the completion message for a URL from addurl's stdout, but before reading the remaining output, run git-annex metadata in batch mode and try to set the metadata for the file that was just downloaded. git-annex metadata will output an empty line (i.e., just CR LF), and if nothing further is fed to it, it will exit successfully without printing anything else on stdout or stderr.
- Querying the file's metadata normally after git-annex addurl exits will show that no metadata was set for the file.
The Python script at https://github.com/jwodder/git-annex-bug-20221024/blob/master/mvce.py (Python 3.8+ required) will run the above steps and show the output from git-annex metadata. A sample run under GitHub Actions can be seen at https://github.com/jwodder/git-annex-bug-20221024/actions/runs/3322463020/jobs/5491516209; note the following section of the output under "Run script":
16:04:04 [DEBUG ] __main__: Opening pipe to: git-annex metadata --batch --json --json-error-messages
16:04:04 [DEBUG ] __main__: Input to metadata: b'{"file": "programming/gameboy.pdf", "fields": {"title": ["GameBoy Programming Manual"]}}\n'
16:04:04 [DEBUG ] __main__: r.returncode=0
16:04:04 [DEBUG ] __main__: r.stdout=b'\r\n'
16:04:04 [DEBUG ] __main__: r.stderr=b''
This problem does not always occur, but it seems to occur most of the time. Using git-annex registerurl in place of git-annex metadata works fine.
What version of git-annex are you using? On what operating system?
git-annex 10.20221003, provided by datalad/git-annex, on Microsoft Windows Server 2022
Please provide any additional information below.
This affects a hobby project of mine, "gamdam" (implemented in Python and Rust), that interacts with git-annex.
Windows is not needed; this will happen in a repository where git annex adjust --unlock has been run.
A simpler example:
I'm not sure if this is a bug, because batch mode is documented to output a blank line when it is provided a file that is not an annexed file, and the file is not an annexed file yet, because its pointer has not yet been staged in git. Staging is needed, when in an adjusted unlocked branch, for git-annex to know that this is an annexed file.
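For a calling program, the practical consequence is that a blank reply has to be told apart from a real result. Here is a minimal sketch of that check, assuming the reply is read as raw bytes from the stdout of git-annex metadata --batch --json. The helper name parse_batch_reply and the fields in the sample reply are made up for illustration and are not taken from git-annex.

```python
import json
from typing import Optional


def parse_batch_reply(raw: bytes) -> Optional[dict]:
    """Decode one reply line from a --batch --json command.

    Returns None when the reply is blank, which is how batch mode
    reports that the file it was given is not an annexed file.
    """
    # strip() removes both "\n" and the "\r\n" seen on Windows.
    text = raw.decode("utf-8").strip()
    if not text:
        return None
    return json.loads(text)


# The reply from the log above: just CR LF, so nothing was set.
print(parse_batch_reply(b"\r\n"))
# A hypothetical non-blank reply (field names are illustrative only).
print(parse_batch_reply(b'{"file": "a.pdf", "success": true}\n'))
```

Treating None as "retry later or report an error" rather than as success would have surfaced this problem in the calling program right away.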
When the file is locked, it just stats the symlink, so the fact that the symlink is not yet staged in git doesn't matter.
It does not seem to make sense to have addurl update the index after each file it adds, because that would make addurl of a lot of files unnecessarily slow.
So, I think if anything is changed, it would need to be a change to make the behavior with unlocked files consistent with the behavior with locked files. E.g., when the symlink is not yet staged in git, treat it as a non-annexed file, which is also consistent with other handling of such files by git-annex when not in batch mode.
The solution for your program, though, seems like it will be to end the git-annex addurl process before trying to set metadata on just-added files. Or, alternatively, to use addurl with --json, extract the key, and set the metadata of the key.
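The second option could look roughly like the sketch below, which turns one line of addurl --json output into one line of input for git-annex metadata --batch --json, addressed by key rather than by file. It rests on two assumptions that should be checked against the git-annex man pages: that the addurl JSON carries "success" and "key" fields, and that the metadata batch input accepts "key" in place of "file". The function name metadata_request_for and the sample key are hypothetical.

```python
import json
from typing import Dict, List, Optional


def metadata_request_for(
    addurl_line: str, fields: Dict[str, List[str]]
) -> Optional[str]:
    """Build a metadata batch request from one addurl --json result line.

    Returns None if the download failed or no key was reported, so the
    caller never sends a request that cannot be matched to content.
    """
    result = json.loads(addurl_line)
    if not result.get("success") or not result.get("key"):
        return None
    # Assumption: "key" is accepted here instead of "file".
    return json.dumps({"key": result["key"], "fields": fields})


# A made-up addurl result; the key is a placeholder, not a real one.
sample = json.dumps({
    "command": "addurl",
    "success": True,
    "key": "SHA256E-s3--EXAMPLE.pdf",
    "file": "programming/gameboy.pdf",
})
print(metadata_request_for(sample, {"title": ["GameBoy Programming Manual"]}))
```

Because the request names the key, it should not depend on whether the file's pointer has been staged in git yet, which is the condition that caused the blank reply.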
I've made --batch handling of unstaged locked files consistent with the handling of unstaged unlocked files.