Running `git annex get . --from=remote` on a repository fails for particular annexed files, even though the files are in the remote.
I tried running `git annex fsck` in the repository, but that did not seem to help.
Any idea what the issue could be?
Thanks
`--verbose --debug` might give more info on the cause.

This seems like the relevant information from `--debug`:

Well, it does seem that the `cp` command is failing. Usually when `cp` fails, it displays an error message, so it would probably help to show the complete output of the git-annex command.

Just to migrate a bit of maybe-relevant information to this issue:
With git-annex version 8.20210310 in a v8 repository, file retrieval from origin (a local path, but one on a different ZFS volume) fails for a file that was created in a distributed workflow on a compute node, even though the file content is present in origin (as verified via `git-annex fsck`).

It seems to be a combination of the system, the git-annex version, and the way that the file in question came to life.
I only have a few "breadcrumbs" to the cause. For one, here is the debug output of retrieving this file with git-annex 8.20210310 and higher versions (the latest snapshot I tried was from June this year).

Note that `cp --reflink="always"` fails on this system with "Operation not supported" if executed on its own.

Next, I rolled back git-annex on this system to version 8.20200330-1~bpo10+1 (a version with which the workflow that now fails for us used to succeed), and file retrieval (using `datalad get`) succeeds. Still, there are a few details of this issue that I do not understand well enough.

First, the `git annex get` debug output still looks fishy: even though file retrieval via `rsync` succeeds, git-annex reports a failure. (DataLad does not bubble this failure up, so `datalad get` reports success.)

Second, this issue doesn't affect all files, and I haven't been able to create a reproducer yet. The issue has surfaced for us in distributed workflows where compute nodes of a cluster perform jobs and push their results back to origin. (That's why my output shows a second location of the file at cpu10: it's the node the file was originally created on before it was pushed to origin.) It does not affect files that weren't created in a distributed fashion, even when they lie right next to the files for which retrieval fails. Here is the debug output from retrieving different data from a different dataset in the same fashion:
And finally, I am able to clone the first repository to my local computer (git-annex 8.20210223), and file retrieval via SSH works fine.
I'll continue to try to create a reproducer, but maybe somebody has an idea based on what I have posted already. Sorry for the convoluted and complex issue description!
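The post notes that `cp --reflink="always"` fails with "Operation not supported" on the affected system. One way to isolate that system-level variable from git-annex itself is to probe reflink support directly between the two locations involved. The sketch below assumes GNU coreutils `cp`; the function name `check_reflink` and its arguments are hypothetical and not part of git-annex or DataLad.

```shell
#!/bin/sh
# Hypothetical helper (not part of git-annex or DataLad): check whether
# copy-on-write copies via `cp --reflink=always` work from SRC_DIR to DST_DIR.
# Assumes GNU coreutils cp; cp implementations without --reflink will
# simply report "not supported".
check_reflink() {
  src_dir=$1
  dst_dir=$2
  # Create a small probe file in the source directory.
  probe_src=$(mktemp "$src_dir/reflink-probe.XXXXXX") || return 2
  probe_dst="$dst_dir/$(basename "$probe_src").copy"
  printf 'probe\n' > "$probe_src"
  # --reflink=always makes cp fail instead of falling back to a regular copy.
  if cp --reflink=always "$probe_src" "$probe_dst" 2>/dev/null; then
    echo "reflink supported"
    status=0
  else
    echo "reflink not supported"
    status=1
  fi
  # Remove both probe files regardless of the outcome.
  rm -f "$probe_src" "$probe_dst"
  return "$status"
}
```

Usage would be something like `check_reflink /path/to/origin /path/to/clone`, where both paths are placeholders for a directory on each volume. A "not supported" result is not a bug by itself; it only matters if the tool doing the copy does not fall back to a regular copy when the reflink attempt fails.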
There is a bug report that looks very similar to what is being reported here, in particular the doubled "failed to retrieve" with no other information: https://git-annex.branchable.com/bugs/__34__failed_to_send_content_to_remote__34__/
The most recent git-annex release adds error messages for some cases that could be the reason for the failure. The fact that rsync succeeds and then the get fails strongly suggests that git-annex thinks the file got modified while rsync was running; the new version displays an informative message about that.
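To make the reasoning above concrete: a transfer can be treated as failed, even when the copy command itself succeeds, if the source file appears to have changed between the start and the end of the copy. The sketch below illustrates that general idea in shell, using plain `cp` in place of rsync and GNU `stat`. The function names are hypothetical, and this is not how git-annex implements its check internally.

```shell
#!/bin/sh
# Illustration only, not git-annex's implementation: snapshot a file's inode,
# size, and modification time before copying it, and discard the copy if any
# of them changed by the time the copy finished.

file_stamp() {
  # GNU stat format: inode, size in bytes, mtime in whole seconds since the epoch.
  stat -c '%i %s %Y' "$1"
}

copy_if_unchanged() {
  src=$1
  dst=$2
  before=$(file_stamp "$src") || return 2
  cp "$src" "$dst" || return 2
  after=$(file_stamp "$src") || return 2
  if [ "$before" != "$after" ]; then
    # The source was modified or replaced while it was being copied,
    # so the copy may be inconsistent; report failure even though cp succeeded.
    echo "source changed during transfer; discarding copy" >&2
    rm -f "$dst"
    return 1
  fi
  echo "copy ok"
}
```

Because the mtime here has one-second resolution, a same-size rewrite within the same second would go unnoticed; the point of the sketch is only that "copy succeeded" and "transfer accepted" are separate decisions, which matches the symptom of rsync succeeding while the get still fails.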
(Please do not use this forum for bug reports; it is not an effective BTS since nothing ever gets closed.)