Please describe the problem.
Originally reported while composing https://git-annex.branchable.com/bugs/copy--fast--from_--to_checks_destination_files/, but it is a separate issue: some files are simply not annex copy'ed at all. Here it copies 6 out of 8 files, and afterwards 2 are still reported as not being on the target remote:
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex copy --from web --to dandi-dandisets-dropbox --fast
copy sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210818T112556_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 696.194 MBytes (730012683 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210830T100716_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 224.618 MBytes (235528804 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210920T120959_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 295.387 MBytes (309735634 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210920T181347_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 860.168 MBytes (901951882 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave/sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave_ses-20211124T174401_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 856.342 MBytes (897939760 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
copy sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20220525T092829_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes)
Total objects: 1 Total size: 948.656 MBytes (994737479 Bytes)
(from web...) (to dandi-dandisets-dropbox...) ok
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex find --in web --not --in dandi-dandisets-dropbox | nl
1 sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
2 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
And it seems to boil down (at least in this case; I don't know yet whether it generalizes to the other cases I have) to those keys being present locally:
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex find --in web --not --in dandi-dandisets-dropbox | xargs ls -lL
-r--r--r-- 1 dandi dandi 3878847966 Mar 16 2023 sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
-r--r--r-- 1 dandi dandi 3665589468 Mar 16 2023 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
but somehow it doesn't know that it has them, according to git annex list:
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex list
here
|github
||dandiapi
|||web
||||bittorrent
|||||dandi-dandisets-dropbox (untrusted)
||||||
__XX_x sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210818T112556_behavior+ophys.nwb
__XX__ sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb
__XX_x sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210830T100716_behavior+ophys.nwb
__XX_x sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210920T120959_behavior+ophys.nwb
__XX_x sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210920T181347_behavior+ophys.nwb
__XX__ sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb
__XX_x sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave/sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave_ses-20211124T174401_behavior+ophys.nwb
__XX_x sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20220525T092829_behavior+ophys.nwb
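In the git annex list matrix above, each flag column follows the header order (here, github, dandiapi, web, bittorrent, dandi-dandisets-dropbox): X marks content present, _ marks it absent, and the lowercase x appears in the column of the untrusted remote. Note that the first (here) column is _ for every file, including Fish02 and Fish41. As a minimal sketch (present_not_in is a hypothetical helper, not part of git-annex, and it assumes the column numbering just described), the same selection as git annex find --in web --not --in dandi-dandisets-dropbox can be reproduced from saved list output:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of git-annex): read saved `git annex list`
# output on stdin and print the files whose flag in column $1 is X or x
# (present) and whose flag in column $2 is _ (absent).
present_not_in() {
  local have_col=$1 lack_col=$2
  awk -v h="$have_col" -v l="$lack_col" '
    # File rows start with a flag field made only of X, x and _;
    # the header rows (here, |github, ...) contain other characters.
    $1 ~ /^[Xx_]+$/ {
      fh = substr($1, h, 1)
      fl = substr($1, l, 1)
      if ((fh == "X" || fh == "x") && fl == "_") {
        name = $0
        sub(/^[Xx_]+ /, "", name)   # keep the rest of the line as the path
        print name
      }
    }'
}
```

For example, running git annex list > list.txt and then present_not_in 4 6 < list.txt would print only the Fish02 and Fish41 paths for the output shown above.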
Running without --from web starts the transfer:
git annex copy --fast --to dandi-dandisets-dropbox
IMHO it should copy from the local store to the remote, since that would in effect fulfill the goal: adding a copy to the destination.
I didn't check the move command, but if it supports a similar --from --to and has a similar defect, it should just complement the copy by dropping from the original remote afterwards.
What version of git-annex are you using? On what operating system?
10.20230626-g801c4b7 from conda-forge.
This could be an unlocked file that has been modified, while the staged version is not actually present locally. Or if git-annex fsck on it says it's fixing the location logs, that would tell us something happened that got the location tracking out of sync with reality.

So possibly there's an issue that could be tracked down regarding the state of that file. But in either case, git-annex doesn't know it has a local copy of the file, so copy --from --to could not use it.

But copy --from --to does in fact have an interesting bug: the file content being present locally prevents it from sending the file to the remote! This needs to get fixed.
Hmm: in the corresponding case of git-annex move --from --to, it does not behave that way.

As for what the behavior ought to be when a file is present locally but not on the --from remote, the documentation does say:
So it is behaving as documented. I can think of two reasons why that documented behavior makes some sense:
git-annex copy --from foo --to bar --in foo
to explicitly only act on files that are present in it.

Fixed that.
That bug I fixed would also explain the behavior that you saw if the content was present locally, and the location log was out of date about that.
In that situation, git-annex sees that the object file is present, and so treats the content as present, despite the location log not knowing it's present. Which triggers the situation of the bug I fixed, causing it to skip copying the file.
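To check whether this is what happened to the affected files, one could compare actual object presence with the location log and let git-annex fsck reconcile them, as suggested earlier in the thread. A minimal sketch: fsck_present_but_unlogged is a hypothetical name, and it assumes locked files (the symlinks that ls -lL followed in the report), so that a path exists only when its object file is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every file that the location log places on
# remote $1 but not on remote $2, run `git annex fsck` if its content is
# actually present locally, so fsck can correct the location log.
# Assumes locked files: the path is a symlink into .git/annex/objects,
# and [ -e ] follows it, so it is true only when the object file exists.
fsck_present_but_unlogged() {
  local from=$1 to=$2 f
  git annex find --in "$from" --not --in "$to" |
    while IFS= read -r f; do
      if [ -e "$f" ]; then
        git annex fsck "$f"
      fi
    done
}
```

For this report that would be fsck_present_but_unlogged web dandi-dandisets-dropbox. git annex fsck also accepts --fast to skip checksumming, which may matter for multi-gigabyte files like these.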
Also, there's a pretty easy way to get into this situation. When the file is not present, run git-annex copy --from --to. Then interrupt it after it has downloaded the file from the --from remote but before it has finished sending it to the --to remote. This leaves the file present locally, but only transiently, so the location log was not updated.

So my guess is that you interrupted a copy like that (or it failed partway through for whatever reason).
Now that I've fixed that bug, the behavior in that situation is that it does copy the file to the remote. And then it drops the local copy since the location log doesn't contain it. So it resumes correctly now.
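The behavior described here, before and after the fix, can be summarized as a toy decision function. This is a hypothetical model written from the description in this thread, not git-annex's actual code. It assumes the file is present on the --from remote, that --fast makes the "already on --to" check rely on the location log, and (an inference) that a local copy which the location log does record is kept.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical, not git-annex source) of how copy --from --to
# treats one file that is present on the --from remote.
#   $1 present_here  yes/no  object file exists locally
#   $2 logged_here   yes/no  location log says the local repo has it
#   $3 on_to         yes/no  location log says the --to remote has it
#   $4 model         buggy/fixed
copy_from_to_action() {
  local present_here=$1 logged_here=$2 on_to=$3 model=$4
  if [ "$on_to" = yes ]; then
    echo "skip (already on --to)"
  elif [ "$present_here" = yes ] && [ "$model" = buggy ]; then
    echo "skip (bug: local content blocks the copy)"
  elif [ "$present_here" = yes ] && [ "$logged_here" = no ]; then
    # Leftover from an interrupted copy: send it, then drop the transient copy.
    echo "send local copy, then drop it"
  elif [ "$present_here" = yes ]; then
    echo "send local copy"
  else
    echo "download from --from, send, then drop"
  fi
}
```

The interrupted-copy case is copy_from_to_action yes no no buggy, which skips the file, versus copy_from_to_action yes no no fixed, which sends it and then drops the local copy.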
So that leaves only the question of what it should do when content is present locally but not on the --from remote.
Another reason for the current behavior is to be symmetric with git-annex move --from foo --to bar. It would be surprising, I think, if that populated bar with files that are not present in foo but are in the local repository!

So I'm inclined not to change the documented behavior. If you want to populate a remote with files that are either in the local repo or in a --from remote, you can just run git-annex copy twice, after all. (Or there could be a new option like git-annex copy --to bar --from foo --or-from-here.)

Or maybe
git-annex copy --to bar --from remote1 --or-from remote2 ...
or similar, so there could be a sequence of remotes (in order of preference)? Or better, a general
git-annex copy --to bar --from-anywhere
so that annex first gets the file (following the currently set costs etc.) if it is not present here, and then copies it over.

I like the idea of copy --from-anywhere --to=remote, just using the lowest-cost remote (when the file is not in the local repo), like git-annex get and git-annex copy --to=here do.

Hmm, if there's a remote that is too expensive to want to use in such a copy, it would be possible to use -c remote.foo.annex-ignore=true to make it avoid using that remote. That can also be done in the case of git-annex get, although it was not documented well.

I've implemented --from-anywhere.