When a git annex move is interrupted at a point where the content has been transferred, but not yet dropped from the remote, resuming the move will often refuse to drop the content, because it would violate numcopies.

Eg, if numcopies is 2, and there is only 1 extant copy, on a remote, git-annex move --from remote will normally ignore numcopies (since it's not getting any worse) and remove the content from the remote after transferring it. But, on resume, git-annex sees there are 2 copies and numcopies is 2, so it can't drop the copy from the remote.

This happens to me often enough to be annoying. Note that being interrupted during checksum verification makes it happen, so the window is relatively wide.

I think it can also happen with move --to, although I can't remember seeing that.

Perhaps some local state could avoid this problem?

--Joey

One simple way would be to drop the content from the remote before moving it to annex/objects/. Then if the move were interrupted before the drop, it could resume the interrupted transfer, and numcopies would work the same as it did when the move started.

After an interrupted move, whereis would say the content is present, but eg an annex link to it would be broken. That seems surprising, and if the user doesn't think to resume the move, fsck would have to be made to deal with it. I don't much like this approach, it seems to change an invariant that usually existance of copy on disk is ground truth, and location tracking tries to reflect it. With this, location tracking would be correct, but only because the content is in an unusual place on disk that it can be recovered from.

Or: Move to annex/objects/ w/o updating local location log. Then do the drop, updating the remote's location log as now. Then update local location log.

If interrupted, and then the move is resumed, it will see there's a local copy, and drop again from the remote. Either that finishes the interrupted drop, or the drop already happened and it's a noop. Either way, the local location log then gets updated. That should clean things up.

But, if a sync is done with the remote first, and then the move is resumed, it will no longer think the remote has a copy. This is where the only copy can appear missing (in whereis). So a fsck will be needed to recover. Or, move could be made to recover from this too, noticing the local copy and updating the location log to reflect it.

Still, if the move is interrupted and never resumed, after a sync with the remote, the only copy appears missing, which does seem potentially confusing.

Local state could be a file listing keys that have had a move started but not finished. When doing the same move, it should be allowed to succeed even if numcopies would prevent it. More accurately, it should disregard the local copy when checking numcopies for a move --from. And for a move --to, it should disregard the remote copy. May need 2 separate lists for the two kinds of moves.

This is complex to implement, but it avoids the gotchas in the earlier ideas, so I think is best. --Joey

Implementation will involve willDropMakeItWorse, which is passed a deststartedwithcopy that currently comes from inAnnex/checkPresent. Check the log, and if the interrupted move started with the move destination not having a copy, pass False.

Are there any situations where this would be surprising? Eg, if git-annex move were interrupted, and then a year later, run again, and proceeded to apparently violate numcopies?

Maybe, OTOH I've run into this problem probably weeks after the first move got interrupted. Eg, if files are always moved from repo A to repo B, leaving repo A empty, this problem can cause stuff to build up on repo A unexpectedly. And in such a case, the timing of the resumed move does not matter, the user expected files to always get eventually moved from A.

fixed --Joey