git-annex-shell p2pstdio currently always verifies content it receives. git-annex-shell recvkey has a speed optimisation: when it's told the file being sent is locked, and annex.verify=false, it can skip an expensive verification. (Similar for transfers in the other direction.)

The P2P protocol does not have a way to communicate whether the file being sent is locked. The content of an unlocked file can be modified while it's sent, and if annex.verify=false is allowed to take effect, bad data will get into the repository.

It would be nice to support annex.verify=false when it's safe, but not when the file got modified. If doing that added an extra round trip to the P2P protocol, though, it could lose some of the speed gains.

The way that git-annex-shell recvkey handles this is that the client tells it when it's sending an unlocked file, which forces verification. Otherwise, verification can be skipped.
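The recvkey rule above boils down to a small decision. Here's a minimal sketch of it in Python, not the actual git-annex code; `must_verify` and its parameter names are made up for illustration.

```python
def must_verify(annex_verify: bool, sender_file_unlocked: bool) -> bool:
    """Decide whether received content has to be verified.

    Hypothetical helper: annex_verify stands in for the annex.verify
    setting, sender_file_unlocked for what the client told the receiver.
    """
    # An unlocked file may have been modified while it was being sent,
    # so verification is forced no matter what annex.verify says.
    if sender_file_unlocked:
        return True
    # A locked file can't change under the sender, so the config decides.
    return annex_verify
```

So with annex.verify=false, `must_verify(False, False)` lets a locked file through unverified, while `must_verify(False, True)` still verifies an unlocked one.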

Seems the best we could do with the P2P protocol, barring adding rsync-style rolling hashing to it, is to detect when a file got modified as it was being sent, and inform the peer that the data it got is invalid. It can then force verification.
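A rough sketch of that idea, in Python rather than the real implementation. It assumes that comparing the file's size, mtime and inode before and after sending is a good enough way to notice a modification; `file_signature`, `send_file` and `receiver_must_verify` are hypothetical names, and how the validity flag travels over the protocol is left out.

```python
import os
from typing import Callable, Tuple


def file_signature(path: str) -> Tuple[int, int, int]:
    """Cheap fingerprint of a file: size, mtime in nanoseconds, inode."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ino)


def send_file(path: str, sink: Callable[[bytes], None],
              chunk_size: int = 65536) -> bool:
    """Stream the file to sink; return True if it looked unchanged.

    The result is what the sender would report to the peer after the
    data, so no extra round trip is needed before the transfer.
    """
    before = file_signature(path)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sink(chunk)
    # If the fingerprint changed, some of the bytes sent may be bad.
    return file_signature(path) == before


def receiver_must_verify(annex_verify: bool, sender_says_valid: bool) -> bool:
    """The receiver forces verification when told the data is invalid."""
    return annex_verify or not sender_says_valid
```

A stat comparison can miss a change that keeps the size and lands within the timestamp resolution, so this only shows the shape of the check, not a guarantee.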

done --Joey

A related problem is resumes. What if a file starts to be transferred, gets changed while it's transferred so some bad bytes are sent, then the transfer is interrupted, and later is resumed from a different remote that has the correct content? How can it tell that the bad data was sent in this case?

In the case where an upload is started from one repository and later resumed by another, rsync wipes out any differences, so if the first repository was unlocked, and the second is locked, it's safe for recvkey to treat it locked and skip verification.

This is not really unique to the P2P protocol -- special remotes can be written to support resuming. The web special remote does; there may be external special remotes that do too. While the content of a key on a special remote is not allowed to change, a download could start from an unlocked git repo, and then be resumed from such a special remote. When verification is disabled, this can result in bad content getting into the repository.

So, let's solve this broadly. Whenever a download is resumed, force AlwaysVerify, unless the remote returns Verified. This can be done in Annex.Content.getViaTmp, so it will affect all downloads involving the tmp key for a file.
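Sketched in Python, the rule looks something like this. It's only an illustration of the decision, not what getViaTmp actually does. The names are hypothetical, and treating a non-empty tmp file as the sign of a resume is an assumption.

```python
import os
from enum import Enum


class Verification(Enum):
    """What the remote reports about the content it delivered."""
    UNVERIFIED = "UnVerified"
    VERIFIED = "Verified"


def is_resume(tmp_path: str) -> bool:
    """Assumption: leftover bytes in the tmp file mean a resumed download."""
    return os.path.exists(tmp_path) and os.path.getsize(tmp_path) > 0


def needs_verification(annex_verify: bool, resumed: bool,
                       remote_result: Verification) -> bool:
    """Decide whether the downloaded content still has to be verified."""
    # The remote vouches for the content; nothing more to do.
    if remote_result is Verification.VERIFIED:
        return False
    # Earlier bytes may have come from a file that changed mid-transfer,
    # so a resume overrides annex.verify=false.
    if resumed:
        return True
    return annex_verify
```

The resume check has to happen before the download starts writing to the tmp file, since afterwards the file is non-empty either way.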

done --Joey

This would change handling of resumes of downloads using rsync too. But it's always safe to skip verification of those, although they don't quite do a full verification of the key's hash. To still allow disabling verification of those, could add a third state in between UnVerified and Verified, meaning it's sure it's gotten exactly the same bytes as are on the remote.

Decided this added too much complexity for such an edge case, so skipped dealing with it. --Joey