Using URL keys can lead to data loss. Two remotes can have two different objects for the same URL key, since the content of a url may change over time. If git-annex gets part of an object from the first remote, but is then interrupted or fails, and later resumes from the second remote's object, it will stitch together a chimera that has never existed at the url. Then dropping the URL objects from both remotes will result in no valid copies of the object remaining.
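To make the failure mode concrete, here's a minimal sketch in plain Haskell (not git-annex code; naiveResume and both objects are invented for illustration) of what offset-based resume across remotes does:

```haskell
import qualified Data.ByteString.Char8 as B

-- Naive resume: keep the bytes already downloaded and append the rest
-- from whichever remote the transfer resumes from, by offset alone.
naiveResume :: B.ByteString -> B.ByteString -> B.ByteString
naiveResume partial remoteObject =
    partial <> B.drop (B.length partial) remoteObject

main :: IO ()
main = do
    let objA = B.pack "first version of the web page, fetched by remote A"
        objB = B.pack "SECOND version of the page, fetched later by remote B"
        partial = B.take 20 objA           -- transfer from remote A interrupted
        chimera = naiveResume partial objB -- resumed from remote B's object
    -- The result matches neither remote's object and has never existed
    -- at the url.
    B.putStrLn chimera
```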
This could also happen with WORM, but it would be much less likely. Two files with the same mtime, size, and path would have to be added to two repos.
And it could happen if an old git-annex is being used in a repo that uses some new key that it doesn't support, like a new checksum type.
Special remotes are affected if they use chunking, or if they resume when the destination file already exists and don't have their own checksumming. So the rsync special remote is affected when it's used with chunking, but not otherwise.
With the default annex.security.allow-unverified-downloads config, encrypted special remotes already don't allow downloading the problem keys.
The bug affects remotes using the git-annex P2P protocol, but not ssh remotes using rsync. So the introduction of the P2P protocol made this bug more prevalent.
Best fix for this seems to be to prevent resuming download of keys when their content is not verifiable.
The P2P protocol could also be extended to use a checksum to verify a resume is safe to do. That would only be worth doing for the affected keys.
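For example (a hypothetical sketch only; ResumeRequest and safeToResume are not part of the actual P2P protocol, and this assumes the cryptonite library for hashing), the receiver could send a hash of the bytes it already has along with the offset, and the sender would refuse the resume unless its object has the same prefix:

```haskell
import Crypto.Hash (Digest, SHA256, hash)
import qualified Data.ByteString as B

-- Hypothetical resume message: the offset plus a hash of the bytes
-- [0, resumeOffset) that the receiver already holds.
data ResumeRequest = ResumeRequest
    { resumeOffset :: Int
    , prefixHash :: Digest SHA256
    }

-- Sender side: only honor the resume when this remote's copy of the
-- object starts with exactly the bytes the receiver already has.
safeToResume :: B.ByteString -> ResumeRequest -> Bool
safeToResume obj (ResumeRequest off h) = hash (B.take off obj) == h
```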
One fix is to migrate the affected keys to a checksum backend with git-annex-migrate, but that requires manual invocation, clutters the commit history of the main git branch with commits that don't really change the content, and leads to either duplicate content in remotes or (if duplicates are dropped) the inability to git-annex-get the contents of some past commits.

Backend.URL has isStableKey = False, and that does prevent chunking URL keys on special remotes. So looking at that flag is the thing to do; it will not affect WORM, only URL. (And any external backends that are not stable.)
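For reference, a simplified sketch of what the flag expresses (invented types; the real definitions are in Types.Backend, Backend.URL, and Backend.WORM and differ in detail):

```haskell
newtype Key = Key String

data Backend = Backend
    { backendVariety :: String
    , isStableKey :: Key -> Bool
    }

urlBackend :: Backend
urlBackend = Backend
    { backendVariety = "URL"
      -- the content behind a url can change over time, so two remotes
      -- can hold different objects for the same key
    , isStableKey = const False
    }

wormBackend :: Backend
wormBackend = Backend
    { backendVariety = "WORM"
      -- mtime+size+path is assumed to always name the same content
    , isStableKey = const True
    }
```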
While some remotes can handle it, eg rsync, this does not seem like something every remote should need to worry about getting right.
While retrieveKeyFile can be wrapped and made to delete the destination file before the transfer if the key is not stable, what to do about storeKey? If it chooses to resume, that's based on data already on the remote. And removeKey does not necessarily remove a partially received key; it doesn't for P2P, where the temp file holds the content until it's fully received.
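The retrieveKeyFile side is easy enough to sketch (simplified invented types; the real Remote.retrieveKeyFile signature differs):

```haskell
import Control.Monad (when)
import System.Directory (doesFileExist, removeFile)

type Key = String
type Retriever = FilePath -> IO Bool  -- download the key's content to the path

-- For a key whose content is not stable, delete any partially
-- transferred destination file first, so the download starts from
-- scratch and cannot stitch together objects from different remotes.
retrieveKeyFileSafe :: (Key -> Bool) -> Key -> Retriever -> FilePath -> IO Bool
retrieveKeyFileSafe stable key retrieve dest = do
    when (not (stable key)) $ do
        exists <- doesFileExist dest
        when exists (removeFile dest)
    retrieve dest
```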
Could refuse to storeKey URL keys, which would be nearly the same as deprecating/removing support for URL keys entirely. (Which is not unappealing, but I know people are using them and dropping support would be painful.)
Or, ugh, special-case isStableKey checks in P2P and any other remotes that support resuming storeKey without chunking, resuming based on file offset and not content. But there could be external remotes that I don't know about that would still be affected.
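Roughly, the special case in such a remote would look like this (hypothetical sketch, invented types; not actual P2P protocol code):

```haskell
type Key = String

data ResumeOffer
    = StartFromOffset Integer  -- safe: content for this key never changes
    | StartFromScratch         -- unsafe to resume; transfer the whole object

-- Given a partial temp file of the given size on the receiving side,
-- decide whether to offer an offset-based resume of storeKey.
offerResume :: (Key -> Bool) -> Key -> Integer -> ResumeOffer
offerResume stable key partialSize
    | stable key = StartFromOffset partialSize
    | otherwise  = StartFromScratch
```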