Currently, setting a repo to trusted can result in data loss if content gets removed from that repo. For this reason, --trust has been disabled, and git-annex trust needs --force.
My idea for a better way to handle this is to make trusted repos that cannot be checked not count toward satisfying mincopies. Trusting a repository could then still let numcopies be violated, but it would avoid data loss. That would be enough to bring back --trust, and git-annex trust would no longer need --force.
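To make the proposed rule concrete, here is a minimal Haskell sketch. The types and helpers (CopyCheck, countsTowardNumcopies, countsTowardMincopies) are hypothetical, not git-annex's actual internals; they only illustrate how a check result would count toward each setting under this proposal:

```haskell
-- Hypothetical illustration of the proposed rule; not git-annex's real API.
data TrustLevel = Trusted | SemiTrusted | UnTrusted
	deriving (Eq, Show)

-- Result of trying to check a remote for a key's content.
data CopyCheck
	= CheckedPresent          -- remote was reachable and has the content
	| CheckedAbsent           -- remote was reachable but lacks the content
	| Unreachable TrustLevel  -- remote could not be checked; only its trust level is known
	deriving (Show)

-- A trusted repo that cannot be checked still counts toward numcopies...
countsTowardNumcopies :: CopyCheck -> Bool
countsTowardNumcopies CheckedPresent        = True
countsTowardNumcopies (Unreachable Trusted) = True
countsTowardNumcopies _                     = False

-- ...but under this proposal it would no longer count toward mincopies,
-- because its presence could not actually be verified.
countsTowardMincopies :: CopyCheck -> Bool
countsTowardMincopies CheckedPresent = True
countsTowardMincopies _              = False
```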
Aside from preventing data loss, the impact of this behavior change would be that dropping content would fail in some situations where it used to work. Specifically, when a trusted repo contains the content but is not accessible, and no other accessible repo contains the content. That has been a valid use of trusted repos, but it's also obviously an unsafe one: if that trusted repo has dropped the content in the meantime, it risks data loss.
So, is it worth breaking an existing, but unsafe use case, in order to make it easier to use trust safely?
Without an answer to that question, one approach would be to add an ultimatelytrusted level, which behaves like trusted does now. The serialization of ultimatelytrusted would be what is used for trusted now, so existing trusted repositories would convert to it, and no behavior would change. The new trusted would use a different serialization, which old versions of git-annex would be unable to parse, but parsing defaults to semitrusted, so old versions of git-annex would behave safely.
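As a rough illustration of that scheme, here is a Haskell sketch of trust log parsing. The token strings are placeholders, not the real trust.log values; the point is only that ultimatelytrusted keeps the serialization trusted uses today, the new trusted gets a value old versions cannot parse, and anything unparseable falls back to semitrusted:

```haskell
-- Placeholder serializations for illustration; the real trust.log values differ.
data TrustLevel
	= UltimatelyTrusted  -- behaves like today's trusted
	| Trusted            -- the new, weaker trusted
	| SemiTrusted
	| UnTrusted
	deriving (Eq, Show)

serializeTrust :: TrustLevel -> String
serializeTrust UltimatelyTrusted = "old-trusted-token"  -- whatever trusted serializes to today
serializeTrust Trusted           = "new-trusted-token"  -- unknown to old versions of git-annex
serializeTrust SemiTrusted       = "semitrusted-token"
serializeTrust UnTrusted         = "untrusted-token"

-- Parsing defaults to semitrusted, so an old git-annex that sees
-- "new-trusted-token" treats the repo as semitrusted, which is safe.
parseTrust :: String -> TrustLevel
parseTrust "old-trusted-token" = UltimatelyTrusted
parseTrust "new-trusted-token" = Trusted
parseTrust "untrusted-token"   = UnTrusted
parseTrust _                   = SemiTrusted  -- includes anything unparseable
```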
Then after some amount of time, users can be polled to see if they need ultimatelytrusted, and if not it can be deprecated and eventually be changed to be the same as new trusted. Or, a gitconfig can be added that makes ultimatelytrusted behave the same as trusted.
That seems like a reasonable plan.
(I thought about just adding a gitconfig now to control which behavior to use for trusted, but that would not be safe: if a user expects the new trusted behavior, they could git-annex trust a repo, but then later the gitconfig gets unset, or is not set in a clone.)
A complication with implementation: currently, when checking if a drop is safe, trusted repositories result in a TrustedCopy and are not checked, while other repositories are checked until there is enough evidence that the drop is safe. But now some trusted repositories would need to be checked, while still providing TrustedCopy for ones that have not been checked.
One way to deal with that is to check trusted repositories in among the rest (in cost order). If content is present, add a LockedCopy or a RecentlyVerifiedCopy to the evidence. If the content is not present, add nothing to the evidence (it could add a TrustedCopy, but there seems to be no benefit to doing that). If the repository cannot be accessed, add a TrustedCopy to the evidence.
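A sketch of that evidence-gathering, again with hypothetical types (Evidence, CheckResult, gatherEvidence) rather than git-annex's actual drop-safety code:

```haskell
-- Hypothetical evidence types, loosely following the names used above.
type RepoId = String

data Evidence
	= LockedCopy RepoId           -- content verified present and locked in place
	| RecentlyVerifiedCopy RepoId -- content verified present just now
	| TrustedCopy RepoId          -- presence assumed purely on trust
	deriving (Show)

data CheckResult = Present | Absent | Inaccessible

-- Check each repository in cost order, trusted ones in among the rest.
-- (Whether a Present check yields a LockedCopy or a RecentlyVerifiedCopy
-- would depend on whether the remote supports locking; simplified here.)
gatherEvidence :: [(RepoId, Bool, IO CheckResult)] -> IO [Evidence]
gatherEvidence repos = concat <$> mapM check repos
  where
	check (repo, trusted, checkRepo) = do
		result <- checkRepo
		return $ case result of
			Present      -> [RecentlyVerifiedCopy repo]
			-- Verifiably absent: contributes nothing (a TrustedCopy here
			-- would have no benefit).
			Absent       -> []
			-- Could not be checked: fall back to trust, if any.
			Inaccessible -> [TrustedCopy repo | trusted]
```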
--Joey
Perhaps there could be a way to manually "lock" a remote into append-only permanently such that all repos perceive it as locked and therefore trust it to keep its content.
Obviously this isn't as strong a trust as having it connected directly and actively locking it at runtime, but it's the best you could do for cold storage. (Btw, does git-annex check whether the content to be dropped is actually present in the repos it locked?)
The append-only repo would only be allowed to unlock itself and drop content again when all other (unlocked?) repos ack'd the unlock (or via --force).

Perhaps it would be better to leave trusted working as it does now, and introduce a new intermediate trust level. It could be called eg, "offline".
An offline repo would count toward numcopies when not available, but would only count toward mincopies if it happened to be available at the time.
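In the same hypothetical terms as the earlier sketch, the counting rules for offline versus the unchanged trusted could look like this:

```haskell
-- Hypothetical sketch of the counting rules for the proposed "offline" level.
data TrustLevel = Trusted | Offline | SemiTrusted | UnTrusted
	deriving (Eq, Show)

data Availability
	= Available Bool   -- repo is reachable; the Bool says whether it has the content
	| NotAvailable     -- repo cannot be reached right now

-- An offline repo counts toward numcopies even when it cannot be reached,
-- just as trusted does today...
countsTowardNumcopies :: TrustLevel -> Availability -> Bool
countsTowardNumcopies _       (Available present) = present
countsTowardNumcopies Trusted NotAvailable        = True
countsTowardNumcopies Offline NotAvailable        = True
countsTowardNumcopies _       NotAvailable        = False

-- ...but it only counts toward mincopies when it happens to be available
-- (and has the content); trusted keeps its current behavior here.
countsTowardMincopies :: TrustLevel -> Availability -> Bool
countsTowardMincopies _       (Available present) = present
countsTowardMincopies Trusted NotAvailable        = True
countsTowardMincopies _       NotAvailable        = False
```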
It might then be that there's no real need for the less safe trusted; users could be polled, and if that turns out to be the case, it could be deprecated.
(This might also be a good opportunity to improve the "semitrust" name, which would not make much sense if the list was untrust, semitrust, offline, trust. "normal" perhaps. Would need to still accept the old name for back-compat of course.)