It'd be really useful if I could specify my level of trust in a remote holding a file as a function of the time since the file was last fsck'd in that remote.
This way, if I haven't fsck'd, say, my off-site cold storage in x amount of time, git-annex would automatically try to create additional copies of its files in other remotes, for example.
Expiry can be used in a similar way, but declaring the remote dead is overkill and has unwanted side-effects.
You can query for repositories that have not been fscked for some amount of time:
From there, it's a simple script to set the unfscked ones to untrusted, or whatever.
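A minimal sketch of such a script, assuming you have already turned the results of that query into a mapping from repository name to the time of its last fsck (None if it was never fscked). The function and repository names are hypothetical, and the script only prints git annex untrust commands rather than running them.

```python
from datetime import datetime, timedelta, timezone


def stale_repos(last_fsck, max_age, now):
    """Return, sorted, the repos whose last fsck is older than max_age.

    last_fsck maps a repository name (or UUID) to the datetime of its
    most recent fsck, or None if it has never been fscked.
    """
    cutoff = now - max_age
    return sorted(
        repo for repo, when in last_fsck.items() if when is None or when < cutoff
    )


def untrust_commands(repos):
    # "git annex untrust" accepts a repository name or UUID.
    return [["git", "annex", "untrust", repo] for repo in repos]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    last_fsck = {  # hypothetical repository names
        "offsite-cold": datetime(2024, 1, 10, tzinfo=timezone.utc),
        "laptop": datetime(2024, 5, 28, tzinfo=timezone.utc),
        "usb-backup": None,
    }
    for cmd in untrust_commands(stale_repos(last_fsck, timedelta(days=30), now)):
        print(" ".join(cmd))
```

With a 30-day limit this prints untrust commands for offsite-cold and usb-backup, and leaves laptop alone.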
I suppose git-annex expire could have an option added, like --untrust, to specify how to expire, rather than the default of marking the repo dead. You'd also want a way to go the other way, to stop untrusting a repo once it's been fscked. There is not currently a way to do that.
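Going both ways needs a little bookkeeping. A minimal sketch, assuming the policy remembers which repos it untrusted itself (so a repo you untrusted by hand is never restored) and that restoring means going back to semitrusted. Both are assumptions for illustration, not existing git-annex behavior.

```python
from datetime import datetime, timedelta, timezone


def plan_trust_changes(last_fsck, untrusted_by_policy, max_age, now):
    """Return (to_untrust, to_restore), each a sorted list of repos.

    last_fsck: repo -> datetime of last fsck, or None if never fscked.
    untrusted_by_policy: repos that this policy untrusted earlier.
    """
    cutoff = now - max_age
    to_untrust, to_restore = [], []
    for repo in sorted(last_fsck):
        when = last_fsck[repo]
        stale = when is None or when < cutoff
        if stale and repo not in untrusted_by_policy:
            to_untrust.append(repo)  # would run: git annex untrust REPO
        elif not stale and repo in untrusted_by_policy:
            to_restore.append(repo)  # would run: git annex semitrust REPO
    return to_untrust, to_restore


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    last_fsck = {
        "offsite-cold": datetime(2024, 5, 30, tzinfo=timezone.utc),  # fscked again
        "usb-backup": None,  # never fscked
    }
    print(plan_trust_changes(last_fsck, {"offsite-cold"}, timedelta(days=30), now))
```

Here usb-backup would be untrusted and offsite-cold, freshly fscked, would be restored. If a repo was trusted rather than semitrusted before, the policy would also have to remember its previous level.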
Note that a fsck that is interrupted does not count as a fsck activity, and git-annex does not keep track of which files were fscked, since that would bloat the git-annex branch. On the other hand, if you run git annex fsck onefile, that counts as a fsck activity, even though the other files in the repo didn't get fscked. So you would have to limit the ways you use fsck to ones that generate the activity you want, perhaps to git annex fsck --all. Perhaps fsck should also have a way to control whether it records an activity or not.
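To make the pitfall concrete, here is a toy model of the behavior just described: only a completed fsck records anything, and its scope is not recorded, so fsck onefile and fsck --all leave identical traces. This is a simplified illustration with a made-up log structure, not git-annex's actual data format.

```python
def record_fsck(log, repo, when, completed, scope):
    """Toy model of fsck activity logging.

    log maps repo -> (activity, time). scope (e.g. "onefile" or "--all")
    is accepted but ignored, mirroring how any completed fsck counts.
    """
    if not completed:
        return  # an interrupted fsck records no activity
    log[repo] = ("Fsck", when)


if __name__ == "__main__":
    a, b = {}, {}
    record_fsck(a, "offsite-cold", 100, completed=True, scope="onefile")
    record_fsck(b, "offsite-cold", 100, completed=True, scope="--all")
    print(a == b)  # True: the two fscks are indistinguishable afterwards
    record_fsck(a, "offsite-cold", 200, completed=False, scope="--all")
    print(a["offsite-cold"])  # ('Fsck', 100): the interrupted run left no trace
```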
What if git annex fsck --all recorded an additional activity, e.g. FsckAll? Then there could be a command, or a config, that untrusts repos that do not have a FsckAll activity that happened recently enough. A git config would be simplest, e.g.:
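A minimal sketch of the check such a command or config would drive, assuming each repo's recorded activities are available as a mapping from activity name to time. FsckAll is only the activity proposed here, and the function name is hypothetical; the key point is that a recent plain Fsck does not count.

```python
from datetime import datetime, timedelta, timezone


def repos_to_untrust(activities, max_age, now):
    """Return sorted repos with no FsckAll activity newer than max_age.

    activities maps repo -> {activity name: datetime}. A plain "Fsck"
    entry is ignored, since it may come from fscking a single file.
    """
    cutoff = now - max_age
    result = []
    for repo in sorted(activities):
        fsck_all = activities[repo].get("FsckAll")
        if fsck_all is None or fsck_all < cutoff:
            result.append(repo)
    return result


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    activities = {
        "laptop": {"FsckAll": datetime(2024, 5, 20, tzinfo=timezone.utc)},
        # Recent Fsck, but the last full fsck is old: still untrusted.
        "offsite-cold": {
            "Fsck": datetime(2024, 5, 31, tzinfo=timezone.utc),
            "FsckAll": datetime(2023, 12, 1, tzinfo=timezone.utc),
        },
    }
    print(repos_to_untrust(activities, timedelta(days=30), now))  # ['offsite-cold']
```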
I tried to implement this, but ran into a problem adding FsckAll: if it only logs FsckAll and not also Fsck, then an older git-annex expire will see the FsckAll, not understand it, treat it as no activity, and so expire the repo. (I have now fixed git-annex so that an unknown activity is not treated as no activity.)
Also, the way recordActivity is implemented, it removes previous activities and adds the current one. So a FsckAll followed by a Fsck would remove the FsckAll activity.
That could be changed so that both are logged, but older git-annex versions would probably not be able to parse the result. And if an older git-annex is then used to run fsck, it would log Fsck and remove the previously added FsckAll.
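The clash can be shown with a toy model of two logging strategies: one that keeps a single activity per repo, as recordActivity is described as doing, and one that keeps the latest time of each activity separately, which is the kind of change discussed here. The data structures are made up for illustration and are not git-annex's real log format.

```python
def record_replacing(log, repo, activity, when):
    # Previous activities are dropped; only the current one is kept.
    log[repo] = {activity: when}


def record_merging(log, repo, activity, when):
    # Each activity keeps its own latest timestamp.
    log.setdefault(repo, {})[activity] = when


if __name__ == "__main__":
    replacing, merging = {}, {}
    for activity, when in [("FsckAll", 100), ("Fsck", 200)]:
        record_replacing(replacing, "offsite-cold", activity, when)
        record_merging(merging, "offsite-cold", activity, when)
    print(replacing)  # {'offsite-cold': {'Fsck': 200}}: the FsckAll is gone
    print(merging)    # {'offsite-cold': {'FsckAll': 100, 'Fsck': 200}}
```

Even with the merging strategy, a single run of an older git-annex that still uses the replacing behavior would collapse the entry back to just Fsck.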
So, it seems this will need to use some log other than activity.log to keep track of fsck --all.
Maybe it's better not to tie this directly to fsck. Another way would be:
The first time this is run, it would record that the trust level will change to untrusted after 100 days. Each later run would push that deadline back again.
So you could do whatever fsck or other checks make you still trust the repo, and then run this again.
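A minimal sketch of the renewal behavior, assuming the command stores the pending trust level together with the time it was last run, so the deadline is always that time plus the delay. The 100-day delay is the example from the text; the function names and the in-memory storage are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DELAY = timedelta(days=100)  # the example delay from the text


def renew(pending, repo, now, level="untrusted"):
    """Record (or push back) a future trust change for repo.

    Running this again simply overwrites the timestamp, which moves the
    deadline to now + DELAY.
    """
    pending[repo] = (level, now)


def deadline(pending, repo):
    _level, set_at = pending[repo]
    return set_at + DELAY


if __name__ == "__main__":
    pending = {}
    renew(pending, "offsite-cold", datetime(2024, 1, 1, tzinfo=timezone.utc))
    print(deadline(pending, "offsite-cold").date())  # 2024-04-10
    # After an fsck (or any other check) in March, run it again:
    renew(pending, "offsite-cold", datetime(2024, 3, 1, tzinfo=timezone.utc))
    print(deadline(pending, "offsite-cold").date())  # 2024-06-09
```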
The implementation would, I guess, need a separate future-trust.log in addition to trust.log. When loading trust levels, if future-trust.log has a value with a newer timestamp than the value in trust.log, and enough time has passed, that value would be used instead of the one from trust.log. That way it avoids breaking older git-annex versions with changes to trust.log.
There is no need to change what's in trust.log, although it could be changed too, which would also let older git-annex versions learn about the change in trust.
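The loading rule described for future-trust.log could look roughly like this. It is a sketch under assumptions: each log is reduced to one entry per repo, a future entry carries the delay after which it takes effect, the class and function names are hypothetical, and parsing the real log files is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TrustEntry:
    level: str            # e.g. "trusted", "semitrusted", "untrusted"
    timestamp: datetime   # when the trust.log entry was written


@dataclass
class FutureTrustEntry:
    level: str
    timestamp: datetime   # when the entry was (last) written
    delay: timedelta      # how long until it takes effect


def effective_trust(current: Optional[TrustEntry],
                    future: Optional[FutureTrustEntry],
                    now: datetime,
                    default: str = "semitrusted") -> str:
    """Pick the trust level to use, preferring a future entry that is due.

    The future entry wins only if it was written after the trust.log
    entry (so a later manual change still overrides it) and its delay
    has elapsed.
    """
    base = current.level if current is not None else default
    if future is None:
        return base
    newer = current is None or future.timestamp > current.timestamp
    due = now >= future.timestamp + future.delay
    return future.level if newer and due else base


if __name__ == "__main__":
    utc = timezone.utc
    current = TrustEntry("semitrusted", datetime(2023, 6, 1, tzinfo=utc))
    future = FutureTrustEntry("untrusted", datetime(2024, 1, 1, tzinfo=utc),
                              timedelta(days=100))
    print(effective_trust(current, future, datetime(2024, 3, 1, tzinfo=utc)))  # semitrusted
    print(effective_trust(current, future, datetime(2024, 5, 1, tzinfo=utc)))  # untrusted
```

Comparing timestamps means a trust or untrust set by hand after the future entry was written keeps precedence, which matches the "newer timestamp" condition above.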