Add git hooks that are used to lock down annexed objects. --Joey
Use cases include:
Setting the immutable bit on systems where git-annex is given the ability to do so, to fully guard against accidental deletion in all circumstances.
For systems that ignore the write bit, but have some other way to prevent writes to a file (e.g., ACLs or something).
Note that in such a case, git-annex init's probe of the write bit handling fails; as long as the hook is configured globally, it should run the hook instead, and if it works, can avoid direct mode.
Design:
Configs: annex.freezecontent-command, annex.thawcontent-command. In these, "%path" is replaced with the file/directory to act on.
Locking down a directory only needs to do the equivalent of removing its write bit; it does not need to lock down the files within it.
It would be up to the command to decide how to handle the core.sharedRepository configuration.
These could be set in the global gitconfig file. The includeIf directive can be used to apply them only to repositories located within a given mount point.
git-annex test disables use of global gitconfig settings. There would need to be a way to let it use these.
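As a rough illustration of the global-config approach, the sketch below writes two example config files into a scratch directory rather than touching the real global gitconfig. The mount point /mnt/nfs/, the included file name, and the annex-freeze and annex-thaw scripts are hypothetical placeholders; only the two annex.* config names and "%path" come from the design above.

```shell
#!/bin/sh
# Sketch only: generate example config files in a scratch directory.
# /mnt/nfs/ and the annex-freeze/annex-thaw scripts are hypothetical.
demo=$(mktemp -d)

# Settings that should apply only to repositories under the mount point.
cat > "$demo/gitconfig-lockdown" <<'EOF'
[annex]
    freezecontent-command = /usr/local/bin/annex-freeze %path
    thawcontent-command = /usr/local/bin/annex-thaw %path
EOF

# Fragment for the global gitconfig. A gitdir: pattern ending in "/"
# matches every repository located below that directory.
cat > "$demo/gitconfig" <<EOF
[includeIf "gitdir:/mnt/nfs/"]
    path = $demo/gitconfig-lockdown
EOF

cat "$demo/gitconfig" "$demo/gitconfig-lockdown"
```

In real use the first fragment would live in the administrator's global gitconfig and the second in whatever file the path setting names.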
Performance:
Hook would be called twice per store/drop of an annexed object, once for the file and once for the parent directory.
On Windows, called four times per lock of an annexed object, to first thaw it and then freeze it. This could be reduced to 2, I think. On POSIX, the file is locked without being thawed, as only read access is needed.
Probably running a shell script is not too much overhead in many cases. If it was too slow, there could be a variant that is run once and fed the names of files to operate on via stdin.
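The stdin-fed variant mentioned above does not exist; purely as a sketch of what such a hook might look like, the following script reads one path per line and removes the write bits from each. The function name freeze_paths is hypothetical, and chmod is only a stand-in for whatever lockdown step (ACL change, immutable bit) a real hook would perform. Reading line by line means paths containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical batch freeze hook: reads one path per line from stdin,
# so a single process can handle many files.
freeze_paths() {
    while IFS= read -r path; do
        # Stand-in for the real lockdown step (ACL change, chattr, ...).
        chmod a-w -- "$path" || echo "failed to freeze: $path" >&2
    done
}

# Demo on scratch files, including a name that contains a space.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b c"
printf '%s\n' "$dir/a" "$dir/b c" | freeze_paths
ls -l "$dir"
```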
These hooks may be too specific to this purpose, while a more generalized hook could also support things like storing xattrs --Joey
done..
git-annex init does run annex.freezecontent-command and if it prevents writing to a file, it will avoid setting annex.crippledfilesystem.
I didn't make git-annex test use the global git config of the hooks though, not sure if that really makes sense or is needed.
Thank you Joey for looking into this issue! I am, though, a bit worried that the necessity to configure some kind of hook would be ... suboptimal for a number of reasons. Before laying out my argumentation, let me first ask: why could alternative "lockdown" mechanisms not be sensed/configured per repository during init and implemented within git-annex?
As you have noted, "git-annex init's probe of the write bit handling fails...", so git-annex already checks for a possible way to establish the "lockdown" for a given repository location. It just tries one possible mechanism ATM. But it could as well try multiple ways to achieve it, starting from the current "POSIX" one, and then trying "ACL" if appropriate (i.e. tools found). Then, if non-POSIX handling is needed, it would simply add yet another configuration to .git/config of that repository, and consult it to switch to the corresponding lockdown implementation within git-annex. This would be much more user friendly, and it would allow 3rd party tools using git-annex (such as datalad) to not worry about the necessity to configure some additional hooks for a particular location, etc.
Seems likely that there are a couple of different ways to use ACLs to remove write access. In the simple case, any existing ACL can be overwritten. In other cases, some other existing ACLs will need to be preserved and only a single part changed. In some cases, the ACL for a user should be changed, in others the ACL for a group.
And there are several different varieties of ACLs (POSIX, NFS, Windows). And there's the immutable bit, which might be wanted in some specific circumstances but certainly not by most people.
So it makes sense to me to not embed specific knowledge of this into git-annex.
This feels to me like something that the system administrator is going to want to set up. It would mostly be limited to repositories inside a given mount point that needs the unusual lockdown method due to using NFS or whatever. The global gitconfig can be set up to switch on the config only for those repositories, and the system administrator can set up hooks for the particular use case.
I don't see why something like datalad would need to worry about this detail, any more than they worry about the PATH to system programs or other such things that the administrator sets up.
There's a choice between the hook needing to replicate git-annex's use of permissions as well as doing whatever else it does, or git-annex setting the permissions first, and only then running the hook.
Seems to me that git-annex setting the permissions is better, because then the hook does not need to worry about details like core.sharedrepository if it's doing something simple like setting immutable. (But if it adjusts ACLs, it might make sense for it to consider core.sharedrepository.) Also, the precise details of what file permissions git-annex uses don't need to be documented well enough for the hook to replicate them if git-annex just makes the permissions changes itself.
It seems to make sense that when restoring permissions, it should run the thaw hook before changing the permissions. The freeze hook might do something that prevents changing permissions, and the thaw hook would undo that.
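The ordering argued for above can be sketched as follows. This is only a simulation in shell, not git-annex's implementation: freeze, thaw, and run_hook are hypothetical names, the chmod modes are stand-ins for whatever permissions git-annex actually sets, and the hook merely records its invocation in a log instead of setting an immutable bit or ACL.

```shell
#!/bin/sh
# Simulation of the proposed ordering; not git-annex's actual code.
log=$(mktemp)

# Stand-in hook: a real one might run chattr or setfacl here.
run_hook() {
    echo "$1 hook: $2" >> "$log"
}

freeze() {
    chmod a-w -- "$1"            # permissions are changed first ...
    echo "chmod a-w: $1" >> "$log"
    run_hook freeze "$1"         # ... then the hook adds its extra lockdown
}

thaw() {
    run_hook thaw "$1"           # the hook undoes its lockdown first ...
    chmod u+w -- "$1"            # ... so changing permissions can succeed
    echo "chmod u+w: $1" >> "$log"
}

f=$(mktemp)
freeze "$f"
thaw "$f"
cat "$log"
```

The log shows the permission change before the freeze hook and the thaw hook before the permission change, which is the symmetry the two paragraphs above describe.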