Object files stored in .git/annex/objects are each put in their own directory.
This allows the write bit to be removed from both the file and its directory,
which prevents accidentally deleting or changing the file contents.
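The layout and permissions described above can be mimicked with plain chmod in a scratch directory. This is a minimal sketch, not git-annex's own code. The key name and the aa/bb hash directories are made-up placeholders; real git-annex derives them from the key.

```shell
#!/bin/sh
# Sketch only: mimic one locked-down object in a scratch directory.
# The key "SHA256E-s6--example" and hash dirs "aa/bb" are hypothetical.
tmp=$(mktemp -d)
trap 'chmod -R u+w "$tmp"; rm -rf "$tmp"' EXIT

key=SHA256E-s6--example
objdir="$tmp/.git/annex/objects/aa/bb/$key"   # one directory per object
obj="$objdir/$key"                            # the content file itself

mkdir -p "$objdir"
printf 'hello\n' > "$obj"

# Remove the write bits from the file first, then from its directory.
chmod a-w "$obj"
chmod a-w "$objdir"

ls -ld "$objdir" "$obj"
```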
The reasoning for doing this follows:
Normally with git, once you have committed a file, editing the file in the
working tree cannot cause you to lose the committed version. This is an
important property of git. Of course you can rm -rf .git and delete
commits if you like (before you've pushed them). But you can't lose a
committed version of the file because of something you do with the working
tree version.
It's easy for git to do this, because committing a file makes a copy of it. But git-annex does not make a local copy of a file added to it, because the file could be very large.
So it's important for git-annex to find another way to preserve the expected property that, once committed, a file cannot be accidentally lost. The most important protection it makes is simply to remove the write bit from the file, which prevents programs from modifying it.
But that does not stop a program that follows the symlink from
deleting the file it points to. This might seem an unlikely thing for a
program to do at first, but consider a command like:
tar cf foo.tar foo --remove-files --dereference
When I tested this, I didn't know whether it would remove the file foo is symlinked to! It turned out that my tar doesn't remove it. But it could easily have gone the other way.
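A scratch-directory sketch of the situation: an annexed file in the working tree is a symlink into its object directory, so removing the link itself is harmless, while any program that dereferences it reaches the object file. The paths and key below are hypothetical, and the symlink is created by hand rather than by git-annex.

```shell
#!/bin/sh
# Sketch only: a working-tree symlink pointing at a (hypothetical) object.
tmp=$(mktemp -d)
trap 'chmod -R u+w "$tmp"; rm -rf "$tmp"' EXIT

key=SHA256E-s6--example
objdir="$tmp/.git/annex/objects/aa/bb/$key"
mkdir -p "$objdir"
printf 'hello\n' > "$objdir/$key"

# git-annex uses relative symlinks; this one is built by hand.
ln -s ".git/annex/objects/aa/bb/$key/$key" "$tmp/foo"

# A program that follows the link resolves it to the object file:
readlink -f "$tmp/foo"

# Removing the link itself (no dereferencing) leaves the object alone.
rm "$tmp/foo"
ls "$objdir"
```

Whether a tool like tar with --remove-files --dereference removes the link or its target is up to that tool, which is exactly why git-annex also locks the directory.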
Rather than needing to worry about every possible program that might
decide to do something like this, git-annex removes the write bit from the
directory containing the annexed object, as well as removing the write
bit from the file. (The only bad consequence of this is that rm -rf .git
doesn't work unless you first run chmod -R +w .git.)
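The unlock-then-delete step can be sketched on a scratch repository containing one locked-down object (the KEY name is a placeholder). Root can delete the object even without the chmod, so the failure only shows up for ordinary users.

```shell
#!/bin/sh
# Sketch only: a scratch ".git" containing one locked-down object.
tmp=$(mktemp -d)
objdir="$tmp/repo/.git/annex/objects/aa/bb/KEY"   # hypothetical key
mkdir -p "$objdir"
printf 'hello\n' > "$objdir/KEY"
chmod a-w "$objdir/KEY" "$objdir"

# Without this, a non-root "rm -rf" fails with "Permission denied"
# on the object, because its directory is not writable.
# (+w with no u/g/o adds the write bits that the umask allows.)
chmod -R +w "$tmp/repo/.git"
rm -rf "$tmp/repo/.git"

rmdir "$tmp/repo" "$tmp"
```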
It's known that this lockdown mechanism is incomplete. The worst hole in
it is that if you explicitly run chmod +w on an annexed file in the working
tree, this follows the symlink and allows writing to the file. It would be
better to make the files fully immutable. But most systems either don't
support immutable attributes, or only let root make files immutable.
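The hole is easy to reproduce in a scratch directory: chmod on the working-tree symlink changes the mode of the object it points to, while the object's directory stays locked. The layout and names are hypothetical stand-ins for a real annex.

```shell
#!/bin/sh
# Sketch only: chmod on a working-tree symlink changes the object it
# points to. Paths and the KEY name are hypothetical.
tmp=$(mktemp -d)
trap 'chmod -R u+w "$tmp"; rm -rf "$tmp"' EXIT

objdir="$tmp/.git/annex/objects/aa/bb/KEY"
mkdir -p "$objdir"
printf 'hello\n' > "$objdir/KEY"
chmod a-w "$objdir/KEY" "$objdir"
ln -s ".git/annex/objects/aa/bb/KEY/KEY" "$tmp/foo"

# chmod follows symlinks named on the command line, so this re-enables
# writing to the object itself; the directory stays read-only.
chmod u+w "$tmp/foo"
ls -l "$objdir/KEY"
```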
The git configs annex.freezecontent-command and annex.thawcontent-command
can be used to run additional commands to further lock down and later thaw
the annex object and directory.
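For example, on Linux one might try setting the immutable flag with chattr. This is a sketch under assumptions. It relies on the %path placeholder that git-annex substitutes into these commands. chattr comes from e2fsprogs and usually needs root (or CAP_LINUX_IMMUTABLE) to set +i. A throwaway repository stands in for a real one.

```shell
#!/bin/sh
# Sketch only: a throwaway repository stands in for your annex repo.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo" || exit 1

# git-annex replaces %path with the object file or directory.
# chattr (Linux, e2fsprogs) toggles the immutable flag; setting it
# normally requires root or CAP_LINUX_IMMUTABLE.
git config annex.freezecontent-command 'chattr +i %path'
git config annex.thawcontent-command 'chattr -i %path'

git config --get-regexp '^annex\.'
```

In a real repository, only the two git config lines are needed. The thaw command matters as much as the freeze command: without it, git-annex could not restore the write bit on an immutable file when it later needs to drop or move the content.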
I'm using git-annex to store build artefacts on a remote bare repo. Some of these artefacts are used in subsequent builds, which clone the artefacts repo and use 'git annex get' to retrieve the artefacts of interest.
Unfortunately, I've had to add a little kludge along the following lines to my build script fragment:
This is necessary because I need to ensure that the cloned git repo can be deleted at all times (I'm using yocto/openembedded and it may want to delete the clone for a variety of reasons).
setcap cap_linux_immutable+ep /usr/bin/git-annex
After doing that, git-annex is able to make files immutable, so the additional directory is not needed any more. Even on file systems or in environments where that is not possible, file lookup speed is, in some situations, way more important than protecting the target of a symlink from deletion.
I have no idea how to code in Haskell, so if somebody else could add an appropriate always/never/only-when-necessary config option I'd be very happy, and my media server would not have any more hiccups when switching songs …
I've set up a project server for my team with annexes in most repos. I'm using gitolite with its git-annex-shell plugin. It's been going well for a year, and my team finds git-annex very useful for managing our large projects, so we have a large debt to you for that.
Problem
But when my users delete repos, the repos aren't fully deleted, because every annex/objects/*/*/SHA256-*/SHA256-* file is locked down.
Gitolite
For example:
test-git-annex-write
Create+Upload:
test-gitea-annex-write git@data:datasets/test-jank.git
Delete the repo
Notice the "Permission denied" error; gitolite nevertheless thinks its work is done:
But if I try to recreate the same repo, it fails:
test-gitea-annex-write git@data:datasets/test-jank.git
Of course it does: if I log in on the server, I can see:
Gitea
I've been porting git-annex-shell into Gitea as well, to get a more familiar UI for my team, and I have discovered the exact same problem there: if I run test-git-annex-write against my test instance and then delete that repo, Gitea dutifully reports:
But if I then try to recreate it, it balks with:
Solutions
There doesn't seem to be much benefit to lockdown in a bare repo: there's no checkout that might corrupt the content. Plus in my case there's gitolite/gitea in the way which is an extra layer of protection against direct modification. So could lockdown be turned off?
I'd like it best if you detected when you're run in a bare repo and skipped the freezing and thawing steps. But I could also work with just a config setting (something like git config --global annex.lockdown false?).
Workaround #1
In the meantime, I have found one workaround so far: I can misuse annex.freezecontent-command:
Example
(notice it doesn't give any error this time)
But this only works with a relatively new git-annex; I haven't looked up when this went in, but I know 8.20210223, from barely a year ago, doesn't have this feature, while 10.20220322 does. Also, it's very much a workaround: it immediately undoes the work git-annex does, which causes unnecessary disk I/O.
Here's an upgrade to this workaround that limits the effect to bare repos (though the only repos ever created by the remote git user should be bare):
Workaround #2
The advice above says to run chmod -R +w .git first, so another solution would be to patch the rm subroutines in gitolite and Gitea to be git-annex aware, i.e. to run chmod -R +w before doing anything else.
That looks more feasible for me to do in Gitea, where git-annex support is turning out to need a whole bunch of patches scattered across the codebase, but it's a lot less appealing to do in gitolite, where git-annex support is currently contained in one very elegant file.
I think that the annex.freezecontent-command approach is fine. The hook runs after git-annex changes permissions, and it can add them back if you want. It has been supported since 8.20210630.
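A hook along these lines could combine that approach with the bare-repo restriction discussed above. It is a sketch, not a tested recipe. The script name annex-thaw-if-bare is hypothetical. The sketch assumes git-annex runs the command with the object path in place of %path and that git is on the hook's PATH. It decides bareness by running git rev-parse --is-bare-repository from the object's directory.

```shell
#!/bin/sh
# Sketch only. "annex-thaw-if-bare" is a hypothetical helper meant for
# annex.freezecontent-command; it assumes git-annex passes the frozen
# object file or directory in place of %path.
tmp=$(mktemp -d)
trap 'chmod -R u+w "$tmp"; rm -rf "$tmp"' EXIT

hook="$tmp/annex-thaw-if-bare"
cat > "$hook" <<'EOF'
#!/bin/sh
# Give the owner write access back, but only inside a bare repository.
# git finds the repository by walking up from the object's directory.
dir=$(dirname "$1")
if [ "$(git -C "$dir" rev-parse --is-bare-repository 2>/dev/null)" = true ]; then
    chmod u+w "$1"
fi
EOF

# Try it on a fake, locked-down object inside a scratch bare repository.
git init -q --bare "$tmp/test.git"
obj="$tmp/test.git/annex/objects/aa/bb/KEY/KEY"
mkdir -p "$(dirname "$obj")"
printf 'hello\n' > "$obj"
chmod a-w "$obj"

sh "$hook" "$obj"
ls -l "$obj"

# Installing it for the account that owns the repositories might look like:
#   git config --global annex.freezecontent-command 'sh /path/to/annex-thaw-if-bare %path'
```

Inside a non-bare repository's .git directory the same check prints false, so working-tree clones keep their lockdown.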
I agree it would be better if gitolite could be modified to only set the write bits before deleting the repository. It seems to me that gitolite demonstrably has a bug, because you show it fail to delete everything but apparently behave as if it succeeded. Perhaps setting the write bits could be justified to the gitolite developers as a way to make it more robust when removing a repository, in case some permission problem prevents deleting the content of a directory.
Thanks for your time and your advice, joey. We'll stick with the hook for now then.
I realized I didn't include versions: on my gitolite server I have git-annex 10.20220504. However, that's only because I hacked the environment up so I could use the conda-forge build as the system git-annex; the version in Ubuntu 22.04 is 8.20210223.
That's not really a stable way to run a system, so I am leaning towards reverting and just living with the bug for a while.