Object files stored in .git/annex/objects are each put in their own directory. This allows the write bit to be removed from both the file, and its directory, which prevents accidentally deleting or changing the file contents.

The reasoning for doing this follows:

Normally with git, once you have committed a file, editing the file in the working tree cannot cause you to lose the committed version. This is an important property of git. Of course you can rm -rf .git and delete commits if you like (before you've pushed them). But you can't lose a committed version of the file because of something you do with the working tree version.

It's easy for git to do this, because committing a file makes a copy of it. But git-annex does not make a local copy of a file added to it, because the file could be very large.

So it's important for git-annex to find another way to preserve the expected property that, once committed, a file cannot be accidentally lost. The most important protection it provides is simply removing the write bit from the file, which prevents programs from modifying it.

But that does not stop a program from following the symlink and deleting the file it points to. This might seem an unlikely thing for a program to do at first, but consider a command like: tar cf foo.tar foo --remove-files --dereference

When I tested this, I didn't know whether it would remove the file foo is symlinked to! It turned out that my tar doesn't remove it. But it could easily have gone the other way.

Rather than needing to worry about every possible program that might decide to do something like this, git-annex removes the write bit from the directory containing the annexed object, as well as removing the write bit from the file. (The only bad consequence of this is that rm -rf .git doesn't work unless you first run chmod -R +w .git)
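
A minimal sketch of why the directory matters, using plain files in a temporary directory rather than a real annex (the names `objdir` and `key` are made up): unlinking a file requires write permission on the directory that holds it, so with both write bits removed the file survives an `rm`, and restoring the write bits is needed before `rm -rf` can clean up. Root bypasses these permission checks, so the first `rm` only fails for an ordinary user.

```shell
#!/bin/sh
# Minimal demo of the lockdown idea: remove the write bit from a file AND
# from the directory containing it. "objdir" and "key" are illustrative
# names, not git-annex's real object layout.
set -u

tmp=$(mktemp -d)
mkdir "$tmp/objdir"
echo "annexed content" > "$tmp/objdir/key"

# Lock down: the file first, then its directory.
chmod a-w "$tmp/objdir/key" "$tmp/objdir"

# Unlinking needs write permission on the directory, so this rm fails
# for an ordinary user (root bypasses the permission check).
if rm -f "$tmp/objdir/key" 2>/dev/null; then
    refused=no
else
    refused=yes
fi
echo "rm refused: $refused"

# The way out: restore the owner's write bits, then remove recursively.
chmod -R u+w "$tmp"
rm -rf "$tmp"
```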

It's known that this lockdown mechanism is incomplete. The worst hole in it is that if you explicitly run chmod +w on an annexed file in the working tree, this follows the symlink and allows writing to the file. It would be better to make the files fully immutable. But most systems either don't support immutable attributes, or only let root make files immutable.
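
The hole can be reproduced without git-annex: `chmod` on a symlink changes the mode of the file it points to, not the mode of the link. A minimal sketch with made-up names, where `owner_w` is a small hypothetical helper built on `find -perm`:

```shell
#!/bin/sh
# The chmod hole: chmod on a symlink changes the mode of the link's target.
set -u

# Print "writable" if the owner write bit is set on $1, else "read-only".
owner_w() {
    if [ -n "$(find "$1" -prune -perm -u+w)" ]; then echo writable; else echo read-only; fi
}

tmp=$(mktemp -d)
mkdir "$tmp/objects"
echo "content" > "$tmp/objects/key"
ln -s objects/key "$tmp/link"        # relative symlink, like an annexed file
chmod a-w "$tmp/objects/key"         # lock the object down

before=$(owner_w "$tmp/objects/key")
chmod u+w "$tmp/link"                # acts on objects/key, not on the link
after=$(owner_w "$tmp/objects/key")
echo "object before: $before, after: $after"

rm -rf "$tmp"
```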

The git configs annex.freezecontent-command and annex.thawcontent-command can be used to run additional commands to further lock down and later thaw the annex object and directory.
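
For instance, a Linux user with the privilege to set the immutable attribute might try pairing the two hooks with `chattr`. This is an untested sketch: it assumes a filesystem that supports `chattr +i`, a git-annex process allowed to use it (root, or the `cap_linux_immutable` capability discussed in the comments below), and that `%path` is substituted as in the workaround further down this page.

```shell
# Sketch: make annexed objects immutable on freeze, mutable again on thaw.
# chattr +i/-i needs root or CAP_LINUX_IMMUTABLE and fails otherwise.
git config annex.freezecontent-command 'chattr +i %path'
git config annex.thawcontent-command 'chattr -i %path'
```

Because `chattr +i` on a directory also blocks creating or removing entries inside it, this relies on git-annex running the thaw command before it changes anything in an object directory, which is what annex.thawcontent-command is for.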

Having the write bit not set can cause problems with automated usage (e.g. build systems)

I'm using git-annex to store build artefacts in a remote bare repo. Some of these artefacts are used in subsequent builds, which clone the artefacts repo and use 'git annex get' to retrieve the artefacts of interest.

Unfortunately, I've had to add a little kludge along the following lines to my build script fragment:

git annex get ${file}
find .git/annex/objects -type d -exec chmod +w {} \;

This is necessary because I need to ensure that the cloned git repo is able to be deleted at all times (I'm using yocto/openembedded and it may want to delete the clone for a variety of reasons).
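
A slightly more targeted variant of that kludge, as a sketch: only touch directories under `.git/annex/objects` that are actually missing the owner write bit, and batch the `chmod` calls with `-exec ... {} +` instead of running one `chmod` per directory. The function name `annex_make_deletable` and the `KEY` directory in the demo are hypothetical.

```shell
#!/bin/sh
set -u

# Hypothetical helper: restore the owner write bit on every directory under
# a clone's .git/annex/objects, so the whole clone can be removed with rm -rf.
annex_make_deletable() {
    objects="$1/.git/annex/objects"
    [ -d "$objects" ] || return 0    # nothing annexed in this clone
    # "! -perm -u+w" selects directories whose owner write bit is clear;
    # "-exec ... {} +" passes them to chmod in batches.
    find "$objects" -type d ! -perm -u+w -exec chmod u+w {} +
}

# Demo on a fake clone that mimics one locked-down annex object.
clone=$(mktemp -d)
keydir="$clone/.git/annex/objects/968/4c0/KEY"
mkdir -p "$keydir"
echo "content" > "$keydir/KEY"
chmod a-w "$keydir/KEY" "$keydir"

annex_make_deletable "$clone"
locked=$(find "$clone" -type d ! -perm -u+w | wc -l | tr -d ' ')
echo "read-only directories left: $locked"
rm -rf "$clone"
```

In a build script this would run right after `git annex get`, in place of the unconditional `find ... -exec chmod +w {} \;`.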

Comment by Duncan — Tue Dec 2 23:16:57 2014

File immutability

setcap cap_linux_immutable+ep /usr/bin/git-annex

After doing that, git-annex has the privilege to make files immutable, so the additional directory would no longer be needed. Even on file systems or in environments where that is not possible, in some situations file lookup speed is far more important than protecting the target of a symlink from deletion.

I have no idea how to code in Haskell, so if somebody else could add an appropriate always/never/only-when-necessary config option I'd be very happy, and my media server would not have any more hiccups when switching songs …

Comment by Matthias — Wed Jan 7 12:06:15 2015

How to disable lockdown in bare repos?

I've set up a project server for my team with annexes in most repos. I'm using gitolite with its git-annex-shell plugin. It's been going well for a year, and my team finds git-annex very useful for managing our large projects, so we owe you a big debt for that.

Problem

But when my users delete repos, the repos aren't fully deleted, because every annex/objects/*/*/SHA256E-*/SHA256E-* file is locked down.

Gitolite

For example:

test-git-annex-write


test-gitea-annex-write() {
REPO="$1"; shift

(set -e; cd "$(mktemp -d)"
  git init
  echo '# testing' > README.md && git add README.md && git commit -m "Initial commit"
  git annex init
  dd if=/dev/urandom of=large.bin bs=1M count=16 && git annex add large.bin && git commit -m "Annex a file"

  git remote add origin "$REPO"
  git config annex.jobs 1
  git annex sync --content origin
  git annex sync --content origin  # if I only run this once, it uploads the branch but not the content
)
}

Create+Upload: test-gitea-annex-write git@data:datasets/test-jank.git

$ test-gitea-annex-write git@data:datasets/test-jank.git
Initialized empty Git repository in /tmp/tmp.ak87a0yp1e/.git/
[master (root-commit) 0a7be36] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 README.md
init  ok
(recording state in git...)
16+0 records in
16+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.0608327 s, 276 MB/s
add large.bin 
ok                                
(recording state in git...)
[master 4a55ea5] Annex a file
 1 file changed, 1 insertion(+)
 create mode 120000 large.bin

  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true

  Unable to parse git config from origin
FATAL: autocreate denied

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.
  annex.sshcaching is not set to true

  Unable to parse git config from origin
FATAL: autocreate denied

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
On branch master
nothing to commit, working tree clean
commit ok
pull origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
FATAL: autocreate denied

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
ok
push origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint: 
hint:   git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint:   git branch -m <name>
Initialized empty Git repository in /srv/git/repositories/datasets/test-jank.git/
ok

  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
On branch master
nothing to commit, working tree clean
commit ok
pull origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
ok
copy large.bin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
(to origin...) ok
pull origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
ok
(recording state in git...)
push origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
ok

Delete the repo

$ ssh git@data D unlock datasets/test-jank
'datasets/test-jank' is now unlocked
$ ssh git@data D rm datasets/test-jank
rm: cannot remove 'datasets/test-jank.git/annex/objects/968/4c0/SHA256E-s16777216--9d8ccb3ebe399a8f6801cde009e03a867151ea4e4bc609848abbd29dd335688f.bin/SHA256E-s16777216--9d8ccb3ebe399a8f6801cde009e03a867151ea4e4bc609848abbd29dd335688f.bin': Permission denied
'datasets/test-jank' is now gone!

Notice the "Permission denied" error -- but gitolite thinks its work is done:

$ ssh git@data info | grep test-jank
$

but if I try to recreate the same repo, it fails:

test-gitea-annex-write git@data:datasets/test-jank.git

$ test-gitea-annex-write git@data:datasets/test-jank.git
Initialized empty Git repository in /tmp/tmp.IBSNFKRgRg/.git/
[master (root-commit) e11344b] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 README.md
init  ok
(recording state in git...)
16+0 records in
16+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.0631523 s, 266 MB/s
add large.bin 
ok                                
(recording state in git...)
[master 0cedc5c] Annex a file
 1 file changed, 1 insertion(+)
 create mode 120000 large.bin

  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true

  Unable to parse git config from origin
FATAL: R any datasets/test-jank nguenther DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true

  Unable to parse git config from origin
FATAL: R any datasets/test-jank nguenther DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
On branch master
nothing to commit, working tree clean
commit ok
pull origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
FATAL: R any datasets/test-jank nguenther DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
ok
push origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
FATAL: W any datasets/test-jank nguenther DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
FATAL: W any datasets/test-jank nguenther DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

  Pushing to origin failed.
failed
sync: 1 failed

That's because, if I log in to the server, I can see:

git@data:~/repositories/datasets$ tree test-jank.git/
test-jank.git/
└── annex
    └── objects
        └── 968
            └── 4c0
                └── SHA256E-s16777216--9d8ccb3ebe399a8f6801cde009e03a867151ea4e4bc609848abbd29dd335688f.bin
                    └── SHA256E-s16777216--9d8ccb3ebe399a8f6801cde009e03a867151ea4e4bc609848abbd29dd335688f.bin

5 directories, 1 file

Gitea

I've been porting git-annex-shell into Gitea as well, to give my team a more familiar UI, and I have discovered the exact same problem there: if I run test-git-annex-write against my test instance and then delete that repo, Gitea dutifully reports

The repository has been deleted.

but if I then try to recreate it, it balks with

Files already exist for this repository. Either adopt them or delete them.

Solutions

There doesn't seem to be much benefit to lockdown in a bare repo: there's no checkout that might corrupt the content. Plus, in my case gitolite/gitea sits in front of the repo, which is an extra layer of protection against direct modification. So could lockdown be turned off?

I'd like it best if git-annex detected when it's run in a bare repo and skipped the freezing and thawing steps. But I could also work with a config setting (something like git config --global annex.lockdown false?).

Workaround #1

In the meantime, so far I have found one workaround: I can misuse annex.freezecontent-command:

ssh root@data
su -l git
git config --global annex.freezecontent-command "chmod -R +w %path"

Example

$ test-gitea-annex-write git@data:datasets/test-jank3
Initialized empty Git repository in /tmp/tmp.I9wno7oZXg/.git/
[master (root-commit) 4f47840] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 README.md
init  ok
(recording state in git...)
16+0 records in
16+0 records out
16777216 bytes (17 MB, 16 MiB) copied, 0.0313028 s, 268 MB/s
add large.bin 
ok                                
(recording state in git...)
[master 27e4c8c] Annex a file
 1 file changed, 1 insertion(+)
 create mode 120000 large.bin

  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true

  Unable to parse git config from origin
FATAL: autocreate denied

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.
  annex.sshcaching is not set to true

  Unable to parse git config from origin
FATAL: autocreate denied

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
On branch master
nothing to commit, working tree clean
commit ok
pull origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
FATAL: autocreate denied

fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
ok
push origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint: 
hint:   git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint:   git branch -m <name>
Initialized empty Git repository in /srv/git/repositories/datasets/test-jank3.git/
ok

  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
On branch master
nothing to commit, working tree clean
commit ok
pull origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
ok
copy large.bin 

  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.
  annex.sshcaching is not set to true
(to origin...) ok
pull origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
ok
(recording state in git...)
push origin 
  You have enabled concurrency, but git-annex is not able to use ssh connection caching. This may result in multiple ssh processes prompting for passwords at the same time.

  annex.sshcaching is not set to true
ok
$ ssh git@data D unlock datasets/test-jank3
'datasets/test-jank3' is now unlocked
$ ssh git@data D rm datasets/test-jank3
'datasets/test-jank3' is now gone!

(notice it doesn't give any error this time)

But this only works with a relatively new git-annex; I haven't looked up when this went in, but I know 8.20210223, from barely a year ago, doesn't have this feature, while 10.20220322 does. And also it's very much a workaround: it immediately undoes the work git-annex does, which will cause unnecessary disk I/O.

Here's an improved version of this workaround that limits the effect to bare repos (though the only repos the remote git user ever creates should be bare anyway):

git config --global annex.freezecontent-command 'sh -c '"'"'[ "$(git config core.bare)" = "true" ] && chmod -R +w %path'"'"
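
The same idea can be written as a small standalone script, which avoids the nested quoting. This is a hypothetical sketch: the function name `thaw_if_bare` and the `/path/to/this-script` placeholder are made up, it uses `git rev-parse --is-bare-repository` in place of `git config core.bare`, and like the one-liner it assumes git-annex runs the command from inside the repository. It also exits with status 0 when the repository is not bare, whereas the `&&` in the one-liner leaves a nonzero status; whether git-annex reports that is not established on this page.

```shell
#!/bin/sh
# Hypothetical freeze hook: hand write permission back to an annexed object,
# but only in a bare repository. Meant to be configured as, e.g.:
#   git config --global annex.freezecontent-command '/path/to/this-script %path'
thaw_if_bare() {
    # git prints "true" in a bare repo and "false" in a non-bare one; outside
    # any repository (or without git installed) the output is empty.
    if [ "$(git rev-parse --is-bare-repository 2>/dev/null)" = "true" ]; then
        chmod -R u+w -- "$1"
    fi
    # When the condition is false, the if-statement, and so the function,
    # finishes with status 0.
}

if [ "$#" -ge 1 ]; then
    thaw_if_bare "$1"
fi
```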

Workaround #2

The page above notes:

(The only bad consequence of this is that rm -rf .git doesn't work unless you first run chmod -R +w .git)

so another solution would be to patch gitolite/gitea's rm subroutines to be git-annex aware, i.e. to run chmod -R +w before doing anything else.
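
In shell terms, the proposed change amounts to something like the following sketch. Gitolite and Gitea each have their own removal code, so this only shows the logic; the function name `remove_repo_dir` and the demo layout are hypothetical. The final check also addresses the problem seen above, where removal reported success while files were left behind.

```shell
#!/bin/sh
set -u

# Hypothetical annex-aware removal: restore owner write permission on the
# whole tree first, so read-only annex object directories cannot block rm.
remove_repo_dir() {
    repo_dir="$1"
    [ -e "$repo_dir" ] || return 0           # already gone
    chmod -R u+w -- "$repo_dir" || return 1
    rm -rf -- "$repo_dir" || return 1
    # Fail instead of reporting success if anything survived.
    [ ! -e "$repo_dir" ]
}

# Demo: a fake bare repository holding one locked-down annex object.
parent=$(mktemp -d)
repo="$parent/test-jank.git"
keydir="$repo/annex/objects/968/4c0/KEY"
mkdir -p "$keydir"
echo "content" > "$keydir/KEY"
chmod a-w "$keydir/KEY" "$keydir"

if remove_repo_dir "$repo"; then
    echo "removed $repo"
else
    echo "failed to remove $repo" >&2
fi
rmdir "$parent"
```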

That looks more feasible for me to do in gitea, where git-annex support is turning out to need a whole bunch of patches scattered across the codebase, but it's a lot less appealing to do in gitolite where git-annex support is currently contained in one very elegant file.

Comment by nick.guenther — Sun May 15 21:30:50 2022

Re: How to disable lockdown in bare repos?

I think that the annex.freezecontent-command approach is fine. The hook runs after git-annex changes the permissions, and it can add them back if you want. It has been supported since 8.20210630.

I agree it would be better if gitolite could be modified to set the write bits before deleting the repository. It seems to me that gitolite demonstrably has a bug, because you show it failing to delete everything while apparently behaving as if it succeeded. Perhaps setting the write bits could be justified to the gitolite developers as a way to make it more robust when removing a repository, in case some permission problem prevents deleting the content of a directory.

Comment by joey — Fri May 20 17:08:29 2022

comment 5

Thanks for your time and your advice, joey. We'll stick with the hook for now then.

I realized I didn't include versions: on my gitolite server I have git-annex 10.20220504. However, that's only because I hacked the environment so I could use the conda-forge build as the system git-annex, since the version in Ubuntu 22.04 is 8.20210223.

That's not really a stable way to run a system, so I am leaning towards reverting and just living with the bug for a while.

Comment by nick.guenther — Fri May 20 18:09:45 2022
