git's shared setting causes annexed files to be writable!

Please describe the problem.

Not sure how I missed this issue for so long: I have now tried the almost year-old 6.20170101-1+b1, and it behaves the same as 6.20171003-g14ffdd779.

I expect those files to be readable according to the 'shared' setting, but definitely not writable!

Please provide any additional information below.

$> sudo rm -rf tttt; mkdir tttt; ( cd tttt; git init --shared .; git annex init ; echo 123 >123; git annex add 123; git commit -m 'added 123'; ls -lL 123; ) 
Initialized empty shared Git repository in /tmp/tttt/.git/
init  ok
(recording state in git...)
add 123 ok
(recording state in git...)
[master (root-commit) b137ea6] added 123
 1 file changed, 1 insertion(+)
 create mode 120000 123
-rw-rw---- 1 yoh yoh 4 Oct 10 12:59 123

$> echo 11 >> tttt/123
# no error code

FWIW, my umask is 077, but I originally discovered this on a system with umask 0002.

I guess I need to annex fsck tons of repositories now :-/
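A minimal sketch for auditing existing repositories for affected files. It assumes GNU find, where -perm /222 matches a file if any write bit is set, and the usual .git/annex/objects layout. list_writable_annex_objects is a hypothetical helper, not a git-annex command.

```shell
#!/bin/sh
# Hypothetical helper: list annexed content files that have ANY write bit
# (u+w, g+w or o+w) set, in every repository found below the given directory.
# Assumes GNU find ("-perm /222" means "any of these bits").
list_writable_annex_objects() {
    top=${1:-.}
    find "$top" -path '*/.git/annex/objects/*' -type f -perm /222 -print
}

# Usage: list_writable_annex_objects ~/repos
```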

fixed in git-annex v10. --Joey

annex fsck reverts them to the incorrect permissions!

Even if I fix them and then rerun 'annex fsck' to verify that I am 'ok', it reverts them to the bad state:

$ find -ipath '*.git/annex/objects' | while read -r d; do chmod a-w -R "$d"/*/*/*; done
$ ls -ld ./.git/annex/objects/Kj/ZJ/MD5E-s3500--157855c754113321d3f6621abe0fe71d.tsv
dr-xr-sr-x 2 bids bids 3 Oct  3 08:36 ./.git/annex/objects/Kj/ZJ/MD5E-s3500--157855c754113321d3f6621abe0fe71d.tsv

$ git annex fsck >/dev/null

$ ls -ld ./.git/annex/objects/Kj/ZJ/MD5E-s3500--157855c754113321d3f6621abe0fe71d.tsv
drwxrwsr-x 2 bids bids 3 Oct  3 08:36 ./.git/annex/objects/Kj/ZJ/MD5E-s3500--157855c754113321d3f6621abe0fe71d.tsv

$ ls -ld ./.git/annex/objects/Kj/ZJ/MD5E-s3500--157855c754113321d3f6621abe0fe71d.tsv/*
-rw-rw-rw- 1 bids bids 3500 Oct  3 07:57 ./.git/annex/objects/Kj/ZJ/MD5E-s3500--157855c754113321d3f6621abe0fe71d.tsv/MD5E-s3500--157855c754113321d3f6621abe0fe71d.tsv

Comment by yarikoptic — Tue Oct 10 17:17:12 2017

comment 2

The write bit is necessary so that the files can be opened in write mode to lock them. Normally the write bit is temporarily enabled and then disabled for locking, but in a shared repository, some other user may own the file, which prevents the user from changing its permissions.
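For comparison, the non-shared behaviour described above (temporarily turn the write bit on, then off again) can be sketched in shell. with_write_bit is a hypothetical helper. The command ': >> file' only stands in for "open the file in write mode so it can be locked" and is not how git-annex takes its locks.

```shell
#!/bin/sh
# Hypothetical helper mimicking the non-shared case: give the owner write
# permission only while a command that must open the file in write mode
# (e.g. to lock it) runs, then make the file read-only again.
with_write_bit() {
    f=$1
    shift
    # In a shared repository this is the step that fails (EPERM,
    # "Operation not permitted") when another user owns the file.
    chmod u+w "$f" || return 1
    "$@"
    rc=$?
    chmod u-w "$f"    # back to read-only, keeping the command's exit status
    return "$rc"
}

# Usage (': >>' opens for appending but writes nothing):
#   with_write_bit some/file sh -c ': >> "$1"' sh some/file
```

Since a non-owner cannot run that first chmod, git-annex leaves the group write bits set in shared repositories instead, as the code comment quoted further down explains.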

Similarly, the parent directory is not made unwritable in a shared repository, because other users won't be able to temporarily flip the write bit on when making changes.

Commit 0d432dd1a4f718225c4192d0834a4e0a34b3e4bd used the latter as a rationale for allowing the former.

I suppose it could use lock files separate from the content file, as is already done in direct mode. However, interaction between different versions of git-annex with different ideas about locking could result in git annex drop losing data.

Comment by joey — Tue Oct 10 17:23:14 2017

comment 3

I found the following comment in the code:

{- Normally, blocks writing to an annexed file, and modifies file
 - permissions to allow reading it.
 -
 - When core.sharedRepository is set, the write bits are not removed from
 - the file, but instead the appropriate group write bits are set. This is
 - necessary to let other users in the group lock the file. But, in a
 - shared repository, the current user may not be able to change a file
 - owned by another user, so failure to set this mode is ignored.
 -}

So maybe it is a "feature", although it kills the whole premise of data safety when using git-annex.

In my case, shared permissions are primarily meant to make files/repositories readable by others, so maybe I should not have used 'shared' mode anyway, since reading does not need the shared setting.

Comment by yarikoptic — Tue Oct 10 17:40:08 2017

comment 4

Our comments crossed in hyperspace...

Yeah, I would have expected a separate lock file to be used in such cases. Now data loss is a real possibility: I almost caused it by opening the file and writing to it without unlocking it first. Luckily I caught it and was able to restore it exactly the way it was. Interaction among multiple versions of annex on the same repo is IMHO less likely (it would require two different versions of annex to be available).

Comment by yarikoptic — Tue Oct 10 17:56:47 2017

comment 5

It's easy to have multiple versions of git-annex installed, especially when a repository is shared among users, who may install different versions for their own reasons.

I am not happy either about this weakening of safety guarantees that indirect mode repositories otherwise have. It seems at least worth a big fat warning.

I don't feel comfortable changing locking without a major repository version change, which would be a Big Deal.

Anyhow, the file mode and the directory mode are both needed to protect annexed files. Making the file read-only will not prevent a lot of things from renaming a new file over top of it (including vim!).
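To see why the directory mode matters too, here is a hypothetical save_by_rename helper. It saves the way many programs do, by writing a temporary file and renaming it into place. rename(2) needs write permission on the directory, not on the file being replaced.

```shell
#!/bin/sh
# Hypothetical helper that "saves" text into $1 via a temporary file in the
# same directory plus a rename over the target. Only the directory has to be
# writable; the target's own read-only mode does not stop the rename.
save_by_rename() {
    target=$1
    text=$2
    tmp=$(mktemp "$(dirname "$target")/.save.XXXXXX") || return 1
    printf '%s\n' "$text" > "$tmp" || { rm -f "$tmp"; return 1; }
    # -f: do not prompt about replacing a read-only file.
    mv -f "$tmp" "$target"
}
```

If the parent directory is read-only, as git-annex makes it outside of shared mode, the mktemp step fails for a non-root user and the annexed content is left alone.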

Comment by joey — Wed Oct 11 15:59:16 2017

comment 6

Making the file read-only will not prevent a lot of things renaming over top of it. (Including vim!)

Not exactly... removing write permission from the key file at least makes vim report that the file is read-only and refuse to overwrite it (unless forced). That is exactly the use case where I keep burning myself (e.g. again this morning). I need to remember to switch all "shared" repos back to single-user mode. One of the promises of git-annex is that data is safe, and this mode violates it wildly.

Comment by yarikoptic — Tue Dec 5 18:20:41 2017

comment 7

I've opened a "v9 changes" todo, and this can be taken care of if/when v9 happens. It's unfortunate I didn't think of this bug when doing v8; hopefully that new todo will remind me about it for v9.

Comment by joey — Tue Oct 12 17:40:51 2021

comment 8

This is being implemented in the v9-locking branch, which will only get merged when v9 happens, whenever that is. See the "v9 changes" todo.

Comment by joey — Tue Jan 11 18:48:15 2022

comment 9

While this is mostly implemented in the branch, the upgrade to it has a serious danger point.

If a long-running git-annex process, eg a large drop, is running during the upgrade, then it will keep on using the current locking method. Meanwhile, other processes run after the upgrade will use the new locking method. So this could cause data loss: Old git-annex locks a file to drop it, and at the same time new git-annex-shell is used to lock the same file, to prevent it from being dropped.

Avoiding that seems to require a way to make sure there are no running git-annex processes when performing a repository upgrade. (Which would be nice in general but has somehow not been necessary until now.)

One way to do it would be a lock file that every git-annex process holds a shared lock on while it is running, with the upgrade taking an exclusive lock. That locking would need to be implemented first, and before performing the repository upgrade it would somehow need to be known that every running git-annex process is using it. Ideally that would not mean waiting years for all git-annex binaries to be upgraded.
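A minimal sketch of that scheme. It uses util-linux flock(1) purely to illustrate shared versus exclusive lock semantics. The helper names and the lock file are hypothetical and are not how git-annex implements it.

```shell
#!/bin/sh
# Hypothetical sketch: every git-annex-like process runs while holding a
# SHARED lock on one lock file; the repository upgrade needs an EXCLUSIVE
# lock on it, so it cannot start while any such process is running.

# Run "$@" while holding a shared lock on $1 (waits only while an upgrade
# holds the exclusive lock).
run_holding_shared_lock() {
    lock=$1
    shift
    flock -s "$lock" "$@"
}

# Run the upgrade command "$@" only if nothing holds a lock on $1.
# -n makes flock exit with status 1 at once instead of waiting.
try_upgrade() {
    lock=$1
    shift
    flock -x -n "$lock" "$@"
}
```

As the comment says, this only helps once every running process is known to take the shared lock. A git-annex binary from before the lock file existed never touches it.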

I suppose that git-annex v9 could ship with that added lock file and not upgrade the repository to v9 immediately. Instead, it would stat the lock file and only do the repository upgrade once its ctime is old enough that it seems safe to assume any running git-annex process would be using it: e.g. after 3 months or so, or perhaps when the ctime is older than the last reboot. But this would not avoid problems if an older git-annex version was also used in the same repository as the new version.
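The two ctime checks could look roughly like this. The sketch assumes GNU stat, where %Z prints the ctime in seconds since the epoch, and Linux's /proc/stat, whose btime line gives the boot time. Both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical checks for deferring an upgrade until a lock file has existed
# long enough. Assumes GNU stat and Linux /proc/stat.

# True if the ctime of $1 is at least $2 seconds in the past.
# E.g. 7776000 seconds = 90 days, "3 months or so".
lock_file_is_old_enough() {
    ctime=$(stat -c %Z "$1") || return 1
    now=$(date +%s)
    [ $((now - ctime)) -ge "$2" ]
}

# True if the ctime of $1 predates the last reboot, so no process started
# before the lock file existed can still be running.
lock_file_predates_boot() {
    ctime=$(stat -c %Z "$1") || return 1
    btime=$(awk '/^btime / { print $2 }' /proc/stat) || return 1
    [ "$ctime" -lt "$btime" ]
}
```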

Or git-annex upgrade could hunt for other running git-annex processes that are using the repository and refuse to perform the v9 upgrade. But that is hard, because a process's cwd is not necessarily inside the repository it's using (e.g. a remote). It would have to look for git child processes of git-annex processes that are using the repository, such as git cat-file. Network filesystems would also be a problem.

Comment by joey — Wed Jan 12 16:29:40 2022

comment 10

This would almost work:

Continue taking a shared lock of the content file when locking to prevent dropping. That does not need write access, only an exclusive lock does, so the content file can have its write bit removed. Also lock the new lock file, with a shared lock to prevent dropping, or an exclusive when dropping.

The old git-annex version, when dropping, will fail to exclusively lock the content file, either because it's not writable, or because of a shared lock intended to prevent dropping. So a git-annex drop that was in progress may start to fail, but it will not lose any data.

Problem: the old git-annex version, when locking to prevent dropping (e.g. git-annex move --from remote), will take the shared lock of the content file. If the new git-annex version is locking to drop, it will also take the shared lock of the content file, followed by the exclusive lock of the new lock file. So the old git-annex will not be able to prevent the new git-annex from dropping.
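To make the problem concrete, here is a hypothetical simulation with util-linux flock(1) standing in for git-annex's own lock calls. flock(1) does not need the file opened for writing, unlike the write-mode locking described in comment 2, so only the shared/exclusive logic carries over. The file and function names are made up.

```shell
#!/bin/sh
# Hypothetical simulation of the scheme above.
#   $1 - the annexed content file
#   $2 - the new, separate lock file
# What the new git-annex would do when dropping: take a shared lock on the
# content file, then an exclusive lock on the new lock file. Succeeds (exit 0)
# only if both locks could be taken without waiting.
new_drop_can_proceed() {
    content=$1
    lockfile=$2
    flock -s -n "$content" flock -x -n "$lockfile" true
}
```

If an old git-annex holds a shared lock on the content file to keep it from being dropped, an old-style drop (an exclusive lock on the content file) is blocked. new_drop_can_proceed still succeeds, though, which is exactly the hole described above.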

Comment by joey — Wed Jan 12 19:14:10 2022

comment 11

This is going to need two repository version bumps:

v9: Add the upgrade lock file, and all git-annex processes take a shared lock to avoid the repository being upgraded out from under them.

v10: Skipped until the upgrade lock file is of a certain age. Take upgrade lock before upgrading.

In v10, stop locking content files and lock separate lock files.
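A rough sketch of what locking separate lock files could look like. It again uses flock(1) only to illustrate the semantics. The .lck suffix and both helper names are hypothetical and say nothing about the lock file layout git-annex actually uses in v10.

```shell
#!/bin/sh
# Hypothetical sketch: all locking happens on a companion lock file, so the
# content file itself is never opened for locking and can stay read-only.

# Keep $1 from being dropped while running the remaining arguments.
keep_content_while() {
    content=$1
    shift
    flock -s "$content.lck" "$@"
}

# Drop $1 only if nobody currently holds a lock that keeps it.
drop_content() {
    content=$1
    flock -x -n "$content.lck" rm -f "$content"
}
```

With this split, the content file can presumably keep mode 0444 even in a shared repository. Only the lock file would need to be accessible to the whole group.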

The age could be eg 1 month, which assumes that no pre-v9 git-annex process like git-annex move --to remote is still running after that long. Of course, that is still an assumption, but it can be pushed out as long as it takes to feel comfortable with it. Maybe 1 year? The only disadvantage really is that any v11 upgrade would also get deferred.

Since the assistant can possibly run for longer than a year without restarting, the v10 upgrade would need to be skipped when the assistant is running.

git-annex upgrade --version=10 could be available to speed up that upgrade. The user would be responsible for making sure there are no such old git-annex processes running, so that might need --force.

Comment by joey — Wed Jan 12 19:40:35 2022

comment 12

What kind of locking is needed for the v10 upgrade? Everything else is in place now, but the actual locking is TBD.

My plan above calls for a way to detect if another git-annex process (v9 or above) is running in the repo. That will be hard to implement, though.

If every git-annex process starts taking a shared lock, it might cause problems with annex.pidlock. Pidlock does not support shared locks, so only one git-annex process would be able to run, perhaps even in situations where several can run now with pidlocking, because no locking is needed.

Also, the existing locking machinery runs in the Annex monad, but such a lock needs to be implemented in Annex.hs itself, so that would be a recursive dep. And, it would add overhead to every git-annex process. (A small amount.)

Alternatively, there could be a top-level lock file that is held shared whenever locking content files, with the v10 upgrade taking an exclusive lock. But this seems to fail when a v9 process is running: if it blocks on the shared lock for the v10 upgrade, it would still go on to lock in v9 mode in the now-v10 repository.

Update: That problem can be avoided by re-reading the git config to check if v10 was enabled, once it has taken the shared lock. That will mean that v9 repos do a little bit more work when locking content and dropping. For efficiency, use an InodeCache and only re-read when it's changed. This will need the v10 upgrade to set annex.version while it is still holding the exclusive lock.
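A hypothetical sketch of the re-read check, assuming git is on PATH. The cache key (inode, size and whole-second mtime) only loosely mirrors what an InodeCache records, and the global variable names are made up. The function sets a variable rather than printing a value, because a value returned through $(...) would be computed in a subshell and the cache would never persist.

```shell
#!/bin/sh
# Hypothetical globals holding the cache.
cached_key=
annex_version=
config_reads=0

# Update $annex_version from the git config file $1, but only re-read the
# file when its (inode, size, mtime) key differs from the cached one.
# Coarse: two same-size rewrites within one second would not be noticed.
refresh_annex_version() {
    cfg=$1
    key=$(stat -c '%i %s %Y' "$cfg") || return 1
    if [ "$key" != "$cached_key" ]; then
        annex_version=$(git config --file "$cfg" annex.version)
        cached_key=$key
        config_reads=$((config_reads + 1))
    fi
}
```

A process would call this after it has taken the shared lock. Because the v10 upgrade writes annex.version while still holding the exclusive lock, the new value is already on disk by the time the shared lock is granted.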