"annex unlock" in thin mode of v6 hard-links the key into the file location and makes it RW. This is obviously for the case where the file needs to be modified and the danger is understood. In my case, I need unlock to avoid having symlinks in the tree, since some software doesn't digest them well (it might copy without dereferencing, or dereference and then look for neighboring files in the directory). I want to use unlock to provide a "symlink-free" view of the tree, BUT with at least some protection. That protection could be given if files were unlocked read-only, so no in-place modifications could happen without an explicit change of the permissions.
closing because this never got to a concrete proposal that didn't have fatal problems. --Joey
The protection offered by a read-only mode is pretty minimal; any program that writes a file atomically using rename will bypass it. So, as programs are implemented better, they'll bypass this "protection" more -- not much of a protection!
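The rename point can be illustrated with a small sketch. It assumes a POSIX filesystem, where replacing a file only needs write permission on the containing directory; the helper name `atomic_replace` and the file name are made up for the example and have nothing to do with git-annex itself.

```python
import os
import stat
import tempfile


def atomic_replace(path, data):
    """Write data to a temp file in the same directory, then rename it over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(data)
    # The rename only needs write access to the directory, not to the old file.
    os.replace(tmp, path)


def demo_readonly_bypass():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        with open(target, "w") as f:
            f.write("old")
        os.chmod(target, stat.S_IRUSR)  # read-only, like the proposed unlock mode
        atomic_replace(target, "new")   # succeeds despite the read-only bit
        with open(target) as f:
            return f.read()
```

Calling `demo_readonly_bypass()` returns the new content, so the read-only bit on the file did not stop the replacement.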
Also, it doesn't make much sense to call this operation "unlock" if it's intended to not let programs modify the files.
Actually, yoh is right: read-only would be sufficient protection here. Because, with annex.thin, the worktree file is a hard link to the annex object, and the annex object lives in a mode 400 directory. So, even if the file is deleted and a new version renamed into place, the annex object will still have captured the old version.
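A rough sketch of why the hard link matters, again assuming a POSIX filesystem with hard link support. The `objects` directory and `KEY` file only stand in for the annex object store (they are not the real git-annex layout), and mode 0o500 is used here as a non-writable directory.

```python
import os
import tempfile


def demo_hardlink_keeps_old_version():
    with tempfile.TemporaryDirectory() as d:
        objdir = os.path.join(d, "objects")
        os.mkdir(objdir)
        obj = os.path.join(objdir, "KEY")
        with open(obj, "w") as f:
            f.write("old")
        os.chmod(obj, 0o444)

        work = os.path.join(d, "file")
        os.link(obj, work)       # worktree file is a hard link to the object
        os.chmod(objdir, 0o500)  # object directory is not writable
        try:
            # A program "modifies" the file atomically: write a new file, rename it in.
            tmp = work + ".tmp"
            with open(tmp, "w") as f:
                f.write("new")
            os.replace(tmp, work)

            with open(obj) as f:
                object_content = f.read()
            with open(work) as f:
                work_content = f.read()
        finally:
            os.chmod(objdir, 0o700)  # restore so the temp directory can be cleaned up
        return object_content, work_content
```

The function returns the object content and the worktree content separately: the rename replaced only the worktree name, while the object still holds the old version.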
Still don't like the self-contradiction of "unlock read-only".
Of course, you can do this yourself:
So I wonder if there's any need for a git-annex command to do this.
as far as I see it
It sounds like you would want to unlock all files in the repo this way, is that right?
If so, it seems like a case for git-annex adjust, eg git annex adjust --hardlink. And it would perhaps make sense to do that on a crippled filesystem by default instead of the current default of --unlock.
Keeping it in adjust only avoids needing to make the unlock command do something that is not an unlocking, and it avoids needing to add a new command.
It also neatly avoids the problem that, while git annex unlock makes a change that can be committed to git (in v7 mode), this new operation is not something that can be committed to git (at least w/o some change to indicate it in the pointer file).
git status to list these files as modified.
Was not sure if I should file a new issue for related discussion, but I thought it might align with the last comment from Ilya, but let me know if it is off-topic too much.
One of the most common "consumer" use cases across platforms is just to get the dataset and files to be processed, and then possibly even wipe it all out. Not all file systems support hard linking or CoW. I wondered if "thin" mode could be something explicit like hardlink, or even a new mode: mv. In mv I would see annex just moving the file to its needed location upon unlock (and probably marking it in the git-annex branch as not present "here") (and probably retaining the "read-only" permissions for some level of protection? or have mv and mv-rw modes?).
And then maybe if annex get would get an option --unlocked, then in thin=mv mode, annex could take a shortcut and just place the file in the target location right away without even bothering to change any availability information in the "git-annex" branch? That would also avoid stressing file systems by consuming all the inodes for the .git/annex/objects tree in such scenarios.
To "just to get the dataset and files to be processed, and then possibly even wipe it all out", maybe you could just use the directory special remote?
Which common file systems do not support hardlinking? It seems that Windows does.
To fix the problem that unlocking a file causes git status to report it as changed ("typechange"), maybe git-annex could tell git to locally ignore the change?
The ideas in that comment won't work, and here's why:
If git-annex does not maintain a hardlink in .git/annex/objects, then when you run git checkout and it replaces the working tree file with some other version, or deletes it, it's deleted the only copy of the annex object that is stored on your disk. So you lose data.
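The data-loss scenario can be simulated without git at all. The sketch below is only an illustration under made-up names: `objects/KEY` stands in for the annex object, and the "checkout" is modelled as a plain rename of a new file over the worktree file.

```python
import os
import tempfile


def surviving_contents(keep_hardlink):
    """Return the set of file contents left on disk after a simulated checkout."""
    with tempfile.TemporaryDirectory() as repo:
        objects = os.path.join(repo, "objects")
        os.mkdir(objects)
        obj = os.path.join(objects, "KEY")
        with open(obj, "w") as f:
            f.write("version 1")

        work = os.path.join(repo, "file")
        if keep_hardlink:
            os.link(obj, work)    # thin mode: the object store keeps its own name
        else:
            os.rename(obj, work)  # hypothetical "mv" mode: the only copy moves out

        # Simulated checkout: the worktree file is replaced by another version.
        tmp = work + ".tmp"
        with open(tmp, "w") as f:
            f.write("version 2")
        os.replace(tmp, work)

        found = set()
        for root, _dirs, files in os.walk(repo):
            for name in files:
                with open(os.path.join(root, name)) as f:
                    found.add(f.read())
        return found
```

With `keep_hardlink=True` the old version is still found under `objects`; with `keep_hardlink=False` nothing on disk contains it any more.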
Much of this discussion seems irrelevant given that v7 is the default now and half of the discussion above is about v5 unlock.
In general, this todo suffers from far too many unrelated or only tangentially related suggestions.
Any concrete proposals, or shall I close this?
joeyh, could you please elaborate on what v7/v8 does differently from v5 when unlocking? I don't get it.
I need this feature (checked out real/hardlinked files while being immutable) as well. Even if it is only a thin layer of protection it may help. Where supported, git annex may use the file immutable attributes (as discussed in https://git-annex.branchable.com/internals/lockdown/) for better protection.
Imo it's lock/unlock which isn't clear about naming/semantics. We have two things here: 1. symlink vs. direct files 2. protection against mutation. These 2 things mingle together in the current implementation, but they are different concepts. We cannot choose any free combination of these (a writable symlink into the object store makes no sense). But a little finer control would be appreciated. No idea how to do this in a concrete way.
Maybe some 'git annex protect' command to set different protection modes on content (which could be abstract, with no need to comply with unix semantics, for example: appendable, writable, immutable, deletable etc. git-annex could enforce the mode lazily if it is not supported directly)
or 'git annex rolock' (needs better name) .. which is like unlock but makes the file immutable/write protected somehow.
-o ro and --resolve-symlinks to create a read-only view of a repo where symlinks look like regular files.