Recent comments posted to this site:
Anyone with write access to a repo can already freely change any group, groupwanted or wanted for any involved clone - if it's present in the git-annex branch
A good point, certainly.
So your concerns only apply to private repos that don't record their activity in the git-annex branch by using
annex.private=true.
Well, also repos that lack permission to push, or that are simply not pushed to origin.
It's probably somewhat common to want to get files from origin, but not let origin make config changes that drop all the files they have previously shared.
you can set annex.defaultwanted to "standard", and annex.defaultgroups to some group, and then changing git-annex groupwanted will affect all repositories that copied that defaultwanted into their config
If annex.defaultwanted were able to be changed for all repositories with git-annex config, then here's a really ugly security problem [...]
Yes, but the same is already possible for anyone with write access to a repo. I can git annex wanted JOEYS-UUID nothing, wait for your assistant or manual sync to auto-drop all files (would also need to set {num,min}copies to 1 for that, and even then it might not auto-drop it depending on the remotes). Anyone with write access to a repo can already freely change any group, groupwanted or wanted for any involved clone - if it's present in the git-annex branch (i.e. not made with git config annex.private=true).
So your concerns only apply to private repos that don't record their activity in the git-annex branch by using annex.private=true. Making a git-annex repo private is a conscious, active choice. One does not need to do it if one only consumes files and does not have push access anyway. So that'll be people who actively change repo content, probably consume it, but don't want their repo to show up in git annex info.
Maybe for a publicly-pushable git-annex repo where everyone can add new files (who would host that anyway...). In this case, yes, users of that repo can't trust each other and there setting something like git annex config --set annex.defaultwanted nothing at some point can lead to people's git annex sync|assist|assistant to suddenly drop their files - and probably also on the central remote. But I'd argue that this kind of publicly writable setup has so many other obvious problems that annex.defaultwanted is one of the minor ones.
Other situations I can imagine involve groups of people (or just single users) who trust each other when using a git-annex repo. git-annex is not designed to solve such permission problems - neither is git itself.
In your publicly readable (not writable) git-annex-builds repo on the other hand, if you were to set git annex config --set annex.defaultwanted nothing, then people who just run git annex sync|assist|assistant in their clones would have their downloaded builds dropped, okay.
git-annex usage scenarios
- publicly writable git-annex repo
  - (bad idea anyway for several reasons without any form of permission control on the remote side)
  - malicious people could set git annex config --set annex.defaultwanted nothing at some point and others' clones would have files dropped on sync.
- publicly readable git-annex repo to provide assets (e.g. your git-annex-builds repo)
  - only the owner could do such shenanigans. Users can avoid it by using git annex pull and git annex get instead of sync|assist|assistant (which arguably makes more sense in this case anyway) or explicitly stating their git annex wanted here ....
- groups or individuals working on a repo in several clones - everyone has write access, in a team for example
  - anyone can already happily destroy repo contents and control others' wanted expressions
  - git annex config annex.defaultwanted can be set as an established "repo policy" for everyone's convenience, that anyone can overwrite locally with git annex wanted here ....
  - if you run git annex assist|sync|assistant|satisfy, you accept the repo's policy, as with your securehashesonly example. If you're paranoid, don't use these sync commands, but do only exactly what you want such as git annex pull -g, git annex get <thatfile>, git annex wanted ..., etc.
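The "paranoid" workflow from the last bullet could look roughly like the sketch below. The function names pin_local_policy and fetch_only are made up for illustration; the git annex commands are the ones named in the comment above. It assumes that an explicit wanted expression for "here" takes precedence over any repo-wide default, which is what the proposal describes, not something confirmed for current git-annex.

```shell
# Hypothetical helper: record an explicit preferred content expression for
# this clone, so a repo-wide default (if one ever exists) is not consulted.
pin_local_policy() {
    expr="${1:-present}"
    git annex wanted here "$expr"
}

# Hypothetical helper: fetch git history without content, then get only the
# files that were asked for, instead of running sync/assist.
fetch_only() {
    git annex pull -g && git annex get "$@"
}
```

Usage would be something like `pin_local_policy present` once after cloning, then `fetch_only path/to/file` whenever a specific file is needed.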
If annex.defaultwanted were able to be changed for all repositories with
git-annex config, then here's a really ugly security problem:
- First, I make sure to get a copy of every annexed file.
- Then I run git-annex config annex.defaultwanted nothing
- Then I wait for git-annex to drop every file from your repository.
- Finally, I demand $ to get your files back.
Now, the same can be done by convincing people to add their repository to some group and set preferred content to "standard", and later changing the groupwanted. But that only works on people you were able to social engineer into doing that, not everyone who cloned a repository with the default settings.
And beyond the ransom problem, there's the problem that once this is set, any change to it is going to affect almost every other user of the repository. With groupwanted there's a communicated intent in the name of the group, and there can be different groups with different versions of the preferred content expression. This lacks that, and so it encourages flag day events.
Note that you can set annex.defaultwanted to "standard", and
annex.defaultgroups to some group, and then changing
git-annex groupwanted will affect all repositories that copied that
defaultwanted into their config.
So that's a way to be able to make changes that will affect other people's clones. But only ones that they have opted into.
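As a rough sketch of that opt-in flow: the function names and the group name are hypothetical, while the config names annex.defaultwanted and annex.defaultgroups are taken from the comment above and may be named differently in a released git-annex.

```shell
# Hypothetical helper, run by someone who opts in: new repositories
# initialized with this config use the "standard" expression and join the
# named group.
opt_in_to_group() {
    group="$1"
    git config annex.defaultwanted standard
    git config annex.defaultgroups "$group"
}

# Hypothetical helper, run by whoever maintains the policy: changing the
# group's expression affects every repository that opted in above.
change_group_policy() {
    group="$1"
    expr="$2"
    git annex groupwanted "$group" "$expr"
}
```

For example, `opt_in_to_group reviewers` on the opting-in side and `change_group_policy reviewers 'present or include=*.pdf'` on the maintainer side. Repositories that never opted in are unaffected.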
I'm on the fence about whether the kind of security impact I discussed earlier is really something that should prevent a global setting, or not.
git-annex config of annex.securehashesonly is another example of
something where my hypothetical "auditing repos" would be vulnerable to a
behavior change that might be security significant. Since that gets copied
from the git-annex config to git config at init time, behavior in a
new clone might be different than behavior in an existing clone.
Does that mean it's ok for there to be more cases where there can be such a potential security impact? I don't know.
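To see whether an existing clone has drifted from the repo-wide setting, something like the following sketch could be used. The function name securehashes_drift is made up; it only compares what git annex config and git config report for annex.securehashesonly.

```shell
# Hypothetical helper: compare the repo-wide value (git-annex branch) with
# the value this clone copied into its git config at init time.
securehashes_drift() {
    global=$(git annex config --get annex.securehashesonly || true)
    local_value=$(git config --get annex.securehashesonly || true)
    if [ "$global" = "$local_value" ]; then
        echo "in sync: ${global:-unset}"
    else
        echo "differs: git-annex config=${global:-unset}, git config=${local_value:-unset}"
        return 1
    fi
}
```

A non-zero exit status means an old clone and a fresh clone of the same repository would behave differently.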
The annex-ignore config can be manually set by the user to prevent using an otherwise usable remote. The man page gives the example of a network connection that is too slow to use normally.
It may be that no users are actually using annex-ignore like this. Using annex-sync seems more likely. But, it's hard to rule out.
That presents a problem, since this would need to unset annex-ignore once the repository was created.
Checking before push if the repository exists, and only unsetting annex-ignore if it did not exist before sync, but does afterwards, would be one way around this problem. It does mean that, if 2 people are making a repository at the same location at the same time, the loser may be left with annex-ignore set due to the other person having created the repository.
Or, a new config could be added, that is like annex-ignore, but is only set by git-annex, and not by the user. That would keep annex-ignore's behavior, while git-annex sets and unsets the new config as needed.
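The first option (check before the push, unset only if the repository newly appeared) boils down to a small decision, sketched here. The function names are hypothetical, and how "exists" is determined is left out entirely; this is not how git-annex implements it.

```shell
# Hypothetical helper: succeed only when the repository did not exist before
# the sync but does exist afterwards, i.e. this sync created it.
should_unset_ignore() {
    existed_before="$1"   # "yes" or "no"
    exists_after="$2"     # "yes" or "no"
    [ "$existed_before" = "no" ] && [ "$exists_after" = "yes" ]
}

# Hypothetical helper: leave a user's manual annex-ignore alone unless this
# sync is what created the repository.
maybe_unset_ignore() {
    remote="$1"
    if should_unset_ignore "$2" "$3"; then
        git config --unset "remote.$remote.annex-ignore"
    fi
}
```

In the race described above, the loser observes "yes" before its own push, so annex-ignore stays set for that user.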
Hi joey, thank you for picking this up. IIUC, what you implemented (git config annex.default{wanted,required,group}) allows you to set these configs locally and then spare yourself the initial git annex wanted . present (etc.) setup calls. This is cool, thanks!
The problem I was trying to express here is however that git annex assist (the very convenient do-it-all command you can tell non-techy people to use to 'do the syncing stuff') will by default pull in all files, resulting in a terrible user experience: it's slow (of course nobody sets annex.jobs=cpus or uses -J4), it takes up a ridiculous amount of space, people will say 'I don't need that 3GB file, why does it download it?' (of course nobody remembers or understands to set git annex wanted . present or anything complex), etc. Sure, this is a question of user education, but good defaults can make for a much easier onboarding experience. (I know you are not so fond of such a do-it-all command, but this git annex assist single-stepping command really has been a good git annex selling point in the discussions and talks I had.)
So if there was a global setting like git annex config --set annex.defaultwanted 'present or include=*.pdf' that would set the default wanted expression for any clone, one could define what the most important files are and tell everyone to git annex get the others if necessary. git annex assist will be fast, only pull in the most important files (or none!), people can modify or add new stuff, and run git annex assist quickly again.
I would say git annex config --set annex.defaultwanted <whatever> should not execute git annex wanted . <whatever> and as such hard-code it in the git-annex branch for every repo (because then again, when would that even be executed? Would it be re-set after another git annex config --set annex.defaultwanted <whatever2>? When?). Instead, git annex config --set annex.defaultwanted <whatever> should cause the default (i.e. fallback) value of git annex wanted . to be <whatever>, which is currently just "", which I guess means something like include=* IIRC.
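The fallback semantics proposed here can be written down as a tiny lookup. This is only an illustration of the proposal; effective_wanted is a made-up name and nothing like it is claimed to exist in git-annex.

```shell
# Illustration of the proposed lookup order:
#   $1: expression set explicitly for this repo (may be empty)
#   $2: proposed repo-wide annex.defaultwanted (may be empty)
# An explicit expression wins; otherwise the repo-wide default is used;
# if both are empty the result stays empty, as it is today.
effective_wanted() {
    explicit="$1"
    repo_default="$2"
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
    else
        printf '%s\n' "$repo_default"
    fi
}
```

Because nothing is written per repository, changing the repo-wide default later takes effect for every clone that never set its own expression, and for no others.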
Re: your security concerns
I understand your hesitation to add more git annex config ... global repo configs. But here I would argue:
- git annex does not have a permissions model anyway. Anyone with push access to a repo can change any policy, any wanted expression for any repo, etc. If that is a problem, then git annex might not be the right tool. I guess one can implement some level of permission control with post-receive hooks on the remote side, but that is outside git annex's scope. git annex assumes everyone writing to the repo is nice.
- I don't really understand your 'auditing' repo situation. Does it mean you regularly clone some repos, run git annex pull|assist in them to check if it still works? In that case the only negative thing git annex config --set annex.defaultwanted could do is indeed leaving you with fewer downloaded files. If one needs all files, git annex get --all has always been the way to go, hasn't it? 🤔 Or what kind of external repos from bad actors maliciously setting a default wanted expression do you 'audit'? And how is not having all files after git annex assist bad in this case?
Should you consider implementing git annex config --set annex.defaultwanted, it would conflict with the freshly introduced git config annex.defaultwanted local settings. We could rename those to git config annex.initdefaultwanted (or just annex.initwanted), to emphasize that those only happen on git annex init. Then git annex config --set annex.defaultwanted would also sound very sensible to me in contrast, as it really configures the default, and does not modify individual repos.
Cheers, Yann
The automatic init that git-annex does in a clone does enter an adjusted branch. I think I was not considering that because you were talking about having an existing repository and git-annex entering the adjusted branch later.
We can reopen this if you want, unsure.
Oh good question!
This gets a tiny bit into internals, but .git/annex/journal-private/ is
where the private information is stored. If you move the files from there
into .git/annex/journal/, they will be committed on the next run of
git-annex.
You would need to take care to avoid overwriting any existing files in the journal; usually there won't be any, though.
Also unset annex.private of course.
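A cautious version of that move could be sketched as follows. The function name is made up, and it assumes the journal files sit directly inside the two directories; since this touches internals, treat it as a sketch and try it on a copy of the repository first.

```shell
# Hypothetical helper: move private journal files into the regular journal,
# never overwriting a file that already exists there.
# $1: path to the git directory (defaults to .git)
merge_private_journal() {
    gitdir="${1:-.git}"
    src="$gitdir/annex/journal-private"
    dst="$gitdir/annex/journal"
    [ -d "$src" ] || return 0
    mkdir -p "$dst"
    skipped=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ -e "$dst/$name" ]; then
            # Leave the conflicting file in place for manual inspection.
            echo "skipping $name: already present in journal" >&2
            skipped=$((skipped + 1))
        else
            mv "$f" "$dst/$name"
        fi
    done
    # Non-zero exit status if anything had to be left behind.
    [ "$skipped" -eq 0 ]
}
```

After `merge_private_journal .git` succeeds, `git config --unset annex.private` covers the last step, and the moved files get committed on the next run of git-annex.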