I've been trying to figure out whether git-annex can be used to make a user unknowingly download data from a malicious source. The general question here is, assuming a git/git-annex server that I can fully trust to be safe and secure (let's call it trustedserver):
Is it possible, when performing (for example) git clone git@trustedserver:user/repo && cd repo && git annex init, for annex to set up and enable a remote that is not on trustedserver?
I'm trying to imagine a scenario where someone with access to the repository (a person who I share files with) can set up a remote to a different server (e.g., badremote), set it to autoenable=true, and sync changes. Would this enable the other user to put files on badremote that are not on trustedserver but are tracked by annex? More importantly, if this happens and I perform a git clone → git annex init → git annex sync --content, would I be downloading files from badremote without specifically enabling it?
Thanks, Achilleas
git-annex init will try to auto-enable special remotes that have been configured with autoenable=true. So if someone can push to the repository on trustedserver, they can set up such a special remote and cause your later clones of it to enable the special remote. Sync will then push content to their special remote. They could also check in additional annexed files to the git repository and put their contents on their special remote, and sync would then download the contents from there.
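One way to see what a fresh clone would auto-enable is to read remote.log from the fetched git-annex branch after git clone but before git annex init. What follows is a minimal sketch, not part of git-annex. It assumes the remote.log layout described in the internals documentation (one line per remote: the uuid, then whitespace-separated key=value pairs, ending in a timestamp), and it assumes the newest line for a uuid is the one that counts. The function names are made up for illustration.

```python
import subprocess


def parse_remote_log(text):
    """Map each remote uuid to the config dict from its newest log line."""
    newest = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        uuid = fields[0]
        config = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        # Assumption: timestamps look like "1234567890.5s"; anything that
        # does not parse is treated as oldest.
        try:
            stamp = float(config.get("timestamp", "0").rstrip("s"))
        except ValueError:
            stamp = 0.0
        if uuid not in newest or stamp >= newest[uuid][0]:
            newest[uuid] = (stamp, config)
    return {uuid: config for uuid, (_, config) in newest.items()}


def autoenabled(text):
    """Return sorted (uuid, name, type) for remotes with autoenable=true."""
    return sorted(
        (uuid, cfg.get("name", "?"), cfg.get("type", "?"))
        for uuid, cfg in parse_remote_log(text).items()
        if cfg.get("autoenable") == "true"
    )


def read_remote_log(ref="origin/git-annex"):
    """Read remote.log from the given ref; empty string if unavailable."""
    try:
        result = subprocess.run(
            ["git", "show", f"{ref}:remote.log"],
            capture_output=True, text=True,
        )
    except OSError:
        return ""
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    for uuid, name, kind in autoenabled(read_remote_log()):
        print(f"autoenable=true: {name} (type {kind}, uuid {uuid})")
```

Run from inside the clone, this lists the special remotes that would be candidates for auto-enabling, so an unexpected one can be spotted before any content is synced.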
Of course, someone who can do this has to have write access to the git repository on trustedserver, and if they can write to the git repository, they can also send annexed files there, unless you've prevented that somehow.
I had not really considered autoenable=true as a potential security problem, so it's good to think about it that way. I don't know if we have a real security problem here though. So far, it seems to rely on the attacker having write access to trustedserver.
I suppose the attacker could instead convince you to pull from a clone that they control, and after you've pulled once, clones made from your repository (or trustedserver after you push to there) will then autoenable their special remote unexpectedly. Perhaps the goal then is to get git annex sync to unexpectedly send file contents there, so they can collect all your annexed files. Pulling from their repository once thus turns into sending them all your annexed files going forward.
So I am starting to see this as a security problem.
Note that pulling from someone untrusted can also change other settings in the git-annex branch (since it's automatically merged), which can probably screw up the repository fairly well in other ways, like setting numcopies to 0 and messing with preferred content expressions such that git-annex wants to drop all files, or copy files to repositories where you don't want them to go, etc.
Hey, thanks for the feedback and your thoughts. Should have gotten back to you sooner on this.
I wanted to share with you my thoughts about getting around this issue, from the point of view of the trustedserver administrator, and get your input on this.
I want to run a server that uses git and git annex for data storage. I want users of this server to feel safe that when they clone a repository and sync content, they're not pulling things from an untrusted server. I was thinking of modifying annex configurations server-side, perhaps in a post-receive hook. The idea would be to go through the remotes in the server-side bare git repository and mark all unknown (ssh, rsync, etc.) remotes as dead.
Would this cause any issues for the receiver or the sender? Other than potentially making files unavailable for the receiver (which is what I want), would it possibly put the repository in a state where the original sender can't push more changes, because of a disagreement between configurations?
I've played around with the idea a bit and I think the idea is pretty safe, but I might be missing something.
Thanks!
Achilleas
The server can certainly do filtering or blocking of changes to the git-annex branch to prevent this kind of abuse.
Marking a repository as dead will indeed prevent it from being auto-enabled. It will not cause later synchronisation problems. It seems like perhaps too big a hammer though. Cloning from such a server, and then pushing back to it, would make your clone be marked as dead on the next pull!
And marking dead doesn't prevent malicious changes to preferred content settings etc.
Filtering in the pre-receive hook should be very doable. See internals for the git-annex branch documentation.
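As a rough illustration of that kind of filtering, here is a sketch of a pre-receive hook that refuses pushes whose git-annex branch contains any remote.log line with autoenable=true. It is an assumption-laden example, not something shipped with git-annex: the helper names are invented, the remote.log layout is taken from the internals documentation, the check is applied to every ref ending in /git-annex (an assumption meant to also cover synced/git-annex), and it is deliberately conservative, since it rejects a line even if a newer line for the same uuid turns autoenable off. It does not look at preferred content or numcopies settings.

```python
#!/usr/bin/env python3
import os
import subprocess
import sys


def is_annex_ref(ref):
    """True for refs/heads/git-annex and other refs ending in /git-annex."""
    return ref.endswith("/git-annex")


def offending_lines(remote_log_text):
    """Return the remote.log lines that ask for auto-enabling."""
    return [
        line for line in remote_log_text.splitlines()
        if "autoenable=true" in line.split()
    ]


def remote_log_at(commit):
    """Contents of remote.log in the pushed commit, or '' if it is absent."""
    result = subprocess.run(
        ["git", "show", f"{commit}:remote.log"],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else ""


def check_push(update_lines):
    """Collect offending lines for every '<old> <new> <ref>' update line."""
    bad = []
    for line in update_lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        _old, new, ref = parts
        # An all-zero new id means the ref is being deleted; nothing to scan.
        if not is_annex_ref(ref) or set(new) == {"0"}:
            continue
        bad.extend(offending_lines(remote_log_at(new)))
    return bad


# Only act as a hook when installed under the name git expects.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "pre-receive":
    bad = check_push(sys.stdin)
    if bad:
        print("push rejected: remote.log requests autoenable=true:",
              file=sys.stderr)
        for line in bad:
            print("  " + line, file=sys.stderr)
        sys.exit(1)
```

To try it, the file would be saved as hooks/pre-receive in the bare repository on the server and made executable. A less blunt variant could compare against an allow-list of known uuids instead of rejecting every auto-enabled remote.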