This is a security hole that allows exposure of private data in files located outside the git-annex repository.

The attacker needs to have control over one of the remotes of the git-annex repository. For example, they may provide a public git-annex repository that the victim clones. Or the victim may have paired repositories with them. Or, equivilantly, the attacker could have read access to the victim's git-annex repository (eg on a server somewhere), and some channel to get commits into it (eg a pull requests).

The attacker does git-annex addurl --relaxed file:///etc/passwd and commits this to the repository in some out of the way place. Then they wait for the victim to pull the change.

The easiest exploit is when the victim is running the assistant, or is periodically doing git annex sync --content. The victim may also perform the equivilant actions manually, not noticing they're operating on the file.

In either case, git-annex gets the content of the annexed file, following the file:// url. Then git-annex transfers the content back to the attacker's repository.

It may also be possible to exploit scp:// sftp:// smb:// etc urls to get at files on other computers that the user has access to as well as localhost. I was not able to get curl to download from scp:// or sftp:// on debian (unstable) but there may be configurations that allow it.

If the url is attached to a key using a cryptographic hash, then the attacker would need to already know at least the hash of the content to exploit this.

Sending that content back to them could be considered not a security hole. Except, I can guess what content some file on your system might have, and use this attack as an oracle to determine if I guessed right, and work toward penetrating your system using that information.

So, best to not treat addurl with a hash any differently than --relaxed and --fast when addressing this hole.

The fix must involve locking down the set of allowed in url schemes. Better to have a whitelist than a blacklist. http and https seems like the right default.

Some users do rely on file:// urls, and this is fine in some use cases, eg when you're not merging changes from anyone else.

So this probably needs to be a git config of allowed url schemes, with an appropriatly scary name, like

Redirects from one url scheme to another could be usd to bypass such a whitelist. curl's --proto also affects redirects. http-conduit is limited to only http and will probably remain that way.

done in 28720c795ff57a55b48e56d15f9b6bcb977f48d9 --Joey

The same kind of attack can be used to see the content of localhost urls and other non-globally-available urls.

Redirects and DNS rebinding attacks mean that checking the IP address of the hostname in the url is not good enough. It needs to check the IP address that is actually connected to, for each http connection made, including redirects.

There will need to be a config to relax checks, like with an appropriatly scary name, like Users will want to enable support for specific subnets, or perhaps allow all addresses.

When git-annex is configured to use curl, there seems to be no way to prevent curl from following urls that redirect to localhost, other than disabling redirections. And unless git-annex also pre-resolves the IP address and rewrites it into the url passed to curl, DNS rebinding attacks would still be possible. Also, one of the remaining reasons people enable curl is to use a netrc file with passwords, and the content of urls on those sites could also be exposed by this attack. So, it seems curl should not be enabled by default and should have a big security warning if it's supported at all. Probably ought to only enable it when

http-client does not have hooks that can be used to find out what IP address it's connecting to in a secure enough way. See

Seems that building my own http Manager is the only way to go. By building my own, I can do the IP address checks inside it when it's setting up connections. And, the same manager can be passed to the S3 and WebDav libraries. (The url scheme checks could also be moved into that Manager, to prevent S3 redirect to file: url scenarios..)

restricted http manager done and used in b54b2cdc0ef1373fc200c0d28fded3c04fd57212; curl also disabled by default

http proxies are another problem. They could be on the local network, or even on localhost, and http-client does not provide a way to force a http proxy to connect to an particular IP address (is that even possible?) May have to disable use of http proxies unless Or better, find what http proxy is enabled and prevent using it if it's on an IP not allowed there.

done in cc08135e659d3ca9ea157246433d8aa90de3baf7

The external special remote interface is another way to exploit this. Since it bypasses git-annex's usual url download code, whatever fixes are put in place there won't help external special remotes.

External special remotes that use GETURLS, typically in conjunction with CLAIMURL and CHECKURL, and then themselves download the content of urls in response to a TRANSFER RETRIEVE will have the same problems as git-annex's url downloading.

An external special remote might also make a simple http request to a key/value API to download a key, and follow a redirect to file:/// or http://localhost/.

If the key uses a cryptographic hash, git-annex verifies the content. But, the attacker could have committed a key that doesn't use a hash. Also, the attacker could use the hash check as an oracle, to guess at the content of files.

If the external special remote is encrypted, the http content is passed though gpg. So, whatever location an attacker redirects it to would also have to be encrypted. gpg is not told what encryption key the content is expected to be encrypted with. (Could it be told somehow for hybrid and shared encryption which key to use? pubkey encryption of course uses the regular gpg public key).

So if the attacker knows a file that the user has encrypted with any of their gpg keys, they can provide that file, and hope it will be decrypted. Note that this does not need a redirect to a local file or web server; the attacker can make their web server serve up a gpg encrypted file. This is a separate vulnerability and was assigned CVE-2018-10859.

So, content downloaded from encrypted special remotes (both internal and external) must be rejected unless it can be verified with a hash. Then content using WORM and URL keys would not be able to be downloaded from them. Might as well also require a hash check for non-encrypted external special remotes, to block the redirection attack. There could be a config setting to say that the git-annex repository is not being shared with untrusted third parties, and relax that check.

done in b657242f5d946efae4cc77e8aef95dd2a306cd6b

Could also tighten down the gpg decryption to only allow decrypting with the provided symmetric key, as a further protection against CVE-2018-10859. If this can be done, then only remotes with encryption=pubkey will really need to reject WORM and URL keys, since encryption=shared and encryption=hybrid use a symetric key that's only used to encrypt data for that remote. Although opening those back up to WORM and URL would allow the remote sending other content stored on it, to get the wrong content decrypted. This seems unlikely to be a useful exploit in most cases, but perhaps not all cases, so probably best to not relax the rejection aven when doing this. It's still worth doing as a belt and braces fix.

AFAICS, gpg does not have a way to specify to decrypt with only a symmetric encryption key. It could be done by running gpg in an environment with an empty keyring, but gpg agent makes that difficult and it would be added complexity. Decided not to do it.

Built-in special remotes that use protocols on top of http, eg S3 and WebDAV, don't use Utility.Url, could also be exploited, and will need to be fixed separately.

not affected for url schemes, because http-client only supports http, not file:/

done for localhost/lan in b54b2cdc0ef1373fc200c0d28fded3c04fd57212


already disabled file:/. Added a scheme check, but it won't block redirects to other schemes. But youtube-dl goes off and does its own thing with other protocols anyway, so that's fine.

The youtube-dl generic extractor will download media files (including videos and photos) if passed an direct url to them. It does not seem to extract video etc from tags on html pages.

git-annex first checks if a web page is html before pointing youtube-dl at it, to avoid using it to download direct urls to media files. But that would not prevent a DNS rebinding attack which made git-annex see the html page, and youtube-dl then see a media file on localhost.

Also, the current code in htmlOnly runs youtube-dl if the html page download test fails to download anything.

Also, in the course of a download, youtube-dl could chain to other urls, depending on how its extractor works. Those urls could redirect to a localhost/lan web server.

So, youtube-dl needs to be disabled unless http address security checks are turned off.

done in e62c4543c31a61186ebf2e4e0412df59fc8630c8

Both security holes are now fixed. done --Joey