todo/speculate-can-get : extension of speculate-present (git-annex)
http://git-annex.branchable.com/todo/speculate-can-get___58___extension_of_speculate-present/
2019-05-03T16:31:18Z

comment 1, joey, 2019-05-03T15:53:49Z, 2019-05-03T15:39:07Z
http://git-annex.branchable.com/todo/speculate-can-get___58___extension_of_speculate-present/comment_1_264538d1cbf07254a2857ca4cf40e9ac/
<p>This would involve an extension to the P2P protocol to ask a remote to
git-annex get a key from its remotes.</p>
<p>But, I'd worry this could be abused. Imagine for example that you have
published a sanitized dataset by cloning the complete dataset and getting
only the files you wish to publish, and then exposed that over the P2P
protocol, with a locked-down ssh key. Such a new feature would make this
previously secure setup exploitable, exposing the unsanitized data.</p>
<p>In such a scenario, <code>GIT_ANNEX_SHELL_READONLY</code> might be set, and could be
used to avoid the unwanted behavior. But consider that the repo might be
both publishing the sanitized dataset and accepting uploads of derived data
from the people who have been given ssh keys to use it, and so would not
have readonly set.</p>
<p>A DoS attack seems even more likely: you've only gotten a subset of
files into a particular clone to avoid using up too much disk space,
and then this feature is used to get many more files than you want there. This
could happen without a trust boundary as well. Of course, git-annex repos with
the assistant running and a bad preferred content configuration can
similarly download too much data, but that takes an explicit configuration.
This would change a scenario where "git annex get --from remote" had just
failed into one where it suddenly ran the remote out of disk.</p>
<hr />
<p>There's also the problem that it could take the remote arbitrarily long to
perform the get, so would it need to send back progress information?
And how would that indirect download progress info be presented to the
user? Consider that there could be a chain of several transfers. If it were
possible to stream the file back to the requestor as the remote received
it, the progress display would work as-is, but many file retrievals are not
streamable.</p>
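<p>To illustrate the streaming case only: a minimal sketch, not git-annex code, of why progress would work as-is if every hop could forward data as it arrives. The function names (<code>origin</code>, <code>relay</code>, <code>receive</code>) and the chunked-iterator model are assumptions made for the example.</p>

```python
from typing import Callable, Iterable, Iterator


def origin(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """The repository that actually holds the content; sends it in chunks."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def relay(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """One intermediate hop: forwards each chunk as soon as it is received."""
    for chunk in chunks:
        yield chunk


def receive(chunks: Iterable[bytes], total: int,
            report: Callable[[int, int], None]) -> bytes:
    """The requestor: counting bytes as they arrive is the progress display."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        report(len(received), total)
    return bytes(received)
```

<p>With a chain such as <code>receive(relay(relay(origin(data, 4))), len(data), report)</code>, the requestor sees byte counts grow without any extra progress messages. A hop that has to finish its own download before it can send anything breaks this, which is the non-streamable case described above.</p>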

comment 2, Ilya_Shlyakhter, 2019-05-03T16:11:06Z
http://git-annex.branchable.com/todo/speculate-can-get___58___extension_of_speculate-present/comment_2_93450de83c4843f6828e971a948d12e3/
<p>How about limiting this to just the local non-special remotes, i.e. git clones of the repo? Not ones accessible over ssh. And requiring the origin repo to have an explicitly set config setting, like annex.allow-speculate-can-get-from-this-repo, before it can be used that way.</p>
<p>I was thinking of something much simpler / less powerful than what you're describing, but it would address the real use cases I have.</p>
<p>git-annex already has several security settings that can expose data or enable attacks if used badly, but require enough explicit configuration that people who use them likely know what they're doing.</p>
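<p>A rough sketch of the restriction proposed above, written as a standalone check rather than as git-annex code. The <code>Remote</code> record and the path heuristic are hypothetical, and the config name is only the one floated in this comment, not an existing git-annex setting.</p>

```python
from dataclasses import dataclass, field
from typing import Dict

# Name suggested in the comment above; an assumption, not a real setting.
OPT_IN_KEY = "annex.allow-speculate-can-get-from-this-repo"


@dataclass
class Remote:
    """Hypothetical description of a remote, for illustration only."""
    name: str
    is_special: bool
    location: str  # path or URL of the remote
    config: Dict[str, str] = field(default_factory=dict)  # config of the repo it points at


def is_local_path(location: str) -> bool:
    """Simplistic heuristic: URLs and scp-style host:path are not local."""
    if "://" in location:
        return False
    head, sep, _ = location.partition(":")
    if sep and "/" not in head:
        return False  # looks like user@host:path
    return True


def may_speculate_get(remote: Remote) -> bool:
    """Allow only local git clones that have explicitly opted in."""
    opted_in = remote.config.get(OPT_IN_KEY, "false").lower() == "true"
    return (not remote.is_special) and is_local_path(remote.location) and opted_in
```

<p>Under these assumptions, a clone reached over ssh or a special remote is refused regardless of its config, and a local clone is refused unless its owner has set the opt-in.</p>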

comment 3, Ilya_Shlyakhter, 2019-05-03T16:26:40Z
http://git-annex.branchable.com/todo/speculate-can-get___58___extension_of_speculate-present/comment_3_272d66c3549892687879ef20e0cd9cf1/
Re: the disk space problem, couldn't the file just be dropped from the <code>speculate-can-get</code> repo after getting it from there? In my usage scenarios, <code>annex.hardlink</code> would also be on, so only one copy of the file would exist.
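
The "only one copy" point rests on ordinary hard link semantics, which can be sketched outside git-annex. This is an illustration of the filesystem behaviour only, assuming both repositories live on the same filesystem; `get_then_drop` is a hypothetical helper, not something git-annex provides.

```python
import os


def get_then_drop(clone_object: str, dest_object: str) -> None:
    """Mimic 'get with hard linking, then drop from the clone'.

    Both paths must be on the same filesystem for os.link to work.
    """
    # "get": the destination becomes a second name for the same data.
    os.link(clone_object, dest_object)
    # "drop": remove the clone's name; the data stays, since one link remains.
    os.unlink(clone_object)


def link_count(path: str) -> int:
    """Number of names that currently refer to the file's data."""
    return os.stat(path).st_nlink
```

After `get_then_drop`, the data is reachable only through the destination path and its link count is 1, so the intermediate clone holds no extra copy.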

comment 4, Ilya_Shlyakhter, 2019-05-03T16:31:18Z
http://git-annex.branchable.com/todo/speculate-can-get___58___extension_of_speculate-present/comment_4_8627a8073fe0b4c144f26d32626e0c1b/
I guess this could also be implemented with a read-only external special remote, which has the path to the <code>speculate-can-get</code> clone as a config param, and <code>speculate-present</code> is then set to true for this external special remote.
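
A very rough sketch of what the request handling of such a read-only external special remote might look like. It covers only a handful of messages from the external special remote protocol, and the `fetch` callback is a hypothetical stand-in for "get the key in the configured clone and copy or link its content to the requested file"; how the clone path would be read from the remote's config is left out.

```python
from typing import Callable, List, TextIO

# fetch(key, dest_file) -> True if the content ended up in dest_file.
Fetch = Callable[[str, str], bool]


def handle_request(line: str, fetch: Fetch) -> List[str]:
    """Turn one request line from git-annex into reply lines (read-only remote)."""
    parts = line.rstrip("\n").split(" ", 3)  # the file name may contain spaces
    cmd = parts[0]
    if cmd in ("INITREMOTE", "PREPARE"):
        return [cmd + "-SUCCESS"]
    if cmd == "TRANSFER" and len(parts) == 4:
        direction, key, dest = parts[1], parts[2], parts[3]
        if direction == "RETRIEVE":
            if fetch(key, dest):
                return ["TRANSFER-SUCCESS RETRIEVE " + key]
            return ["TRANSFER-FAILURE RETRIEVE " + key + " could not get key in clone"]
        if direction == "STORE":
            return ["TRANSFER-FAILURE STORE " + key + " remote is read-only"]
    if cmd == "CHECKPRESENT" and len(parts) >= 2:
        # Presence is only speculated, so this sketch does not claim to know.
        return ["CHECKPRESENT-UNKNOWN " + parts[1] + " presence is only speculated"]
    if cmd == "REMOVE" and len(parts) >= 2:
        return ["REMOVE-FAILURE " + parts[1] + " remote is read-only"]
    return ["UNSUPPORTED-REQUEST"]


def serve(inp: TextIO, out: TextIO, fetch: Fetch) -> None:
    """Announce the protocol version, then answer requests line by line."""
    out.write("VERSION 1\n")
    out.flush()
    for line in inp:
        for reply in handle_request(line, fetch):
            out.write(reply + "\n")
        out.flush()
```

In real use this would be driven by `serve(sys.stdin, sys.stdout, fetch)` from a program named `git-annex-remote-<something>`. The sketch still lets the clone download data on request, so the disk space and access concerns raised in comment 1 apply to whatever `fetch` does.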