git-annex-setpresentkeygit-annexhttp://git-annex.branchable.com/git-annex-setpresentkey/git-annexikiwiki2019-01-21T15:42:51Zcomment 1http://git-annex.branchable.com/git-annex-setpresentkey/comment_1_58c98aafb5f6c20c6b71126cd17d3a40/Ilya_Shlyakhter2019-01-21T15:42:51Z2018-10-04T21:16:29Z
Is there a way to say the key exists in an export remote at a given path?
comment 2http://git-annex.branchable.com/git-annex-setpresentkey/comment_2_57b0ae8a49df6c8809aa3050610191b2/joey2019-01-21T15:42:51Z2018-10-04T21:38:41Z
<p>That's not the same information that this command deals with. There is the
per-remote metadata log, which some export remotes (currently S3) can
use to keep track of whatever information is needed to access a given
file that was exported to them.</p>
comment 3http://git-annex.branchable.com/git-annex-setpresentkey/comment_3_30a5d3fbd8e02726989ab80ed98b4a54/joey2019-01-21T15:42:51Z2018-10-08T16:24:59Z
<p>@Ilya_Shlyakhter the way export tree remotes work is git-annex keeps track
of the tree object that corresponds to the state of the remote, as well
as the usual presence tracking information. It uses the presence tracking
to know which files in the tree have reached the remote, and the tree to
work out the path to a file on the remote.</p>
<p>So the only way to manipulate its tracking for those is to update the tree
that it has recorded as exported there, as well as the presence information
this command is about. <a href="http://git-annex.branchable.com/internals/">internals</a> has the details for the export.log.</p>
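<p>To make the two pieces of state concrete, here is a minimal toy model, not git-annex's actual code or on-disk format. The class name, field names, keys, and paths are all hypothetical and only illustrate the idea above: the recorded tree maps paths to keys, presence tracking says which keys have reached the remote, and a path on the remote is only meaningful when both agree.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ExportRemoteModel:
    """Toy model of export-remote tracking (hypothetical, for illustration only)."""

    # Stand-in for the tree object recorded as exported: path -> key.
    exported_tree: Dict[str, str] = field(default_factory=dict)
    # Stand-in for presence tracking: keys known to have reached the remote.
    present_keys: Set[str] = field(default_factory=set)

    def paths_for(self, key: str) -> List[str]:
        """Return the remote paths where a key can be found, if it is present."""
        if key not in self.present_keys:
            return []
        return sorted(p for p, k in self.exported_tree.items() if k == key)


remote = ExportRemoteModel(
    exported_tree={"a/x.bam": "KEY1", "b/y.bam": "KEY2"},
    present_keys={"KEY1"},
)
print(remote.paths_for("KEY1"))  # KEY1 is present, so its path in the tree is usable
print(remote.paths_for("KEY2"))  # KEY2 is in the tree but has not reached the remote
```

<p>In this model, marking a key present without also recording a tree that contains it leaves no path to look it up under, which is why both have to be updated together.</p>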
comment 4http://git-annex.branchable.com/git-annex-setpresentkey/comment_4_df3018b9b3491963311063e8ff202df4/Ilya_Shlyakhter2019-01-21T15:42:51Z2018-10-08T21:43:17Z
<p>@joey thanks. But, besides export.log, the S3 remote also keeps some (undocumented?) internal state, and there's no way to update that state to record the fact that git-annex can GET a given key by downloading s3://mybucket/myobject? Also, I feel uneasy directly manipulating git-annex internal files. Can you think of any plumbing commands that could be added to support this use case?
The use case is: I submit a batch job that takes some s3:// objects as input, writes outputs to other s3:// objects, and returns pointers to these new s3:// objects. I want to register these new objects in git-annex, initially without downloading them, and then be able to git-annex-get them, drop them from the S3 remote, and later put them back under their original s3:// URIs. The last ability is needed because (1) many workflows expect filenames to be in a particular form, e.g. mysamplename.pN.bam to represent mysample processed with parameter p=N; and (2) some workflow engines can reuse past results if a step is re-run with the same inputs, but they need the results to be at the same s3:// URI as when the step was first run.</p>