Creating a special S3 remote to hold files shareable by URL
(In this example, I'll assume you're creating an S3 bucket named public-annex and a git-annex special remote named public-s3 that stores its files in that bucket; change these names to suit your own setup.)
Set up your special S3 remote with (at least) these options:
git annex initremote public-s3 type=s3 encryption=none bucket=public-annex chunk=0 public=yes
This way git-annex will upload the files to this remote without encrypting them and without chunking them, when you call

git annex copy [FILES...] --to public-s3

And, thanks to public=yes, they will be accessible by anyone with the link.
(Note that public=yes was added in git-annex version 5.20150605. If you have an older version, it will be silently ignored, and you will instead need to use the AWS dashboard to configure a public get policy for the bucket.)
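As a quick sanity check (not required for the setup), you can ask git-annex about the remote; depending on your version, this should show the remote's type, bucket, and other settings:

git annex info public-s3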
Following the example, the files will be accessible at

http://public-annex.s3.amazonaws.com/KEY

where KEY is the key git-annex created for the file, which you can discover by running

git annex lookupkey FILEPATH

This way you can share a link to each file you have at your S3 remote.
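For a single file, a small bash snippet along these lines should print the shareable link (untested sketch; path/to/file is a placeholder):

key=$(git annex lookupkey path/to/file)
echo "https://public-annex.s3.amazonaws.com/$key"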
Sharing all links in a folder
To share the links for all files in a given folder, go to that folder and run the following (this is an example with the fish shell; a rough bash equivalent is sketched after it):
for filename in (ls)
echo $filename": https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)
end
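In bash, something like this loop should produce the same list (an untested sketch; it iterates over the glob * rather than parsing ls output):

for filename in *; do
    echo "$filename: https://public-annex.s3.amazonaws.com/$(git annex lookupkey "$filename")"
done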
Sharing all links matching certain metadata
The same approach works with any filter you can express with git annex find.
For example, let's share links to all the files whose author metadata starts with "Mario" and which are actually present in your public-s3 remote. This time, instead of just a list of links, we will output a markdown-formatted list of the filenames linked to their S3 URLs:
for filename in (git annex find --metadata "author=Mario*" --and --in public-s3)
echo "* ["$filename"](https://public-annex.s3.amazonaws.com/"(git annex lookupkey $filename)")"
end
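A bash version of the same loop might look like this (untested sketch; it reads the filenames that git annex find prints, one per line):

git annex find --metadata "author=Mario*" --and --in public-s3 | while read -r filename; do
    echo "* [$filename](https://public-annex.s3.amazonaws.com/$(git annex lookupkey "$filename"))"
done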
Very useful.
Sharing links with time-limited URLs
By using pre-signed URLs it is possible to limit how long a URL stays valid for retrieving an object. To enable this, use a private S3 bucket for the remote and then pre-sign the actual URL with the sign_s3_url.bash script from AWS-Tools. Example:
key=$(git annex lookupkey "$fname")
sign_s3_url.bash --region 'eu-west-1' --bucket 'mybuck' --file-path "$key" --aws-access-key-id XX --aws-secret-access-key XX --method 'GET' --minute-expire 10
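If you have the AWS CLI installed, a pre-signed URL can likely be generated in much the same way with aws s3 presign (a sketch, assuming the same mybuck bucket and a 10-minute expiry; the expiry is given in seconds):

key=$(git annex lookupkey "$fname")
aws s3 presign "s3://mybuck/$key" --expires-in 600 --region eu-west-1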
Adding the S3 URL as a source
Assuming all files in the current directory have been copied to the public-s3 remote, this will register the public S3 URL for each file in git-annex, making them available to everyone through git-annex:
git annex find --in public-s3 | while read file; do
  key=$(git annex lookupkey "$file")
  echo "$key https://public-annex.s3.amazonaws.com/$key"
done | git annex registerurl
registerurl was introduced in git-annex 5.20150317.
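To check that the URLs were recorded, git annex whereis should now list them for each file (just a quick verification; the registered URL typically shows up under the web special remote):

git annex whereis FILEPATH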
Manually configuring a public get policy
Here is how to manually configure a public get policy for a bucket: paste the following bucket policy into the bucket's policy editor in the AWS dashboard.
{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::public-annex/*"
        }
    ]
}
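If you prefer the command line over the dashboard, the same policy can probably be applied with the AWS CLI (a sketch; it assumes the JSON above was saved as policy.json):

aws s3api put-bucket-policy --bucket public-annex --policy file://policy.json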
This should not be necessary if using a new enough version of git-annex, which can instead be configured with public=yes.
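If the remote was created without public=yes, a new enough git-annex should also let you switch it on afterwards via enableremote, something like:

git annex enableremote public-s3 public=yes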
Thanks Giovanni for that nice tip!
You can additionally publish the whole git repository by eg pushing it to github. (Not if it contains private files or if you have embedded encryption keys or credentials though.)
You can tell git-annex the public url for the files too, and then others can just clone the git repository and use git-annex to download the files from S3.
You could set that up by running something like the registerurl pipeline shown in "Adding the S3 URL as a source" above.
You can look up the hash directories for a key using:
git annex examinekey $key --format '${hashdirlower}\n'
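For example, assuming the remote lays files out as hashdirlower/KEY/KEY (an assumption; check how your own rsync or directory remote actually stores things), a URL on a conventional web server could be built with a bash sketch like this (example.com/annex is a placeholder):

# placeholder base URL for the web server that serves the rsync remote's files
base="https://example.com/annex"
key=$(git annex lookupkey "$file")
# hashdirlower usually ends with a trailing slash, e.g. "f87/4d5/"
dir=$(git annex examinekey "$key" --format '${hashdirlower}')
echo "$base/$dir$key/$key"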
Many thanks. I ended up using a command line along these lines to publish selected documents in my git-annex repository onto the web via an rsync special remote on a conventional HTTP server.