client side upload to a special remote

Suppose you are gathering files from users on the web and want to ingest that data into a git-annex repository, with a special remote that is, for example, an S3 bucket.

You could have the web browser upload to your server, and run git-annex there to add the file to the git repository and move it on to the S3 bucket. That is inefficient though: the file goes into the server and back out, and needs to be spooled to the server's disk as well.

This page shows a more efficient way to do it, where the web browser uploads directly to S3, and a git-annex repository is updated accordingly. There is currently no way to run git-annex in a web browser, so you will need to write some custom code to do this. But with the method described here, you won't need to re-implement all of git-annex in the web browser.

Uploading from the browser to S3 is left as an exercise to the reader. All that really matters is what filename to use in the S3 bucket. It's simplest to make the S3 special remote an exporttree=yes special remote. Then you can upload whatever filenames you want to it, rather than needing to use the same filenames git-annex uses for storing keys in an S3 bucket.

Along with the S3 bucket, you will need a server set up, which is where git-annex will run in a git repository. Set up the S3 special remote there. And make git-annex on the server a proxy for the S3 special remote:

git-annex initremote s3 type=S3 exporttree=yes encryption=none bucket=mybucket
git config remote.s3.annex-proxy true
git-annex updateproxy

If the special remote is configured with exporttree=yes, be sure to also configure the annex-tracking-branch for it on the server:

git config remote.s3.annex-tracking-branch master

Once the browser uploads the file to S3, you need to add a git-annex symlink or pointer file to the git repository. This can be done in the browser, using js-git. Generating a git-annex key is not hard: hash the file content before or while uploading it, and follow the git-annex key format. Write the key to a pointer file, or make a symlink to the appropriate location under .git/annex/objects (a bit harder). Commit it to git and push the branch ("master" in this example) to your server using js-git.
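Before writing the browser code, it can help to see what it has to produce. The following is a minimal command-line sketch, assuming the SHA256 backend (a key of the form SHA256-s<size>--<hash>, with no filename extension) and assuming GNU coreutils' sha256sum is available. The function names annex_key_sha256 and annex_pointer are made up for this illustration and are not part of git-annex. Check the key format documentation before relying on it.

```shell
#!/bin/sh
# Sketch only: build a git-annex key for a file, assuming the SHA256 backend.
# Assumed key layout: SHA256-s<size in bytes>--<hex sha256 of the content>
annex_key_sha256() {
    file=$1
    # wc -c may pad its output with spaces on some systems, so strip them
    size=$(wc -c < "$file" | tr -d ' ')
    # sha256sum prints "<hash>  <filename>"; keep only the hash
    hash=$(sha256sum "$file" | cut -d ' ' -f 1)
    printf 'SHA256-s%s--%s\n' "$size" "$hash"
}

# Sketch only: print the content of a pointer file for a key, which is
# the path /annex/objects/<key> followed by a newline.
annex_pointer() {
    printf '/annex/objects/%s\n' "$1"
}
```

For a three-byte file containing abc, annex_key_sha256 prints SHA256-s3--ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad. The browser code needs to arrive at the same string for the same content, which makes this a handy cross-check.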

All that's left is to let git-annex know that the file has been uploaded to the S3 special remote. To accomplish this, the web browser will need to talk with git-annex on the server. The easy way to accomplish that is to run git-annex p2phttp. The web browser will be speaking the P2P protocol over HTTP.

Make sure you have git-annex 10.20241031 or newer installed. That version extended the P2P protocol with a DATA-PRESENT feature, which is just what you need.

All the web browser needs to do, after uploading to S3 and pushing the git branch to the server, is POST /git-annex/$uuid/v4/put with data-present=true included in the URL parameters, along with the key of the file that was added to the git repository. Replace $uuid with the UUID of the S3 special remote. You can look that up with, for example, git config remote.s3.annex-uuid.
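The request described above can be tried from a command line with curl before it is wired into the browser. This is a rough sketch under assumptions: the base URL is wherever your git-annex p2phttp server happens to be reachable, the function names put_url and notify_data_present are made up for this illustration, and the key is assumed to contain only URL-safe characters (true of plain SHA256 keys). Only the key and data-present parameters mentioned here are included. The P2P protocol over HTTP may require further parameters or headers, so consult its documentation.

```shell
#!/bin/sh
# Sketch only: build the URL for a data-present put request.
# Arguments: base URL of the p2phttp server (hypothetical), UUID of the
# S3 special remote, and the key of the uploaded file.
put_url() {
    base_url=$1
    uuid=$2
    key=$3
    printf '%s/git-annex/%s/v4/put?key=%s&data-present=true\n' \
        "$base_url" "$uuid" "$key"
}

# Sketch only: send the POST. Any other parameters or headers the
# protocol requires are not shown and would need to be added.
notify_data_present() {
    curl --fail --silent --show-error -X POST "$(put_url "$1" "$2" "$3")"
}
```

Run on the server, something like notify_data_present https://example.com "$(git config remote.s3.annex-uuid)" "$key" would send the request, with https://example.com standing in for your own server's address.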

When the git-annex HTTP server receives that request, since it is configured to be able to proxy for the S3 special remote, it will act the same as if the content of the file had been sent in the request. But thanks to data-present=true, it knows the data is already in the S3 special remote. So it updates the git-annex branch to reflect that the file is stored there.

Now if someone else clones the git repository, they can git-annex get the file, and it will be downloaded from the S3 bucket, if that bucket is configured to let them read it. Your server never needs to deal with the content of the file.