Recent comments posted to this site:
TRANSFEREXPORT, in the "simple export interface", also uses TRANSFER-SUCCESS/TRANSFER-FAILURE, and should also support this extension.
One problem with this design is that HTTP headers may be needed for authorization, rather than putting authentication in the url.
I think we may have talked about this at the hackfest, and came down on the side of simplicity, supporting only an url. Can't quite remember.
It might also be possible to redirect to an url when storing an object. There it is more likely that a custom http verb would be needed, rather than PUT.
I think that protocol design should leave these possibilities open to be implemented later. So, I'm going with this:
EXTENSIONS TRANSFER-RETRIEVE-URL
TRANSFER-RETRIEVE-URL Key Url
Which leaves open the possibility for later things like:
TRANSFER-RETRIEVE-URL Key Url [Header1: foo, Header2: bar]
TRANSFER-STORE-URL Key Url PUT
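To make that concrete, here is a sketch of how a retrieve might look with an external special remote that supports the extension. The direction labels, the exact negotiation, and the idea that git-annex then downloads the url itself are my assumptions about how this would work, not settled design:

git-annex: EXTENSIONS INFO ASYNC TRANSFER-RETRIEVE-URL
remote:    EXTENSIONS TRANSFER-RETRIEVE-URL
git-annex: TRANSFER RETRIEVE Key file
remote:    TRANSFER-RETRIEVE-URL Key https://example.com/signed-url
(git-annex then fetches the url itself, instead of waiting for TRANSFER-SUCCESS from the remote)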
Probably this. In any case, it's better to upgrade before filing a bug on something like this.
git-annex (8.20211123) upstream; urgency=medium
* Bugfix: When -J was enabled, getting files could leak an
ever-growing number of git cat-file processes.
10.20251114-.....: I will update/close the issue according to the result.
With the separate autoenabled remote for PRs, the UX could look like this:
> git-annex add myfile
add myfile ok
> git commit -m foo
> git push origin HEAD:refs/for/main -o topic="add myfile"
> git-annex push origin-PRs
copy myfile (to origin-PRs) ... ok
Or with a small git-annex improvement, even:
> git-annex assist -o topic="add myfile"
add myfile ok
copy myfile (to origin-PRs) ... ok
For this, origin-PRs would want all files not in origin, and origin would want all files not in origin-PRs. And origin-PRs would need to have a lower cost than origin so that it doesn't first try, and fail, to copy the file to origin.
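A rough sketch of that configuration; the group names, the preferred content expressions, and the cost value are illustrative assumptions, not a worked-out design:

> git-annex group origin mainrepo
> git-annex group origin-PRs prs
> git-annex wanted origin-PRs 'not copies=mainrepo:1'
> git-annex wanted origin 'not copies=prs:1'
> git config remote.origin-PRs.annex-cost 150

The default cost for a remote repository is 200, so 150 makes origin-PRs the remote that git-annex tries first.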
A per-user special remote that is assumed to contain the annexed files for all of the user's AGit-PRs. If git recognizes remote configs in the user's global git config, then it could be possible to get away with configuring things once, but I am not sure of the behavior of git in that case.
I think git will do that (have not checked), but a special remote needs information to be written to the git-annex branch, not just git config, so there's no way to globally configure a special remote to be accessible in every git-annex repository.
Along similar lines, forgejo could set up an autoenabled remote
that contains annexed files for all AGit-PRs, and that wants any files
not in the main git repository. (This could be a special remote, or a
git-annex repository that just doesn't allow any ref pushes to it. The
latter might be easier to deal with since git-annex p2phttp could serve
it as just another git-annex repository.)
That would solve the second problem I discussed in the comment above, because when the user copies objects to that separate remote, it will not cause git-annex in the forgejo repository to update the main git-annex branch to list those objects.
When merging a PR, forgejo would move the objects over from that remote to the main git repository.
You would be left with a bit of a problem in deleting objects from that remote when a PR is rejected: since the user may never have pushed their git-annex branch after sending an object to it, you would not know what PR that object belongs to. I suppose this could be handled by finding all objects that are in active PRs and, after some amount of time, deleting ones that are not.
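The keys referenced by an open PR can at least be enumerated from its head ref; a sketch, assuming forgejo exposes PR heads as refs/pull/*/head:

> git-annex find --branch=refs/pull/123/head --format='${key}\n'

Forgejo could then delete from that remote any object that does not appear in the union of those lists and has been there longer than some grace period.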
Obviously annexed objects copied to the Forgejo-aneksajo instance via this path should only be available in the context of that PR in some way.
The fundamental issue seems to be that annexed objects always belong to the entire repository, and are not scoped to any branch.
Hmm.. git objects also don't really belong to any particular branch. git only fetches objects referenced by the branches you clone.
Similarly, git-annex can only ever get annex objects that are listed
in the git-annex branch. Even with --all, it will not know about objects
not listed there.
So, seems to me you may only need to keep the PR's git-annex branch separate from the main git-annex branch, so that the main git-annex branch does not list objects from the PR. I see two problems that would need to be solved to do that:
1. If git-annex is able to see the PR's git-annex branch as eg refs/foo/git-annex, it will auto-merge it into the main git-annex branch, and then --all will operate on objects from the PR as well. So the PR's git-annex branch would need to be named to avoid that. This could be just

       git push origin git-annex:refs/for/git-annex/topic-branch

   Maybe git-annex sync could be made to support that for its pushes?

2. When git-annex receives an object into the repository, the receiving side updates the git-annex branch to indicate it now has a copy of that object. So, you would need a way to make objects sent to a PR update the PR's git-annex branch, rather than the main git-annex branch. This could be something similar to git push -o topic in git-annex, which would need to be a P2P protocol extension (sketched below). Or maybe some trick with the repository UUID?
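Purely to illustrate the second item, a sketch of such a P2P protocol extension, where the client names the PR topic before sending an object. The TOPIC message is a made-up placeholder, not an existing protocol message, and the surrounding exchange is abbreviated:

client: TOPIC add-myfile
client: PUT myfile Key
server: PUT-FROM 0
client: DATA Len
client: (object content) VALID
server: SUCCESS

The server would use the topic to decide which git-annex branch to record the object in.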
When the PR is merged, you would then also merge its git-annex branch.
If the PR is instead rejected, and you want to delete the objects
associated with it, you would first delete the PR's other branches, and
then run git-annex unused, arranging (how?) for it to see only the PR's
git-annex branch and not any other git-annex branches. That would find any
objects that were sent as part of the PR, that don't also happen to be used
in other branches (including other PRs).
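If the "see only the PR's git-annex branch" part could be arranged (the open question above), the rest is the normal unused workflow; a minimal sketch:

> git-annex unused
> git-annex dropunused all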
I do wonder, if this were implemented, would the git-annex
workflow for the user be any better than if there were a per-PR
remote for them to use? If every git-annex command that pushes the
git-annex branch or sends objects to forgejo needs -o topic
to be given, then it might be a worse user experience.
Glacier is in the process of being deprecated; instead there is the Deep Archive S3 storage class. https://aws.amazon.com/blogs/aws/new-amazon-s3-storage-class-glacier-deep-archive/
While it is possible to configure a S3 special remote
with storageclass=DEEP_ARCHIVE, or configure a bucket with lifecycle rules
to move objects to deep archive, git-annex won't be able to retrieve objects
stored in deep archive.
To support that, the S3 special remote would need to send a request to S3 to
restore an object from deep archive. Then later (on a subsequent git-annex run)
it can download the object from S3.
This is the API: https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html
It includes a Tier tag, which controls whether the restore is expedited. There would probably need to be a git config for that, since the user may want to get a file fast or pay less for a slower retrieval.
And there is a Days tag, which controls how long the object should be left accessible in S3. It would also make sense to have a git config for this.
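For reference, the restore request that the S3 special remote would need to send boils down to something like the following, shown here via the aws CLI; the bucket, key, and values are placeholders, and the names of the git configs are still to be decided:

aws s3api restore-object --bucket mybucket --key someobject \
    --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'

A later download of the object only works once the restore has completed, which matches the subsequent-run step described above.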
I have opened this issue, which is a prerequisite to implementing this: https://github.com/aristidb/aws/issues/297