Forgejo supports "AGit-Flow" to make pull requests without requiring a user to fork a repository first. This works through a special ref namespace, refs/for/<target-branch>/<topic>, which users with only read access to the repository can push to. Pushing there opens a PR from the topic to the named target branch.
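For illustration, here is a minimal sketch of how the refspec for such a push is put together. The helper name `agit_refspec`, the remote name "origin", and the branch and topic names are placeholders made up for this example, not anything Forgejo or git provides.

```shell
#!/bin/sh
# Hypothetical helper: build the AGit refspec for a target branch and a topic.
# It only assembles the string refs/for/<target-branch>/<topic> described above.
agit_refspec() {
    target=$1   # branch the PR should be opened against
    topic=$2    # name identifying this PR
    printf 'HEAD:refs/for/%s/%s\n' "$target" "$topic"
}

# Example (placeholder names): open a PR against "main" with topic "fix-typo".
# git push origin "$(agit_refspec main fix-typo)"
```

Only the git push itself talks to the server; nothing here involves the annex, which is what the rest of this post is about.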
There are efforts in upstream Forgejo to make this a more prominent alternative to forking for contributions: https://codeberg.org/forgejo/discussions/issues/131.
I am wondering how git-annex could best fit into this flow. I would like to be able to create PRs containing annexed files on Forgejo-aneksajo in this way (tracking issue on the Forgejo-aneksajo side: https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo/issues/32). Obviously annexed objects copied to the Forgejo-aneksajo instance via this path should only be available in the context of that PR in some way.
The fundamental issue seems to be that annexed objects always belong to the entire repository, and are not scoped to any branch.
I've thought of these options so far:
- Provide a "per PR special remote" that the creator of the PR could push annexed files to. This would require the user to configure an additional remote, which the AGit-Flow tries to avoid for plain-git contributions.
- A per-user special remote that is assumed to contain the annexed files for all of the user's AGit-PRs. If git recognizes remote configs in the user's global git config then it could be possible to get away with configuring things once, but I am not sure of the behavior of git in that case.
- Allow read-only users to have append-only access to the annex. This must at least be limited to secure hashes though, and there are implications of DoS by malicious users filling disk space / quotas.
It is worth noting that AGit-Flow already works for contributors with write access, since they can write to the annex freely anyway.
Do you have any other ideas on how git-annex could be used in this workflow?
Hmm.. git objects also don't really belong to any particular branch. git only fetches objects referenced by the branches you clone.
Similarly, git-annex can only ever get annex objects that are listed in the git-annex branch. Even with `--all`, it will not know about objects not listed there.
So, seems to me you may only need to keep the PR's git-annex branch separate from the main git-annex branch, so that the main git-annex branch does not list objects from the PR. I see two problems that would need to be solved to do that:
If git-annex is able to see the PR's git-annex branch (e.g. as refs/foo/git-annex), it will auto-merge it into the main git-annex branch, and then --all will operate on objects from the PR as well. So the PR's git-annex branch would need to be named to avoid that.
This could be just `git push origin git-annex:refs/for/git-annex/topic-branch`. Maybe `git-annex sync` could be made to support that for its pushes?
When git-annex receives an object into the repository, the receiving side updates the git-annex branch to indicate it now has a copy of that object. So, you would need a way to make objects sent to a PR update the PR's git-annex branch, rather than the main git-annex branch.
This could be something similar to `git push -o topic` in git-annex. Which would need to be a P2P protocol extension. Or maybe some trick with the repository UUID?
When the PR is merged, you would then also merge its git-annex branch.
If the PR is instead rejected, and you want to delete the objects associated with it, you would first delete the PR's other branches, and then run `git-annex unused`, arranging (how?) for it to see only the PR's git-annex branch and not any other git-annex branches. That would find any objects that were sent as part of the PR, that don't also happen to be used in other branches (including other PRs).
I do wonder, if this were implemented, would the git-annex workflow for the user be any better than if there were a per-PR remote for them to use? If every git-annex command that pushes the git-annex branch or sends objects to Forgejo needs `-o topic` to be given, then it might be a worse user experience.
Regarding the per-user special remote: I think git will recognize remote configs in the user's global git config (have not checked), but a special remote needs information to be written to the git-annex branch, not just git config, so there's no way to globally configure a special remote to be accessible in every git-annex repository.
Along similar lines, forgejo could set up an autoenabled remote that contains annexed files for all AGit-PRs, and that wants any files not in the main git repository. (This could be a special remote, or a git-annex repository that just doesn't allow any ref pushes to it. The latter might be easier to deal with since `git-annex p2phttp` could serve it as just another git-annex repository.)
That would solve the second problem I discussed in the comment above, because when the user copies objects to that separate remote, it will not cause git-annex in the forgejo repository to update the main git-annex branch to list those objects.
When merging a PR, forgejo would move the objects over from that remote to the main git repository.
You would be left with a bit of a problem in deleting objects from that remote when a PR is rejected, since the user may never have pushed their git-annex branch after sending an object to it, and so you would not know what PR that object belongs to. I suppose this could be handled by finding all objects that are in active PRs and deleting ones that are not after some amount of time.
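The cleanup idea could be sketched roughly as follows. This is only an illustration of the selection logic, under several assumptions: the function name `stale_keys`, the two input file formats, and the grace period are all made up here, keys are assumed to contain no whitespace, and how forgejo would actually produce the two lists is left open.

```shell
#!/bin/sh
# Hypothetical sketch: list keys in the PR remote that no active PR uses
# and that were received longer ago than a grace period.
#
# stale_keys PRESENT_FILE ACTIVE_FILE NOW GRACE
#   PRESENT_FILE: one "KEY RECEIVED_EPOCH" pair per line (objects in the PR remote)
#   ACTIVE_FILE:  one KEY per line (objects referenced by any open PR)
#   NOW, GRACE:   current time and grace period, both in seconds
stale_keys() {
    awk -v now="$3" -v grace="$4" '
        # First file argument: remember every key still used by an open PR.
        FILENAME == ARGV[1] { active[$1] = 1; next }
        # Second file argument: print keys that are unused and old enough.
        !($1 in active) && (now - $2) > grace { print $1 }
    ' "$2" "$1"
}
```

The grace period is there so that an object sent shortly before its PR is opened (or before the git-annex branch is pushed) is not deleted right away.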
With the separate autoenabled remote for PRs, the UX could look like this:
Or with a small git-annex improvement, even:
For this, origin-PRs would want all files not in origin, and origin would want all files not in origin-PRs. And origin-PRs would need to have a lower cost than origin so that it doesn't first try, and fail, to copy the file to origin.