Recent comments posted to this site:
I have some early work toward implementing this in the p2phttp-multi branch.
- re mac: try joey@datalads-imac2 from smaug
- a few times we used https://github.com/mxschmitt/action-tmate to interactively debug on github CI... want us to bolt it on?
credential.useHttpPath is the relevant git config for this git-credential behavior.
I think it would be reasonable for git-annex to check if that is false, and if so, remove the path from the git credential request for an annex+http url.
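As a rough illustration of that check, here is a Python sketch that builds the input for `git credential fill` and leaves out the path when credential.useHttpPath is false. The helper name `credential_request` and the mapping of annex+http to plain http are assumptions made for the example; git-annex itself is written in Haskell and may well handle this differently.

```python
from urllib.parse import urlsplit


def credential_request(url: str, use_http_path: bool) -> str:
    """Build key=value input for `git credential fill` (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme
    # Assumption: an annex+http(s) url is looked up as plain http(s).
    if scheme.startswith("annex+"):
        scheme = scheme[len("annex+"):]
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    lines = [f"protocol={scheme}", f"host={host}"]
    path = parts.path.lstrip("/")
    # Only include the path when credential.useHttpPath is true.
    if use_http_path and path:
        lines.append(f"path={path}")
    return "\n".join(lines) + "\n"
```

The value of `use_http_path` could come from something like `git config --bool --get credential.useHttpPath`, treating an unset value as false.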
But I agree, it would be better, in the vast majority of cases, to have a single url endpoint that serves multiple repositories.
And for that matter, if someone is running git-annex p2phttp to serve 2 different repositories right now, they are probably making the two listen on different ports and so removing the path wouldn't help. They would have to be interposing another web server that mapped those ports to paths, like you have done with forgejo-aneksajo, for the path mangling to help.
So making p2phttp serve multiple repositories seems better than adding such path mangling.
Unfortunately, remote.foo.annexUrl is not limited to use for p2phttp. It existed before that and could be legitimately set to an http url when p2phttp is not being used.
I agree it would be good to try to reuse the credentials of the git url for p2phttp. That could be done by just querying git credential for the git url credentials, and trying to use them for the p2phttp url. If they don't work, use git credential to prompt for the p2phttp url credentials as it does now.
If the user had credential.helper configured, they would probably already have the git credentials cached, and if not, this would cache them for later use, so no harm done asking for them. But if credential.helper was not configured, there would be an extra and wholly unnecessary password prompt.
So, I think it makes sense to only do this when credential.helper is configured. And when the hostname is the same in both the git url and the p2phttp url.
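The flow described above can be sketched roughly as follows. This is only an illustration in Python: `ask` stands in for a call to git credential and `accepted` for an attempt to authenticate against the p2phttp server, and both names (and the function name) are hypothetical.

```python
from typing import Callable, Tuple
from urllib.parse import urlsplit

Credential = Tuple[str, str]  # (username, password)


def p2phttp_credentials(
    git_url: str,
    p2phttp_url: str,
    helper_configured: bool,
    ask: Callable[[str], Credential],
    accepted: Callable[[Credential], bool],
) -> Credential:
    """Try the git url's credentials first, then fall back to the p2phttp url."""
    same_host = urlsplit(git_url).hostname == urlsplit(p2phttp_url).hostname
    # Only reuse git credentials when a helper can cache them and the
    # hostname matches, so no extra password prompt is introduced.
    if helper_configured and same_host:
        cred = ask(git_url)
        if accepted(cred):
            return cred
    # Otherwise behave as now: ask for the p2phttp url's credentials.
    return ask(p2phttp_url)
```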
Hmm, I can imagine a situation where this behavior could be considered a security hole. Suppose A and B both have accounts on the same host. A is in charge of serving the git repositories. B is in charge of serving git-annex p2phttp. This would make git-annex prompt for a password to one of user A's git repositories, and send the password to user B. So B would be able to crack into the git repos.
That is pretty farfetched. But it raises the question: If the git repository and p2phttp are on the same host, why would they ever need 2 distinct passwords? If git-annex simply doesn't support that A/B split, then that security hole can't happen.
So, git-annex could simply, when the git url and p2phttp url have the same hostname, request the git credentials for the git url, rather than for the p2phttp url.
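That simpler rule amounts to picking which url to hand to git credential. A minimal Python sketch, with the hypothetical name `credential_url`, and assuming the hostname comparison is case-insensitive and ignores the port:

```python
from urllib.parse import urlsplit


def credential_url(git_url: str, p2phttp_url: str) -> str:
    """Return the url whose credentials should be requested (illustrative only)."""
    git_host = urlsplit(git_url).hostname
    p2p_host = urlsplit(p2phttp_url).hostname
    # Same host: ask for the git url's credentials, so there is only
    # ever one password per host.
    if git_host is not None and git_host == p2p_host:
        return git_url
    return p2phttp_url
```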
Aha, this test on ubuntu is failing the same way as the OSX test:
https://github.com/datalad/git-annex/actions/runs/11905453897/job/33176247387
It seems that "custom-config1" only involves an annex.stalldetection setting, if I am reading the workflow file right. I was not able to reproduce the failure with that config set though.
Re the OSX failure, it seems that somehow the manifest key is not being found when the test is run on OSX. I don't know why. There is nothing in this code that should be OSX-specific.
Unfortunately I do not have access to any OSX system to try to investigate this. The "datalads-mac" I used to use does not seem to exist anymore.
Of course, this test could be skipped on OSX.
Does occur to me this could somehow be exposing a deeper problem on OSX with exporttree special remotes. I have split the failing test in two, so we'll see if both fail, or only the exporttree one.
This is a new test.
Looks like it's found a legitimate bug in git-remote-annex. When the filesystem is crippled, the git-annex init checks out an adjusted branch, which here happens in the middle of git's own checkout and so legitimately confuses git.
I can reproduce this on a FAT filesystem, cloning from eg a directory special remote. Fixed this.
(The OSX failure is something else.)
FWIW, I've made some improvements that should make it need around 80% less memory in this case. Which might be enough to let it import.
Still don't have filtering on preferred contents on the fly though.
Did the same memory optimisation for the versioned case, and the results are striking! Running the command until it had made 45 API requests, it previously used 592788 kb of memory. Now it uses only 110968 kb.
Of that, about 78900 kb are used at startup, so it grew 29836 kb. At that point, it has gathered 23537 changes. So about 1 kb is used per change. That seems a bit more memory than really should be needed, since each change takes about 75 bytes of data, eg:
"y3RixvrmLvr1oWJ7meEa4vWK6B.C.aad",3340,"dandisets/000003/draft/dandiset.jsonld",2021-09-28 02:12:39 UTC
I did try some further memory optimisation, making it avoid storing the same filename repeatedly in memory when gathering versioned changes. Which oddly didn't save any memory.
Memory profiling might let this be improved further, but needing 1 gb of memory to import a million changes to files doesn't seem too bad.
Update: Did some memory profiling, nothing stuck out as badly wrong. Lists and tuples are using as much memory as anything.
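For reference, the arithmetic behind those figures can be checked with a few lines of Python. This only reuses the numbers quoted above (taking the stated growth figure as given) and assumes decimal units, with 1 gb being 1000000 kb.

```python
before_kb = 592788   # memory use before the optimisation
after_kb = 110968    # memory use after the optimisation
growth_kb = 29836    # stated growth since startup
changes = 23537      # changes gathered at that point

# Fraction of memory saved by the optimisation (a bit over 80%).
reduction = 1 - after_kb / before_kb

# Memory per gathered change, somewhat above 1 kb.
per_change_kb = growth_kb / changes

# Extrapolated memory for a million changes, in gb (assuming 1 gb = 1000000 kb).
per_million_gb = per_change_kb * 1_000_000 / 1_000_000

print(f"reduction: {reduction:.1%}")
print(f"per change: {per_change_kb:.2f} kb")
print(f"per million changes: {per_million_gb:.2f} gb")
```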
My arm64-ancient build failed today in the same way as the OSX build is failing, so I should be able to debug it there.
Huh ok, so git-remote-annex is failing to push, which is why clone later fails. And for whatever reason git doesn't propagate the error, which is why this is not visible in the transcript.
That build uses git 2.30.2. In that version, git bundle --stdin was broken and didn't read refs from stdin at all. It also had other bugs. I think it's best not to try to support git-remote-annex with that version of git at all, given those bugs.
That probably won't help with the OSX failure, which is with a very new git version. So I also made the test suite capture the git push output even when it exits successfully, so it can display it when the git pull fails. That should show what the problem is there.
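The idea of keeping the output of a step that succeeded and only showing it when a later step fails can be sketched like this. It is a Python illustration, not the test suite's actual code (which is Haskell); the function name is hypothetical, and the two commands in the usage part are stand-ins for git push and git pull.

```python
import subprocess
import sys
from typing import List, Tuple


def run_push_then_pull(push_cmd: List[str], pull_cmd: List[str]) -> Tuple[bool, str]:
    """Run both commands; on pull failure, return the captured push output too."""
    # Capture the push output even though it may exit successfully.
    push = subprocess.run(push_cmd, capture_output=True, text=True)
    pull = subprocess.run(pull_cmd, capture_output=True, text=True)
    if pull.returncode != 0:
        transcript = push.stdout + push.stderr + pull.stdout + pull.stderr
        return False, transcript
    return True, ""


# Stand-in commands: a "push" that exits 0 but prints a warning,
# followed by a "pull" that fails.
ok, transcript = run_push_then_pull(
    [sys.executable, "-c", "print('push: simulated warning')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
)
if not ok:
    print(transcript)
```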