Recent comments posted to this site:
I have now tried with the most recent release 10.20251114-geeb21b831e7c45078bd9447ec2b0532a691fe471 while operating on a copy from the backup, and, looking at the fact that it starts with the latter, likely the "access restricted ones":
(venv-annex) dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ ( source .git/secrets.env; git-annex import master --from s3-dandiarchive && git merge s3-dandiarchive/master )
list s3-dandiarchive ok
import s3-dandiarchive 000675/draft/assets.jsonld
ok
import s3-dandiarchive 000675/draft/assets.yaml
ok
...
while still making commits to earlier folders:
(venv-annex) dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ git log --stat s3-dandiarchive/master -- 000029/draft || echo $?
fatal: unable to read f7c097994e60c2b58dae464633583b65a6691415
commit ce60e6d1
Author: DANDI Team <team@dandiarchive.org>
Date: 2025 Dec 02 14:16:10 -0500
import from s3-dandiarchive
128
I suspect it just somehow "manufactures" them for public ones without fetching their keys?
Hello, thank you for the pointers from the previous comments; it does indeed seem to be connected to starship in my case as well. However, for me, increasing the timeout up to 2000ms wasn't enough. The workaround that worked for me was to set ignore_submodules:
[git_status]
ignore_submodules = true
After installing git-annex from Archlinux repositories, it works again.
For some reason, I had installed git-annex-standalone (10.20220121-1). With git-annex (10.20251114-2) everything works as intended.
git-annex version: 10.20251114-geeb21b831e7c45078bd9447ec2b0532a691fe471
Sorry for the noise, should have done this before.
Best, Scinu
Even if it only re-checks when git-annex is going to use the remote (and not on every run of git-annex), that still seems perhaps too often to check.
But if it checks less often than that, once per day or whatever, there will of course be a window where it has not yet noticed the change and uses the cached remote.name.annexUrl and potentially fails.
A balance might be that if it fails to connect to the remote.name.annexUrl, it could re-check it then.
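For concreteness, the cached value being discussed is an ordinary git config setting, so it can be inspected or cleared by hand; a small sketch, assuming a remote named "origin" (the name is only an example):

    # the url git-annex has cached for the remote
    git config remote.origin.annexUrl
    # clearing it should presumably make git-annex probe for the annex url again
    # the next time it uses the remote
    git config --unset remote.origin.annexUrl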
Yes, the trashbin remote could be private. I think we're in agreement that's the best way to go.
--accessedwithin relies on atime, and looks at objects in the local repository only, so it would not work to find objects in the trashbin remote.
I don't think there is anything in preferred content expressions that would meet your need here exactly. It would probably be possible to add an expression that matches objects that have been present in a given repository for a given amount of time. The presence logs do have a timestamp.
Of course, if you used a directory special remote, you could use plain old find.
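For example, a minimal sketch using find, assuming the directory special remote's directory is /srv/trashbin and "old enough" means not modified for 90 days (both are placeholders):

    # list object files in the directory special remote that have not been
    # modified in more than 90 days
    find /srv/trashbin -type f -mtime +90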
There are also some common setup stage tasks that pose problems but could all be fixed in one place:
- Encryption setup generates encryption keys, which is slow; and generating and then throwing away an encryption key is also the wrong thing to do. I think this could be dealt with by copying the encryption setup of the remote that is generating the ephemeral remote into it.
- remote.name.annex-uuid is set in git config by gitConfigSpecialRemote. Either that could be disabled for ephemerals, or the uuid and name could also be inherited, which would make that a no-op.
The major difficulty in implementing this seems to be the setup stage, which is the per-special-remote code that runs during initremote/enableremote. That code can write to disk, or perform expensive operations.
A few examples:
- S3's setup makes 1 http request to verify that the bucket exists (or about 4 http requests when it needs to create the bucket). It does additional work when bucket versioning is enabled.
- directory's setup modifies the git config file to set remote.name.directory. And if that were skipped, generating the directory special remote would fail, because it reads that git config (see the sketch below).
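As a rough illustration of the side effects mentioned in the two lists above, here is a sketch of what initremote leaves behind in git config for a directory special remote (the remote name and path are made up):

    # hypothetical throwaway remote; encryption disabled to keep the sketch short
    git annex initremote scratch type=directory directory=/tmp/scratch encryption=none
    # the setup stage writes per-remote settings into git config, including the
    # remote.name.annex-uuid mentioned above and the directory setting
    git config --get-regexp '^remote\.scratch\.'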
My gut feeling is that it won't be practical to make it possible to ephemeralize every type of special remote. But it would not be too hard to make some subset of special remotes able to be used ephemerally.
It might be possible to maintain a cache of recently used ephemeral special remotes across runs of git-annex, and so avoid needing to re-run the setup stage.
This seems like a good design to me. It will need a protocol extension to indicate when a git-annex version supports it.
It occurred to me that when git-annex p2phttp is used and is proxying to a
special remote that uses this feature, it would be possible to forward the
redirect to the http client, so the server would not need to download the
object itself.
A neat potential optimisation, although implementing it would cut across several things in a way I'm unsure how to do cleanly.
That did make me wonder, though, if the redirect url would always be safe to share with the client, without granting the client any abilities beyond a one-time download. And I think that's too big an assumption to make for this optimisation. Someone could choose to redirect to a url containing, eg, http basic auth, which would be fine when using it all locally, but not in this proxy situation. So there would need to be an additional configuration to enable the proxy optimisation.
This is fixed in aws-0.25.1. I have made the git-annex stack build use that version. I also added a build warning when built with an older version, to hopefully encourage other builds to get updated.
Root caused to this bug: https://github.com/aristidb/aws/issues/296
Seems likely that git-annex import from an importtree=yes S3
remote on GCP is also broken since it also uses getBucket.
git-annex uses getBucket to probe if the bucket already exists, which lets it avoid dealing with the various ways that PUT of a bucket can fail. GCP also has some incompatibilities in how it responds to that; eg, in the above log, it uses a custom "BucketNameUnavailable" rather than the S3 standard "BucketAlreadyExists".
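For reference, a rough sketch of the kind of remote and import said to be likely affected, assuming a GCS bucket reached through its S3-compatible endpoint (the remote name, bucket, and host value are placeholders, and a real setup may need further options):

    # hypothetical importtree=yes S3 special remote pointed at GCS
    git annex initremote gcs type=S3 importtree=yes encryption=none \
        host=storage.googleapis.com bucket=example-bucket
    # importing from it lists the bucket contents (the getBucket call in question)
    git annex import master --from gcs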