Recent comments posted to this site:

comment 4
Thank you Joey for looking into it. Since there was a bit of exploration above, in a nutshell: what sequence of git-annex command(s) should users run after git reset --hard COMMITISH to "time travel" most efficiently (assuming heavy repos)?
Comment by yarikoptic
comment 1

git-annex is actually using git credential here. That's where the "Username for" prompt comes from.

I think that this is a chicken and egg problem. git-annex is doing UUID discovery, which is the first thing it does when run with a new remote that does not have a UUID. But the repository does not exist, so it has no UUID, and it won't be created until git push happens.

Deferring git-annex UUID discovery would avoid the problem, but I think that would be very complicated, if possible at all.

I wonder if there is some way that git-annex could tell, at the http level, that this URL does not exist yet? If so, it could avoid doing UUID discovery. Then git-annex push would at least be able to push the git repo. And then on the next run git-annex would discover the UUID and would be able to fully use the repository. Not an ideal solution perhaps, since you would need to git-annex push twice in a row to fully populate the repository.
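
For illustration, if git-annex did detect at the http level that the URL does not exist yet and skipped UUID discovery, the workaround would look like this from the user's side (remote name hypothetical):

    # first push: the repository does not exist yet, so UUID discovery is
    # skipped, but the git push creates the repository on the server
    git-annex push myremote
    # second push: the repository now exists, UUID discovery succeeds,
    # and annexed content can be sent as well
    git-annex push myremote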

Looks like the url you gave just 404s, but I'm not sure if what I'm seeing now is the same as what you would have seen.

@matrs Any chance you could give me access to reproduce this using your server so I could look into that?

Comment by joey
comment 4

I'm pretty sure nothing changed on the git-annex side that would have fixed this.

I am inclined to chalk this up to something having crashed in some way on that machine, and the problem later clearing up. Ugh.

Comment by joey
comment 7

One way I can see that this might happen is if git-annex forget has been used, after a previous export/import.

In that case, the content identifier database would be populated with a GIT key, which would be used instead of downloading the file to be imported. That results in a git sha being used which may not be present in the git repository: while the git-annex branch usually gets imported/exported trees linked into it, git-annex forget erases that.

So a possible scenario (sketched in shell after this list):

1. git-annex export or import
2. git-annex forget
3. pushing the git-annex branch somewhere
4. in a separate git clone, pulling that git-annex branch
5. git-annex import

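A minimal shell sketch of that scenario, with the special remote name (mydir) and the exact commands invented for illustration:

    # repo A: export a tree to a special remote, then forget old history
    git-annex export master --to mydir
    git-annex forget                    # erases historical git-annex branch data,
                                        # including the exported tree refs
    git push origin git-annex           # publish the pruned branch (may need
                                        # --force, since forget rewrites it)

    # repo B: a separate clone, after pulling that git-annex branch
    git-annex enableremote mydir
    git-annex import master --from mydir  # can reuse a GIT key from the content
                                          # identifier logs, yielding a git sha
                                          # that is not present in this repository
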
That is worth trying to replicate. But it seems pretty unlikely to me that this is what you actually did...?

Leaving aside the possibility that git hash-object might be buggy and not record the object in the git repository, that's the only way I can find for this to possibly happen, after staring at the code for far too long.

Comment by joey
comment 6

I was able to set up this same special remote myself (manually populating remote.log) and used it with my own S3 creds (which of course have no special access rights to this bucket, so it was all public access only), importing into a fresh repository.
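
A sketch of that setup, with credentials passed via the standard AWS environment variables (values elided; the remote.log entry itself is not shown):

    # my own creds, with no special access rights to this bucket
    export AWS_ACCESS_KEY_ID=...
    export AWS_SECRET_ACCESS_KEY=...

    # in a fresh repository, with the remote.log entry in place:
    git-annex enableremote s3-dandiarchive
    git-annex import master --from s3-dandiarchive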

Part of that import included:

import s3-dandiarchive 000345/draft/dandiset.yaml
  HttpExceptionRequest Request {
[...]
   (StatusCodeException (Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("x-amz-request-id","T0PNM10TN8STRTK4"),("x-amz-id-2","pqZXYNtU9T0mQxmHvtBjr2weztjwWwP3GleV7Jy5P3DcZbCi7Mt4Kzqo1wpPj9Zy85cZ3CUPHro="),("Content-Type","application/xml"),("Transfer-Encoding","chunked"),("Date","Fri, 02 Jan 2026 15:01:16 GMT"),("Server","AmazonS3")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose, responseOriginalRequest = Request {
[...]
  , responseEarlyHints = []}) "<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>T0PNM10TN8STRTK4</RequestId><HostId>pqZXYNtU9T0mQxmHvtBjr2weztjwWwP3GleV7Jy5P3DcZbCi7Mt4Kzqo1wpPj9Zy85cZ3CUPHro=</HostId></Error>")
ok

But, the import ended with:

  Failed to import some files from s3-dandiarchive. Re-run command to resume import.

And did not create a branch, so I have not been able to reproduce the problem.

Digging into why it says "ok" there: that was unfortunately only a display problem. Corrected that.

Comment by joey
comment 5

All being small files does make me think this bug is somehow specific to adding the files to git. So it would be very useful to re-run the reproducer again, with annex.largefiles this time configured so everything is annexed.
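
For example, a sketch of such a re-run (the remote name follows the earlier comments):

    # configure annex.largefiles so every file is annexed, none added to git
    git config annex.largefiles anything
    git-annex import master --from s3-dandiarchive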

And when you replicated the problem from the backup, were you using it in the configuration where it cannot access those?

If I got the question right -- and since I do not recall now, judging from my use of ( source .git/secrets.env; git-annex import master... -- I think I was using credentials that allowed access to them (hence no errors while importing).

Well that's why I asked. It's not clear to me if it ever did show a failure, when used in the configuration where it couldn't access the files.

It seems equally likely that it somehow incorrectly thought it succeeded.

Comment by joey
Re: Directory remotes in offline drives for archiving?

The only time git-annex will complain about being unable to lock down a file on a remote is when you are dropping a file from a special remote, and the only copy is in another special remote.

drop foo (from dirremote...) (unsafe)
  Unable to lock down 1 copy of file necessary to safely drop it.

  These remotes do not support locking: otherdirremote

  (Use --force to override this check, or adjust numcopies.)

In that situation, you can either use --force or git-annex get the file, then drop from the remote, and then drop the file from the local repository. The latter avoids any possible concurrency problems, but --force is of course faster, and would be fine in your situation.
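
A sketch of both options, using the remote names from the example above:

    # option 1: override the locking check
    git-annex drop foo --from dirremote --force

    # option 2: hold a local copy while dropping, avoiding --force
    git-annex get foo                     # copy the file into the local repository
    git-annex drop foo --from dirremote   # the local copy can be locked down
    git-annex drop foo                    # finally drop the local copy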

Dropping a file from a local repository that is present in a special remote does not have this problem.

Comment by joey
comment 3

Looks to me like arch is no longer stuck on the old ghc 9.4.8, but has the slightly newer 9.6.6, which is the same as Debian stable.

So, I am probably going to make git-annex only support back to that version, to simplify things.

Please let me know if I have misunderstood the situation in arch land.

Comment by joey
comment 6

A useful thing to display might be the path to the corrupted database file and advice to remove it?

Good idea to display the path. I've made that change.

I don't think I want to make git-annex suggest deleting sqlite databases anytime sqlite crashes for any reason. While they are safe to delete, that encourages users to shrug and move on and tends to normalize any problem with sqlite. In reality, problems with sqlite are very rare, and I'd like to hear about them and understand them.

Comment by joey