Recent comments posted to this site:

comment 5

When a secondary worktree is used on a filesystem not supporting symlinks, it would be possible for git-annex move to move an object from another repository and store it in the wrong location, under .git/worktrees/foo/annex/objects/. The object would still be accessible, and a later git-annex copy --to remote, if run in the same worktree, would be able to send the object on to a remote.

But if this bug gets fixed, then the misplaced object file will be left behind and will no longer be used, which could appear to the user as data loss in some situations. Eg, the copy to the remote would fail. (There might be situations where the populated worktree file would be used as a copy of the object, but that assumes the worktree file is still populated.)

Also, git-annex drop would not delete such misplaced object files, so the user would be left with a bloated repository.

So, git-annex fsck will need to be made to search out such misplaced object files and move them to the correct objects directory.
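
A rough sketch of what such a pass might look like, written in Python rather than git-annex's Haskell. The function name relocate_misplaced_objects is hypothetical, and the sketch assumes the misplaced files keep the same relative layout under .git/worktrees/foo/annex/objects/ as they would have under .git/annex/objects/. It does not verify checksums or handle file permissions, which a real fsck would need to do.

```python
import shutil
from pathlib import Path


def relocate_misplaced_objects(git_dir: Path) -> list:
    """Move files found under worktrees/*/annex/objects into annex/objects.

    Returns the new locations of the files that were moved.
    (Hypothetical helper, not part of git-annex.)
    """
    moved = []
    correct_root = git_dir / "annex" / "objects"
    for wt_objects in git_dir.glob("worktrees/*/annex/objects"):
        # Where symlinks are supported, worktrees/foo/annex is a symlink,
        # so nothing reachable through it is misplaced.
        if wt_objects.parent.is_symlink():
            continue
        # Collect the files first, since moving them changes the tree.
        strays = sorted(p for p in wt_objects.rglob("*") if p.is_file())
        for src in strays:
            dest = correct_root / src.relative_to(wt_objects)
            if dest.exists():
                # Already present in the right place; leave the stray
                # copy alone rather than guess which one is good.
                continue
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
            moved.append(dest)
    return moved
```

Called as relocate_misplaced_objects(Path(".git")) from the top of the main repository, it returns the list of files it moved, so the caller can report them.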

Comment by joey
comment 4

Apparently in the FAT case gitAnnexLocation is returning something like ../demo/.git/worktrees/demo-wt3/annex/objects/d13/2dd/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999 which is not the right path to the object file. Should be ../demo/.git/annex/objects/d13/2dd/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999

(In the ext4 case that does not happen, instead the reconcileStaged git diff does not include the new file. So that is a different problem.)

Turns out that .git/worktrees/foo/annex is a symlink when the filesystem supports symlinks. But, when symlinks are not supported, that symlink is not made. And so git-annex looks for objects there, but they're not there. This could also cause other behavior differences, since other state files that go in the annex directory get written there, so git-annex inside and outside the worktree, or in different worktrees, can have different states.

That symlink is needed to make annex symlinks point to the object files. But git-annex shouldn't rely on the symlink in things like gitAnnexLocation.

Luckily, annexDir exists, and I've checked and it is the only thing that produces "annex" as a path to the annex directory. So annexDir could be made into a function that is passed the git repository and handles this special case, by returning a path like "../../annex", which when combined with the git directory in a linked worktree, ends up pointing to the main repository's ".git/annex".

Except, annexDir is not only used to find the paths to object files. It's also used to generate the symlink target. When git-annex add is run in a linked worktree, and symlinks are supported, the symlink target needs to be of the form ".git/annex/". With this annexDir change, it would not be right.

So, it seems that annexDir, and some functions that call it need to behave differently when they're generating a path into the annex directory, vs when they're generating a symlink target or other similar thing. Which is a subtle distinction to introduce.
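
To make the distinction concrete, here is a small Python sketch of the two uses. It is not git-annex code; both function names are made up for illustration, and the linked-worktree flag is assumed to be something the caller already knows.

```python
import posixpath


def annex_dir_for_objects(git_dir: str, in_linked_worktree: bool) -> str:
    """Directory to use when locating object files and other annex state.

    In a linked worktree the git directory is .git/worktrees/NAME, so
    "../../annex" climbs back up to the main repository's .git/annex.
    """
    rel = "../../annex" if in_linked_worktree else "annex"
    return posixpath.normpath(posixpath.join(git_dir, rel))


def annex_dir_for_symlink_target() -> str:
    """Directory name to use when generating an annex symlink target.

    This stays plain "annex" in every worktree, so that the target
    keeps the ".git/annex/" form.
    """
    return "annex"
```

With the git directory from the FAT case, annex_dir_for_objects("../demo/.git/worktrees/demo-wt3", True) normalizes to "../demo/.git/annex", while the symlink variant is unaffected by which worktree it is called from.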

Comment by joey
comment 3

I've verified that populatePointerFile is not getting called in this case, and does get called in the same situation on ext4. And that call is made by reconcileStaged, which is getting called. So I would look in there for the bug.

Except, interestingly, some percent of the time, on ext4, manually populating the pointer file followed by git-annex add also does not call populatePointerFile. The pointer file remains unpopulated until another process calls reconcileStaged, and it gets populated then. This also seems like a bug, possibly another case of the same bug?

Comment by joey
comment 2

In a FAT filesystem after reproducing this bug with initial file foo, the following thing also happens:

joey@darkstar:~/mnt/demo-wt3#demo-wt3>cat foo
/annex/objects/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999
joey@darkstar:~/mnt/demo-wt3#demo-wt3>cp foo bar
joey@darkstar:~/mnt/demo-wt3#demo-wt3>git add bar
joey@darkstar:~/mnt/demo-wt3#demo-wt3>cat bar
/annex/objects/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999

This seems to be another case of the bug, because the content of the object is present in the repository, so usually git add of a pointer file should result in the smudge filter populating it.

git-annex add behaves the same as well.
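
For checking whether a worktree file is still an unpopulated pointer file, something like the following Python sketch could be used. It assumes only what the cat output above shows, that the file starts with /annex/objects/ followed by the key. The function name pointer_key is hypothetical, and any further checks git-annex itself makes are left out.

```python
from typing import Optional

POINTER_PREFIX = "/annex/objects/"


def pointer_key(content: bytes) -> Optional[str]:
    """Return the key if content looks like a pointer file, else None."""
    first_line = content.split(b"\n", 1)[0]
    try:
        text = first_line.decode("utf-8")
    except UnicodeDecodeError:
        # Binary data, so this is file content, not a pointer.
        return None
    if not text.startswith(POINTER_PREFIX):
        return None
    key = text[len(POINTER_PREFIX):]
    # An empty key means the prefix was there but nothing followed it.
    return key or None
```

Running it on the bytes of foo and bar after the git add would show both still returning a key, which is the symptom described above.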

Comment by joey
comment 1

I reproduced the same behavior on linux, when using a FAT filesystem.

So, this has something to do with automatic entry of an unlocked adjusted branch on a crippled filesystem.

Interestingly, doing the same on ext4, and manually using git-annex unlock on the file and committing before checking out the worktree does not replicate the problem. The unlocked file is automatically populated on worktree checkout there. And manual git-annex adjust --unlock before worktree creation also doesn't have the problem, even though the worktree does end up in an adjusted unlocked branch.

(The output of git-annex get is also weird. I think what's happening is that, since the unlocked file is not populated, it is enumerated as a file that get can operate on. But then when it runs, since there is no other location, it displays that message. The command does not have anything to handle this unusual case of the file being a pointer file but its content being present in the repository. And, usually there is no way that can happen, eg even writing a pointer file manually followed by git add of it populates it. So I think this unusual behavior of git-annex get doesn't need to change; once this bug is fixed it should not be possible to see that behavior.)

Comment by joey
comment 4

For the record, I've experienced a similar problem when uploading an 11 GB file to Backblaze B2:

6:1 (158)-6:8 (165): Expected end element for: Name {nameLocalName = "hr", nameNamespace = Nothing, namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = "body", nameNamespace = Nothing, namePrefix = Nothing})

Using git annex copy --explain shows that the remote provided a more useful error message:

1% 116.47 MiB 5 MiB/s 38m9s

[22:57:03.01203742] (Remote.S3) Response status: Status {statusCode = 413, statusMessage = "Request Entity Too Large"}

Maybe the status code could be inspected before parsing the body of the response to provide a clearer error message?
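
As an illustration of the idea only, here is a Python sketch (git-annex's S3 support is written in Haskell, and this does not reflect its actual code). The names RemoteError and parse_response are invented for the example; the point is just that an error status is reported before the XML parser ever sees the body.

```python
import xml.etree.ElementTree as ET
from http import HTTPStatus


class RemoteError(Exception):
    """Raised when the remote answers with an HTTP error status."""


def parse_response(status_code: int, body: bytes) -> ET.Element:
    """Check the status code first, and only parse the body on success."""
    if status_code >= 400:
        try:
            reason = HTTPStatus(status_code).phrase
        except ValueError:
            # Not a status code the standard library knows about.
            reason = "unknown status"
        # The body of an error response may be an HTML page rather than
        # the expected XML, so it is deliberately not parsed here.
        raise RemoteError(f"remote returned HTTP {status_code} ({reason})")
    return ET.fromstring(body)
```

With this ordering, a 413 response surfaces as an error naming the status code, instead of a parse error about an unexpected end element.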

Comment by gioele
We'll call this solved...
OK, I verified that what you accomplished also works on my end. This must be yet another gotcha related to my previous issue regarding imports: once something is imported it is not possible to attempt a "clean" re-import. At some point in my initial testing I must have accidentally imported the files ignored at foo/*.c (root-ignore/c in my case) and from there I could not get those files not to import once again.
Comment by Spencer
comment 4

Implemented a --socket option. I have not tried connecting to it as a client, but it seems to be listening to it, so I assume all is good.

Note that it still checks for authentication when using the socket, so you will probably want to combine it with --wideopen. The socket mode allows only the current user to access it.
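
If you want to confirm the permissions on the socket file yourself, a check along these lines should do. This is a Python sketch using only the standard library; the function name is made up, and it assumes the restriction is expressed through the ordinary file mode bits on the socket path.

```python
import os
import stat


def is_owner_only_socket(path: str) -> bool:
    """True if path is a unix socket owned by the current user with no
    permissions granted to group or others."""
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        return False
    if st.st_uid != os.getuid():
        return False
    # Any group or other permission bit would let another user in.
    return (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Pass it whatever path was given to --socket.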

Comment by joey
comment 3

I've made it support nested directories, which was easy.

Should be possible to make it use runSettingsSocket indeed though.

Comment by joey