Recent comments posted to this site:
Running git-annex fsck in the affected worktree will clean up from this bug.
I have fixed this bug.
I still need to make git-annex fsck clean up repositories that
encountered this bug.
When a secondary worktree is used on a filesystem not supporting symlinks,
it would be possible for git-annex move to move an object from another
repository and store it in the wrong location, under
.git/worktrees/foo/annex/objects/. The object would still be accessible,
and a later git-annex copy --to remote, if run in the same worktree,
would be able to send the object on to a remote.
But if this bug gets fixed, then the misplaced object file will be left behind and won't be used any longer, which could appear to the user as data loss in some situations. Eg, the copy to the remote would fail. (There might be situations where the populated worktree file would be used as a copy of the object, but that assumes the worktree file is still populated.)
Also, git-annex drop would not delete such misplaced object files, so the
user would be left with a bloated repository.
So, git-annex fsck will need to be made to search out such misplaced
object files and move them to the correct objects directory.
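To make the cleanup described above concrete, here is a rough sketch in Python of what such a search-and-move pass might look like. It is not how git-annex fsck is implemented (git-annex is written in Haskell). The function name and the dry_run parameter are made up for illustration, and it assumes a filesystem where permissions do not get in the way of moving object files.

```python
import shutil
from pathlib import Path


def relocate_misplaced_objects(git_dir, dry_run=False):
    """Hypothetical helper: move object files found under
    git_dir/worktrees/*/annex/objects into git_dir/annex/objects,
    keeping the same hash-directory layout.

    Returns the (source, destination) pairs that were, or would be, moved.
    """
    git_dir = Path(git_dir)
    correct_root = git_dir / "annex" / "objects"
    worktrees = git_dir / "worktrees"
    moved = []
    if not worktrees.is_dir():
        return moved
    for worktree in sorted(worktrees.iterdir()):
        annex = worktree / "annex"
        # Where symlinks are supported this is a symlink to the main annex
        # directory, so nothing reachable through it is misplaced.
        if annex.is_symlink():
            continue
        misplaced_root = annex / "objects"
        if not misplaced_root.is_dir():
            continue
        # Collect the files first so the tree is not modified while walking it.
        files = sorted(p for p in misplaced_root.rglob("*") if p.is_file())
        for src in files:
            dest = correct_root / src.relative_to(misplaced_root)
            if dest.exists():
                # The object is already in the right place; leave the
                # duplicate alone rather than guessing which copy is good.
                continue
            moved.append((src, dest))
            if not dry_run:
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(src), str(dest))
    return moved
```

Calling it with dry_run=True only reports what would be moved, which seems like a sensible first step before touching a repository.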
Apparently in the FAT case gitAnnexLocation is returning something like
../demo/.git/worktrees/demo-wt3/annex/objects/d13/2dd/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999
which is not the right path to the object file. It should be
../demo/.git/annex/objects/d13/2dd/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999
(In the ext4 case that does not happen; instead the reconcileStaged git diff
does not include the new file. So that is a different problem.)
Turns out that .git/worktrees/foo/annex is a symlink when the filesystem
supports symlinks. But, when symlinks are not supported, that symlink is
not made. And so git-annex looks for objects there, but they're not there.
This could also cause other behavior differences, since other state files
that go in the annex directory get written there, so git-annex inside
and outside the worktree, or in different worktrees, can have different states.
That symlink is needed to make annex symlinks point to the object files.
But git-annex shouldn't rely on the symlink in things like gitAnnexLocation.
Luckily, annexDir exists, and I've checked and it is the only thing
that produces "annex" as a path to the annex directory. So annexDir could
be made into a function that is passed the git repository and
handles this special case, by returning a path like "../../annex", which
when combined with the git directory in a linked worktree, ends up pointing
to the main repository's ".git/annex".
Except, annexDir is not only used to find the paths to object files. It's
also used to generate the symlink target. When git-annex add is run in
a linked worktree, and symlinks are supported, the symlink target needs to
be of the form ".git/annex/". With this annexDir change, it would not be
right.
So, it seems that annexDir, and some functions that call it, need to behave
differently when they're generating a path into the annex directory, vs
when they're generating a symlink target or other similar thing.
Which is a subtle distinction to introduce.
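The path arithmetic behind the "../../annex" idea can be checked with a small sketch. This is Python rather than git-annex's Haskell, and the names annex_dir_for_objects and SYMLINK_ANNEX_DIR are hypothetical, used here only to show the two cases being kept apart.

```python
import posixpath

# Hypothetical constant: the form the source says symlink targets need,
# which must stay the same even inside a linked worktree.
SYMLINK_ANNEX_DIR = ".git/annex"


def annex_dir_for_objects(git_dir, in_linked_worktree):
    """Hypothetical helper: where object files should be looked up.

    In a linked worktree the git directory is .git/worktrees/NAME, so
    "../../annex" relative to it is the main repository's .git/annex.
    """
    rel = "../../annex" if in_linked_worktree else "annex"
    # normpath collapses ".." purely lexically, which is only safe when no
    # symlinks are involved; that is the case being discussed here.
    return posixpath.normpath(posixpath.join(git_dir, rel))
```

With the git directory from the FAT example, annex_dir_for_objects("../demo/.git/worktrees/demo-wt3", True) gives "../demo/.git/annex", the directory that the correct object path lives under. SYMLINK_ANNEX_DIR is deliberately left untouched by that logic.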
I've verified that populatePointerFile is not getting called in this case,
and does get called in the same situation on ext4. And that call is made by
reconcileStaged, which is getting called.
So I would look in there for the bug.
Except, interestingly, some percent of the time, on ext4, manually
populating the pointer file followed by git-annex add also does not call
populatePointerFile. The pointer file remains unpopulated until another
process calls reconcileStaged, and it gets populated then. This also seems
like a bug, possibly another case of the same bug?
In a FAT filesystem after reproducing this bug with initial file foo,
the following thing also happens:
joey@darkstar:~/mnt/demo-wt3#demo-wt3>cat foo
/annex/objects/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999
joey@darkstar:~/mnt/demo-wt3#demo-wt3>cp foo bar
joey@darkstar:~/mnt/demo-wt3#demo-wt3>git add bar
joey@darkstar:~/mnt/demo-wt3#demo-wt3>cat bar
/annex/objects/SHA256E-s30--dcf81122854db210a12a47851a3430b6ab000e3f981b5266f0873b94d130c999
This seems to be another case of the bug, because the content of the object
is present in the repository, so usually git add of a pointer file
should result in the smudge filter populating it.
git-annex add behaves the same as well.
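For anyone trying to reproduce this, a quick way to tell whether a worktree file is still an unpopulated pointer file is to look at its first line, as in the cat output above. The following Python sketch does only that. The function name pointer_key is made up, and it checks nothing beyond the /annex/objects/ prefix, so it is a rough test, not git-annex's own pointer file parser.

```python
POINTER_PREFIX = "/annex/objects/"


def pointer_key(content):
    """Hypothetical helper: return the key named by pointer-file content,
    or None if the content does not look like a pointer file."""
    first_line = content.split("\n", 1)[0].strip()
    if not first_line.startswith(POINTER_PREFIX):
        return None
    key = first_line[len(POINTER_PREFIX):]
    return key or None
```

If pointer_key returns a key for a file such as bar after git add, while the object for that key is present in the repository, the file was not populated, which is the behavior described in this comment.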
I reproduced the same behavior on linux, when using a FAT filesystem.
So, this has something to do with automatic entry of an unlocked adjusted branch on a crippled filesystem.
Interestingly, doing the same on ext4, and manually using git-annex unlock
on the file and committing before checking out the worktree does
not replicate the problem. The unlocked file is automatically populated on
worktree checkout there. And manual git-annex adjust --unlock before
worktree creation also doesn't have the problem, even though the worktree
does end up in an adjusted unlocked branch.
(The output of git-annex get is also weird. I think what's happening is
that, since the unlocked file is not populated, it is enumerated as a file
that get can operate on. But then when it runs, since there is no other
location, it displays that message. The command does not have anything to
handle this unusual case of the file being a pointer file but its content
being present in the repository. And, usually there is no way that can
happen, eg even writing a pointer file manually followed by git add of it
populates it. So I think this unusual behavior of git-annex get doesn't
need to change; once this bug is fixed it should not be possible to see
that behavior.)
For the record, I've experienced a similar problem when uploading an 11 GB file to Backblaze B2:
6:1 (158)-6:8 (165): Expected end element for: Name {nameLocalName = "hr", nameNamespace = Nothing, namePrefix = Nothing}, but received: EventEndElement (Name {nameLocalName = "body", nameNamespace = Nothing, namePrefix = Nothing})
Using git annex copy --explain shows that the remote provided a more useful error message:
1% 116.47 MiB 5 MiB/s 38m9s
[22:57:03.01203742] (Remote.S3) Response status: Status {statusCode = 413, statusMessage = "Request Entity Too Large"}
Maybe the status code could be inspected before parsing the body of the response to provide a clearer error message?
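As an illustration of that suggestion, here is a minimal sketch in Python. It is not git-annex code (git-annex is written in Haskell), and parse_response and RemoteError are hypothetical names. The idea is only that the status code is checked first, so an HTML error page never reaches the XML parser.

```python
import xml.etree.ElementTree as ET


class RemoteError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed request."""


def parse_response(status_code, status_message, body):
    """Hypothetical helper: refuse to parse the body of an error response."""
    if status_code >= 400:
        # Report the status instead of whatever the parser would say
        # about a body that was never meant to be XML.
        raise RemoteError(f"remote returned HTTP {status_code} {status_message}")
    return ET.fromstring(body)
```

With the response above, this would report "remote returned HTTP 413 Request Entity Too Large" instead of the confusing complaint about an unclosed hr element. A real implementation might still want to look at the body of an error response, since some services put useful details there.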
foo/*.c (root-ignore/c in my case) and from there I could not get those files not to import once again.
Implemented a --socket option. I have not tried connecting to it as a client, but it seems to be listening on the socket, so I assume all is good.
Note that it still checks for authentication when using the socket, so you will probably want to combine it with --wideopen. The socket mode allows only the current user to access it.