Please describe the problem.
Came up in the course of "BF: allow for empty output directory to be specified to run".
What steps will reproduce the problem?
Here is a bash script:
#!/bin/bash
# https://github.com/datalad/datalad/pull/7654#issuecomment-2334087030
export PS4='> '
set -x
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/dl-XXXXXXX)"
function annexsync() {
    # call we have in datalad
    # git -c diff.ignoreSubmodules=none -c core.quotepath=false annex sync --no-push --no-pull --no-resolvemerge --no-content -c annex.dotfiles=true --no-commit
    git annex sync
    :
}
git init
git annex init
mkdir empty full
touch emptyfile full/emptyfile
echo c1 > full/withcontent
echo c2 > withcontent
git annex add *
git commit -m 'Initial commit'
echo content >| empty/withcontent
touch empty/emptyfile
git annex add empty/*
git commit -m 'Added empty/ files'
annexsync
pwd
ls -l
git status
git diff
If it is run with TMPDIR on a crippled FS, e.g. vfat, the final git diff reports a diff for all empty files, but not for files with content. For example, using our eval_under_testloopfs helper:
❯ DATALAD_TESTS_TEMP_FSSIZE=300 tools/eval_under_testloopfs ../trash/adjusted-git-diff.sh
...
> ls -l
total 24
drwxr-xr-x 2 yoh root 8192 Sep 6 09:59 empty
-rwxr-xr-x 1 yoh root 0 Sep 6 09:59 emptyfile
drwxr-xr-x 2 yoh root 8192 Sep 6 09:59 full
-rwxr-xr-x 1 yoh root 3 Sep 6 09:59 withcontent
> git status
On branch adjusted/master(unlocked)
nothing to commit, working tree clean
> git diff
diff --git a/empty/emptyfile b/empty/emptyfile
--- a/empty/emptyfile
+++ b/empty/emptyfile
@@ -1 +0,0 @@
-/annex/objects/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
diff --git a/emptyfile b/emptyfile
--- a/emptyfile
+++ b/emptyfile
@@ -1 +0,0 @@
-/annex/objects/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
diff --git a/full/emptyfile b/full/emptyfile
--- a/full/emptyfile
+++ b/full/emptyfile
@@ -1 +0,0 @@
-/annex/objects/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
I: done, unmounting
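All three diffs above show the same annex key. That is expected for empty files: the hash part of the key is the SHA-256 of zero bytes, and s0 is the size. A minimal sketch to check this, assuming coreutils sha256sum is available; the variable names are only for illustration:

```shell
#!/bin/bash
set -eu
# SHA-256 of zero bytes of input; sha256sum prints "<hash>  -", so keep only the first field.
empty_hash=$(printf '' | sha256sum | cut -d' ' -f1)
# Key layout as seen in the diff above: backend, size (s0), then the hash.
# No extension is appended because "emptyfile" has none.
empty_key="SHA256E-s0--${empty_hash}"
echo "$empty_key"
```

The printed key should match the one in the git diff output, which is why every empty file, regardless of its directory, points at the same annex object.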
What version of git-annex are you using? On what operating system?
First locally with 10.20240430, and then with the current 10.20240831+git21-gd717e9aca0-1~ndall+1.
A simplified test case, which works on any filesystem, not only crippled filesystems:
The adjusted branch is not even needed.
git-annex add emptyfile
followed by
git-annex unlock emptyfile
has the same result.

In this case, git diff is running the git-annex smudge --clean filter every time, which IIRC is a bug of some kind with git when smudging empty files.

I've verified that git-annex smudge --clean behaves correctly. It outputs the same annex link that was already staged. So git diff is choosing, for whatever reason, to ignore what the filter output, and is using "" as the content of the file instead.

So, I think this is a git bug, which git-annex cannot work around.
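The simplified test case can be sketched as a script. This is only an illustration: the repository setup lines, the temporary directory, and the function name repro_unlocked_empty are assumptions added here, and only the add and unlock steps come from the description above.

```shell
#!/bin/bash
# Hypothetical helper wrapping the simplified test case; runs in a subshell
# so the caller's working directory is left unchanged.
repro_unlocked_empty() (
    cd "$(mktemp -d "${TMPDIR:-/tmp}/dl-XXXXXXX")" || exit 1
    git init
    git annex init
    touch emptyfile
    # The two steps named in the description above.
    git annex add emptyfile
    git annex unlock emptyfile
    # On affected setups this is expected to show the annex link as removed.
    git diff
)

# Only attempt the reproduction when git-annex is installed.
if command -v git-annex >/dev/null 2>&1; then
    repro_unlocked_empty
fi
```

No adjusted branch and no crippled filesystem are involved here, so a diff showing up in this setup would point at the handling of unlocked empty files rather than at the filesystem.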
See also "Empty files make git status slow", which is about the repeated and unnecessary running of the smudge filter on empty files. There I hypothesize that git treats 0 size in the index as an indication that it doesn't know about the file, so it generally mishandles empty files.
And see also "resolvemerge fails when unlocked empty files exist", where I identified a related git bug, in which an empty unlocked file causes git to crash with an internal error, and reported it to the git developers. Unfortunately, nobody ever responded to my bug report.
Perhaps the thing to do is for git-annex to refuse to store an empty file as an unlocked file. It could still use annex symlinks for locked empty files, but unlocking would necessarily switch to an empty file stored in git the usual way. Unfortunately, that would make reverse adjusting an unlocked branch not know if the file was intended to be annexed or not. Also, it doesn't help for any repositories that already contain unlocked empty files.
Filed a bug report on git, with a testcase that does not need git-annex:
https://lore.kernel.org/git/aC90kn2mE93DCJEH@kitenet.net/T/#u