Recent comments posted to this site:
I've reproduced this.
Note that running git-annex add on the affected files will clear up the
problem. It manages to restage the file, and as there is no modification,
is otherwise a no-op.
As a first analysis, I looked at the annex.debug output from the two versions of git-annex. There were no differences other than pid numbers and git-annex branch shas.
Also, after running git-annex get with the bad git-annex, running git status with the good git-annex in path did not correct the problem.
My current guess is that restagePointerFiles has changed its behavior in the filenames it sends to git update-index. In particular the comment about "update-index is documented as picky" about filepaths suggests a small and otherwise ok change to a file path could cause this breakage.
@Joey: you've got email. Thanks in advance for taking a look at my problem. I forgot
to mention in the email that you really need to comment out or remove the line with
git config core.autocrlf false from my init script (init-git-annex.cmd|.sh) or
you're going to get spurious modified files in git status. I believe the default
is core.autocrlf set to true systemwide in Git for Windows. Somehow that default
is what Git for Windows is most "comfortable" with, although personally I hate
that it always "cooks" my newlines, so to speak.
git grep for uuid is a good simple solution.
Maybe git-annex log --all could be made to show all location log changes
for all keys. Then you could just grep that for the uuid to see what
changes have been happening to what files (if it mapped keys back to
current filenames when possible). Implementation would be git log
filtered to location log files, with --raw to get the diff, then
parsing the diff.
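As a rough sketch of that parsing step (in Python, not git-annex's actual Haskell): each change line in git log --raw output starts with a colon and ends with a tab followed by the path. The names parse_raw_line, is_location_log and key_of are made up for illustration. Treating every *.log file below a directory as a location log is an assumption.

```python
from typing import NamedTuple, Optional


class RawChange(NamedTuple):
    status: str  # change status letter, e.g. "A", "M" or "D"
    path: str    # path of the changed file within the branch


def parse_raw_line(line: str) -> Optional[RawChange]:
    """Parse one change line of `git log --raw`; return None for any other line."""
    if not line.startswith(":"):
        return None  # commit header, message text or blank line
    meta, _, path = line.partition("\t")
    fields = meta.split()
    # Expected fields: ":<srcmode>", "<dstmode>", "<srcsha>", "<dstsha>", "<status>"
    if len(fields) != 5 or not path:
        return None
    return RawChange(status=fields[4], path=path)


def is_location_log(path: str) -> bool:
    """Assumption: location logs are the *.log files inside a subdirectory."""
    return "/" in path and path.endswith(".log")


def key_of(path: str) -> str:
    """Strip the directories and the .log suffix to recover the key name."""
    return path.rsplit("/", 1)[-1][: -len(".log")]
```

Renames and copies, which carry two paths on one line, are not handled here.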
There is already code that does something very similar in
Annex.RepoSize.diffBranchRepoSizes. And since that is already run by
git-annex info, it would be cheap to pull out a last activity date
for each repo at the same time as the repo's size, and have git-annex info
display it or use it in the other ways you suggest.
The only wrinkle is that it is an incremental diff since the last time it was called, so it would not include dates for repos that have not changed since. So the dates would need to be cached somewhere.
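To illustrate the caching wrinkle, here is a minimal Python sketch, assuming the cache is just a mapping from repo uuid to the newest timestamp seen so far. The function update_last_activity and that data shape are hypothetical, not anything git-annex has.

```python
from typing import Dict


def update_last_activity(cache: Dict[str, float],
                         incremental: Dict[str, float]) -> Dict[str, float]:
    """Merge the dates from one incremental diff into the cached per-repo dates.

    Repos absent from the incremental diff keep their cached date, and an
    older date never overwrites a newer cached one.
    """
    merged = dict(cache)
    for uuid, timestamp in incremental.items():
        merged[uuid] = max(timestamp, merged.get(uuid, timestamp))
    return merged
```

A repo that did not change since the last run simply does not appear in the incremental mapping, so its cached date survives.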
It's fine to have multiple unrelated branches in a repository, I don't think you're doing anything wrong.
Currently --historical stops once it finds one use of the key. I think you are wanting to find all uses by any of your branches?
Maybe the right option for this use case would be a --branches that searches each branch for uses of the key and doesn't stop at the first hit. Perhaps combined with a --refspec= option that takes a wildcard that can be used to select which branches to search.
You can email me the test repo.
I expect bisection won't be of much use in tracking this down unfortunately.
Apparently, git grep in the git-annex branch is pretty performant, so this can be used to find activity times:
# (roughly) First date of "activity" in the remote
❯ git grep $REMOTE_UUID git-annex -- '*/*.log' | sort -t: -k3 | cut -d: -f3 | cut -ds -f1 | head -n1 | xargs -ITS date -d@TS
Mi 1. Jan 21:02:30 CET 2025
# latest activity in a given remote
❯ git grep $REMOTE_UUID git-annex -- '*/*.log' | sort -t: -k3 | cut -d: -f3 | cut -ds -f1 | tail -n1 | xargs -ITS date -d@TS
Di 6. Mai 11:25:49 CEST 2025
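The same extraction can be sketched in Python, assuming every matching line has the shape the pipeline above relies on: ref, path and log content separated by colons, with the content starting with a timestamp that ends in "s". The helper names are made up. Unlike the plain sort above, this compares the timestamps numerically.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def activity_timestamps(grep_lines: Iterable[str]) -> List[float]:
    """Return the timestamps found in `git grep` output lines, oldest first."""
    stamps = []
    for line in grep_lines:
        parts = line.split(":", 2)  # ref, path, log content
        if len(parts) != 3:
            continue
        first_field = parts[2].split(" ", 1)[0]
        if not first_field.endswith("s"):
            continue
        try:
            stamps.append(float(first_field[:-1]))  # drop the trailing "s"
        except ValueError:
            continue  # the first field was not a timestamp after all
    return sorted(stamps)


def as_utc(timestamp: float) -> datetime:
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

The first and last elements of the returned list then correspond to the head -n1 and tail -n1 results.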
Thank you for the offer. I'm a little sheepish about it though.
That's because I'm mindful that my ambition, perhaps excessive, has knock-on effects. E.g., is what I am doing (with unrelated branches) breaking how git-annex unused detects unused objects?
If that is the case, I'd understand being consistent and not facilitating this hubris.
On my end the test case is very consistent in exhibiting this problem with this particular repo
-- as in I git clone it into half a dozen copies and then enter them one by one to run the commands
above and, e.g., six out of six times I get the same result with the affected versions of git-annex.
Ok, sometimes it doesn't suggest to you to run git-annex restage in the middle but basically
when the availability of the annexed files changes (mostly after get but sometimes after drop, too),
git seems to think the (unlocked) files have been modified.
There is a small red herring, though, as it turns out Git for Windows' handling of core.autocrlf set to
false seems to be a bit haphazard, in which case even otherwise unaffected (i.e., older) versions
of git-annex (or rather core git) start to unnecessarily show files as having been modified (including
non-large files stored in plain git). But that is a separate problem which doesn't involve git-annex
as such.
Finally, the affected versions (in my tests) are the two latest released versions (with OsPath build flag set to true by default):
- 10.20250320-g4c8577d3a2b963d4c790124633584537a372d389
- 10.20250416-gb22a72cd9444071e86a46cc1eb8799e7d085b49d
In contrast, version 10.20250416 built with OsPath set to false, and the pre-OsPath released version 10.20250115-g7a8bc19228b2e16ec86836277c4077b63667b391 seem to not be affected by the problem. At least that's the conclusion that I have reached by repeatedly running the test case a few dozen times in total (in sets of six like I mentioned above).
In restagePointerFiles, isunmodified's call to genInodeCache is returning Nothing, and so it does not try to restage the file.
The path to the annexed file includes "läp", and that non-ascii character is causing the problem. If I rename that to "lap", the problem goes away.
Printing out the OsPath, I see "l\195\164p". That seems wrong for Windows, where it should be using UTF-16. Calling fromOsPath on it to make a RawFilePath yields "l\195\131\194\164p" which is certainly wrong, and explains why genInodeCache, which does that conversion, is failing.
So the question is how the OsPath is being constructed with the wrong encoding. In this case, it's coming from streamRestageLog. Which uses streamLogFileUnsafe. Which does not set the filesystem encoding when reading the log file. So that's the bug. I'm not sure if this bug is actually Windows specific, although the use of UTF-16 on Windows may be helping trigger a problem with it.
Anyway, fixed it!
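For anyone curious about the two escaped strings quoted above: they are consistent with the UTF-8 bytes of "läp" being treated as one character per byte and then UTF-8 encoded a second time. This Python sketch only models that byte pattern, using Latin-1 as a stand-in for "one character per byte". It is not the code path git-annex actually takes, and the helper names are invented.

```python
def show(data: bytes) -> str:
    """Render bytes with decimal escapes for non-ASCII, like the strings above."""
    return "".join(chr(b) if b < 128 else "\\" + str(b) for b in data)


def misdecode_and_reencode(data: bytes) -> bytes:
    """Read each byte as its own character, then encode the result as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


utf8 = "l\u00e4p".encode("utf-8")          # "läp" as UTF-8 bytes
print(show(utf8))                          # l\195\164p
print(show(misdecode_and_reencode(utf8)))  # l\195\131\194\164p
```

The single two-byte sequence for "ä" turns into four bytes, which no longer names the file on disk.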