Recent comments posted to this site:

In restagePointerFiles, isunmodified's call to genInodeCache is returning Nothing, and so it does not try to restage the file.

The path to the annexed file includes "läp", and that non-ascii character is causing the problem. If I rename that to "lap", the problem goes away.

Printing out the OsPath, I see "l\195\164p". That seems wrong for Windows, where it should be using UTF-16. Calling fromOsPath on it to make a RawFilePath yields "l\195\131\194\164p" which is certainly wrong, and explains why genInodeCache, which does that conversion, is failing.
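
The two byte sequences above are consistent with double encoding: the UTF-8 bytes of "ä" get misread as two single-byte characters and are then encoded to UTF-8 again. Here is a minimal Python sketch of that effect. It is only an illustration, not git-annex code, and the helper `decimal_escapes` is a made-up name for printing bytes in the same decimal-escape style:

```python
# Illustration only; none of this is git-annex code.
name = "l\u00e4p"  # "läp"

# Correct UTF-8 encoding of the name.
utf8_bytes = name.encode("utf-8")

# Misread each UTF-8 byte as a separate character (Latin-1),
# then encode the result as UTF-8 a second time.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")


def decimal_escapes(data: bytes) -> str:
    """Render bytes with non-ASCII values as \\NNN decimal escapes."""
    return "".join(chr(b) if b < 128 else f"\\{b}" for b in data)


print(decimal_escapes(utf8_bytes))      # l\195\164p
print(decimal_escapes(double_encoded))  # l\195\131\194\164p
```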

So the question is how the OsPath is being constructed with the wrong encoding. In this case, it's coming from streamRestageLog, which uses streamLogFileUnsafe, which does not set the filesystem encoding when reading the log file. So that's the bug. I'm not sure if this bug is actually Windows-specific, although the use of UTF-16 on Windows may be helping trigger a problem with it.

Anyway, fixed it!

Comment by joey Thu May 8 18:03:11 2025
The issue persists; at the moment I am using 10.20250115-1~ndall+1. Joey, please advise.
Comment by yarikoptic Thu May 8 17:49:56 2025

I've reproduced this.

Note that running git-annex add on the affected files will clear up the problem. It manages to restage the file, and as there is no modification, is otherwise a no-op.

As a first analysis, I looked at annex.debug output between the two versions of git-annex. There were no differences other than pid numbers and git-annex branch shas.

Also, after running git-annex get with the bad git-annex, running git status with the good git-annex in path did not correct the problem.

My current guess is that restagePointerFiles has changed its behavior in the filenames it sends to git update-index. In particular the comment about "update-index is documented as picky" about filepaths suggests a small and otherwise ok change to a file path could cause this breakage.

Comment by joey Thu May 8 16:33:54 2025

@Joey: you've got email. Thanks in advance for taking a look at my problem. I forgot to mention in the email that you really need to comment out or remove the line with git config core.autocrlf false from my init script (init-git-annex.cmd|.sh) or you're going to get spurious modified files in git status. I believe the default is core.autocrlf set to true systemwide in Git for Windows. Somehow that default is what Git for Windows is most "comfortable" with, although personally I hate that it always "cooks" my newlines, so to speak. :)

Comment by jkniiv Tue May 6 22:14:48 2025

git grep for uuid is a good simple solution.

Maybe git-annex log --all could be made to show all location log changes for all keys. Then you could just grep that for the uuid to see what changes have been happening to what files (if it mapped keys back to current filenames when possible). Implementation would be git log filtered to location log files, with --raw to get the diff, then parsing the diff.
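
As a rough sketch of the parsing step, assuming input from something like `git log --raw --format=%ct git-annex` (a commit timestamp line followed by raw diff lines), it could look like this in Python. The function name, the sample lines, and the rule for recognizing location logs (a path with a directory component ending in `.log`) are assumptions for illustration, not how git-annex would necessarily do it:

```python
def location_log_changes(raw_log_lines):
    """Yield (commit_time, path) for each changed location log.

    Assumes each commit is introduced by a line holding only its
    timestamp, followed by raw diff lines that start with ':'.
    """
    commit_time = None
    for line in raw_log_lines:
        if line.startswith(":"):
            # Raw diff line: metadata, a tab, then the path
            # (the last tab-separated field also covers renames).
            path = line.split("\t")[-1]
            # Assumption: location logs live in subdirectories and
            # end in ".log"; top-level logs like uuid.log are skipped.
            if commit_time is not None and "/" in path and path.endswith(".log"):
                yield commit_time, path
        elif line.strip().isdigit():
            commit_time = int(line.strip())


# Made-up sample input, not output from a real repository.
sample_log = [
    "1700000000",
    "",
    ":100644 100644 1111111 2222222 M\taaa/bbb/KEY1.log",
    ":100644 100644 3333333 4444444 M\tuuid.log",
]
changes = list(location_log_changes(sample_log))
print(changes)
```

Filtering the changed log contents for a uuid and mapping keys back to filenames would still need to be layered on top of this.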

There is already code that does something very similar in Annex.RepoSize.diffBranchRepoSizes. And since that is already run by git-annex info, it would be cheap to pull out a last activity date for each repo at the same time as the repo's size, and have git-annex info display it or use it in the other ways you suggest.

The only wrinkle is that it is an incremental diff since the last time it was called, so would not include dates for repos that have not changed since. So the dates would need to be cached somewhere.

Comment by joey Tue May 6 15:31:51 2025

It's fine to have multiple unrelated branches in a repository, I don't think you're doing anything wrong.

Currently --historical stops once it finds one use of the key. I think you are wanting to find all uses by any of your branches?

Maybe the right option for this use case would be a --branches that searches each branch for uses of the key and doesn't stop at the first hit. Perhaps combined with a --refspec= option that takes a wildcard that can be used to select which branches to search.

Comment by joey Tue May 6 15:18:51 2025

You can email me the test repo.

I expect bisection won't be of much use in tracking this down unfortunately.

Comment by joey Tue May 6 15:15:00 2025

Apparently, git grep in the git-annex branch is pretty performant, so this can be used to find activity times:

# (roughly) First date of "activity" in the remote
❯ git grep $REMOTE_UUID git-annex -- '*/*.log' | sort -t: -k3 | cut -d: -f3 | cut -ds -f1 | head -n1 | xargs -ITS date -d@TS
Mi 1. Jan 21:02:30 CET 2025

# latest activity in a given remote
❯ git grep $REMOTE_UUID git-annex -- '*/*.log' | sort -t: -k3 | cut -d: -f3 | cut -ds -f1 | tail -n1 | xargs -ITS date -d@TS
Di 6. Mai 11:25:49 CEST 2025
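
The same extraction can be sketched in Python, which compares the timestamps numerically rather than relying on a textual sort. This is only a sketch: it assumes the `git grep` output lines have the form `git-annex:<path>:<seconds>s <status> <uuid>` that the cut commands above rely on, the sample lines are made up, and `activity_times` is a hypothetical helper name:

```python
from datetime import datetime, timezone


def activity_times(grep_lines):
    """Extract the timestamps (as floats) from git grep output lines."""
    times = []
    for line in grep_lines:
        # Fields: ref, path, then the matching log line itself.
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue
        # The log line starts with "<seconds>s"; keep the part before "s".
        stamp = parts[2].split("s", 1)[0]
        try:
            times.append(float(stamp))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    return times


# Made-up sample lines, not output from a real repository.
sample = [
    "git-annex:aaa/bbb/KEY1.log:1700000000.0s 1 00000000-0000-0000-0000-000000000001",
    "git-annex:ccc/ddd/KEY2.log:1700003600.5s 0 00000000-0000-0000-0000-000000000001",
]
times = activity_times(sample)
first = datetime.fromtimestamp(min(times), tz=timezone.utc)
last = datetime.fromtimestamp(max(times), tz=timezone.utc)
print("first activity:", first.isoformat())
print("latest activity:", last.isoformat())
```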
Comment by nobodyinperson Tue May 6 13:06:28 2025

Thank you for the offer. I'm a little sheepish about it though.

I'm mindful that my ambition, perhaps over-ambition, has knock-on effects. E.g., is what I am doing (with unrelated branches) breaking how unused detects unused objects?

If that is the case, I'd understand being consistent and not facilitating this hubris.

Comment by beryllium Tue May 6 07:24:51 2025

On my end the test case is very consistent in exhibiting this problem with this particular repo -- as in I git clone it into half a dozen copies and then enter them one by one to run the commands above and, e.g., six out of six times I get the same result with the affected versions of git-annex. Ok, sometimes it doesn't suggest to you to run git-annex restage in the middle but basically when the availability of the annexed files changes (mostly after get but sometimes after drop, too), git seems to think the (unlocked) files have been modified.

There is a small red herring, though, as it turns out Git for Windows' handling of core.autocrlf set to false seems to be a bit haphazard, in which case even otherwise unaffected (i.e., older) versions of git-annex (or rather core git) start to unnecessarily show files as having been modified (including non-large files stored in plain git). But that is a separate problem which doesn't involve git-annex as such.

Finally, the affected versions (in my tests) are the two latest released versions (with OsPath build flag set to true by default):

  • 10.20250320-g4c8577d3a2b963d4c790124633584537a372d389
  • 10.20250416-gb22a72cd9444071e86a46cc1eb8799e7d085b49d

In contrast, version 10.20250416 built with OsPath set to false, and the pre-OsPath released version 10.20250115-g7a8bc19228b2e16ec86836277c4077b63667b391 seem to not be affected by the problem. At least that's the conclusion that I have reached by repeatedly running the test case a few dozen times in total (in sets of six like I mentioned above).

Comment by jkniiv Tue May 6 00:14:04 2025