git-annex info uuid was observed to be slow on a slow NFS, because it is opening lots of .git/annex/journal files that DNE. So does git annex find --in remote.

Normally the journal is empty, but each query of a file from the git-annex branch still tries to open the corresponding journal file.

It seems that this could be improved by making such query commands either commit the journal to the branch once at startup, or check if the journal is empty, and once there's known to be nothing in the journal, avoid opening files from there.

Minimum patch to test if this is a significant performance impact:

diff --git a/Annex/Branch.hs b/Annex/Branch.hs
index 0d8eb7944..686791a3a 100644
--- a/Annex/Branch.hs
+++ b/Annex/Branch.hs
@@ -222,7 +222,7 @@ get file = do
  - (Changing the value this returns, and then merging is always the
  - same as using get, and then changing its value.) -}
 getLocal :: FilePath -> Annex L.ByteString
-getLocal file = go =<< getJournalFileStale file
+getLocal file = go =<< pure Nothing -- getJournalFileStale file
    go (Just journalcontent) = return journalcontent
    go Nothing = getRef fullname file

I don't see any measuable speed gain with this on my laptop SSD, but maybe on a slower filesystem it will?

But: Concurrency. Another process may be writing changes to the git-annex branch, via the journal, and so this would be a behavior change. Mostly that seems acceptible, because there's no defined ordering of events in such a situation, and this change only makes it so that the writes effectively always come after the reads.

But: Batch jobs. When a git-annex command is run in a batch mode, its caller can currently safely expect that running some other command, that modifies the git-annex branch, followed by asking the batch mode command to query it will yield a result that takes the earlier write into account.

So, this optimisation seems it would be limited to commands that are not in batch mode and do strictly read-only queries. Which seems a bit hard to plumb through to the git-annex branch reading code.

Unless, perhaps, we can rely on the process that modifies the git-annex branch having cleanly exited, and so committed its changes to the branch, before the next batch query. Can we rely on that, or might a batch mode command expect to see changes made up to the point it started by a concurrent command?

An easier alternative would be an option that bypasses reading the journal. But maybe there's some other, better way to avoid this slow case? --Joey

yoh benchmarked the above patch on the slow NFS system, and it did not result in a notable difference in speed, so it seems the basis of this todo is false. done --Joey