In a v6 repository, git annex get of a particular file re-downloaded it each time it was run. git annex whereis said the content was locally present. But, git annex fsck of the file said the content was missing, and removed it from the location log.

The file was locked, and the repository was on ext4.

Reported by gleachkr on IRC. Don't have enough information to reproduce the problem yet. --Joey

My initial analysis is that this must be a problem with Annex.Content.inAnnex. Note that checks the cached inode for the content and if it finds a mismatch, it behaves as if the content is not present. That would be consistent with the problem as reported.

When I init a v6 repository and add some locked files, no inode cache is recorded, which makes sense as they're locked.

Hypothesis: A cached inode for the key got into the keys database, despite the file being locked, and that is messing up inAnnex.

Should inAnnex even be checking the inode cache for locked content? This seems unncessary, and note that it's done for v4 mode as well as v6.

How could a cached inode for a locked file leak in? Perhaps the file was earlier unlocked. Or perhaps another, unlocked file, had the same content. I tried both these scenarios, and was able to get a cached inode to be listed for a file, but in my tests at least, it also cached the inode of the locked file, and I did not replicate the problem.

--Joey

When annex.thin is not set, inAnnex does not need to check the inode cache, and not checking it will avoid this problem.

It is necessary for inAnnex to check the inode cache when annex.thin is set, because then the object file can be a hard link to the working tree and so modifiable.

Checking for link count of 2 and only then checking the inode cache won't suffice though, because the object file could be modified and then the worktree file deleted, and then the object file would be modified with a link count of 1. So with annex.thin, have to always check the inode cache.

So, it seems what has to be done is, when annex.thin is set, check the inode cache first, if it's unchanged great, but if not, inAnnex would need to checksum the object file to determine if it's been modified. So inAnnex gets potentially very much slower for annex.thin, but I can't see a way around that. --Joey

Also, I was able to reproduce the repeated get, after unlock; lock; touch; fsck and the above change did fix that.

So, all done! --Joey

more information needed

If gleachkr comes back to IRC, it would be good to find out:

  • Was this file previously unlocked? git log on the file would probably tell, unless it was briefly unlocked in between commits.
  • Run git annex info on the file to see what its key is.
    Then, run sqlite3 .git/annex/keys/db and .dump and see what is recorded for that key, in both the "content" and "associated" tables.