git-annex could use Linux's fs-verity feature (called fsverify below) as an alternative to hashing files and verifying their hashes itself.

Benefits would include:

  • Any read of an annexed file that uses fsverify would check the blocks that are read, and the read would fail if the file had gotten corrupted.
  • Avoiding any theoretical cases where git-annex add is hashing a file and something modifies it, causing the file to be added with the wrong hash (which git-annex fsck will later detect). The FS_IOC_ENABLE_VERITY ioctl prevents anything else from possibly modifying the file while it's hashing it.
  • Slightly faster git-annex fsck, because it would not need to hash verified files. It would suffice to read the file, and if it all read successfully, it's valid!
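To make the fsck idea concrete, here is a minimal sketch of "just read the file" as a check. The name `reads_cleanly` is made up for illustration and is not anything git-annex has. It only means something for a file that already has fsverify enabled, where the kernel fails reads of corrupted blocks with an I/O error. For an ordinary file a clean read proves nothing, and pages already in the page cache may not be verified again.

```python
def reads_cleanly(path, chunk_size=1 << 20):
    """Return True if every byte of path could be read without an OS error.

    Hypothetical helper: on a file with fs-verity enabled, a corrupted
    block makes read() fail, which surfaces here as OSError. A missing or
    unreadable file also counts as a failure.
    """
    try:
        # buffering=0 avoids an extra userspace buffer; the data itself is
        # discarded, since only the success of the reads matters.
        with open(path, "rb", buffering=0) as f:
            while f.read(chunk_size):
                pass
    except OSError:
        return False
    return True
```

A real fsck would also need to confirm that fsverify is actually enabled on the file before trusting a clean read.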

Since fsverify uses a Merkle tree, its hashes are not the same as simply running SHA over the whole file. So for git-annex to use the fsverify hash as the key for a file, it would need a separate type of key. That's a bit problematic, because git-annex would then need a way to verify that Merkle hash itself on systems that do not support fsverify. Also, for large files, the Merkle tree can get relatively large (1/127th the size of the file, the docs say). So with a terabyte of annexed files, that's gigabytes of Merkle hashes, which seems too large to want to store in git.
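The 1/127 figure can be checked with a bit of arithmetic. This sketch assumes SHA-256 (32 byte digests) and 4096 byte blocks, so each tree block holds 128 hashes, and assumes each tree level is padded out to whole blocks. The function name is made up for illustration.

```python
def merkle_tree_size(file_size, block_size=4096, digest_size=32):
    """Estimate the bytes of Merkle tree needed for a file of file_size bytes.

    Assumptions: every level is padded to whole blocks, and a file that
    fits in a single block needs no tree blocks at all.
    """
    hashes_per_block = block_size // digest_size
    # Number of data blocks, rounded up.
    blocks = -(-file_size // block_size)
    total = 0
    # Each level hashes the blocks of the level below, until one block remains.
    while blocks > 1:
        blocks = -(-blocks // hashes_per_block)
        total += blocks * block_size
    return total


if __name__ == "__main__":
    size = merkle_tree_size(10**12)  # a terabyte of annexed content
    print(size, 10**12 / size)
```

For a terabyte this comes out to a little under 8 GB, which matches the "gigabytes of Merkle hashes" estimate.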

Alternatively, git-annex could hash as usual for the key. This would mean that git-annex add would hash a file twice, once for the git-annex key and a second time when calling the FS_IOC_ENABLE_VERITY ioctl. Slower, but perhaps these could parallelize and only use 2x the CPU or so.
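A rough sketch of what running the two passes in parallel might look like, in Python rather than git-annex's Haskell. The constants follow `<linux/fsverity.h>` as I understand it, and the ioctl number uses the common (x86, ARM) ioctl encoding, which differs on some architectures. The function names are hypothetical, and the ioctl only succeeds on a filesystem with verity support and with no writers holding the file open.

```python
import fcntl
import hashlib
import os
import struct
from concurrent.futures import ThreadPoolExecutor

FS_VERITY_HASH_ALG_SHA256 = 1
# Layout of struct fsverity_enable_arg: version, hash_algorithm, block_size,
# salt_size, salt_ptr, sig_size, __reserved1, sig_ptr, __reserved2[11].
ENABLE_ARG = struct.Struct("=IIIIQIIQ11Q")
# _IOW('f', 133, struct fsverity_enable_arg), assuming the usual encoding.
FS_IOC_ENABLE_VERITY = (1 << 30) | (ENABLE_ARG.size << 16) | (ord("f") << 8) | 133


def sha256_of(path, chunk_size=1 << 20):
    """Plain whole-file SHA-256, as would be used for the git-annex key."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def enable_verity(path, block_size=4096):
    """Ask the kernel to build the Merkle tree; no salt, no signature."""
    arg = ENABLE_ARG.pack(1, FS_VERITY_HASH_ALG_SHA256, block_size,
                          0, 0, 0, 0, 0, *([0] * 11))
    fd = os.open(path, os.O_RDONLY)  # the ioctl wants a read-only fd
    try:
        fcntl.ioctl(fd, FS_IOC_ENABLE_VERITY, arg)
    finally:
        os.close(fd)


def hash_and_enable(path):
    """Hash for the key in one thread while the ioctl runs in this one."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        key_hash = pool.submit(sha256_of, path)
        enable_verity(path)
        return key_hash.result()
```

Both hashlib and the ioctl release the interpreter lock while they work, so the two passes can overlap, though they still read the file twice.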

Since files with fsverify enabled are read-only, this would only be useful for locked files. Unlocking a file would need to either remove fsverify from it (if that is even possible) or copy it.

Using fsverify in this way would not work if the sysctl fs.verity.require_signatures is set, because the annexed files would not have signatures.
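If git-annex did support this, it would presumably want to check that sysctl before trying. A minimal sketch, assuming the sysctl shows up at the usual /proc/sys path and that a missing file means signature support is not built into the kernel. The function name is made up.

```python
def signatures_required(sysctl_path="/proc/sys/fs/verity/require_signatures"):
    """Return True if fs.verity.require_signatures is set to 1.

    Assumption: when the file does not exist, the kernel has no built-in
    signature support, so nothing can be requiring signatures.
    """
    try:
        with open(sysctl_path) as f:
            return f.read().strip() == "1"
    except FileNotFoundError:
        return False
```

The path is a parameter only so the check can be tried against an ordinary file.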


Putting all this together, fsverify is not too compelling for use by git-annex. A user who wants the verification on all reads of a file can just call FS_IOC_ENABLE_VERITY on it themselves after git-annex add. The annex.freezecontent-command hook could be used to do that.
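Such a hook could be a small script. This sketch assumes the `fsverity` tool from fsverity-utils is installed and is untested against a real repository. Since the hook can be handed a directory as well as a file, anything that is not a regular file is skipped. The script name and path are hypothetical; it might be configured with something like `git config annex.freezecontent-command 'python3 /path/to/enable-verity.py %path'`.

```python
import os
import stat
import subprocess
import sys


def verity_command(path):
    """Return the command to run for path, or None if it should be skipped."""
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode):
        # Directories and symlinks cannot have fs-verity enabled on them.
        return None
    return ["fsverity", "enable", path]


def main(argv):
    cmd = verity_command(argv[1])
    if cmd is None:
        return 0
    # Pass the tool's exit status back to git-annex.
    return subprocess.run(cmd).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

This does not handle a file that already has fsverify enabled, which the tool would presumably report as an error.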

Then the only benefit of supporting it in git-annex is that perhaps git-annex add could parallelize enabling verification with checksumming, or avoid its own checksumming, and so run faster than if a hook were used to enable fsverify. And fsck would use less CPU. Is that worth complicating git-annex for? --Joey

After investigating that, I currently don't think it's compelling, so I'm gonna close this. done --Joey