fsverify

git-annex could use Linux's fs-verity feature as an alternative to hashing files and verifying their hashes itself.

Benefits would include:

Any read of an annexed file that has fs-verity enabled would check the blocks being read, and the read would fail if the file had become corrupted.
Avoiding any theoretical case where something modifies a file while git-annex add is hashing it, causing the file to be added with the wrong hash (which git-annex fsck would later detect). The FS_IOC_ENABLE_VERITY ioctl prevents anything else from modifying the file while the kernel is hashing it.
Slightly faster git-annex fsck, because it would not need to hash verity-enabled files itself. It would suffice to read the whole file, and if every block read successfully, the file is valid!
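For reference, enabling fs-verity on a file and doing the fsck-style "just read it" check might look roughly like this Python sketch. It assumes the generic Linux `_IOW` ioctl encoding (as on x86 and arm), SHA-256, 4096-byte blocks, and no salt or signature; `enable_arg`, `enable_verity` and `verify_by_reading` are hypothetical helpers, not anything in git-annex.

```python
import errno
import fcntl
import os
import struct

# FS_IOC_ENABLE_VERITY is _IOW('f', 133, struct fsverity_enable_arg) in
# <linux/fsverity.h>. This value assumes the generic _IOW encoding (x86, arm);
# some architectures encode ioctl numbers differently.
FS_IOC_ENABLE_VERITY = 0x40806685
FS_VERITY_HASH_ALG_SHA256 = 1


def enable_arg(block_size: int = 4096) -> bytes:
    """Pack the 128-byte struct fsverity_enable_arg: SHA-256, no salt, no signature."""
    return struct.pack(
        "=IIIIQIIQ88x",
        1,                          # version
        FS_VERITY_HASH_ALG_SHA256,  # hash_algorithm
        block_size,                 # block_size (assumed 4096)
        0,                          # salt_size
        0,                          # salt_ptr
        0,                          # sig_size
        0,                          # __reserved1
        0,                          # sig_ptr; "88x" zero-fills __reserved2[11]
    )


def enable_verity(path: str) -> None:
    """Ask the kernel to build the Merkle tree; the file is read-only afterwards."""
    # The ioctl fails with ETXTBSY if the file is open for writing anywhere,
    # so it is issued on a read-only descriptor.
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_ENABLE_VERITY, enable_arg())
    finally:
        os.close(fd)


def verify_by_reading(path: str, chunk: int = 1 << 20) -> bool:
    """fsck-style check: read every byte and report whether the reads succeeded.

    On a verity file the kernel checks each block against the Merkle tree and
    fails the read with EIO on a mismatch. On an ordinary file this only
    proves that the file is readable.
    """
    try:
        with open(path, "rb", buffering=0) as f:
            while f.read(chunk):
                pass
    except OSError as e:
        if e.errno == errno.EIO:
            return False
        raise
    return True
```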

Since fs-verity uses a Merkle tree, its hashes are not the same as simply using SHA on the whole file. So for git-annex to use the fs-verity hash as the key for the file, it would need to be a separate type of key. That's a bit problematic, because git-annex would then need a way to verify that Merkle hash itself on systems that do not support fs-verity. Also, for large files, the Merkle tree can get relatively large (about 1/127th the size of the file, according to the docs). So with a terabyte of annexed files, that's gigabytes of Merkle hashes, which seems too large to want to store in git.
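Verifying such a key on a system without fs-verity would mean recomputing the Merkle root in userspace. The sketch below is an attempt at that, plus the arithmetic behind the 1/127 figure. It assumes SHA-256, 4096-byte blocks and no salt; it follows our reading of the kernel's fs-verity documentation and has not been checked against the output of the `fsverity` tool, so treat the digest as illustrative. The function names are hypothetical.

```python
import hashlib
import struct

BLOCK_SIZE = 4096  # assumed Merkle tree block size; must match the one used when enabling
DIGEST_SIZE = 32   # SHA-256


def _hash_level(level: bytes) -> bytes:
    """Hash each BLOCK_SIZE block (last one zero-padded) and concatenate the hashes."""
    out = bytearray()
    for i in range(0, len(level), BLOCK_SIZE):
        block = level[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        out += hashlib.sha256(block).digest()
    return bytes(out)


def merkle_root(data: bytes) -> bytes:
    """Root hash of the fs-verity Merkle tree over data (SHA-256, no salt)."""
    if not data:
        return bytes(DIGEST_SIZE)  # an empty file's root hash is all zeros
    level = data
    while len(level) > BLOCK_SIZE:  # more than one block: build the next level up
        level = _hash_level(level)
    return hashlib.sha256(level.ljust(BLOCK_SIZE, b"\0")).digest()


def fsverity_digest(data: bytes) -> bytes:
    """SHA-256 of the 256-byte struct fsverity_descriptor for data."""
    descriptor = struct.pack(
        "<BBBBIQ64s32s144x",
        1,                            # version
        1,                            # hash_algorithm: FS_VERITY_HASH_ALG_SHA256
        BLOCK_SIZE.bit_length() - 1,  # log_blocksize (12 for 4096)
        0,                            # salt_size
        0,                            # reserved, must be zero
        len(data),                    # data_size
        merkle_root(data),            # root_hash, zero-padded to 64 bytes
        b"",                          # salt, zero-padded to 32 bytes
    )
    return hashlib.sha256(descriptor).digest()


def merkle_tree_size(file_size: int) -> int:
    """Approximate bytes of Merkle tree stored for a file of file_size bytes."""
    hashes_per_block = BLOCK_SIZE // DIGEST_SIZE  # 128
    blocks = -(-file_size // BLOCK_SIZE)          # data blocks, rounded up
    total = 0
    while blocks > 1:
        blocks = -(-blocks // hashes_per_block)   # blocks in the next level up
        total += blocks * BLOCK_SIZE
    return total
```

Each tree level is about 1/128th the size of the level below it, so the levels sum to roughly 1/127th of the file; `merkle_tree_size(10**12)` comes out at about 7.9 GB. Note that the key itself would only need the final digest, while the tree is what grows with the file.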

Alternatively, git-annex could hash as usual for the key. This would mean that git-annex add would hash a file twice: once for the git-annex key, and again when calling the FS_IOC_ENABLE_VERITY ioctl. That is slower, but perhaps the two could run in parallel, so that they use only about 2x the CPU without taking much longer.
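A rough sketch of running the two passes in parallel, assuming the `fsverity enable` command from fsverity-utils is installed. `add_with_verity` is a hypothetical helper, not how git-annex add actually works; the `enable` parameter lets a caller substitute another way of enabling verity.

```python
import hashlib
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def enable_with_cli(path: str) -> None:
    # `fsverity enable` (fsverity-utils) issues FS_IOC_ENABLE_VERITY on the file.
    subprocess.run(["fsverity", "enable", path], check=True)


def sha256_file(path: str, chunk: int = 1 << 20) -> str:
    """Plain SHA-256 of the whole file, as a key backend would compute it."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def add_with_verity(path: str,
                    enable: Callable[[str], None] = enable_with_cli) -> str:
    """Hash the file for its key while the kernel builds its Merkle tree."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        key_hash = pool.submit(sha256_file, path)
        verity = pool.submit(enable, path)
        verity.result()  # re-raises if enabling verity failed
        return key_hash.result()
```

Both passes only read the file, so they do not conflict. The kernel's hashing happens in a separate process, and CPython releases the GIL while hashlib hashes large chunks, so the two threads can genuinely overlap.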

Since fs-verity files are read-only, this would only be useful for locked files. Unlocking a file would need to either remove fs-verity from it (if that is possible) or copy it.
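The "copy it" option amounts to replacing the read-only file with a fresh copy of its data. fs-verity is a property of the inode, so a newly created file starts without it. This is a minimal sketch; `unlock_by_copy` is hypothetical, and real unlocking in git-annex also deals with pointer files and the annex object store, which it ignores.

```python
import os
import shutil
import tempfile


def unlock_by_copy(path: str) -> None:
    """Replace a read-only (e.g. verity-enabled) file with a writable copy."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file on the same filesystem so os.replace is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        shutil.copyfile(path, tmp)  # copies the data into a new inode without verity
        os.chmod(tmp, 0o644)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```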

Using fs-verity in this way would not work if the sysctl fs.verity.require_signatures is set to 1, because the annexed files would not have signatures.
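Detecting that situation could be as simple as reading the sysctl through /proc, as in this sketch. `verity_requires_signatures` is a hypothetical helper, and treating a missing sysctl as "signatures not required" is an assumption.

```python
from pathlib import Path

# The sysctl fs.verity.require_signatures as exposed under /proc.
REQUIRE_SIGS = Path("/proc/sys/fs/verity/require_signatures")


def verity_requires_signatures(sysctl: Path = REQUIRE_SIGS) -> bool:
    """True if the kernel insists that verity files carry a valid signature."""
    try:
        return sysctl.read_text().strip() == "1"
    except FileNotFoundError:
        # No such sysctl: assume the kernel lacks fs-verity signature support.
        return False
```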

Putting all this together, fs-verity is not too compelling for use by git-annex. A user who wants verification on all reads of a file can just call FS_IOC_ENABLE_VERITY on it themselves after git-annex add. The annex.freezecontent-command hook could be used to do that.
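As a sketch of such a hook, the script below enables fs-verity by running `fsverity enable` from fsverity-utils. It could be configured with something like `git config annex.freezecontent-command 'python3 /path/to/freeze_verity.py %path'`; the script name and location are hypothetical, and this has not been tried with git-annex.

```python
#!/usr/bin/env python3
"""Hypothetical annex.freezecontent-command hook that enables fs-verity."""
import os
import subprocess
import sys


def freeze(path: str) -> int:
    # The hook can also be given directories; fs-verity only applies to regular files.
    if not os.path.isfile(path):
        return 0
    # Exits nonzero on failure, e.g. if verity is unsupported or already enabled.
    result = subprocess.run(["fsverity", "enable", path])
    return result.returncode


if __name__ == "__main__":
    sys.exit(freeze(sys.argv[1]))
```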

Then the only benefit of supporting it in git-annex is that git-annex add could perhaps parallelize enabling verification with checksumming, or avoid its own checksumming, and so run faster than if a hook were used to enable fs-verity. And fsck would use less CPU. Is that worth complicating git-annex for? --Joey

After investigating that, I currently don't think it's compelling, so I'm gonna close this. done --Joey