Please describe the problem.
I added a YouTube video via addurl and then intentionally corrupted it by overwriting the underlying contents. After doing so, the video was clearly broken, but git annex fsck did not detect this.
What steps will reproduce the problem?
$ git annex addurl 'https://www.youtube.com/watch?v=zGDzdps75ns'
addurl https://www.youtube.com/watch?v=zGDzdps75ns
(using yt-dlp) (to Small short test video [zGDzdps75ns].webm)
ok
(recording state in git...)
$ git commit -m 'test'
$ echo test | sudo tee './Small short test video [zGDzdps75ns].webm'
test
$ git annex fsck './Small short test video [zGDzdps75ns].webm'
fsck Small short test video [zGDzdps75ns].webm ok
(recording state in git...)
$ file -L ./Small\ short\ test\ video\ \[zGDzdps75ns\].webm
./Small short test video [zGDzdps75ns].webm: ASCII text
The fsck reports the file is "ok". Even an fsck --from=web reports the video is OK:
$ git annex fsck './Small short test video [zGDzdps75ns].webm' --from=web
fsck Small short test video [zGDzdps75ns].webm
ok
(recording state in git...)
In contrast, doing the same thing to a non-youtube video via addurl does get detected by fsck:
$ git annex addurl 'https://git-annex.branchable.com/logo_small.png'
addurl https://git-annex.branchable.com/logo_small.png
(to git-annex.branchable.com_logo_small.png) ok
(recording state in git...)
$ git commit -m 'test'
$ echo test | sudo tee ./git-annex.branchable.com_logo_small.png
test
$ git annex fsck ./git-annex.branchable.com_logo_small.png
fsck git-annex.branchable.com_logo_small.png
git-annex.branchable.com_logo_small.png: Bad file size (4.74 kB smaller); moved to .git/annex/bad/SHA256E-s4749--c604d942bd8edebe5e8e01d18d1ad3604b0874c38d436c9486c52d601e4251dd.png
failed
(recording state in git...)
fsck: 1 failed
What version of git-annex are you using? On what operating system?
10.20230828, on Ubuntu Jammy.
Please provide any additional information below.
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Excited to move my photo collection over to it!
I think I found a likely explanation: The yt-dlp downloaded videos use the URL backend, which doesn't include any sort of hashing. That makes some sense, but the indicator that git annex fsck gives here is still misleading. You might have silent corruption without realizing it, which is never something I'd anticipate with files stored in a git repo.
Should git annex fsck perhaps print a warning of some type when checking files it cannot verify consistency on?
The intent of this is to allow downloading the same youtube video to multiple devices, since youtube might present different encodings of the video at different times.
You can use git-annex migrate --backend=SHA256 to convert to a backend that does get hash checked.
If you set the annex.securehashesonly config to true, git-annex fsck will complain about any files that are not hashed (as well as files using insecure hashes).
Setting annex.securehashesonly will also prevent git-annex addurl from adding youtube videos, eg:
Perhaps it would be better for addurl, with that configuration, to not generate a URL key, but hash it with the default backend.
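Until fsck warns about this itself, one rough way to spot files in the situation described above is to look at the backend part of each file's key. The following is only a sketch, not something git-annex ships: the function names are made up for illustration, it assumes the backend name is everything before the first "-" in a key (as in the SHA256E key in the fsck output above), and the list of backends treated as unhashed (URL and WORM) is an assumption that may be incomplete.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of git-annex.

# Print the backend name of a key.
# Assumption: the backend is everything before the first "-" in the key.
key_backend() {
    printf '%s\n' "${1%%-*}"
}

# Succeed if the key's backend carries no content hash.
# Assumption: URL and WORM are treated as unhashed; this list is illustrative.
key_is_unhashed() {
    case "$(key_backend "$1")" in
        URL|WORM) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn about a file whose content cannot be checked against a hash.
# Relies on `git annex lookupkey`, so it must run inside a git-annex repository.
warn_if_unverifiable() {
    key=$(git annex lookupkey "$1") || return 1
    if key_is_unhashed "$key"; then
        printf 'warning: %s uses the %s backend; its content cannot be hash checked\n' \
            "$1" "$(key_backend "$key")"
    fi
}
```

Calling warn_if_unverifiable './Small short test video [zGDzdps75ns].webm' in the test repository should print a warning if the file really does have a URL key, as suspected above. git-annex's --inbackend matching option (for example with git annex find) may be a more direct way to list such files.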