Hello. I have been having some trouble with downloading podcasts. I have read the guide, but I was getting the following error message:
verification of content failed
Unable to access these remotes: web
Try making some of these repositories available:
00000000-0000-0000-0000-000000000001 -- web
failed
git-annex: get: 1 failed
I have tried several times. Sometimes it seemed to work, other times it did not; my testing was not very structured, so I do not recall the details. Recently I recreated the whole repository from scratch and was downloading files, but after I upgraded to 6.20160318 it stopped working and now just gives me the error message above after downloading the file.
I tried looking up information about the web remote, but the documentation mentions nothing about "making the web remote available", and I found nothing else that addresses this subject. So I am confused. What is wrong with my web repository? Why did it stop working? How can I fix it? How can I prevent this from happening in the future? Thank you.
Here is the version information:
git-annex version: 6.20160318
build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify XMPP ConcurrentOutput TorrentParser MagicMime Feeds Quvi
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: 6
supported repository versions: 5 6
upgrade supported from repository versions: 0 1 2 4 5
The web remote is automatically available (so long as you have an internet connection) and does not need any configuration to use.
What's going on here is that after git-annex downloads a file from the web remote, as from any remote, it verifies its checksum. It seems that the checksum of the content at a URL on the web has changed, so git-annex rejects the bad data.
(Note that git-annex only started verifying checksums of downloads in version 5.20151019.)
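To illustrate the kind of check described above, here is a minimal Python sketch, not git-annex's actual code (git-annex is written in Haskell). It assumes a SHA256-based backend, and the byte strings are made-up stand-ins for a podcast file.

```python
import hashlib


def verify_download(content: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded bytes hash to the recorded checksum."""
    actual = hashlib.sha256(content).hexdigest()
    return actual == expected_sha256


# Hypothetical episode content as it looked when the file was first added.
original = b"episode 1 audio bytes"
recorded = hashlib.sha256(original).hexdigest()

# The server later serves different bytes at the same URL.
changed = b"episode 1 audio bytes, re-encoded"

print(verify_download(original, recorded))  # True
print(verify_download(changed, recorded))   # False
```

The second call corresponds to the "verification of content failed" case: the bytes arrived, but they are not the bytes that were recorded.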
Now, sometimes you don't care if, e.g., a podcast file might get modified after git-annex importfeed initially adds it, and you want to just download whatever file is on the web now, and treat any contents coming from the web at different times as equivalent. The way to do that is to use --relaxed when running git-annex addurl or importfeed.
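As a small sketch of the --relaxed suggestion, the Python helper below only builds the command line and, optionally, runs it. The function names and the feed URL are placeholders of mine, not something from this thread; it assumes git-annex is on your PATH and that you run it inside an existing repository.

```python
import subprocess
from typing import List


def importfeed_command(feed_url: str, relaxed: bool = True) -> List[str]:
    """Build the argv for `git annex importfeed`, optionally with --relaxed."""
    argv = ["git", "annex", "importfeed"]
    if relaxed:
        argv.append("--relaxed")
    argv.append(feed_url)
    return argv


def import_feed(feed_url: str, relaxed: bool = True) -> None:
    """Run the command in the current directory (must be a git-annex repository)."""
    subprocess.run(importfeed_command(feed_url, relaxed), check=True)


# Placeholder feed URL, for illustration only.
print(" ".join(importfeed_command("https://example.com/feed.rss")))
# git annex importfeed --relaxed https://example.com/feed.rss
```

Calling import_feed instead of just printing would actually run the import; check=True makes a failing git-annex exit status raise an exception rather than pass silently.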
Thank you for your help. I would still like to ask a few related questions.
I was unaware that RSS feeds provide a form of checksum. Is it stored as the filename? I still find it unlikely that the old file was changed in the few minutes between git-annex importing the feed and me asking for the file. However, maybe the file was changed, the feed was not updated, and podcast downloading software is not as strict as git-annex.
How should I proceed in order to update my podcasts? This is something I do not understand, and I should play a bit with git-annex to figure it out: there are multiple ways to remove files. As I understand it:
- rm the file. This removes the link. git still remembers the file and keeps it.
- git rm the file and commit. This could keep the file or remove it. git remembers the file's history and keeps its last state. However, in git-annex I am not sure what is kept in .git/. Can I re-add the file?
- git-annex drop the file. git-annex ensures the minimum number of copies, so this makes me believe that git totally forgets about the file (no last state copy). However, in the podcast tips you say that dropped files are not redownloaded. So this leaves me confused.
I tried to git rm -r the podcast directory, and now I cannot importfeed it again. :-S Hmm... It is nothing important, but could you explain to me what is going on, please?
The checksum is not coming from the RSS feed. If you ran git-annex importfeed without --fast, it downloaded the content and checksummed it locally, and this checksum becomes the one it expects to have when downloading the file again. If you ran git annex importfeed with --fast, it doesn't get a checksum, but it does record the current size of the file as reported by the web server.
Either the size or the checksum changing could be what caused the verification problem. It's possible that the web server reports a bogus size somehow (I've seen this happen before), or the web server might be some kind of CDN that is serving up different file contents at different times for whatever reason.
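To make the difference between a recorded checksum and a recorded size more concrete, here is a rough Python sketch that reads those fields out of a key. It assumes the key layout BACKEND-sSIZE--NAME as I understand it from the git-annex internals documentation; the example keys, the helper names, and the rule that URL and WORM keys carry no checksum are simplifications of mine, so treat this as an illustration rather than how git-annex decides what to verify.

```python
from typing import List, NamedTuple, Optional


class Key(NamedTuple):
    backend: str
    size: Optional[int]
    name: str


def parse_key(key: str) -> Key:
    """Split a key of the assumed form BACKEND[-sSIZE][-other fields]--NAME."""
    head, sep, name = key.partition("--")
    if not sep:
        raise ValueError(f"not a key: {key!r}")
    backend, *fields = head.split("-")
    size = None
    for field in fields:
        if field.startswith("s") and field[1:].isdigit():
            size = int(field[1:])
    return Key(backend, size, name)


def available_checks(key: Key) -> List[str]:
    """List what a downloaded file could be compared against (simplified)."""
    checks = []
    if key.size is not None:
        checks.append("size")
    if key.backend not in ("URL", "WORM"):  # assumption: these carry no hash
        checks.append("checksum")
    return checks


# Hypothetical keys: one checksummed on import, one recorded with size only.
checksummed = "SHA256E-s31390--" + "0" * 64 + ".mp3"
size_only = "URL-s31390--http://example.com/ep1.mp3"

print(available_checks(parse_key(checksummed)))  # ['size', 'checksum']
print(available_checks(parse_key(size_only)))    # ['size']
```

In both cases there is something recorded that a later download can disagree with, which matches the two possible causes described above.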
If you use git rm or commit an rm, then any content of the file is retained in the git-annex repository, at least until you use git annex unused to find it and git annex dropunused to remove it. You can check out old versions of the branch and the file content will still be there.
If you use git annex drop, it drops the content from the local repository. git-annex still knows about the file, and git annex get can get the content again, perhaps by downloading it from the web again.
Since git-annex still knows about the file even if it's removed or dropped, git annex importfeed avoids re-importing such files from an RSS feed.
My attempt to redo the podcast repository adding the --fast flag seems to be successful. Thank you.
However, I would still like to ask you one thing about the delete operations. When would you use git rm instead of git-annex drop (in a repository that uses only git-annex to store the files)?
Oh, and I'll also sneak in a small question that is unrelated to the above, since I don't think it deserves its own topic: Is git-annex safe to operate in parallel tasks? Can I be adding file A, moving file B, and renaming file C in the same repository at the same time, without the integrity of the repository being compromised?