Recent comments posted to this site:
Looks to me like Arch is no longer stuck on the old ghc 9.4.8 but has the slightly newer 9.6.6, which is the same version as in Debian stable.
So, I am probably going to make git-annex only support back to that version, to simplify things.
Please let me know if I have misunderstood the situation in Arch land.
A useful thing to display might be the path to the corrupted database file and advice to remove it?
Good idea to display the path. I've made that change.
I don't think I want to make git-annex suggest deleting sqlite databases anytime sqlite crashes for any reason. While they are safe to delete, that encourages users to shrug and move on and tends to normalize any problem with sqlite. In reality, problems with sqlite are very rare, and I'd like to hear about them and understand them.
Your previous problem with the sqlite database cannot have caused fsck to detect a checksum problem with your annexed file.
It looks like you have somehow modified annex object files, e.g. files in
.git/annex/objects. git-annex sets permissions that usually prevent such
a thing from happening.
There is no way to make git-annex accept a version of a file with a different
checksum than the one recorded in git. Instead you need to git-annex add the
new version of the files to the repository in place of the old version.
Here is a bash script that will pull the files out of .git/annex/bad/
and update the annexed files:
IFS=$'\n'
for x in $(git-annex find --format='${key}\n${file}\n'); do
    if [ "$l" ]; then
        # second line of a pair: the file name
        f="$x"; l=
        if [ -e ".git/annex/bad/$k" ]; then
            mv ".git/annex/bad/$k" "$f"
            git-annex add "$f"
        fi
    else
        # first line of a pair: the key
        k="$x"; l=1
    fi
done
I think it makes sense for git-annex fix to deal with this situation.
In both cases the user has run a git command that affects files in the
working tree, and it has left the annexed content inaccessible.
That makes a lot of sense. So if I understood things right, the correct place to work on this is rclone. I think I'll try to ask what they think of this kind of use case.
Thanks for the explanation
While the database file was corrupt, I did some work (not realising it was corrupt) to fix up MP3 tags in my music collection. Now when I run git annex fsck I'm getting errors like:
fsck music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3
music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3: Bad file size (128 B larger); moved to .git/annex/bad/SHA256E-s17800671--1a992cda34a5ab52d42cd7a420114fc122458ff57672e468f8403faa77f209b0.mp3
** No known copies exist of music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3
failed
and
fsck music/Arrow/misc/Hot_Hot_Hot.mp3 (checksum...)
music/Arrow/misc/Hot_Hot_Hot.mp3: Bad file content; moved to .git/annex/bad/SHA256E-s3444736--3178689ce4a69a0e94fe11afaf077b6471077fd2d5128a5a65a71dcf84272ed5.mp3
** No known copies exist of music/Arrow/misc/Hot_Hot_Hot.mp3
failed
I've tried using git annex reinject, but that is refused as the checksum doesn't match.
Can I tell git-annex to just accept the files that I have in my repository as being correct?
Hey,
I just came back to this after trying to do something in my repository. Good to hear I can just delete the SQLite file. I've done that now, and fsck is running.
A useful thing to display might be the path to the corrupted database file and advice to remove it?
git-annex p2phttp does update the git-annex branch itself when receiving
files. And generally, any time git-annex stores an object in a repository,
it updates the git-annex branch accordingly.
So, you can fetch from the remote and learn about those objects,
and then git-annex unused --from=$remote will show you unused objects in
the remote.
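As a rough sketch of that sequence: the function name list_unused_in_remote and the remote name in the usage line are placeholders I made up, not anything git-annex defines. It only chains the two commands mentioned above.

```shell
# Hypothetical wrapper: fetch first, so the remote's git-annex branch
# is known locally, then ask which objects in that remote are unused.
list_unused_in_remote() {
    remote="$1"
    git fetch "$remote" || return 1
    git-annex unused --from="$remote"
}

# Usage, with "myremote" standing in for the real remote name:
#   list_unused_in_remote myremote
```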
When running git-annex unused on the local repository, it does list all
objects in the local repository. So if an object somehow does get into the
repository without a branch update, it will still show as unused.
There is no way to list all objects present in a remote. Special remotes are not required to support enumeration at all. So, if an object got sent to a special remote, and the git-annex branch record of that was lost, there would be no way to find that unused object.
Passing arbitrary parameters to rclone is not supported. It would possibly
be a security hole if it were supported, because if there were a parameter
say --deleteeverything, you could initremote a special remote with that
parameter, and then wait for someone else to enableremote and use that
special remote and have a bad day.
The "*" in initremote --whatelse output is a placeholder. It is not
intended to mean that every possible thing is passed through, but that,
if rclone supports some additional parameters, and explicitly asks for
them (via GETCONFIG), they will be passed through to it.
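To illustrate that exchange, here is a minimal shell sketch of an external special remote asking for one setting. This is not how rclone gitannex is implemented. The function name getconfig and the variable RET are made up for illustration, and bwlimit in the usage line is just an example setting name. In the external special remote protocol, the remote writes GETCONFIG and the setting name to stdout, and git-annex answers on stdin with VALUE followed by the value (empty if unset).

```shell
# Hypothetical helper: ask git-annex for one setting, store the answer in RET.
# stdout is read by git-annex; stdin carries git-annex's reply.
getconfig() {
    RET=
    echo "GETCONFIG $1"
    read -r resp || return 1
    case "$resp" in
        VALUE)     RET= ;;                  # setting is not configured
        "VALUE "*) RET="${resp#VALUE }" ;;  # strip the "VALUE " prefix
        *)         return 1 ;;              # unexpected reply
    esac
}

# Usage inside a remote's main loop (example setting name only):
#   getconfig bwlimit && [ -n "$RET" ] && echo "bwlimit is $RET" >&2
```

Only settings the remote asks for this way ever reach it, so a parameter given to initremote that the remote never requests is never acted on.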
I think that currently, rclone gitannex does not request any parameters.
It would certainly be possible to make it support something like
"bwlimit=3000".
It might well be possible to implement this for restic too.
The crucial thing needed is for git-annex to
be able to list the backups and find the annexed files. For borg,
it does that by using borg list.