Recent comments posted to this site:

comment 3

Looks to me like arch is no longer stuck on the old ghc 9.4.8, but has a slightly newer 9.6.6, which is the same as Debian stable.

So, I am probably going to make git-annex only support back to that version, to simplify things.

Please let me know if I have misunderstood the situation in arch land..

Comment by joey
comment 6

A useful thing to display might be the path to the corrupted database file and advice to remove it?

Good idea to display the path. I've made that change.

I don't think I want to make git-annex suggest deleting sqlite databases anytime sqlite crashes for any reason. While they are safe to delete, that encourages users to shrug and move on and tends to normalize any problem with sqlite. In reality, problems with sqlite are very rare, and I'd like to hear about them and understand them.

Comment by joey
comment 5

Your previous problem with the sqlite database cannot have caused fsck to detect a checksum problem with your annexed file.

It looks like you have somehow modified annex object files, eg files in .git/annex/objects. git-annex sets permissions that usually prevent such a thing from happening.

There is no way to make git-annex accept a version of a file with a different checksum than the one recorded in git. Instead you need to git-annex add the new version of the files to the repository in place of the old version.

Here is a bash script that will pull the files out of .git/annex/bad/ and update the annexed files:

# find emits the key, then the file, on alternating lines; --include='*' also lists files whose content is missing
IFS=$'\n'
for x in $(git-annex find --include='*' --format='${key}\n${file}\n'); do
    if [ "$l" ]; then
        f="$x"; l=
        if [ -e ".git/annex/bad/$k" ]; then
            mv ".git/annex/bad/$k" "$f"
            git-annex add "$f"
        fi
    else
        k="$x"; l=1
    fi
done
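
To illustrate what the loop is reading, using the Hot_Hot_Hot.mp3 example from your report above, that --format would make git-annex find print something like

SHA256E-s3444736--3178689ce4a69a0e94fe11afaf077b6471077fd2d5128a5a65a71dcf84272ed5.mp3
music/Arrow/misc/Hot_Hot_Hot.mp3

so the loop alternates between reading a key line and a file line, and moves the matching file out of .git/annex/bad/ when one exists there.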
Comment by joey
comment 3

I think it makes sense for git-annex fix to deal with this situation. In both cases the user has run a git command that affects files in the working tree, and it has left the annexed content inaccessible.
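
For reference, the current form of the command just takes an optional path and repairs symlinks of annexed files under it that no longer point at the content, eg (path made up):

git-annex fix music/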

Comment by joey
comment 10

That makes a lot of sense. So if I've understood things right, the correct place to work on this is rclone. I think I'll ask them what they think of this kind of use case.

Thanks for the explanation

Comment by nadir
Fixing a bit of a mess

While the database file was corrupt, I did some work (not realising it was corrupt) to fix up MP3 tags in my music collection. Now when I run git annex fsck I'm getting errors like:

fsck music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3 
  music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3: Bad file size (128 B larger); moved to .git/annex/bad/SHA256E-s17800671--1a992cda34a5ab52d42cd7a420114fc122458ff57672e468f8403faa77f209b0.mp3

  ** No known copies exist of music/Arlo_Guthrie/The_Best_Of_Arlo_Guthrie/01-Alices_Restaurant_Massacree.mp3
failed

and

fsck music/Arrow/misc/Hot_Hot_Hot.mp3 (checksum...) 
  music/Arrow/misc/Hot_Hot_Hot.mp3: Bad file content; moved to .git/annex/bad/SHA256E-s3444736--3178689ce4a69a0e94fe11afaf077b6471077fd2d5128a5a65a71dcf84272ed5.mp3

  ** No known copies exist of music/Arrow/misc/Hot_Hot_Hot.mp3
failed

I've tried using git annex reinject, but that is refused as the checksum doesn't match.

Can I tell git-annex to just accept the files that I have in my repository as being correct?

Comment by puck
More details in error message?

Hey,

I just came back to this after trying to do something in my repository. Good to hear I can just delete the SQLite file; I've done that now, and it is busy running fsck.

A useful thing to display might be the path to the corrupted database file and advice to remove it?

Comment by puck
comment 1

git-annex p2phttp does update the git-annex branch itself when receiving files. And generally, any time git-annex stores an object in a repository, it updates the git-annex branch accordingly.

So, you can fetch from the remote and learn about those objects, and then git-annex unused --from=$remote will show you unused objects in the remote.
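
As a sketch of that workflow (the remote name "origin" and the number range are made up):

git fetch origin                           # learn about the remote's git-annex branch updates
git-annex unused --from=origin             # list objects stored on origin that nothing uses
git-annex dropunused --from=origin 1-10    # optionally drop them, by the numbers unused reported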

Running git-annex unused on the local repository does list all objects in the local repository. So if an object somehow gets into the repository without a branch update, it will still show up as unused.

There is no way to list all objects present in a remote. Special remotes are not required to support enumeration at all. So, if an object got sent to a special remote, and the git-annex branch record of that was lost, there would be no way to find that unused object.

Comment by joey
Re: passing additional flags to rclone

Passing arbitrary parameters to rclone is not supported. It would possibly be a security hole if it were supported: if there were a parameter, say --deleteeverything, you could initremote a special remote with that parameter, and then wait for someone else to enableremote and use that special remote and have a bad day.

The "*" in initremote --whatelse output is a placeholder. It is not intended to mean that every possible thing is passed through, but that, if rclone supports some additional parameters, and explicitly asks for them (via GETCONFIG), they will be passed through to it.

I think that currently, rclone gitannex does not request any parameters. It would certainly be possible to make it support something like "bwlimit=3000".
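
For reference, that request happens over the external special remote protocol, roughly like this (the bwlimit setting is hypothetical, something rclone gitannex would have to be taught to ask for):

GETCONFIG bwlimit
VALUE 3000

The first line is sent by the remote, typically while handling PREPARE, and git-annex's VALUE reply carries whatever was given to initremote or enableremote.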

Comment by joey
comment 2

It might well be possible to implement this for restic too. The crucial thing needed is for git-annex to be able to list the backups and find the annexed files. For borg, it does that by using borg list.
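
For comparison, the enumeration borg offers looks roughly like this (repository path and archive name made up):

borg list /path/to/borgrepo                  # list the archives in the repository
borg list /path/to/borgrepo::archive-2024    # list the files inside one archive

restic's equivalents would presumably be restic snapshots and restic ls.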

Comment by joey