Recent comments posted to this site:

May I suggest a "what references this key" command? With that, I'd have been able to figure out for myself what I'd done wrong.
Comment by erics Mon May 10 22:06:02 2021

"Is it in use in git's index?"

That was it; after git reset, the file is now again reported as unused.

Thanks, and sorry for the noise.

Comment by erics Mon May 10 21:26:25 2021

I rm'ed the symlink, expecting git annex unused to again report K as unused. But it doesn't; it still reports no unused files.

That's odd. Did you also remove it from the index (git rm)?

git log --stat --no-textconv -S'K'

Perhaps try with --all to do a wider search, though given what you said above (that it was reported as unused before using addunused) my bet would be on the index.
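To check the index directly, something like the following might do. It is only a sketch using plain git: the throwaway repo, the path "link", the key name and the helper index_refs are all made up for illustration, and the symlink stands in for what addunused would have staged.

```shell
# Sketch: list index entries whose staged blob mentions a given key.
# Plain git only; the repo, the path "link" and the key are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
key='SHA256E-s4--EXAMPLEKEY'   # made-up key name

# Stand-in for a staged annex symlink (the target does not need to exist).
ln -s ".git/annex/objects/xx/yy/$key/$key" link
git add link
rm link    # working tree copy is gone, the index entry remains

# Print every path whose staged blob contains the key given as $1.
index_refs() {
    git ls-files -s | while read -r mode hash stage path; do
        if git cat-file -p "$hash" | grep -qF -- "$1"; then
            printf '%s\n' "$path"
        fi
    done
}

found=$(index_refs "$key")
echo "index entries referencing $key: $found"
```

This reads every staged blob, so it would be slow in a repo with a lot of content stored in git, and paths with unusual characters may come out quoted by git ls-files.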

Comment by kyle Mon May 10 19:05:57 2021

Is it in use in git's index? git-annex addunused adds a link to the working tree, and also stages it in the index. You said "I rm'ed the symlink", but that would leave it in the index, so still used.
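The difference between removing the working tree file and removing the index entry can be seen with plain git. A minimal sketch, assuming a throwaway repo and a made-up file name; no git-annex is involved, so it only illustrates the index behaviour:

```shell
# Sketch: a plain rm leaves the index entry in place.
# The repo and the file name "link" are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo data > link
git add link

rm link
after_rm=$(git ls-files)          # the entry is still staged

git rm -q --cached link           # remove it from the index as well
after_unstage=$(git ls-files)     # nothing is staged any more

echo "after rm: '$after_rm'"
echo "after git rm --cached: '$after_unstage'"
```

In a repo that already has commits, git reset on the path has a similar effect on the index.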

(It's also theoretically possible for git-annex unused to incorrectly decide something is used that is not -- it uses a bloom filter which can have false positives -- but this is highly unlikely unless your repo has a huge number of files in it. The annex.bloomcapacity and annex.bloomaccuracy configs can be used to tweak the bloom filter.)
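Both settings are ordinary git config values. A sketch, assuming the two settings meant here are annex.bloomcapacity and annex.bloomaccuracy as named in the git-annex man page; the numbers are arbitrary placeholders, not recommendations, and the repo is a throwaway one:

```shell
# Sketch: set the bloom filter configs in a throwaway repo and read them back.
# The values below are placeholders chosen only for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q .

git config annex.bloomcapacity 1000000
git config annex.bloomaccuracy 20000000

capacity=$(git config --get annex.bloomcapacity)
accuracy=$(git config --get annex.bloomaccuracy)
echo "capacity=$capacity accuracy=$accuracy"
```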

Comment by joey Mon May 10 19:04:00 2021

Yes, I seem to have been able to fix it like that. Also added a test case to make sure the largefiles conversion recipes keep working.

Of course it's always possible there are other cases I've not thought of..

Comment by joey Mon May 10 17:21:14 2021

Maybe the solution would be for git annex add, whenever it decides to add a file to git (due to --force-small or largefiles config), to drop the inode out of the keys database? I think that would make all of the cases described so far work.

Comment by joey Mon May 10 16:43:00 2021

Well Lukey was right, fixing this causes other breakage. Here's the bug report about what my change broke: case where using pathspec with git-commit leaves s

As well as the case in that bug, largefiles has a recipe to convert an annexed file to be stored in git, which the change broke. The recipe has git annex add --force-small be run on a file, which in turn runs git add on the file, which runs the smudge filter. So if the smudge filter then sees an annexed inode and keeps it annexed, it is going against what the user is trying to do there.

So the change has been reverted.

I guess that both problems could be avoided by having git-annex add not run git add, but stage the file in the index itself. (IIRC there were some reasons to use git add there, to do with .gitignore.)

But I'm doubtful now that all problems could be avoided. For one, consider what happens when the user follows the recipe to convert an annexed file to be stored in git, running git annex add --force-small file, which does store it in git. But then, if the smudge clean filter runs on the file later for any reason, it would still see a known annexed inode, and convert it back to being stored in the annex.

Comment by joey Mon May 10 16:20:37 2021

On the second git annex add foo, I see:

add foo (non-large file; adding content to git repository) ok

Which seems right as far as the largefiles config goes.

That in turn runs git add, which runs the smudge filter. Due to my change in 424bef6b6, the smudge filter then sees an inode it already knows (because the file was unlocked), and so it avoids adding the file to git, and leaves it annexed.

I don't quite understand how the different ways of running git commit play into this, or how, after the commit, a non-annexed file ends up recorded in the index. I guess the smudge filter must end up being run again, and perhaps the 424bef6b6 change doesn't work then, but anyway the behavior before that point is already a problem.

Another instance of what I think is the same problem is following the largefiles recipe to convert an annexed file to git:

git init repo; cd repo; git annex init
echo foo > file
git annex add file
git commit -m add
git annex unlock file
git rm --cached file
git annex add --force-small file
git commit -m convert file

This results in a commit that unlocks the file but leaves it annexed, and after the commit, git diff --cached shows that the index has staged a conversion from unlocked to stored in git. So very similar, and clearly 424bef6b6 cannot be allowed to cause that breakage, which does not even involve largefiles!
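One way to see what the index holds for a path at each step is to look at the staged blob. This is a heuristic sketch with plain git: staged_kind is a made-up helper, the pointer file is written by hand with a made-up key as a stand-in for an unlocked annexed file, and it assumes the pointer's first line starts with /annex/objects/.

```shell
# Sketch: report whether the staged version of a path looks like an annex
# pointer or ordinary git content. Heuristic only; staged_kind is hypothetical.
staged_kind() {
    if git cat-file -p ":$1" | head -n 1 | grep -q '^/annex/objects/'; then
        echo "annex pointer"
    else
        echo "git content"
    fi
}

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hand-written stand-in for an unlocked file's pointer; the key is made up.
printf '/annex/objects/SHA256E-s4--EXAMPLEKEY\n' > file
git add file
before=$(staged_kind file)

# Stage the real content instead, as a conversion to git would.
echo foo > file
git add file
after=$(staged_kind file)

echo "before: $before, after: $after"
```

It only handles pointer files, not locked symlinks, and the path is taken relative to the top of the repo.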

So I've reverted the commit for now, and we can discuss next steps in the forum thread.

Comment by joey Mon May 10 16:04:07 2021

Normally the common options are not included in every command's man page because there are over 100 lines of them. However, I do think it's worth including --quiet on fsck's man page in this specific case and am doing that.

Maybe individual command man pages should mention that there are also a bunch of common options. Perhaps those should be split out of the git-annex man page, like the git-annex-matching-options man page is handled.

Comment by joey Mon May 10 15:07:06 2021

Currently the assistant simply tries to drop when it thinks it's possible that the content may be droppable. It doesn't check if the drop is allowed before trying to drop, because that would be redundant with the check that's done when actually dropping.

I don't see anything dangerous about this, just as running git annex drop on lots of files and having them be preserved by numcopies is not dangerous.

If it's a bug at all, it's only that it should be silent if it is unable to perform the drop due to numcopies. However, making it silent about that also seems like it would make it harder to figure out what is preventing things from being dropped in situations where you do expect drops to happen.

As in, ironically, your specific case! You have a transfer remote, which is having files pile up on it, despite them apparently having been transferred from it to both of the repos you want it to transfer them to. Since your local repo cannot access the other repo, it cannot verify it has the content and so leaves a copy on the transfer remote.

If you had a problem with that transfer remote filling up, and nothing was ever logged about why it was not dropping from it, it would be hard to understand what was going on and how to fix it (eg by trusting the other repo, or adding it as a remote, or lowering numcopies to 1).

So the logging seems like a good thing to me.

Comment by joey Mon May 10 14:53:34 2021