Recent comments posted to this site:

I can't find any way to delete a special remote. I created one that was explicitly temporary (to transfer between some other systems), and now I can't get rid of it. I've already run "git remote rm" on it, but it seems git-annex didn't pick this up.

Comment by tomdhunt Sun Mar 7 01:53:21 2021

Are you working on different computers? Is the system time in sync?

Comment by qiang.fang Sat Mar 6 00:03:51 2021

This bug has been fixed.

And next time you have a problem this blatant, with a test case and everything, it's certainly a bug, so don't be afraid to file a bug report rather than using the forum. It's harder to close forum posts than bug reports!

Comment by joey Fri Mar 5 18:35:54 2021

In fact, a very simple patch that just makes a GitKey generate a "GIT" key seems to have solved this problem! Files that were non-annexed on export remain so on import, until they're changed, and then annex.largefiles controls what happens.

Once non-annexed files have been exported using the new version, they'll stay non-annexed on import. Even when an old version of git-annex is doing the importing!

When an old annex had exported, and a new one imports, what happens is the file gets imported as an annexed file. Exporting first with the new version avoids that unwanted conversion.

Interestingly though, the annexed file when that conversion happens does not use the SHA1 key from git, so its content can be retrieved. I'm not quite sure how that problem was avoided in this case but something avoided the worst behavior.

It would be possible to special case the handling of SHA1 keys without a size to make importing from an old export not do the conversion. But that risks breakage for some user who is generating their own SHA1 keys and not including a size in them. Or for some external special remote that supports IMPORTKEY and generates SHA1 keys without a size. It seems better to avoid that potential breakage of unrelated things, and keep the upgrade process somewhat complicated when non-annexed files were exported before, than it does to streamline the upgrade.

Comment by joey Fri Mar 5 17:44:54 2021

Wait... The import code has a separate "GIT" key type that it uses internally once it's decided a file should be non-annexed. Currently that never hits disk. Using that rather than a SHA1 key for the export database could be a solution.

(Using that rather than "SHA1" for the keys would also avoid the problem that the current GitKey hardcodes an assumption that git uses sha1.)

Comment by joey Fri Mar 5 17:31:32 2021

For each file, the importer could check whether there's a corresponding file in the branch it's generating the import for, and whether that file is annexed.

Should it check the branch it's generating the import for though? If the non-annexed file is "foo" and master is exported, and then in master that file is renamed to "bar", the import should not look at the new master to see if the "foo" from the remote should be annexed. The correct tree to consult would be the tree that was last exported to the remote.

It seems reasonable to look at the file in that exported tree to see whether it was non-annexed before, and if the ContentIdentifier is the same as what was exported before, keep it non-annexed on import. If the ContentIdentifier has changed, apply annex.largefiles to decide whether or not to annex it.

The export database stores information about that tree already, but it does not keep track of whether a file was exported annexed or not. So changing the database to include an indication of that, and using it when importing, seems like a way to solve this problem, and without slowing things down much.
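A rough sketch of that decision, in Python rather than git-annex's own Haskell. Everything named here (`ExportedFile`, `should_annex`, the `annexed` flag, the `matches_largefiles` callback) is hypothetical and only illustrates the idea of recording in the export database whether a file was exported annexed. It is not how git-annex actually stores or looks this up, and it only models the non-annexed case discussed above.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ExportedFile:
    """Hypothetical record of one file in the last exported tree."""
    content_id: str   # stand-in for the ContentIdentifier seen at export time
    annexed: bool     # the proposed extra bit: was it exported as an annexed file?


def should_annex(
    path: str,
    content_id: str,
    exported_tree: Dict[str, ExportedFile],
    matches_largefiles: Callable[[str], bool],
) -> bool:
    """Decide whether an imported file should be annexed.

    An unchanged file that was exported non-annexed stays non-annexed.
    Anything else is left to the annex.largefiles-style predicate.
    """
    previous = exported_tree.get(path)
    if (
        previous is not None
        and not previous.annexed
        and previous.content_id == content_id
    ):
        return False
    return matches_largefiles(path)


# "foo" was exported non-annexed with content identifier "cid1".
exported = {"foo": ExportedFile(content_id="cid1", annexed=False)}
always_large = lambda path: True

print(should_annex("foo", "cid1", exported, always_large))  # unchanged file
print(should_annex("foo", "cid2", exported, always_large))  # modified file
```

Note that the lookup uses the tree that was last exported, not the current branch, so a later rename of "foo" in master does not affect the result.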

Alternatively, the GitKey that git-annex uses for these files when exporting is represented as a SHA1 key with no size field. That's unusual; nothing else usually creates such a key. (Although some advanced users may, for some reason.) Just treating such keys as non-annexed files when importing would be at least a band-aid, if not a real fix.
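To make the band-aid concrete, here is a small Python sketch that spots such a key. It assumes the usual serialized key layout of a backend name, optional fields such as `-s<size>`, then `--` and the key name. The function name `is_sizeless_sha1_key` is made up for illustration, the parsing is deliberately simplistic, and the example keys are just sample hashes.

```python
def is_sizeless_sha1_key(key: str) -> bool:
    """Return True for keys like "SHA1--<hash>" that carry no size field.

    Assumed layout: BACKEND[-sSIZE][-other fields]--NAME
    """
    fields, separator, _name = key.partition("--")
    if not separator:
        return False  # does not look like a serialized key at all
    backend, *metadata = fields.split("-")
    has_size = any(f.startswith("s") and f[1:].isdigit() for f in metadata)
    return backend == "SHA1" and not has_size


# A SHA1 key without a size, as described for exported non-annexed files.
print(is_sizeless_sha1_key("SHA1--ce013625030ba8dba906f756967f9e9ca394464a"))
# An ordinary SHA1 key that includes its size.
print(is_sizeless_sha1_key("SHA1-s6--f572d396fae9206628714fb2ce00f72e94f2258f"))
```

A check like this cannot tell a GitKey apart from a size-less SHA1 key that a user or an external special remote generated on purpose.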

Comment by joey Fri Mar 5 16:42:03 2021

This leads to worse behavior than just converting to annexed from non-annexed. The converted file's contents don't verify due to some confusion between git and git-annex's use of SHA1. See https://git-annex.branchable.com/forum/__96__git_annex_import__96___from_directory_loses_contents__63__/

Comment by joey Fri Mar 5 16:38:25 2021

The conversion of the git file to an annexed file is a known problem, https://git-annex.branchable.com/todo/import_tree_annexes_files_that_were_exported_non-annexed/

The failure to get the content of the file when that happens is a bug though. (I think it may be a regression, as I seem to remember that working, but I could be mistaken.)

It seems to be caused by an underlying inability to get the file:

get file.txt (from test...) (checksum...) 
  verification of content failed

Which in turn is due to a confusion between two different SHA1s. When exporting a file stored in git, git-annex uses the SHA1 git uses for it, but that is not actually the SHA1 of the file's content; git hashes a short header that includes the file size, followed by the content. Then when the file gets converted to an annexed file, it uses a git-annex key with that same SHA1. But git-annex expects the content of a SHA1 keyed file to match that SHA1, which is not the case here.

So verification fails, and that's also why importing doesn't get the content.
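The mismatch is easy to reproduce outside git-annex. The following Python sketch (function names are illustrative only) computes both hashes for the same bytes: the plain SHA1 of the content, which is what verification of a SHA1 key checks, and the blob id git computes, which hashes a `blob <size>\0` header followed by the content.

```python
import hashlib


def content_sha1(data: bytes) -> str:
    """Plain SHA1 of the file content."""
    return hashlib.sha1(data).hexdigest()


def git_blob_sha1(data: bytes) -> str:
    """SHA1 as git computes it for a blob: header plus content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


data = b"hello\n"
print("content sha1: ", content_sha1(data))
print("git blob sha1:", git_blob_sha1(data))
print("match:", content_sha1(data) == git_blob_sha1(data))
```

The second value is the same id that `git hash-object` reports for a file with that content, and the two never agree for the same input, so a key built from the git id cannot pass a check against the content hash.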

This is certainly a bug. I guess the best way to fix it would be to fix the above todo.

Comment by joey Fri Mar 5 16:29:35 2021

The git-annex.linux directory from the bundle does include a git, so if you put that directory in PATH first, it will avoid the problem.

This should be a fairly transient problem, since the git-annex bundle is built with the version of git from Debian unstable. Once the new git version gets released, it should fairly quickly get in there and the incompatibility will stop being a problem.

Comment by joey Fri Mar 5 16:16:16 2021

Great, thank you!

Comment by kyle Tue Mar 2 20:51:52 2021