Please describe the problem.

I have a special remote "source" set up as type=directory importtree=yes.

I pulled it into one repo in which none of the files were wanted. So far so good. I cloned that repo to a second, archive, repo. git annex sync worked (but took 2 hours). Then I ran git annex get --auto, and most of the files came through OK. But some failed with output like this:

get Pictures/.dtrash/info/IMG_2979_v1-e9dced7b.dtrashinfo (from source...) (checksum...)
  verification of content failed

  Unable to access these remotes: source

  No other repository is known to contain the file.
failed

(Both repos are unlocked via git annex adjust --unlock)

Upon looking at the file in the repo where it wasn't wanted, I saw this:

$ cat IMG_2979_v1-e9dced7b.dtrashinfo 
/annex/objects/SHA256E-s144--cec5c7b6a9d97344e374e8395e02b74350678147ff65d6df091f5115cf18bf72

Interesting. So, in the source directory:

$ sha256sum IMG_2979_v1-e9dced7b.dtrashinfo 
aca3ed7243def7a0bd5fcad542c66841b8e7d2a670b4cafe749eb27e032d8975  IMG_2979_v1-e9dced7b.dtrashinfo

That's not a match at all. Well, OK then:

$ sha256sum * | grep cec5c7
cec5c7b6a9d97344e374e8395e02b74350678147ff65d6df091f5115cf18bf72  IMG_2981_v1-5fc99c7a.dtrashinfo

Yikes. So for IMG_2979_v1-e9dced7b.dtrashinfo, git-annex recorded a checksum that belonged to IMG_2981_v1-5fc99c7a.dtrashinfo. Well then, what is this other file recorded as, back in the git-annex repo?

$ cat IMG_2981_v1-5fc99c7a.dtrashinfo 
/annex/objects/SHA256E-s144--cec5c7b6a9d97344e374e8395e02b74350678147ff65d6df091f5115cf18bf72

OK, so two files that were not identical in the source directory somehow got recorded with an identical checksum in git-annex. At least the mismatch was detected when git annex get --auto tried to retrieve them.
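
For anyone wanting to repeat the manual comparison above across a whole directory, here is a rough sketch. It is not part of git-annex: check_pointers is a hypothetical helper name, and it assumes that the unlocked pointer files contain a single line of the form /annex/objects/SHA256E-s<size>--<hash> as shown above, that the files have the same names in the repo directory and the source directory, and that GNU sha256sum and head are available.

```shell
#!/bin/sh
# Hypothetical helper (not a git-annex command): for each pointer file in
# REPO_DIR, compare the SHA256 recorded in the pointer with the SHA256 of
# the same-named file in SOURCE_DIR, and report any that differ.
check_pointers() {
    repo_dir=$1
    source_dir=$2
    mismatches=0
    for pointer in "$repo_dir"/*; do
        [ -f "$pointer" ] || continue
        name=$(basename "$pointer")
        # Extract the 64 hex digits from a line like
        # /annex/objects/SHA256E-s144--<hash>[.ext]; only the first
        # 256 bytes are read, so large non-pointer files are cheap to skip.
        recorded=$(head -c 256 "$pointer" |
            LC_ALL=C sed -n '1s|^/annex/objects/SHA256E-s[0-9]*--\([0-9a-f]\{64\}\).*|\1|p')
        [ -n "$recorded" ] || continue        # not a pointer file
        [ -f "$source_dir/$name" ] || continue # no counterpart in the source
        actual=$(sha256sum "$source_dir/$name" | cut -d' ' -f1)
        if [ "$recorded" != "$actual" ]; then
            echo "MISMATCH $name recorded=$recorded actual=$actual"
            mismatches=$((mismatches + 1))
        fi
    done
    echo "mismatches: $mismatches"
}
```

With the layout from the setup in this report it might be called as check_pointers /acrypt/git-annex/repos/Pictures/Pictures/.dtrash/info /acrypt/git-annex/bind-ro/Pictures/.dtrash/info (adjust the paths as needed). Files whose content is already present are skipped, since they are not pointer files.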

In this .dtrash/info directory, 436 files out of 719 were not loaded by git annex get, presumably due to this issue.

In this directory, the source files range in size from 140 to 227 bytes.

In a companion directory, .dtrash/files, 24 out of 719 files exhibited this issue. These files tended to be larger, but one that was 495MB triggered it also.

I have not yet seen it outside .dtrash, but it will be many hours until this get process completes fully, as it needs to copy about 1TB of data.

In case you are wondering if there is a race condition with .dtrash: no. The only application that writes to it isn't running, and the last time a file was modified in there was over a year ago. Also, the content of the .info files is just JSON with the corresponding filename embedded in it, so it is very clear that the files on the filesystem are correct and that the checksums recorded for them were never correct.

What steps will reproduce the problem?

I have laid that out as best I can above.

What version of git-annex are you using? On what operating system?

8.20210223 on Debian

Please provide any additional information below.

Assistant is not being used.

Setup:

REPO=Pictures
cd /acrypt/git-annex/repos
mkdir $REPO
cd $REPO
git init
git config annex.thin true
git annex init 'local hub'
git annex wanted . "include=* and exclude=$REPO/*"

# Now initialize things.
touch mtree
git annex add mtree
git annex sync
git annex adjust --unlock

git annex initremote source type=directory directory=/acrypt/git-annex/bind-ro/$REPO importtree=yes encryption=none

git annex enableremote source directory=/acrypt/git-annex/bind-ro/$REPO
git config remote.source.annex-readonly true
git config remote.source.annex-tracking-branch main:$REPO
git config annex.securehashesonly true
git config annex.genmetadata true
git config annex.diskreserve 100M

Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

done since this has pretty strongly been confirmed to be a bug that was already fixed in a newer git-annex version. --Joey