Please describe the problem.
original issue: https://github.com/datalad/datalad-fuse/issues/118
I obtained an unexpected result from
(git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro/ds000001[master]git
$> git annex whereis sub-02/anat/sub-02_inplaneT2.nii.gz
whereis sub-02/anat/sub-02_inplaneT2.nii.gz (2 copies)
8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 -- s3-PUBLIC
b5dd2e3d-825f-4bc2-b719-cba1059f6bfc -- root@93184394ac19:/datalad/ds000001
ok
giving no URLs, whereas I expected the remote to be autoenabled
$> git annex info --autoenable
8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 -- [s3-PUBLIC]
but the overall git annex info output does not state that it is enabled
$> git annex info | grep s3-PU
8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 -- s3-PUBLIC
2.42 GB: 8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 -- s3-PUBLIC
and looking at .git/config showed an oddity -- it has a different UUID!
$> grep -A2 s3-PUBLIC .git/config
[remote "s3-PUBLIC"]
annex-s3 = true
annex-uuid = deaa691f-c824-4416-9bf8-a94a47dd31b5
and then looking at remote.log we see the mess:
$> git show git-annex:remote.log | grep s3-PUBLIC
8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 autoenable=true bucket=openneuro.org datacenter=US encryption=none exporttree=yes fileprefix=ds000001/ host=s3.amazonaws.com name=s3-PUBLIC partsize=1GiB port=80 public=yes publicurl=https://s3.amazonaws.com/openneuro.org storageclass=STANDARD type=S3 versioning=yes timestamp=1598041450.944011857s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true bucket=openneuro.org datacenter=US encryption=none exporttree=yes fileprefix=ds000001/ host=s3.amazonaws.com name=s3-PUBLIC partsize=1GiB port=80 public=yes publicurl=http://openneuro.org.s3.amazonaws.com/ storageclass=STANDARD type=S3 versioning=yes timestamp=1541446534.498728751s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true bucket=openneuro.org datacenter=US encryption=none exporttree=yes fileprefix=ds000001/ host=s3.amazonaws.com name=s3-PUBLIC partsize=1GiB port=80 public=yes publicurl=https://s3.amazonaws.com/openneuro.org storageclass=STANDARD type=S3 versioning=yes timestamp=1597693935.116974698s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true bucket=openneuro.org datacenter=US encryption=none exporttree=yes fileprefix=ds000001/ host=s3.amazonaws.com name=s3-PUBLIC-unversioned partsize=1GiB port=80 public=yes publicurl=https://s3.amazonaws.com/openneuro.org storageclass=STANDARD type=S3 versioning=yes timestamp=1598041440.701349559s
git annex fsck happily completes, and nothing complains about anything.
What steps will reproduce the problem?
dunno... the history
The repository is at http://github.com/OpenNeuroDatasets/ds000001 and it was in 30cf8f0cf99c9c98ab83ebca8d5c9708b563b2d4 (Fri Aug 21 20:24:10 2020 +0000) that the initial remote 8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 was added. Actually, by then deaa691f-c824-4416-9bf8-a94a47dd31b5 already existed, but it was named differently -- s3-PUBLIC-unversioned. I just now spotted that it is listed there as well -- so multiple names for the same UUID, and multiple UUIDs for the same name... Maybe the OpenNeuro folks could shed more light on this situation.
I would expect some warnings or even some fix by fsck in such a case.
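A check of the kind wished for here could look roughly like the Python sketch below. It is not something git-annex provides; the function names are made up for illustration, and it assumes each remote.log line is a uuid followed by space-separated key=value fields whose values contain no spaces. The sample lines are shortened versions of the entries quoted above (same uuids, names, and timestamps, other fields dropped). Note that it looks at every log line, including older ones that a later entry for the same uuid supersedes.

```python
from collections import defaultdict

# Shortened copies of the remote.log lines quoted in this report.
SAMPLE = """\
8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 autoenable=true name=s3-PUBLIC type=S3 timestamp=1598041450.944011857s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true name=s3-PUBLIC type=S3 timestamp=1541446534.498728751s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true name=s3-PUBLIC type=S3 timestamp=1597693935.116974698s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true name=s3-PUBLIC-unversioned type=S3 timestamp=1598041440.701349559s
"""


def parse_remote_log(text):
    """Yield (uuid, config dict) for every non-empty line of remote.log text."""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        uuid, pairs = fields[0], fields[1:]
        # Assumption: values never contain spaces, so splitting on whitespace is safe.
        yield uuid, dict(p.split("=", 1) for p in pairs if "=" in p)


def names_with_multiple_uuids(text):
    """Map each remote name to its uuids, keeping only names used by more than one uuid."""
    uuids_by_name = defaultdict(set)
    for uuid, config in parse_remote_log(text):
        if "name" in config:
            uuids_by_name[config["name"]].add(uuid)
    return {name: sorted(uuids) for name, uuids in uuids_by_name.items() if len(uuids) > 1}


if __name__ == "__main__":
    for name, uuids in names_with_multiple_uuids(SAMPLE).items():
        print(f"warning: name {name!r} is used by {len(uuids)} uuids: {', '.join(uuids)}")
```

To run it against a real repository, the text could be taken from the output of git show git-annex:remote.log, as used earlier in this report.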
To mitigate, I ran git remote remove s3-PUBLIC and then git annex enableremote s3-PUBLIC, after which whereis started to work.
What version of git-annex are you using? On what operating system?
ATM 10.20240430+git26-g5f61667f27-1~ndall+1 but I guess it is unrelated.
I think I have explained what happened here, and the behavior change is enough to prevent the confusing behavior. done --Joey
dang -- doing that dance of remove and re-enableremote actually addressed it (I did feel that we had a similar case before) -- it removed the duplicate entries in remote.log. And looking at https://github.com/OpenNeuroDatasets/ds000001/blob/git-annex/remote.log -- there are no duplicates.
So it feels like it is just a matter of git-annex being able to somehow trigger a fixup for the user, without the user needing to do the whole investigation and the dance of remove/re-enableremote? Or at least to warn somehow whenever such a situation is detected.
Multiple names for the same uuid is easy to explain, if they ran git-annex renameremote. Anyway, git-annex will use whichever of those configs for that uuid has the latest timestamp. So not really a problem. And when the remote.log gets compacted (as happened when you did "that dance"), the old log entries get removed.
Multiple uuids for the same name is also pretty easy to explain: initremote can be run twice with the same name in different clones, and so you then have two remotes upon merging. git-annex enableremote does deal with this situation, failing with "Multiple remotes have that name. Either use git-annex renameremote to rename them, or specify the uuid of the remote."
Here you didn't use enableremote though, but it autoenabled. Since both remotes have autoenable set, I think what happened was whichever got autoenabled second overwrote the git config of the one that got autoenabled first. Here's how that looks:
Maybe autoenable could somehow handle that case better, but all I can think of is a warning.
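The "latest timestamp wins" rule for several remote.log entries with the same uuid can be illustrated with a small Python sketch. This is a simplified model, not git-annex's actual code: the function name is hypothetical, the parsing assumes space-separated key=value fields with a trailing "s" on the timestamp, and the sample lines are shortened versions of the ones quoted in the report.

```python
from decimal import Decimal

# Shortened copies of the remote.log lines from the report.
SAMPLE = """\
8d2b6e96-ad81-44a5-99b4-0ec37d6b3800 autoenable=true name=s3-PUBLIC timestamp=1598041450.944011857s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true name=s3-PUBLIC timestamp=1541446534.498728751s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true name=s3-PUBLIC timestamp=1597693935.116974698s
deaa691f-c824-4416-9bf8-a94a47dd31b5 autoenable=true name=s3-PUBLIC-unversioned timestamp=1598041440.701349559s
"""


def latest_config_per_uuid(text):
    """Return {uuid: config}, keeping for each uuid the entry with the newest timestamp."""
    latest = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        uuid = fields[0]
        config = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        # Decimal keeps the full nanosecond precision of the logged timestamp.
        stamp = Decimal(config.get("timestamp", "0s").rstrip("s"))
        if uuid not in latest or stamp > latest[uuid][0]:
            latest[uuid] = (stamp, config)
    return {uuid: config for uuid, (stamp, config) in latest.items()}


if __name__ == "__main__":
    for uuid, config in latest_config_per_uuid(SAMPLE).items():
        print(uuid, "->", config["name"])
```

With these sample lines, the newest entry for deaa691f-c824-4416-9bf8-a94a47dd31b5 is the one named s3-PUBLIC-unversioned, and the older s3-PUBLIC entries for that uuid are the kind that compaction would drop.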
Beyond a warning, it would be possible to autoenable both, but use a new name for the second one. Although that could lead to its own problems.
It occurs to me that it's also possible for autoenable of a special remote to overwrite/change the git config of a regular git remote that has the same name. This would be unlikely except in the case of one named "origin", but it could happen, just needs the git remote to have been added before git-annex inits. That seems like a problem that ought to be avoided too.
I've added a warning in these cases, and it will avoid autoenabling a special remote when there is already a remote with the same name.
In the case of two special remotes with the same name that are both set to autoenable, it's essentially random which gets enabled first and so "wins".
Decided against autoenabling it with a different name, because: a) There's the potential that the name it comes up with is actually the name of another special remote that is also due to be autoenabled. b) It seems like potentially confusing behavior for there to be a remote with a different name than that usually used for a particular special remote.
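As a rough model of the behavior described above (warn, and skip autoenabling when a remote with that name already exists), here is a Python sketch. It is only an illustration: git-annex is not written in Python, and the function name, the data shapes, and the warning wording are all made up for this example. The uuids in the usage section are placeholders.

```python
def plan_autoenable(existing_remote_names, candidates):
    """Decide which special remotes to autoenable.

    existing_remote_names: names already present in the git config
    candidates: (uuid, name) pairs for remotes with autoenable=true,
                in whatever order they happen to be processed
    Returns (enabled, warnings).
    """
    taken = set(existing_remote_names)
    enabled, warnings = [], []
    for uuid, name in candidates:
        if name in taken:
            # Never overwrite the config of a remote that already has this name.
            warnings.append(
                f"not autoenabling {uuid}: a remote named {name!r} already exists"
            )
            continue
        taken.add(name)
        enabled.append((uuid, name))
    return enabled, warnings


if __name__ == "__main__":
    enabled, warnings = plan_autoenable(
        ["origin"],
        [
            ("uuid-a", "s3-PUBLIC"),
            ("uuid-b", "s3-PUBLIC"),  # same name: whichever is processed first wins
            ("uuid-c", "origin"),     # would clash with a regular git remote
        ],
    )
    print(enabled)
    for message in warnings:
        print("warning:", message)
```

The order of candidates decides which of two same-named special remotes gets enabled, which mirrors the point that the winner is essentially random.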