Please describe the problem.
I spent quite some time trying to figure out WTF some of my files were not getting published because of a tag being set... and managed to reproduce it in a minimal setup: if I set metadata on a file, rename the file (mv + git add, or even git mv), and then add another file with different content but the same name, the new file obtains the metadata of the original key/file (unless I commit right after the rename).
What steps will reproduce the problem?
see below
What version of git-annex are you using? On what operating system?
6.20170924+gitgd35053009-1~ndall+1
Please provide any additional information below.
$> sudo rm -rf /tmp/repo; mkdir /tmp/repo; cd /tmp/repo; git init; git annex init; echo 1 >| 1; git annex add 1; git commit -m 'added 1'; git annex metadata -s tag=value 1; git mv 1 2; git annex add 2; echo 2>1; git annex add 1; for f in 1 2; do echo "file $f"; ls -l $f; git annex metadata -g tag $f; done
Initialized empty Git repository in /tmp/repo/.git/
init ok
(recording state in git...)
add 1 ok
(recording state in git...)
[master (root-commit) 750d619] added 1
1 file changed, 1 insertion(+)
create mode 120000 1
metadata 1
lastchanged=2017-09-26@17-59-22
tag=value
tag-lastchanged=2017-09-26@17-59-22
ok
(recording state in git...)
add 1 ok
(recording state in git...)
file 1
lrwxrwxrwx 1 yoh yoh 178 Sep 26 13:59 1 -> .git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
value
file 2
lrwxrwxrwx 1 yoh yoh 178 Sep 26 13:59 2 -> .git/annex/objects/2W/V5/SHA256E-s2--4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865/SHA256E-s2--4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865
value
As you can see above, 1 and 2 have different content/keys, but they both acquire the same tag=value. If I commit after renaming 1 to 2, everything is OK.
Closing, because I added a warning about the metadata copying, and that seems sufficient for users to understand why this happens. --Joey
This is caused by an intentional feature. When git annex add is run on a modified file, the old metadata for the file (as committed to HEAD) is copied over. This prevents metadata from being lost when a file is modified. The same would happen if the file were deleted and then new content added under the same filename. It seems like different things should be done in these cases, but several different sequences of operations can end up with the same thing staged, so the cases can't be distinguished.
Committing the rename or deletion before adding a file back with the same name lets git-annex distinguish that from a modification of the file, which avoids the problem.
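To make the rule concrete, here is a toy model in bash (4+, for associative arrays). It is an illustration under simplifying assumptions, not git-annex's implementation: HEAD, the index and the per-key metadata are plain arrays, K1/K2 stand in for real SHA256E keys, and annex_add, git_mv, commit and scenario are hypothetical names. It replays the report's sequence with and without committing the rename first.

```bash
#!/usr/bin/env bash
# Toy model only: NOT git-annex's code. All names are hypothetical.
declare -A head=()    # path -> key, as committed in HEAD
declare -A index=()   # path -> key, as currently staged
declare -A meta=()    # key  -> metadata, e.g. "tag=value"

# Stage PATH with content KEY. If HEAD records a different key for PATH,
# treat it as a modification and copy that key's metadata to KEY.
annex_add() {
  local path=$1 key=$2 old
  index[$path]=$key
  old=${head[$path]-}
  [[ -n $old && $old != "$key" ]] || return 0
  [[ -n ${meta[$old]-} ]] || return 0
  meta[$key]=${meta[$old]}
  echo "note: copied metadata from $old to $key" >&2   # stand-in for a warning
}

# Rename a staged path without committing.
git_mv() {
  index[$2]=${index[$1]}
  unset 'index[$1]'
}

# Make HEAD match the index.
commit() {
  local p
  head=()
  for p in "${!index[@]}"; do head[$p]=${index[$p]}; done
}

# Replay the report: file 1 (key K1) has tag=value, is renamed to 2,
# and a new file 1 with different content (key K2) is added.
scenario() {
  head=([1]=K1); index=([1]=K1); meta=([K1]="tag=value")
  git_mv 1 2
  if [[ ${1-} == commit ]]; then commit; fi
  annex_add 1 K2
  echo "1: ${meta[${index[1]}]-none}  2: ${meta[${index[2]}]-none}"
}

scenario          # 1: tag=value  2: tag=value
scenario commit   # 1: none  2: tag=value
```

The second run mirrors the workaround: once the rename is committed, HEAD no longer has a path 1, so there is nothing for the new file to inherit.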
I agree this led to confusing behavior here, but I'll bet it has also prevented confusing loss of metadata for people editing files.
Other than the always-popular "make it configurable", I wonder if it would suffice to simply output a note when copying metadata from the (presumed) old version of the file? Then there would be no confusion about why the metadata got set.
Hmm, if the file was not already in the index, that could be taken to indicate it was deleted/moved and replaced rather than modified, and so the metadata would not be copied.
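The index check floated here could be sketched as follows. should_copy_metadata is a hypothetical helper for the proposal only, not something git-annex does; the one real command it relies on is git ls-files --error-unmatch, which exits non-zero when a path is not in the index. The demo uses plain git in a throwaway repository, so git-annex is not needed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (i.e. copy metadata) only if PATH is still in the index.
should_copy_metadata() {
  git ls-files --error-unmatch -- "$1" >/dev/null 2>&1
}

# Demo in a throwaway repository; plain git is enough here.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
git init -q
echo 1 > 1
git add 1
git mv 1 2     # path 1 leaves the index, path 2 enters it
echo 2 > 1     # new, still untracked file reusing the name 1

for f in 1 2; do
  if should_copy_metadata "$f"; then echo "$f: copy"; else echo "$f: skip"; fi
done
# prints "1: skip" then "2: copy"
```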
But that would make these two sequences have different behavior:
As well as these two sequences:
Thanks for the analysis and explanation. From my POV, I had somewhat gotten used to the notion that "git annex cares about content/keys, and files are mere pointers". That is easy to explain, in particular why two files pointing to the same key have the same metadata. So I assumed that no metadata would be copied over to the new content/key. But I do now see how retaining the metadata could be useful, e.g. if I unlock/modify/add a file whose metadata should be carried over (e.g. for music, all the author/album/etc. fields)... But if the file was not unlocked, then maybe such "copying" shouldn't happen? Not sure what to suggest otherwise :-/.
In my specific case, for newly assigned metadata I could simply work around this by first renaming and then assigning metadata to the renamed file. But in the other case (metadata already assigned to the key), I would need to explicitly reset it, and it seems I can't do that without committing, since that "copy metadata for the filename" step happens upon committing... so I'm not sure how to work around it without explicitly removing the metadata from those files after committing... heh
Files are not unlocked before modifying in direct mode, and may be unlocked all the time in v6 mode. Also, in indirect mode it's of course fine to overwrite the symlink with a new version of a file. So detecting if it's been unlocked doesn't seem to help with this.
It may be that there are different sorts of metadata, some of which should be inherited by new versions of a file, and others not. If there was a way to tell git-annex which metadata was which, it could do the right thing. But that feels like stacking complications, particularly since some tags might need to be inherited and others not, and tags are values.
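One way to express "which metadata is which" would be a list of inheritance rules matched against field=value pairs, as in this sketch. The rule list, the field=value input format and filter_inherited are all hypothetical; nothing like this exists in git-annex per this thread. The tag=favourite rule shows why tags complicate things: for a multi-valued field like tag, the decision has to be made per value, not per field.

```bash
#!/usr/bin/env bash
# Hypothetical rules: which field=value pairs a new version of a file inherits.
# author and album are inherited whatever their value; of the tags, only
# "favourite" is, so e.g. a publishing tag does not follow new content.
inherit_rules=('author=*' 'album=*' 'tag=favourite')

# Read field=value lines on stdin; print only those matching some rule.
filter_inherited() {
  local line rule
  while IFS= read -r line; do
    for rule in "${inherit_rules[@]}"; do
      # $rule is unquoted on purpose so that * acts as a glob.
      if [[ $line == $rule ]]; then
        printf '%s\n' "$line"
        break
      fi
    done
  done
}

printf '%s\n' 'author=Someone' 'tag=publish' 'tag=favourite' 'album=Greatest Hits' |
  filter_inherited
# prints author=Someone, tag=favourite and album=Greatest Hits, one per line
```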
In the meantime, I've added the warning when it copies metadata. I also added git annex metadata --remove-all, which the warning suggests running if you don't want the copied metadata.
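Applied to the reproduction from the report, the cleanup might look like the following. clear_copied_metadata is only a hypothetical wrapper; the real command is git annex metadata --remove-all. Since metadata is attached to keys rather than filenames, this clears it for every file that points at the same key.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around "git annex metadata --remove-all".
# Metadata lives on keys, so any other file with the same key loses it too.
clear_copied_metadata() {
  git annex metadata --remove-all "$@"
}

# Usage inside the repository from the report (requires git-annex):
#   clear_copied_metadata 1
#   git annex metadata -g tag 1   # expected to print nothing now
```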