Please describe the problem.
git commit fails on an attempt to commit a file which went from "largefile" to small, according to .gitattributes settings, if we git annex add the file before committing the change.
What version of git-annex are you using? On what operating system?
6.20180720+gitg03978571f-1~ndall+1
Please provide any additional information below.
Here is a full script to reproduce it:
#!/bin/bash
set -ex
builtin cd /tmp;
if [ -e /tmp/repo ]; then
chmod -R +w /tmp/repo;
rm -rf /tmp/repo;
fi
mkdir /tmp/repo;
cd /tmp/repo;
git init;
git annex init;
echo '* annex.largefiles=(largerthan=5b)' >.gitattributes;
git add .gitattributes;
git commit -m 'added .gitattri';
echo 123456 > file;
git annex add file;
git commit -m add1;
ls -l;
git annex unlock file;
echo 123 >| file
git annex add file
# this would work but commit to git-annex, not git despite .gitattributes settings
# git commit -m edit -a
# This one would fail to commit at all, complaining about "partial commit"
git commit -m edit file
ls -l file;
git status
which leads to
...
+ git annex add file
add file (non-large file; adding content to git repository) ok
add file (non-large file; adding content to git repository) ok
(recording state in git...)
+ git commit -m edit file
git-annex: Cannot make a partial commit with unlocked annexed files. You should `git annex add` the files you want to commit, and then run git commit.
additional observations:
- works fine if it remains a large file (e.g. we just append to it)
- does not fail if we do git commit -a, not git commit file, but it commits it to annex not to git, despite the previous git annex add message rightfully saying "non-large file; adding content to git repository"
Expected behavior:
- have consistent behavior between commit -a and commit file
- commit without a failure, committing to git (since .gitattributes instructs so and even git annex add reports that)
I take this as not being a bug about the partial commit blocking (as explained in adc5ca70a8095a389273e7c286cb32de6873a5a3), which is working around a git behavior and so can't be fixed other than by going to v6.
Instead, I think this is a bug about git annex add of an unlocked file not converting it to an in-git file when annex.largefiles says it ought to. If it did that it would not run into the partial commit blocking at all. And, the observation about git commit -a committing to the annex not to git points at the same problem.
The double output from git-annex add file is also some kind of minor bug.
The double output seems to have the same root cause too: The file is left in typechanged form by the first pass of the add, and so the second pass sees it again. When annex.largefiles lets the file be annexed, the doubled output does not occur.
So the root problem is that when we have a typechanged file and want to convert that to be not typechanged, we have to git commit it. As long as the previous commit is a symlink and the file in the index is not, the file will be typechanged by definition.
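The typechange described above can be seen with plain git, without git-annex installed. This is only a sketch: a hand-made symlink stands in for an annexed file, and the link target name is made up for illustration.

```shell
#!/bin/bash
# Sketch using plain git only; the symlink is a stand-in for an annexed file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commit "file" as a symlink (hypothetical target; a dangling link is fine for git).
ln -s hypothetical-annex-object file
git add file
git commit -q -m 'add symlink'

# Replace the symlink with a small regular file and stage it.
rm file
echo 123 > file
git add file

# Compared to HEAD, the staged path is still reported as a type change ("T").
staged_status=$(git diff --cached --name-status)
echo "$staged_status"

# Only a commit makes the typechange go away.
git commit -q -m 'now a regular file'
after_commit=$(git status --porcelain)
echo "status after commit: '$after_commit'"
```

Staging alone does not clear the "T" status, which matches the point that the file stays typechanged until it is committed.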
When git-annex add runs git add file, it's doing the only thing it can do, but it leaves the file typechanged, and so git-annex later has no way to tell that this file is not supposed to be treated as an unlocked file. I don't think we want git annex add to commit the file. That would be very surprising behavior!
What git-annex could do is have the pre-commit hook notice that the file doesn't match annex.largefiles and not re-annex it, allowing the typechange to get committed to git. Then the user would only need to unlock the file, modify it to make it non-large, and commit it to get it checked into git.
In a way, this is too easy, because if the user sees that working, they may expect to be able to turn a small file back into an annexed file by making the content large and running git commit on it w/o git-annex add. Which would be bad because that would commit a large file to git. I suppose the pre-commit could handle that too, but imagine that replacing eg a configure script that's expected to be shipped in the git repository with an annex symlink, which would be surprising.
So it may be better to keep the conversion from annexed to in-git file and back explicit. This could be done by git annex add detecting this situation and erroring out with a message that suggests running git commit -n if the user wants to change the annexed file to an in-git file. That bypasses the pre-commit hook, so the typechange gets committed to git as they desire.
Which is better, the implicit conversion or the explicit? I am not sure, but lean toward the explicit since it doesn't have this potential to confuse users. Also, the implicit conversion would only work when annex.largefiles is being used, but the explicit conversion can be done regardless.
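To illustrate the git commit -n part on its own, here is a small sketch with plain git. The pre-commit hook below is a hypothetical stand-in that rejects every commit; it is not git-annex's actual hook, and it only shows that -n (--no-verify) skips whatever the hook would do.

```shell
#!/bin/bash
# Sketch: a stand-in pre-commit hook, not the hook git-annex installs.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical hook that refuses every commit.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
echo "pre-commit: refusing commit" >&2
exit 1
EOF
chmod +x .git/hooks/pre-commit

echo 123 > file
git add file

# A normal commit runs the hook and is blocked.
if git commit -q -m 'with hook' 2>/dev/null; then
    with_hook=committed
else
    with_hook=blocked
fi

# With -n the hook is skipped and the commit goes through.
git commit -q -n -m 'hook bypassed'
commit_count=$(git rev-list --count HEAD)
echo "with hook: $with_hook, commits in history: $commit_count"
```

Since -n skips the hook entirely, any other checks the hook performs are skipped for that commit too, which is one more reason to keep its use explicit.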
The explicit paths would be:
Seems worth documenting somewhere. Or making a command that handles these conversions, but the largen and smallen steps being manual, and the possibility to combine multiple of these into a single commit argues against a conversion command.
I will place implementation and possible tech difficulties aside for now. I am afraid that here and there we (well, me?) indeed wanted to see two conflicting behaviors somehow happen. On one hand (in there) we would like to keep the file initially committed to git under git, regardless what .gitattributes instructs. On the other, here I expected file to automagically jump between git and annex depending on .gitattributes. So, rather than explicit "to git" or "to annex" you outlined, to me the question sounds more like "retain the same storage (git or annex) as before" or "possibly perform conversion according to .gitattributes". And I see usecases where for some files (directories, e.g. .datalad/metadata) we would like to see one strategy (auto-conversion) and for the others (default?) the other (maintain git/annex). Given that in v6 there would only be git add (so no explicit git vs git annex add), and that -n for git add is a flag I was not even aware about, may be it is better to think about being able to explicitly set some additional gitattribute to allow (or disallow?) the conversion for given files, and then have consistent user-level git annex add (and in v6 git add) which would perform necessary actions across provided files according to largefiles and that additional attribute value to decide on the destiny of the file?
Documented both conversions in https://git-annex.branchable.com/tips/largefiles
In v6, git add rather than git annex add will also add it to the annex, given annex.largefiles setting. Of course, git annex add can be used in v6 mode too. And -n is a flag to git commit, not git add.
Let's please not entangle this bug with that other bug. Unless your goal is that I merge them. Bear in mind that I consider the other bug a snake pit, and probably should have closed it as utterly useless some time ago.
(Maybe that was uncharitable, but the other bug seems pretty well blocked on a complete reimplementation of v6 mode leading to a v6 mode that is not experimental, and entangling this bug into that does not seem wise.)
Sure! I just (probably erroneously) felt that they stem from the same point of absent clear "semantic" on whether conversion should happen or not. I am yet to fully digest what you are suggesting, and whether and how we should address this at datalad level, but meanwhile FWIW:
- adding -n to the commit (and not to add) is as uncommon to me in my daily use of git/git-annex, and I hope that I would never have to use it while performing the regular "annex unlock file(s); annex add file(s); commit file(s)" sequence in order to maintain a file(s) under annex.
- whether a file is smallen according to the git-annex/largefiles setting is unknown to the user (or some higher level tool using git-annex, such as datalad) without explicitly checking (not even sure yet how) or git annex add-ing it/them and seeing whether it would now be added to git whenever it was added to annex before. So hopefully we do not need to do that either.
That other bug is closed now, and led to 10138056dc240ad4265a4c473a32d4c715bbb629.
So in v6, when annex.largefiles is not configured for a file, git add of a modification to the file will store it however the file was stored before. But when annex.largefiles is configured, it overrides that default.
That didn't change the behavior of git-annex add, but we could consider doing so. Although I don't know if it's technically possible to support that behavior in v5 mode, due to the difficulties discussed in comment #3 above.
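Whether annex.largefiles is configured for a given path can be checked with plain git, since it is an ordinary gitattribute. A minimal sketch, with a made-up *.dat pattern and made-up file names; note that this only shows the configured expression, it does not evaluate whether a particular file matches it:

```shell
#!/bin/bash
# Sketch: the *.dat pattern and both file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Configure annex.largefiles only for *.dat files.
echo '*.dat annex.largefiles=(largerthan=5b)' > .gitattributes

# git check-attr reports the attribute value, or "unspecified" when none applies.
configured=$(git check-attr annex.largefiles -- data.dat)
unconfigured=$(git check-attr annex.largefiles -- notes.txt)
echo "$configured"
echo "$unconfigured"
```

A path that reports "unspecified" here would fall under the "store it however the file was stored before" case described above, assuming annex.largefiles is not also set in git config.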
IIRC this involved v5 repositories and the pre-commit hook's hacks to make v5 unlocked files work. Stuff that was one of the motivations for v7.
I tried the script in a v7 repository, and the git commit of the file succeeded, no "partial commit" complaint. So I think the original bug is fixed.