This is an upstream resubmission of Debian bug #959506.
Please describe the problem.
As of v8, git-annex divides files into “large” and “non-large” files, the
former of which are supposed to be automatically added to the annex and the
latter to vanilla git when running git annex add. git-annex uses the
configuration option annex.largefiles, a file matching expression, to
categorize files as “large”; all other files end up as “non-large”.
Furthermore, git-annex always treats “dotfiles” as “non-large”, without
consulting annex.largefiles. Setting the configuration option annex.dotfiles
(false by default) makes git-annex use annex.largefiles to also categorize
“dotfiles”.
The manual never defines which files are considered “dotfiles”, therefore
I am assuming a definition of “a dotfile is a non-directory file whose
basename begins with an ASCII period”. git-annex however will treat any file
in any directory as a dotfile – i.e., it will ignore annex.largefiles and
always add the file to vanilla git, unless annex.dotfiles is set to true
– as long as the relative pathname to the file begins with an ASCII period,
e.g. .foo/bar.txt (which is not a dotfile according to the assumed
definition above). git-annex will further cease treating the same file as
a dotfile if the relative pathname no longer begins with an ASCII period,
e.g. because the working directory has been changed.
I expect git-annex to distinguish between dotfiles and non-dotfiles solely
by looking at the file's basename, even if the relative path to the file
begins with a dot. I also expect annex.dotfiles to have no influence
whatsoever on files whose basename doesn't begin with an ASCII period, even
if the containing directory does. git-annex's actual behavior is highly
counter-intuitive to the notion that being a dotfile is a property of the
file's (base-)name. Due to the lack of definition of dotfiles in the manual,
it is unclear to me whether this is intended (but in my opinion quirky)
behavior, or rather a bug.
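The two readings contrasted above can be sketched as shell functions. This is only an illustration of the report's reasoning, not git-annex's implementation: both function names are made up here, and the second function is a rough model of the observed behaviour (the path as given on the command line begins with a period).

```shell
# Hypothetical helpers, not part of git-annex.

# Definition assumed in this report: only the basename decides.
is_dotfile_by_basename() {
    case "${1##*/}" in
        .*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Rough model of the observed behaviour: the path as given decides.
is_dotfile_by_given_path() {
    case "$1" in
        .*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Show how the two rules classify each path.
for path in .foo/bar1 bar3 .gitignore; do
    if is_dotfile_by_basename "$path"; then by_name=yes; else by_name=no; fi
    if is_dotfile_by_given_path "$path"; then by_path=yes; else by_path=no; fi
    printf '%s: basename rule=%s, given-path rule=%s\n' "$path" "$by_name" "$by_path"
done
```

Under these assumptions the two rules disagree on .foo/bar1, and the second rule answers differently for .foo/bar1 and for bar3 even when both name files in the same directory.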
What steps will reproduce the problem?
1. Set up a repository for use with git-annex. Leave annex.dotfiles at its default value (false), and annex.largefiles unset.
2. Create files bar1, bar2 and bar3 within a directory .foo. What matters is that .foo begins with a period, bar1 etc. don't.
3. Run git annex add .foo/bar1. git-annex will have forced the file into vanilla git as a “non-large” file, because it is recognized as a “dotfile”. But bar1 is not a dotfile because it does not begin with a period.
4. (Optional.) Set annex.dotfiles to true and run git annex add .foo/bar2. The file is added to the annex. In conjunction with step 3, this shows that git-annex really does apply the annex.dotfiles setting to files such as .foo/bar2, even if they aren't “dotfiles” because the file basenames don't start with a period.
5. (Optional.) Set annex.dotfiles back to false, change directory to .foo, then run git annex add bar3. The file will be added to the annex, as it is no longer recognized as a dotfile. This shows that git-annex's behavior is inconsistent: the same file is either seen as a dotfile or not, depending on which directory git-annex is run from and what the resulting relative pathnames look like.
What version of git-annex are you using? On what operating system?
git-annex v8.20200330, on Debian sid, as of 2020-05-08
Please provide any additional information below.
$ cd /tmp/git-annex-dotfiles
$ git init
Initialized empty Git repository in /tmp/git-annex-dotfiles/.git/
$ git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
$ mkdir .foo
$ echo a > .foo/bar1
$ echo b > .foo/bar2
$ echo c > .foo/bar3
$ git annex add .foo/bar1 # I expect this to be added to the annex, but no
add .foo/bar1 (non-large file; adding content to git repository) ok
(recording state in git...)
$ git config annex.dotfiles true
$ git annex add .foo/bar2 # clearly affected by annex.dotfiles
add .foo/bar2
ok
(recording state in git...)
$ git config annex.dotfiles false
$ cd .foo
$ git annex add bar3 # clearly affected by the exact relative pathname
add bar3
ok
(recording state in git...)
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes. git-annex has been in use on my end for a couple of years, and it is my go-to solution for “want something versioned, but can't store the contents themselves (too big, too sensitive, etc.)?”. Furthermore, git-annex documentation in general is excellent. But that is also why I'm stumped that the manual is so silent on this point.
Fixed by resolving the inconsistent behavior. Also improved the documentation to make clear that dot directories are treated the same as dotfiles.
I think it's quite likely that most people consider files in dotdirs to be dotfiles, most of the time. (.git/index is clearly not a dotfile, .git/config probably is) The exact semantics of it are vague enough that it's probably better to not consider them when it comes to this bug report.
The actual bug is not about whether .foo/bar is a dotfile, but about inconsistent behavior adding it.
Avoiding treating them as dotfiles, even if they're broadly understood as such, would resolve the inconsistency.
Otoh, the inconsistency only arises when run inside a dot directory, which is probably a fairly rare thing to do.
I'm also reluctant to start another behavior change in this area; there has been more than enough drama around dotfile handling recently.
At least the behavior change would only result in small files that users want to store in git being annexed, rather than large files being unexpectedly put in git.
It would also be possible for users to get back the current behavior if desired by configuring annex.dotfiles and annex.largefiles.
Also as far as the priority of this goes, I think that the number of dotdirs that contain files that get version controlled at all is probably quite small, excluding version controlling of HOME.
OP here. Sorry for the radio silence.
(Emphasis mine.) Given the points above, which I agree with (1) or accept (2 and 3), I'm perfectly fine with documenting the current behavior as correct and leaving it at that. My confusion stems from the fact that when I see .foo/bar, I say “no dotfile, because the file is actually named ‘bar’”, and git-annex says “dotfile, because the pathname expression begins with a dot” (or something like that) instead. Stating the exact rules git-annex uses to classify dotfiles as such, ideally alongside the documentation for annex.dotfiles and annex.largefiles, would avoid this confusion.
I think this assessment changes with the advent of containers. I use git-annex to version-control a research computation which runs inside a container. The folder is mounted as the home directory in that container, so python modules, self-compiled software etc. go into .local, configuration of tools goes into .config, etc. All of these are not annexed but inflate the repository size and might lead to inadvertent sharing of information. This exception for dotfile handling makes git-annex's behaviour difficult to explain to others. Why can't git-annex just handle the .git folder differently and, for all others, just annex or not as set in the largefiles rules?
Because creating a .gitignore followed by git-annex add would then blow the user's foot off. And this would be a very common foot-shooting opportunity, and .gitignore is only the perhaps most common trigger for it.
Files in dot directories are generally less common, outside of course of .git and $HOME. Which is the only reason I'm willing to consider changing the dotfiles handling to not include those.
But, .config/ seems to me to perfectly match what dotfiles are, which is files that are configuration that are named with a name starting with a dot in order to keep them from cluttering up ls. Just because in your use case you don't want to check those into git as dotfiles does not seem like a good argument for git-annex to not treat them as dotfiles by default.
Revisiting this, it seems best to fix the inconsistent behavior by having git-annex get the path to the file relative to the top of the git repository, and check if there's a dot directory in the path.
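The repository-relative check described above could look roughly like the following shell sketch. It is an illustration under stated assumptions, not the actual git-annex change: both function names are hypothetical, and the only git command used is git rev-parse --show-prefix, which prints the current directory relative to the top of the work tree.

```shell
# Hypothetical helpers, not git-annex code.

# Turn a path given relative to the current directory into one relative
# to the top of the work tree. Assumes a plain relative path (no "./",
# no "..") and that the current directory is inside a git work tree.
repo_relative_path() {
    printf '%s%s\n' "$(git rev-parse --show-prefix)" "$1"
}

# Succeed if any component of a repository-relative path starts with a period.
has_dot_component() {
    case "/$1" in
        */.*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Example (run inside a work tree):
#   has_dot_component "$(repo_relative_path bar3)" && echo "treated as dotfile"
```

With a check of this shape, bar3 added from inside .foo and .foo/bar3 added from the top of the repository resolve to the same path and therefore get the same answer.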
@lell, would you be so kind as to point me to the "formalization" you use for your containers/configuration layout? "I like containers too", and in the neuro* domain we are formalizing the layout of data etc. on the filesystem in the BIDS standard. Moreover, we are trying to formalize at the level of the "entire project"; see e.g. this issue with examples. I would be interested to learn what you do and how. Feel welcome to reach out directly, e.g. to debian AT oneukrainian.com. Cheers,