Please describe the problem.
I'm starting to slowly migrate my personal data collection (530 GB, 3.7M files) under git-annex. I'm going piece by piece and not giving up my other synchronization methods yet; thus I need to stay in direct mode.
I initially found that git annex commands were quite slow, but I was able to address that by adding all my "not yet included" directories to .gitignore at the root of the working tree. Unfortunately, git annex proxy remains super slow, because I notice that it does not include --exclude-standard in its calls to ls-files, and thus does not respect .gitignore. Here's an example from the --debug log:
read: git ["--git-dir=../../../../../.git","--work-tree=../../../../..","--literal-pathspecs","-c","core.bare=false","ls-files","--others","-z","--","../../../../.."]
As a result, I was shocked to find that, 25 minutes later, git annex proxy was still setting up and had duplicated 140 GB of untracked files!
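To illustrate the difference in plain git (outside of git-annex), here is a minimal sketch that builds a throwaway repository and compares the two ls-files invocations. The directory and file names (ignored-dir, big.bin, untracked.txt) are made up for the demonstration; only the git flags come from the log above.

```shell
#!/bin/sh
# Sketch: how --exclude-standard changes `git ls-files --others`.
# All paths below are hypothetical and live in a temporary directory.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Ignore one directory, then create an ignored and a non-ignored untracked file.
printf 'ignored-dir/\n' > .gitignore
mkdir ignored-dir
echo data > ignored-dir/big.bin
echo note > untracked.txt

# Same shape as the call in the debug log: lists every untracked file,
# including the ones matched by .gitignore.
all_others=$(git ls-files --others | LC_ALL=C sort)

# With --exclude-standard, the .gitignore rules are honored.
filtered=$(git ls-files --others --exclude-standard | LC_ALL=C sort)

printf 'without --exclude-standard:\n%s\n\n' "$all_others"
printf 'with --exclude-standard:\n%s\n' "$filtered"
```

Only the first listing should contain ignored-dir/big.bin, which matches what I see with proxy walking all of my ignored directories.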
My end goal is actually just to add files directly to the git repo, bypassing the annex, in spite of being in direct mode. (I can do this with the largefiles attribute, but I'd like to be able to control it directly irrespective of size.)
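For reference, this is roughly how I understand the largefiles attribute approach, as a sketch only: a .gitattributes rule sets annex.largefiles to nothing for a path, so that matching files are meant to go into git rather than the annex. The notes/ directory is a hypothetical example, and I'm assuming git-annex consults the attribute the way its documentation describes; the snippet itself only uses plain git to confirm that the attribute resolves.

```shell
#!/bin/sh
# Sketch: mark a hypothetical directory as "never large" via .gitattributes.
# Only plain git is used here; git-annex is not invoked.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Assumption: files under notes/ should always be added to git directly.
printf 'notes/** annex.largefiles=nothing\n' >> .gitattributes

# Ask git which value the attribute resolves to for a path under notes/.
attr=$(git check-attr annex.largefiles -- notes/todo.txt)
echo "$attr"
```

This still ties the decision to path patterns rather than letting me choose per command, which is why I'd like a more direct way.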
What steps will reproduce the problem?
git annex proxy --debug -- git commit myfile -m foo
What version of git-annex are you using? On what operating system?
Version 6.20170520 on Mac OS 10.13.4.
Please provide any additional information below.
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
# End of transcript or log.
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Today is my first day trying it out! It's fabulous so far, but I'm at the beginning of the learning curve.
Yay, git-annex proxy is deprecated and just passes through to git, since direct mode was eliminated. done --Joey
It wouldn't have actually copied 140 gb of files, unless you're using git-annex on a filesystem that does not support hard links. If it used hard links, it would not waste much space while running.
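A quick way to see why hard links don't duplicate data, sketched with throwaway files in a temporary directory (the file names are arbitrary, and this assumes the temporary directory is on a filesystem that supports hard links):

```shell
#!/bin/sh
# Sketch: a hard link is a second name for the same inode, not a second copy.
set -eu
dir=$(mktemp -d)
cd "$dir"

echo "some file content" > original
ln original linked   # fails on filesystems without hard link support

# Both names should report the same inode number.
inode_original=$(ls -i original | awk '{print $1}')
inode_linked=$(ls -i linked | awk '{print $1}')

# The link count (second field of ls -l) should now be 2.
link_count=$(ls -l original | awk '{print $2}')

echo "original inode: $inode_original"
echo "linked inode:   $inode_linked"
echo "link count:     $link_count"
```

Tools that sum file sizes per path can still report the space twice, which may be where a figure like 140 GB comes from.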
There may be edge cases where, if git-annex proxy did not copy/hard link ignored files from the work tree to its temporary directory, the proxied git command would not behave the same as an unproxied git command.
Let's see, such edge cases would have to involve a gitignored file that is still somehow affected by the proxied git command.
The obvious case is, you have .* gitignored, and you run git annex proxy -- git add .foo --force to add the ignored file. If git-annex didn't copy .foo, that would fail, albeit in a fairly obvious way.
Another problem case: You have .* gitignored, and you have a local file .foo which is not checked in. You run git annex proxy -- git merge branch, and the branch happens to add .foo with different contents. The merge would normally fail, because there are conflicting changes in the working tree. If proxy were changed, the proxied merge would succeed. The local changes in this case get lost. I've verified that this change causes data loss in this situation.
So, the current behavior is the safe and right behavior; git-annex should not lose data by default to optimise for an unusual edge case.
It could be an option, but it would have to be flagged as causing data loss in some situations involving local modifications to gitignored files, and causing proxied git behavior to differ from non-proxied git behavior in other situations. I don't know if the potential benefit is worth the foot-gun potential.
The code change is very simple if you want to play with it. In Command/Proxy.hs find the Git.LsFiles line and change "True" to "False".