What steps will reproduce the problem? My annex directory has 23459 files and uses 749MB of disk space. Just create a repository, put this directory inside, and git-annex will crash.
What is the expected output? What do you see instead? I expect git-annex to handle a large number of files, and not to watch every single file.
What version of git-annex are you using? On what operating system? I'm using the git-annex Linux build, version 2013.04.17.
Please provide any additional information below.
[2013-04-17 23:52:35 CEST] Transferrer: Downloaded pappas_hu..di_44.jpg
git-annex: runInteractiveProcess: pipe: Too many open files
Committer crashed: lsof: createProcess: resource exhausted (Too many open files)
[2013-04-17 23:53:52 CEST] Committer: warning Committer crashed: lsof: createProcess: resource exhausted (Too many open files)
git-annex: runInteractiveProcess: pipe: Too many open files
git: createProcess: resource exhausted (Too many open files)
DaemonStatus crashed: /home/user/Desktop/down/annex_test/.git/annex/daemon.status.tmp21215: openFile: resource exhausted (Too many open files)
[2013-04-17 23:57:24 CEST] DaemonStatus: warning DaemonStatus crashed: /home/user/Desktop/down/annex_test/.git/annex/daemon.status.tmp21215: openFile: resource exhausted (Too many open files)
git-annex: runInteractiveProcess: pipe: Too many open files
git: createProcess: resource exhausted (Too many open files)
git-annex: runInteractiveProcess: pipe: Too many open files
NetWatcherFallback crashed: git: createProcess: resource exhausted (Too many open files)
[2013-04-18 00:27:17 CEST] NetWatcherFallback: warning NetWatcherFallback crashed: git: createProcess: resource exhausted (Too many open files)
git-annex: runInteractiveProcess: pipe: Too many open files
git-annex: git: createProcess: resource exhausted (Too many open files)
git-annex: accept: resource exhausted (Too many open files)
Instead of raising the system's limit (which is a never-ending story), can we make git-annex watch only a directory and not every file in it?
Or could the user specify a directory that he knows rarely changes, so that it is not watched and is only checked once a day?
The best would be if git-annex could adapt automatically. I.e., it watches e.g. 200 files, and if some of them do not change for three days, they are dropped from the watch basket, and files found to have changed (noticed during a sanity check) are added to the basket.
I don't really want to raise the ulimit, because my ultimate goal is to have git-annex on multiple Raspberry Pis with external hard drives (one at my home, one at my mom's home, one at my friend's home, etc.). And the Raspberry Pi is fairly low on resources.
I'm interested in your thoughts.
Best, Laszlo
I have tried repeatedly to reproduce this problem, and I cannot.
git-annex does not keep every file open. It tends to have fewer than 10 open file descriptors at any one time.
I thought perhaps lsof opened every file, but it does not seem to, either.

So far, I have no indication that the problem had to do with git-annex at all. If some other program on the system opened a great many files, it could cause this to happen to git-annex.
You pasted a debug log that shows that the problem persisted for several minutes. So you should make it happen again, and in that time period, investigate what program has so many files open. You can do this with lsof, or, if lsof won't run, by looking in /proc/$pid/fd/
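If lsof won't run, the /proc/$pid/fd/ check can be scripted. This is only a rough sketch, assuming Linux and permission to read the target process's fd directory; the function names (fd_targets, classify, summarize) are made up for illustration and are not part of git-annex.

```python
import os
from collections import Counter


def fd_targets(pid):
    """Return {fd_number: link_target} for one process, read from /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def classify(target):
    """Group a link target such as 'pipe:[2251602]' or '/home/user/file' by kind."""
    if target.startswith("/"):
        return "file"
    # 'pipe:[123]', 'socket:[456]', 'anon_inode:[eventpoll]' -> text before ':'
    return target.split(":", 1)[0]


def summarize(pid):
    """Count the open descriptors of a process by kind (file, pipe, socket, ...)."""
    return Counter(classify(t) for t in fd_targets(pid).values())
```

Calling summarize() with the pid of whichever process looks suspicious, a few times while the errors are being logged, should show which kind of descriptor is piling up, and fd_targets() lists what each one points at.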
Or, of course, give me enough information to reproduce the problem. "I have 23459 files" isn't much help..
On openSUSE 12.3 with this version (which I'm sure is horribly old):
I ran 'git annex get' on a large repository, and got this:
Then I ran it again and saw that after every file retrieved, git-annex leaks another lockfile. lsof shows an ever increasing number of files like this:
Hmm, Adam, your version is older than the bug reporter's version. OTOH, while there were several FD leak fixes after your version, none of them were to Annex.LockPool, which is what's used for the ssh lock files.
I can't reproduce it with git annex get and the current release.. can you?

At some point during a large copy, there's an ever increasing number of pipes in /proc/git-annex-pid/fd. As soon as it hits the limit (1023 in my case), copies start failing
with every
The number of open fd's by git-annex increases by 1.
4.20130802 built with cabal on Ubuntu 13.04
It also looks like the location log got corrupted (files are actually present, but not recorded in the location log) somewhere along the line as I was trying to get/drop to figure out what's going on. Explicitly dropping files and then getting them again fixes the location log issue.
@Michael how large a copy are you doing? And what kind of remote are you copying the files to? It would be helpful if you could be more specific about something I could do to reproduce the problem. Without a test case, I am unlikely to fix the bug. With a test case, I'd be surprised if it took long to fix it.
If you have a process running that is experiencing the problem, you can also narrow it down a lot by looking at what these leaking pipe file descriptors are pipes to. For example, if you have:
lr-x------ 1 michael michael 64 Aug 10 20:14 895 -> pipe:[2251602]
You can run
find /proc/ -ls | grep 2251602
and find the process at the other end of the pipe, and look its pid up in ps to see what command it is.

@Joey: it was a "pretty large" transfer, several hundred gigabytes in perhaps ~100000 files. The copying was going to a GPG-encrypted directory remote. The error only happened once or twice so far. Point taken about find in /proc; I'll do that if it happens next time.
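For the /proc lookup of a pipe's other end, something along these lines might save some typing. It is a sketch only, assuming Linux, and it only sees processes whose fd directories the current user may read (run it as root to see everything). The helper names (pipe_inode, pipe_holders, command_line) are hypothetical.

```python
import glob
import os
import re

PIPE_RE = re.compile(r"^pipe:\[(\d+)\]$")


def pipe_inode(target):
    """Extract the inode from a link target like 'pipe:[2251602]', else None."""
    m = PIPE_RE.match(target)
    return int(m.group(1)) if m else None


def pipe_holders(inode):
    """Return sorted (pid, fd) pairs for every visible descriptor on this pipe."""
    holders = []
    for path in glob.glob("/proc/[0-9]*/fd/*"):
        try:
            target = os.readlink(path)
        except OSError:
            # Process exited, fd was closed, or we may not inspect it.
            continue
        if pipe_inode(target) == inode:
            # path looks like /proc/<pid>/fd/<fd>
            _, _, pid, _, fd = path.split("/")
            holders.append((int(pid), int(fd)))
    return sorted(holders)


def command_line(pid):
    """Read /proc/<pid>/cmdline, the command line that ps would display."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except OSError:
        return ""
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()
```

With the example above, pipe_holders(2251602) would list every (pid, fd) pair attached to that pipe, and command_line(pid) on any pid other than git-annex's own shows what is sitting at the other end.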
I now understand the problem described in comment 6, where once it started failing, it would leak one file descriptor per failure.
I think that failure mode was fixed by accident in the changes in 2fd63f3cfac705f0a18f4bcbe0489ce8ea1800d7.
This doesn't explain what would open so many files to get it into that failure mode, however.
This is happening to me now about twice per day. I've got a repo that is completely synced already and I'm working on a project in IntelliJ. When I check git-annex I often see this message when it is doing a startup scan or consistency check.
The message can be seen here: http://imgur.com/Xb4LA73
Are there logs somewhere that I can gather and supply to you to help track this down? I'm using the 2014-01-07 release of git-annex for Mac OS.