git-annex-import (http://git-annex.branchable.com/git-annex-import/)

comment 1 by georg.schnabel, 2021-02-17: import from special directory remote fails due to running out of memory
First of all, git-annex is an awesome tool, I like it very much!

When trying to `git annex import` from a directory special remote containing a large number of files (~4 million, about 1 TB in total), git annex uses up all main memory during the final "update remote/ref" step on a machine with 16 GB of RAM and is then killed by the system. This also happens with the `--no-content` option. Is there a way to make git annex less memory-hungry when importing from a directory special remote with this many files?
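For reference, a minimal sketch of the kind of setup I mean; the remote name `usbdrive` and the directory path are placeholders:

```sh
# Hypothetical directory special remote with importing enabled
git annex initremote usbdrive type=directory directory=/mnt/usbdrive \
    encryption=none importtree=yes

# Import into the master branch without transferring file contents;
# with ~4 million files this is the step where memory runs out
git annex import master --from usbdrive --no-content
```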
comment 2 by Lukey, 2021-02-19
Yes, just `cp` or `mv` the files into the repository and `git annex add` them as usual.
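As a sketch, assuming the files currently live under a placeholder path `/mnt/usbdrive`:

```sh
# Run from inside the git-annex repository; the source path is a placeholder
cp -a /mnt/usbdrive/. .     # or mv the files, if the source can be emptied
git annex add .
git commit -m "add files from directory"
```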
comment 3 by joey, 2021-02-22: re: import from special directory remote fails due to running out of memory
Importing from special remotes necessarily needs to hold the list of files in memory, or at least it seems like it would be hard to make it stream over them. So there may be some way to decrease the memory used per file (currently about 4.2 kB per file going by your numbers), possibly by around 50%, but it would still scale with the number of files. Avoiding that would require changing the whole import interface to use trees.

It would be ok to file a bug report about this.
The legacy directory import interface, which imports from a path given on the command line rather than from a configured special remote, avoids such problems.
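A sketch of that legacy form, with a placeholder source path:

```sh
# Legacy import: run from inside the repository; /mnt/usbdrive is a
# placeholder. By default the matched files are moved in and annexed.
git annex import /mnt/usbdrive/*
```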