bugs/Copying many files to bup remotes is very slowgit-annexhttp://git-annex.branchable.com/bugs/Copying_many_files_to_bup_remotes_is_very_slow/git-annexikiwiki2022-08-10T12:35:27Zcomment 1http://git-annex.branchable.com/bugs/Copying_many_files_to_bup_remotes_is_very_slow/comment_1_d81f94dfdb1b15b8bc0d9c243ef748db/joey2022-08-08T15:26:34Z2022-08-08T15:24:36Z
<p>I don't think that git-annex is doing anything particularly slow with the
bup special remote. Other than actually running bup, that special remote
should run about as fast as other similar special remotes, like say
the directory special remote.</p>
<p>So, this is probably a performance problem in bup. Now, git-annex does
use bup in an unusual way, running one <code>bup split</code> for each file it
stores. That was the only way to shoehorn what git-annex needs to do into
bup, though.</p>
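<p>To make the one-process-per-file pattern concrete, here is a minimal sketch that only builds the command lines and runs nothing. The helper name <code>per_file_split_commands</code> and the example keys are hypothetical, and the exact arguments git-annex passes to bup may differ; the sketch assumes only that <code>bup split -n NAME FILE</code> saves the split data under a named branch.</p>

```python
from typing import Dict, List


def per_file_split_commands(files_by_key: Dict[str, str]) -> List[List[str]]:
    """Build one `bup split` argv per file (hypothetical helper, nothing is executed).

    Assumption: each key is stored under its own branch name via `-n`,
    so every file costs one separate bup process.
    """
    commands = []
    for key, path in files_by_key.items():
        # One invocation per file: process startup cost is paid every time.
        commands.append(["bup", "split", "-n", key, path])
    return commands


if __name__ == "__main__":
    for argv in per_file_split_commands({"KEY1": "a.bin", "KEY2": "b.bin"}):
        print(" ".join(argv))
```

<p>With N files this yields N bup processes, which is where any per-invocation overhead in bup gets multiplied.</p>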
comment 2http://git-annex.branchable.com/bugs/Copying_many_files_to_bup_remotes_is_very_slow/comment_2_0f5ff10d84450a6df35cb974cdb4739e/Atemu2022-08-09T06:14:05Z2022-08-09T06:14:05Z
<p>Agreed.</p>
<p>I see two potential ways to improve performance:</p>
<ul>
<li>Batching:
Bup can split multiple files at once. If it's given 10, 20, 100 files at a time, the per-split overhead matters less. Batching is something git-annex might need to learn sooner or later anyways because file transfer generally doesn't scale well currently (bup's slowness just exacerbates the problem).</li>
<li>Bup index+save:
Use the same pattern as Borg. Back up the whole git-annex repo in one go and selectively restore files in order to <code>get</code> them. Not sure this would be a great idea but it should improve performance in my use-case (copy everything).</li>
</ul>
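<p>A rough sketch of why the index+save idea changes the cost: the number of bup processes no longer grows with the number of files. The helper name, the branch name and the path below are made up for illustration, and this is not how git-annex currently drives bup; it assumes only the usual <code>bup index PATH</code> followed by <code>bup save -n NAME PATH</code> sequence.</p>

```python
from typing import List


def index_save_commands(repo_path: str, branch: str) -> List[List[str]]:
    """Build the two commands of an index+save backup (hypothetical helper).

    Nothing is executed; the point is that the command count is constant,
    however many files live under repo_path.
    """
    return [
        ["bup", "index", repo_path],               # scan the tree once
        ["bup", "save", "-n", branch, repo_path],  # store everything in one run
    ]


if __name__ == "__main__":
    for argv in index_save_commands("/path/to/annex", "annex-backup"):
        print(" ".join(argv))
```

<p>The trade-off is the one mentioned in the list: individual files then have to be restored selectively out of the saved tree.</p>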
comment 3http://git-annex.branchable.com/bugs/Copying_many_files_to_bup_remotes_is_very_slow/comment_3_7150178b47a15857d5eece0fbe57a971/joey2022-08-09T17:06:53Z2022-08-09T16:28:16Z
<p>While <code>bup split</code> can run on multiple files at once, it concatenates the
files together and stores a single object. That is not useful for git-annex.</p>
<p>Using the borg pattern with bup is a good idea.</p>
comment 4http://git-annex.branchable.com/bugs/Copying_many_files_to_bup_remotes_is_very_slow/comment_4_d2896b544d6d26663b532735c3924134/Atemu2022-08-10T12:35:27Z2022-08-10T12:35:27Z
<p>Indeed it does. I assumed it would concatenate the files internally for efficiency but still be able to output them separately later; that is not the case. I guess you could store the offsets externally and skip/seek, but that would be inefficient.
Perhaps bup could add an option for batching multiple single-file splits in one invocation to avoid the overhead.</p>
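<p>For illustration, the "store the offsets externally" idea could look roughly like the sketch below. It is pure Python and does not touch bup at all; the function names and the in-memory index are made up for this example.</p>

```python
import io
from typing import BinaryIO, Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # key -> (offset, length)


def concatenate_with_offsets(blobs: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Join several blobs into one object, remembering where each one starts."""
    stream = io.BytesIO()
    index: Index = {}
    for key, data in blobs.items():
        index[key] = (stream.tell(), len(data))
        stream.write(data)
    return stream.getvalue(), index


def extract(stream: BinaryIO, index: Index, key: str) -> bytes:
    """Recover a single blob from the combined object using the external index."""
    offset, length = index[key]
    stream.seek(offset)  # needs a seekable stream
    return stream.read(length)


if __name__ == "__main__":
    combined, index = concatenate_with_offsets({"a": b"hello", "b": b"world!"})
    print(extract(io.BytesIO(combined), index, "b"))
```

<p>The inefficiency shows up as soon as the combined object can only be read back as a non-seekable stream: then everything before the wanted offset has to be read and thrown away.</p>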
<p>Alternatively, git-annex could index+save using excludes. That could be quite complicated (especially tracking which object exists inside which batch) but it could work.</p>