s3 vs ssh Performance Problems

I backup/sync my music using annex. I used to have one repo with 3 clones, one full copy on my vps, one full copy and one partial copy on my laptops. I decided to move all data to s3. Declared my vps repo dead, purged history (I do not care about history for this particular repo) pushed the git repo to a different computer (bare repo no data) and data to s3 (gpg encrypted). I've just finished uploading all files (around 200gb) couple files failed during the initial upload so I did a git annex copy --quiet --to mys3 --fast this command used to take 15 20 secs on my laptop when sending data to vps using ssh but now it took 2 hours to complete (pushing mem usage to 90%).

I have one other repo (1.5gb in size 42k files using indirect mode, data gpg encrypted on s3) using the same setup except this repos content has always been on s3 i had the same behavior on this repo too. adding a file and running a copy to push content to s3 took couple hours even if I add a single 1kb file. I used to blame my hard drive, damn thing is slow but now I think this is related to s3. is there any workaround for this?

Both machines are using,

git-annex version: 4.20130913-gd20a4f2
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV Inotify DBus XMPP Feeds Quvi

RSS Atom

comment 1

git annex copy --not --in mys3 --to mys3 .

any faster?

Comment by Justin — Thu Nov 21 04:09:15 2013

Remove comment

comment 2

No difference. Both take roughly the same time but I did time a couple of runs with both commands they usually take 3 to 5 minutes to complete. Maybe git did something behind the scenes like gc? But still slower than it used to be. My other repo (one with 42k files) still takes a 2 hours.

Comment by Hamza — Thu Nov 21 17:47:57 2013

Remove comment

comment 3

Well it's not related to s3... that copy command won't even do any network traffic if there is nothing to copy. I have a similarly configured annex with 4500 files and that command takes 10 seconds to run.

I do remember there being a recent fix that reduced the algorithmic complexity of an operation, but I forget which.

Comment by Justin — Fri Nov 22 20:21:11 2013

Remove comment

comment 4

You mentioned something about high memory usage when copying. How much memory are we talking about?

Have you run git annex forget in this repository before? It kind of sounds like you have, and it might be possible that it's repeatedly trying to forget old history for some reason.

Comment by joeyh.name — Thu Dec 12 20:14:01 2013

Remove comment

Add a comment