I back up/sync my music using annex. I used to have one repo with three clones: a full copy on my VPS, plus a full copy and a partial copy on my laptops. I decided to move all the data to S3. I declared my VPS repo dead, purged the history (I don't care about history for this particular repo), pushed the git repo to a different computer (a bare repo, no data), and pushed the data to S3 (gpg encrypted). I've just finished uploading all the files (around 200GB); a couple of files failed during the initial upload, so I ran git annex copy --quiet --to mys3 --fast.
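Roughly, the migration looked like this; myvps and MYKEY are placeholders, and the initremote options are from memory, so treat them as approximate:

    # mark the old VPS repo dead, then purge the history
    git annex dead myvps
    git annex forget --drop-dead

    # encrypted S3 special remote (exact options may differ)
    git annex initremote mys3 type=S3 encryption=hybrid keyid=MYKEY

    # re-send the files that failed during the initial upload
    git annex copy --quiet --to mys3 --fast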
That copy command used to take 15-20 seconds on my laptop when sending data to the VPS over ssh, but this time it took two hours to complete (pushing memory usage to 90%).
I have one other repo (1.5GB, 42k files, indirect mode, data gpg-encrypted on S3) using the same setup, except that this repo's content has always been on S3, and I see the same behavior there too: adding a file and then running a copy to push the content to S3 takes a couple of hours, even if I add a single 1KB file. I used to blame my hard drive (the damn thing is slow), but now I think this is related to S3. Is there any workaround for this?
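To be concrete, the slow case in that second repo is as simple as this (same remote name assumed):

    # add a tiny file, then push its content to S3
    echo test > small.txt
    git annex add small.txt
    git commit -m "add a 1kb file"
    time git annex copy small.txt --to mys3 --fast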
Both machines are using:
git-annex version: 4.20130913-gd20a4f2
build flags: Assistant Webapp Pairing Testsuite S3 WebDAV Inotify DBus XMPP Feeds Quvi
Is … any faster?
Well, it's not related to S3... that copy command won't even do any network traffic if there is nothing to copy. I have a similarly configured annex with 4,500 files, and that command takes 10 seconds to run.
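Since --fast makes copy trust the location log instead of querying the remote, you can see what it would actually send without generating any network traffic at all (assuming your remote is named mys3):

    # list files whose content the location log says is not on the remote
    git annex find --not --in mys3

If that prints nothing, the copy has nothing to transfer and should finish almost immediately.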
I do remember there being a recent fix that reduced the algorithmic complexity of an operation, but I forget which.
You mentioned something about high memory usage when copying. How much memory are we talking about?
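If you can, get an actual number; GNU time will report the peak, for example:

    # peak resident set size of the copy (GNU time, not the shell builtin)
    /usr/bin/time -v git annex copy --quiet --to mys3 --fast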
Have you run

    git annex forget

in this repository before? It kind of sounds like you have, and it's possible that it's repeatedly trying to forget old history for some reason.
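You can check: forget records a transition on the git-annex branch, so something like this should show whether one has ever been done here (the transitions file is where current versions record it, if I remember right):

    # show recorded transitions, if any, plus recent git-annex branch history
    git show git-annex:transitions
    git log --oneline -n 5 git-annex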