Recent comments posted to this site:

I'm seeing larger improvements in my repo: ~40% speedup with -J2 and even ~200% speedup without jobs. Good work!
Comment by Lukey Mon Jul 6 21:20:58 2020

Changes to Utility.CoProcess for async exception safety introduced some very strange behavior and had to be reverted in d66fc1a4646e354a8ae2514183b3d45a20ae8681.

Comment by joey Mon Jul 6 19:11:14 2020

Cache brought back in e72ec8b9b23346c4b741a071e9edae4460b233c9.

Benchmarking git-annex sync --content (w/o --all) with 10k files, there was very little change: it's only 0.4% faster. With --all, it's 20% faster.

That's kind of weird, because w/o --all it was doing 3 redundant queries, and --all only adds one more. The first result makes me think that a) git cat-file is doing its own caching or takes good advantage of the disk cache, and b) the roundtrip time communicating with it is not significant, so --buffer would not be much of an improvement. But the second result muddies the waters, and I guess it would still be worth trying the --buffer optimisation.

Comment by joey Mon Jul 6 16:23:07 2020

Nice trick with the %rest!

This looks like it could work. However, here the same command without --buffer is almost the same speed (0:2:16 vs 0:2:11, nearly lost in the noise). If there's any real benefit here, it seems it would be in keeping git cat-file more active, avoiding round trips in sending queries.
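A minimal Python sketch of the %(rest) trick, assuming each query line is `<ref> <key>`: when the `--batch` format ends in `%(rest)`, git cat-file echoes back whatever followed the first whitespace on the input line, so each reply can be matched to its key without tracking query order. The helper names and the refs used below are hypothetical. `--buffer` only switches git from flushing after every object to normal stdio buffering; it does not change the output format.

```python
import subprocess
from typing import Dict, Iterable, Tuple

# %(rest) echoes the text after the first whitespace of each input line
# (here: the key we appended), so every reply header names its key.
BATCH_FORMAT = "%(objectname) %(objecttype) %(objectsize) %(rest)"


def build_batch_input(queries: Iterable[Tuple[str, str]]) -> bytes:
    """Turn (ref, key) pairs into one "<ref> <key>" line per query."""
    return b"".join(f"{ref} {key}\n".encode() for ref, key in queries)


def parse_batch_output(out: bytes) -> Dict[str, bytes]:
    """Map each key (taken from %(rest)) to the object's contents."""
    results: Dict[str, bytes] = {}
    pos = 0
    while pos < len(out):
        nl = out.index(b"\n", pos)
        header = out[pos:nl].decode()
        pos = nl + 1
        if header.endswith(" missing"):
            continue  # unresolvable ref: git prints "<name> missing" and no body
        _oid, _type, size_str, key = header.split(" ", 3)
        size = int(size_str)
        results[key] = out[pos:pos + size]
        pos += size + 1  # skip the contents plus git's trailing LF
    return results


def cat_file_batch(queries: Iterable[Tuple[str, str]], repo: str = ".") -> Dict[str, bytes]:
    """Send all queries in one buffered batch and collect replies by key."""
    proc = subprocess.run(
        ["git", "cat-file", f"--batch={BATCH_FORMAT}", "--buffer"],
        input=build_batch_input(queries),
        capture_output=True,
        check=True,
        cwd=repo,
    )
    return parse_batch_output(proc.stdout)
```

Because replies carry their own key, a reader could prepopulate a cache straight from this stream instead of issuing one query per key.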

I was thinking it would be hard to implement because all location log queries would need to be changed to use this different source of the information. But, I think it can be finessed by having a small cache of contents of annex branch files, and prepopulate that cache with information about each key as it's read in from git cat-file.

Actually, there used to be a git-annex branch cache like that, caching just the most recently read file, but it was removed in 3417c55189275d038bc445fe3ef71090d518e79e.

I had forgotten that was removed, and it also seems possible that bringing that cache back would improve perf generally, because I think there are probably situations where e.g. the location log is looked at repeatedly.

.. In fact, loggedKeys calls checkDead on each key, so that's one extra location log lookup right there! Indeed, instrumenting getRef, I see sync --content getting the same location log 3x per key w/o --all, so probably 4x with --all!
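The cache described above can be sketched as a small LRU keyed by branch file path. This is a hypothetical Python model, not git-annex's Haskell code: `read_file` stands in for a git cat-file lookup, `capacity=1` mimics the old "most recently read file" cache, and `prepopulate` is where contents already streamed from git cat-file would be inserted. Any such cache also has to be invalidated when the branch is written.

```python
from collections import OrderedDict
from typing import Callable


class BranchFileCache:
    """Bounded LRU cache of git-annex branch file contents (illustrative)."""

    def __init__(self, read_file: Callable[[str], bytes], capacity: int = 1):
        self._read = read_file          # e.g. a git cat-file query (assumed)
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def prepopulate(self, path: str, content: bytes) -> None:
        """Insert contents already obtained elsewhere (e.g. a batch read)."""
        self._store(path, content)

    def get(self, path: str) -> bytes:
        if path in self._entries:
            self._entries.move_to_end(path)   # mark as most recently used
            return self._entries[path]
        content = self._read(path)            # cache miss: do the real read
        self._store(path, content)
        return content

    def invalidate(self, path: str) -> None:
        """Drop a cached file, e.g. after the branch file is written."""
        self._entries.pop(path, None)

    def _store(self, path: str, content: bytes) -> None:
        self._entries[path] = content
        self._entries.move_to_end(path)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

With the same location log requested three times in a row per key, even a one-entry cache turns three reads into one.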

Comment by joey Mon Jul 6 14:51:24 2020

Transfer repositories unfortunately do not behave that way. You can set up G1 to get content sitting in T1, since T1 will store content until G1 has it. But in the other direction, doing a git annex get from T1 won't ask SRV1 to get a file from G1 (I don't think).

The easiest way I can think of to achieve what you are looking for would be to create your own special remote on SRV1 (let's call it P1) that proxies the requests the way you need. Using the hook special remote, you could create a special remote that, when asked for a file, ssh-es into SRV1 and runs a script there that does a git annex get from G1. Then, a few hours later, you can do a get again to get the file.
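A rough Python sketch of such a proxy hook, handling only retrieval. It assumes the remote is set up with something like `git annex initremote P1 type=hook hooktype=proxy encryption=none` plus `git config annex.proxy-hook /path/to/proxy_hook.py`, so that git-annex runs the script with `ANNEX_ACTION`, `ANNEX_KEY` and `ANNEX_FILE` set; check the hook special remote documentation before relying on that. The host name `srv1` and the repository path `/srv/annex` are hypothetical placeholders, and remote paths are assumed to contain no shell metacharacters.

```python
#!/usr/bin/env python3
"""Hypothetical retrieve-only hook for a git-annex hook special remote "P1"."""
import os
import shlex
import subprocess
import sys
from typing import List

SRV1_HOST = "srv1"        # hypothetical ssh host name for SRV1
SRV1_REPO = "/srv/annex"  # hypothetical path of the repository on SRV1
GLACIER_REMOTE = "G1"


def remote_get_command(key: str) -> List[str]:
    """ssh command asking SRV1 to get `key` from the Glacier remote."""
    script = "cd {} && git annex get --from {} --key {}".format(
        shlex.quote(SRV1_REPO), shlex.quote(GLACIER_REMOTE), shlex.quote(key))
    return ["ssh", SRV1_HOST, script]


def remote_location_command(key: str) -> List[str]:
    """ssh command printing where SRV1 keeps the content of `key`."""
    script = "cd {} && git annex contentlocation {}".format(
        shlex.quote(SRV1_REPO), shlex.quote(key))
    return ["ssh", SRV1_HOST, script]


def retrieve(key: str, dest: str) -> int:
    # Glacier retrievals are asynchronous: the first attempt typically only
    # starts a retrieval job and fails; re-running the get hours later succeeds.
    if subprocess.run(remote_get_command(key)).returncode != 0:
        return 1
    loc = subprocess.run(remote_location_command(key), capture_output=True,
                         text=True, check=True).stdout.strip()
    # os.path.join keeps `loc` as-is if it is already absolute.
    src = "{}:{}".format(SRV1_HOST, os.path.join(SRV1_REPO, loc))
    return subprocess.run(["scp", src, dest]).returncode


def main() -> int:
    action = os.environ.get("ANNEX_ACTION")
    if action == "retrieve":
        return retrieve(os.environ["ANNEX_KEY"], os.environ["ANNEX_FILE"])
    # store, remove and checkpresent are deliberately unimplemented here.
    print("unsupported action: {}".format(action), file=sys.stderr)
    return 1


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the command builders separate from `subprocess` calls makes the ssh invocations easy to inspect before wiring the hook into a real repository.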

I think a reasonable workaround that ensures some measure of safety would be to just create some read-only credentials for your Glacier. Then add your Glacier repo to all the clients, so you can at least do gets from your clients.
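One possible shape for such credentials, sketched in Python as an AWS IAM policy document: it allows Glacier describe, job and retrieval actions on a single vault and omits upload and delete. Exactly which actions git-annex's glacier special remote (via glacier-cli) needs is an assumption here, as are the region, account ID and vault name used below. Note that `glacier:InitiateJob` still lets the holder start (billable) retrieval jobs.

```python
import json
from typing import Any, Dict

# Read/retrieve actions only; UploadArchive and DeleteArchive are left out.
READ_ONLY_GLACIER_ACTIONS = [
    "glacier:DescribeVault",
    "glacier:ListJobs",
    "glacier:DescribeJob",
    "glacier:InitiateJob",   # needed to start retrieval and inventory jobs
    "glacier:GetJobOutput",  # downloads the output of a completed job
]


def read_only_glacier_policy(region: str, account_id: str, vault: str) -> Dict[str, Any]:
    """Build an IAM policy granting read-style access to one Glacier vault."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": list(READ_ONLY_GLACIER_ACTIONS),
            "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}",
        }],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_glacier_policy("us-east-1", "123456789012", "myvault"), indent=2))
```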

—Andrew

Comment by andrew Sun Jul 5 17:25:49 2020
So for some reason the problem fixed itself. Chances are that I am just a noob at how SSL stuff works. If anyone could provide some insight into why this might have happened, so that I can learn from it, that would be much appreciated.
Comment by jenkin.schibel Sat Jul 4 03:28:26 2020
"the key generated by import --fast is probably not the same one generated by a regular import" -- but that happens already with addurl; is the problem worse here?
Comment by Ilya_Shlyakhter Fri Jul 3 19:55:36 2020

Implemented, directory remote only, but it could easily be added to adb, and possibly to S3. I also added it to the proposed import extension to the external special remote protocol.

Still unsure what to do about git-annex sync without --content importing. For now, sync still doesn't do content-less imports, but that could be changed if the concerns in comment #6 are dealt with.

Comment by joey Fri Jul 3 18:29:05 2020

Note that, since exporttree remotes are always untrusted, after importing --no-content from one, fsck is going to complain about it being the only location with the content.

Which seems right: that content could be overwritten at any time and the only copy lost. But it's still worth keeping in mind.
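A simplified Python model of why fsck complains: copies in untrusted (or dead) repositories are not counted toward numcopies, and since exporttree remotes are always untrusted, a key whose only location is such a remote has zero countable copies. This is an illustrative sketch with hypothetical names, not git-annex's fsck code; repositories without an explicit trust level are treated as semitrusted, git-annex's default.

```python
from typing import Dict, Iterable


def countable_copies(locations: Iterable[str], trust: Dict[str, str]) -> int:
    """Count distinct repositories whose copy counts toward numcopies."""
    return sum(
        1 for repo in set(locations)
        if trust.get(repo, "semitrusted") not in ("untrusted", "dead")
    )


def fsck_would_warn(locations: Iterable[str], trust: Dict[str, str],
                    numcopies: int = 1) -> bool:
    """True when the countable copies fall short of numcopies."""
    return countable_copies(locations, trust) < numcopies
```

After an import --no-content from an exporttree remote, the only recorded location is that remote, so the warning fires until the content is copied somewhere trusted or semitrusted.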

Comment by joey Fri Jul 3 17:39:19 2020