ATM, even with ControlPersist=yes, on a fast interconnection between hosts (so it a millisecond or so to transfer a single file I have there), majority of time is spent I guess on running a new ssh (although with instructions for persistent connection etc) or is there some intentional sleep somewhere (or just output flushed at 1 sec intervals so ts has those consistent subsec offset), which ends up spending up to a second to transfer a single file. I have ~25,000 files so would need about 5 hours, although with direct rsync -e 'ssh' -L it takes total of 2 seconds for those 100MB . Here is the protocol (with ts with subsecond timing):

Aug 04 10:46:02.1438699562 get stimulus/task002/generate/part_1-182x121/000251.jpeg (from origin...)
Aug 04 10:46:02.1438699562 SHA256E-s1650--768d78ad49fc413d178a5cd9407b56bb442f40aaa629b7f608844330c2c4bbf9.jpeg
Aug 04 10:46:02.1438699562 ^M              0   0%    0.00kB/s    0:00:00  ^M          1,650 100%    1.57MB/s    0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:02.1438699562 ok
Aug 04 10:46:02.1438699562 get stimulus/task002/generate/part_1-182x121/000252.jpeg (from origin...)
Aug 04 10:46:02.1438699562 SHA256E-s1662--d156f08ecfcc248aeb01239f5e605f3ac9ed72fa77c54e593a54b1f6a8b3f0f4.jpeg
Aug 04 10:46:02.1438699562 ^M              0   0%    0.00kB/s    0:00:00  ^M          1,662 100%    1.59MB/s    0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:02.1438699562 ok
Aug 04 10:46:02.1438699562 get stimulus/task002/generate/part_1-182x121/000253.jpeg (from origin...)
Aug 04 10:46:02.1438699562 SHA256E-s1673--47562fe25853fe2972678cbaa1ef8e03bad068095d9f8575ba96f8df0a18cff0.jpeg
Aug 04 10:46:02.1438699562 ^M              0   0%    0.00kB/s    0:00:00  ^M          1,673 100%    1.60MB/s    0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:03.1438699563 ok
Aug 04 10:46:03.1438699563 get stimulus/task002/generate/part_1-182x121/000254.jpeg (from origin...)
Aug 04 10:46:03.1438699563 SHA256E-s1675--a3dae03d805040af4a7341479b782342431ee5377713c061e02daa075d188037.jpeg
Aug 04 10:46:03.1438699563 ^M              0   0%    0.00kB/s    0:00:00  ^M          1,675 100%    1.60MB/s    0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:03.1438699563 ok
Aug 04 10:46:03.1438699563 get stimulus/task002/generate/part_1-182x121/000255.jpeg (from origin...)
Aug 04 10:46:03.1438699563 SHA256E-s1674--a6743261238a87f9f3574295665896be2a8f373dd5b400fbf1552fb8d3b8fc66.jpeg
Aug 04 10:46:03.1438699563 ^M              0   0%    0.00kB/s    0:00:00  ^M          1,674 100%    1.60MB/s    0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:04.1438699564 ok
Aug 04 10:46:04.1438699564 get stimulus/task002/generate/part_1-182x121/000256.jpeg (from origin...)
Aug 04 10:46:04.1438699564 SHA256E-s1659--e1bd4829f53b226ad62ebccfbbb1a132d977af930545ade4161b5f9acf2e80b1.jpeg
Aug 04 10:46:04.1438699564 ^M              0   0%    0.00kB/s    0:00:00  ^M          1,659 100%    1.58MB/s    0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:04.1438699564 ok
Aug 04 10:46:04.1438699564 get stimulus/task002/generate/part_1-182x121/000257.jpeg (from origin...)
Aug 04 10:46:04.1438699564 SHA256E-s1663--ae0e673c60dede66c773441ee198fa8979c6de4a169e468fbb10dc22860afb27.jpeg
Aug 04 10:46:04.1438699564 ^M              0   0%    0.00kB/s    0:00:00  ^M          1,663 100%    1.59MB/s    0:00:00 (xfr#1, to-chk=0/1)
Aug 04 10:46:05.1438699565 ok  

both hosts do not show any high CPU load