Using hashdirlower layout for S3 special remote (git-annex forum)

comment 1 by TroisSinges, 2019-09-02
Hmmm, this is strange, because according to <a href="http://git-annex.branchable.com/special_remotes/S3/#comment-e9d48ba37aa5b01c236ab894057660b7">http://git-annex.branchable.com/special_remotes/S3/#comment-e9d48ba37aa5b01c236ab894057660b7</a>, files should be "stored in a hashed directory structure with the names of their key used". This isn't the case for me, using S3 special remote for Wasabi.
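For reference, a minimal Python sketch of what a hashdirlower-style layout computes: per git-annex's documented hashing, the two directory levels are three lowercase hex characters each, taken from the MD5 of the key name (the helper name here is mine, not a git-annex API):

```python
import hashlib

def hash_dir_lower(key: str) -> str:
    """Sketch of git-annex's hashdirlower layout: two directory levels,
    three lowercase hex characters each, from the MD5 of the key name."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[0:3]}/{digest[3:6]}/"
```

So a key would be stored under something like `abc/def/KEY/KEY` in remotes that do use hash directories.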
comment 2 by joey, 2019-09-05
<p>That comment was just bad wording or possibly I conflated S3 with how other
remotes that do use hash directories work. I've corrected the comment.</p>
<p>According to Amazon's documentation, S3 does not have a concept of
directories; "foo/bar" and "foo_bar" and "foo\bar" are all just opaque
strings as far as it's concerned. So I don't see any point in using hash
directories with S3.</p>
comment 3: directories on S3, by Ilya_Shlyakhter, 2019-09-05
"S3 does not have a concept of directories; "foo/bar" and "foo_bar" and "foo\bar" are all just opaque strings as far as it's concerned" -- just to note: (1) AWS <a href="https://aws.amazon.com/cli/">CLI</a> commands like <code>aws s3 ls</code> and <code>aws s3 cp</code> do have the concept of directories; (2) An <a href="https://www.sumologic.com/insight/10-things-might-not-know-using-s3/">S3 tips page</a> says that "latency on S3 operations depends on key names since prefix similarities become a bottleneck at more than about 100 requests per second. If you have need for high volumes of operations, it is essential to consider naming schemes with more variability at the beginning of the key names, like alphanumeric or hex hash codes in the first 6 to 8 characters, to avoid internal “hot spots” within S3 infrastructure."
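If one did want to follow that tip, it amounts to putting high-variability characters at the front of the key name. A hedged sketch (the function name and prefix length are my own illustrative choices, not anything git-annex or AWS provides):

```python
import hashlib

def spread_key(name: str, prefix_len: int = 6) -> str:
    """Prepend a few hex characters of the name's hash, so keys are
    spread across many S3 prefixes instead of sharing one common prefix."""
    h = hashlib.sha256(name.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{h}/{name}"
```

The prefix is derived from the name itself, so the key stays deterministic and can be recomputed on lookup.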
comment 4 by TroisSinges, 2019-09-05
<p>Thanks Joey for your answer. However, even if directories are indeed purely virtual in S3, file paths are still split on "/" characters in the S3 web console, for example (the same goes for Wasabi's console). It's quite convenient!</p>
<p>Moreover, some S3-compatible services do create directories using "/" delimiters.</p>
comment 5 by joey, 2019-09-06
<p>The stuff @Ilya found about prefix similarities causing bottlenecks in S3
infra is interesting. git-annex keys have a common prefix, and have
the hash at the end. So it could have a performance impact.</p>
<p>But that info also seems out of date when it talks about a 6-8 character
prefix length. And the rate limit has also been raised significantly, to
3000-5000 ops/sec per prefix. See
<a href="https://stackoverflow.com/questions/52443839/s3-what-exactly-is-a-prefix-and-what-ratelimits-apply">https://stackoverflow.com/questions/52443839/s3-what-exactly-is-a-prefix-and-what-ratelimits-apply</a></p>
<p>From that, it seems S3 does actually treat '/' as a prefix delimiter.
(Based on a single, not very clear email from Amazon support, and not
documented anywhere else...) So a single level of hash "directories"
could increase the rate limit accordingly.</p>
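Rough back-of-envelope arithmetic for that, assuming the figures from the Stack Overflow thread linked above (roughly 3,500 PUT and 5,500 GET requests/sec per prefix) and assuming each top-level hash "directory" counts as its own prefix:

```python
# One hashdirlower-style level is three lowercase hex characters,
# so there are 16**3 distinct top-level prefixes.
prefixes = 16 ** 3                 # 4096 possible prefixes
put_ceiling = prefixes * 3500      # aggregate theoretical PUT req/s ceiling
get_ceiling = prefixes * 5500      # aggregate theoretical GET req/s ceiling
print(prefixes, put_ceiling, get_ceiling)
```

In practice keys would not be spread perfectly evenly across prefixes, so these are upper bounds, not expected throughput.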
<p>If git-annex exceeded those rate limits, it would start getting 503
responses from S3, so it wouldn't slow down but would instead fail whatever
operation it was doing. I can't recall anyone complaining of 503's from
S3.</p>