internals/hashing | git-annex | http://git-annex.branchable.com/internals/hashing/ | git-annex | ikiwiki | 2015-03-22T22:38:54Z
comment 1 | http://git-annex.branchable.com/internals/hashing/comment_1_9153e4f4f9335e524cf1b96a51bef41f/ | Péter | 2014-01-31T00:45:48Z | 2014-01-31T00:45:47Z
<p>The correct old hash value for the empty file SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is pX/ZJ.</p>
<p>The text describes the old hash value computation incorrectly, because it doesn't mention that 1 bit is skipped between each group of 5 bits. See the sample implementation in display_32bits_as_dir in https://github.com/joeyh/git-annex/blob/master/Locations.hs</p>
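<p>The bit skipping described above can be sketched in Python. This is a minimal, unofficial sketch rather than git-annex's own code. It assumes that the key is hashed with MD5, that the first four digest bytes are read as a little-endian 32-bit word, that each character uses the low 5 bits of a 6-bit group, and that the characters are swapped pairwise before being split into two directory levels. Those details are recalled from Locations.hs and are not spelled out in this thread, so the result should be checked against the pX/ZJ value reported above. The name <code>hashdirmixed</code> is borrowed from the examinekey format variable, and the UTF-8 encoding of the key is an assumption.</p>

```python
import hashlib
import struct

# The 32-character alphabet quoted elsewhere in this thread,
# one symbol per 5-bit value.
ALPHABET = "0123456789zqjxkmvwgpfZQJXKMVWGPF"


def hashdirmixed(key: str) -> str:
    """Sketch of the old mixed-case hash directory for a key (assumptions above)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Assumption: the first 4 digest bytes form a little-endian 32-bit word.
    (word,) = struct.unpack("<I", digest[:4])
    # Step through the word 6 bits at a time but keep only the low 5 bits,
    # which is the "1 bit skipped between each group of 5 bits".
    chars = [ALPHABET[(word >> (6 * i)) & 31] for i in range(4)]
    # Assumption: characters are swapped pairwise, then split as XX/YY/.
    return f"{chars[1]}{chars[0]}/{chars[3]}{chars[2]}/"


if __name__ == "__main__":
    empty_file_key = (
        "SHA256E-s0--"
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    )
    print(hashdirmixed(empty_file_key))
```

<p>Running the script prints the directory prefix for the empty-file key, which can be compared with the output of <code>git annex examinekey</code>.</p>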
comment 2 | http://git-annex.branchable.com/internals/hashing/comment_2_086ea37acf15e2a8694b8386222b73f6/ | Yaroslav | 2014-12-04T20:26:47Z | 2014-12-04T20:26:47Z
<p>1c to support Péter's statement:</p>
<pre><code>$> git annex examinekey --format='${hashdirmixed}' "SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
pX/ZJ/%
</code></pre>
any particular reason for the chosen characters for base32 encoding | http://git-annex.branchable.com/internals/hashing/comment_3_19b7d20ca392078f14f9f10992f288ec/ | josch | 2015-01-31T17:13:57Z | 2015-01-31T17:13:57Z
Were the characters "0123456789zqjxkmvwgpfZQJXKMVWGPF" chosen randomly for the base32 encoding, or was there a reason to choose exactly these?
comment 4 | http://git-annex.branchable.com/internals/hashing/comment_4_7642d6ce5fd4d37d464b05d0b4f869c6/ | joey | 2015-02-04T19:01:25Z | 2015-02-04T17:14:24Z
<p>The only reason for the letter choice is that it avoids making random
words with possibly unintentional meanings.</p>
why md5sum? | http://git-annex.branchable.com/internals/hashing/comment_5_b0cb207a85cda5a0ff2ea71caca22c0d/ | anarcat [id.koumbit.net] | 2015-02-13T15:59:46Z | 2015-02-13T15:59:46Z
<p>why the extra processing to generate the hashing directories?</p>
<p>we already have a hash here, for example, <code>SHA256E-s8242375--5f82490990812ad3feabb02355750710a9d94283ab256d1c691c3bf8d7d9fbe3.ogg</code> has a long <code>5f82490990812ad3feabb02355750710a9d94283ab256d1c691c3bf8d7d9fbe3</code> hash. Why not use the first characters of that? This will not change for a given file, and has a higher chance of generating collisions (which is a good thing here, because we can reuse directories).</p>
<p>In other words, why aren't the hashes of <code>SHA256E-s8242375--5f82490990812ad3feabb02355750710a9d94283ab256d1c691c3bf8d7d9fbe3.ogg</code> simply <code>5f8/249</code>? --<a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
re: why md5sum? | http://git-annex.branchable.com/internals/hashing/comment_6_edb5c3388b5ac3481403c7accf9bb3f2/ | joey | 2015-02-17T21:54:51Z | 2015-02-17T21:51:59Z
Not all types of keys contain hashes.
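To illustrate the point: hashing the whole key string with MD5 gives a directory prefix for any key, whether or not the key embeds a content hash. The sketch below is a rough illustration, not git-annex's code. It assumes the newer lowercase layout takes the first three and the next three hex characters of the MD5 of the key, which is not stated in this thread. The function name `hashdirlower`, the UTF-8 encoding, and the WORM-style key are illustrative assumptions.

```python
import hashlib


def hashdirlower(key: str) -> str:
    """Sketch: directory prefix from the MD5 of the whole key (assumed layout)."""
    hexdigest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Assumption: two directory levels of three lowercase hex characters each.
    return f"{hexdigest[:3]}/{hexdigest[3:6]}/"


if __name__ == "__main__":
    # A key that embeds a SHA256 hash (taken from the question above).
    print(hashdirlower(
        "SHA256E-s8242375--"
        "5f82490990812ad3feabb02355750710a9d94283ab256d1c691c3bf8d7d9fbe3.ogg"
    ))
    # A hypothetical key with no embedded hash still gets a prefix.
    print(hashdirlower("WORM-s3-m1301674316--foo"))
```

Taking `5f8/249` straight from the key would only work for the first of these two keys, while the MD5-based prefix is defined for both.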
Python implementation | http://git-annex.branchable.com/internals/hashing/comment_7_843592cf125be06fb316be43b85b0524/ | giomasce | 2015-03-22T22:38:54Z | 2015-03-22T22:38:54Z
I wrote a Python implementation of the two hashing functions for a project of mine. <a href="https://gist.github.com/giomasce/a7802bda1417521c5b30">Here it is</a>, hoping it can be helpful for someone.