avoid duplicating key twice in symlink to object file
git-annex http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/ ikiwiki 2020-07-29T21:28:49Z
comment 1 http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_1_4bd588a3cdca5cb94132549b7d78e6af/ CandyAngel 2019-01-21T15:42:51Z 2018-09-27T07:39:57Z
<p>The key is in the path twice as a security measure (the write bit is removed from the directory, to prevent <code>rm -rf</code>ing all the files away by mistake).</p>
<p>This is known to cause slowness while traversing the objects directory, which is why there is repository <a href="http://git-annex.branchable.com/tuning/">tuning</a>. Perhaps you want to set some of these?</p>
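The directory protection described above can be sketched in a few lines of Python. This is only an illustration of the general POSIX behaviour (deleting a file requires write permission on its containing directory), not git-annex's own code; the key name `EXAMPLEKEY` and both helper functions are hypothetical.

```python
import os
import stat
import tempfile


def make_protected_object(root, key):
    """Create root/KEY/KEY and drop the write bit on the KEY directory.

    Hypothetical helper: it mimics the key/key layout from the comment above.
    """
    key_dir = os.path.join(root, key)
    os.mkdir(key_dir)
    obj = os.path.join(key_dir, key)
    with open(obj, "wb") as f:
        f.write(b"annexed content")
    # Without write permission on the directory, entries inside it
    # cannot be unlinked by an unprivileged user.
    os.chmod(key_dir, 0o555)
    return key_dir, obj


def is_write_protected(path):
    """True if no write bit (user, group, other) is set on path."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o222 == 0


with tempfile.TemporaryDirectory() as root:
    key_dir, obj = make_protected_object(root, "EXAMPLEKEY")
    protected = is_write_protected(key_dir)
    try:
        os.remove(obj)
        removed = True   # expected only for privileged users such as root
    except PermissionError:
        removed = False  # the usual outcome for an unprivileged user
    # Restore the write bit so the temporary directory can be cleaned up.
    os.chmod(key_dir, 0o755)

print("directory write-protected:", protected)
print("object removed anyway:", removed)
```

Note that the protection comes from the directory's mode, so in this sketch the name of the file inside the directory plays no part in it.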
comment 2 http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_2_454cb3b2b1102ce4da18f245bc583e7c/ Ilya_Shlyakhter 2019-01-21T15:42:51Z 2018-09-27T11:09:04Z
"The key is in the path twice as a security measure" -- would key/f.txt be less secure than key/key.txt? I thought the security comes from having both a dir and a file, not from them both having the key in their name?
comment 3 http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_3_9f5f16a7ab5d3e28cdddc333d97b87ea/ joey 2019-01-21T15:42:51Z 2018-10-04T18:35:28Z
<p>I'm not convinced that git-annex should try to make the symlinks shorter
just because some programs have UIs that don't work well with longer
symlinks. UIs can be improved.</p>
<p>I like to use <code>ls -lL</code> for example, which conveniently
avoids displaying the symlink target and also shows the size of the annexed
file.</p>
<p>Using the MD5 backend will also give you much shorter symlinks.</p>
<p>External backends are an interesting idea, but needing to deal with the
backend being missing or failing to work could have wide repercussions in
the code base. It feels like too much complexity for too little gain.</p>
comment 4 http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_4_720012572e1d748bc628fdb17a55a3bd/ Ilya_Shlyakhter 2019-01-21T15:42:51Z 2018-10-12T01:05:30Z
"I'm not convinced that git-annex should try to make the symlinks shorter just because some programs have UIs that don't work well with longer symlinks" -- UI is just one plus of shorter keys. Another is that some systems can't handle long paths; e.g. the backends page says not to use the 512 or 384 hashes on Windows. Another is that long keys and symlinks increase the amount of data git deals with, which can matter for large repos. Using base64 encoding for hashes would shorten the hash part of a key by a third compared to hex; not repeating the key twice in symlinks would cut the symlink target length nearly in half again.
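The "by a third" figure can be checked with a small Python sketch. It assumes the URL-safe base64 alphabet with padding stripped, since standard base64 contains `/`, which cannot appear in a file name; that choice is an assumption for illustration, not an encoding any git-annex backend is known to use. The function name `digest_lengths` is hypothetical.

```python
import base64
import hashlib


def digest_lengths(data, algorithm="sha256"):
    """Return the hex and unpadded URL-safe base64 lengths of a digest."""
    digest = hashlib.new(algorithm, data).digest()
    hex_form = digest.hex()  # 2 characters per byte
    # About 4 characters per 3 bytes; "=" padding is stripped.
    b64_form = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return {"hex": len(hex_form), "base64": len(b64_form)}


for algorithm in ("sha256", "sha384", "sha512"):
    lengths = digest_lengths(b"example content", algorithm)
    saved = 1 - lengths["base64"] / lengths["hex"]
    print(f"{algorithm}: hex={lengths['hex']} base64={lengths['base64']} "
          f"saved={saved:.0%}")
```

Only the hash portion of the key shrinks; the backend name, size field and extension in a key stay the same length.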
comment 5 http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_5_41857b7f25f8a922640378a41574b84d/ yarikoptic 2019-01-21T15:42:51Z 2018-10-12T13:42:07Z
<p>I have been arguing for removal of the KEY-directory/ for a while <img src="http://git-annex.branchable.com/smileys/smile4.png" alt=";)" /> See e.g. as far back as https://github.com/datalad/datalad/issues/32 . There is an issue/discussion on this website too somewhere, but I couldn't find it quickly.
IMHO it is just a "tech" problem, i.e. no design principle forbids fixing it. It might lead to performance issues though, since the containing directory would then need to be chmod'ed back and forth to introduce changes to the KEY-file under it, but that is probably very similar to what happens now anyway.</p>
<p>FWIW in DataLad we moved to using the MD5E backend as the default to at least somewhat relieve the burden of long symlinks. I think we are "secure" enough for what we use DataLad for here <img src="http://git-annex.branchable.com/smileys/smile4.png" alt=";)" /></p>
comment 6 http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_6_de3f0be0f6b560ca7bb66f7b5c8f0592/ Ilya_Shlyakhter 2019-01-21T15:42:51Z 2018-10-12T13:58:05Z
Removing the KEY/ directory could give more savings, but sometimes there is more than one file in it (e.g. key metadata), so the dir makes sense. The content filename in the dir needn’t repeat the key, though. Changing that could be hard. Adding backend variants with base64-encoded checksums seems possible though?
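To put a rough number on what a fixed content filename would save, here is a minimal Python sketch comparing the two layouts. The `aa/bb` hash-directory names, the example key, and the `object_path` helper are placeholders made up for illustration; only the general shape (an objects directory, two hash levels, then KEY/KEY) is meant to resemble the real layout.

```python
import posixpath


def object_path(key, content_name=None):
    """Build an object path; content_name=None repeats the key (key/key)."""
    name = key if content_name is None else content_name
    # "aa" and "bb" stand in for the two hash directory levels.
    return posixpath.join(".git/annex/objects", "aa", "bb", key, name)


# Hypothetical SHA256E-style key: backend, size, 64 hex digits, extension.
key = "SHA256E-s1048576--" + "0" * 64 + ".txt"

current = object_path(key)       # .../KEY/KEY
shorter = object_path(key, "f")  # .../KEY/f

print(len(current), len(shorter))
print("characters saved:", len(current) - len(shorter))
```

The saving per symlink is the key length minus the length of the fixed name, so it grows with longer hashes and extensions.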
comment 7 http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_7_d499c66c9a1981bb14d0ce63f4983ea5/ joey 2020-06-17T01:18:32Z 2020-01-30T18:49:47Z
<p>Since there is a separate todo item <a href="http://git-annex.branchable.com/todo/external_backends/">external backends</a>, let's not
discuss that idea here.</p>
<p>key/f would have been a great idea to have had 10 years ago.
(Although it does mean that if the object file somehow gets moved out of
its directory, there's no indication in its name that it's a git-annex
object file)</p>
<p>But if that's all this todo is about, we'd need some kind of transition
plan for existing repos with history containing symlinks to key/key.
I doubt there is a good way to make that transition.</p>
re: shorter symlinks http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_8_5592ac9d1887183daf3173a0c27b672f/ amerlyq+annex 2020-06-17T01:18:32Z 2020-03-04T00:48:02Z
<p>I don't need a transition strategy or safety.
Create a tunable for new repos that disables the key directory completely (and, if that is not sustainable, even the read-only bit safety), and that is enough.
I will replay my whole history onto a new repo and take filesystem snapshots more frequently.
I can even live without base64.
Changing the defaults is undesirable, and that is understandable.
But we would still prefer to have an option, even if we have to bear the whole brunt of the consequences ourselves.
Sometimes I wonder if it is worth learning Haskell just to fork git-annex for specific needs, but the reason for such an all-encompassing endeavor seems lame <img src="http://git-annex.branchable.com/smileys/sad.png" alt=":(" /></p>
comment 9 http://git-annex.branchable.com/todo/shorter_keys_through_better_encoding/comment_9_75f611c63bf1e9c8b4885fea8e9d467f/ joey 2020-07-29T21:28:49Z 2020-07-29T21:22:42Z
<p><a href="http://git-annex.branchable.com/todo/external_backends/">external backends</a> is now implemented, so you can write a program that
makes keys use some other, shorter hash encoding.</p>
<p>I don't know if that's really sufficient to close this.</p>