The link targets of annexed files are currently very long. This creates problems e.g. when browsing directories in Emacs (I mostly work through a text terminal). Ideally, the key would not be repeated twice in the link, but I understand this is hard to do compatibly. Maybe the following simpler alternative could be implemented? Key checksums are currently represented in base16, using only the characters 0-9a-f. The same information could be represented with shorter strings using base64url or another encoding that draws on a larger range of characters. So for each backend you'd add a corresponding one that does the same thing, but encodes the checksum part of the key as a shorter string.
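To put a number on the saving, here is a minimal Python sketch of the re-encoding idea. It is not how git-annex builds keys; the function names are made up for illustration, and dropping the trailing "=" padding is an assumption about what a shorter key would want.

```python
import base64
import hashlib


def hex_to_base64url(hex_digest: str) -> str:
    """Re-encode a base16 checksum as unpadded base64url (hypothetical helper)."""
    raw = bytes.fromhex(hex_digest)
    # Assumption: the trailing '=' padding is dropped, since it carries no information.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def base64url_to_hex(encoded: str) -> str:
    """Inverse of hex_to_base64url: restore the padding, then decode to base16."""
    padding = "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded + padding).hex()


digest = hashlib.sha256(b"example content").hexdigest()
short = hex_to_base64url(digest)
print(len(digest), len(short))  # 64 43
```

A SHA-256 checksum shrinks from 64 to 43 characters, and since the key appears twice in the link target, the target would shrink by 42 characters.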
Or, if you're tired of backend requests, maybe implement a scheme for external backends, like the one for external special remotes? For an external backend EXTNNN the user would put a script git-annex-external-backend-NNN in the path; the script would support commands like calckey and examinekey. Then I could also implement e.g. canonicalizing backends that strip away variable but semantically irrelevant information before computing the checksum.
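As a rough illustration of what a canonicalizing backend might compute, here is a Python sketch. The canonicalization rule (line endings and trailing whitespace count as irrelevant) is an invented example, and the function names are hypothetical; nothing here reflects an actual git-annex interface.

```python
import hashlib


def canonicalize(data: bytes) -> bytes:
    """Hypothetical rule: ignore line-ending style and trailing whitespace."""
    lines = data.replace(b"\r\n", b"\n").split(b"\n")
    return b"\n".join(line.rstrip() for line in lines)


def canonical_checksum(data: bytes) -> str:
    """Checksum of the canonical form, so equivalent inputs share one value."""
    return hashlib.sha256(canonicalize(data)).hexdigest()


unix_text = b"alpha\nbeta\n"
dos_text = b"alpha  \r\nbeta\r\n"
print(canonical_checksum(unix_text) == canonical_checksum(dos_text))  # True
```

Two files that differ only in such details would then map to the same key, which is the point of the idea, and also means the stored content may not be byte-identical to what was added.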
The key is in the path twice as a security measure (the write bit is removed from the directory, to prevent rm -rf'ing all the files away by mistake). This is known to cause slowness while traversing the objects directory, which is why there is repository tuning. Perhaps you want to set some of these?
I'm not convinced that git-annex should try to make the symlinks shorter just because some programs have UIs that don't work well with longer symlinks. UIs can be improved.
I like to use ls -lL, for example, which conveniently avoids displaying the symlink target and also shows the size of the annexed file. Using the MD5 backend will also give you much shorter symlinks.
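For a sense of scale, this Python sketch compares symlink target lengths for SHA256E and MD5E keys under the key/key layout. The two hash directory names are placeholders (git-annex derives them from the key), and the file size, extension, and function name are made up for the example.

```python
import hashlib


def symlink_target(key: str, depth: int = 1) -> str:
    """Build a target of the form .git/annex/objects/<h1>/<h2>/<key>/<key>."""
    hash_dirs = "Xx/Yy"  # placeholder; the real directory names are derived from the key
    return "../" * depth + f".git/annex/objects/{hash_dirs}/{key}/{key}"


content = b"example content"
sha_key = f"SHA256E-s{len(content)}--{hashlib.sha256(content).hexdigest()}.dat"
md5_key = f"MD5E-s{len(content)}--{hashlib.md5(content).hexdigest()}.dat"

for key in (sha_key, md5_key):
    print(len(key), len(symlink_target(key)))
```

Because the key appears twice, every character saved in the key saves two in the link target.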
External backends are an interesting idea, but needing to deal with the backend being missing or failing to work could have wide repercussions in the code base. It feels like too much complexity for too little gain.
I have been arguing for removal of the KEY-directory/ for a while; see e.g. https://github.com/datalad/datalad/issues/32 for an early instance. There is an issue/discussion on this website too somewhere, but I couldn't find it quickly. IMHO it is just a "tech" problem, i.e. no design principle forbids fixing it. It might lead to performance issues, though, since the containing directory would then need to be chmod'ed back and forth to introduce changes to the KEY-file under it, but that is probably very similar to what happens now anyway.
FWIW, in DataLad we moved to the MD5E backend as the default to at least somewhat relieve the burden of long symlinks. I think we are "secure" enough for what we use DataLad for here.
Since there is a separate todo item external backends, let's not discuss that idea here.
key/f would have been a great idea to have had 10 years ago. (Although it does mean that if the object file somehow gets moved out of its directory, there's no indication in its name that it's a git-annex object file)
But if that's all this todo is about, we'd need some kind of transition plan for existing repos with history containing symlinks to key/key. I doubt there is a good way to make that transition.
I need neither a strategy nor safety. Create a tunable for new repos to disable the directory sha completely (and, if that is not sustainable, even the read-only bit safety), and that is enough. I will replay my whole history onto a new repo and take fs snapshots more frequently. I can even live without base64. Changing defaults is undesirable, and that is understandable. But we would still prefer to have an option, even if we have to bear the whole brunt of the consequences ourselves. Sometimes I wonder if it is worth learning Haskell only to fork git-annex for specific needs, but the reason for such an encompassing endeavor seems lame.
External backends are now implemented, so you can write a program that makes keys use some other, shorter hash encoding.
I don't know if that's really sufficient to close this.