Please describe the problem.
My whole setup on all of my systems uses iso-8859-1. There is no use of UTF-8 (simply because my brain is not compatible with it).
I have metadata with : in it. That works perfectly, and as it is a low char (0x3a), it should be displayed correctly. Unfortunately, when I create a view, that char gets displayed as ï¹?, which is completely illogical.
What steps will reproduce the problem?
- Create metadata with : in it
- Create a view with that metadata
What version of git-annex are you using? On what operating system?
10.20230126, Devuan
Please provide any additional information below.
~> locale
LANG=de_DE
LANGUAGE=de_DE:de:en
LC_CTYPE=de_DE
LC_NUMERIC=de_DE
LC_TIME=de_DE
LC_COLLATE=de_DE
LC_MONETARY=de_CH
LC_MESSAGES=de_DE
LC_PAPER=de_DE
LC_NAME=de_DE
LC_ADDRESS=de_CH
LC_TELEPHONE=de_DE
LC_MEASUREMENT=de_DE
LC_IDENTIFICATION=de_DE
LC_ALL=
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
yes, that worked in the past (I think)
The character is actually ﹕
That is used in views because on windows, colon is a special character, and putting it in the name of a viewed file would prevent checking out the view.
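The garbled output in the report is consistent with the UTF-8 bytes of that character being displayed by an iso-8859-1 terminal. A minimal Python sketch of this, assuming the character is U+FE55 (SMALL COLON); the variable names are only illustrative:

```python
# Assumption: the character used in the view filename is U+FE55 SMALL COLON.
small_colon = "\ufe55"

# Encode it the way it would be stored in a UTF-8 filename.
utf8_bytes = small_colon.encode("utf-8")
print(utf8_bytes.hex(" "))  # ef b9 95

# An iso-8859-1 terminal treats every byte as one character.
as_latin1 = utf8_bytes.decode("iso-8859-1")
print([f"U+{ord(c):04X}" for c in as_latin1])  # ['U+00EF', 'U+00B9', 'U+0095']
```

U+00EF and U+00B9 are ï and ¹. The third byte, 0x95, is a control code in iso-8859-1, which a terminal would typically show as a placeholder such as ?.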
It would be possible to skip that on linux, but note that it also escapes / on linux with a unicode equivalent (for similar reasons).
Also, the linux executable can sometimes be run on a windows system through WSL. I'm not sure how a : in a filename would be handled in that situation.
That is what locales are for.
Why not just use them correctly? It is simply wrong to use utf8 characters in non-utf8 environments.
I think the correct handling would be to stay with the user's choice (the locale and the raw text the user entered for the metadata) and mangle those chars only on broken systems like windows.
It's perfectly fine to use unicode in filenames at any time. Files have the name they have no matter how you configure your locale.
If git renamed unicode files when cloning a repository, just because the current locale did not support unicode, it would be broken.
If git-annex metadata contains unicode and you enter a view, git-annex is operating acceptably when it preserves that unicode in the viewed filename.
Maybe git-annex could try to transliterate unicode in viewed filenames in some way to work better in non-unicode locales. But the locale can change. And git-annex needs to be able to reverse view filenames back to the filename used on the viewed branch. So it's not practical to vary the view filenames to fit the locale, because that would prevent that reversing from working unless it had a way to determine the locale that was in use when the view was generated.
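To illustrate the reversing requirement, here is a rough Python sketch of a fixed, locale-independent substitution that can be undone later. It is not git-annex's actual code: the table `ESCAPES` and the functions `to_view_name` and `from_view_name` are hypothetical, and the choice of U+2215 for the slash is an assumption, since the exact slash character is not named in this thread.

```python
# Hypothetical substitution table, for illustration only.
ESCAPES = {
    ":": "\ufe55",  # SMALL COLON, the character seen in this report
    "/": "\u2215",  # DIVISION SLASH (assumed stand-in for the unicode slash)
}
# Inverse table, used to map a view filename back to the metadata value.
UNESCAPES = {replacement: char for char, replacement in ESCAPES.items()}


def to_view_name(metadata_value: str) -> str:
    """Replace characters that are unsafe in filenames, ignoring the locale."""
    return "".join(ESCAPES.get(c, c) for c in metadata_value)


def from_view_name(view_name: str) -> str:
    """Undo to_view_name; this works because the table never varies."""
    return "".join(UNESCAPES.get(c, c) for c in view_name)


value = "a:b/c"
name = to_view_name(value)
assert ":" not in name and "/" not in name
assert from_view_name(name) == value
```

Because the table does not depend on the locale, a name written under one locale can still be reversed under another. This sketch does not handle metadata that already contains one of the replacement characters; that would need an additional escaping rule.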
git-annex has to replace the / character with something when generating a viewed file from metadata that contains that character. It used to use %, since that at least contains a slash, but I didn't think that was very readable. The unicode slash character it uses is very readable for the vast majority of users who are not stuck with 1980's era displays.
Sorry, it's simply a tradeoff between you and everyone else.
But in this case, the metadata does NOT contain unicode/utf-8/whatever exotic charset. It contains ASCII, plain ASCII, and git-annex is doing something with it that is not expected.
It gets even stranger. I tried to rename the files, replacing the strange chars with just a :. Git was happy with it, my eyes were happy with it, my filesystem was happy with it, but git-annex now added a second tag, in addition to the content: that is already there; it created the same key with a value of ï¹?. So it is even inconsistent in that case of renaming.
Another option is to abort and tell the user that this char is not allowed in the current filesystem. Or give the user an opportunity to replace it with his own choice.
But I did not speak of the slash, I spoke about a colon (:). That is a fully legal character in mature filesystems (most of them). With those windows filesystems you mentioned, you have other troubles that are even more important, like symlinks...
Well, the majority would use git-annex on proper filesystems. It is only a minority that still uses such broken filesystems that do not allow a colon.