Please describe the problem.
Mimeencoding detection doesn't work for files with Cyrillic filenames.
What steps will reproduce the problem?
I have
git annex config --set annex.largefiles 'mimeencoding=binary and largerthan=0'
So I expect all binary files to be annexed.
But I have a JPEG file with Cyrillic letters in its filename: привет.jpg
$ file --mime-encoding привет.jpg
привет.jpg: binary
$ git annex add привет.jpg --verbose
add привет.jpg (non-large file; adding content to git repository) ok
(recording state in git...)
$ mv привет.jpg hello.jpg
$ git annex add hello.jpg --verbose
add hello.jpg
ok
(recording state in git...)
What version of git-annex are you using? On what operating system?
git-annex 10.20220223-g8f6b52b77
Windows 11
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
I just started using it and I love it. I like it more than git LFS
Is your git-annex built with support for mime type detection? Post the output of
git-annex version
I don't think it's possible that a filename can affect this. I'd only believe that if you showed me the same content, in a file without a Cyrillic filename, being treated differently.
Also, when I try feeding all the data into
git-annex matchexpression
, it behaves as expected.
You could try the same command to see if your git-annex behaves differently.
Thanks for your response.
Your test for matchexpression doesn't show anything because you put
--mimeencoding=binary
part in it. So obviously the expression will match.
I prepared a simple script to reproduce the bug.
It behaves as I described before. The file with Cyrillic letters is added as non-binary.
I think it proves that something is not right with how the MagicMime library deals with Cyrillic filenames.
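The point about matchexpression can be made concrete with a toy model. This is only a sketch of the logic, not git-annex's implementation: the function `matches_largefiles` and the value `unknown` for a failed detection are assumptions made up for illustration. It shows that when the encoding is supplied by hand, the detection step is never exercised, so a match says nothing about whether detection works.

```shell
#!/bin/sh
# Toy model of the expression 'mimeencoding=binary and largerthan=0'.
# $1: MIME encoding, $2: size in bytes. Hypothetical helper, not a git-annex API.
matches_largefiles() {
    [ "$1" = binary ] && [ "$2" -gt 0 ]
}

# Assumed stand-in for a detection step that failed on this file.
detected_encoding=unknown

# Encoding supplied by hand (as with --mimeencoding=binary): detection is bypassed.
if matches_largefiles binary 1024; then supplied=match; else supplied=nomatch; fi

# Encoding taken from the (failed) detection step.
if matches_largefiles "$detected_encoding" 1024; then detected=match; else detected=nomatch; fi

echo "supplied: $supplied, detected: $detected"
```

With these assumed inputs the supplied case matches and the detected case does not, which is the difference between the matchexpression test and what git annex add reported.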
Thanks, I think you must be right that there is some form of mojibake involved here that is preventing Cyrillic filenames from being passed through to libmagic, but only on Windows.
The windows support page talks about filename encoding problems on Windows, and this is probably one of those.
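To illustrate the kind of failure suspected here, the sketch below shows what happens in general when a filename's bytes are altered before a file is looked up by name. It is not a trace of what git-annex or libmagic actually does on Windows. The helper `garble` and its replacement of every non-ASCII byte with `?` are assumptions chosen only to mimic one possible lossy conversion.

```shell
#!/bin/sh
# Hypothetical helper: replace every non-ASCII byte with '?', mimicking one
# possible way a lossy encoding conversion can damage a filename.
garble() {
    printf '%s' "$1" | LC_ALL=C tr '\200-\377' '?'
}

workdir=$(mktemp -d)
name='привет.jpg'

# The first four bytes of a JPEG stand in for binary content.
printf '\377\330\377\340' > "$workdir/$name"

garbled=$(garble "$name")

# The original name resolves to the file; the garbled name does not.
if [ -e "$workdir/$name" ]; then orig_found=yes; else orig_found=no; fi
if [ -e "$workdir/$garbled" ]; then garbled_found=yes; else garbled_found=no; fi

echo "original: $orig_found, garbled: $garbled_found"
rm -rf "$workdir"
```

If something similar happens to the name before it reaches the library, the library would be asked about a file that does not exist, and no encoding could be reported for it. That would be consistent with the file being treated as non-large, though the actual cause on Windows is not confirmed here.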