Please describe the problem.
In git-annex version 10.20250721, certain non-Latin filenames, specifically those with Cyrillic characters, fail to be added, unlocked, or adjusted in repositories. The issue affects a range of filename patterns, including simple Cyrillic names, names with numbers, dashes, spaces, or special characters, and files with various extensions. This problem appears to be a regression in this version, as the same repository works perfectly with git-annex version 10.20220121.
What steps will reproduce the problem?
- Create a new git repository and initialize git-annex:
git init
git annex init
- Create test files with different Cyrillic filename patterns (both working and failing examples):
echo "test" > "ИА_2222.07.xlsx" # 2-char Cyrillic prefix - WORKS
echo "test" > "ЦППП_202206.xlsx" # no dot in date - WORKS
echo "test" > "ААА_55.22.xlsx" # different date format - WORKS
echo "test" > "ЦППП_2022.06.xlsx" # 4-char prefix + YYYY.MM - FAILS
echo "test" > "ИАИА_2222.07.xlsx" # 4-char prefix + YYYY.MM - FAILS
- Add the files:
git annex add *
- You will see that some files are successfully added, while others fail with the error:
git-annex: .git/annex/othertmp/.0: createSymbolicLink: already exists (File exists) failed
- Additionally, in existing repositories, attempts to unlock or adjust the affected files fail with errors like:
git-annex: ../.git/annex/othertmp/.22/SHA256E-s...: removeDirectoryRecursive: permission denied (Permission denied) failed
What version of git-annex are you using? On what operating system?
- git-annex version: 10.20250721 (broken)
- OS: Manjaro Linux (ext4 filesystem)
- git config:
core.quotepath=false
- Note: The issue does not occur in git-annex version 10.20220121 (tested on WSL Ubuntu).
Please provide any additional information below.
Problematic Filename Examples:
- "ЦППП_2022.06.xlsx" (4-char Cyrillic prefix with YYYY.MM date format) — fails
- "ИАИА_2222.07.xlsx" (4-char Cyrillic prefix with YYYY.MM date format) — fails
- "ДПК_2021.06-2.xlsx" (Cyrillic prefix with number and dash) — fails
- "ВУП Авто .pptx" (Cyrillic with spaces) — fails
- "Ачох_кейс.dat" (Cyrillic with underscore and special characters) — fails
Even simple non-Latin names:
- пожелания.md — fails
- обучение.xlsx — fails
- Протокол.xlsx — fails
- Согласие.docx — fails
- Грейдинг.pptx — fails
Working Examples:
- "ИА_2222.07.xlsx" (2-char Cyrillic prefix)
- "ЦППП_202206.xlsx" (no dot in date)
- "ААА_55.22.xlsx" (different date format)
- Latin-only filenames such as "IOIO_2222.07.xlsx" also work fine.
The debug output shows escaped Cyrillic sequences:
git annex --debug whereis "ЦППП_2022.06.xlsx" 2>&1 | grep ls-files
git [...] ls-files [...] "\1062\1055\1055\1055_2022.06.xlsx"
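The escaped sequences in that debug line are not a corrupted name. git-annex is written in Haskell, and Haskell's `show` renders non-ASCII characters as decimal code point escapes: `\1062` is U+0426 (Ц) and `\1055` is U+041F (П). The bash sketch below decodes UTF-8 bytes into the same form, so a file name can be matched against a debug line. `show_codepoints` is a hypothetical helper that assumes only `od` and `tr` are available, and it handles 1- to 3-byte UTF-8 sequences only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render a UTF-8 string the way Haskell's `show` renders
# non-ASCII characters, i.e. as decimal code point escapes (\1062 = U+0426).
# Simplifications: handles 1- to 3-byte UTF-8 sequences only, and omits the
# "\&" that Haskell inserts when a digit directly follows an escape.
show_codepoints() {
  local -a bytes
  # od prints one unsigned decimal per byte; -v disables duplicate folding.
  read -r -a bytes <<< "$(printf '%s' "$1" | od -An -v -tu1 | tr '\n' ' ')"
  local i=0 n=${#bytes[@]} b cp out=""
  while (( i < n )); do
    b=${bytes[i]}
    if (( b < 0x80 )); then
      # ASCII byte: append the character itself (built from an octal escape).
      out+=$(printf '%b' "\\0$(printf '%03o' "$b")")
      i=$(( i + 1 ))
      continue
    elif (( b < 0xE0 )); then
      # 2-byte sequence 110xxxxx 10xxxxxx (covers basic Cyrillic).
      cp=$(( ((b & 0x1F) << 6) | (bytes[i+1] & 0x3F) ))
      i=$(( i + 2 ))
    else
      # 3-byte sequence 1110xxxx 10xxxxxx 10xxxxxx (e.g. the em-dash).
      cp=$(( ((b & 0x0F) << 12) | ((bytes[i+1] & 0x3F) << 6) | (bytes[i+2] & 0x3F) ))
      i=$(( i + 3 ))
    fi
    out+="\\$cp"
  done
  printf '%s\n' "$out"
}

show_codepoints "ЦППП_2022.06.xlsx"
```

Run on "ЦППП_2022.06.xlsx", it prints \1062\1055\1055\1055_2022.06.xlsx, the same text as the ls-files argument in the debug output.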
- Workaround: Renaming the problematic file by adding a special character or changing the filename slightly (e.g., using an em-dash or a different date separator) resolves the issue:
mv "ЦППП_2022.06.xlsx" "ЦППП_2022.06—.xlsx" # Add em-dash
git annex add "ЦППП_2022.06—.xlsx" # This works
- Possible root cause: the temp filename generation in git-annex may run into conflicts when processing escaped Cyrillic sequences (e.g., \1062\1055\1055\1055) in filenames that have a Cyrillic prefix of 4 or more characters and a YYYY.MM date format. This causes temp filenames like "ЦП{PID}-{counter}" to conflict with existing operations.
Have you had any luck using git-annex before?
Yes, git-annex has been fantastic for managing large datasets across multiple machines, and the same repository works perfectly with an older version (10.20220121) on Ubuntu WSL. However, this issue with non-Latin filenames is a regression in the newer version. Despite this, git-annex remains an invaluable tool for distributed file management.
This issue appears to affect all Cyrillic filenames, not just the initially identified patterns, making the current version of git-annex barely usable for repositories containing non-Latin filenames.
Reproduced. Thank you for an excellent bug report.
And it is the temp filename generation causing the problem.
The cause is that relatedTemplate is returning "", which is not something the code is prepared for. That results in the ".0" directory name, and
".0" </> "" == ".0"
so it uses the same path for the temp file as for the subdirectory.
Not all Cyrillic names are affected, though: only ones that are exactly 21 bytes long in UTF-8. Longer or shorter ones are both OK.
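Because the trigger is byte length rather than character count, a quick check is to count the UTF-8 bytes of each file name. This is a minimal bash sketch. The helper name `name_bytes` is hypothetical, and the sketch relies only on `wc -c`, which counts bytes, not characters. Run over the names from the report, every failing example comes out at exactly 21 bytes. The working ones come out at 17 or 20 bytes, or 24 for the em-dash rename.

```bash
#!/usr/bin/env bash
# Hypothetical helper: length of a file name in bytes (UTF-8), not characters.
name_bytes() {
  # printf '%s' avoids a trailing newline; $(( )) strips any padding from wc.
  echo $(( $(printf '%s' "$1" | wc -c) ))
}

# Names taken from the report: working ones first, then failing ones.
for f in "ИА_2222.07.xlsx" "ЦППП_202206.xlsx" "ААА_55.22.xlsx" \
         "IOIO_2222.07.xlsx" "ЦППП_2022.06—.xlsx" \
         "ЦППП_2022.06.xlsx" "ДПК_2021.06-2.xlsx" "ВУП Авто .pptx" \
         "Ачох_кейс.dat" "пожелания.md" "Протокол.xlsx"; do
  n=$(name_bytes "$f")
  if (( n == 21 )); then
    printf '%3d  %s  <- 21 bytes\n' "$n" "$f"
  else
    printf '%3d  %s\n' "$n" "$f"
  fi
done
```

To scan an existing working tree instead, the same helper can be applied to the basename of each path listed by `git ls-files -z`.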
The reason is that relatedTemplate wants to reserve 20 bytes for the random part of the temp filename. With a 21-byte filename, that means it wants to truncate the name to 1 byte. But that lands in the middle of the first Unicode character (each Cyrillic letter is 2 bytes in UTF-8), which is not allowed, so it truncates to 0 bytes instead.
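The truncation step can be modelled outside git-annex. This bash sketch is for illustration only and is not git-annex's Haskell relatedTemplate. Following the description above, it assumes that a name longer than 20 bytes keeps its first (length - 20) bytes, and that a cut landing inside a multi-byte character backs off to the previous character boundary. It also assumes, without confirmation from the source, that shorter names are used whole. `template_prefix` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical model of the template truncation described above; this is
# NOT git-annex's implementation. Assumptions: 20 bytes are reserved for
# the random part, so a longer name keeps (length - 20) bytes, and a cut
# inside a multi-byte UTF-8 character backs off to the previous boundary.
template_prefix() {
  local name=$1 len keep b
  len=$(( $(printf '%s' "$name" | wc -c) ))
  if (( len <= 20 )); then
    # Assumption: short names are used whole.
    printf '%s\n' "$name"
    return
  fi
  keep=$(( len - 20 ))
  # The byte at offset $keep is the first one dropped. If it is a UTF-8
  # continuation byte (10xxxxxx), the cut splits a character, so back off.
  while (( keep > 0 )); do
    b=$(( $(printf '%s' "$name" | od -An -tu1 -j "$keep" -N 1) ))
    if (( (b & 0xC0) != 0x80 )); then
      break   # keep now sits on a character boundary
    fi
    keep=$(( keep - 1 ))
  done
  printf '%s' "$name" | head -c "$keep"
  printf '\n'
}

# 21 bytes (from the report), a made-up 22-byte variant, and the 24-byte rename.
for f in "ЦППП_2022.06.xlsx" "ЦППП_2022.06.xlsxx" "ЦППП_2022.06—.xlsx"; do
  printf '[%s]  %s\n' "$(template_prefix "$f")" "$f"
done
```

Under these assumptions, the 21-byte name yields an empty prefix, which is the case the fix addresses. The 22-byte variant yields Ц, and the 24-byte em-dash rename yields ЦП. That last result is consistent with the "ЦП{PID}-{counter}" temp names mentioned in the report.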
I've fixed this bug.