Recent comments posted to this site:
If you exclude them from wanted or gitignore them before ever importing from the special remote, it won't delete them. But if you already imported a tree containing the files, and then exclude them, and then export a tree, git-annex will see that the old tree contained the file, and the new tree did not, and so will delete the file.
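The export behavior described above can be pictured as a set difference between the old tree and the new one. This is only a rough sketch of that reasoning, not git-annex's actual implementation; the function name and the example paths are hypothetical.

```python
def files_deleted_on_export(previous_tree, new_tree):
    """Paths present in the previously imported tree but absent from the
    tree being exported. In this simplified model, these are the files
    that the export would delete from the special remote."""
    return sorted(set(previous_tree) - set(new_tree))


# Case 1: the file was imported first, then excluded before the export.
imported = {"a.txt", "junk/tmp.bin"}   # hypothetical paths
exported = {"a.txt"}
print(files_deleted_on_export(imported, exported))

# Case 2: the file was excluded before any import, so the old tree
# never contained it and nothing is scheduled for deletion.
print(files_deleted_on_export({"a.txt"}, {"a.txt"}))
```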
Reproduced this, and found the filename it's using with strace:
joey@darkstar:~/tmp/annex>touch '/home/joey/mnt/Music/KIRA/KIRA - The Introduction (Deluxe Edition)/KIRA - The Introduction (Deluxe Edition) - 05 Games (feat566f1-3-1a21815. '
touch: cannot touch '/home/joey/mnt/Music/KIRA/KIRA - The Introduction (Deluxe Edition)/KIRA - The Introduction (Deluxe Edition) - 05 Games (feat566f1-3-1a21815. ': Invalid argument
The problem is the trailing space in the filename, which VFAT does not support:
joey@darkstar:~/tmp/annex>touch '/home/joey/mnt/Music/KIRA/foo '
touch: cannot touch '/home/joey/mnt/Music/KIRA/foo ': Invalid argument
There was already a similar workaround of not allowing a filename to end with ".", so I made it also check for whitespace.
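As an illustration of the kind of check involved, here is a minimal sketch that flags names ending in "." or whitespace. It is not the code in git-annex, and the function name is made up for this example; what git-annex does with such a name once detected is not shown here.

```python
def has_vfat_unsafe_ending(path: str) -> bool:
    """True if the final path component ends with '.' or whitespace,
    the two endings discussed above as problematic on VFAT."""
    base = path.rsplit("/", 1)[-1]
    if not base:
        return False
    return base.endswith(".") or base[-1].isspace()


for name in ["Music/KIRA/foo ", "Music/KIRA/foo.", "Music/KIRA/foo"]:
    print(repr(name), has_vfat_unsafe_ending(name))
```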
Your hypothesis is right, it's items in the bucket with names ending in "/".
After fixing git-annex to skip and warn about those, it looks like this:
list s3-origin
Cannot import a file with a name that appears to be a directory: models/smartspim_production_models/
Cannot import a file with a name that appears to be a directory: models/smartspim_production_models/model_2_12202024/
Cannot import a file with a name that appears to be a directory: point_annotations/
Cannot import a file with a name that appears to be a directory: point_annotations/06-21-2024/
ok
Note that "models/smartspim_production_models/config.json" is a file in the bucket located "inside" the first path. So this is not a case of an empty directory being somehow stored to a S3 bucket as a file, but of something else. I have not looked at the contents of these objects, as I would likely not understand them anyway.
I couldn't think of a better method than to warn and skip them. Any name mangling would take a name that could be used by some other file. And not warning risks the user being surprised when all the data in the bucket does not get imported.
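The warn-and-skip approach can be sketched roughly as follows. This is an illustration only, with a hypothetical function name; it is not how git-annex implements the S3 import, and the only thing taken from the output above is the warning text.

```python
def split_importable(keys):
    """Separate bucket keys into importable names and names that look
    like directories (ending in '/'), warning about the latter."""
    importable, skipped = [], []
    for key in keys:
        if key.endswith("/"):
            print("Cannot import a file with a name that appears "
                  "to be a directory: " + key)
            skipped.append(key)
        else:
            importable.append(key)
    return importable, skipped


keys = [
    "models/smartspim_production_models/",
    "models/smartspim_production_models/config.json",
    "point_annotations/",
]
importable, skipped = split_importable(keys)
```

Skipping rather than renaming keeps every imported name identical to its key in the bucket, so no mangled name can collide with another object.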
(Note that this is a bug that has already been closed.)
While yes, a leading dot just means "hide it from ls", people do have a legitimate complaint when git-annex add annexes .gitattributes or a file like that. Since we don't have any other general semantic information about config files besides the leading dot, this seems to me to be the best that can be done to avoid what would otherwise be a common complaint, and turn it into an uncommon complaint.
The only other good approach seems to be the git-lfs approach, of requiring
that the user configure explicitly which files they consider large, with eg
git lfs track "*.iso"
This is the adb push command itself failing. Since git-annex has to use that command with an adb special remote, I don't see how this could be fixed in git-annex.
It seems likely that the Android device is configured to allow adb to read files in Android/data/org.opencpn.opencpn but not write to files there.
You might be able to change the permissions with root access.
This is the solution:
git annex enableremote annexA rsyncurl=host:/new/path
The configremote command changes the configuration of a remote without the remote having to be enabled for use at all, and is currently only used to change the autoenable=true configuration. For changing other configuration, the enableremote command is the thing to use.
I just spent some time trying to figure out why git-annex (a version from the end of last year) says "non-large file; adding content to git repository" while check-attr insists that largefile should apply to my huge .dandi/assets.json. Only after trying a newer git-annex did I get the reason, which it finally announced as "dotfile; adding content to git repository", and I was able to find this discussion again!
Re
But, .config/ seems to me to perfectly match what dotfiles are, which is files that are configuration that are named with a name starting with a dot in order to keep them from cluttering up ls.
As far as I know, a leading dot is just a convention for hidden files, not for config files, and such files are not even necessarily text files.
Even though dotfiles (files, not folders) are text files most of the time, I would not generalize that to dot-folders: the contents of .cache/ or .venv/ (created by uv) etc. are unlikely to be text files to begin with. Those folder names start with a dot to signal "hidden", not "text" or "small".
That is why I maintain that it remains confusing and inconsistent to have any special treatment and need for extra configuration (git annex config --set annex.dotfiles true) for the content of dot-folders. I appreciate that such a change would likely change behavior, but IMHO it might just be "for the best".
This feature would alleviate one problem I have with annex in that the path stored in annex symlinks depends on the tree a file sits in.
This makes each git object of an annexed file in a different folder unique.
If annexed files ever move, we now have a fairly useless new git object introduced into the repo.
Not at all a problem for one file but if you have tens of thousands of annexed files and you refactor, you start to notice that.
Unlocked files don't have this problem because their blobs point agnostically to the annex and key. But, of course, unlocking large numbers of files means content copies, so that's not great.
Symlink chains alleviate this because if I have a chain like .root -> ./ in the root and .root -> ../.root in essentially every directory, then annex symlinks become agnostic too.
And on the git side, that's two new objects to add, and only a new tree object when performing a move.
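To make the point concrete, here is a small sketch of why a move creates a new git object. Git stores a symlink as a blob whose content is the link target, and a blob's id is the SHA-1 of a short header plus that content. The object path below is a made-up placeholder, and the sketch assumes the annex symlink target is a relative path whose number of ../ components depends on how deep the file sits, which is a simplification.

```python
import hashlib

# Hypothetical placeholder for an annex object path (not a real key).
OBJECT = ".git/annex/objects/aa/bb/KEY/KEY"


def git_blob_id(content: bytes) -> str:
    """Git's blob id: SHA-1 over 'blob <size>\\0' followed by the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def relative_target(path: str) -> str:
    """Target that climbs one '../' per directory level of the file."""
    return "../" * path.count("/") + OBJECT


def chained_target(path: str) -> str:
    """Target that goes through the per-directory .root chain instead,
    so it does not depend on where the file sits."""
    return ".root/" + OBJECT


for path in ["a/file", "a/b/file"]:
    print(path,
          git_blob_id(relative_target(path).encode())[:8],
          git_blob_id(chained_target(path).encode())[:8])
```

With the relative target, moving the file from a/ to a/b/ changes the blob id. With the chained target, the blob id stays the same and only tree objects change.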
Again, this is only relevant when the number of files becomes massive. For a sense of scale, let's assume a symlink payload is on the order of 100 bytes. So 10,000 files generate roughly 1 MB of git objects, meaning if I had 100,000 files and moved them around once, I'd have 20 MB of data dedicated to locating these files, with 10 MB of what I would deem waste. Honestly, annex and git slow down appreciably at that scale for other reasons (pull/push/checkout, especially on slower file systems), so I'd say this is a non-issue by comparison. For those who had similar concerns, there's your benchmark: 1 MB of bloat per 10,000 files per move!
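The estimate can be checked with a few lines of arithmetic, under the same assumption of about 100 bytes per symlink object. This ignores git's compression and packing, so real numbers would likely be smaller; the function and constant names are only for illustration.

```python
SYMLINK_BYTES = 100  # assumed size of one symlink object, as in the estimate


def symlink_object_bytes(n_files: int, moves: int = 0):
    """Return (total, wasted) bytes of symlink objects after `moves`
    moves, where each move rewrites every symlink once."""
    wasted = n_files * SYMLINK_BYTES * moves
    total = n_files * SYMLINK_BYTES + wasted
    return total, wasted


print(symlink_object_bytes(10_000))             # initial objects for 10,000 files
print(symlink_object_bytes(100_000, moves=1))   # 100,000 files moved once
print(symlink_object_bytes(10_000, moves=1)[1]) # waste per 10,000 files per move
```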