Currently, in a git annex view, filenames take the form basename%path%.ext. I understand that this is a carefully drafted mapping to allow changes to be merged back into metadata. However, maybe it would be possible to make the separator ('%') and the order (e.g. path%basename.ext instead) configurable?
git-annex really doesn't care what filenames are used within a view. It only needs to ensure that each file gets a unique filename. That is why the directory is included in the filename: to avoid conflicts if 2 files with the same name appear in different directories.
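To make the uniqueness argument concrete, here is a small hypothetical Python sketch of folding the directory into the filename. The name_%dir%sub%.ext layout (with % between nested directory components) is an assumption modelled loosely on the format discussed in this thread, and view_filename is an illustrative name, not git-annex's actual code.

```python
from pathlib import PurePosixPath


def view_filename(path: str) -> str:
    """Hypothetical sketch: fold a file's directory into its view filename.

    "dir/sub/name.ext" becomes "name_%dir%sub%.ext"; a file at the top
    level keeps its name unchanged.
    """
    p = PurePosixPath(path)
    dirs = p.parent.parts  # () for a top-level file
    if not dirs:
        return p.name
    return f"{p.stem}_%{'%'.join(dirs)}%{p.suffix}"


# Two files that share a basename stay distinct once their directories are folded in.
a = view_filename("photos/2023/cat.jpg")
b = view_filename("scans/cat.jpg")
print(a)  # cat_%photos%2023%.jpg
print(b)  # cat_%scans%.jpg
```

Because the whole directory path is embedded, no two source paths can map to the same view name, which is all git-annex needs here; the separator and ordering are otherwise arbitrary.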
It would probably be better to make it avoid needing to include the directory in the filename unless there is such a conflict, rather than adding complexity configuring that.
However, since views are currently built by streaming the contents of the branch to git update-index, git-annex can't just, e.g., examine the working tree to see if a conflicting file exists. It seems it would need to keep a map of the files it has added to the view branch so far, and check against the map. But that would make memory use scale with the number of files in the view, which I'd prefer to avoid.
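A minimal sketch of the map-based approach, assuming paths arrive one at a time from the branch: the directory is only folded in when the plain basename is already taken. The names stream_view_names and path_form are hypothetical, the name_%dir%.ext layout is assumed, and the Python set stands in for the map; it gains an entry for every file, which is exactly the memory cost described above.

```python
from pathlib import PurePosixPath
from typing import Iterable, Iterator


def path_form(path: str) -> str:
    """Hypothetical disambiguated name: "dir/sub/name.ext" -> "name_%dir%sub%.ext"."""
    p = PurePosixPath(path)
    return f"{p.stem}_%{'%'.join(p.parent.parts)}%{p.suffix}"


def stream_view_names(paths: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (source path, view name) pairs one at a time, as a streaming
    generator would feed them to git update-index.

    The directory is only folded into the name when the plain basename has
    already been used. `seen` grows by one entry per file, so memory is
    O(number of files in the view).
    """
    seen: set[str] = set()
    for path in paths:
        name = PurePosixPath(path).name
        if name in seen:
            name = path_form(path)
        seen.add(name)
        yield path, name


for src, view in stream_view_names(["a/cat.jpg", "b/cat.jpg", "b/dog.jpg"]):
    print(src, "->", view)
# a/cat.jpg -> cat.jpg
# b/cat.jpg -> cat_%b%.jpg
# b/dog.jpg -> dog.jpg
```

One consequence of streaming is that which file keeps the short name depends on the order in which the paths arrive.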
I'm going to move this from bugs to todo.
This name change shouldn't be necessary on a view that keeps the directory structure from master: git annex view todo=* "/=*"
This sounds like another use case for bloom filters.
True, it could use a bloom filter.
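A Bloom filter fits here because its only failure mode is a false positive, which would merely cause a directory to be included when it was not needed; it never reports a name as unused when it has in fact been used. The sketch below is a hypothetical illustration (the class, the salted SHA-256 hashing and the 2^20-bit size are my assumptions), and it assumes, as the current naming does, that the directory-qualified name is unique. Memory stays fixed however many files are in the view, while the false-positive rate rises as the view grows.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Iterable, Iterator


class BloomFilter:
    """Fixed-size set approximation: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            # Derive k hash positions by salting SHA-256 with the index.
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def stream_view_names(paths: Iterable[str], bloom: BloomFilter) -> Iterator[tuple[str, str]]:
    for path in paths:
        p = PurePosixPath(path)
        name = p.name
        if bloom.might_contain(name):
            # A real conflict or a false positive: either way, fall back to
            # the directory-qualified form (hypothetical name_%dir%.ext layout).
            name = f"{p.stem}_%{'%'.join(p.parent.parts)}%{p.suffix}"
        bloom.add(name)
        yield path, name


names = dict(stream_view_names(["a/cat.jpg", "b/cat.jpg"], BloomFilter()))
print(names)  # {'a/cat.jpg': 'cat.jpg', 'b/cat.jpg': 'cat_%b%.jpg'}
```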
I had not thought of "/=*" (or forgot about it). Views could special-case that and use the original paths. That's getting very close to adjusted branch territory, and I want to rewrite the view branch generation code to use adjusted branches eventually (so changes made in the view branch can be propagated back out to the source branch, and so view branches can be updated when the source branch changes).

Has the format been changed since this was previously asked? I am currently trying to leverage git-annex and its metadata views with AI tooling, but the format seems to be filename_%path%, resulting in the extension being in the middle of the path. I have set annex.maxextensionlength to 12 so the extensions are present on the files in the backend. Whereas I would expect (or rather, I am trying to achieve):
@Xyem no, it's unchanged. But annex.maxextensionlength does not currently configure the extension length here. I think it probably should.
So if my understanding is correct, the file paths generated for this view should be something like

sd/v1.5_%model%.safetensors

but as annex.maxextensionlength isn't being considered here, it doesn't realise that "safetensors" is the extension?

Unfortunately, the software only regards files with certain extensions as usable, so I will be unable to use metadata views for now. I've set up separate branches and will copy symlinks between branches in the meantime.
I've made git-annex view use annex.maxextensionlength. Note that refining an existing view will reuse the extension length that was configured when the view was initially constructed.
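The effect of the extension length on a view filename can be sketched as follows. This is a simplified, hypothetical stand-in, not git-annex's actual extension logic: it only looks at the last dot-suffix, accepts it if it is alphanumeric and no longer than the limit (the dot is not counted), and reuses the name_%dir%.ext layout from the sd/v1.5_%model%.safetensors example above, producing only the filename part (not the sd/ view directory). The default of 4 mirrors the documented default of annex.maxextensionlength.

```python
from pathlib import PurePosixPath


def split_extension(filename: str, max_ext_len: int = 4) -> tuple[str, str]:
    """Hypothetical, simplified extension detection: treat the last ".suffix"
    as the extension only if it is alphanumeric and at most max_ext_len
    characters long (the dot is not counted)."""
    stem, dot, ext = filename.rpartition(".")
    if dot and stem and 0 < len(ext) <= max_ext_len and ext.isalnum():
        return stem, "." + ext
    return filename, ""


def view_filename(path: str, max_ext_len: int = 4) -> str:
    """Fold the directory into the name, keeping any detected extension last."""
    p = PurePosixPath(path)
    base, ext = split_extension(p.name, max_ext_len)
    dirs = "%".join(p.parent.parts)
    return f"{base}_%{dirs}%{ext}" if dirs else p.name


print(view_filename("model/v1.5.safetensors"))      # v1.5.safetensors_%model%
print(view_filename("model/v1.5.safetensors", 12))  # v1.5_%model%.safetensors
```

With the limit at 4, "safetensors" (11 characters) is not recognised, so it ends up before the folded-in directory; raising the limit to 12 moves it back to the end.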