I've got a repo full of (legally acquired) movies and series, most of which have corresponding metadata JSON and ASS subtitle files. When distributing them across many cold storage drives, I noticed that git-annex fills a drive up with many of the (much smaller) text files once there isn't enough space left for another video file, leaving video and subtitle files on separate drives.
This isn't a critical issue, since there are still enough copies of everything, but it'd be annoying to have to find and connect two or more drives to get the videos and subtitles for a single series.
I was wondering whether there is a clever way to prevent this from happening. Everything is organised into subfolders, so ideally git-annex could be made to operate on whole series (as defined by subdirectories, or perhaps metadata) instead of context-less files.
You could make the subtitles wanted in every repo, so that all of them are present everywhere. Since they are small, the overhead shouldn't be large.
Or you could directly add them to git ("small files") so they are also present everywhere. On a fresh repo, this would help a bit with speed too since git-annex then doesn't need to keep track of the location of these small files.
Or (depending on how you configured your preferred content) you could increase numcopies just for the small files. See backups.
Unfortunately, preferred-content can't directly relate multiple files with each other. git-annex iterates over each file in the tree and checks if preferred-content matches for that particular file.
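To illustrate why per-file evaluation splits a series across drives, here is a toy model in Python. It is not git-annex's actual code: the function name, the file names and the sizes are all made up, and the only rule modelled is "does this one file still fit?".

```python
def fill_drive(files, free_bytes):
    """Greedy per-file placement: every file is judged on its own."""
    placed = []
    for path, size in files:        # one file at a time, no notion of a series
        if size <= free_bytes:      # the only question asked: does THIS file fit?
            placed.append(path)
            free_bytes -= size
    return placed


# Hypothetical library: two series, each with one video and one subtitle file.
files = [
    ("ShowA/e01.mkv", 900),
    ("ShowA/e01.ass", 1),
    ("ShowB/e01.mkv", 900),
    ("ShowB/e01.ass", 1),
]

placed = fill_drive(files, free_bytes=1000)
print(placed)
```

In this model the ShowB video no longer fits, but its subtitle does, so the subtitle ends up on the drive without its video. Avoiding that would need a rule that looks at sibling files, which is exactly what a per-file check cannot express.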
This is probably what I'll end up doing.
I do that for one type of metadata file that isn't important for consumption, but I want everything else to be annexed files so that I can assign metadata etc. to them.
Maybe I'm missing something, but doesn't git-annex-preferred-content support metadata=field=glob?
One other option is to tar up each movie and all associated files into one archive, and annex that. There's a special remote in DataLad for accessing individual files inside annexed archives, though I guess in your case you'd normally want all files anyway.
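A minimal sketch of the archive approach, assuming each series lives in its own top-level subdirectory. The function name bundle_series and the directory layout are hypothetical, and the output directory is assumed to sit outside the library. The archives are left uncompressed, since video is already compressed.

```python
import tarfile
from pathlib import Path


def bundle_series(library: Path, out_dir: Path) -> list[Path]:
    """Create one uncompressed tar archive per top-level series directory."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for series in sorted(p for p in library.iterdir() if p.is_dir()):
        archive = out_dir / f"{series.name}.tar"
        with tarfile.open(archive, "w") as tar:
            # Adds the directory recursively: videos, subtitles and metadata.
            tar.add(series, arcname=series.name)
        archives.append(archive)
    return archives
```

The resulting .tar files could then be added with git annex add in place of the individual files. The trade-off is that per-file metadata and per-file retrieval are lost unless something like the DataLad special remote mentioned above is used.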