I'm trying to use git-annex to manage my photographs. With most cameras I shoot raw+jpeg, but sometimes, or from some sources, there are only jpegs. Is there a way to automatically detect the raw+jpeg pairs (the rule would be: same basename, same directory, matching raw and jpeg extensions) and drop these jpegs from my notebook's repository, while keeping the jpegs that have no complementing raw?
This sounds like it could be built around my own suggestion of Bidirectional metadata.
There, Joey mentions that the datalad project uses a special remote. Perhaps one could be made where it does the JPG/NEF mapping and reports the JPG as available if the NEF is, allowing the NEF to count as a copy.
I wouldn't personally do this unless the JPG can be reliably reproduced from the NEF exactly though!
Hm.. if you want to just drop the JPGs which have known RAW versions, you can set some metadata to say it has a RAW version (again, I'd just set it to the key of the RAW as it makes it easier to get/find later) and then drop JPGs which have that field set.
Need to check the `--metadata` matching filter works as intended there..
Given that your criterion is only "file in the current directory with the same name, but different extension", you could script the population of that metadata field.
This will drop the JPGs where you've indicated it has a RAW even if that RAW isn't present though. I'm not entirely sure what behavior you want..
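To make the "script the population of that metadata field" idea concrete, here is a minimal sketch in Python. The extension lists and the field name `hasraw` are assumptions made for illustration, not anything git-annex defines. For simplicity it stores the RAW's file name in the field; storing the RAW's key instead (as suggested above) would need an extra `git annex lookupkey` call per file. It only builds the `git annex metadata --set` command lines, so you can inspect them before running anything.

```python
import os

# Assumed extension sets; adjust them to the cameras you actually use.
RAW_EXTS = {".nef", ".cr2", ".arw", ".dng"}
JPEG_EXTS = {".jpg", ".jpeg"}


def find_pairs(root):
    """Yield (jpeg_path, raw_path) for files in the same directory
    that share a basename and have matching raw/jpeg extensions."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the repository's own .git directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        # Map basename -> raw file name for this directory only.
        raws = {}
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext.lower() in RAW_EXTS:
                raws[stem] = name
        for name in sorted(filenames):
            stem, ext = os.path.splitext(name)
            if ext.lower() in JPEG_EXTS and stem in raws:
                yield os.path.join(dirpath, name), os.path.join(dirpath, raws[stem])


def tag_commands(root, field="hasraw"):
    """Build one `git annex metadata --set` command per paired JPEG.
    Paths are relative to root, so run the commands with cwd=root."""
    commands = []
    for jpeg, raw in find_pairs(root):
        value = f"{field}={os.path.basename(raw)}"
        commands.append(["git", "annex", "metadata", "--set", value,
                         os.path.relpath(jpeg, root)])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, cwd=root, check=True)`. JPEGs without a RAW next to them are never tagged, so they stay untouched by anything that later matches on the field.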
(... `sync --content` should not transfer them back). What you are describing is basically a manual tagging and dropping of the images. Of course I could write a script to drop the jpegs, but this would be very slow for 100000 images, and I was hoping for some more automated way from within git-annex.
You could set the preferred content to not include files with that metadata, so they would be dropped by `git annex drop --auto` and not brought back by `git annex sync`.
git-annex still has to handle all those files and information related to them, manually or not, so it'll be just as slow. Not that 100K files is a lot to me anymore :P (my largest annex is in the 10s of millions..).
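As a rough illustration of the preferred content idea, the sketch below builds an expression that wants everything except files carrying a given metadata field, plus the two commands that would apply it on the notebook. The field name `hasraw` is an assumption carried over for illustration, and `include=*` stands in for whatever expression the repository uses today, so treat this as a starting point rather than a recommended setting.

```python
import subprocess


def wanted_expression(field="hasraw"):
    """Preferred content expression: want every file except those
    that have any value set for the given metadata field."""
    return f"include=* and not metadata={field}=*"


def apply_commands(field="hasraw"):
    """Commands to set the expression for the local repository (".")
    and then drop whatever is no longer wanted."""
    return [
        ["git", "annex", "wanted", ".", wanted_expression(field)],
        ["git", "annex", "drop", "--auto"],
    ]


def run_all(commands, cwd="."):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

`git annex drop --auto` still respects numcopies, so a tagged JPEG is only dropped if enough other copies of that JPEG exist elsewhere.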
That means I update the metadata periodically by a script such as
and add a corresponding preferred content setting? I'll try that. Btw, is there a possibility to use the xmp files of my raw converter (darktable) as a source for metadata? I want to do the same with images that have a low star rating (images with <1 star should not be synchronized to the notebook computer, and should be dropped from there as soon as they are on the external disk), and I hope there is a way to use the xmp output directly instead of duplicating this metadata. The xmp files are checked into git regularly (without git-annex).
git-annex doesn't "use" the content of files for anything, except at specific points like deriving the key. You can make it automatically copy metadata from a file into git-annex when added (see automatically adding metadata) but it won't keep them in sync, as far as I know. Personally, I strip metadata from the images entirely and put everything into git-annex's metadata.
I would just make a script which has the behavior you want and run it when disk space is a concern, or put it in cron (it gets a list of 1-star images or JPGs with RAWs and passes it to `git annex drop --batch`).
Otherwise, creating a special remote that returns `CHECKPRESENT-SUCCESS` for keys you want to drop, so you can set the preferred content to only want files that are "not present" in that special remote, might work?
Hm, the raw converter reads and writes metadata from the “sidecar” xmp files, so it would not work to have it only in git-annex. Therefore, a full sync would be required, but luckily this is one-directional, so it should be possible. I think that's the way to go for me.
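The one-directional sync from sidecar to git-annex metadata could be sketched as below. This assumes the sidecar stores the star rating as `xmp:Rating` (either as an attribute or as a child element of `rdf:Description`), which is the usual XMP convention but should be checked against your own darktable files. The field name `rating` is a hypothetical choice.

```python
import xml.etree.ElementTree as ET

XMP_NS = "http://ns.adobe.com/xap/1.0/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def read_rating(xmp_bytes):
    """Return the integer xmp:Rating found in an XMP sidecar, or None
    if no usable rating is present."""
    root = ET.fromstring(xmp_bytes)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # The rating may be an attribute or a child element.
        value = desc.get(f"{{{XMP_NS}}}Rating")
        if value is None:
            child = desc.find(f"{{{XMP_NS}}}Rating")
            value = child.text if child is not None else None
        if value is not None:
            try:
                return int(value.strip())
            except ValueError:
                return None
    return None


def rating_command(image_path, rating, field="rating"):
    """Build the command that copies the rating into git-annex metadata.
    --set replaces any old value, so re-running keeps the field current."""
    return ["git", "annex", "metadata", "--set", f"{field}={rating}", image_path]
```

Running this periodically over the sidecars (from cron, or after an editing session) keeps the git-annex field following the XMP file, never the other way round.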
Thanks for your effort helping me with my stupid questions :).
And disk space is my main concern here, as mentioned above. I really need a solution, since I did not know where to put new images, which unfortunately coincided with the birth of my 2nd child, and I caught myself taking fewer photographs of her because I did not know where to put them (while still having them accessible from the photo management software/raw converter). Therefore, I decided to finally try git-annex seriously, which I have been watching almost since the beginning of the project.
Thanks again