forum/Photos, raw and jpeg: drop jpeg if raw is available
git-annex http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/ git-annex ikiwiki 2019-01-21T15:42:51Z
comment 1 http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/comment_1_3f365138be319ac2c81bdefb3f59b0d8/ CandyAngel 2019-01-21T15:42:51Z 2018-10-15T07:33:40Z
<p>This sounds like it could be built around my own suggestion of <a href="http://git-annex.branchable.com/todo/Bidirectional_metadata/">Bidirectional metadata</a>.</p>
<p>There, Joey mentions that the datalad project uses a special remote. Perhaps one could be made where it does the JPG/NEF mapping and reports the JPG as available if the NEF is, allowing the NEF to count as a copy.</p>
<p>I wouldn't personally do this unless the JPG can be reliably reproduced from the NEF <em>exactly</em> though!</p>
comment 2 http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/comment_2_0186f593ec084e12216ad2f7da49474e/ Chris 2019-01-21T15:42:51Z 2018-10-15T09:07:44Z
Thanks for the idea. The reason I want to get rid of the jpegs is not that I see them as derivatives of the raw (for my most recent camera they are, since it can reprocess the raws), but that I consider them less important. On my notebook computer I no longer have enough space to keep all the files. Therefore I want to get rid of the less important ones, which means that later I also want to remove the raws rated below 2 stars, and the movie clips I am not currently working on. On the external hard disk, all the files are present at all times. The backup is separate; the git-annex setup is just for extending my notebook's hard disk drive, since the industry has failed for years to come up with 2.5" drives larger than 2 TB in 9.5 mm height. So I think what you are describing is another use case. Automatically recognizing raw+jpeg pairs would just be partial automation of my workflow. Or did I miss something?
comment 4 http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/comment_4_0bc307a3fbe000059b23cf681d125ce6/ CandyAngel 2019-01-21T15:42:51Z 2018-10-15T12:59:49Z
<p>Hm.. if you want to just drop the JPGs which have known RAW versions, you can set some metadata to say it has a RAW version (again, I'd just set it to the key of the RAW as it makes it easier to get/find later) and then drop JPGs which have that field set.</p>
<pre><code>git annex metadata -s derived_from_raw=SHA256E-s[...].RAW DCIM_0001.JPG
git annex drop --metadata 'derived_from_raw=*' --include '*.JPG'
</code></pre>
<p>Need to check the <code>--metadata</code> matching filter works as intended there..</p>
<p>Given that your only criterion is "file in the current directory with the same name, but a different extension", you could script the population of that metadata field.</p>
<p>This will drop the JPGs where you've indicated it has a RAW even if that RAW isn't present though. I'm not entirely sure what behavior you want..</p>
comment 4 http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/comment_4_0dc16ef1d592be0dfeb63c28014a70b2/ Chris 2019-01-21T15:42:51Z 2018-10-15T21:43:03Z
I was hoping that git-annex could recognize the files I do not want on my notebook based on the rule I described, drop them automatically as soon as a copy has arrived on the USB disk (e.g. with the next sync command), and never pull them back unless explicitly told to (in particular, <code>sync --content</code> should not transfer them back). What you are describing is basically manual tagging and dropping of the images. Of course I could write a script to drop the jpegs, but this would be very slow for 100000 images, and I was hoping for a more automated way from within git-annex.
comment 5 http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/comment_5_9eb36b5b43b77de790f99ae1d764e4ea/ CandyAngel 2019-01-21T15:42:51Z 2018-10-15T22:23:06Z
<p>You could set the preferred content to not include files with that metadata, so they would be dropped by <code>git annex drop --auto</code> and not brought back by <code>git annex sync</code>.</p>
<p>git-annex still has to handle all those files and information related to them, manually or not, so it'll be just as slow. Not that 100K files is a lot to me anymore :P (my largest annex is in the 10s of millions..).</p>
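<p>A minimal sketch of such a preferred content setting, assuming the JPGs were tagged with the <code>derived_from_raw</code> field from the earlier comment. The helper name <code>exclude_tagged_expr</code> is made up for illustration, and the git-annex commands appear only as comments because they need a real repository; the expression should be checked against the preferred content documentation before relying on it.</p>

```shell
# Hypothetical helper: build a preferred content expression that rejects
# every file carrying any value in the given metadata field.
exclude_tagged_expr() {
    printf 'not metadata=%s=*\n' "$1"
}

# Intended use inside the notebook's repository (not run here):
#   git annex wanted . "$(exclude_tagged_expr derived_from_raw)"
#   git annex drop --auto      # drops tagged JPGs once enough copies exist elsewhere
#   git annex sync --content   # should then no longer fetch them back
```

<p>Keeping the field name as a parameter means the same helper could be reused for other tags, such as a rating field, if the workflow grows.</p>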
comment 6 http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/comment_6_a34d404dbecd6730c1cc28af9c0eac26/ Chris 2019-01-21T15:42:51Z 2018-10-16T11:07:30Z
<p>That means I update the metadata periodically by a script such as</p>
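<pre><code># Tag every JPG that has a same-named CR2 next to it; skip .git, handle spaces in paths
find . -name .git -prune -o -name '*.CR2' -print | while IFS= read -r raw; do
    jpg="${raw%.CR2}.JPG"
    # annexed files can be dangling symlinks once dropped, so test -L as well
    if [ -e "$jpg" ] || [ -L "$jpg" ]; then
        # store the RAW's key in the JPG's metadata, as suggested earlier in the thread
        git annex metadata -s derived_from_raw="$(git annex lookupkey "$raw")" "$jpg"
    fi
done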
</code></pre>
<p>and add a corresponding preferred content setting? I'll try that. Btw, is there a possibility to use the xmp files of my raw converter (darktable) as a source for metadata? I want to do the same with images that have a low star rating (images with fewer than 1 star should not be synchronized to the notebook computer and should be dropped from there as soon as they are on the external disk), and I hope there is a way not to duplicate this metadata but to use the xmp output directly. The xmp files are checked into git regularly (without git-annex).</p>
comment 7 http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/comment_7_88b04371b3906f01c7b33d110838dc71/ CandyAngel 2019-01-21T15:42:51Z 2018-10-16T12:57:55Z
<p>git-annex doesn't "use" the content of files for anything, except at specific points like deriving the key. You can make it automatically copy metadata from a file into git-annex when added (see <a href="http://git-annex.branchable.com/tips/automatically_adding_metadata/">automatically adding metadata</a>) but it won't keep them in sync, as far as I know. Personally, I strip metadata from the images entirely and put everything into git-annex's metadata.</p>
<p>I would just make a script which has the behavior you want and run it when disk space is a concern, or put it in cron (it gets a list of 1-star images or JPGs with RAWs and passes it to <code>git annex drop --batch</code>).</p>
<p>Otherwise, creating a special remote that returns <code>CHECKPRESENT-SUCCESS</code> for the keys you want to drop might work: you could then set the preferred content to only want files that are "not present" in that special remote.</p>
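<p>The script approach mentioned above (collect the paths, pass them to <code>git annex drop --batch</code>) could be sketched roughly as follows. The function name <code>list_jpgs_with_raw</code> and its arguments are assumptions made for illustration; only the path listing is implemented, and the git-annex call is left as a comment.</p>

```shell
# Hypothetical helper: print every JPG under directory $1 that has a
# same-named RAW file (extension $2, default CR2) in the same directory.
list_jpgs_with_raw() {
    dir="${1:-.}"
    ext="${2:-CR2}"
    # prune .git so annexed objects that keep their extension are not scanned
    find "$dir" -name .git -prune -o -name "*.$ext" -print | sort |
    while IFS= read -r raw; do
        jpg="${raw%.$ext}.JPG"
        # annexed files can be dangling symlinks once dropped, so test -L too
        if [ -e "$jpg" ] || [ -L "$jpg" ]; then
            printf '%s\n' "$jpg"
        fi
    done
}

# Intended use from cron or by hand (not run here):
#   list_jpgs_with_raw . CR2 | git annex drop --batch
```

<p>Feeding one long-running <code>drop --batch</code> process avoids starting git-annex once per file, which matters with around 100000 images.</p>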
comment 8 http://git-annex.branchable.com/forum/Photos__44___raw_and_jpeg__58___drop_jpeg_if_raw_is_available/comment_8_b3e9100b92aaed66b22c76141f7c08d5/ Chris 2019-01-21T15:42:51Z 2018-10-16T15:15:17Z
<p>Hm, the raw converter reads and writes metadata from the “sidecar” xmp files, so it would not work to have it only in git-annex. Therefore, a full sync would be required, but luckily this is one-directional, so it should be possible. I think that's the way to go for me.</p>
<p>Thanks for your effort helping me with my stupid questions :).</p>
<p>And disk space is my main concern here, as mentioned above. I really need a solution, since I no longer knew where to put new images. Unfortunately this coincided with the birth of my 2nd child, and I caught myself taking fewer photographs of her because I did not know where to put them while keeping them accessible from the photo management/raw converter. Therefore, I decided to finally give git-annex a serious try; I have been watching the project almost since its beginning.</p>
<p>Thanks again <img src="http://git-annex.branchable.com/smileys/smile.png" alt=":)" /></p>
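<p>For the one-directional sync from the xmp sidecars into git-annex metadata discussed above, a rough sketch might look like this. It assumes the sidecar stores the rating as an attribute of the form <code>xmp:Rating="N"</code> and that sidecars are named after the image with <code>.xmp</code> appended; both are assumptions about darktable's output that should be verified. The helper name <code>xmp_rating</code> and the <code>rating</code> field name are invented for illustration.</p>

```shell
# Hypothetical helper: print the star rating stored in an XMP sidecar.
# Assumes an attribute like xmp:Rating="3"; prints nothing if none is found.
xmp_rating() {
    sed -n 's/.*xmp:Rating="\(-\{0,1\}[0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Intended use (not run here), assuming sidecars are named <image>.xmp:
#   for xmp in *.CR2.xmp; do
#       rating=$(xmp_rating "$xmp")
#       [ -n "$rating" ] && git annex metadata -s rating="$rating" "${xmp%.xmp}"
#   done
```

<p>Since the sidecars stay the source of truth, rerunning the loop after editing ratings would simply overwrite the git-annex field.</p>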