tips/automatically adding metadatagit-annexhttp://git-annex.branchable.com/tips/automatically_adding_metadata/git-annexikiwiki2021-09-03T15:57:18Zcomment 1http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_1_ffc308cc6aedabbc55820db4f401e0fb/Jimmy2014-03-03T07:44:30Z2014-03-03T07:44:29Z
http://projects.iq.harvard.edu/fits might be an even better choice than libextractor. We use it at work and it's not too bad, but it can be slow to start up due to the JVM.
comment 2http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_2_bd64a53914107bc000c887b4d4bdf6af/anarcat [id.koumbit.net]2014-04-01T04:18:10Z2014-04-01T04:18:10Z
<p>is there a way for this to be done globally, without having to install and configure the hook for each repository? it seems like a fairly useful feature that could be factored into git-annex itself (as opposed to being shipped as a shell script)...</p>
<p>also, is there a way to retroactively parse the tags from existing files (as opposed to only new files added to the repo)?</p>
<p>thanks</p>
comment 3http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_3_02e5314f827d17d482343e8f22c42fd9/joeyh.name2014-04-17T20:15:07Z2014-04-17T20:15:07Z
<p>@anarcat, I have modified <a href="http://git-annex.branchable.com/tips/automatically_adding_metadata/pre-commit-annex">pre-commit-annex</a> so if it's passed already annexed files, it'll extract their metadata.</p>
<p>So this can be used to add metadata to files added before you installed the hook, or if you've configured more fields to be extracted.</p>
direct mode pre-commit hooks [on windows]http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_4_cd3c8e2f45db93576d1b82cfbfbe601b/Michele2015-01-20T12:43:24Z2015-01-20T12:43:24Z
<p>seemingly pre-commit hooks are not being called on windows; could it have to do with git annex sync bypassing them when doing commits?</p>
<p>on the other hand, genmetadata works, although that is not enough for me, since I want to preserve the complete last modification date/time. I was in the process of modifying the supplied pre-commit script to call "stat -c %Y" (which btw works fine on windows, while the latest binaries for extract fail there).</p>
<p>am I correct in assuming that direct mode [on windows at least] bypasses hooks [namely pre-commit as well as pre-commit-annex]?</p>
comment 5http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_5_888f0a77405d616a0d51a6176b44f605/joey2015-01-20T18:55:59Z2015-01-20T16:52:28Z
<p>@Michele git annex sync in a direct mode repository does bypass the
pre-commit hook. However, it will try to run the pre-commit-annex hook.</p>
<p>Most likely, the hook script does not appear executable on Windows, so
git-annex cannot run it.</p>
comment 6http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_6_34f0c55d09ddee3de642f6b25a9f6269/joey2015-01-20T18:55:59Z2015-01-20T17:19:34Z
<p>@Michele after testing, git-annex on Windows seems to not see a file that
does have the executable bit set as executable. I have opened a bug report
<span class="createlink">windows isExecutable fail</span>, and worked around the problem now.</p>
pre-commit is OK on windows now - auto adding last mod datetimehttp://git-annex.branchable.com/tips/automatically_adding_metadata/comment_7_94877b21bf80374c2874b971f26f0e55/Michele2015-01-30T11:48:24Z2015-01-30T11:48:24Z
<p>@Joey just tested a nightly build and now pre-commit-annex is called, and with my modifications it autoadds last modified times for content.
Trivially it's just the matter of adding:</p>
<pre><code>field="datemod"
value=$(stat -c %Y "$f")
addmeta "$f" "$field" "$value"
</code></pre>
<p>to the body of the process() function in the supplied pre-commit-annex script.
thanks</p>
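<p>A minimal sketch of the timestamp lookup described above, assuming GNU coreutils (<code>stat -c</code> and <code>date -d @SECONDS</code> are GNU extensions). The helper names <code>file_mtime</code> and <code>file_mtime_iso</code> are made up for illustration and are not part of the supplied script. The filename is quoted throughout so that names containing spaces or quotes are passed intact.</p>

```shell
#!/bin/sh
# Hypothetical helper (not in the supplied hook): print a file's last
# modification time as seconds since the epoch.
# Assumes GNU stat; -c is not a POSIX option.
file_mtime() {
    stat -c %Y -- "$1"
}

# Hypothetical helper: the same timestamp as an ISO 8601 UTC string,
# which may be easier to read back than raw seconds.
# Assumes GNU date; the -d @SECONDS form is a GNU extension.
file_mtime_iso() {
    date -u -d "@$(file_mtime "$1")" +%Y-%m-%dT%H:%M:%SZ
}
```

<p>Inside process(), the value could then be set with <code>value=$(file_mtime "$f")</code> before the addmeta call.</p>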
hook and quotinghttp://git-annex.branchable.com/tips/automatically_adding_metadata/comment_8_3880fb7f13e74d33d9c4e86772cc6b0e/woffs2017-12-09T11:18:10Z2017-12-09T11:18:10Z
<p>Hi,</p>
<p>just detected that the provided pre-commit-annex script is broken for filenames containing a <code>'</code> (single quote)</p>
comment 9http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_9_db9c2fa8545188520d4bdda5ba545624/joey2017-12-11T18:27:42Z2017-12-11T18:23:58Z
<p>@woffs hmm, the hook script seems to quote every use of the filename,
which should avoid such problems. I tested it, using both extract
and exiftool, and a file named "foo'bar.jpg", and it worked fine.</p>
<p>If you have a case where it does not work, suggest you file a bug
report with enough information to reproduce the problem.</p>
adding support for additional metadata tools?http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_10_cf770ba8eed7963f08517877bd460d3f/max2020-06-17T01:18:32Z2020-02-10T04:45:55Z
<p>hi there, sincere gratitude for your work on git-annex.</p>
<p>is it worth considering a more modular setup (maybe a python script <img src="http://git-annex.branchable.com/smileys/smile.png" alt=":)" /> or even haskell) for working with metadata extractors?</p>
<p>currently it really only supports field-based filtering extractors. i am particularly interested in using (minimally):</p>
<p>mp3info - offers a formatting string for printing but doesn't fit super nicely into the "want" fields in shell script. could probably be hacked in.</p>
<p>mp4info - does not seem to offer any native parameters for filtering; would likely need some engineering/thought about how to take in all possible fields and then filter on those.</p>
<p>alternatively i could just write some personal scripts here, but just thought others would find it useful for auto extracting content from mp3/m4a files. extract doesn't seem to perform as well on these as, say, .flac files.</p>
<p>thanks again!</p>
re: adding support for additional metadata tools?http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_11_6eeb21b66aa3541491ddc0dd3058ddc7/joey2020-06-17T01:18:32Z2020-02-17T16:55:20Z
<p>@max, notice that the hook script contains special case handling for
exiftool, including a config option for it. That was contributed by
Klaus Ethgen. I'd be inclined to merge patches that add handling
for other tools.</p>
<p>I imagine you could add config settings for the format string etc.</p>
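<p>For a tool that prints every field with no filtering option, as described for mp4info above, one approach is to filter its output afterwards. The sketch below is only an illustration: the function name <code>filter_fields</code> is hypothetical, and the assumed input format (one <code>Name: value</code> pair per line) would need checking against the real tool's output.</p>

```shell
#!/bin/sh
# Hypothetical helper (not in the supplied hook): read "Name: value"
# lines on stdin and print "name=value" for each field whose name,
# compared case-insensitively, appears in the space-separated list $1.
# Assumption: the extractor prints one "Name: value" pair per line.
# Limitation: field names containing spaces cannot be requested.
filter_fields() {
    awk -v want="$1" '
        BEGIN {
            n = split(want, w, " ")
            for (i = 1; i <= n; i++) keep[tolower(w[i])] = 1
        }
        {
            # Split on the first colon only, so values may contain colons.
            pos = index($0, ":")
            if (pos == 0) next
            name = tolower(substr($0, 1, pos - 1))
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if ((name in keep) && value != "") print name "=" value
        }'
}
```

<p>The list of wanted fields could come from a config setting, in the same spirit as the existing exiftool handling.</p>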
Automatically adding metadata can be very slow http://git-annex.branchable.com/tips/automatically_adding_metadata/comment_12_92c28dee004562d7085191f3b9e29fec/aurelf2021-09-03T12:46:06Z2021-09-03T12:46:06Z
<p>Running extract on very large files (system backups) can take too long (I killed it after it had run for several hours).
In general, <code>extract</code> seems slow on tar.gz archives.</p>
<p>I added <code>timeout 100s</code> before the tool is called in the pre-commit script:</p>
<p><code>LC_ALL=C timeout 100s $tool_exec "./$f" | ...</code></p>
<p>This allows the commit to complete in a reasonable time, though some metadata is probably lost.</p>
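<p>The same idea can be wrapped in a small function that also reports when the limit was hit. This is a sketch assuming GNU coreutils <code>timeout</code>, which exits with status 124 when it kills the command; the name <code>run_with_timeout</code> is hypothetical and not part of the supplied script.</p>

```shell
#!/bin/sh
# Hypothetical wrapper (not in the supplied hook): run a command under a
# time limit and warn on stderr if the limit was reached.
# Assumes GNU coreutils timeout, which exits 124 on timeout.
# Usage: run_with_timeout LIMIT COMMAND [ARGS...]
run_with_timeout() {
    _limit=$1
    shift
    _rc=0
    LC_ALL=C timeout "$_limit" "$@" || _rc=$?
    if [ "$_rc" -eq 124 ]; then
        echo "metadata extraction timed out after $_limit: $*" >&2
    fi
    return "$_rc"
}
```

<p>When the wrapper is used on the left side of a pipeline, its exit status is not visible to the rest of the script, so the warning on stderr is the only sign that a file was cut short.</p>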
Re: Automatically adding metadata can be very slowhttp://git-annex.branchable.com/tips/automatically_adding_metadata/comment_13_065b018dc290549b5ef00b50c3b09fcc/joey2021-09-03T15:57:18Z2021-09-03T15:36:08Z
<p>You could add a config to the script that skips over files larger than a
certain size.</p>
<p>Or for that matter, the script could be adapted to filter the files
to only include images/videos, using eg:</p>
<pre><code>git annex find --mimetype='image/*' --or --mimetype='video/*'
</code></pre>
<p>Should be a fairly easy change, patches accepted.</p>
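<p>The size limit suggested above could look roughly like this. It is a sketch, not a patch to the script: the variable <code>max_extract_size</code> and the function <code>skip_for_size</code> are hypothetical names, the 100 MiB default is arbitrary, and GNU <code>stat</code> is assumed. The <code>-L</code> option makes stat follow symlinks, since annexed files are often symlinks to their content.</p>

```shell
#!/bin/sh
# Hypothetical setting (not in the supplied hook): largest file, in
# bytes, that extractors are run on. The 100 MiB default is arbitrary.
max_extract_size=${max_extract_size:-104857600}

# Hypothetical helper: succeed (exit 0) when the file should be skipped,
# either because it is larger than the limit or because its size cannot
# be read (for example, a symlink whose content is not present).
# Assumes GNU stat; -L follows symlinks, -c %s prints the size in bytes.
skip_for_size() {
    _size=$(stat -L -c %s -- "$1" 2>/dev/null) || return 0
    [ "$_size" -gt "$max_extract_size" ]
}
```

<p>A call such as <code>skip_for_size "$f" &amp;&amp; return</code> near the top of process() would then leave oversized files without extracted metadata.</p>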