git-annex's metadata works best when files have a lot of useful metadata attached to them.
To make git-annex automatically set the year and month when adding files,
run: git config annex.genmetadata true
git commit hook
A git commit hook can be set up to extract lots of metadata from files like photos, mp3s, etc. Whenever annexed files are committed, their metadata will be extracted and stored.
Download pre-commit-annex and install it in your git-annex repository
as .git/hooks/pre-commit-annex
Remember to make the script executable! chmod +x .git/hooks/pre-commit-annex
using extract
The git commit hook can use extract to get metadata.
Install it from http://www.gnu.org/software/libextractor/
apt-get install extract
Configure which metadata fields to ask extract for: git config metadata.extract "artist album title camera_make video_dimensions"
To get a list of all possible fields, run: extract -L | sed 's/ /_/g'
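The sed step is there because the field names in metadata.extract are separated by spaces, so a name that itself contains a space is presumably written with underscores instead (as in camera_make above). A small sketch of that conversion, using made-up sample input rather than real extract output; the function name to_field_names is only for illustration:

```shell
# Convert field names containing spaces into the underscore form used in
# the metadata.extract setting. Reads names on stdin, one per line.
to_field_names() {
    sed 's/ /_/g'
}

# Sample input is illustrative only; real names would come from `extract -L`.
printf '%s\n' 'camera make' 'video dimensions' | to_field_names
```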
using exiftool
The git commit hook can also use exiftool to get metadata.
Install it from http://owl.phy.queensu.ca/~phil/exiftool/
apt-get install libimage-exiftool-perl
Configure which metadata fields to ask exiftool for: git config metadata.exiftool "Model ImageSize FocusRange GPSAltitude GPSCoordinates"
To get a list of all possible fields, run: exiftool -list
using both extract and exiftool
If you want some metadata that extract knows about, and other metadata
that exiftool knows about, just install them both, and set both
metadata.extract and metadata.exiftool.
overwriting existing metadata
By default, if git-annex already has a metadata field for a file,
its value will not be overwritten with metadata taken from files.
To allow overwriting, run: git config metadata.overwrite true
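Taken together, the settings on this page amount to a handful of git config calls. The sketch below wraps them in one function so a repository can be set up in a single step. The function name configure_metadata_hook and its --overwrite argument are made up for this example, and the field lists are simply the ones shown above; adjust them to the fields you actually want.

```shell
# Hypothetical helper; run it from inside the git-annex repository.
# Usage: configure_metadata_hook [--overwrite]
configure_metadata_hook() {
    # Let git-annex set year and month metadata when files are added.
    git config annex.genmetadata true
    # Fields requested from extract and exiftool (examples from this page).
    git config metadata.extract "artist album title camera_make video_dimensions"
    git config metadata.exiftool "Model ImageSize FocusRange GPSAltitude GPSCoordinates"
    # Only allow extracted values to replace existing ones when asked to.
    if [ "${1:-}" = "--overwrite" ]; then
        git config metadata.overwrite true
    fi
}
```

Calling it without arguments keeps the default behaviour of not overwriting existing metadata.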
is there a way for this to be done globally, without having to install and configure the hook for each repository? it seems like a fairly useful feature that could be factored in git-annex itself (as opposed to be shipped as a shell script)...
also, is there a way to retroactively parse the tags from existing files (as opposed to only new files added to the repo).
thanks
@anarcat, I have modified pre-commit-annex so if it's passed already annexed files, it'll extract their metadata.
So this can be used to add metadata to files added before you installed the hook, or if you've configured more fields to be extracted.
seemingly pre-commit hooks are not being called on windows, it could have to do with git annex sync bypassing them when doing commits ?
on the other side genmetadata works. although that is not enough for me since I'd want to preserve complete last modification date/time and I was in the process of modifying the supplied pre-commit script to call for "stat %Y" (which btw is working fine on windows, while the last binaries for extract are failing there).
am I correct in assuming that direct mode [on windows at least] bypasses hooks [namely pre-commit as well as pre-commit-annex] ?
@Michele git annex sync in a direct mode repository does bypass the pre-commit hook. However, it will try to run the pre-commit-annex hook.
Most likely, the hook script does not appear executable on Windows, so git-annex cannot run it.
@Michele after testing, git-annex on Windows seems to not see a file that does have the executable bit set as executable. I have opened a bug report, "windows isExecutable fail", and worked around the problem now.
@Joey just tested a nightly build and now pre-commit-annex is called, and with my modifications it autoadds last modified times for content. Trivially it's just the matter of adding:
to the body of the process() function to the supplied pre-commit-annex script. thanks
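The exact lines are not shown in this comment. As a rough sketch of the idea only (not necessarily what was actually added), reading the modification time with stat and storing it as metadata could look like the following. It assumes GNU stat and date; the function names and the metadata field name "mtime" are made up for this example.

```shell
# Last modification time of a file, in seconds since the epoch (GNU stat).
file_mtime() {
    stat -c %Y "$1"
}

# Format epoch seconds as a UTC timestamp (GNU date).
format_mtime() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Hypothetical step for the hook's process() function: store the timestamp
# in a metadata field called "mtime" (a name chosen for this sketch).
record_mtime() {
    git annex metadata --set "mtime=$(format_mtime "$(file_mtime "$1")")" "$1" >/dev/null
}
```

Storing a formatted timestamp rather than the raw number is a design choice of the sketch; the raw value from stat would work just as well as a field value.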
Hi,
just detected that the provided pre-commit-annex script is broken for filenames containing a '
@woffs hmm, the hook script seems to quote every use of the filename, which should avoid such problems. I tested it, using both extract and exiftool, and a file named "foo'bar.jpg", and it worked fine.
If you have a case where it does not work, suggest you file a bug report with enough information to reproduce the problem.
hi there, sincere gratitude for your work on git-annex.
is it worth considering a more modular setup (maybe a python script or even haskell) for working with metadata extractors?
currently it really only supports field-based filtering extractors. i am particularly interested in using (minimally):
mp3info - offers a formatting string for printing but doesn't fit super nicely into the "want" fields in shell script. could probably be hacked in.
mp4info - does not seem to offer any native parameters for filtering; would likely need some engineering/thought about how to take in all possible fields then filtering off of those.
alternatively i could just write some personal scripts here, but just thought others would find it useful for auto extracting content from mp3/m4a files. extract doesn't seem to perform as well on these as, say, .flac files.
thanks again!
@max, notice that the hook script contains special case handling for exiftool, including a config option for it. That was contributed by Klaus Ethgen. I'd be inclined to merge patches that add handling for other tools.
I imagine you could add config settings for the format string etc.
Running extract on very large files (system backups) can take too long (I killed it after it had been running for several hours). In general, extract seems slow on tar.gz archives. I added timeout 100s before the tool is called in the pre-commit script: LC_ALL=C timeout 100s $tool_exec "./$f" | ...
This allows the commit to complete in a reasonable time, at the cost of probably losing some metadata.
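For reference, here is a minimal sketch of that time limit as a reusable function, assuming GNU coreutils timeout is available. The name run_limited is made up for this example and is not part of the hook script.

```shell
# Run a command with a time limit, using GNU coreutils timeout.
# Usage: run_limited SECONDS COMMAND [ARGS...]
# Returns the command's own exit status, or 124 if the limit was reached.
run_limited() {
    limit=$1
    shift
    LC_ALL=C timeout "${limit}s" "$@"
}
```

Because timeout exits with status 124 when it has to stop the command, the hook could check for that value and print a warning naming the file whose metadata was skipped.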
You could add a config to the script that skips over files larger than a certain size.
Or for that matter, the script could be adapted to filter the files to only include images/videos, using eg:
Should be a fairly easy change, patches accepted.
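Both suggestions can be sketched as small shell helpers that the hook would call before running a tool on a file. This is only an illustration under assumptions: the function names, the size limit and the list of extensions are all made up here, and none of this is in the shipped script.

```shell
# True if the file's content is larger than MAX_BYTES.
# Usage: too_large FILE MAX_BYTES
too_large() {
    size=$(wc -c < "$1")
    # Arithmetic expansion strips any padding some wc versions add.
    [ "$((size))" -gt "$2" ]
}

# True if the file name looks like an image, audio or video file.
# The extension list is an example, not a complete one.
is_media() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.jpg|*.jpeg|*.png|*.mp3|*.flac|*.mp4) return 0 ;;
        *) return 1 ;;
    esac
}
```

Inside the loop over files, a guard such as "is_media "$f" && ! too_large "$f" 104857600" would then decide whether to extract metadata at all, with the limit ideally read from a git config setting instead of being hard-coded.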