Hello. I am a newbie to git-annex (ga), but love it already. I have been trying to index my own important files for a long time, but always ended up all tangled up. With ga I now see a light at the end of the tunnel! (Hope it's not a train heading my way.)
So thanks a bucket for writing Git Annex!
I am an "archiver": every file I add to a ga repo is a never-to-be-changed file (its checksum stays the same throughout eternity; only the metadata keeps changing). All I need ga for atm is to tag all files. Unfortunately we are talking about a few hundred thousand files, and the performance with the master git-annex-6.20170519 is not quite what one might hope for.
From your design/caching_database doc I gather that the outlook for metadata is positive ("For metadata, the story is much nicer. Querying for 30000 keys that all have a particular tag in their metadata takes 0.65s. So fast enough to be used in views."), but that metadata is not in a db (sqlite) yet in master (git-annex-6.20170519). I tried to dig through some of the links there to find out which commit I could check out and build to try out cached metadata, but to no avail.
Since I never change any file once it gets checked into the ga repo, does that simplify my possible use of the current metadata cache code, or will I have to learn Haskell and code stuff myself to get performance (creating views and such)?
TIA for any pointers, tips and caveats, and THANKS AGAIN FOR WRITING GIT-ANNEX.
ganewbie01
To not sit idle, I've been looking for development branches (specifically the one containing the code that gave rise to Joey's claim "Querying for 30000 keys that all have a particular tag in their metadata takes 0.65s."), but could find only repos with one branch, the master branch, which (naturally) doesn't seem to include the code for SQLite metadata tinkering.
Is there someplace I could find such development branches please?
Did you clone the repository?
I see lots of branches (remember they are remote branches, so you will need the -a flag):
You can check out one of the branches like:
Does that help?
hi, thanks for your reply; I've spent several hours today looking through the git-annex repo. I think it was a great idea to place the forums and everything in one repo! It provides sort of a "running commentary" on what was going on and why ...
After a couple of hours looking through the repo using tig, I checked out the key commit "bb242bdd82a438ebfc937609d8d13b512cb49943" and found the foo.hs and fooes.hs files, which are most likely the ones that Joey was writing about when he expressed hopes for metadata in an sqlite file. (I didn't find a way to see "old branches" though, e.g. the one named database. Maybe if I study git more ...)
Thanks for your reply to a silly newbie question anyway! I'll study this some more and see if I have some on-topic questions (hopefully they will be more educated by then).
g'day!
Yeah, you found the stuff. That's as far as the metadata cache idea has gotten so far. I've restored the missing "database" branch, which was just that commit you found.
I do hope to circle back around to this eventually to speed up generating views and other metadata queries.
But, as a programmer, you could create your own sqlite database and put metadata about your git-annex repository in it. Using git annex metadata --batch --json you can query git-annex for metadata about your files as fast as it can pull it out of git, and shove it into your database, and then write your own sql queries.
That would be a good first step, because working with real-world data would help develop the sql schema and see if it'll be fast enough to bother with putting into git-annex.
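A minimal sketch of that idea in Python, in case it helps as a starting point. Everything named here is an assumption for illustration, not part of git-annex: the one-table schema, the function names, and the sample keys in the usage note. It also assumes that batch mode takes one JSON object per line on stdin (only a "file" field is needed) and replies with one JSON object per line containing "file", "key" and a "fields" object that maps each field name to a list of values; check the git-annex-metadata man page for your version before relying on that.

```python
import json
import sqlite3
import subprocess


def create_schema(conn):
    # Hypothetical schema: one row per (key, field, value) triple.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "key TEXT NOT NULL, file TEXT, field TEXT NOT NULL, value TEXT NOT NULL)"
    )
    # Index so that "all keys with tag X" does not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS metadata_field_value ON metadata (field, value)"
    )


def query_annex(paths, repo="."):
    # Assumption: each request line is a JSON object naming a file,
    # and git-annex answers with one JSON object per line.
    requests = "".join(json.dumps({"file": p}) + "\n" for p in paths)
    proc = subprocess.run(
        ["git", "annex", "metadata", "--batch", "--json"],
        input=requests, capture_output=True, text=True, cwd=repo, check=True,
    )
    return proc.stdout.splitlines()


def load_json_lines(conn, lines):
    # Flatten each reply's "fields" object into rows of the metadata table.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        fields = record.get("fields") or {}
        for field, values in fields.items():
            for value in values:
                conn.execute(
                    "INSERT INTO metadata (key, file, field, value) VALUES (?, ?, ?, ?)",
                    (record.get("key"), record.get("file"), field, value),
                )
    conn.commit()


def keys_with_tag(conn, tag):
    # The kind of query the caching_database doc talks about.
    rows = conn.execute(
        "SELECT DISTINCT key FROM metadata WHERE field = 'tag' AND value = ? ORDER BY key",
        (tag,),
    )
    return [row[0] for row in rows]
```

Usage would be something like load_json_lines(conn, query_annex(list_of_paths)) followed by keys_with_tag(conn, "sometag"). Since archived files never change, the table only needs refreshing when tags are edited, which keeps the schema experiment cheap to repeat on real data.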