ga dev newbie Q: pointers to start playing with metadata cache plz

Hello. I'm a newbie to Git Annex (ga), but love it already. For a long time I kept trying to index my own important files, but ended up all tangled up. With ga I now see a light at the end of the tunnel! (Hope it's not a train heading my way.)

So thanks a bucket for writing Git Annex!

I am an "archiver": every file I add to a ga repo is a never-to-be-changed file (its checksum stays the same throughout eternity; only the metadata keeps changing). All I need ga for atm is to tag all files. Unfortunately we are talking about a few hundred thousand files, and the performance with the master git-annex-6.20170519 is not quite what one might hope for.

From your design/caching_database doc I gather that the outlook for metadata is positive ("For metadata, the story is much nicer. Querying for 30000 keys that all have a particular tag in their metadata takes 0.65s. So fast enough to be used in views."), but that it is not in a db (sqlite) yet in master (git-annex-6.20170519). I tried to dig through some of the links there to find out which commit I could check out and build to try out cached metadata, but to no avail.

Since I never change a file once it gets checked into the ga repo, does that simplify my possible use of the current metadata cache code, or will I have to learn Haskell and write code to get performance (creating views and such)?

TIA for any pointers, tips and caveats, and THANKS AGAIN FOR WRITING GIT-ANNEX.

ganewbie01

development branches inaccessible?

To not sit idle, I've been looking for development branches (specifically the one containing the code that gave rise to Joey's claim "Querying for 30000 keys that all have a particular tag in their metadata takes 0.65s."), but could find only repos with one branch, the master branch, which (naturally) doesn't seem to include the code for the SQLite metadata tinkering.

Is there someplace I could find such development branches please?

Comment by ganewbie01 — Sun Nov 26 13:12:16 2017

comment 2

Did you clone the repository?

$ git clone git://git-annex.branchable.com/ git-annex

I see lots of branches (remember they are remote branches so you will need the -a flag):

$ git branch -a
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/atomic-store-test
  remotes/origin/debian
  remotes/origin/debian-jessie-backport
  remotes/origin/debian-squeeze-backport
  remotes/origin/debian-stable-security-fix
  remotes/origin/debian-wheezy-backport
  remotes/origin/ghc7.0
  remotes/origin/improved-smudge-filters
  remotes/origin/master
  remotes/origin/newwinrelease
  remotes/origin/no-direct-mode
  remotes/origin/p2p-map
  remotes/origin/setup
  remotes/origin/smudge
  remotes/origin/tweak-fetch
  remotes/origin/uuid-type-rework
  remotes/origin/winsplicehack

You can check out one of the branches like this:

$ git checkout remotes/origin/setup

Does that help?

Comment by olaf — Mon Nov 27 05:39:04 2017

found it! ( I think ... or should I be still looking for "database" branch? )

hi, thanks for your reply; I've spent several hours today looking through the git-annex repo. I think it was a great idea to place the forums and everything in one repo! It provides sort of a "running commentary" on what was going on and why ...

After a couple of hours looking through the repo using tig, I checked out the key commit "bb242bdd82a438ebfc937609d8d13b512cb49943" and found the foo.hs and fooes.hs files, which are most likely the ones that Joey was writing about when he expressed hopes for metadata in an sqlite file. (I didn't find a way to see "old branches" though, e.g. the one named database. Maybe if I study git more ...)

Thanks for your reply to a silly newbie question anyway! I'll study this some more and see if I have some on-topic questions (hopefully they will be more educated by then )

g'day!

Comment by ganewbie01 — Tue Nov 28 01:03:05 2017

comment 4

Yeah, you found the stuff. That's as far as the metadata cache idea has gotten yet. I've restored the missing "database" branch, which was just that commit you found.

I do hope to circle back around to this eventually to speed up generating views and other metadata queries.

But, as a programmer, you could create your own sqlite database and put metadata about your git-annex repository in it. Using git annex metadata --batch --json you can query git-annex for metadata about your files as fast as it can pull it out of git, and shove it into your database, and then write your own sql queries.

That would be a good first step, because working with real-world data would help develop the sql schema and see if it'll be fast enough to bother with putting into git-annex.
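
A minimal sketch of that approach in Python, using only the standard library. The table layout, the function names (query_git_annex, load_metadata, keys_with_tag) and the one-row-per-(key, field, value) schema are assumptions made up for illustration, not anything git-annex ships. It relies on git annex metadata --batch --json accepting one JSON object such as {"file":"foo"} per input line and answering with a JSON object whose "fields" entry maps each field name to a list of values.

```python
import json
import sqlite3
import subprocess

# Hypothetical schema: one row per (key, field, value) triple.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metadata (
    key   TEXT NOT NULL,
    file  TEXT,
    field TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (key, field, value)
);
CREATE INDEX IF NOT EXISTS metadata_field_value ON metadata (field, value);
"""


def query_git_annex(files, repo="."):
    """Ask git-annex for the metadata of the given files.

    Sends one {"file": ...} request per line and returns the reply lines.
    """
    request = "".join(json.dumps({"file": f}) + "\n" for f in files)
    result = subprocess.run(
        ["git", "annex", "metadata", "--batch", "--json"],
        input=request, capture_output=True, text=True, cwd=repo, check=True,
    )
    return result.stdout.splitlines()


def load_metadata(conn, json_lines):
    """Store the metadata from git-annex JSON lines; returns rows written."""
    conn.executescript(SCHEMA)
    written = 0
    with conn:  # one transaction for the whole batch
        for line in json_lines:
            if not line.strip():
                continue  # skip blank replies
            obj = json.loads(line)
            key = obj.get("key")
            if not key:
                continue
            # Drop stale rows first so removed tags do not linger.
            conn.execute("DELETE FROM metadata WHERE key = ?", (key,))
            rows = [
                (key, obj.get("file"), field, value)
                for field, values in obj.get("fields", {}).items()
                for value in values
            ]
            conn.executemany(
                "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?)", rows
            )
            written += len(rows)
    return written


def keys_with_tag(conn, tag):
    """Return the sorted keys whose 'tag' field contains the given value."""
    cur = conn.execute(
        "SELECT DISTINCT key FROM metadata "
        "WHERE field = 'tag' AND value = ? ORDER BY key",
        (tag,),
    )
    return [row[0] for row in cur]
```

Something like load_metadata(sqlite3.connect("annex-meta.db"), query_git_annex(files)) followed by keys_with_tag(conn, "sometag") would then answer the "which keys have this tag" question from sqlite rather than from git; the database file name and tag here are placeholders. Whether this is fast enough for a few hundred thousand files is exactly what trying it on real data would show.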

Comment by joey — Tue Nov 28 21:47:54 2017
