Started implementing transfer control, although I'm currently calling the configuration for it "preferred content expressions". (What a mouthful!)
I was mostly able to reuse the Limit code (used to handle parameters like --not --in otherrepo), so it can already build Matchers for preferred content expressions in my little Domain Specific Language.
Preferred content expressions can be edited with git annex vicfg, which checks that they parse properly.
The plan is that the first place to use them is not going to be inside the assistant, but in commands that use the --auto parameter, which will use them as an additional constraint on top of the numcopies setting already used. Once I get it working there, I'll add it to the assistant.
Let's say a repo has a preferred content setting of "(not copies=trusted:2) and (not in=usbdrive)":
- git annex get --auto will get files that have less than 2 trusted copies, and are not in the usb drive.
- git annex drop --auto will drop files that have 2 or more trusted copies, and are not in the usb drive (assuming numcopies allows dropping them of course).
- git annex copy --auto --to thatrepo run from another repo will only copy files that have less than 2 trusted copies. (And if that was run on the usb drive, it'd never copy anything!)
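To make the example setting concrete, here is a minimal Python sketch of how such an expression could be evaluated against a single file. The FileState model and the helper names (copies_trusted, in_repo, not_, and_) are made up for illustration. They are not git-annex's actual Matcher code, and the sketch assumes copies=trusted:N means "at least N copies in trusted repos".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Hypothetical model of one file: which repos hold it, and which repos are trusted."""
    locations: frozenset
    trusted: frozenset


# Each helper returns a matcher: a function from FileState to bool.
def copies_trusted(n):
    # Assumed reading of copies=trusted:N: at least N copies in trusted repos.
    return lambda f: len(f.locations & f.trusted) >= n


def in_repo(name):
    # Models in=NAME: the file is present in the named repo.
    return lambda f: name in f.locations


def not_(m):
    return lambda f: not m(f)


def and_(a, b):
    return lambda f: a(f) and b(f)


# "(not copies=trusted:2) and (not in=usbdrive)"
preferred = and_(not_(copies_trusted(2)), not_(in_repo("usbdrive")))

trusted = frozenset({"server", "backup"})
one_copy = FileState(frozenset({"server"}), trusted)
two_copies = FileState(frozenset({"server", "backup"}), trusted)
on_usb = FileState(frozenset({"server", "usbdrive"}), trusted)

for name, f in [("one_copy", one_copy), ("two_copies", two_copies), ("on_usb", on_usb)]:
    print(name, preferred(f))
```

Only one_copy matches the expression in this model, so it is the only one of the three files that get --auto would fetch under this setting.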
There is a complication here. What if the repo with that preferred content setting is itself trusted? Then when it gets a file, the file's number of trusted copies increases, which will cause it to be dropped again.
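The flip-flop can be reproduced with a small Python simulation. This is only a sketch under simplifying assumptions: the repo names "here" and "server" are invented, the expression is reduced to "not copies=trusted:2", and numcopies is ignored entirely.

```python
def preferred(locations, trusted):
    # Models "not copies=trusted:2": fewer than 2 copies in trusted repos.
    return len(locations & trusted) < 2


def auto_step(locations, trusted, here):
    """One naive round of get --auto / drop --auto, as seen from the repo `here`."""
    if here not in locations and preferred(locations, trusted):
        return locations | {here}, "get"
    if here in locations and not preferred(locations, trusted):
        return locations - {here}, "drop"
    return locations, "noop"


# "here" is itself trusted, and one other trusted repo already has the file.
trusted = frozenset({"here", "server"})
locations = frozenset({"server"})

history = []
for _ in range(4):
    locations, action = auto_step(locations, trusted, "here")
    history.append(action)

print(history)  # ['get', 'drop', 'get', 'drop']
```

Each get raises the trusted copy count to 2, which makes the file unwanted, and each drop lowers it back to 1, which makes it wanted again.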
This is a nuance that the numcopies code already deals with, but it's much harder to deal with it in these complicated expressions. I need to think about this; the three ideas I'm working on are:
- Leave it to whoever/whatever writes these expressions to write ones that avoid such problems. Which is ok in practice, if I'm the only one writing pre-canned ones.
- Transform expressions into ones that avoid such problems. (For example, replace "not copies=trusted:2" with "not (copies=trusted:2 or (in=here and trusted=here and copies=trusted:3))".)
- Have some of the commands (mostly drop I think) pretend the drop has already happened, and check if it'd then want to get the file back again.
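The third idea can be sketched in Python as follows. This is an illustration, not git-annex code: the repo names are invented, the expression is reduced to "not copies=trusted:2", and numcopies is ignored. The drop is only carried out if the file would still be unwanted once it is gone.

```python
def preferred(locations, trusted):
    # Models "not copies=trusted:2": fewer than 2 copies in trusted repos.
    return len(locations & trusted) < 2


def auto_step(locations, trusted, here):
    """One round of get/drop --auto where drop first pretends it already happened."""
    if here not in locations:
        if preferred(locations, trusted):
            return locations | {here}, "get"
        return locations, "noop"
    after_drop = locations - {here}
    # Drop only if the file is unwanted now AND would still be unwanted afterwards.
    if not preferred(locations, trusted) and not preferred(after_drop, trusted):
        return after_drop, "drop"
    return locations, "noop"


trusted = frozenset({"here", "server", "backup"})

# Case 1: only one other trusted copy exists, so dropping would trigger a re-get.
locations = frozenset({"server"})
history = []
for _ in range(4):
    locations, action = auto_step(locations, trusted, "here")
    history.append(action)
print(history)  # ['get', 'noop', 'noop', 'noop']

# Case 2: two other trusted copies exist, so the drop is safe and goes ahead.
result, action = auto_step(frozenset({"here", "server", "backup"}), trusted, "here")
print(action, sorted(result))  # drop ['backup', 'server']
```

In this model the file is fetched once and then kept, instead of bouncing between get and drop.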
Nice. Would (not in=here) be the simplest paradoxical expression?
Is just disregarding the target repo completely during checks a possibility? This would interpret (not copies=trusted:X) as "not in X other trusted repositories, no matter whether we are trusted or not", and (not in=here) just as "true". I think this should generally arrive at the same results as option 2, but by definition of the expression's meaning, not by rewriting.
Alternative 3 (or is my wording different enough to be 3a?) - check that the invariant "we have all the known files matching our PCE and only these files" would hold after an operation before actually performing it - could be bistable if done both for gets and drops:
This is not necessarily bad. Checking just for drops should be monostable, I guess, but doesn't it look a bit arbitrary? (Though it would be again equivalent to option 2, wouldn't it? So maybe not that arbitrary.)
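One way to see the bistability is a small Python sketch that checks the future state for both gets and drops. It is a toy model under assumptions: the repo names are invented, the expression is reduced to "not copies=trusted:2", and an operation is performed only when the current state disagrees with the expression and the state after the operation would agree with it.

```python
def preferred(locations, trusted):
    # Models "not copies=trusted:2": fewer than 2 copies in trusted repos.
    return len(locations & trusted) < 2


def settle(locations, trusted, here, rounds=4):
    """Repeatedly try get/drop, acting only if the result would satisfy the invariant."""
    for _ in range(rounds):
        if here in locations:
            after = locations - {here}
            # Drop only if the file is unwanted now and still unwanted once dropped.
            if not preferred(locations, trusted) and not preferred(after, trusted):
                locations = after
        else:
            after = locations | {here}
            # Get only if the file is wanted now and still wanted once present.
            if preferred(locations, trusted) and preferred(after, trusted):
                locations = after
    return locations


trusted = frozenset({"here", "server"})

absent = settle(frozenset({"server"}), trusted, "here")
present = settle(frozenset({"server", "here"}), trusted, "here")

print(sorted(absent))   # ['server']
print(sorted(present))  # ['here', 'server']
```

Both starting states are left unchanged: a repo without the file never gets it, and a repo with the file never drops it, so the outcome depends on where it started.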
Yes, I think checking the future only for drops is both stable and equivalent to the other choices.
Disregarding the target solves the problem for the current set of expressions. There may be future expressions or operations where that does not hold. For example, if move supported --auto (which it does not), you'd need to disregard both sides.
That method would make it impossible to do some possibly useful things, such as "in=here or (not copies=3)".
The real problem with it is that existing options like --copies and --in already take all repos into account, so this would potentially lead to two divergent DSLs being used by git-annex, which would probably be confusing.