Hi Joey,
While writing a more complex preferred content expression today, I noticed that an onlyingroup=GROUPNAME condition would be handy.
Consider a sneakernet where clumps of nodes can communicate quickly with each other, but communication between clumps is poor. This is the case in my setup: a Raspberry Pi in the field has a USB stick attached (so there are two repos in the field group), but their uplink is slow mobile data. I also have an offsite-backup group for other disks and servers that I have access to.
When I sync content from within an offsite-backup repo, I don't want it to copy data over the slow mobile link; I do that via sneakernet by swapping the USB drive in the field from time to time. Leaving out the field repos here is possible, but it prevents syncing of metadata, which is not ideal.
If I could do

git annex groupwanted offsite-backup 'anything and not onlyingroup=field'

the preferred content expression would be short and concise. onlyingroup would match if the file in question is present only in repositories of the given group and in no other group. This would extend the functionality already provided by inallgroup.
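To make the proposed semantics concrete, here is a minimal Python model under the strict reading above: onlyingroup=G matches when the file has at least one copy and every repository holding a copy is in G and in no other group. This is not git-annex code; the function names, the repo_groups mapping and the repository names (pi, usbstick, server, disk) are hypothetical, and treating ungrouped repositories as outside G is an assumption.

```python
from typing import Dict, Set

# Hypothetical model: the set of groups each repository belongs to.
RepoGroups = Dict[str, Set[str]]


def only_in_group(group: str, locations: Set[str], repo_groups: RepoGroups) -> bool:
    """Strict reading of the proposed onlyingroup=GROUP term (illustrative only).

    Matches when the file has at least one copy, and every repository
    holding a copy belongs to GROUP and to no other group. Repositories
    missing from repo_groups are treated as ungrouped (assumption).
    """
    if not locations:
        return False
    return all(repo_groups.get(repo, set()) == {group} for repo in locations)


def offsite_backup_wants(locations: Set[str], repo_groups: RepoGroups) -> bool:
    # Models: groupwanted offsite-backup 'anything and not onlyingroup=field'
    return not only_in_group("field", locations, repo_groups)


repo_groups: RepoGroups = {
    "pi": {"field"},
    "usbstick": {"field"},
    "server": {"offsite-backup"},
    "disk": {"offsite-backup"},
}

# Files that are only in the field: on the Pi, on the USB stick, or on both.
for locs in ({"pi"}, {"usbstick"}, {"pi", "usbstick"}):
    print(sorted(locs), offsite_backup_wants(locs, repo_groups))

# A file that already reached a server as well.
print(sorted({"pi", "server"}), offsite_backup_wants({"pi", "server"}, repo_groups))
```

All three field-only cases print False (not wanted by offsite-backup), while the file that is also on server prints True. Nothing in the expression depends on how many repositories the field group contains.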
Without onlyingroup, one basically has to hard-code the number of repos in the group into a rather complex expression (2 in this case, the Pi and the USB drive):
(anything and not copies=field:1 or copies=untrusted+:2) or (anything and not copies=field:2 or copies=untrusted+:3)
One also has to explicitly handle each combination of where files in the field group are (only on the Pi? only on the USB drive? on both?). The copies=untrusted+:2 is then needed to allow copying other files that are also present elsewhere (e.g. because they were in fact copied over mobile data) but are still in the field group.
Maybe you see an easier way, but this is what I came up with. I think an onlyingroup expression would be very helpful.
Cheers, Yann
onlyingroup would also be useful to match (and exclude from standard sync operations) files that are only available in hard-to-reach offline-backup repos, say offline backup drives or drives kept in a safe. Normal repos could want include=* and not onlyingroup=offline-backup and then wouldn't bother with unreachable files. This is an alternative to the archive/ directory, which might be undesirable or impractical to use (the original file location is 'lost' when moving files into the archive).

This seems like a pretty good idea. In particular, I think that the not onlyingroup=offline-backup example is persuasive. The field group example might be handled by having a collected group and using copies=collected:1 in repositories that only want a copy of data once it has been collected from the field.

Hmm, one problem arises if you have 2 groups that you are using onlyingroup with, e.g. (not onlyingroup=offline-backup) and (not onlyingroup=offline-archive). A file that is in two repositories, one in each group, will not match either onlyingroup term, so the expression still wants it even though it is only in offline repositories.

One way to address that would be something like onlyingroups=offline-backup,offline-archive, which would match when the file is present only in repositories in the listed groups. I don't really like that this has its own ANDing happening inside a term of a preferred content expression, though.

What is onlyingroup supposed to do when a repository is in the specified group, but also in another group? My first reaction was that it should never match when content is in such a repository, since the content is indeed present in another group. But that doesn't really seem useful.
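Both problems with the strict reading show up in a small, self-contained Python model. It assumes the strict definition (every repository with a copy is in the group and in no other group); the function names and the repositories safe, vault and drive are hypothetical illustrations, not git-annex behavior.

```python
from typing import Dict, Set

RepoGroups = Dict[str, Set[str]]


def only_in_group_strict(group: str, locations: Set[str], repo_groups: RepoGroups) -> bool:
    # Strict: at least one copy, and every repo with a copy is in `group`
    # and in no other group.
    return bool(locations) and all(
        repo_groups.get(repo, set()) == {group} for repo in locations
    )


def normal_repo_wants(locations: Set[str], repo_groups: RepoGroups) -> bool:
    # Models: (not onlyingroup=offline-backup) and (not onlyingroup=offline-archive)
    return (not only_in_group_strict("offline-backup", locations, repo_groups)
            and not only_in_group_strict("offline-archive", locations, repo_groups))


repo_groups: RepoGroups = {
    "safe": {"offline-backup"},
    "vault": {"offline-archive"},
    "drive": {"offline", "backup"},  # one repository in two groups
}

# Problem 1: the file is only on offline repos, one per group, yet it is still wanted.
print(normal_repo_wants({"safe", "vault"}, repo_groups))  # True

# Problem 2: a repository in two groups can never satisfy onlyingroup=offline.
print(only_in_group_strict("offline", {"drive"}, repo_groups))  # False
```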
If it treated a repository that is in the specified group, but also in other groups, as matching, then it would be possible to put a repository in the groups offline and backup and match on not onlyingroup=offline. I like this better than onlyingroups=offline-backup,offline-archive because it's simpler and more composable.

Another way to look at this is that preferred content allows expressing copies=offline:1, but there's no way to express a match on the number of copies that are in repositories that are not in the offline group. That might be expressed as, say, copies=!offline:1, which would be useful to only want files that are available in at least one repository that is not offline. This would avoid the question of what to do with onlyingroup when a repository is in the specified group but also in another one.
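The two alternatives can be compared in a minimal Python model: a lenient onlyingroup that ignores a repository's other group memberships, and a hypothetical copies=!GROUP:N that counts only copies in repositories outside GROUP, assumed here to mean "N or more" like the existing copies=GROUP:N. All names, including the repositories safe, vault and laptop, are illustrative assumptions, not git-annex code.

```python
from typing import Dict, Set

RepoGroups = Dict[str, Set[str]]


def only_in_group_lenient(group: str, locations: Set[str], repo_groups: RepoGroups) -> bool:
    # Lenient: at least one copy, and every repo with a copy is in `group`;
    # any other groups those repos belong to are ignored.
    return bool(locations) and all(
        group in repo_groups.get(repo, set()) for repo in locations
    )


def copies_not_in_group(group: str, n: int, locations: Set[str], repo_groups: RepoGroups) -> bool:
    # Hypothetical copies=!GROUP:N: "N or more" copies, counting only
    # repositories that are NOT in GROUP.
    count = sum(1 for repo in locations if group not in repo_groups.get(repo, set()))
    return count >= n


repo_groups: RepoGroups = {
    "safe": {"offline", "backup"},
    "vault": {"offline", "archive"},
    "laptop": {"client"},
}

for locs in ({"safe", "vault"}, {"safe", "laptop"}, set()):
    print(sorted(locs),
          only_in_group_lenient("offline", locs, repo_groups),
          copies_not_in_group("offline", 1, locs, repo_groups))
```

With the offline repositories also placed in backup or archive, a single not onlyingroup=offline excludes a file stored only in safe and vault, which sidesteps the two-group problem. For any file with at least one copy, the lenient onlyingroup=offline matches exactly when copies=!offline:1 does not.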
I really don't like the copies=!offline:1 syntax, though. (I already disliked copies=group:n, and this takes it in an even worse direction.) So while this is more generic, I kind of prefer the onlyingroup name.