I have a setup where a source repository a is connected to a source repository b (through SSH), which is then connected to backup repository c (on Amazon S3). I was expecting a file added on a to be moved to c through b, but that doesn't seem to be happening...
I tried to reproduce with this basic setup:
[1009]anarcat@angela:g-a$ git init a
Dépôt Git vide initialisé dans /home/anarcat/test/g-a/a/.git/
[1010]anarcat@angela:g-a$ git init b
Dépôt Git vide initialisé dans /home/anarcat/test/g-a/b/.git/
[1011]anarcat@angela:g-a$ git init c
Dépôt Git vide initialisé dans /home/anarcat/test/g-a/c/.git/
[1012]anarcat@angela:g-a$ cd a/
[1013]anarcat@angela:a$ git annex init
init ok
(Recording state in git...)
[1014]anarcat@angela:a$ git annex group . source
group . ok
(Recording state in git...)
[1015]anarcat@angela:a$ git annex wanted . groupwanted
wanted . ok
(Recording state in git...)
[1036]anarcat@angela:a$ git remote add origin ../b
[1016]anarcat@angela:a$ cd ../b
[1025]anarcat@angela:b$ git annex init
init ok
(Recording state in git...)
[1026]anarcat@angela:b$ git annex group . source
group . ok
(Recording state in git...)
[1027]anarcat@angela:b$ git annex wanted . groupwanted
wanted . ok
(Recording state in git...)
[1038]anarcat@angela:b$ git remote add origin ../c
[1019]anarcat@angela:b$ cd ../c
[1021]anarcat@angela:c$ git annex init
init ok
(Recording state in git...)
[1022]anarcat@angela:c$ git annex group . backup
group . ok
(Recording state in git...)
[1023]anarcat@angela:c$ git annex wanted . groupwanted
wanted . ok
(Recording state in git...)
anarcat@angela:c$ cd ../a
[1041]anarcat@angela:a$ git annex sync
commit ok
pull origin
warning: no common commits
remote: Décompte des objets: 11, fait.
remote: Compression des objets: 100% (9/9), fait.
remote: Total 11 (delta 1), reused 0 (delta 0)
Dépaquetage des objets: 100% (11/11), fait.
Depuis ../b
* [nouvelle branche] git-annex -> origin/git-annex
merge: refs/remotes/origin/master - not something we can merge
merge: refs/remotes/origin/synced/master - not something we can merge
failed
(merging origin/git-annex into git-annex...)
(Recording state in git...)
(Recording state in git...)
git-annex: sync: 1 failed
[1042]anarcat@angela:a1$ cd ../b
[1043]anarcat@angela:b$ git annex sync
commit ok
pull origin
warning: no common commits
remote: Décompte des objets: 11, fait.
remote: Compression des objets: 100% (9/9), fait.
remote: Total 11 (delta 1), reused 0 (delta 0)
Dépaquetage des objets: 100% (11/11), fait.
Depuis ../c
* [nouvelle branche] git-annex -> origin/git-annex
merge: refs/remotes/origin/master - not something we can merge
merge: refs/remotes/origin/synced/master - not something we can merge
failed
(merging origin/git-annex into git-annex...)
(Recording state in git...)
(Recording state in git...)
git-annex: sync: 1 failed
[1063]anarcat@angela:b$ touch bar
[1064]anarcat@angela:b$ ls
bar
[1065]anarcat@angela:b$ ls -al
total 16K
drwxr-xr-x 3 anarcat anarcat 4096 aoû 18 14:41 .
drwxr-xr-x 5 anarcat anarcat 4096 aoû 18 14:33 ..
lrwxrwxrwx 1 anarcat anarcat 178 aoû 18 14:41 bar -> .git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
drwxr-xr-x 9 anarcat anarcat 4096 aoû 18 14:41 .git
[1066]anarcat@angela:b$ git annex sync
commit ok
pull origin
ok
push origin
Décompte des objets: 26, fait.
Delta compression using up to 2 threads.
Compression des objets: 100% (22/22), fait.
Écriture des objets: 100% (26/26), 2.47 KiB | 0 bytes/s, fait.
Total 26 (delta 5), reused 0 (delta 0)
To ../c
* [new branch] git-annex -> synced/git-annex
* [new branch] master -> synced/master
ok
[1067]anarcat@angela:b$ cd ../a
[1068]anarcat@angela:a$ git annex sync
commit ok
pull origin
remote: Décompte des objets: 8, fait.
remote: Compression des objets: 100% (6/6), fait.
remote: Total 8 (delta 1), reused 0 (delta 0)
Dépaquetage des objets: 100% (8/8), fait.
Depuis ../b
5d3090f..9e345e6 git-annex -> origin/git-annex
* [nouvelle branche] master -> origin/master
* [nouvelle branche] synced/master -> origin/synced/master
Merge made by the 'recursive' strategy.
bar | 1 +
1 file changed, 1 insertion(+)
create mode 120000 bar
Already up-to-date.
ok
(merging origin/git-annex into git-annex...)
(Recording state in git...)
(Recording state in git...)
push origin
Décompte des objets: 41, fait.
Delta compression using up to 2 threads.
Compression des objets: 100% (36/36), fait.
Écriture des objets: 100% (41/41), 3.50 KiB | 0 bytes/s, fait.
Total 41 (delta 20), reused 0 (delta 0)
To ../b
6019ab8..368ca15 master -> synced/master
* [new branch] git-annex -> synced/git-annex
ok
[1069]anarcat@angela:a$ touch quu^C
[1069]anarcat@angela:a130$ echo foo > quux
[1070]anarcat@angela:a$ cd ../b
[1071]anarcat@angela:b$ ls
bar foo
[1072]anarcat@angela:b$ cd ..
[1073]anarcat@angela:g-a$ cd a
[1074]anarcat@angela:a$ git annex list
here
|origin
||web
|||
XX_ bar
XX_ foo
X__ quux
[1075]anarcat@angela:a$ git annex list --help
git-annex: unrecognized option `--help'
Usage: git-annex list [PATH ...] [option ...]
--allrepos show all repositories, not only remotes
To see additional options common to all commands, run: git annex help options
[1076]anarcat@angela:a1$ git annex list --allrepos
here-
|origin
||web
|||anarcat@angela:~/test/g-a/c
||||
XX__ bar
XX__ foo
X___ quux
why don't the files get copied over to the backup repo by the assistant?
i somewhat understand that files don't get sent from a to b, but why doesn't the assistant copy the files from b to c?
i have tried using required instead of wanted and it doesn't work much better.
tested with 5.20150610+gitg608172f-1~ndall+1 (prod) and 5.20141125 (the above test). --anarcat
It's easy to see why a file is not copied from source repo A to source repo B: The preferred content expression for a source repo is "not (copies=1)", so a source repo will not want to get any files from any other repo.
You probably want to make the central repo use transfer; that's basically what it's for. Note that a transfer repo hangs onto file content until it reaches all client repos. So you might want to change the preferred content expression to refer to backup repos. Something as simple as this might work: "not inallgroup=backup"
I don't see any reason why a file wouldn't move from B to backup repo C, but I don't see anything in your transcript that shows that not happening either. Your transcript doesn't actually show running any git-annex commands that move file contents at all; no git annex copy/move, and the sync is not run with --content...
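For reference, here is a minimal sketch of an explicit content transfer, using two throwaway repositories standing in for b and c. The sandbox path, the file name bar and the git identity are assumptions made for illustration; b's origin points at c as in the transcript above.

```shell
# Throwaway sandbox; path, file name and git identity are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
for r in b c; do
  git init -q "$r"
  git -C "$r" config user.name "Test User"
  git -C "$r" config user.email "test@example.com"
done
(cd c && git annex init c)

cd b
git annex init b
git remote add origin ../c

echo hello > bar
git annex add bar
git commit -q -m "add bar"

# A plain "git annex sync" only exchanges git branches; these move content:
git annex copy bar --to origin   # send bar's content to c, keeping b's copy
# git annex sync --content       # alternative: transfer according to preferred content

git annex whereis bar            # should now list this repository and origin
```

Until one of those content-moving commands runs (or the assistant does it on its own), git annex list will keep showing the file as present only in the repository where it was added.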
the assistant is running on repos a and b. i was expecting it to move the files automatically.
about repo groups, if i understand correctly, it should be:
a: source
b: transfer (or not inallgroup=backup?)
c: backup
is that correct?
Your transcript doesn't show the assistant running. It's not clear that, in your live deployment, any files ever got to repo B for the assistant there to deal with.
Repos such as B need a preferred content expression like the one I gave. It doesn't matter a whole lot what group is used for them; overriding the groupwanted for the transfer group to use that expression would be one way to do it.
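The groupwanted override mentioned above can be sketched as follows, in a throwaway repository standing in for B. The sandbox path and git identity are assumptions; the expression is the one discussed in this thread.

```shell
# Throwaway stand-in for repo B; path and git identity are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
git init -q b && cd b
git config user.name "Test User"
git config user.email "test@example.com"
git annex init b

# Override what "groupwanted" means for every repo in the transfer group.
git annex groupwanted transfer "not inallgroup=backup"

# Put this repo in the transfer group and make it follow the group's expression.
git annex group . transfer
git annex wanted . groupwanted

# Display the resulting configuration.
git annex group .
git annex wanted .
git annex groupwanted transfer
```

Setting the expression directly with git annex wanted . "not inallgroup=backup" should have the same effect for a single repository; the groupwanted route is only more convenient when several repos share the role.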
darn, you're right... then i screwed up my copy-paste. the assistant was running on a and b, that i am sure of.
here's a more complete transcript (hopefully):
now it seems that setting repo b in the transfer group helped, but the files didn't get purged from a (or b, for that matter).
setting the central b wanted expression seems to help in dropping the file from a, but not from b:
i think i'm almost getting this now.
You have B configured as a regular transfer repo, so it only wants to drop files once they have reached some client repos. Setting its preferred content to "not inallgroup=backup" should fix that.
Also, the assistant can notice some configuration changes that are made while it's running, but maybe not all of them, and maybe it won't rescan files to transfer right away after such a change is made. I'd use git annex sync --content to test such changes, and save the assistant for once I have a working setup.
right, i thought as well that the assistant wouldn't pick up some stuff...
but sync --content doesn't behave as expected either:
files are still not removed from a and a few of them were dropped from b, but not all of them. but worse, one file still isn't sent to the backup server c at all.
I'm afraid I don't have time to continue to read and try to debug transcripts of this being set up incorrectly in various ways.
So, here's a transcript of the configuration I described, which seems to be working as I'd expect it to work:
Now observe sync moving the file from A thru B to C:
Er, the 'A' remote in 'B' was unnecessary since A is origin. But otherwise, I think that's what you asked for.. HTH.
well, this is not exactly the topology i have here.
you set up a topology like this:
X <- Y means X is a remote of Y.
My topology is:
So B can't directly manage objects from A. It can receive objects from it, but that's it.
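To make that concrete, here is a sketch of the remote layout with plain git, following the remotes added in the first transcript (a's origin is b, b's origin is c). The sandbox path is an assumption.

```shell
# Throwaway sandbox (path is an illustrative assumption).
sandbox=$(mktemp -d) && cd "$sandbox"
git init -q a && git init -q b && git init -q c

git -C a remote add origin ../b   # A knows about B and can push to it
git -C b remote add origin ../c   # B knows about C and can push to it

git -C a remote -v                # lists origin -> ../b
git -C b remote -v                # lists origin -> ../c only: no remote for A
git -C c remote -v                # prints nothing: C has no remotes
```

With this layout, anything that has to happen to A's copies (sending them out, dropping them) must be initiated from A itself, since B has no remote through which to reach A.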
So here's a clearer transcript of the session, using the same semantics you have been using for the different repos, but with the remotes setup differently, as I describe above:
And of course now, it actually works fine, with sync --content:
Now, that's all great for sync --content - but what about the assistant?
so from A's perspective, it looks like the file didn't migrate properly. but it actually did!
how long does it take for the assistant to start syncs like this? are those timers user-accessible somehow? this problem sure looks like sync-git-annex branch not syncing in the assistant - but maybe i'm confused there as well.
anyways, it does seem like content actually does get synced around properly by the assistant. i'll try to deploy the new preferred content expression in production and report back here.
and sorry for the noisy pastes and hand-holding, but i was thoroughly confused by this one. i thought i had a good grasp on preferred content and all that, but it seems i was wrong...
so short version here: thanks for your help and i figured it out, there were problems with the S3 credentials perms not respecting sharedRepository, multiple group support confusion, bare/non-bare, groupwanted vs standard confusion and so on... now the files are migrating properly in production. hurray! i believe documentation could be improved, and i have questions about timeouts, below.
so one thing that was definitely broken in production was that, on the central server, the S3 credentials were accessible only to the user that ran the "enable s3" command (that is anarcat). they were not accessible to the user actually running the assistant (the git-annex sandbox account), which made it simply impossible for the assistant to upload files to s3. so maybe one problem here is that the .git/annex/creds file does not respect the core.sharedRepository = group setting i have there...
another problem in production, of course, was the transfer preferred group setting. once changed to not inallgroup=backup, things were better... but it was still keeping too many files. the problem then was that the repo was originally set to the source group (PEBKAC here again, sorry) and then it was added to the standard group, with the git annex group . standard command. i didn't expect that: i expected the group command to change the group, not to add to it. the documentation (git-annex-group) is clear enough, however, so this is yet another PEBKAC.
Another possible problem is that the central server (b) is not a bare git repository. I am not sure why it was set up that way... maybe i was worried about bare repo support and past experiences, or concerns about being able to actually interact with the files directly on the central server. i had to do git co -b master synced/master to eventually see the files locally, and this helped a little in diagnosing all of this.
A problem remained after that though: files are still not removed from A in my tests in production. i don't understand why: A is set up as a source repository:
yet some files have more than one copy:
Indeed, it doesn't seem to want to drop that local file:
It turns out that I had documentation wrong again about that as well: i was using groupwanted as a wanted expression, but the groups were standard groups, so git-annex was just failing to use the proper content expression. setting the wanted expression back to standard (yes, again as you showed, sorry about that) fixed the problem:
hurray!
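The two pitfalls described above (git annex group adding a group rather than replacing it, and groupwanted versus standard) can be reproduced in a throwaway repository. This is only a sketch: the sandbox path and git identity are assumptions, and transfer is used as the accidentally added second group purely for illustration.

```shell
# Throwaway stand-in for repo A; path and git identity are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
git init -q a && cd a
git config user.name "Test User"
git config user.email "test@example.com"
git annex init a

git annex group . source
git annex group . transfer     # adds a second group, does not replace "source"
git annex group .              # lists both groups

git annex ungroup . transfer   # remove the group that was added by mistake
git annex group .              # back to "source" only

# "standard" selects the built-in expression for the repo's standard group;
# "groupwanted" only helps once an expression was set with git annex groupwanted.
git annex wanted . standard
git annex wanted .             # displays the current setting
```

Running git annex group . and git annex wanted . with no further arguments is a quick way to check which groups and which expression a repository actually ended up with.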
again, i apologise for all the hand holding here... as a software developer myself, i understand how frustrating it can be to try to make users come out of their cave of ignorance and see the light of day...
but i do feel there could be more work done to clarify how all of this works. i will certainly try to give a few kicks in the preferred content section and maybe this forum post will be helpful for future endeavors... or maybe just write up a new tips page about such hairy setups?
the questions that remain here for me are:
thanks so much for helping us through this, it's really appreciated!
I'm not comfortable making core.sharedrepository settings affect creds files. You don't normally want to give out S3 creds to other users in a unix group. And in the "everybody" case, it certainly seems entirely wrong to make the creds files world-readable. Willing to live with a little inconsistency here in order to not blow users' bank accounts off. It would be good to document it somewhere.
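Since the creds are deliberately left out of core.sharedrepository handling, a first diagnostic step for the situation described earlier is simply to check who can read them. The sketch below uses a stand-in file so no real credentials are involved; the name creds-demo is hypothetical, and widening access to the group is a deliberate manual choice, not something git-annex does for you.

```shell
# Stand-in for a credentials file; name and sandbox path are illustrative assumptions.
sandbox=$(mktemp -d) && cd "$sandbox"
touch creds-demo
chmod 600 creds-demo    # owner-only, as with the creds described in the thread

ls -l creds-demo        # -rw------- : other members of the group cannot read it

# Only if sharing with the unix group is really intended:
chmod 640 creds-demo
ls -l creds-demo        # -rw-r----- : group members can now read it
```

Running the assistant and the initial remote setup as the same account avoids the question entirely.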
Your paste seems to show A as being in the "sourcethis" group, not the "source" group. I don't know what that means.
The assistant should notice config changes to the git-annex branch within 60 seconds of them being received. Syncs happen when changes are detected, or every 30 minutes.