I have a setup where a source repository a is connected to a source repository b (through SSH) which is then connected to backup repository c (on amazon S3). I was expecting a file added on a to be moved to c through b, but that doesn't seem to be happening...
I tried to reproduce with this basic setup:
[1009]anarcat@angela:g-a$ git init a
Initialized empty Git repository in /home/anarcat/test/g-a/a/.git/
[1010]anarcat@angela:g-a$ git init b
Initialized empty Git repository in /home/anarcat/test/g-a/b/.git/
[1011]anarcat@angela:g-a$ git init c
Initialized empty Git repository in /home/anarcat/test/g-a/c/.git/
[1012]anarcat@angela:g-a$ cd a/
[1013]anarcat@angela:a$ git annex init
init ok
(Recording state in git...)
[1014]anarcat@angela:a$ git annex group . source
group . ok
(Recording state in git...)
[1015]anarcat@angela:a$ git annex wanted . groupwanted
wanted . ok
(Recording state in git...)
[1036]anarcat@angela:a$ git remote add origin ../b
[1016]anarcat@angela:a$ cd ../b
[1025]anarcat@angela:b$ git annex init
init ok
(Recording state in git...)
[1026]anarcat@angela:b$ git annex group . source
group . ok
(Recording state in git...)
[1027]anarcat@angela:b$ git annex wanted . groupwanted
wanted . ok
(Recording state in git...)
[1038]anarcat@angela:b$ git remote add origin ../c
[1019]anarcat@angela:b$ cd ../c
[1021]anarcat@angela:c$ git annex init
init ok
(Recording state in git...)
[1022]anarcat@angela:c$ git annex group . backup
group . ok
(Recording state in git...)
[1023]anarcat@angela:c$ git annex wanted . groupwanted
wanted . ok
(Recording state in git...)
anarcat@angela:c$ cd ../a
[1041]anarcat@angela:a$ git annex sync
commit ok
pull origin
warning: no common commits
remote: Counting objects: 11, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 11 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (11/11), done.
From ../b
* [new branch] git-annex -> origin/git-annex
merge: refs/remotes/origin/master - not something we can merge
merge: refs/remotes/origin/synced/master - not something we can merge
failed
(merging origin/git-annex into git-annex...)
(Recording state in git...)
(Recording state in git...)
git-annex: sync: 1 failed
[1042]anarcat@angela:a1$ cd ../b
[1043]anarcat@angela:b$ git annex sync
commit ok
pull origin
warning: no common commits
remote: Counting objects: 11, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 11 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (11/11), done.
From ../c
* [new branch] git-annex -> origin/git-annex
merge: refs/remotes/origin/master - not something we can merge
merge: refs/remotes/origin/synced/master - not something we can merge
failed
(merging origin/git-annex into git-annex...)
(Recording state in git...)
(Recording state in git...)
git-annex: sync: 1 failed
[1063]anarcat@angela:b$ touch bar
[1064]anarcat@angela:b$ ls
bar
[1065]anarcat@angela:b$ ls -al
total 16K
drwxr-xr-x 3 anarcat anarcat 4096 aoû 18 14:41 .
drwxr-xr-x 5 anarcat anarcat 4096 aoû 18 14:33 ..
lrwxrwxrwx 1 anarcat anarcat 178 aoû 18 14:41 bar -> .git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
drwxr-xr-x 9 anarcat anarcat 4096 aoû 18 14:41 .git
[1066]anarcat@angela:b$ git annex sync
commit ok
pull origin
ok
push origin
Counting objects: 26, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (22/22), done.
Writing objects: 100% (26/26), 2.47 KiB | 0 bytes/s, done.
Total 26 (delta 5), reused 0 (delta 0)
To ../c
* [new branch] git-annex -> synced/git-annex
* [new branch] master -> synced/master
ok
[1067]anarcat@angela:b$ cd ../a
[1068]anarcat@angela:a$ git annex sync
commit ok
pull origin
remote: Counting objects: 8, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 8 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (8/8), done.
From ../b
5d3090f..9e345e6 git-annex -> origin/git-annex
* [new branch] master -> origin/master
* [new branch] synced/master -> origin/synced/master
Merge made by the 'recursive' strategy.
bar | 1 +
1 file changed, 1 insertion(+)
create mode 120000 bar
Already up-to-date.
ok
(merging origin/git-annex into git-annex...)
(Recording state in git...)
(Recording state in git...)
push origin
Counting objects: 41, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (36/36), done.
Writing objects: 100% (41/41), 3.50 KiB | 0 bytes/s, done.
Total 41 (delta 20), reused 0 (delta 0)
To ../b
6019ab8..368ca15 master -> synced/master
* [new branch] git-annex -> synced/git-annex
ok
[1069]anarcat@angela:a$ touch quu^C
[1069]anarcat@angela:a130$ echo foo > quux
[1070]anarcat@angela:a$ cd ../b
[1071]anarcat@angela:b$ ls
bar foo
[1072]anarcat@angela:b$ cd ..
[1073]anarcat@angela:g-a$ cd a
[1074]anarcat@angela:a$ git annex list
here
|origin
||web
|||
XX_ bar
XX_ foo
X__ quux
[1075]anarcat@angela:a$ git annex list --help
git-annex: unrecognized option `--help'
Usage: git-annex list [PATH ...] [option ...]
--allrepos show all repositories, not only remotes
To see additional options common to all commands, run: git annex help options
[1076]anarcat@angela:a1$ git annex list --allrepos
here-
|origin
||web
|||anarcat@angela:~/test/g-a/c
||||
XX__ bar
XX__ foo
X___ quux
why don't the files get copied over to the backup repo by the assistant?
i somewhat understand that files don't get sent from a to b, but why doesn't the assistant copy the files from b to c?
i have tried using required instead of wanted and it doesn't work much better.
tested with 5.20150610+gitg608172f-1~ndall+1 (prod) and 5.20141125 (the above test). --anarcat

It's easy to see why a file is not copied from source repo A to source repo B: The preferred content expression for a source repo is "not (copies=1)", so a source repo will not want to get any files from any other repo.
You probably want to make the central repo use transfer; that's basically what it's for. Note that a transfer repo hangs onto file content until it reaches all client repos. So you might want to change the preferred content expression to refer to backup repos. Something as simple as this might work:
I don't see any reason why a file wouldn't move from B to backup repo C, but I don't see anything in your transcript that shows that not happening either. Your transcript doesn't actually show running any git-annex commands that move file contents at all; no git annex copy/move, and the sync is not run with --content...
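To make that last point concrete, here is a minimal sketch of the two ways file contents can be moved by hand. The function names and the repository path argument are made up for illustration; only the git-annex commands themselves come from this thread. A plain git annex sync only exchanges the git branches, so nothing here runs unless you call one of the functions.

```shell
#!/bin/sh
# Sketch only: the function names and the path argument are hypothetical.

# Sync git branches AND transfer file contents according to
# each repository's preferred content settings.
sync_with_content() {
    # $1: path to a git-annex repository
    ( cd "$1" && git annex sync --content )
}

# Explicitly copy the contents of all annexed files under the
# repository root to the remote named "origin".
push_all_to_origin() {
    # $1: path to a git-annex repository
    ( cd "$1" && git annex copy --to origin . )
}
```

With the test layout from the first post, something like sync_with_content ~/test/g-a/b would be the manual equivalent of what the assistant is expected to do on B.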
the assistant is runnin on repos a and b. i was expecting it to move the files automatically.
about repo groups, if i understand correctly, it should be:
a: source
b: transfer (or not inallgroup=backup?)
c: backup
is that correct?
Your transcript doesn't show the assistant running. It's not clear that, in your live deployment, any files ever got to repo B for the assistant there to deal with.
Repos such as B need a preferred content expression like the one I gave. It doesn't matter a whole lot what group is used for them; overriding the groupwanted for the transfer group to use that expression would be one way to do it.
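A rough sketch of that groupwanted override, for anyone following along. It assumes the expression meant here is the "not inallgroup=backup" one quoted elsewhere in this thread, and it assumes a git-annex recent enough to have the groupwanted command. The variable and function names are invented for the example.

```shell
#!/bin/sh
# Sketch only: HUB_GROUP, HUB_EXPR and configure_hub are hypothetical names.
# Assumption: the intended expression is "not inallgroup=backup".
HUB_GROUP=transfer
HUB_EXPR='not inallgroup=backup'

configure_hub() {
    # $1: path to the central repository (B in this thread)
    ( cd "$1" &&
      # Override the expression used by "groupwanted" for this group.
      git annex groupwanted "$HUB_GROUP" "$HUB_EXPR" &&
      # Put the repository in that group...
      git annex group . "$HUB_GROUP" &&
      # ...and make it follow the group's expression.
      git annex wanted . groupwanted )
}
```

The idea is that B then holds on to a file only until every repository in the backup group has it, instead of waiting for client repositories that do not exist in this setup.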
darn, you're right... then i screwed up my copy-paste.
the assistant was running on a and b, that i am sure of.
here's a more complete transcript (hopefully):
now it seems that setting repo b in the transfer group helped, but the files didn't get purged from a (or b, for that matter).
setting the central b wanted expression seems to help in dropping the file from a, but not from b:
i think i'm almost getting this now.
You have B configured as a regular transfer repo, so it only wants to drop files once they have reached some client repos. Setting its preferred content to "not inallgroup=backup" should fix that.
Also, the assistant can notice some configuration changes that are made while it's running, but maybe not all of them, and maybe it won't rescan files to transfer right away after such a change is made. I'd use git annex sync --content to test such changes, and save the assistant for once I have a working setup.
right, i thought as well that the assistant wouldn't pick up some stuff...
but sync --content doesn't behave as expected either:
files are still not removed from a and a few of them were dropped from b, but not all of them. but worse, one file still isn't sent to the backup server c at all.
I'm afraid I don't have time to continue to read and try to debug transcripts of this being set up incorrectly in various ways.
So, here's a transcript of the configuration I described, which seems to be working as I'd expect it to work:
Now observe sync moving the file from A thru B to C:
Er, the 'A' remote in 'B' was unnecessary since A is origin. But otherwise, I think that's what you asked for.. HTH.
well, this is not exactly the topology i have here.
you setup a topology like this:
X <- Y means X is a remote of Y.
My topology is:
So B can't directly manage objects from A. It can receive objects from it, but that's it.
So here's a clearer transcript of the session, using the same semantics you have been using for the different repos, but with the remotes setup differently, as I describe above:
And of course now, it actually works fine, with sync --content:
Now, that's all great for sync --content - but what about the assistant?
so from A's perspective, it looks like the file didn't migrate properly. but it actually did!
how long does it take for the assistant to start syncs like this? are those timers user-accessible somehow? this problem sure looks like sync-git-annex branch not syncing in the assistant - but maybe i'm confused there as well.
anyways, it does seem like content actually does get synced around properly by the assistant. i'll try to deploy the new preferred content expression in production and report back here.
and sorry for the noisy pastes and hand-holding, but i was thoroughly confused by this one. i thought i had a good grasp on preferred content and all that, but it seems i was wrong...
so short version here: thanks for your help and i figured it out, there were problems with the S3 credentials perms not respecting sharedRepository, multiple group support confusion, bare/non-bare, groupwanted vs standard confusion and so on... now the files are migrating properly in production. hurray! i believe documentation could be improved, and i have questions about timeouts, below.
so one thing that was definitely broken in production was that, on the central server, the S3 credentials were accessible only to the user that ran the "enable s3" command (that is anarcat). they were not accessible to the user actually running the assistant (the git-annex sandbox account), which made it simply impossible for the assistant to upload files to s3. so maybe one problem here is that the .git/annex/creds file does not respect the core.sharedRepository = group setting i have there...
another problem in production, of course, was the transfer preferred group setting. once changed to not inallgroup=backup, things were better... but it was still keeping too many files. the problem then was that the repo was originally set to the source group (PEBKAC here again, sorry) and then it was added to the standard group, with the git annex group . standard command. i didn't expect that: i expected the group command to change the group, not to add to it. the documentation (git-annex-group) is clear enough, however, so this is yet another PEBKAC.
Another possible problem is that the central server (b) is not a bare git repository. I am not sure why it was set up that way... maybe i was worried about bare repo support and past experiences, or concerns about being able to actually interact with the files directly on the central server. i had to do git co -b master synced/master to eventually see the files locally, and this helped a little in diagnosing all of this.
A problem remained after that though: files are still not removed from A in my tests in production. i don't understand why:
A is set up as a source repository:
yet some files have more than one copy:
Indeed, it doesn't seem to want to drop that local file:
It turns out that I had the documentation wrong again about that as well: i was using groupwanted as a wanted expression, but the groups were standard groups, so git-annex was just failing to use the proper content expression. setting the wanted expression back to standard (yes, again as you showed, sorry about that) fixed the problem:
hurray!
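For future readers, the two fixes described above (leaving a group that was joined by mistake, and going back to the standard expression) could look roughly like the sketch below. The function name and its arguments are made up for illustration, and the group to leave depends on your own setup, so treat this as a starting point rather than the exact commands that were run in production.

```shell
#!/bin/sh
# Sketch only: fix_groups and its arguments are hypothetical.
# Background: "git annex group" ADDS a group membership; it does not
# replace the groups a repository already belongs to.

fix_groups() {
    # $1: path to the repository
    # $2: name of the group that was joined by mistake
    ( cd "$1" &&
      # Leave the unwanted group.
      git annex ungroup . "$2" &&
      # Use the built-in preferred content expression of the remaining
      # standard group, instead of "groupwanted".
      git annex wanted . standard )
}
```

After a change like this, running git annex sync --content by hand is a quicker way to check the result than waiting for the assistant to notice.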
again, i apologise for all the hand holding here... as a software developer myself, i understand how frustrating it can be to try to make users come out of their cave of ignorance and see the light of day...
but i do feel there could be more work done to clarify how all of this works. i will certainly try to give a few kicks in the preferred content section and maybe this forum post will be helpful for future endeavors... or maybe just write up a new tips page about such hairy setups?
the questions that remain here for me are:
thanks so much for helping us through this, it's really appreciated!
I'm not comfortable making core.sharedrepository settings affect creds files. You don't normally want to give out S3 creds to other users in a unix group. And in the "everybody" case, it certainly seems entirely wrong to make the creds files world-readable. Willing to live with a little inconsistency here in order to not blow users' bank accounts off. It would be good to document it somewhere.
Your paste seems to show A as being in the "sourcethis" group, not the "source" group. I don't know what that means.
The assistant should notice config changes to the git-annex branch within 60 seconds of them being received. Syncs happen when changes are detected, or every 30 minutes.