Hi,
I'm trying to get my head around groups, wanted, etc. for a particular use case.
Problem: I can't work out how to get a source(?) repository to automatically drop files when they hit a transfer repository.
I have a machine (Machine 1) that is used for data acquisition, but it is behind a strict firewall (both physical and virtual). I usually physically carry a USB drive over and set up an rsync (ssh -> local USB drive) from the one machine (Machine 2) that is able to connect over the network to Machine 1. As it is a pain to lug the drive over, I only do this rsync maybe weekly, so the rsync takes many hours (~24) to complete. Then (when I remember) I visit and carry the USB drive back... Naturally, this slows down my work process.
What I was hoping to do was set up git-annex with the assistant to help me. I am able to run the assistant, but not the webapp, on Machines 1 and 2.
My thought was - as these have to be disconnected network transfers...
Repository 1 -> Repository 2 (when space permits)
Repository 2 -> Repository 3 (when space permits)
-> Repository 4 (USB drive(s))
Another limitation is that Repos/Machines 2 & 3 have limited storage space.
As a test case I can set up (Repo1 -> Repo2) and (Repo2 -> Repo3) (on other machines, but the commands should be the same...)
After reading a bit I changed the preferred content for a transfer repo to:
not (inallgroup=client and copies=client:1) and ($client)
i.e. I changed copies from 2 to 1.
Finally...The question
BUT I can't work out how to get Repo1 (the source) to automatically drop the files when they hit Repo2 (what I'm guessing should be a transfer repository).
Can anyone suggest how to automagically do this with the assistant?
If it would help I can share the git-annex commands I've been using, but as I'm only testing at the moment, I'm happy to start from scratch if there is an RTFM page out there.
I've put some details about my thoughts on the repositories and restrictions below.
Thanks - Olaf
Repository 1
- Type: source (Data collection)
- Human readable directory structure
- Physically: Machine 1
- Strict firewall: only incoming network connections from Machine 2
- Storage: 50GB
Repository 2
- Type: transfer
- Physically: Machine 2
- Reasonably relaxed firewall, can talk to Repository 3
- Limited storage: 10GB
Repository 3
- Type: transfer
- Physically: Machine 3
- Reasonably relaxed firewall, can talk to Repository 2
- Limited storage: 10GB
- Connected to USB drive(s)
Repository 4, 5, ...
- Type: ? Client ?
- Human readable directory structure
- Physically: USB drive
- Usually (but not always) connected to machine 3
- Large storage (2TB) + Additional drives
Sounds like an interesting data-gathering application; I have to say I'm curious what it is.
If Repo1 is configured like this:
Then it should want to drop the contents of files from Repo1 once it knows they have reached any other repository. (Sometimes people put a repository in a group but forget to set wanted to "standard" ...)
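For reference, putting a repository in the source group and pointing its preferred content at the group's standard expression usually takes two commands. A minimal sketch, assuming it is run against Repo1's working tree; the helper name configure_source_repo and the path in the usage line are hypothetical:

```shell
# Hypothetical helper: put the repository at "$1" into the standard
# "source" group and make it use that group's preferred content.
configure_source_repo() {
    (
        cd "$1" || exit 1
        git annex group . source      # "." means this repository
        git annex wanted . standard   # use the group's standard expression
    )
}

# Usage (path is an assumption):
#   configure_source_repo /path/to/Repo1
```

Running git annex wanted . with no expression afterwards prints the current setting, which is a quick way to check that it took effect.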
Looks like Repo1 cannot make outgoing connections to Repo2?
So, you need to run the assistant on Repo2 and probably on Repo1. Then it works like this:
Note that if you get the files added and committed by some other process, you don't really need to run the assistant on Repo1.
The USB drives need to be client, so that once content reaches one of them, the content will be dropped from the transfer repositories. The way that part should work:
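In command form, the group layout described here might look like the following. This is only a sketch: the helper name set_standard_group and the paths in the usage lines are hypothetical, and each call assumes a git-annex repository already exists at the given path.

```shell
# Hypothetical helper: put the repository at "$1" into group "$2"
# (e.g. transfer or client) and use that group's standard preferred content.
set_standard_group() {
    (
        cd "$1" || exit 1
        git annex group . "$2"
        git annex wanted . standard
    )
}

# Usage (paths are assumptions):
#   set_standard_group /path/to/Repo2 transfer
#   set_standard_group /path/to/Repo3 transfer
#   set_standard_group /media/usb/Repo4 client
```

Note that, as far as I know, the standard transfer expression uses copies=client:2, so with a single client drive the transfer repositories would keep their content; that is the reason for lowering it to 1 in the customized expression from the question.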
If you're having trouble getting any of this to work, I recommend running
git annex sync --content
manually while testing it, and make sure it does what you would expect to happen at each step.
<ASIDE> I wish the data collection was interesting. It's just a machine with locked-down networking so that it's a tad more secure. The closest example is an EC2 instance sitting in a VPC... </ASIDE>
Thanks for the feedback Joey. I tried again and this approach works except where the downstream repo is direct mode - in this case the transfer repo doesn't drop. I'm not sure if it's a bug or a feature I didn't read about.
Happy to submit a bug report with steps if it's unexpected.
Oops. I committed the greatest sin...
I would not expect the behavior with a direct mode repo to be any different, unless the file in direct mode is getting modified, or perhaps git-annex gets confused and thinks it's modified. In either case, git-annex would refuse to drop the content from the transfer repository, to avoid losing the unmodified content.