I have a repo on my cluster. On my system I cloned it by
[my_system] $ git clone ssh://jed//home/tdegeus/tmp/mydata cluster
This works fine to get files from the cluster.
However, I also want to send files from my system to the cluster and to be able to drop files on the cluster. The problem is that my system has no stable IP address to SSH to, so I don't have a good way to add my system as a remote on the cluster. I thought that running
[my_system] $ git annex sync
from my laptop would be enough. But that does not seem to sufficiently update the repo on the cluster.
For example, on the cluster, adding some files
[ssh->jed] $ git annex add foo.h5
[ssh->jed] $ git commit -m "my message"
and then downloading on my system
[my_system] $ git annex sync
[my_system] $ git annex get foo.h5
[my_system] $ git annex sync
works fine.
But then back on the cluster running
[ssh->jed] $ git annex drop foo.h5
results, erroneously, in
drop foo.h5 (unsafe)
Could only verify the existence of 0 out of 1 necessary copy
Maybe add some of these git remotes (git remote add ...):
ee3f4fc7-db8f-4c45-8c40-92e96d046999 -- tdegeus:~/Downloads/annex/cluster
(Use --force to override this check, or adjust numcopies.)
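(A note on the sync attempt above: a plain git annex sync only exchanges the git branches and location tracking, not the annexed content itself. A sketch of actually pushing content from my system to the cluster, using the remote name from the clone above:)
[my_system] $ git annex sync --content cluster
# or, for a single file:
[my_system] $ git annex copy --to cluster foo.h5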
The repo you're executing git-annex from needs to have direct access to the remote in question. There's no relaying.
If just the changing IP address is the problem, consider using a DynDNS service like https://freedns.afraid.org; it's free and works very well.
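For example (a sketch; the hostname is hypothetical, and the repo path on my_system is the one shown in the drop output above), with a DynDNS name for your system you could add it as a remote on the cluster and sync from there:
# hypothetical DynDNS hostname; assumes my_system runs an SSH server the cluster can reach
[ssh->jed] $ git remote add my_system ssh://tdegeus@my-system.example.afraid.org/~/Downloads/annex/cluster
[ssh->jed] $ git annex sync my_system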
Thanks for the tip.
I wonder what the underlying reason is for needing a two-way remote. It would have been nice if, for example, running git annex sync from my system would update the state of all available remotes. In that case, the only functionality that would not be available in the absence of a two-way remote would be transferring content directly between the cluster and my system (for example, git annex get run on the cluster).
But if one can live with that limitation, it would simplify the setup and increase functionality. I do believe that it would also relax the git-annex dependency of the remote somewhat: potentially, it would then only be needed when running git-annex on the remote itself, not every time git-annex is run from the local system. That too would simplify the setup a lot (one could then use git-annex from a virtual environment that is not forced to be the default base).
It's always the repo (my_system here) you're executing git-annex from that needs direct access to the remotes. I would also love to see git-annex propagate the changes across the graph of connected remotes as far as it gets. It already has permission to run git-annex on the remote server, so it could trigger a git annex sync from there, which would then reach out to all the remotes that repo has configured itself. But I'm sure @joey has a good reason not to do this; it screams of potential race conditions and git locking weirdness. But I don't know 🤷
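A rough manual version of that relay (a sketch; it assumes git-annex is on the cluster's non-interactive SSH PATH, and uses the host and path from the clone command above) would be to trigger the sync on the cluster yourself:
[my_system] $ git annex sync cluster
[my_system] $ ssh jed 'cd /home/tdegeus/tmp/mydata && git annex sync'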
One way is to run this on your system:
[my_system] $ git annex drop --from cluster foo.h5
If you really have a reason to want to run git-annex drop on the cluster itself, it needs to be able to connect back to your system in order to verify that your system still has a copy of the file it's dropping.
Imagine what would happen if you ran git-annex drop of the same file on your system and on the cluster at the same time, if that check were not done. The file would be dropped from both places and so you'd lose data.
You can use git-annex drop --force on the cluster if you're sure that your system still has the content. But it's probably going to be better to just use git-annex drop --from cluster, run on your system, which avoids the potential for a mistake causing data loss.
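As a small usage sketch of that last suggestion (file name from the example above), run from your system: first check where git-annex has the content recorded, then drop the cluster's copy remotely and sync the updated location log:
[my_system] $ git annex whereis foo.h5
[my_system] $ git annex drop --from cluster foo.h5
[my_system] $ git annex sync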