Recent comments posted to this site:

Many thanks for the reply. I did not add much detail to my post since it was my first, and I was a bit shy.

I did follow all of the steps indicated regarding v6. The binary has been working fine for me as well.

My question is a bit more complex, as I am looking to apply v6 at some point to a set of distributed repos. It has the typical redundant topology: 2 remotes and several clients, each of which is linked to both remotes, so it will survive multiple failure scenarios.

What I am looking for is a strategy for the order in which to upgrade the repos: say, a client first, then run a set of tests, and so on, with the remotes perhaps last.

In any case, I built a test bed as described above and have been experimenting with a single client upgraded to a v6 repo.

So far it has been smooth, but right now I seem to be stuck: thinning is not having any effect. I still have a particular big file (1.5GB) both in the annex and in the working dir. Locking removes the big file from the working dir and leaves a symlink, and unlocking restores the big file in the working dir. Yet the annex still contains the big file no matter what. I expected one of the two copies of the big file to go away.

Comment by Stan Tue Jun 28 23:10:07 2016
I always back up my git-annex repos using 'git bundle create ... --all', which is similar to a git clone but much more convenient for backups, because it produces just a single file :).
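A minimal sketch of the bundle-based backup described in this comment, assuming a POSIX shell and a reasonably recent git. The temporary directory, the repository name 'repo' and the file name 'backup.bundle' are placeholders for illustration only. Note that a bundle holds only git objects and refs (including the git-annex branch, which '--all' picks up); annexed file contents under '.git/annex/objects' are not included and would need to be backed up separately.

```shell
#!/bin/sh
# Sketch: back up every ref of a repository into one bundle file,
# verify it, then restore it. All paths here are hypothetical.
set -e

work=$(mktemp -d)
cd "$work"

# Stand-in for an existing repository. A real git-annex repo would also
# have a git-annex branch, which --all includes as well.
git init -q repo
git -C repo -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# --all bundles every ref (all branches, tags and HEAD), not just one branch.
git -C repo bundle create "$work/backup.bundle" --all

# Check that the bundle is complete and can be used.
git -C repo bundle verify "$work/backup.bundle"

# Restore: a bundle file can be cloned like any other remote.
git clone -q "$work/backup.bundle" restored
git -C restored log --oneline
```

Restoring this way brings back the git history and the git-annex branch; the file contents themselves would still have to come from another copy of the annex.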
Comment by sts Mon Jun 27 19:36:58 2016

Good questions. The setup is essentially a star topology where the central node is the full-backup, bare git repository on the external drive on the server. It's not encrypted, if that matters. Since it's a bare repo, nothing actually changes in that node except when changes are pushed from elsewhere.

Every other node is configured as a client and is connected to that repo on the server via SSH. When a change is made to one of the client repos, syncing happens automatically and nearly instantaneously, and not just to the server repo: the changes are also very quickly propagated to the other SSH-connected clients. (I think this is supposed to happen for SSH repos now without using XMPP. From the release notes for version 5.20140421: "This release begins to deprecate XMPP support. In particular, if you use the assistant with a ssh remote that has this version of git-annex installed, you don't need XMPP any longer to get immediate syncing of changes.")

But then there's one other node in the star topology: the client repo (non-bare) that is actually on the same system as the bare backup repo. When changes are made to this client, they are synced immediately to the backup repo, and from there they are quickly synced to the other client repos on other computers connected by SSH. The problem is changes in the other direction: when a client repo on a remote system is changed, the updates are synced to the backup repo on the server, but no farther. The webapp running on the server never (or only very slowly) syncs those changes to the client repo on the same system. The webapp on the server is running with that local client repo as the "main" repo (I mean, the upper right-hand corner says "Repository: ~/doc", which is the client repo). All of the status messages are green and say that the repos are synced. You can, of course, force a sync from there, and then the changes are noticed and propagated to the client.

I did some testing, and it seems that repositories with non-bare remotes on the same filesystem are synced immediately, using inotify or something similar. Maybe this just doesn't happen for bare repos? In my limited testing, that's the variable that seems to stop syncing from happening automatically. I can test more to isolate exactly when it happens, but I was hoping Joey would know off the top of his head whether this should be working or not.

Comment by Don Fri Jun 24 07:16:20 2016

And the other repositories (which are syncing correctly), they are also marked as client? Clients should sync with each other whenever they can connect to each other, but perhaps your client repositories cannot connect to each other?

Here is a comment from Joey about a similar setup. He suggests you might need to set up an XMPP account so the various clients can message each other when they have new changes. But that was a long time ago, and anyway it seems like the webapp would be doing this automatically (yet in your post you say the computers do not use XMPP?).

I think everyone would benefit from more info still. For example, what does the webapp look like when you are syncing this repo? Are all the clients and the backup on the same page? Is each client connected only to the backup itself and to no other repos? What kind of remote repos are the other computers?

Hope you don't mind the questions, I'm also curious about this issue myself. Thanks for bringing it up.

Comment by Farhan Fri Jun 24 06:35:10 2016
The bare repo on the external drive is configured as full backup and the repo at ~/doc is client.
Comment by Don Fri Jun 24 05:40:23 2016

I would like to help (though I certainly don't have enough time to implement the feature fully).

With regard to:

1) It seems that there is a library which implements an SSH server on Android. I see it in this repo as well as in this repo, both of which are owned by the makers of popular sshd apps on the Google Play Store.

2) Do you mean that that library needs to be ported to Android/Java? If so, I could take a crack at it!

Comment by Farhan Fri Jun 24 03:22:38 2016
Although I believe Joey's analysis is correct, I want to interject a caution: while investigating this issue on my own system, I discovered that a gcrypt remote had an unencrypted pack file and an unencrypted index, which somehow contained actual data of mine. I would consider it a git-annex bug that data (file contents) wound up in a pack file at all (this is a git-annex assistant repo), but a gcrypt bug that it got there unencrypted.
Comment by jgoerzen Thu Jun 23 21:24:49 2016

since you haven't explicitly mentioned it, i'll just point you to the upgrades page, which has specific instructions for upgrading repositories to the v6 layout.

as far as I know, you can safely upgrade the git-annex program itself. it will not upgrade your repositories unless you deliberately run git-annex-upgrade.

then git-annex-upgrade will make changes, but only to direct mode repositories, as far as i can tell. those repositories are switched to adjusted branches, which have their own set of issues right now: mostly limitations in the webapp, though there have also been a few bugfixes for crippled filesystems recently.

i am personally playing the "wait and see" game, especially considering that my specific use case for v6 (hide missing files) has not been completely implemented yet. but I am of course using 6.x binaries without problems.

i am not the git-annex author, so take my comments with a grain of salt of course. :) --anarcat

Comment by anarcat Thu Jun 23 14:44:18 2016

there are two related issues that were closed as wontfix here, which i missed in my original search as well:

  • clear file names in special remotes (archive)
  • New special remote suggeston - clean directory (archive)

those issues were actually removed, but are still on the Internet Archive for future reference.

to summarize those issues, the rationale there is that such remotes are potentially destructive (lack of locking and checks) and have workarounds (rsync -a was suggested). it is also clearly stated that this is contrary to the git-annex design and is a "pony" feature.

i would counter that this is an often-requested feature that seems to be a major usability issue for a lot of people. there are other unsafe remotes out there, like the bittorrent special remote, which is explicitly documented as such, and where we recommend that users untrust the remote when it is set up. yet those remotes have their uses, and the rich diversity of such remotes makes git-annex one of the most interesting projects out there.

furthermore, rsync -a is not the same as git-annex's excellent tracking features. in my use case (syncing music to my android device), there is no git-annex repository that has the same set of files I want present on the android device, so I cannot use rsync without incurring a large storage space cost.

as for this being contrary to git-annex design, you obviously know more about this than me, but from the outside it doesn't seem that counter-intuitive. it seems that we have to jump through a lot of hoops to make stuff like Facilitate public pretty S3 URLs work at all. having a different backend for specially crafted remotes would be a huge gain in usability, and the impact on data security could be limited with the usual trust/untrust mechanisms.

i would be curious to hear more about how such a backend would be contrary to git-annex's design. i am assuming here that my main repo could still have a SHA256E backend while some remotes could have different backends. maybe that's a flawed assumption, and I can obviously see how such a dumb backend could break many more of the assumptions git-annex makes about its data if it had to be used on all the remotes. yet there are backends right now, like URL and WORM, that could be considered "unsafe" and that do not provide usability gains as great as this dumb backend could.

i understand you have been asked for this feature often and would understand if this request were just closed again. but considering how often it comes up, from different users, i think it should at least be documented more explicitly (in the "not" page, or in backends and special remotes, maybe?) to keep further requests from coming in. keeping an issue like this open would also help avoid duplicate requests.

thank you for your time and efforts on git-annex; i cannot stress enough how helpful it is to me. the fact that I could write a special remote to accomplish the above filthy todo is a testament to how flexible and powerful git-annex is. :)

Comment by anarcat Thu Jun 23 14:44:06 2016

I ended up writing a special remote for this, named git-annex-remote-dumb for lack of a better name. it mixes things up badly, is an abhorrent violation of protocol boundaries and is probably due to be taken to the back of the barn and shot, but it works for my case.

this would be better implemented as a dumb, unsafe, human-readable backend, but I suspect that cannot be implemented directly either without significant changes to the git-annex design, which is unlikely to happen since hide missing files is likely to be implemented with adjusted branches.

more details about known problems, remaining tasks and limitations are in the script header.

Comment by anarcat Thu Jun 23 13:03:48 2016