I’m using git annex to manage my movie collection on various devices – my laptop, an NSLU tucked away somewhere with lots of space, some external hard drives. For this use case, I do not need the full power of git as a version control system, so having to run "git commit" and come up with commit messages is annoying. Also, this makes sense for a version control system, but not for my media collection:
$ git annex add Hot\ Fuzz\ -\ English.mkv
add Hot Fuzz - English.mkv (checksum...) ok
(Recording state in git...)
$ git commit -m 'another movie added'
[master 851dc8a] another movie added
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 120000 00 Noch nicht gesehen/Hot Fuzz - English.mkv
$ git push jeff
Counting objects: 38, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (20/20), done.
Writing objects: 100% (26/26), 2.00 KiB, done.
Total 26 (delta 11), reused 0 (delta 0)
remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error:
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error:
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To jeff:/mnt/media/Movies
! [rejected] git-annex -> git-annex (non-fast-forward)
! [remote rejected] master -> master (branch is currently checked out)
error: failed to push some refs to 'jeff:/mnt/media/Movies'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes (e.g. 'git pull') before pushing again. See the
'Note about fast-forwards' section of 'git push --help' for details.
It seems that to successfully make the new files known to the other side, I have to log into jeff and pull from my current machine.
What I would like to have is that
- git annex add does not require a commit afterwards.
- Changes to the files are automatically picked up with the next git-annex call (similar to how etckeeper works).
- Commands "git annex push" and "git annex pull" that will sync the metadata (i.e. the list of files) in both directions without further manual intervention, at least not until the two repositories have diverged in a way that is not possible to merge sensibly.
Summary: git-annex is great. git is not always. Please make it possible to use git annex without having to use git.
First, you need a bare git repository that you can push to, and pull from. This simplifies most git workflows.
Secondly, I use mr, with this in .mrconfig:
Which makes "mr update" in repositories where I rarely care about git details take care of syncing my changes.
I also make "mr update" do a "git annex get" of some files in some repositories that I want to always populate. git-annex and mr go well together.
Perhaps my annexupdate above should be available as "git annex sync"?
Thanks for the tips so far. I guess a bare-only repo helps, but it is also something that I don’t need (for my use case), and only have to do because git works like this.
Also, if I have a mobile device that I want to push to, then I’d have to have two repositories on the device, as I might not be able to reach my main bare repository when traveling, but I cannot push to the „real“ repo on the mobile device from my computer. I guess I am spoiled by darcs, which will happily push to a checked out remote repository, updating the checkout if possible without conflict.
If I introduce a central bare repository to push to and pull from, I’d still have to have the other non-bare repos as remotes, so that git-annex will know about them and their files, right?
I’d appreciate a "git annex sync" that does what you described (commit all, pull, merge, push). Especially if it comes in a "git annex sync --all" variant that syncs all reachable repositories.
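The sequence described here (commit all, pull, merge, push) can be sketched as a small shell function. This is only an illustration, not the script that git annex sync actually runs. The names run, annex_sync and DRY_RUN are made up for this sketch, and the defaults origin and master are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# annex_sync [REMOTE] [BRANCH]: commit all, pull (fetch + merge), push.
annex_sync() {
    remote=${1:-origin}
    branch=${2:-master}
    # Commit all local changes; "nothing to commit" is not an error here.
    run git commit -a -m sync || true
    # Fetch everything from the remote, including its git-annex branch.
    run git fetch "$remote" || return 1
    # Merge the remote's work branch into the checked-out branch.
    run git merge "$remote/$branch" || return 1
    # Merge the fetched git-annex tracking branches into the local one.
    run git annex merge || return 1
    # Publish both the work branch and the git-annex metadata branch.
    run git push "$remote" "$branch" git-annex
}
```

Running "DRY_RUN=1 annex_sync jeff" prints the five git commands without touching the repository. As the rejected push at the top of this thread shows, the final step only succeeds if the remote is bare or does not have that branch checked out.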
Git can actually push into a non-bare repository, so long as the branch you change there is not a checked out one. Pushing into remotes/$foo/master and remotes/$foo/git-annex would work, however determining the value that the repository expects for $foo is something git cannot do on its own. And of course you'd still have to git merge remotes/$foo/master to get the changes.
Yes, you still keep the non-bare repos as remotes when adding a bare repository, so git-annex knows how to get to them.
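A sketch of such a push, assuming the receiving repository knows the pushing one under the remote name laptop. That name is made up, and it is exactly the $foo that git cannot work out on its own. The helper tracking_refspecs is hypothetical and only builds the refspec strings.

```shell
#!/bin/sh
# tracking_refspecs NAME: print refspecs that push the local master and
# git-annex branches into refs/remotes/NAME/* on the receiving side,
# leaving the receiver's checked-out branch untouched.
tracking_refspecs() {
    name=$1
    for branch in master git-annex; do
        printf '%s:refs/remotes/%s/%s\n' "$branch" "$name" "$branch"
    done
}

# Usage (not executed here; "jeff" is the remote from the example above):
#   git push jeff $(tracking_refspecs laptop)
# and afterwards, in the repository on jeff:
#   git merge remotes/laptop/master
```

Because only refs under refs/remotes/ are updated, the receive.denyCurrentBranch check is not triggered, but the work tree on the other side does not change until the merge is run there.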
I've made git annex sync run the simple script above. Perhaps it can later be improved to sync all repositories.
I thought about this some more, and I think I have a pretty decent solution that avoids a central bare repository. Instead of pushing to master (which git does not like) or trying to guess the remote branch name on the other side, there is a well-known branch name, say git-annex-master. Then a sync command would do something like this (untested):
The nice things are: One can push to any remote repository, and thus avoid the issue of pushing to a portable device; the merging happens on the master branch, so if it fails to merge automatically, regular git foo can resolve it, and all changes eventually reach every repository.
What do you think?
After some experimentation, this seems to work better:
Maybe this approach can be enhanced to skip stuff gracefully if there is no git-annex-master branch and then be added to what "git annex sync" does; this way those who want to use the feature can do so by running "git branch git-annex-master" once. Or, if you like this and want to make it default, just make git-annex-init create the git-annex-master branch.
It would be clearer to call "git-annex-master" "synced/master" (or really "synced/$current_branch"). That does highlight that this method of syncing is not particularly specific to git-annex.
I think this would be annoying to those who do use a central bare repository, because of the unnecessary pushing and pulling to other repos, which could be expensive to do, especially if you have a lot of interconnected repos. So having a way to enable/disable it seems best.
Maybe you should work up a patch to Command/Sync.hs, since I know you know Haskell.
I agree on the naming suggestions, and that it does not suit everybody. Maybe I’ll think some more about it. The point is: I’m trying to make life easy for those who do not want to manually create some complicated setup, so if it needs configuration, it is already off that track. But turning the current behavior into something people have to configure is also not well received by the users.
Given that "git annex sync" is a new command, maybe it is fine to have this as a default behavior, and offer an easy way out. The easy way out could be one of two flags that can be set for a repo (or a remote):
Maybe central is enough.
I don't mind changing the behavior of git-annex sync, certainly..
Looking thru git's documentation, I found some existing configuration that could be reused following your idea. There is a remote.name.skipDefaultUpdate and a remote.name.skipFetchAll. Though both have to do with fetches, not pushes. Another approach might be to use git's remote group stuff.
Another option that would please the naive user without hindering the more advanced user: "git annex init", by default, creates a synced/master branch. "git annex sync" will pull from every synced/master branch it finds, and also push to any synced/master branch it finds, but will not create any. So by default (at least for new users), this provides simple one-step syncing.
Advanced users can disable this per-repo by just deleting the synced/master branch. Presumably the logic will be: Every repo that should not be pushed to, because it has access to some central repo, should not have a synced/master branch. Every other repo, including the (or one of the few) central repos, will have the branch.
This is not the most expressive solution, as it does not allow configuring syncing between arbitrary pairs of repos, but it feels like a good compromise between that and simplicity and transparency.
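The opt-in rule described here (sync only with remotes that already have a synced/master branch, and never create one) could look roughly like this in shell. It is a sketch of the proposal, not of any code in git-annex. The function names are hypothetical, and it assumes the local work branch is master.

```shell
#!/bin/sh
# has_synced_branch REMOTE: true if a tracking ref for REMOTE's
# synced/master branch exists in this repository.
has_synced_branch() {
    git show-ref --verify --quiet "refs/remotes/$1/synced/master"
}

# syncable_remotes: list the remotes that opted in by having synced/master.
syncable_remotes() {
    for remote in $(git remote); do
        if has_synced_branch "$remote"; then
            echo "$remote"
        fi
    done
}

# sync_opted_in: pull from and push to opted-in remotes only. A remote
# without synced/master is skipped, so the branch is never created there.
sync_opted_in() {
    for remote in $(syncable_remotes); do
        git fetch "$remote" &&
            git merge "refs/remotes/$remote/synced/master" || return 1
    done
    # Bring the local git-annex branch up to date before pushing it.
    git annex merge || return 1
    for remote in $(syncable_remotes); do
        git push "$remote" master:refs/heads/synced/master git-annex || return 1
    done
}
```

One consequence of checking the tracking ref: a remote only counts as opted in once it has been fetched at least once, because refs/remotes/REMOTE/synced/master does not exist locally before that.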
I think it's about time that I provide less talk and more code. I’ll see when I find the time.
OMG, my first sizable haskell patch!
So trying this out..
In each repo I want to sync, I first run "git branch synced/master".
Then in each repo, I found I had to pull from each of its remotes, to get the tracking branches that defaultSyncRemotes looks for to know those remotes are syncable. This was the surprising thing for me, I had expected sync to somehow work out which remotes were syncable without my explicit pull. And it was not very obvious that sync was not doing its thing before I did that, since it still does a lot of "stuff".
Once set up properly, git annex sync fetches from each remote, merges, and then pushes to each remote that has a synced branch. Changes propagate around even when some links are one-directional. Cool!
So it works fine, but I think more needs to be done to make setting up syncing easier. Ideally, all a user would need to do is run "git annex sync" and it syncs from all remotes, without needing to manually set up the synced/master branch.
While this would lose the ability to control which remotes are synced, I think that being able to git annex sync origin and only sync from/to origin is sufficient, for the centralized use case.
Code review:
Why did you make branch strict?
There is a bit of a bug in your use of Command.Merge.start. The git-annex branch merge code only runs once per git-annex run, and often this comes before sync fetches from the remotes, leading to a push conflict. I've fixed this in my "sync" branch, along with a few other minor things.
mergeRemote merges from refs/remotes/foo/synced/master. But that will only be up-to-date if git annex sync has recently been run there. Is there any reason it couldn't merge from refs/remotes/foo/master?
I have made a new autosync branch, where all that the user needs to do is run git annex sync and it automatically sets up the synced/master branch. I find this very easy to use, what do you think?
Note that autosync is also pretty smart about not running commands like "git merge" and "git push" when they would not do anything. So you may find git annex sync not showing all the steps you'd expect. The only step a sync always performs now is pulling from the remotes.
Sorry for not replying earlier, but my non-mailinglist-communications-workflows are suboptimal
Right. But "git fetch" ought to be enough.
Personally, I’d just pull and push everywhere, but you pointed out that it ought to be manageable. The existence of the synced/master branch is the flag that indicates this, so you need to propagate this once. Note that if the branch were already created by "git annex init", then this would not be a problem.
It is not required to use "git fetch" once, you can also call "git annex sync " once with the remote explicitly mentioned; this would involve a fetch.
I’d leave this decision to you. But I see that you took the decision already, as your code now creates the synced/master branch when it does not exist (e290f4a8).
Because it did not work otherwise :-). It uses pipeRead, which is lazy, and for some reason git and/or your utility functions did not like that the output of the command was not consumed before the next git command was called. I did not investigate further. For better code, I’d suggest adding a function like pipeRead that completely reads the git output before returning, thus avoiding any issues with lazy IO.
Hmm, good question. It is probably safe to merge from both, and push only to synced/master. But which one first? synced/master can be ahead if the repo was synced to from somewhere else, master can be ahead if there are local changes. Maybe git merge should be called on all remote heads simultaneously, thus generating only one commit for the merge. I don’t know how well that works in practice.
Thanks for including my code, Joachim
With a lazy branch, I get "git-annex: no branch is checked out". Weird.. my best guess is that it's because this is running at the seek stage, which is unusual, and the value is not used until a later stage and so perhaps the git command gets reaped by some cleanup code before its output is read.
(pipeRead is lazy because often it's used to read large quantities of data from git that are processed progressively.)
I did make it merge both branches, separately. It would be possible to do one single merge, but it's probably harder for the user to recover if there are conflicts in an octopus merge. The order of the merges does not seem to me to matter much, barring conflicts it will work either way. Dealing with conflicts during sync is probably a weakness of all this; after the first conflict the rest of the sync will continue failing.
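The difference between the two merge styles can be sketched as follows. This is an illustration in shell rather than the Haskell in Command/Sync.hs; merge_remote is a hypothetical name, and foo stands for any remote.

```shell
#!/bin/sh
# merge_remote REMOTE: merge REMOTE's master and synced/master tracking
# branches one after the other, stopping at the first failed merge.
merge_remote() {
    for ref in "refs/remotes/$1/master" "refs/remotes/$1/synced/master"; do
        # Skip tracking refs that do not exist in this repository.
        git show-ref --verify --quiet "$ref" || continue
        # A conflict here leaves an ordinary two-parent merge to resolve.
        git merge "$ref" || return 1
    done
}

# The single-merge alternative would pass both heads to one command,
# which git performs as an octopus merge when more than one is given:
#   git merge refs/remotes/foo/master refs/remotes/foo/synced/master
```

With separate merges a conflict always involves just one remote head, which is the easier situation to recover from with regular git tools.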