I try to use git annex, but I frequently don't know if I'm doing things correctly, and my files are getting messed up.
An expert guide to the full workflow would go a long way toward user-friendliness for me. The walkthrough currently has guides to a number of discrete items in the workflow, but it doesn't give me a clear sense of the process.
I'm always confused about when I'm supposed to be using pure git commands and when they should be git annex commands, when to commit, add, and sync --content, and when each of these is redundant.
If possible, most helpful would be a guide to how you imagine the workflow from the beginning and including each step of the process, in the order you'd do it.
I want to start keeping track of some files I have in a directory. I want to copy them to a second computer. From a third place, I want to get them from the second computer. I change the files on one computer, and I want to make sure the changes get synced to the others. What are the commands you'd run at each step?
Many thanks.
I think this is a good idea for an extension to the walkthrough. I am probably the worst person to write it. How about, you write it, and I'll fact check and edit it.
I'll try to address some of this.
Git-annex supports a wide variety of workflows, a spectrum that ranges from completely automatic behavior where git-annex handles everything, through manual behavior where git-annex does only what you say, when you tell it to, down to internal behavior, where you have complete control and understand how everything is stored and exactly what changes are happening.
I will proceed to summarize all of these. Your question sounds like a high-level question, so I will begin at the automatic end, hoping that this is most useful, and drill down to the low-level approaches. Note, however, that this is the opposite order of how git-annex was apparently developed. A list of workflows that started from manual, command-line usage would be much more intuitive, but you'd have to be willing to read the man page and wiki pages to get started, and that's pretty much what's already out there anyway.
Note that for each of these levels of interaction, all the levels following will also work. So you can actually manually move annexed files around while the webapp is running, etc.
(0) git annex webapp. This command launches a local web server which serves a graphical user interface and automatically manages git annex. It will attempt to guide you through the whole process and do everything for you. I think the intent is that no other commands are needed. This should be run on every machine that may produce file changes.
(1) git annex assistant without running the webapp. You could call this the command-line version of the webapp, giving you more control over creating and connecting your repositories, and configuring how files are moved between them. The assistant, when running, will automatically watch for file changes and synchronize them to other repositories, but you must manually create the repositories and configure the rules for syncing. To create a repository, use git init and then git annex init, and then git remote add it to any other repositories. If you want more than one annex, you can add their paths to ~/.config/git-annex/autostart if you would like them to automatically begin syncing when git annex assistant --autostart is run, perhaps on boot or login. You can configure rules for where files are copied using the repository setup commands such as preferred content expressions, git annex numcopies, and git config annex.largefiles; most of the settings are accessible in one place with git annex vicfg.
(2) git annex watch without running the assistant. This command is like the assistant but has no automatic network behavior, giving you complete control over when repositories are pushed and pulled, and when files are moved between systems. The repository is watched, and any file changes are added to git-annex. In order to synchronize between repositories, you must run git annex sync --content in the repository with the changes, which will merge the git history and logs with your remotes, and automatically transfer files to match your preferred and required content expressions.
(3) No background processes, allowing you to decide when and what files are annexed. In order to tell git-annex to manage files, you must git annex add the files.
(4) Plain git annex sync without --content, giving you fine-grained control of where copies of your files are stored. This tells git-annex to merge git histories, but it does not automatically transfer your large files between systems. To transfer files and directories, you can use git annex get, git annex drop, git annex move, and git annex copy. Git-annex will not violate a required content expression or your numcopies setting unless you pass --force, so your files are still safe. This is the workflow I mostly use, and I find it the most stable. I'm trying to migrate up to --content, but I have too many large files that haven't reached their numcopies yet for that to be effective.
(5) Manual management of git history without running the synchronizer, allowing you to control precisely what is committed, what commit message is used, and how your history is merged between repositories. You must have an understanding of git, and run git commit after git annex add to store the change. You must manage the git history yourself, using git pull and git push, to synchronize repositories. You may freely use git normally side-by-side with git-annex.
(6) Manual management of git annex keys, giving you control of what and where git annex stores your files under the hood, and how they are associated with your working tree, rather than using the git annex add and git annex get commands which reference files automatically. Git-annex has a variety of plumbing commands listed in the man page that let you directly store and retrieve data in an annex associated with your git repository, where every datafile is enumerated by a unique hashkey.
There are a lot of possible workflows with git annex. It provides an array of tools that allow you to both completely automate things and have very fine control of them. The more you can learn, the more power you will have.
@xloem nice start!
It would be helpful to get this into a set of wiki pages documenting the workflows in detail. Since this site is a wiki, you can add pages as you like. Start by clicking the question mark to create workflow, and see wikilink for how to add sub-pages like ?assistant.
Good start on the workflow page!
I've added some links to it to make it discoverable.
Not sure if the workflow page quite gets to what was originally requested:
Leaving this todo open for now..
In a way the use cases on the front page of the website are trying to accomplish the same thing requested here. I think that section could be moved more in the direction of listing some ways to use git-annex and linking to walkthroughs for the different use cases.
i actually answered this in Full workflow guide before realizing this post was a duplicate.
i am not sure where that should be put - it's not a "full workflow guide", it's actually a "specific workflow guide", namely "i have 3 computers and need to sync files from A to B to C"...
i wonder if this should be marked as resolved or if we want to add a specific workflow for this or what?
this reminds me of my broader build a user guide task... -- anarcat