When using the assistant on some of my repositories, I would like to retain manual control over the granularity and contents of the commit history. Some motivating reasons:
- manually specified commit messages make the history easier to follow
- making a series of minor changes to a file over a period of a few hours would result in a single commit rather than capturing intermediate incomplete edits
- manual choice of which files to annex (based on predicted usage) could be useful, e.g. a repo might contain a 4MB PDF which you want available in every remote even without `git annex get`, and also some 2MB images which are only required in some remotes
This particular case is now catered to by the "manual" repository group in preferred content settings. --Joey
Obviously this needs to be configurable at least per repository, and ideally perhaps even per remote, since usage habits can vary from machine to machine (e.g. I could choose to commit manually from my desktop machine which has a nice comfy keyboard and large screen, but this would be too much pain to do from my tiny netbook).
In fact, this is vaguely related to partial content, since the usefulness of the commit history depends on the context of the data being manipulated, which in turn depends on which subdirectories are being touched. So any mechanism for disabling sync per directory could potentially be reused for disabling auto-commit per directory.
According to Joey, it should be easy to arrange for the watcher thread not to run, but would need some more work for the assistant to notice manual commits in order to sync them; however the assistant already does some crazy inotify watching of git refs, in order to detect incoming pushes, so detecting manual commits wouldn't be a stretch.
You can do this now by pausing committing via the webapp, or setting `annex.autocommit=false`. When configured this way, the assistant doesn't push commits that you manually make; of course, you can also push manually. --Joey
Revisiting this topic 3.5 years later ...
The assistant will still commit a change as soon as it notices it. This obviously has the advantage of synchronising changes to peers quicker, but it also has downsides:
I find this undesirable even when editing my TODO list, but it could be particularly problematic with large files, e.g. using `ffmpeg` to transcode a video between formats would presumably capture lots of intermediate states of the unfinished transcoding process. Similarly for rendering from a video editor.
So it would be helpful if there was a configurable option to determine how often changes get committed. Ideally this would be configurable via `.gitattributes`, e.g. with rules which would autocommit most stuff immediately, but would only autocommit webm files if they haven't changed within the last 10 minutes.
The assistant does not commit files that are open for write.
So unless ffmpeg partially writes the file, then closes the file, then reopens it and writes some more, the assistant will only make a single commit.
Interesting; I see it uses lsof for this.
OK, so that probably wasn't a good example. But that still doesn't negate my TODO list editing example, and it is not hard to think of other scenarios where partial results are written by one process and then rewritten in-place by another.
It also doesn't negate the fact that throttling the commit speed would also help reduce I/O between remotes in some cases simply by reducing "churn" within any given repo, as noted in my comment on design/assistant/rate_limiting.
What about an `annex.autocommit` gitattribute which offers finer control of autocommit than the current binary toggle? It would be very useful to me, and maybe I'm not the only one.
Since I haven't learnt Haskell yet (and even if I was able to hack a patch to git-annex I'm not sure whether it would be accepted), I've hacked up a simple `git-auto-commit` Python script for automatically committing based on a simple policy, together with a shell script which invokes it every `$n` seconds.
Currently the policy is hardcoded to only stage and commit files ending in `.org` which have unstaged changes and an mtime older than a minimum threshold, in order to throttle the rate at which automatic commits are made. Maybe at some point I'll change it to honour `.gitattributes`, but of course it would be far nicer if the assistant could do this natively, since that would also solve forum/Can the assistant sync files if committed manually (autocommit=false)?.
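For anyone who wants to try the same idea, the hardcoded policy described above can be sketched roughly as follows. This is not the actual `git-auto-commit` script; the function names, the 300 second default and the commit message are assumptions made for illustration, and it only relies on plain `git ls-files`, `git add` and `git commit`.

```python
import os
import subprocess
import time


def unstaged_paths(repo):
    """Return tracked paths in `repo` that have unstaged modifications."""
    out = subprocess.run(
        ["git", "ls-files", "--modified", "-z"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def old_enough(paths, repo, min_age, suffix=".org", now=None):
    """Keep paths ending in `suffix` whose mtime is at least `min_age` seconds old."""
    now = time.time() if now is None else now
    keep = []
    for path in paths:
        if not path.endswith(suffix):
            continue
        try:
            mtime = os.stat(os.path.join(repo, path)).st_mtime
        except FileNotFoundError:
            continue  # deleted in the working tree; leave it for a manual commit
        if now - mtime >= min_age:
            keep.append(path)
    return keep


def auto_commit(repo, min_age=300):
    """Stage and commit the files selected by the policy, if there are any."""
    paths = old_enough(unstaged_paths(repo), repo, min_age)
    if paths:
        subprocess.run(["git", "add", "--"] + paths, cwd=repo, check=True)
        # Hypothetical commit message; only the selected paths are committed.
        subprocess.run(["git", "commit", "-m", "auto-commit", "--"] + paths,
                       cwd=repo, check=True)
    return paths
```

Calling `auto_commit("/path/to/repo")` from a periodic job would then only commit `.org` files that have been left alone for five minutes, which is what throttles the commit rate.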
I've moved the location of my `git-auto-commit` and `auto-commit-daemon` wrapper, and added support for autocommit policy based on an `autocommit` git attribute. For example, a suitable `autocommit` attribute setting will ensure that all `.org` files last committed and modified more than 5 minutes ago will be automatically committed by the next `git auto-commit` run in this repo.
In the future I can imagine adding support for more sophisticated auto-commit policies.
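To illustrate how a script could look up such an attribute, here is a minimal sketch built on `git check-attr -z`, which is a standard git command. The value syntax it accepts (a number with an optional `s`, `m`, `h` or `d` suffix, e.g. `5m`) is an assumption for the sake of the example, not necessarily what `git-auto-commit` understands, and the function names are made up.

```python
import re
import subprocess

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_age(value):
    """Convert a value such as '5m' or '90' (seconds) to seconds, else None."""
    match = re.fullmatch(r"(\d+)([smhd]?)", value)
    if not match:
        return None  # 'unspecified', 'unset', 'set', or not a duration
    number, unit = match.groups()
    return int(number) * _UNITS[unit or "s"]


def parse_check_attr(output, attr="autocommit"):
    """Parse `git check-attr -z` output (path NUL attr NUL value NUL ...)."""
    fields = output.split("\0")
    result = {}
    for i in range(0, len(fields) - 2, 3):
        path, name, value = fields[i:i + 3]
        if name == attr:
            result[path] = value
    return result


def autocommit_ages(repo, paths):
    """Map each path to its minimum age in seconds, or None if no policy applies."""
    paths = list(paths)
    if not paths:
        return {}
    out = subprocess.run(
        ["git", "check-attr", "-z", "autocommit", "--"] + paths,
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return {p: parse_age(v) for p, v in parse_check_attr(out).items()}
```

Files for which `autocommit_ages` returns `None` would simply be skipped by the auto-commit run, so the attribute acts as an opt-in.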
I put a systemd unit service file in `~/.config/systemd/user` so that I can control the daemon via `systemctl --user`, and `~/myrepo/.autocommit` contains `SLEEP=1m` so that the daemon wrapper runs `git auto-commit` every minute.
I couple this with another simple `auto-sync-daemon` script which uses `inotifywait` to trigger invocations of `git annex sync` whenever `master` or `synced/master` change. Doing this across multiple remotes effectively achieves a poor man's assistant. (Incidentally, I use `mr`, one of Joey's other hacks, to automatically set up and manage a systemd daemon for each repo.)
... BUT, I really wish this was supported natively in `git-annex` instead. I've kind of had to reinvent the base functionality of the assistant just to get this autocommit policy feature. It was painful, but still easier than learning Haskell and the `git-annex` codebase.
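For comparison, the sync half of this setup can be approximated without `inotifywait` by polling the refs. The sketch below is not the `auto-sync-daemon` script: it swaps inotify for a simple polling loop, and the function names and the 5 second poll interval are assumptions. It only uses `git rev-parse --verify --quiet` and `git annex sync`.

```python
import subprocess
import time


def resolve_ref(repo, ref):
    """Return the commit hash `ref` points at, or None if it does not exist."""
    proc = subprocess.run(
        ["git", "rev-parse", "--verify", "--quiet", ref],
        cwd=repo, capture_output=True, text=True,
    )
    return proc.stdout.strip() or None


def changed_refs(previous, current):
    """Return the sorted names of refs whose value differs between snapshots."""
    return sorted(ref for ref in current if previous.get(ref) != current[ref])


def watch(repo, refs=("master", "synced/master"), poll=5.0):
    """Poll `refs` and run `git annex sync` whenever any of them moves."""
    previous = {ref: resolve_ref(repo, ref) for ref in refs}
    while True:
        time.sleep(poll)
        current = {ref: resolve_ref(repo, ref) for ref in refs}
        if changed_refs(previous, current):
            subprocess.run(["git", "annex", "sync"], cwd=repo, check=False)
            # Re-read the refs: the sync itself usually moves them, and
            # that should not trigger yet another sync on the next poll.
            current = {ref: resolve_ref(repo, ref) for ref in refs}
        previous = current
```

Polling is cruder than watching the refs with inotify, but it avoids having to deal with packed refs and lock-file renames, and a few seconds of latency is usually acceptable for this purpose.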