wishlist: disable automatic commits

When using the assistant on some of my repositories, I would like to retain manual control over the granularity and contents of the commit history. Some motivating reasons:

manually specified commit messages make the history easier to follow
making a series of minor changes to a file over a period of a few hours would result in a single commit rather than capturing intermediate, incomplete edits
manual choice of which files to annex (based on predicted usage) could be useful, e.g. a repo might contain a 4MB PDF which you want available in every remote even without git annex get, and also some 2MB images which are only required in some remotes

This particular case is now catered to by the "manual" repository group in preferred content settings. --Joey

Obviously this needs to be configurable at least per repository, and ideally perhaps even per remote, since usage habits can vary from machine to machine (e.g. I could choose to commit manually from my desktop machine which has a nice comfy keyboard and large screen, but this would be too much pain to do from my tiny netbook).

In fact, this is vaguely related to partial content, since the usefulness of the commit history depends on the context of the data being manipulated, which in turn depends on which subdirectories are being touched. So any mechanism for disabling sync per directory could potentially be reused for disabling auto-commit per directory.

According to Joey, it should be easy to arrange for the watcher thread not to run, but it would take more work for the assistant to notice manual commits in order to sync them. However, the assistant already does some crazy inotify watching of git refs in order to detect incoming pushes, so detecting manual commits wouldn't be a stretch.

You can do this now by pausing committing via the webapp, or setting annex.autocommit=false.

When configured this way, the assistant doesn't push commits that you make manually; of course, you can also push them manually. --Joey


.gitattributes could solve this

Revisiting this topic 3.5 years later ...

The assistant will still commit a change as soon as it notices it. This obviously has the advantage of synchronising changes to peers quicker, but it also has downsides:

It pollutes the git history with every single (saved) edit.
Consequently it bloats the git object store.

I find this undesirable even when editing my TODO list, but it could be particularly problematic with large files, e.g. using ffmpeg to transcode a video between formats would presumably capture lots of intermediate states of the unfinished transcoding process. Similarly for rendering from a video editor.

So it would be helpful if there were a configurable option to determine how often changes get committed. Ideally this would be configurable via .gitattributes, e.g.

*      annex.autocommit=anything
*.webm annex.autocommit=(mtimebefore=10mins)

would autocommit most stuff immediately, but would only autocommit webm files if they haven't changed within the last 10 minutes.
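The proposed mtimebefore= semantics boil down to a simple age check on each changed file. Here is a minimal Python sketch of that check, assuming a hypothetical duration syntax (10mins, 90s, 2h); neither the attribute nor these unit spellings exist in git-annex, and parse_duration, is_quiet and ready_to_autocommit are illustrative names only.

```python
import os
import re
import time
from typing import Optional

# Hypothetical unit spellings for the proposed "mtimebefore=" value;
# git-annex does not define this attribute, so this table is an assumption.
_UNITS = {"": 1, "s": 1, "sec": 1, "secs": 1,
          "m": 60, "min": 60, "mins": 60,
          "h": 3600, "hour": 3600, "hours": 3600}


def parse_duration(spec: str) -> float:
    """Convert a spec such as '10mins' or '90s' into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]*)", spec.strip())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised duration: {spec!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def is_quiet(mtime: float, period: str, now: Optional[float] = None) -> bool:
    """True if a file last modified at `mtime` has been untouched for `period`."""
    if now is None:
        now = time.time()
    return now - mtime >= parse_duration(period)


def ready_to_autocommit(path: str, period: str) -> bool:
    """Apply the hypothetical mtimebefore=<period> policy to a file on disk."""
    return is_quiet(os.path.getmtime(path), period)
```

A watcher would then skip a changed *.webm file until ready_to_autocommit(path, "10mins") returns true, and pick it up on a later pass.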

Comment by branchable — Sun Sep 29 22:59:48 2019


comment 2

The assistant does not commit files that are open for write.

So unless ffmpeg partially writes the file, then closes the file, then reopens it and writes some more, the assistant will only make a single commit.

Comment by joey — Mon Sep 30 18:25:08 2019


There are still benefits to commit throttling

> The assistant does not commit files that are open for write.

Interesting; I see it uses lsof for this.
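The same check can be approximated without spawning lsof. Below is a minimal, Linux-only Python sketch that scans /proc/<pid>/fd and /proc/<pid>/fdinfo instead; this is a swapped-in technique rather than how git-annex implements it, open_for_write is a hypothetical name, and it can only see processes whose /proc entries the current user is allowed to read.

```python
import os

# Linux encodes the access mode in the low two bits of the open flags.
_ACCMODE = 0o3


def open_for_write(path: str) -> bool:
    """Return True if any readable process has `path` open for writing (Linux only)."""
    target = os.path.realpath(path)
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or we lack permission
            continue
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") != target:
                    continue
                with open(f"/proc/{pid}/fdinfo/{fd}") as info:
                    for line in info:
                        if line.startswith("flags:"):
                            # fdinfo reports the flags in octal, e.g. "flags:\t0100001"
                            mode = int(line.split()[1], 8) & _ACCMODE
                            if mode in (os.O_WRONLY, os.O_RDWR):
                                return True
            except OSError:  # fd closed or process exited mid-scan
                continue
    return False
```

Like the lsof approach, this only says whether a writer is active right now; a process that closes and later reopens the file will look idle in between.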

> So unless ffmpeg partially writes the file, then closes the file, then reopens it and writes some more, the assistant will only make a single commit.

OK, so that probably wasn't a good example. But that still doesn't negate my TODO list editing example, and it is not hard to think of other scenarios where partial results are written by one process and then rewritten in-place by another.

Nor does it negate the fact that throttling the commit rate would help reduce I/O between remotes in some cases, simply by reducing "churn" within any given repo, as noted in my comment on design/assistant/rate_limiting.
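To make the throttling idea concrete, here is a minimal Python sketch of a commit throttle: the first change commits immediately, and further changes within interval seconds are held back and folded into one commit once the interval has elapsed. CommitThrottle, request and poll are hypothetical names; this is not the design in design/assistant/rate_limiting, and a real watcher would call poll() from its own event loop.

```python
import time
from typing import Callable, Optional


class CommitThrottle:
    """Allow at most one commit per `interval` seconds, deferring the rest."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock
        self.last_commit: Optional[float] = None
        self.pending = False

    def request(self) -> bool:
        """Record that something changed; True means "commit now"."""
        self.pending = True
        return self.poll()

    def poll(self) -> bool:
        """Call periodically; True means a deferred commit is now due."""
        if not self.pending:
            return False
        now = self.clock()
        if self.last_commit is None or now - self.last_commit >= self.interval:
            self.last_commit = now
            self.pending = False
            return True
        return False
```

All edits made between two permitted commits end up in a single commit, which is where the reduction in churn comes from.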

Comment by branchable — Mon Sep 30 22:16:10 2019


Would you accept a patch implementing an annex.autocommit gitattribute?

In a semi-mythical future where I find the time to learn Haskell, would you in principle consider accepting a patch implementing my suggestion above: a new annex.autocommit gitattribute offering finer control of autocommit than the current binary toggle? It would be very useful to me, and maybe I'm not the only one.

Comment by AdamSpiers — Mon Jan 6 16:12:54 2020


I've hacked up a Python script for policy-based automatic commits

Since I haven't learnt Haskell yet (and even if I were able to hack together a patch to git-annex, I'm not sure whether it would be accepted), I've hacked up a simple git-auto-commit Python script for automatically committing based on a simple policy, together with a shell script which invokes it every $n seconds.

Currently the policy is hardcoded to only stage and commit files ending in .org which have unstaged changes and were last modified longer ago than a minimum threshold, in order to throttle the rate at which automatic commits are made. Maybe at some point I'll change it to honour .gitattributes such as

*.org annex.autocommit=(mtimebefore=5mins)

but of course it would be far nicer if the assistant could do this natively, since that would also solve forum/Can the assistant sync files if committed manually (autocommit=false)?.
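The "unstaged changes" half of such a policy can be derived from git status --porcelain -z, where the second status column is the work-tree state. Below is a minimal Python sketch under stated assumptions, not the actual git-auto-commit script: unstaged_modified and candidate_files are hypothetical names, only work-tree modifications (second column "M") count, and the mtime threshold would be applied to the returned paths afterwards.

```python
import subprocess
from typing import List


def unstaged_modified(porcelain_z: str, suffix: str = ".org") -> List[str]:
    """Pick paths with unstaged modifications out of `git status --porcelain -z`."""
    entries = porcelain_z.split("\0")
    result = []
    i = 0
    while i < len(entries):
        entry = entries[i]
        i += 1
        if len(entry) < 4:  # the empty string after the final NUL
            continue
        xy, path = entry[:2], entry[3:]
        if xy[0] in "RC":
            i += 1  # with -z, a rename/copy source follows as its own field
        # The second column is the work-tree status; "M" means modified but unstaged.
        if xy[1] == "M" and path.endswith(suffix):
            result.append(path)
    return result


def candidate_files(repo: str, suffix: str = ".org") -> List[str]:
    """Run git status in `repo` and return the unstaged files ending in `suffix`."""
    out = subprocess.run(["git", "status", "--porcelain", "-z"], cwd=repo,
                         check=True, capture_output=True, text=True).stdout
    return unstaged_modified(out, suffix)
```

Using -z avoids having to unquote paths that contain whitespace or other special characters.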

Comment by branchable — Thu Jun 11 10:10:52 2020


Update on my auto-commit / auto-sync scripts

I've moved my git-auto-commit script and auto-commit-daemon wrapper to a new location, and added support for an autocommit policy based on an autocommit git attribute.

For example:

$ echo "*.org autocommit=min-age=+5m" > .gitattributes

will ensure that all .org files last committed and modified more than 5 minutes ago will be automatically committed by the next git auto-commit run in this repo.
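A script can read such an attribute with git check-attr. Here is a minimal Python sketch of decoding it, assuming the min-age=+N<unit> value format shown above (only the m unit appears there; s, h and d are assumptions) and the NUL-separated output of git check-attr -z. The function names are hypothetical and this is not the actual git-auto-commit code.

```python
import re
import subprocess
from typing import Dict, List, Optional

# Unit letters beyond the "m" in "+5m" are assumptions for this sketch.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_check_attr_z(output: str) -> Dict[str, str]:
    """Map path -> value from `git check-attr -z` output (path NUL attr NUL info NUL)."""
    fields = output.split("\0")
    return {fields[i]: fields[i + 2] for i in range(0, len(fields) - 2, 3)}


def min_age_seconds(info: str) -> Optional[int]:
    """Seconds from a value like 'min-age=+5m'; None for unspecified/unset/other."""
    match = re.fullmatch(r"min-age=\+?(\d+)([smhd])", info)
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def autocommit_min_ages(repo: str, paths: List[str]) -> Dict[str, Optional[int]]:
    """Query the autocommit attribute for `paths` and decode any min-age policy."""
    out = subprocess.run(["git", "check-attr", "-z", "autocommit", "--", *paths],
                         cwd=repo, check=True, capture_output=True, text=True).stdout
    return {path: min_age_seconds(info)
            for path, info in parse_check_attr_z(out).items()}
```

A None result means the file has no min-age policy, so the caller can decide whether to commit it immediately or leave it alone.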

In the future I can imagine adding support for more sophisticated auto-commit policies.

Here's an example of the kind of systemd unit service file I put in ~/.config/systemd/user so that I can control the daemon via systemctl --user:

[Service]
ExecStart=/bin/sh -c "/home/adam/bin/auto-commit-daemon /home/adam/myrepo"
Restart=always
NoNewPrivileges=true
SyslogIdentifier=auto-commit-myrepo
EnvironmentFile=/home/adam/myrepo/.autocommit

[Install]
# user units are pulled in by default.target; multi-user.target exists only in the system manager
WantedBy=default.target

and ~/myrepo/.autocommit contains SLEEP=1m so that the daemon wrapper runs git auto-commit every minute.

I couple this with another simple auto-sync-daemon script which uses inotifywait to trigger invocations of git annex sync whenever master or synced/master change. Doing this across multiple remotes effectively achieves a poor man's assistant. (Incidentally, I use mr, one of Joey's other hacks, to automatically set up and manage a systemd daemon for each repo).
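The same trigger can be approximated portably by polling instead of using inotify. Here is a minimal Python sketch under stated assumptions: it watches refs/heads/master and refs/heads/synced/master through git for-each-ref (which also copes with packed refs), ref_snapshot, changed_refs and watch are hypothetical names, and it is not the actual auto-sync-daemon script.

```python
import subprocess
import time
from typing import Callable, Dict, List

# Refs whose movement should trigger a sync (mirrors the inotifywait setup above).
WATCHED_REFS = ["refs/heads/master", "refs/heads/synced/master"]


def ref_snapshot(repo: str) -> Dict[str, str]:
    """Map each watched ref that exists to the object it points at."""
    out = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname) %(objectname)", *WATCHED_REFS],
        cwd=repo, check=True, capture_output=True, text=True).stdout
    return dict(line.split(" ", 1) for line in out.splitlines())


def changed_refs(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Refs that were created, deleted or moved between two snapshots."""
    return sorted(ref for ref in before.keys() | after.keys()
                  if before.get(ref) != after.get(ref))


def watch(repo: str, on_change: Callable[[List[str]], None],
          interval: float = 5.0) -> None:
    """Poll the watched refs forever, calling `on_change` whenever any of them move."""
    previous = ref_snapshot(repo)
    while True:
        time.sleep(interval)
        current = ref_snapshot(repo)
        changed = changed_refs(previous, current)
        if changed:
            on_change(changed)
            # Re-snapshot so ref updates made by the sync itself don't retrigger it.
            current = ref_snapshot(repo)
        previous = current
```

Passing lambda refs: subprocess.run(["git", "annex", "sync"], cwd=repo) as on_change reproduces the sync-on-ref-change behaviour. Polling adds up to interval seconds of latency but needs nothing beyond git; inotifywait reacts immediately but is Linux-specific.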

... BUT, I really wish this was supported natively in git-annex instead. I've kind of had to reinvent the base functionality of the assistant just to get this autocommit policy feature. It was painful, but still easier than learning Haskell and the git-annex codebase.

Comment by branchable — Thu Jul 9 14:23:14 2020

