Having 'git annex sync' optionally add

Hey joey,

Currently, there is no one-command-to-rule-them-all to actually 'get the repo synced entirely'. Current workarounds:

# plain git-annex
git annex add
(git add -A) # for the weird filter issues, see https://github.com/datalad/datalad/issues/7268
git annex sync --content

# with DataLad
datalad save # no way around the weird filter issues currently
git annex sync --content

Whenever I introduce new people to git annex, it's always a pain point to explain these multiple steps. It would be awesome to have just one command that syncs everything. Especially for beginners.

git annex sync already does quite a bit including git adding and git committing changes to tracked files. However it doesn't add new files. It would be awesome if git annex sync could optionally also do git annex add:

# configure 'git annex sync' to run 'git annex add' before as well (opt-in)
git annex config --set annex.syncadd true
# have it sync the content as well
git annex config --set annex.synccontent true

# From then on, 'git annex sync' does a *real* sync: everything that's new/changed here gets pushed elsewhere and new changes get pulled as well.
git annex sync

# This would then basically be a poor-mans-git-annex-assistant 😛
while true;do git annex sync --content;done

What do you think?

Cheers, Yann

done as git-annex assist --Joey

adding all files or only in current directory?

git annex add only adds files in the current directory and below and has no flag to do a git add -A equivalent. For the purpose of making git annex sync also add all files (e.g. with git annex sync --add or with a prior git annex config --set annex.syncadd true), such a git annex add --all|-A option would be handy (which would also be used by git annex sync --add). It would also make git annex add more consistent with git add.

Comment by nobodyinperson — Wed May 3 14:23:04 2023

Git Alias for a 'full sync'

This alias pretty much does what I mean:

# set the alias
git config --global alias.sync '!sh -xc '"'"'cd "$(git rev-parse --show-toplevel)";git annex add;git add -A;git annex sync'"'"''
# This then goes into the repo root, adds stuff with git annex, adds the rest git-annex didn't add and syncs with git annex
git sync
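For readability, the same steps can be written as a shell function. This is only a sketch of the alias above, not a git-annex feature: the name full_sync is made up here, and unlike the alias it chains the steps with && so that a failing step stops the rest.

```shell
# Hypothetical helper mirroring the alias: go to the repository root,
# let git-annex add what it wants, add the rest with git, then sync.
# The body runs in a subshell, so the caller's working directory is untouched.
full_sync() (
    top=$(git rev-parse --show-toplevel) || exit 1
    cd "$top" || exit 1
    git annex add &&
    git add -A &&
    git annex sync
)
```

Running full_sync from any subdirectory of a repository should then behave like git sync with the alias set.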

Comment by nobodyinperson — Mon May 15 07:31:41 2023

comment 3

I don't think this can possibly become the default behavior, because plenty of users will have temporary files that they don't want to add (and haven't gitignored), or even sensitive files that happen to be in the same repository that it would be a security hole level surprise for git-annex sync to start adding, when it has not before.

For the same reason, git-annex config couldn't be used to make this a default git-annex sync behavior. A local git config to enable it would be ok.

It would need to look at annex.largefiles, and add small files to git directly.

Comment by joey — Mon May 15 20:09:30 2023

comment 4

Exactly, I also would never want git annex sync to do the adding by default - strictly as an opt-in configuration.

I see your point that people might have sensitive files in a repo that they don't have .gitignored and would not want to get pushed. However, I argue:

If you clone a git annex repo and do git annex sync on it, you're pretty much already trusting that repo and the remotes. You either already have push access (that doesn't happen easily unless you're in control of the repo) or you don't, but then it's not a problem when you add your sensitive files because your changes won't ever get synced back. Even if they do at some later point: You git annex drop --force them (or just git rebase -i) to remove them from the repo entirely and worst case is that the file metadata is leaked back into the repo collection. Again, something I'd consider OK if you trust the repo.
When you trust a git annex repo, you also trust its git annex config settings like synccontent and consent to them.
Thus, if that repo has a git annex config --set annex.syncadd true setting that has git annex sync also do a prior git annex add, that's fine.

I think that only having a local git config annex.syncadd true setting is basically the same as having a customized alias like mine above, so that doesn't really improve the situation: in both cases users of the repo need to do local configs. A git annex config will be stored in the repo and be immediately effective for everybody using it - a major benefit for collaboration, for which git annex is a golden tool. If I understand correctly, any annex.* config can be overridden locally with a git config annex.* setting (right?). So cautious users can have git config --global annex.syncadd false to prevent git annex sync from ever adding files, even if the repo is configured to do so.
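The repo-wide default plus local override described here can be sketched with annex.synccontent, which exists today (annex.syncadd is only the proposal from this thread). The two function names are made up for illustration, and the sketch assumes that a local git config value takes precedence over the value stored with git annex config.

```shell
# Hypothetical helper: store the default in the repository itself,
# so every clone picks it up.
set_repo_default() {
    git annex config --set annex.synccontent true
}

# Hypothetical helper: a cautious user opts out in their own clone only.
opt_out_locally() {
    git config annex.synccontent false
}
```

A maintainer would run set_repo_default once; a cautious collaborator would run opt_out_locally in their clone.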

If the git config annex.syncadd true setting would just literally run git annex add, it automatically obeys the largefiles settings (also from the .gitattributes file).

All in all I'd consider the syncadd a matching feature considering how much git annex sync is already doing: committing, pulling, merging, pushing all over the place - only the adding it doesn't do.

The more behaviour can be configured in the repo directly, the better the out-of-the-box experience for (inexperienced) end users of a repo will be. I maintain many git annex repos and collaborate with different students like this on scientific projects. Together with default preferred content expressions the workflow would be extremely straight-forward:

# get the repo
git clone URL/repo
cd repo
# "whenever you're done, run":
git annex sync
# - auto-enables common remotes
# - auto-sets preferred content expressions (to prevent the repos from uselessly downloading EVERYTHING - just what's sensible by default)
# - auto-adds/commits/pulls/merges/pushes changes
# - auto-syncs content around

Sensible settings for largefiles in .gitattributes or the git annex config can be set by those who understand it, others just do git annex sync and are done with it.

Comment by nobodyinperson — Mon May 15 21:13:34 2023

comment 5

sync's intent is to replicate as close as possible the following common git workflow:

first stage some changes, and then run:
git commit -m foo
git pull
git push

Since git-annex has some complications involving pulling and pushing the git-annex branch, and transferring the content, it adds a learning curve that was too high. Giving users one new command to learn minimizes the learning curve. (It's unfortunate it didn't originally send content, and annex.synccontent aims to fix that oversight and I hope it may eventually become default.)

For git-annex sync to default to doing something not in the above pull+commit+push workflow would be surprising, because that's the workflow users have been told it handles.

Notice that the git-annex assistant will happily add all files to the repository and send their content. That's because it's not targeting users who expect to use that git workflow. So it's not surprising that it does what it does.

In conclusion, if having this behavior enabled by git config is not useful, then it does not belong in git-annex sync. It could go in a new command.

Comment by joey — Tue May 16 16:32:51 2023

comment 6

I also generally feel like git-annex sync was probably a bad conflation of 3 git commands into 1 command when there could have just been git-annex pull and git-annex push. The fact that it conflates several things makes users think of it as just a big "does all the things" command, which makes users want it to do more things.

It is not too late to split it up, and eventually deprecating it would be a good path to making annex.synccontent default.

Update: Implemented git-annex pull and git-annex push, although I have no plans yet to deprecate git-annex sync.

Update: I found a better path to transition to sync --content by default, and have started that transition. git-annex pull/push sync content by default already.
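A minimal sketch of the commit, pull, push workflow using the split commands, assuming a git-annex version that ships git-annex pull and git-annex push. The function name commit_pull_push and the default commit message are made up for illustration.

```shell
# Hypothetical helper: commit whatever is staged, then pull and push.
commit_pull_push() {
    # git diff --cached --quiet succeeds when nothing is staged,
    # in which case the commit is skipped.
    git diff --cached --quiet || git commit -m "${1:-update}" || return 1
    git annex pull &&
    git annex push
}
```

Unlike the add-everything variants discussed in this thread, this only commits changes that were staged beforehand.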

Comment by joey — Tue May 16 17:05:35 2023

comment 7

Cool, thanks joey!

If you want to go as far as deprecating git annex sync (a huge change in habits for git-annex users if that one goes away), I wonder how adding the annex.syncadd config (a strictly opt-in consensus for users of a repo) is too much of a disruption 😅 But okay, you're the maintainer.

I still think that having one command that does what the assistant does, but as a one-shot manual step, would benefit git-annex immensely. I recently tried running the assistant on my SailfishOS phone (https://fosstodon.org/@nobodyinperson/110203412944552425) and realised that having the assistant running in the background all the time is not ideal due to e.g. the power consumption - maybe due to the permanent SSH connection to the remotes. I am now using a systemd service that basically does git annex add;git annex sync. Would be awesome and certainly good for the understanding of new users if there was a command that basically one-steps the assistant - This is something I as a previous Syncthing user was dearly missing: one-stepping the syncing to see what's the bottleneck/problem/what it does exactly, etc. If you'd rather put this into a new command, I'm fine with that, though I still think git annex sync is a perfect candidate also due to the name.

Comment by nobodyinperson — Wed May 17 10:41:18 2023

comment 8

Brainstorming some names for a new command that is git add -A + sync...

git-annex publish kind of implies more exposure of local files to others than just syncing. But it more implies sending data, not also receiving it.

git-annex addsync is clear what it does, but not very memorable.

git-annex assist has a nice analogy to the git-annex assistant, which is close in behavior. Users who don't know about the assistant will miss the analogy though. "assist" also suggests git-annex will take care of everything, even more broadly than sync.

git-annex share also implies more exposure of local files than git-annex sync, and sharing goes both ways so it's better than git-annex publish.

I'd be inclined toward git-annex share or git-annex assist, but welcome better ideas.

Comment by joey — Wed May 17 17:24:03 2023

comment 9

Good suggestions, I'd prefer assist, as it'll be one step of what the assistant does. I'll also throw git annex update into the ring. Updating goes both ways, while sharing sounds more like you getting your stuff out there, not necessarily others' stuff to you.

Comment by nobodyinperson — Thu May 18 08:40:37 2023

comment 10

Implementing git-annex assist now...

A tricky point is, should it add files only in the current directory and below, or all files in the repository? Note that the assistant can be run in a directory and it will only add changed/new files in that directory, although it can receive pulls that change files in other directories (and will then download those files' content).

OTOH, git-annex sync commits all changes, not only those in the current directory. (The assistant does in some circumstances commit changes made outside the current directory. Its behavior is a bit inconsistent in this area.)

So I think it makes sense for git-annex assist to only add files in the current directory by default. (Of course an option like -A could be added later.)

And while I'm a bit ambivalent about it, I'm making it commit all staged changes, not only those in the current directory. As well as following the behavior of git-annex sync and to an extent the assistant, it seems to me that if you run git-annex add ../foo; git-annex assist, you are intentionally building up a commit that includes file "foo". The same as if you ran git-annex add ../foo; git-annex add . ... If you're not, and you care about what files get added in what commit, you can of course commit manually.

Comment by joey — Thu May 18 17:45:44 2023

👍 git annex assist

Cool, joey!

I personally would like assist to add all changes in the entire repo by default. For beginners it's always a hassle to know what directory one is in, and if one opens a terminal in a subfolder via a file manager and then runs git annex assist, it won't sync all changes.

But I see that this would be inconsistent with git annex add... On the other hand, assist should help you easily sync the repo state without typing too much. One can always do git annex assist . for only this directory. A -A option is also an option, but effectively one would need to use it every time. As you said in the other comment - if you want more control, use the lower level commands. The assistant makes syncing a no-brainer, git annex assist should do so, too.

That the assistant can operate on only a subdir (do people do this?) itself is again an inconsistency then... A config like annex.assistaddall could be introduced, but meh...

Comment by nobodyinperson — Fri May 19 05:54:43 2023

comment 12

I'm also not too happy with the inconsistency of assist committing all staged changes and syncing all file contents, but only adding files in the cwd.

I suppose that consistency with the assistant doesn't really matter. The assistant's behavior when run in a subdirectory is surprising, inconsistent, and undocumented.

So I'm going to change assist to add all files. Except when -C is used, then only add files in the specified directory.

Comment by joey — Fri May 19 18:37:28 2023

comment 13

Sounds good! Maybe also a -m short form of --message for less typing. Consistent with sync and datalad save.

Comment by nobodyinperson — Sat May 20 06:03:30 2023

comment 14

Oh, this new command and sync always supported -m. I've added it to the man page.
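Putting the last few comments together, a one-shot run might look like the sketch below. It assumes the behavior described in this thread (assist adds all files unless -C names a directory, and accepts -m for the commit message). The wrapper name assist_once is made up for illustration.

```shell
# Hypothetical wrapper around git annex assist.
# $1: commit message; $2 (optional): directory to restrict the run to.
assist_once() {
    msg=$1
    dir=${2:-}
    if [ -n "$dir" ]; then
        # Restrict the run to the given directory via -C.
        git annex assist -m "$msg" -C "$dir"
    else
        # Default: cover the whole repository.
        git annex assist -m "$msg"
    fi
}
```

For example, assist_once "weekly notes" would cover the whole repository, while assist_once "weekly notes" docs would restrict the run to the docs directory.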

Comment by joey — Tue May 23 15:46:27 2023

Breaking change to "sync"

Via a comment on my bug about the new sync warning suddenly appearing in 10.20230626 I see that this fairly hidden discussion seems to have been the rationale for completely changing what "sync" does "because some users expect it to do everything".

At the beginning of the design I might have agreed with you about "what the sync command should do" (and having another, eg, "metasync" command for the smaller version). But changing a fundamental command, a decade later, to do something different, based entirely on a "wouldn't it have been nice (for some use cases) if..." seems quite a stretch.

As per the other bug, if you want to implement new behaviour for such a fundamental command ("git annex sync" is something one runs pretty constantly if using git annex actively from the command line) then it'd be best to implement it in another command name, instead of dramatically changing an existing one. (The other thread has a few more suggestions; I personally still like "fullsync" for the expanded version.)

Either way, such a major -- behaviour breaking -- change needs to be much better documented than "discussion in a bug that a few people saw", and a note in passing in a changelog (which people have to find by noticing strange new output in probably scripted git annex usage).

Ewen

PS: I too think it's a terrible idea to have git annex default to auto-adding any files it can find. Maybe that makes some "Dropbox-like" use cases nicer, but it also entirely breaks other long standing use cases.

Comment by ewen — Wed Jul 12 10:21:03 2023

Clarification

@ewen for the record:

I never suggested that git annex sync auto-adds new files by default, see my comment above:

Exactly, I also would never want git annex sync to do the adding by default - strictly as an opt-in configuration.

As git annex sync already had so many options for configuring its behaviour, I thought having one more that runs git annex add in the beginning wouldn't hurt. I never suggested changing git annex sync's default behaviour. Purely opt-in. Same for the existing options like annex.synccontent etc.

Changing git annex sync's default behaviour of now syncing content was joey's idea, not mine.

if you want to implement new behaviour for such a fundamental command

There is already git annex assist to do exactly that. See above or the changelogs.

Just to clarify. Please read the discussions you're referring to thoroughly before claiming things between the lines. 🙂

Now back to constructive discussion.

Comment by nobodyinperson — Wed Jul 12 11:29:41 2023
