Hey joey,
Currently, there is no one-command-to-rule-them-all to actually 'get the repo synced entirely'. Current workarounds:
# plain git-annex
git annex add
(git add -A) # for the weird filter issues, see https://github.com/datalad/datalad/issues/7268
git annex sync --content
# with DataLad
datalad save # no way around the weird filter issues currently
git annex sync --content
Whenever I introduce new people to git annex, it's always a pain point to explain these multiple steps. It would be awesome to have just one command that syncs everything. Especially for beginners.
git annex sync already does quite a bit, including git adding and git committing changes to tracked files. However, it doesn't add new files. It would be awesome if git annex sync could optionally also do git annex add:
# configure 'git annex sync' to run 'git annex add' before as well (opt-in)
git annex config --set annex.syncadd true
# have it sync the content as well
git annex config --set annex.synccontent true
# From then on, 'git annex sync' does a *real* sync: everything that's new/changed here gets pushed elsewhere and new changes get pulled as well.
git annex sync
# This would then basically be a poor-mans-git-annex-assistant 😛
while true;do git annex sync --content;done
What do you think?
Cheers, Yann
git annex add only adds files in the current directory and below and has no flag to do a git add -A equivalent. For the purpose of making git annex sync also add all files (e.g. with git annex sync --add or with a prior git annex config --set annex.syncadd true), such a git annex add --all|-A option would be handy (which would also be used by git annex sync --add). It would also make git annex add more consistent with git add.

This alias pretty much does what I mean:
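As an illustration only (not necessarily the alias originally posted here), such a shortcut could be sketched as a shell function. The name annex_sync_all is hypothetical; it assumes git-annex is installed and simply chains the two commands from the workarounds in the first post, run from the top of the work tree so that files outside the current directory get added as well:

```shell
# Hypothetical helper, not part of git-annex: add everything, then sync with content.
annex_sync_all() {
    # Locate the top of the work tree so files outside the current directory are added too.
    top=$(git rev-parse --show-toplevel) || return 1
    git -C "$top" annex add . &&
        git -C "$top" annex sync --content
}

# Usage (from any directory inside an annex repository):
#   annex_sync_all
```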
I don't think this can possibly become the default behavior, because plenty of users will have temporary files that they don't want to add (and haven't gitignored), or even sensitive files that happen to be in the same repository that it would be a security hole level surprise for git-annex sync to start adding, when it has not before.

For the same reason, git-annex config couldn't be used to make this a default git-annex sync behavior. A local git config to enable it would be ok.

It would need to look at annex.largefiles, and add small files to git directly.
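For context, annex.largefiles is the existing setting that decides which files git annex add puts into the annex; files that do not match are added to git directly. A minimal sketch, where the helper name set_largefiles_threshold and the 100kb default are arbitrary choices for illustration:

```shell
# Hypothetical helper: files larger than the given size go to the annex,
# everything else is added to git as a regular file.
set_largefiles_threshold() {
    git config annex.largefiles "largerthan=${1:-100kb}"
}

# Usage:
#   set_largefiles_threshold 100kb
#   git annex add .
```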
Exactly, I also would never want git annex sync to do the adding by default - strictly as an opt-in configuration.

I see your point that people might have sensitive files in a repo that they don't have .gitignored and would not want to get pushed. However, I argue:

[...] git annex sync on it, you're pretty much already trusting that repo and the remotes. You either already have push access (that doesn't happen easily unless you're in control of the repo) or you don't, but then it's not a problem when you add your sensitive files because your changes won't ever get synced back. Even if they do at some later point: You git annex drop --force them (or just git rebase -i) to remove them from the repo entirely and worst case is that the file metadata is leaked back into the repo collection. Again, something I'd consider OK if you trust the repo.

[...] git annex config settings like synccontent and consent with it.

[...] git annex config --set annex.syncadd true setting that has git annex sync also do a prior git annex add, that's fine.

I think that only having a local git config annex.syncadd true setting is basically the same as having a customized alias like mine above, so that doesn't really improve the situation: in both cases users of the repo need to do local configs. A git annex config will be stored in the repo and be immediately effective for everybody using it - a major benefit for collaboration, for which git annex is a golden tool. If I understand correctly, any annex.* config can be overridden locally with a git config annex.* setting (right?). So cautious users can have git config --global annex.syncadd false to prevent git annex sync from ever adding files, even if the repo is configured to do so.

If the git config annex.syncadd true setting would just literally run git annex add, it automatically obeys the largefiles settings (also from the .gitattributes file).

All in all I'd consider the syncadd a matching feature considering how much git annex sync is already doing: committing, pulling, merging, pushing all over the place - only the adding it doesn't do.

The more behaviour can be configured in the repo directly, the better the out-of-the-box experience for (inexperienced) end users of a repo will be. I maintain many git annex repos and collaborate with different students like this on scientific projects. Together with default preferred content expressions the workflow would be extremely straight-forward:
Sensible settings for largefiles in .gitattributes or the git annex config can be set by those who understand it, others just do git annex sync and are done with it.

sync's intent is to replicate as close as possible the following common git workflow:
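Assuming this refers to the pull+commit+push sequence mentioned below, that workflow could be sketched in plain git as follows (the function name and the default commit message are placeholders):

```shell
# Plain git sequence that sync is modelled on (sketch; names are placeholders).
plain_git_sync() {
    git pull                           # fetch and merge remote changes
    git commit -a -m "${1:-update}"    # commit changes to tracked files
    git push                           # publish the result
}

# Usage:
#   plain_git_sync "my commit message"
```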
Since git-annex has some complications involving pulling and pushing the git-annex branch, and transferring the content, it adds a learning curve that was too high. Giving users one new command to learn minimizes the learning curve. (It's unfortunate it didn't originally send content, and annex.synccontent aims to fix that oversight and I hope it may eventually become default.)
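For reference, annex.synccontent can be switched on in two places: in the local git config of one clone, or repository-wide through git annex config, which stores the value in the git-annex branch so that other clones pick it up. A local git config value takes precedence over the repository-wide one. A short sketch (the function names are hypothetical wrappers for illustration):

```shell
# This clone only: stored in .git/config.
synccontent_local() {
    git config annex.synccontent "${1:-true}"
}

# All clones: stored in the git-annex branch and propagated by syncing.
synccontent_everywhere() {
    git annex config --set annex.synccontent "${1:-true}"
}

# Usage:
#   synccontent_everywhere true   # repository-wide default
#   synccontent_local false       # opt this clone out again
```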
For git-annex sync to default to doing something not in the above pull+commit+push workflow would be surprising, because that's the workflow users have been told it handles.
Notice that the git-annex assistant will happily add all files to the repository and send their content. That's because it's not targeting users who expect to use that git workflow. So it's not surprising that it does what it does.
In conclusion, if having this behavior enabled by git config is not useful, then it does not belong in git-annex sync. It could go in a new command.

I also generally feel like git-annex sync was probably a bad conflation of 3 git commands into 1 command when there could have just been git-annex pull and git-annex push. The fact that it conflates several things makes users think of it as just a big "does all the things" command, which makes users want it to do more things.

It is not too late to split it up, and eventually deprecating it would be a good path to making annex.synccontent default.

Update: Implemented git-annex pull and git-annex push, although I have no plans yet to deprecate git-annex sync.

Update: I found a better path to transition to sync --content by default, and have started that transition. git-annex pull/push sync content by default already.

Cool, thanks joey!
If you want to go as far as deprecating git annex sync (a huge change in habits for git-annex users if that one goes away), I wonder how adding the annex.syncadd config (a strictly opt-in consensus for users of a repo) is too much of a disruption 😅 But okay, you're the maintainer.

I still think that having one command that does what the assistant does, but as a one-shot manual step, would benefit git-annex immensely. I recently tried running the assistant on my SailfishOS phone (https://fosstodon.org/@nobodyinperson/110203412944552425) and realised that having the assistant running in the background all the time is not ideal due to e.g. the power consumption - maybe due to the permanent SSH connection to the remotes. I am now using a systemd service that basically does git annex add;git annex sync. Would be awesome and certainly good for the understanding of new users if there was a command that basically one-steps the assistant - This is something I as a previous Syncthing user was dearly missing: one-stepping the syncing to see what's the bottleneck/problem/what it does exactly, etc. If you'd rather put this into a new command, I'm fine with that, though I still think git annex sync is a perfect candidate also due to the name.

Brainstorming some names for a new command that is git add -A + sync...

git-annex publish kind of implies more exposure of local files to others than just syncing. But it more implies sending data, not also receiving it.

git-annex addsync is clear what it does, but not very memorable.

git-annex assist has a nice analogy to the git-annex assistant, which is close in behavior. Users who don't know about the assistant will miss the analogy though. "assist" also suggests git-annex will take care of everything, even more broadly than sync.

git-annex share also implies more exposure of local files than git-annex sync, and sharing goes both ways so it's better than git-annex publish.

I'd be inclined toward git-annex share or git-annex assist, but welcome better ideas.

assist, as it'll be one step of what the assistant does. I'll also throw git annex update into the ring. Updating goes both ways, sharing also sounds more like you getting your stuff out there, not necessarily others' stuff to you.

Implementing git-annex assist now...

A tricky point is, should it add files only in the current directory and below, or all files in the repository? Note that the assistant can be run in a directory and it will only add changed/new files in that directory, although it can receive pulls that change files in other directories (and will then download those files' content).
OTOH, git-annex sync commits all changes, not only those in the current directory. (The assistant does in some circumstances commit changes made outside the current directory. Its behavior is a bit inconsistent in this area.)

So I think it makes sense for git-annex assist to only add files in the current directory by default. (Of course an option like -A could be added later.)

And while I'm a bit ambivalent about it, I'm making it commit all staged changes, not only those in the current directory. As well as following the behavior of git-annex sync and to an extent the assistant, it seems to me that if you run git-annex add ../foo; git-annex assist, you are intentionally building up a commit that includes file "foo". The same as if you ran git-annex add ../foo; git-annex add . ... If you're not, and you care about what files get added in what commit, you can of course commit manually.

Cool, joey!
I personally would like assist to add all changes in the entire repo by default. For beginners it's always a hassle to know what directory one is in and if one opens a terminal in a subfolder via a file manager, then runs git annex assist, it won't sync all changes.

But I see that this would be inconsistent with git annex add... On the other hand, assist should help you easily sync the repo state without typing too much. One can always do git annex assist . for only this directory. A -A option is also an option, but effectively one would need to use it every time. As you said in the other comment - if you want more control, use the lower level commands. The assistant makes syncing a no-brainer, git annex assist should do so, too.

That the assistant can operate on only a subdir (do people do this?) itself is again an inconsistency then... A config like annex.assistaddall could be introduced, but meh...

I'm also not too happy with the inconsistency of assist committing all staged changes and syncing all file contents, but only adding files in the cwd.
I suppose that consistency with the assistant doesn't really matter. The assistant's behavior when run in a subdirectory is surprising, inconsistent, and undocumented.
So I'm going to change assist to add all files. Except when -C is used, then only add files in the specified directory.
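With assist behaving this way, the multi-step workaround from the first post shrinks to one command. A sketch, assuming a git-annex version that ships the assist command; the wrapper name and default commit message are placeholders:

```shell
# Add new files, commit, pull, push and transfer content in one step.
sync_everything() {
    git annex assist --message "${1:-sync}"
}

# Usage:
#   sync_everything "nightly sync"
```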
-m short form of --message for less typing. Consistent with sync and datalad save.

Oh, this new command and sync always supported -m. I've added it to the man page.
Via a comment on my bug about the new sync warning suddenly appearing in 10.20230626 I see that this fairly hidden discussion seems to have been the rationale for completely changing what "sync" does "because some users expect it to do everything".

At the beginning of the design I might have agreed with you about "what the sync command should do" (and having another, eg, "metasync" command for the smaller version). But changing a fundamental command, a decade later, to do something different, based entirely on a "wouldn't it have been nice (for some use cases) if..." seems quite a stretch.
As per the other bug, if you want to implement new behaviour for such a fundamental command ("git annex sync" is something one runs pretty constantly if using git annex actively from the command line) then it'd be best to implement it in another command name, instead of dramatically changing an existing one. (The other thread has a few more suggestions; I personally still like "fullsync" for the expanded version.)
Either way, such a major -- behaviour breaking -- change needs to be much better documented than "discussion in a bug that a few people saw", and a note in passing in a changelog (which people have to find by noticing strange new output in probably scripted git annex usage).
Ewen
PS: I too think it's a terrible idea to have git annex default to auto-adding any files it can find. Maybe that makes some "Dropbox-like" use cases nicer, but it also entirely breaks other long standing use cases.
@ewen for the record:
I never suggested that git annex sync auto-adds new files by default, see my comment above:

As git annex sync already had so many options for configuring its behaviour, I thought having one more that runs git annex add in the beginning wouldn't hurt. I never suggested changing git annex sync's default behaviour. Purely opt-in. Same for the existing options like annex.synccontent etc.

Changing git annex sync's default behaviour of now syncing content was joey's idea, not mine.

There is already git annex assist to do exactly that. See above or the changelogs.

Just to clarify. Please read the discussions you're referring to thoroughly before claiming things between the lines. 🙂
Now back to constructive discussion.