ATM files didn't datalad push as they should have due to existing settings of wanted:
❯ git annex wanted datasets.datalad.org
include=.datalad/* and (not metadata=distribution-restrictions=*)
❯ git annex find --not --in datasets.datalad.org .
crcns-2022-dataland.pdf
crcns-2022-dataland.png
crcns-2022-dataland.svg
❯ git annex metadata *
metadata crcns-2022-dataland.pdf
ok
metadata crcns-2022-dataland.png
ok
metadata crcns-2022-dataland.svg
ok
❯ git annex copy --auto --to datasets.datalad.org *
❯ git annex version
git-annex version: 10.20231227-1~ndall+1
so I was confused... the reason was
❯ git annex copy --to datasets.datalad.org *
copy crcns-2022-dataland.pdf (to datasets.datalad.org...)
copying to non-ssh repo not supported
failed
copy crcns-2022-dataland.png (to datasets.datalad.org...)
copying to non-ssh repo not supported
failed
copy crcns-2022-dataland.svg (to datasets.datalad.org...)
copying to non-ssh repo not supported
failed
copy: 3 failed
whereas I have
❯ git remote show -n datasets.datalad.org
* remote datasets.datalad.org
Fetch URL: https://datasets.datalad.org/datalad/artwork/.git
Push URL: falkor.dartmouth.edu:/srv/datasets.datalad.org/www/datalad/artwork/.git
...
and the use case is quite common for me and in particular for ReproNim/containers which is shared/adjusted in similar ways
This would not have prevented copy --auto from trying to copy the files and failing the same way as copy without that option. So I think there must be something in your preferred content that made it skip trying to copy those files.
Maybe you meant to have an "or" there? With the "and" it only wants files that are in .datalad/ as well as not having the metadata set.
As for pushInsteadOf, in 2011 this was considered in https://bugs.debian.org/644278. And the result was that git-annex honors insteadOf but not pushInsteadOf or pushurl. With the (weak?) rationale that what git-annex does is neither pushing nor pulling really.
So it seems to me better for you to use insteadOf. Unless there's some reason why you need git to pull from the http url rather than from the ssh url?
Perhaps you're setting this up for many users, some of whom are limited to read-only access. Pulling from http would work for those users. And git-annex get from http also works read-only the way your repository is set up.
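To make the difference between the two git settings concrete, here is a small sketch using plain git in a throwaway repository. The host names and paths (example.org, ssh.example.org:/srv/www/) are hypothetical stand-ins, not the real servers from this report. It only shows how git itself rewrites the URLs; it does not involve git-annex.

```shell
set -eu

# Throwaway repository; all URLs below are made-up placeholders.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git remote add origin https://example.org/datalad/artwork/.git

# pushInsteadOf: git rewrites only the push URL, fetch stays on https.
git config url."ssh.example.org:/srv/www/".pushInsteadOf "https://example.org/"
push_only_fetch=$(git remote get-url origin)
push_only_push=$(git remote get-url --push origin)

# insteadOf: git rewrites the URL for both fetch and push.
git config --unset url."ssh.example.org:/srv/www/".pushInsteadOf
git config url."ssh.example.org:/srv/www/".insteadOf "https://example.org/"
both_fetch=$(git remote get-url origin)
both_push=$(git remote get-url --push origin)

printf 'pushInsteadOf: fetch=%s push=%s\n' "$push_only_fetch" "$push_only_push"
printf 'insteadOf:     fetch=%s push=%s\n' "$both_fetch" "$both_push"
```

With pushInsteadOf the fetch URL stays https and only the push URL becomes the ssh one; with insteadOf both become the ssh URL, which is the case git-annex honors.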
If that is the reason you want to use pushInsteadOf rather than insteadOf, it would follow that you would want git-annex to use the pull url for getting files, but use the push url for putting/dropping files.
But: If this change were made, it would risk breaking existing working setups, that happen to have a push url that points to a different repository. When git-annex was upgraded to use the push url, it would start noticing that the repository behind the url has a different uuid than the remote does.
For a ssh repository, that would prevent it from using the repository until the user did something to fix the configuration.
For a local repository, git-annex currently automatically updates the cached repository uuid. It's not clear to me how that would work if there were two urls pointing to two different repositories. Does seem like this would prevent eg, getting files from the remote that it was able to get before.
I don't know how common such a setup with a push url pointing to a different repository might be. I think it is much more likely that remote.foo.pushurl be pointed to some other url that is not on the same server. pushInsteadOf is really intended to configure a different access method for the same server as the repository url.
just ran into this again with datalad push which surprised me (since I do not get into it with regular git push), and took me a bit to figure out/find this issue.
It is my pattern of working with git -- clone via public URL whenever possible (so I do not have to load/use any ssh key without necessity; could use the same URLs on public and private hosts alike) and only when needed to push, automagically push via ssh. FWIW I really love such workflow and use it not only for github but other hosting providers too!
And IMHO indeed it would make total sense for a similar separation of "use public/read access route regardless of having or not credentials for private/write, and use secure/authenticated route only if write/push is necessary" for git-annex too. insteadOf does not allow for such separation, but at least it would indeed allow a "location-wide" override to use the secure/authenticated route even when a simpler public access route is possible.
Indeed adding such a feature parity with git might break existing setups, but I would say it should only fix a possible divergence and remove the surprise that annex is behaving differently from how git does it. IMHO it is unlikely someone had pushInsteadOf configured to have git push somewhere else (thus git-annex branch going there too) while still somehow interested to use original URL for git-annex.
FWIW, I keep running into this. Re pushurl: it could take precedence, overriding the pushInsteadOf mapped value (I did not check what git's behavior is in the presence of both pushurl and pushInsteadOf).
git-annex does not currently use pushurl, and making it start to use it would be the same kind of potentially breaking change as making it start to use pushInsteadOf.
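On the question of what git itself does when both are present: the git-config documentation for url.<base>.pushInsteadOf states that if a remote has an explicit pushurl, git ignores pushInsteadOf for that remote. The following sketch checks this with plain git in a throwaway repository. All host names and paths are hypothetical, and the explicit pushurl deliberately starts with the prefix that pushInsteadOf would otherwise rewrite.

```shell
set -eu

# Throwaway repository; all URLs below are made-up placeholders.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git remote add origin https://example.org/datalad/artwork/.git

# A rewrite rule that would map https://example.org/ pushes to ssh.
git config url."ssh.example.org:/srv/www/".pushInsteadOf "https://example.org/"

# An explicit pushurl that also starts with the rewritten prefix.
git remote set-url --push origin https://example.org/mirror/artwork.git

# git reports the explicit pushurl unchanged; pushInsteadOf is not applied.
push_url=$(git remote get-url --push origin)
printf 'push=%s\n' "$push_url"
```

So in git, pushurl already takes precedence over pushInsteadOf. This says nothing about git-annex, which per this thread uses neither.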
I get where you're coming from but just because a lot of people use pushInsteadOf that way does not mean that other people don't use it to redirect pushes to an entirely different clone of the repository.
Here Junio calls using pushurl that way a "common mistake", so I guess he is seeing people do that. He does have a good point that with such a configuration refs/remotes/origin won't (usually) reflect the state of both repos.
If git-annex used pushInsteadOf for sending content to a remote, should it also use it for dropping content from the remote? Dropping is quite far from pushing. Does it make sense to expect the user to generalize "push" to "arbitrary write access" when it comes to git-annex's interpretation of configuration settings that were designed for git?
Granted, git-annex push can drop content from the remote when preferred content is configured to.
Maybe what's really missing is url.<base>.annexInsteadOf corresponding to url.<base>.pushInsteadOf. The same way remote.<name>.annexUrl corresponds to remote.<name>.pushUrl.
You would need to set 2 configs, but the separation is clear. And you could set it once in your global git config for whatever servers you commonly use.
Another benefit is that the new git-annex p2phttp server needs annexUrl to be configured to a different url than the git url when using it. annexInsteadOf would let that be configured a single time for all urls on a given git server.
Update: Implemented that. Let me know if you think it solves your problem well enough.
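For reference, a minimal sketch of what the two-config setup could look like. The key names url.<base>.annexInsteadOf and remote.<name>.annexUrl come from this thread; the host names and paths are hypothetical, and the rewrite direction is assumed to mirror pushInsteadOf (URLs starting with the value get rewritten to start with <base>), so check the git-annex man page of your version for the exact semantics. The sketch writes to a throwaway repository's local config so it is safe to run; for the "set it once" use described above you would use git config --global instead.

```shell
set -eu

# Throwaway repository; all URLs below are made-up placeholders.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git remote add origin https://example.org/datalad/artwork/.git

# git: fetch via https, push via ssh.
git config url."ssh.example.org:/srv/www/".pushInsteadOf "https://example.org/"
# git-annex (assumed analogous): access annexed content via ssh.
git config url."ssh.example.org:/srv/www/".annexInsteadOf "https://example.org/"

# Per-remote alternative named in the thread, set for one remote only.
git config remote.origin.annexUrl "ssh.example.org:/srv/www/datalad/artwork/.git"

# Read the settings back to confirm what was stored.
annex_rewrite=$(git config --get url."ssh.example.org:/srv/www/".annexInsteadOf)
annex_url=$(git config --get remote.origin.annexUrl)
printf 'annexInsteadOf=%s\nannexUrl=%s\n' "$annex_rewrite" "$annex_url"
```

This only stores and reads back the configuration; whether git-annex then copies content over ssh depends on having a git-annex version that includes the annexInsteadOf change.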
annexInsteadOf and its alignment to annexUrl <-> pushUrl! I will try it out soonish. Thank you!
Note that I've opened a related todo, config different remote to use for write operations which might be a better approach to the pushInsteadOf type of thing.