Recent comments posted to this site:
datalad push wants to use the same git push operations as
git-annex push does, which is nontrivial to reimplement,
especially in its handling of the git-annex branch.
See the long comment on pushBranch explaining the order of operations.
This is one place where git-annex push can't be emulated using other
git-annex commands that do support --json.
But, git-annex push --no-content doesn't do much besides run pushBranch.
So datalad push could use it when run in a git-annex repository.
There's no need for it to support --json either; the regular git push
output goes to stderr, so datalad can parse the git push progress out of
stderr as before.
It may want to pass --quiet to avoid the usual git-annex output to
stdout. AFAICS, git push does not itself output to stdout.
The only other thing that command does besides pushBranch is
updateBranches, which updates view branches and adjusted branches when
the command is run in such a branch.
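The stderr-parsing approach could look something like the sketch below. The progress line format ("Writing objects:  50% (2/4)") is standard git push output; the function name and the idea of feeding it captured stderr chunks are illustrative assumptions, not datalad's actual implementation.

```python
import re

# Illustrative sketch: extract git push progress from captured stderr.
# git push writes phase/percentage lines to stderr, which a consumer
# like datalad could scan to drive its own progress display.
PROGRESS = re.compile(
    r"(Counting|Compressing|Writing) objects:\s+(\d+)% \((\d+)/(\d+)\)"
)

def parse_push_progress(stderr_text):
    """Return (phase, percent, done, total) tuples found in stderr text."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in PROGRESS.finditer(stderr_text)
    ]
```

Note that git terminates in-progress lines with carriage returns, so a live consumer would need to split on both \r and \n rather than whole lines.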
git pull outputs its progress to stderr. So --json could leave that alone,
and a program wanting to parse it can just consume stderr. Delimiters could
be added to stderr around the git pull (with a separate option)
to make it easier for a program to find and parse it.
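A consumer of such delimiters might look like the sketch below. The delimiter strings are made up for illustration; no such option exists yet. It collects every delimited section, since a single git-annex pull can run git pull more than once.

```python
# Hypothetical delimiter strings; an actual option would define these.
BEGIN = "GIT-PULL-BEGIN"
END = "GIT-PULL-END"

def extract_git_pull_output(stderr_text):
    """Return each delimited git pull stderr section as a string."""
    sections, inside, current = [], False, []
    for line in stderr_text.splitlines():
        if line == BEGIN:
            inside, current = True, []
        elif line == END:
            inside = False
            sections.append("\n".join(current))
        elif inside:
            current.append(line)
    return sections
```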
git pull also outputs some things to stdout.
In particular, that includes the git merge output when the merge is
successful. It seems to me that could be put in the json object, eg:
{"command":"pull","output":["Updating 8a433d0..9d47770" ...
That will buffer it until the pull is complete, but that seems ok:
it's displayed by git pull after the usually more expensive
network operation, so buffering it briefly wouldn't be too noticeable if
a json consumer chooses to show it to the user.
Note that git-annex pull will pull from the remote a second time after
transferring content to/from it. So the json will have 2 "command":"pull"
records. And stderr may contain 2 delimited git pull stderrs.
The --json consumer may find that surprising, and it doesn't always happen,
which gets back to the original problem of the --json not being discoverable.
In the case where the git operation needs to prompt for a password, this would leave the user with a password prompt with no prior indication of what is being done. I don't think that's acceptable.
git pull and git push over ssh prompt for the password (to /dev/tty)
before outputting anything else. So I suppose it is acceptable.
The ca-certificates.crt file seems to be hardcoded in the git-annex-standalone package:
$ grep -R ca-certificates.crt .
grep: ./usr/lib/x86_64-linux-gnu/tls/x86_64: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v4: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4: binary file matches
grep: ./usr/lib/x86_64-linux-gnu/libgnutls.so.30: binary file matches
grep: ./usr/lib/x86_64-linux-gnu/x86_64: warning: recursive directory loop
$ strings ./usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4 | grep ca-certificates
/etc/ssl/certs/ca-certificates.crt
$ strings ./usr/lib/x86_64-linux-gnu/libgnutls.so.30 | grep ca-certificates
/etc/ssl/certs/ca-certificates.crt
datalad push currently does not use git-annex push, and it would be good
if it could, in order to avoid some surprising behavior in its current
implementation.
But, it parses the git push output to display its own
progress messages. Since git-annex push interleaves that with whatever
else it outputs, adapting to parsing it would be difficult.
In order for it to use git-annex push, it seems it
would need --json-progress support, and either parsing of the git push
in git-annex that feeds through to the --json-progress, or some form of
machine readable delimiters in stdout and stderr around the git push
output.
The external special remote protocol recently got a DELEGATE extension. That offers a possible alternative way to handle wanting to compress some files and not others.
Suppose that special remotes can have compression enabled, or not, at initremote time. The compressor is also chosen then. Neither can be changed. And all files stored in the special remote are compressed. Very simple.
In order to compress some files, but not others, an external special remote could pick which files to compress (based on extension say). It would delegate to two different special remote configurations, one with compression and one without.
Similarly, if some files use one compressor and some files another one, it can delegate to different special remote configurations with the compressor it selects.
Note that, with this approach, the external special remote needs to take care to always compress the same set of files with the same compressor. If it changes its mind retrieval will fail at checksum verification time.
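The extension-based selection could be as simple as the sketch below. The remote configuration names and the extension set are illustrative assumptions; the point is that the choice is deterministic, depending only on the filename, so retrieval always delegates to the same configuration that stored the file.

```python
import os

# Hypothetical: extensions assumed to already be compressed, so a
# compressing configuration would gain nothing on them.
ALREADY_COMPRESSED = {".gz", ".xz", ".zst", ".jpg", ".png", ".mp4"}

def delegate_target(filename):
    """Pick which special remote configuration to delegate to.

    Deterministic: the result depends only on the file extension,
    never on contents or timing, so the same file is always stored
    and retrieved via the same configuration.
    """
    ext = os.path.splitext(filename)[1].lower()
    return "remote-plain" if ext in ALREADY_COMPRESSED else "remote-compressed"
```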
I'm not sold on this idea, but it's an interesting application of the DELEGATE extension.
Well, it could store the compressor in a byte or two at the start of the object file. Then there would only need to be a single namespace for compressed objects. That avoids the exponential blowup with chunking, more or less. If it currently tries 4 chunk sizes, also checking for compressed and non-compressed objects would double the overhead.
When not using chunking, there would also be a doubling of the overhead.
That seems acceptable, if only special remotes with compression enabled pay the price.
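The header-byte idea could be sketched as below, assuming a hypothetical one-byte compressor ID table; zlib stands in for whatever compressors a real remote would support.

```python
import zlib

# Hypothetical compressor IDs stored in the first byte of the object
# file: 0 = no compression, 1 = zlib. A real remote would fix this
# table at initremote time and never change it.
COMPRESSORS = {
    0: (lambda data: data, lambda data: data),
    1: (zlib.compress, zlib.decompress),
}

def store_object(payload, compressor_id):
    """Prefix the (possibly compressed) payload with its compressor ID."""
    compress, _ = COMPRESSORS[compressor_id]
    return bytes([compressor_id]) + compress(payload)

def retrieve_object(stored):
    """Read the compressor ID byte and decompress accordingly."""
    compressor_id, body = stored[0], stored[1:]
    _, decompress = COMPRESSORS[compressor_id]
    return decompress(body)
```

Since the stored object decides for itself how it is decoded, retrieval never has to probe multiple namespaces, which is what keeps the overhead from multiplying with chunk-size guesses.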
git-annex addurl --no-raw will prevent it from using the web remote in
these cases.
But, it does not force treating a given url as a torrent. I suppose the
torrent:url idea still has merit.