Recent comments posted to this site:

comment 8

datalad push wants to use the same git push operations as git-annex push does, which is nontrivial to reimplement, especially in its handling of the git-annex branch. See the long comment on pushBranch explaining the order of operations.

This is one place where git-annex push can't be emulated using other git-annex commands that do support --json.

But, git-annex push --no-content doesn't do much besides run pushBranch. So datalad push could use it when run in a git-annex repository. There's no need for it to support --json either: the regular git push output goes to stderr, so datalad can parse the git push progress out of stderr as before.

It may want to pass --quiet to avoid the usual git-annex output to stdout. AFAICS, git push does not itself output to stdout.

The only other thing that command does besides pushBranch is updateBranches, which updates view branches and adjusted branches when the repository is in one.
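
A rough sketch of what the datalad side could look like, assuming an illustrative subprocess invocation and a hypothetical progress regex (git push's stderr carries lines like "Writing objects: 50% (5/10)", but this parser is my own approximation, not datalad code):

```python
import re
import subprocess

# Illustrative approximation of git push's stderr progress lines,
# e.g. "Writing objects:  50% (5/10)"; not datalad's actual parser.
PROGRESS = re.compile(
    r"^(?P<phase>[A-Za-z ]+):\s+(?P<pct>\d+)% \((?P<done>\d+)/(?P<total>\d+)\)")

def parse_progress(stderr_text):
    """Extract (phase, done, total) tuples from git push stderr."""
    found = []
    for line in stderr_text.splitlines():
        m = PROGRESS.match(line.strip())
        if m:
            found.append((m.group("phase"),
                          int(m.group("done")), int(m.group("total"))))
    return found

def push_no_content(remote):
    # --quiet suppresses git-annex's own stdout; git push progress
    # still arrives on stderr, where it can be parsed as before.
    proc = subprocess.run(
        ["git-annex", "push", "--no-content", "--quiet", remote],
        capture_output=True, text=True)
    return parse_progress(proc.stderr)
```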

Comment by joey
comment 7

git pull outputs its progress to stderr. So --json could leave that alone, and a program wanting to parse it could just consume stderr. Delimiters could be added to stderr around the git pull (with a separate option) to make it easier for a program to find and parse it.

git pull also outputs some things to stdout. In particular, that includes the git merge output when the merge is successful. It seems to me that could be put in the json object, eg:

{"command":"pull","output":["Updating 8a433d0..9d47770" ...

That will buffer the merge output until the pull is complete, but that seems ok: git pull displays it after the usually more expensive network operation, so buffering it briefly wouldn't be too noticeable if a json consumer chooses to show it to the user.

Note that git-annex pull will pull from the remote a second time after transferring content to/from it. So the json will have two "command":"pull" records, and stderr may contain two delimited git pull regions. The --json consumer may find that surprising, and since it doesn't always happen, this gets back to the original problem of the --json output not being discoverable.
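
A consumer of such hypothetical --json output might handle the possible second record like this; the line-delimited record shape and the "output" field follow the example above, and nothing here is an existing git-annex interface:

```python
import json

def consume_pull_json(stdout_text):
    """Collect merge output from hypothetical git-annex pull --json records.

    A single git-annex pull may emit two "command":"pull" records,
    since it pulls again after transferring content.
    """
    merge_lines = []
    for line in stdout_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("command") == "pull":
            merge_lines.extend(record.get("output", []))
    return merge_lines
```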

Comment by joey
comment 6

In the case where the git operation needs to prompt for a password, this would leave the user with a password prompt with no prior indication of what is being done. I don't think that's acceptable.

git pull and git push over ssh prompt for the password (to /dev/tty) before outputting anything else. So I suppose it is acceptable.

Comment by joey
comment 2

The ca-certificates.crt file seems to be hardcoded in the git-annex-standalone package:

$ grep -R ca-certificates.crt .
grep: ./usr/lib/x86_64-linux-gnu/tls/x86_64: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v4: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4: binary file matches
grep: ./usr/lib/x86_64-linux-gnu/libgnutls.so.30: binary file matches
grep: ./usr/lib/x86_64-linux-gnu/x86_64: warning: recursive directory loop
$ strings ./usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4 | grep ca-certificates
/etc/ssl/certs/ca-certificates.crt
$ strings ./usr/lib/x86_64-linux-gnu/libgnutls.so.30 | grep ca-certificates
/etc/ssl/certs/ca-certificates.crt
Comment by matrss
comment 1

In a Debian 13 container this is all working fine. AFAICT the ca-certificates packages are very different between Rocky and Debian: on Debian, /etc/ssl/certs is populated with a bunch of files (presumably all CAs) as well as a ca-certificates.crt file, while on Rocky I only get /etc/ssl/certs/ca-bundle.crt and /etc/ssl/certs/ca-bundle.trust.crt. After uninstalling ca-certificates on Debian, the only file left in that directory is ca-certificates.crt and git-annex continues to work, so I assume this file is required but doesn't exist on Rocky.
Comment by matrss
comment 5

datalad push currently does not use git-annex push, and it would be good if it could, in order to avoid some surprising behavior of its current implementation.

But it parses the git push output to display its own progress messages. Since git-annex push interleaves that output with whatever else it prints, adapting that parsing would be difficult.

In order for it to use git-annex push, it seems it would need --json-progress support, and either parsing of the git push in git-annex that feeds through to the --json-progress, or some form of machine readable delimiters in stdout and stderr around the git push output.
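
The delimiter variant might look like this from the consumer side; the marker strings are entirely made up for illustration, since no such option exists yet:

```python
# Hypothetical delimiters; git-annex has no such option today.
BEGIN = "::git-push-begin::"
END = "::git-push-end::"

def extract_git_push_output(stream_text):
    """Return the chunks of text between the (hypothetical) delimiters."""
    chunks = []
    inside = False
    current = []
    for line in stream_text.splitlines():
        if line == BEGIN:
            inside = True
            current = []
        elif line == END:
            inside = False
            chunks.append("\n".join(current))
        elif inside:
            current.append(line)
    return chunks
```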

Comment by joey
comment 7

The external special remote protocol recently got a DELEGATE extension. That offers a possible alternative way to handle wanting to compress some files and not others.

Suppose that special remotes can have compression enabled, or not, at initremote time. The compressor is also chosen then. Neither can be changed. And all files stored in the special remote are compressed. Very simple.

In order to compress some files, but not others, an external special remote could pick which files to compress (based on extension say). It would delegate to two different special remote configurations, one with compression and one without.

Similarly, if some files use one compressor and some files another one, it can delegate to different special remote configurations with the compressor it selects.

Note that, with this approach, the external special remote needs to take care to always compress the same set of files with the same compressor. If it changes its mind, retrieval will fail at checksum verification time.

I'm not sold on this idea, but it's an interesting application of the DELEGATE extension.
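
The delegation decision itself could be as simple as this sketch; the remote configuration names and the extension list are assumptions of mine, and the actual DELEGATE protocol messages are not shown:

```python
import os

# Hypothetical names for two initremote configurations of the same
# underlying storage: one with compression enabled, one without.
COMPRESSIBLE = {".txt", ".csv", ".json", ".xml"}

def delegate_target(filename, compressed_remote="mydata-gz",
                    plain_remote="mydata-raw"):
    """Pick which special remote configuration to delegate a file to.

    The choice must be stable: delegating the same file to the other
    configuration later would break retrieval at checksum verification.
    """
    ext = os.path.splitext(filename)[1].lower()
    return compressed_remote if ext in COMPRESSIBLE else plain_remote
```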

Comment by joey
comment 6

Well, it could store the compressor in a byte or two at the start of the object file. Then there would only need to be a single namespace for compressed objects. That avoids the exponential blowup with chunking, more or less. If it currently tries 4 chunk sizes, also checking for compressed and non-compressed objects would double the overhead.

When not using chunking, there would also be a doubling of the overhead.

That seems acceptable, if only special remotes with compression enabled pay the price.
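
A sketch of the tag-byte idea, using zlib as a stand-in compressor (the tag values are arbitrary; git-annex does not actually store objects this way today):

```python
import zlib

# Arbitrary illustrative tags: 0 = uncompressed, 1 = zlib.
def pack_object(data, compress=True):
    """Prepend a one-byte compressor tag to the object data."""
    if compress:
        return b"\x01" + zlib.compress(data)
    return b"\x00" + data

def unpack_object(blob):
    """Dispatch on the tag byte to recover the original data."""
    tag, body = blob[0], blob[1:]
    if tag == 1:
        return zlib.decompress(body)
    if tag == 0:
        return body
    raise ValueError("unknown compressor tag: %d" % tag)
```

With the tag stored in the object itself, retrieval only needs to check a single namespace per chunk size, rather than one per compressor.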

Comment by joey
comment 3

git-annex addurl --no-raw will prevent it from using the web remote in these cases.

But, it does not force treating a given url as a torrent. I suppose the torrent:url idea still has merit.

Comment by joey