Recent comments posted to this site:

comment 4

Rocky does have an equivalent ca-certificates package, which seems likely to provide certificates in a place that will work.

Unfortunately that is not the case. The ca-certificates package on Rocky Linux 9 does not provide /etc/ssl/certs/ca-certificates.crt, only /etc/ssl/certs/ca-bundle.crt and /etc/ssl/certs/ca-bundle.trust.crt. /etc/ssl/certs/ca-bundle.crt seems to be Rocky's equivalent of /etc/ssl/certs/ca-certificates.crt, but the git-annex-standalone package does not pick it up.

I can confirm that symlinking (ln -s /etc/ssl/certs/ca-bundle.crt /etc/ssl/certs/ca-certificates.crt) fixes the issue, but I am not sure if I will be able to convince the HPC sysadmins to do so everywhere.
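For a provisioning script, the symlink workaround can be made idempotent so that it does nothing on hosts that already have the file. This is a sketch that assumes a POSIX shell. ensure_ca_certificates_link is a hypothetical helper name, and its optional root-directory argument exists only so the helper can be tried out in a scratch directory.

```shell
#!/bin/sh
# Sketch: create the Debian-style CA bundle path as a symlink to Rocky's
# bundle, but only if it is missing and Rocky's bundle exists.
# ensure_ca_certificates_link is a hypothetical helper name.
# Optional $1: alternative root directory (default: the real /).
ensure_ca_certificates_link() {
    root=${1:-}
    debian_path="$root/etc/ssl/certs/ca-certificates.crt"
    rocky_path="$root/etc/ssl/certs/ca-bundle.crt"

    # Leave an existing file or symlink alone (-L also catches dangling links).
    if [ -e "$debian_path" ] || [ -L "$debian_path" ]; then
        echo "exists: $debian_path"
        return 0
    fi
    if [ ! -e "$rocky_path" ]; then
        echo "missing: $rocky_path" >&2
        return 1
    fi
    # Both files live in the same directory, so a relative target suffices.
    ln -s ca-bundle.crt "$debian_path" && echo "linked: $debian_path"
}
```

To act on the real /etc/ssl/certs, run it as root with no argument. The relative link target points at the same file as the manual ln -s command above.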

The changelog for the Rocky Linux 10 ca-certificates package says that the /etc/ssl/certs/ca-certificates.crt symlink has been restored (https://rockylinux.pkgs.org/10/rockylinux-baseos-x86_64/ca-certificates-2025.2.80_v9.0.305-102.el10.noarch.rpm.html#:~:text=%2A%20%2Fetc%2Fssl%2Fcerts%2Fca%2Dcertificates%2Ecrt), but this doesn't actually seem to be the case: at least, I can't find it in the docker.io/rockylinux/rockylinux:10 image.

I don't think there is any way to override the location with an environment variable.

At least for GnuTLS, it seems not to be implemented: https://gitlab.com/gnutls/gnutls/-/work_items/1279. But some applications make the location overridable, and this gave me another idea: running export GIT_SSL_CAINFO=/etc/ssl/certs/ca-bundle.crt eliminates the error I was seeing (which seemed to originate from git after all). But I am not sure whether there will be other issues due to the missing /etc/ssl/certs/ca-certificates.crt file and that path being hardcoded in GnuTLS. At least git annex get via p2phttp seems to be working fine, but now I am wondering which CAs it is actually using to validate the remote...
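The GIT_SSL_CAINFO workaround can be limited to hosts that lack the Debian path, for example in a shell profile, so that hosts which do have that path keep git's default. This is a sketch that assumes a POSIX shell, and first_existing_file is a hypothetical helper. /etc/pki/tls/certs/ca-bundle.crt is included because that is where the bundle lives on RHEL-family systems. GIT_SSL_CAINFO is read by git itself, so this does not change the path hardcoded in libgnutls.

```shell
# Print the first of the given paths that exists as a file; fail if none do.
first_existing_file() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Only point git at Rocky's bundle when the Debian-style path is absent,
# so hosts that already have ca-certificates.crt keep git's default.
if [ ! -f /etc/ssl/certs/ca-certificates.crt ]; then
    if ca_bundle=$(first_existing_file \
            /etc/ssl/certs/ca-bundle.crt \
            /etc/pki/tls/certs/ca-bundle.crt); then
        export GIT_SSL_CAINFO="$ca_bundle"
    fi
fi
```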

Comment by matrss
comment 3

The standalone build is made on Debian and inherits from there where it expects to find SSL certs. On Debian, the ca-certificates package provides the certificates, which are derived from those shipped by Mozilla.

Rocky does have an equivalent ca-certificates package, which seems likely to provide certificates in a place that will work.

There may be an argument for the standalone build bundling its own copy of the certificates. Of course, it would then need a security update every time a certificate authority is removed. But Debian allows the admin to configure which certificate authorities they trust, and only populates the file with those. If the standalone build overrode that, it would be surprising, to say the least. A good compromise might be for it to ship its own copy, but only use it if the file is not present in /etc/.

As well as the two C libraries, git-annex links to a Haskell library that reads the certificates too. Each of these would probably need to be patched; I don't think there is any way to override the location with an environment variable.

Comment by joey
comment 8

datalad push wants to use the same git push operations as git-annex push does, which is nontrivial to reimplement, especially in its handling of the git-annex branch. See the long comment on pushBranch explaining the order of operations.

This is one place where git-annex push can't be emulated using other git-annex commands that do support --json.

But git-annex push --no-content doesn't do much besides run pushBranch. So datalad push could use it when run in a git-annex repository. There's no need for it to support --json either: the regular git push output goes to stderr, so datalad can parse the git push progress out of stderr as before.

It may want to pass --quiet to avoid the usual git-annex output to stdout. AFAICS, git push does not itself output to stdout.

The only other thing that command does besides pushBranch is updateBranches, which updates view branches and adjusted branches when run in one.
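Here is a sketch of how a consumer such as datalad push might do this, assuming a POSIX shell. Stdout, which --quiet keeps mostly empty, is discarded, and stderr is captured. git's "Writing objects: NN%" progress lines are then extracted from it. git redraws progress lines in place using carriage returns, so those are turned into newlines first. The helper names are hypothetical. Also note that git only reports progress on a stderr that is not a terminal when asked with --progress, and whether git-annex push arranges for that is not covered here.

```shell
# Extract the percentages from git's "Writing objects: NN% (x/y)" progress
# lines. git redraws progress with carriage returns, so split on \r first.
push_progress_percentages() {
    tr '\r' '\n' | sed -n 's/^Writing objects: *\([0-9][0-9]*\)%.*/\1/p'
}

# Run a command with stdout discarded and stderr saved to the given file;
# returns the command's exit status.
# Example: capture_push_stderr push.err git annex push --no-content --quiet origin
capture_push_stderr() {
    errfile=$1
    shift
    "$@" >/dev/null 2>"$errfile"
}
```

A caller would run capture_push_stderr first and then feed the saved file to push_progress_percentages. Parsing after the fact keeps the sketch simple, while a real progress display would read stderr incrementally.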

Comment by joey
comment 7

git pull outputs its progress to stderr. So --json could leave that alone, and a program wanting to parse it could just consume stderr. Delimiters could be added to stderr around the git pull (with a separate option) to make it easier for a program to find and parse it.

git pull also outputs some things to stdout. In particular, that includes the git merge output when the merge is successful. It seems to me that could be put in the json object, eg:

{"command":"pull","output":["Updating 8a433d0..9d47770" ...

That will buffer it until the pull is complete, but that seems ok: it's displayed by git pull after the usually more expensive network operation, so buffering it briefly wouldn't be too noticeable if a json consumer chooses to show it to the user.

Note that git-annex pull will pull from the remote a second time after transferring content to/from it. So the json will have 2 "command":"pull" records. And stderr may contain 2 delimited git pull stderrs. The --json consumer may find that surprising, and it doesn't always happen, which gets back to the original problem of the --json not being discoverable.
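If such delimiters were added, a consumer could split stderr into one numbered chunk per git pull, which also copes with the second pull. This is a sketch using awk. The BEGIN/END marker lines are hypothetical placeholders, and git-annex does not currently emit them.

```shell
# Split stderr into numbered git pull sections, using hypothetical marker
# lines around each git pull. Lines outside the markers are ignored.
# Output: "<section number><TAB><line>" for every line inside a section.
extract_git_sections() {
    awk '
        $0 == "--- BEGIN git pull ---" { inside = 1; n++; next }
        $0 == "--- END git pull ---"   { inside = 0; next }
        inside                         { print n "\t" $0 }
    '
}
```

Usage: git annex pull ... 2>&1 >/dev/null | extract_git_sections. The section number lets the consumer tell the first pull apart from the one after content transfer.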

Comment by joey
comment 6

In the case where the git operation needs to prompt for a password, this would leave the user with a password prompt with no prior indication of what is being done. I don't think that's acceptable.

git pull and git push over ssh prompt for the password (to /dev/tty) before outputting anything else. So I suppose it is acceptable.

Comment by joey
comment 2

The ca-certificates.crt file seems to be hardcoded in the git-annex-standalone package:

$ grep -R ca-certificates.crt .
grep: ./usr/lib/x86_64-linux-gnu/tls/x86_64: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v4: warning: recursive directory loop
grep: ./usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4: binary file matches
grep: ./usr/lib/x86_64-linux-gnu/libgnutls.so.30: binary file matches
grep: ./usr/lib/x86_64-linux-gnu/x86_64: warning: recursive directory loop
$ strings ./usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4 | grep ca-certificates
/etc/ssl/certs/ca-certificates.crt
$ strings ./usr/lib/x86_64-linux-gnu/libgnutls.so.30 | grep ca-certificates
/etc/ssl/certs/ca-certificates.crt
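Here is a quieter way to look for such hardcoded paths, assuming GNU grep. Unlike -R, -r does not follow symlinks found while recursing, which avoids the recursive directory loop warnings above. -F treats the dots in the path literally, and -l prints only the names of matching files, binaries included. find_hardcoded_path is a hypothetical helper name.

```shell
# List files under a directory (default: .) that contain a literal string,
# without following symlinks met during recursion (GNU grep -r semantics).
find_hardcoded_path() {
    grep -rlF -- "$1" "${2:-.}"
}
```

For example, running find_hardcoded_path /etc/ssl/certs/ca-certificates.crt inside the unpacked standalone tree lists the candidate files, and each hit can then be checked with strings as above.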

Comment by matrss
comment 1

In a Debian 13 container this is all working fine. AFAICT the ca-certificates packages are very different between Rocky and Debian: on Debian, /etc/ssl/certs is populated with a bunch of files (presumably all CAs) plus a ca-certificates.crt file, while on Rocky I only get /etc/ssl/certs/ca-bundle.crt and /etc/ssl/certs/ca-bundle.trust.crt. After uninstalling ca-certificates on Debian, the only file left in that directory is ca-certificates.crt and git-annex continues to work, so I assume this file is required but doesn't exist on Rocky.

Comment by matrss
comment 5

datalad push currently does not use git-annex push, and it would be good if it could, in order to avoid some surprising behavior with its current implementation.

But, it parses the git push output to display its own progress messages. Since git-annex push interleaves that with whatever else it outputs, adapting to parsing it would be difficult.

In order for it to use git-annex push, it seems it would need --json-progress support, plus either git-annex parsing the git push output and feeding it through to --json-progress, or some form of machine-readable delimiters in stdout and stderr around the git push output.

Comment by joey
comment 7

The external special remote protocol recently got a DELEGATE extension. That offers a possible alternative way to handle wanting to compress some files and not others.

Suppose that special remotes can have compression enabled, or not, at initremote time. The compressor is also chosen then. Neither can be changed. And all files stored in the special remote are compressed. Very simple.

In order to compress some files but not others, an external special remote could pick which files to compress (based on extension, say). It would delegate to two different special remote configurations, one with compression and one without.

Similarly, if some files use one compressor and some files another one, it can delegate to different special remote configurations with the compressor it selects.

Note that, with this approach, the external special remote needs to take care to always compress the same set of files with the same compressor. If it changes its mind, retrieval will fail at checksum verification time.
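The selection logic such an external special remote would need can be very small. This is a sketch that assumes a POSIX shell. The delegated configuration names compressed and plain, and the list of already-compressed extensions, are hypothetical and illustrative. Deciding from the key rather than from the current filename helps with the consistency requirement. With backends such as SHA256E the extension is part of the key and cannot change later, whereas a file can be renamed. How the choice is then passed along via DELEGATE is not shown.

```shell
# Decide which delegated special remote configuration stores a key.
# "compressed" and "plain" are hypothetical configuration names.
# The choice depends only on the key, so the same key always maps to the
# same configuration; changing this mapping later would break retrieval.
choose_delegate() {
    key=$1
    case "$key" in
        *--*.*) ext=${key##*.} ;;   # e.g. SHA256E keys end in ".ext"
        *)      ext= ;;             # no extension recorded in the key
    esac
    ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        # Illustrative list of formats that are usually already compressed.
        gz|bz2|xz|zst|zip|jpg|jpeg|png|mp4|mkv) echo plain ;;
        *) echo compressed ;;
    esac
}
```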

I'm not sold on this idea, but it's an interesting application of the DELEGATE extension.

Comment by joey