I am looking to add URIs of files on my institution's HPC cluster to my local git-annex repository with the git annex addurl command. I am able to get transfers over http working, but want to know whether other protocols such as sftp or rsync are also available. What is your advice for accessing files on HPC systems when the files are not in a remote annex (due to other limitations)?
git-annex can be used with any url scheme that curl supports, but you have to configure git-annex to allow that scheme. See the documentation of annex.security.allowed-url-schemes in the git-annex man page.
You will also have to set annex.security.allowed-ip-addresses to "all".
It seems that even with both settings, git-annex still avoids using curl for unsupported url schemes, unless you also set annex.web-options to some option used by curl. That forces it to use curl. I set it to "--netrc". You will probably need to use that option anyway since I think curl needs configuration in a netrc file to authenticate for sftp.
(I feel that it's a bug that annex.web-options needs to be set to make it use curl, and I've fixed that in master.)
I am really quite new to git annex (but am excited to familiarize myself with it!), so I don't know how to modify settings. I've tried this:

    $ git annex config --set annex.security.allowed-ip-addresses "all"
    git-annex: annex.security.allowed-ip-addresses is not a configuration setting that can be stored in the git-annex branch
I am also having a difficult time parsing through the webpage to find the documentation on the allowed-url-schemes setting.
It seems git-annex doesn't store this info in the git-annex branch. Probably because it's a security critical setting and you wouldn't want that to be set for you when cloning a public repo.
The documentation about these git config settings is in the main git-annex man page in the CONFIGURATION section.
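Since these are ordinary git config settings, a minimal sketch of setting them with git config (rather than git annex config) might look like the following. The scheme list assumes you want to keep the documented defaults (http, https, ftp) and add sftp; the host name and path in the commented addurl line are placeholders. The sketch uses a throwaway repository so it can be tried safely.

```shell
# Throwaway repository so the commands can be tried safely; in practice,
# skip these three lines and run the rest inside your existing annex.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Allow sftp in addition to the default schemes.
git config annex.security.allowed-url-schemes "http https ftp sftp"
# Needed before git-annex will hand downloads to curl.
git config annex.security.allowed-ip-addresses all
# Pass --netrc to curl so it can read credentials from the netrc file.
git config annex.web-options "--netrc"

# Show what was stored in this repository's local config.
git config --get-regexp '^annex\.'

# Then, with a placeholder host and path:
# git annex addurl sftp://hpc.example.edu/home/my_username/data/file.dat
```

These values live in the repository's local .git/config, so they are not propagated to clones, which matches the security reasoning above.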
If you pass an url to git-annex addurl that contains authentication data, it will be stored like any other url.
I'd prefer not to have my credentials stored in plain text.
When I run curl with the "--user my_username" option, it prompts me for my password for the server. Can git-annex pass this password prompt from curl through to the user, so that the password isn't stored anywhere on disk?
git-annex doesn't prompt for passwords for files downloaded from the web, and it only stores the password if it's included in the url given to it.
So if you don't want to store the password in the git repository, storing it in an outside file like the netrc file seems like the only way.
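For illustration, a netrc entry could be created roughly like this. The host name, user name, and password are placeholders, and the file is written to a scratch location first so that an existing ~/.netrc is not overwritten. Note that the password is still stored in plain text on disk, just outside the git repository.

```shell
# Placeholder host and credentials; replace them with your own.
# Written to a scratch file first so an existing ~/.netrc is left alone.
netrc=$(mktemp)
cat > "$netrc" <<'EOF'
machine hpc.example.edu
login my_username
password my_secret_password
EOF

# Keep the file readable only by you, since it holds a plain-text password.
chmod 600 "$netrc"

# After reviewing the file, move it to where curl's --netrc option looks:
# mv "$netrc" ~/.netrc
```

With annex.web-options set to "--netrc", curl should then look up the credentials for that host in ~/.netrc instead of needing them in the url.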
I wonder if the web special remote is a good fit for what you're doing at all. It's not intended for authenticated access, but for access to files on the public web. It may be that you would be better off using a special-purpose remote that handles the downloads and passwords in whatever way makes sense for the HPC cluster. See the documentation on external special remotes.