Recent comments posted to this site:
I diagnosed a bug introduced in 86dbe9a825b9c615c63e0cfc5e4a737a249f8989 that made it only able to remove the size if the object file is locally present. Fixed.
What you can recommend, which works already, is:
git -c annex.verify=false annex get
As to adding this to --fast, I think some would be surprised if --fast allowed bad data to get into the repository. And commands like git-annex copy --to that do support --fast already use it to avoid round trip checks. It would not do to make --fast for those commands also avoid verification. And git-annex copy is very close to git-annex get, to the point that git-annex get --from is the same as git-annex copy --from.
So, I think it's better to keep this a separate option, and the -c option I gave above works well enough I suppose.
With that said, you're the second person asking about this in an HPC context this week. I suspect maybe you and @mih were working on the same problem in asking about this? Anyway, since you both seemed to have difficulty finding the way to do this, maybe it would be worth making a dedicated option like --no-verify.
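For anyone who has trouble remembering the -c form, here is a minimal sketch of a shell wrapper around the command given above. The function name annex_get_noverify is made up for illustration and is not part of git-annex; it only forwards its arguments to the existing command.

```shell
# Hypothetical helper (not a git-annex feature): runs the command from the
# comment above with verification disabled for this one invocation only.
annex_get_noverify() {
    git -c annex.verify=false annex get "$@"
}

# Example use, with an illustrative path:
#   annex_get_noverify data/large-file.bin
```

Because the setting is passed with -c, it is not written to the repository's config, so later commands still verify as usual.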
Well, curl does have a --cookie option. But setting that would make all downloads from the web special remote have the same cookies set, exposing them to any other web servers you also use with that remote.
I think that generally, things involving authentication are a good use case for writing a little external special remote of your own that handles the particulars of a given service. Especially if you can share it with others. example.sh is a good starting place for writing that.
That said, this is also right on the line to something it might be possible for git-annex to support better without you needing to do that work. It's actually possible to initremote a second web special remote that is limited to a single host and is used preferentially to the web special remote:
git-annex initremote --sameas=web archiveorg type=web urlinclude='*archive.org/*'
git config remote.archiveorg.annex-cost 100
If annex.web-options had a per-remote config, like some other configs do, but which it currently does not, you could then just set that to pass the cookies to curl when using that archiveorg special remote:
git config remote.archiveorg.annex-web-options "--cookie=whatever"
Since that seems like a good idea, I've implemented it! Get it in the next release or a daily build.
PS, you'll also need to set this, which does have its own security ramifications:
git config annex.security.allowed-ip-addresses all
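The setup steps above could be collected into one small shell function, roughly like this. This is only a sketch: the function name setup_host_web_remote and its three parameters are hypothetical, it spells the command as git annex rather than git-annex, and the per-remote annex-web-options setting assumes a git-annex version that already includes it.

```shell
# Hypothetical helper: creates a second web special remote limited to one
# host pattern, prefers it via a lower cost, and gives it its own curl options.
#   $1 = remote name, $2 = urlinclude glob, $3 = options passed to curl
setup_host_web_remote() {
    git annex initremote --sameas=web "$1" type=web urlinclude="$2" &&
    git config "remote.$1.annex-cost" 100 &&
    git config "remote.$1.annex-web-options" "$3"
}

# Example use, matching the commands above:
#   setup_host_web_remote archiveorg '*archive.org/*' '--cookie=whatever'
```

The annex.security.allowed-ip-addresses setting is deliberately left out of the function, since it has its own security ramifications and is better set by hand.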
Adding a large file to git just because the git-annex branch is currently checked out seems like it would be a large footbomb. That is generally harder to recover from than adding a file to the annex and then realizing it needs to be added to git instead.
Since git generally allows switching branches with new files staged, it would be entirely reasonable to check out the git-annex branch after adding a new annexed file but before committing it.
And checking out the git-annex branch, git-annex add of a large file without committing it, then switching back to the main branch and committing there is also possible if someone wants to do that for some reason.
Since manual commits to the git-annex branch need extra steps anyway (eg removing .git/annex/index or committing using it instead of the usual index file), I don't see much point in refining it.
Annex.fast setting, but the get command implementation is harder to grep for.
I don't use the assistant, but you should be able to add the following line to your .gitignore file:
*.crdownload
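If you would rather do that from a script, a minimal sketch could look like the following. The function name add_ignore_pattern is hypothetical; it appends the pattern only when that exact line is not already in the file, so it is safe to run more than once.

```shell
# Hypothetical helper: add a pattern to an ignore file unless that exact
# line is already present.
#   $1 = path to the ignore file, $2 = pattern to add
add_ignore_pattern() {
    # -x matches whole lines, -F treats the pattern as a literal string
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example use, from the top of the repository:
#   add_ignore_pattern .gitignore '*.crdownload'
```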
Sure it will work fine to have different versions of annexed files. git-annex will know the url it can use to get whichever version is checked out in the current git branch.
As for an easier way, it's possible to use git-annex-import with certain special remotes, which imports a tree of files from them, and re-running it imports whatever files are new or changed. This needs a special remote that supports it, and it would perhaps be possible to write such a special remote for Zenodo. Dunno if it would be worth the work to implement, but it may be worth seeing if Datalad could support that, if you use Datalad.