Recent comments posted to this site:

Sure, it will work fine to have different versions of annexed files. git-annex will know the url it can use to get whichever version is checked out in the current git branch.

As for an easier way, it's possible to use git-annex-import with certain special remotes. It imports a tree of files from the remote, and re-running it imports whatever files are new or changed. This needs a special remote that supports it, and it would perhaps be possible to write such a special remote for Zenodo. Dunno if it would be worth the work to implement, but it may be worth seeing if Datalad could support that, if you use Datalad.

Comment by joey 1 day and 18 hours ago

I diagnosed a bug introduced in 86dbe9a825b9c615c63e0cfc5e4a737a249f8989 that made it only able to remove the size if the object file is locally present. Fixed.

Comment by joey 1 day and 18 hours ago

What you can recommend, which works already, is:

git -c annex.verify=false annex get

As to adding this to --fast, I think some would be surprised if --fast allowed bad data to get into the repository. And commands like git-annex copy --to that do support --fast already use it to avoid round trip checks. It would not do to make --fast for those commands also avoid verification. And git-annex copy is very close to git-annex get, to the point that git-annex get --from is the same as git-annex copy --from.

So, I think it's better to keep this a separate option, and the -c option I gave above works well enough I suppose.
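If typing the -c option every time gets tedious, the same setting can be stored in the repository's git config. This is only a minimal sketch: it uses a throwaway repository made with mktemp so it is safe to run as-is, and it assumes you are happy for every later transfer in that clone to skip verification too, which is a bigger commitment than the one-off -c form.

```shell
# Throwaway repository so the example does not touch a real one.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Stored in .git/config, so it only affects this clone.
git config annex.verify false

# Show the value that git-annex will read.
git config --get annex.verify
```

To go back to the default behaviour, `git config --unset annex.verify` removes the setting again.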

With that said, you're the second person asking about this in an HPC context this week. I suspect maybe you and @mih were working on the same problem in asking about this? Anyway, since you both seemed to have difficulty finding the way to do this, maybe it would be worth making a dedicated option like --no-verify.

Comment by joey 1 day and 18 hours ago

Well, curl does have a --cookie option. But setting that would make all downloads from the web special remote have the same cookies set, exposing them to any other web servers you also use with that remote.

I think that generally, things involving authentication are a good use case for writing a little external special remote of your own that handles the particulars of a given service. Especially if you can share it with others. example.sh is a good starting place for writing that.

That said, this is also right on the line to something it might be possible for git-annex to support better without you needing to do that work. It's actually possible to initremote a second web special remote that is limited to a single host and is used preferentially to the web special remote:

git-annex initremote --sameas=web archiveorg type=web urlinclude='*archive.org/*'
git config remote.archiveorg.annex-cost 100

annex.web-options currently has no per-remote config, though some other configs do. If it did, you could just set it to pass the cookies to curl when using that archiveorg special remote:

git config remote.archiveorg.annex-web-options "--cookie=whatever"

Since that seems like a good idea, I've implemented it! Get it in the next release or a daily build.

PS, you'll also need to set this, which does have its own security ramifications:

git config annex.security.allowed-ip-addresses all
Comment by joey 1 day and 19 hours ago

Adding a large file to git just because the git-annex branch is currently checked out seems like it would be a large footbomb. That is generally harder to recover from than adding a file to the annex and then realizing it needs to be added to git instead.

Since git generally allows switching branches with new files staged, it would be entirely reasonable to check out the git-annex branch after adding a new annexed file but before committing it. And checking out the git-annex branch, running git-annex add on a large file without committing it, then switching back to the main branch and committing there is also possible if someone wants to do that for some reason.

Since manual commits to the git-annex branch need extra steps anyway (eg removing .git/annex/index or committing using it instead of the usual index file), I don't see much point in refining it.

Comment by joey 1 day and 19 hours ago
:facepalm: Command/Get.hs
Comment by cjmarkie 5 days and 19 hours ago
By the way, I'm happy to try my hand at contributing if I can get a couple of pointers on where to start. It's been a while since I wrote Haskell, so I'm rusty at following execution paths. I've found the Annex.fast setting, but the get command implementation is harder to grep for.
Comment by cjmarkie 5 days and 19 hours ago
Thanks for this solution. I am still looking for a general solution.
Comment by jnkl 5 days and 22 hours ago

I don't use the assistant, but you should be able to add the following line to your .gitignore file:

*.crdownload
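
A quick way to check that the pattern matches is sketched below. It runs in a throwaway repository, and the file name video.mp4.crdownload is just a made-up example of a partial browser download. Note that this only confirms that git itself ignores the file. Whether the assistant honours .gitignore in your setup is the expectation stated above, not something this check proves.

```shell
# Throwaway repository so the example does not touch a real one.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Add the ignore pattern for partial downloads.
echo '*.crdownload' >> .gitignore

# Hypothetical partial download left behind by a browser.
touch video.mp4.crdownload

# check-ignore exits 0 when the path matches an ignore rule.
git check-ignore -q video.mp4.crdownload && echo "ignored by git"
```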
Comment by cjmarkie 6 days and 12 hours ago
Thanks for the fix, I will try that when I get the time and when that release gets built on https://github.com/datalad/git-annex/ so that I can use my existing action.
Comment by Basile.Pinsard 6 days and 14 hours ago