In "--fast" option for git annex get?, it was indicated that git annex get --fast doesn't have any effect.
In an HPC context, users are frequently expected to use login (or dedicated data transfer) nodes for data transfer, and can get their sessions killed for excessive CPU use. For OpenNeuro, the high bandwidth between many HPC centers and S3 means that checksums can become the bottleneck in data transfer. I would like to be able to recommend something like:
git annex get -f s3-PUBLIC --fast --all
srun git annex fsck
Is this feasible?
Annex.fast setting, but the get command implementation is harder to grep for. Command/Get.hs
What you can recommend, which works already, is:
As to adding this to --fast, I think some would be surprised if --fast allowed bad data to get into the repository. And commands like
git-annex copy --to that do support --fast already use it to avoid round-trip checks. It would not do to make --fast for those commands also avoid verification. And git-annex copy is very close to git-annex get, to the point that git-annex get --from is the same as git-annex copy --from.
So, I think it's better to keep this a separate option, and the -c option I gave above works well enough, I suppose.
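Assuming the -c workaround is annex.verify=false passed on the command line (annex.verify is documented in man git-annex-config), the split workflow from the question could be sketched as two shell functions: an unverified download on the login or data-transfer node, then a full fsck on a compute node. The remote name s3-PUBLIC comes from the question; the function names, the DRY_RUN switch, and the bare srun call with no resource flags are illustrative assumptions to adapt to the local site.

```shell
#!/bin/sh
# Sketch of the two-step HPC workflow. The remote name, the function names
# and the DRY_RUN switch are assumptions for illustration; adapt to your site.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1 (login or data-transfer node): download every key from the remote,
# with annex.verify=false set for this single git invocation so git-annex
# does not checksum the downloaded content.
fetch_unverified() {
    run git -c annex.verify=false annex get --from "${1:-s3-PUBLIC}" --all
}

# Step 2 (compute node): fsck without --fast recomputes and checks the
# checksums of all keys; launching it with srun keeps that CPU cost off
# the login node.
verify_on_compute() {
    run srun git annex fsck --all
}
```

For example, source the file, run fetch_unverified s3-PUBLIC on the login node, then call verify_on_compute from a job script; with DRY_RUN=1 both functions only print the commands they would run. To avoid passing -c every time, the same key can be stored with git config, or limited to the one trusted remote via remote.s3-PUBLIC.annex-verify.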
With that said, you're the second person asking about this in an HPC context this week. I suspect you and @mih were working on the same problem when you asked about this? Anyway, since you both seemed to have difficulty finding the way to do this, maybe it would be worth making a dedicated option like --no-verify.
annex.verify in man git-annex-config. Do those docs need refreshing?