Hi, is there a quick way to get 'wget --mirror'-like behaviour with git-annex?
I.e., I'd like to clone an entire website and add it to my annex with the original urls.
I've only tried 'git-annex addurl http://website.tld', but of course that downloads a single page; it does not descend recursively.
Thanks
There is not, but if you can find a way to get wget or something to generate a list of urls and the files it downloaded them to, you can feed that into
git-annex addurl --batch
to teach git-annex what the urls are.
There is a subsystem in git-annex that could in theory be used for this: git-annex-import can import trees of files from a special remote.
But the complexity of mirroring a website makes me think I would not want to try to support it in the web special remote. I mean, just look at how many options wget has that you might use to control how the mirroring works.
Other special remotes can support importing from specific types of websites though. Currently this is limited to built-in special remotes, such as S3, but it would be possible to expand it to support external special remotes as well. See importtree only remotes for discussion about doing that.
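For the url-list approach suggested earlier in the thread, here is a minimal, untested-against-git-annex sketch of generating the list from an existing mirror. It assumes the site was fetched with a plain 'wget --mirror http://website.tld', so the files sit in a directory named after the host and each path below it matches the url path. The function name mirror_to_batch_lines and the "URL FILE" line format are assumptions for illustration; the format is meant to match what 'git-annex addurl --batch --with-files' is documented to read. Note that wget saves directory urls as index.html, so those lines will point at .../index.html rather than the original url.

```python
import os
import sys
from urllib.parse import quote


def mirror_to_batch_lines(mirror_dir, scheme="http"):
    """Return sorted "URL FILE" lines for every file under a wget mirror.

    Assumption: mirror_dir is named after the host (e.g. "website.tld"),
    which is how a plain `wget --mirror` lays out its download directory.
    """
    host = os.path.basename(os.path.normpath(mirror_dir))
    lines = []
    for dirpath, _dirnames, filenames in os.walk(mirror_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Path relative to the mirror root, with forward slashes for the url.
            rel = os.path.relpath(path, mirror_dir).replace(os.sep, "/")
            # Percent-encode so the url itself never contains a space.
            url = f"{scheme}://{host}/{quote(rel)}"
            lines.append(f"{url} {path}")
    return sorted(lines)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Print one "URL FILE" pair per line for the given mirror directory.
    for line in mirror_to_batch_lines(sys.argv[1]):
        print(line)
```

Saved as, say, mirror_urls.py (a hypothetical name), the output could be piped into 'git-annex addurl --batch --with-files' from inside the repository. Whether you need to 'git-annex add' the mirrored files first, and whether the url check passes for dynamically generated pages, is something to verify on a small test repository before running it on a whole site.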