git-annex can use the web as a special remote, associating a url with an annexed file and downloading the file content from the web. See the tip "using the web as a special remote" for usage examples.
The web special remote is always enabled, without any manual setup being needed. Its name is "web".
This special remote can only be used for downloading content, not for uploading content or removing content from the web.
This special remote uses urls on the web as the source for content.
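For example (a minimal sketch; the url and filename below are placeholders), a url can be registered with git annex addurl, and the content can later be re-fetched from the "web" remote:

    # register a url and download its content into the annex
    git annex addurl --file=file.iso https://example.com/file.iso
    # the local copy can be dropped and fetched again from the web later
    git annex drop file.iso
    git annex get --from web file.iso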
There are several other ways http can be used to download annexed objects, including a git remote accessible by http, S3 with a publicurl configured, and the httpalso special remote.
configuration
These parameters can be passed to git annex initremote or git-annex enableremote to configure a web remote:

* urlinclude - Only use urls that match the specified glob. For example, urlinclude="https://s3.amazonaws.com/*"
* urlexclude - Don't use urls that match the specified glob. For example, to prohibit http urls, but allow https, use urlexclude="http:*"
Globs are matched case-insensitively.
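For example, the default "web" remote could be restricted to https urls by passing urlexclude to enableremote, roughly like this (a sketch, assuming no other web remotes are configured):

    # reconfigure the built-in web remote to skip plain http urls
    git annex enableremote web urlexclude='http:*'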
When there are multiple special remotes of type web, and some are not configured with urlinclude and/or urlexclude, those will avoid using urls that are matched by the configuration of other web remotes.
For example, this creates a second web special remote named "slowweb" that is only used for urls on one host, and that has a higher cost than the "web" special remote. With this configuration, git-annex get will first try to get the file from the "web" special remote, which will avoid using any urls that match slowweb's urlinclude. Only if the content can't be downloaded from "web" (or some other remote) will it fall back to downloading from slowweb.
    git annex initremote --sameas=web slowweb type=web urlinclude='*//slowhost.com/*'
    git config remote.slowweb.cost 300
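With that in place, a url on the slow host (the host name is the placeholder from the example above) will only be used by slowweb, and a normal get falls back to it after trying cheaper remotes; a rough sketch:

    # register the url without downloading it yet (--fast skips the download)
    git annex addurl --fast --file=big.tar.gz https://slowhost.com/big.tar.gz
    # get tries lower-cost remotes first, then falls back to slowweb
    git annex get big.tar.gz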
When it says "arbitrary urls", it means it. The only requirement is that the url be well formed and that wget, or whatever command you have configured it to use via annex.web-download-command, knows how to download it.
Update 2018: That used to be the case, but it's now limited by default to http and https urls.
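Other url schemes can be allowed again via the annex.security.allowed-url-schemes setting; for example (a sketch, and only worth doing when the security implications are understood):

    # allow ftp urls in addition to http and https
    git config annex.security.allowed-url-schemes 'http https ftp'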
Sorry, should have read the man page.
Of course I have to use %url and %file. So it works with "rsync %url %file", but it doesn't seem to work recursively, and it renames the files instead of adding them under their normal names. So it's not useful for what I want to do.
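For reference, the setup being tried here uses the annex.web-download-command setting mentioned above, with %url and %file substituted at download time; something like:

    # use rsync as the download command, as described in the comment above
    git config annex.web-download-command 'rsync %url %file'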
I want to access a normal, unmodified directory on my server and add the files to my local directory. That would be a minimal setup; everything else means extremely big setups with the assistant running and a cronjob to delete unused files, and lots of CPU load for indexing these files on the server.
I think such a minimal setup would be great for getting started without a very complex setup; you don't want to commit to such a tool and hours of setup to get something useful, just to see whether it's useful for you.
There are two approaches: either have a normal repository on the server, again with a cronjob and a flat setup, which is quite a setup to get; or use a real repository only on the client. Both have big disadvantages: a normal repository means a complex setup, too complex to just test it, and web links seem simple enough but aren't recursive, and are therefore only good for youtube links or stuff like that.
Is there really no simple solution for what I want to do?
@spiderbit, to support recursively adding content, it would need to parse html, and that's simply too complex and too unlikely to be useful to many people.