basic use
The web can be used as a special remote too.
# git annex addurl http://example.com/video.mpeg
addurl example.com_video.mpeg (downloading http://example.com/video.mpeg)
########################################################## 100.0%
ok
Now the file is downloaded, and has been added to the annex like any other file. So it can be renamed, copied to other repositories, and so on.
To add a lot of urls at once, just list them all as parameters to git annex addurl.
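If the urls live in a text file, a small wrapper can collect them and pass them all as parameters in a single call. The function name addurls_from_file and the file format (one url per line, with blank lines and lines starting with # ignored) are assumptions made for this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of git-annex): read urls from a file, one per
# line, skip blank lines and "#" comments, then pass every url as a parameter
# to a single "git annex addurl" invocation.
addurls_from_file() {
  local list=$1 line
  local -a urls=()
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                      # tolerate CRLF line endings
    [[ -z $line || $line == \#* ]] && continue
    urls+=("$line")
  done < "$list"
  (( ${#urls[@]} > 0 )) || { echo "no urls in $list" >&2; return 1; }
  git annex addurl "${urls[@]}"
}
```

Usage: addurls_from_file urls.txt. addurl also has a --batch mode that reads urls from stdin, which may suit very long lists better; see the git-annex-addurl man page for your version.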
trust issues
Note that git-annex assumes that if the web site does not return a 404 error and reports the right file size, the file is still present on the web, and this counts as one copy of the file. If the file still seems to be present on the web, git-annex will let you remove your last copy, trusting that it can be downloaded again:
# git annex drop example.com_video.mpeg
drop example.com_video.mpeg (checking http://example.com/video.mpeg) ok
If you don't trust the web to this degree, just let git-annex know:
# git annex untrust web
untrust web ok
As a result, it will hang onto files:
# git annex drop example.com_video.mpeg
drop example.com_video.mpeg (unsafe)
Could only verify the existence of 0 out of 1 necessary copies
Also these untrusted repositories may contain the file:
00000000-0000-0000-0000-000000000001 -- web
(Use --force to override this check, or adjust numcopies.)
failed
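With the default trust level (semitrusted), the web counts as one copy once its url has been checked; after git annex untrust web it no longer counts. The function below is a deliberately simplified model of that numcopies check, not git-annex's actual code: it assumes every other repository holding the content has already been checked and is described only by its trust level. The name drop_is_safe is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified model (not git-annex's implementation) of the numcopies check
# that "git annex drop" performs before removing the local copy.
# Usage: drop_is_safe NUMCOPIES TRUSTLEVEL...
#   each TRUSTLEVEL describes one *other* repository already verified to hold
#   the content: trusted, semitrusted or untrusted.
drop_is_safe() {
  local numcopies=$1 level counted=0
  shift
  for level in "$@"; do
    if [[ $level != untrusted ]]; then
      counted=$((counted + 1))   # trusted and semitrusted copies count
    fi
  done
  if (( counted >= numcopies )); then
    echo "safe: $counted of $numcopies copies verified"
  else
    echo "unsafe: only $counted of $numcopies copies verified"
    return 1
  fi
}
```

drop_is_safe 1 semitrusted corresponds to the first drop above, and drop_is_safe 1 untrusted to the refused one. To return the web to its default trust level, run git annex semitrust web.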
attaching urls to existing files
You can also attach urls to any file already in the annex:
# git annex addurl --file my_cool_big_file http://example.com/cool_big_file
addurl my_cool_big_file ok
# git annex whereis my_cool_big_file
whereis my_cool_big_file (2 copies)
00000000-0000-0000-0000-000000000001 -- web
27a9510c-760a-11e1-b9a0-c731d2b77df9 -- here
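To attach urls to many existing files at once, a loop over a list of pairs is enough. The helper name attach_urls and the tab-separated "path, tab, url" input format are assumptions for illustration; each line is handed to git annex addurl --file exactly as in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: attach urls to files that are already in the annex.
# Input: a tab-separated file with one "path<TAB>url" pair per line.
attach_urls() {
  local map=$1 path url
  while IFS=$'\t' read -r path url || [[ -n $path ]]; do
    [[ -z $path || -z $url ]] && continue      # skip malformed lines
    # </dev/null keeps the command from reading the loop's input
    git annex addurl --file "$path" "$url" </dev/null || return 1
  done < "$map"
}
```

Usage: attach_urls urls.tsv, where each line of the (hypothetical) urls.tsv holds an annexed path, a tab, and a url.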
configuring addurl filenames
By default, addurl will generate a filename for you. You can use --file= to specify the filename to use.
If you're adding a bunch of related files to a directory, or just don't like the default filenames generated by addurl, you can use --pathdepth to specify how many parts of the url are used to build the filename. A positive number drops that many parts from the beginning of the url (counting the host name), while a negative number keeps that many parts from the end.
# git annex addurl http://example.com/videos/2012/01/video.mpeg
addurl example.com_videos_2012_01_video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
# git annex addurl http://example.com/videos/2012/01/video.mpeg --pathdepth=2
addurl 2012/01/video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
# git annex addurl http://example.com/videos/2012/01/video.mpeg --pathdepth=-2
addurl 01/video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
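The filename rules can be approximated with a small bash function. This is a hypothetical sketch based on the examples here and on the --pathdepth description in the git-annex-addurl man page, where the kept parts are joined with "/" and so become subdirectories. It is simplified: only "/" is sanitized in the default case, query strings and length limits are ignored, and out-of-range depths are simply rejected.

```bash
#!/usr/bin/env bash
# Approximation (not git-annex's code) of how addurl turns a url into a
# filename. Usage: url_to_filename URL [PATHDEPTH]
url_to_filename() {
  local url=$1 depth=${2:-} part start k
  local -a raw=() bits=()
  IFS=/ read -r -a raw <<< "${url#*://}"   # host and path components
  for part in "${raw[@]}"; do
    [[ -n $part ]] && bits+=("$part")      # drop empty components
  done
  local n=${#bits[@]}
  if [[ -z $depth ]]; then
    local IFS=_                            # default: one flat name
    printf '%s\n' "${bits[*]}"
    return
  fi
  [[ $depth =~ ^-?[0-9]+$ ]] && (( depth != 0 && depth < n )) \
    || { echo "bad pathdepth: $depth" >&2; return 1; }
  if (( depth > 0 )); then
    start=$depth                           # drop leading components
  else
    k=$(( -depth ))                        # keep trailing components
    (( k > n )) && k=$n
    start=$(( n - k ))
  fi
  local IFS=/                              # kept parts become subdirectories
  local -a kept=("${bits[@]:start}")
  printf '%s\n' "${kept[*]}"
}
```

For http://example.com/videos/2012/01/video.mpeg this yields example.com_videos_2012_01_video.mpeg by default, 2012/01/video.mpeg with depth 2, and 01/video.mpeg with depth -2.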
videos
There's support for downloading videos from sites like YouTube, Vimeo, and many more. This relies on yt-dlp to download the videos.
When you have yt-dlp installed, you can just run git annex addurl http://youtube.com/foo and it will detect that it is a video and download the video content for offline viewing. (However, this is disabled by default as it can be a security risk. See the documentation of annex.security.allowed-ip-addresses in git-annex for details.)
Later, in another clone of the repository, you can run git annex get on the file and it will also be downloaded with yt-dlp. This works even if the video host has transcoded or otherwise changed the video in the meantime; the assumption is that these video files are equivalent.
There is an annex.youtube-dl-options configuration setting that can be used to pass parameters to yt-dlp. For example, you could run git config annex.youtube-dl-options "--format worst" to configure it to download low-quality videos from YouTube.
To download all the videos in a YouTube channel, you can use git-annex importfeed --scrape with the url of the channel, or you can find the RSS feed for the channel and run git-annex importfeed on that url (without --scrape).
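When you follow several channels, a small wrapper can send channel pages through --scrape and plain feed urls through importfeed directly. The helper name import_channels and the rule that any url containing /feeds/ is already a feed are assumptions made for this sketch; YouTube channel feeds typically look like https://www.youtube.com/feeds/videos.xml?channel_id=….

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: split the given urls into feed urls and channel
# pages (heuristic: feed urls contain "/feeds/"), then import each group.
import_channels() {
  local url
  local -a feeds=() pages=()
  for url in "$@"; do
    if [[ $url == *"/feeds/"* ]]; then
      feeds+=("$url")        # already an RSS/Atom feed url
    else
      pages+=("$url")        # channel page: let yt-dlp scrape it
    fi
  done
  if (( ${#feeds[@]} )); then git annex importfeed "${feeds[@]}" || return 1; fi
  if (( ${#pages[@]} )); then git annex importfeed --scrape "${pages[@]}" || return 1; fi
}
```

Quote urls containing ? when calling it, since the shell may otherwise treat them as glob patterns.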
bittorrent
The bittorrent special remote lets git-annex also download the content of torrent files, and magnet links to torrents.
You can simply pass the url of a torrent to git annex addurl, the same as any other url.
You have to have aria2 and bittornado (or the original bittorrent) installed for this to work.
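Before adding a torrent, you can check that the helper programs are on your PATH. The program names are an assumption: aria2 installs aria2c, and bittornado (or the original BitTorrent client) is assumed to provide btshowmetainfo. Check the bittorrent special remote documentation for your git-annex version.

```bash
#!/usr/bin/env bash
# Report any missing helper program for the bittorrent special remote.
# Assumed names: aria2c (from aria2), btshowmetainfo (from bittornado or
# the original BitTorrent client).
check_torrent_deps() {
  local prog missing=0
  for prog in aria2c btshowmetainfo; do
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "missing: $prog" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_torrent_deps && git annex addurl followed by the torrent url only attempts the download when both programs were found.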
podcasts
This is done using git annex importfeed. See downloading podcasts.
configuring which url is used when there are several
An annexed file can have content at multiple urls that git-annex knows about, and git-annex may use any of those urls for downloading a file.
If some urls are especially fast, or especially slow, you might want to configure which urls git-annex prefers to use first, or should only use as a last resort. To accomplish that, you can create additional web special remotes that are configured to only be used for some urls, and that have a different cost than the default web special remote.
For example, suppose that you want to prioritize using urls on "fasthost.com".
git-annex initremote --sameas=web fasthost type=web urlinclude='*//fasthost.com/*' cost=150
Now, git-annex get of a file that is on both fasthost.com and another url will prefer to use the fasthost special remote rather than the web special remote (which has a higher cost of 200), and so will use the fasthost.com url. If that url is not available, it will fall back to the web special remote and use the other url.
Suppose that you want to avoid using urls on "slowhost.com", except as a last resort.
git-annex initremote --sameas=web slowhost type=web urlinclude='*//slowhost.com/*' cost=300
Now, assuming the fasthost remote from the previous example is also configured, git-annex get of a file that is on both slowhost.com and another url will first try the fasthost remote. If fasthost does not support the url, it will next try the regular "web" remote, which avoids using urls that match the configuration of either fasthost or slowhost. Finally, if it's unable to get the file from some other url, it will use the slowhost remote to get it from the slow url.
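The selection order in these two examples can be modelled with a short bash function. This is a hypothetical simulation, not how git-annex is implemented: remotes are tried in order of increasing cost, a remote with urlinclude only takes matching urls, and the plain web remote takes every url not claimed by another remote's urlinclude. The remote table mirrors the fasthost and slowhost examples; the file urls in the test are made up.

```bash
#!/usr/bin/env bash
# Hypothetical model of how cost-ordered web remotes pick urls; it only
# simulates the behaviour described above and is not git-annex's code.
# Each entry is "cost name urlinclude-glob"; "-" marks the plain web remote.
REMOTES=(
  "150 fasthost *//fasthost.com/*"
  "200 web -"
  "300 slowhost *//slowhost.com/*"
)

# claimed_by_other URL: succeed if a remote with a urlinclude matches URL
claimed_by_other() {
  local spec cost name glob
  for spec in "${REMOTES[@]}"; do
    read -r cost name glob <<< "$spec"
    if [[ $glob != "-" && $1 == $glob ]]; then
      return 0
    fi
  done
  return 1
}

# url_try_order URL...: print "remote url" lines in the order they are tried
url_try_order() {
  local cost name glob url
  while read -r cost name glob; do
    for url in "$@"; do
      if [[ $glob == "-" ]]; then
        claimed_by_other "$url" || printf '%s %s\n' "$name" "$url"
      elif [[ $url == $glob ]]; then
        printf '%s %s\n' "$name" "$url"
      fi
    done
  done < <(printf '%s\n' "${REMOTES[@]}" | sort -n)
}
```

Calling url_try_order with one fasthost.com url, one url on some other host, and one slowhost.com url lists them as fasthost, then web, then slowhost, matching the prose above.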
There are resources that I want to add to my annex that are currently available via a URL, but it seems like if I add these using git-annex addurl, they get symlinked to files in the annex/objects directory that start with URL-... instead of the more typical SHA256-..., and this does not change even after the files are downloaded.
My concern is that I really want to ensure that these files don't change, which is the appeal of content-addressable symlinking of normal files (as opposed to URL-addressable ones).
Would there be a way to automate the injection of hash-based symlinking for files that are added via addurl? Sometimes I add a bunch of files via addurl --fast, and after I've downloaded them via get, it would be nice to have those files have the same level of data integrity as when I download them using something outside of git-annex, add them to the annex, and do an addurl --file afterward.
Thanks for all of your hard work!
addurl only uses the URL- keys if you run it with --fast. Otherwise it downloads the content and hashes it the same as add does.
If you use --fast, you can go back and git annex migrate the file once it's been downloaded, to convert it to the SHA backend.
Is there a way to remove one of the urls? e.g. if I have
and would like to remove the fail2ban.org one... ?
You can use git annex rmurl $file $url, which I just added to git-annex.
(Also, git annex drop $file --from web will remove all the urls.)
Adding videos from youtube ends up with it using the URL backend, even without --fast.
Is migrating manually required or should I log a bug?
git annex migrate, and be prepared for git annex get --from web to not work long term.
Hi!
I have a somewhat interesting use case. My course notes require HTTP authentication. This is possible with wget, but is there any way to make git annex do it?
wget authentication stuff!
It would be nice to have the user and pass encrypted with GPG too. This might be a strange use case, but I can see other people wanting to do something like this in the future.
Thanks!
git annex addurl. The url, including the password, will be stored in the git-annex branch though. If you want to protect the password from being exposed to anyone who gets a clone of the repository, just download manually, and then git annex add the file.
addurl or importfeed, how can I get the URL of the file or feed from git-annex?
git annex whereis $file
If I have a big repo of YouTube stuff I might have some videos that I want to download with different options. Maybe I want higher quality for some videos, and don't care for others. It seems like youtube-dl-options can only be specified for an entire annex, though. I'd like to be able to do it per file.
The main motivation for this is YouTube videos where I only want the audio. Is there a good way to do this? Best I can think of is having a separate annex for audio and video.
@hobbes, there's not currently a way to do per-file youtube-dl options. The difficulty is that we don't know which youtube-dl options might be unsafe, and such a feature could make e.g. git annex get use them when run by a different user.
I feel that this needs some support in youtube-dl to avoid git-annex needing to know about all its safe options.