git-annex-addurlgit-annexhttp://git-annex.branchable.com/git-annex-addurl/git-annexikiwiki2023-03-27T19:13:23ZUsing youtube-dl commandline options with git-annex-addurlhttp://git-annex.branchable.com/git-annex-addurl/comment_1_ce9c660be160a22c28aeb6de8b3b5818/joseph.rawson.works2018-05-30T15:29:16Z2018-05-30T15:29:16Z
<p>I have been trying to figure out how to use addurl to get this video.
I have this in my mscourtstuff annex as a large binary, but I would really like to
use the web as a remote for this.</p>
<p>Hughes v Hosemann 2010-CA-01949-SCT-43112001.mp4
youtube-dl --referer 'http://judicial.mc.edu/case.php?id=24206' http://player.vimeo.com/video/43112001</p>
comment 2http://git-annex.branchable.com/git-annex-addurl/comment_2_9bc984fb80d77309b62a4e915e65a31a/joey2018-05-30T16:31:27Z2018-05-30T16:30:26Z
<p>There's not currently a way to do per-file youtube-dl options.
The difficulty is that we don't know which youtube-dl options might be
unsafe, and such a feature could make e.g. <code>git annex get</code> use them when run
by a different user.</p>
<p>I feel that this needs some support in youtube-dl to avoid git-annex
needing to know about all its safe options. Especially since which options
are available, or safe, could vary between versions of youtube-dl.</p>
Inconsistent idiomhttp://git-annex.branchable.com/git-annex-addurl/comment_3_46e1b62d37619521a60eb2339b8d094f/pellman.john2019-02-06T20:53:00Z2019-02-06T20:53:00Z
<p>In using git-annex in the past, I've always found it counterintuitive that <a href="https://git-annex.branchable.com/git-annex-rmurl/">rmurl</a> uses the following form to remove a URL from a file:</p>
<pre><code>git annex rmurl [file] [url]
</code></pre>
<p>In contrast, addurl uses a flag to designate the file whose list of URLs the new URL should be added to:</p>
<pre><code>git annex addurl [url] --file=[file]
</code></pre>
<p>It would make sense (at least to me) to make the syntax for these more congruous so that both commands use either two positional arguments or one positional argument and one keyword argument / flag.</p>
comment 4http://git-annex.branchable.com/git-annex-addurl/comment_4_0951800d761d18614eb6c5f08cdbb885/joey2019-02-07T19:29:13Z2019-02-07T19:26:12Z
<p>@john, the difference is that while addurl can make up a filename to use if
you do not provide one, rmurl needs you to specify a filename.</p>
<p>So, yes, "git annex rmurl --file=whatever url" would be more consistent,
but it requires typing more by making something that is not actually
optional into an option. And "git annex addurl file url" would make
the command more consistent with rmurl, but harder to use.</p>
<p>Consistency is not everything.</p>
<p>(Also, the rmurl batch interface would then be less consistent with its
command-line interface.)</p>
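<p>The asymmetry described above can be sketched by building the two command lines, using only the forms quoted in this thread. This is a rough illustration, not part of git-annex: the helper names <code>addurl_argv</code> and <code>rmurl_argv</code> and the example.com URL are made up for the example.</p>

```python
from typing import List, Optional


def addurl_argv(url: str, file: Optional[str] = None) -> List[str]:
    """Build a `git annex addurl` command line.

    The file name is optional: when it is omitted, addurl picks one itself.
    """
    argv = ["git", "annex", "addurl"]
    if file is not None:
        argv.append(f"--file={file}")
    argv.append(url)
    return argv


def rmurl_argv(file: str, url: str) -> List[str]:
    """Build a `git annex rmurl` command line; both arguments are required."""
    return ["git", "annex", "rmurl", file, url]


if __name__ == "__main__":
    url = "http://example.com/video.mp4"  # placeholder URL
    print(addurl_argv(url))
    print(addurl_argv(url, file="video.mp4"))
    print(rmurl_argv("video.mp4", url))
```

<p>Because the file name is the only optional piece, it is the only one expressed as an option here.</p>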
Provide flags to youtube-dl?http://git-annex.branchable.com/git-annex-addurl/comment_5_47dfa82fc6426fb9ad050dd00290dc03/gan2019-02-22T18:01:25Z2019-02-22T18:01:25Z
Is there already a way to specify flags to youtube-dl on a per-file basis? I think it would be OK to do it either during addurl (modifying the resulting reference that is stored in the annex somehow), or during git-annex get. This is so that the preferred format can be specified. Primarily this would make it possible to download audio-only formats for some files. (Apologies if I missed some documentation on how to achieve this.)
Clarificationhttp://git-annex.branchable.com/git-annex-addurl/comment_7_8d8ac07a0f0fe599ea5ed1e4089b13fa/gan2019-02-22T18:03:16Z2019-02-22T18:03:16Z
So, to clarify: I read your first answer. But if this could be done during get, perhaps it's OK, because it is an explicit request for the potentially unsafe operation?
Re: provide flags to youtube-dlhttp://git-annex.branchable.com/git-annex-addurl/comment_8_b261e7bfeeffb2c5264aaadae1d78817/joey2019-02-23T20:04:26Z2019-02-22T20:01:37Z
<p>@gan, there's not much point in providing flags that are only used in the
initial download; the main point in adding the url to git-annex is so you
can download the same content from it again later.</p>
Re: Inconsistent idiomhttp://git-annex.branchable.com/git-annex-addurl/comment_8_88aea3014245634d42f23a22ccf2fcc9/pellman.john2019-03-28T03:59:04Z2019-03-28T03:59:04Z
<p>Hi @joey,</p>
<p>Thanks for your continued work on git-annex and for responding to my last comment. I agree that consistency is not everything, but I do think that it's also important to balance functionality against the amount of cognitive load that an interface places on an end-user. My perspective is no doubt influenced by my cognitive science background, but whenever I find an interface where it's easy to confuse two similar operations that use different syntaxes, I'm reminded of Don Norman's <a href="https://www.researchgate.net/publication/202165676_The_trouble_with_UNIX_The_user_interface_is_horrid">rant about the early Unix UI</a>. I'm also reminded of common phenomena described in memory research such as <a href="https://en.wikipedia.org/wiki/Interference_theory#Proactive_interference">retroactive interference</a>, wherein a more recently learned memory interferes with something that was learned previously. In this case, if I were to learn the syntax for addurl first, and then learned the syntax for rmurl much later, my internal representation of rmurl would to some degree "overwrite" my previous knowledge of addurl and compete with it. Making the two syntaxes consistent with each other in this case would eliminate any competition between internal mental representations of how the two commands are structured.</p>
<p>I'm also not entirely sure why a positional argument can't be optional. If there's a good reason for this not to be so then I won't argue my point anymore, but something like the following syntax would make the most sense from my view:</p>
<p><code>git annex rmurl [url] [file]</code></p>
<p><code>git annex addurl [url] [file; optional positional argument]</code></p>
bug reporthttp://git-annex.branchable.com/git-annex-addurl/comment_9_48741c1c625dc1ecba2db4f7be2f644c/m152021-03-02T22:09:45Z2021-01-23T08:52:14Z
<pre><code>ENV:
macOS 10.14.6, installed by 'brew install git-annex'
git annex version
git-annex version: 8.20201129
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.27 DAV-1.3.4 feed-1.3.0.1 ghc-8.10.3 http-client-0.7.3 persistent-sqlite-2.11.0.0 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.1.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin x86_64
supported repository versions: 8
upgrade supported from repository versions: 0 1 2 3 4 5 6 7
STEPS:
File copied to git (git-annex) repo's dir (did not 'git add' 'git annex add')
name: 'f.mp4'
Now run 'git annex addurl' (via Python, see below)
RESULT: (same if run in bash)
File "/opt/anaconda3/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['git', 'annex', 'addurl', '--file=f.mp4', '--raw', '--relaxed', 'https://www.youtube.com/watch?v=U33dsEcKgeQ']' returned non-zero exit status 1.
The command works after doing 'git annex add f.mp4' first
but it results in a backend other than the 'URL backend for youtube'.
I'd like to use the 'URL backend for youtube' because I worry about the youtube video binary changing, in which case all future downloads will fail backend verification.
NOTE:
command line taken from https://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/
</code></pre>
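<p>The traceback in the report only shows the exit status, not the message git-annex printed. A minimal sketch of capturing that message from Python follows, assuming Python 3.7 or later. The helper name <code>run_capture</code> is hypothetical, and the demo runs a stand-in child process rather than git-annex so that it works outside a repository.</p>

```python
import subprocess
import sys
from typing import List, Tuple


def run_capture(argv: List[str]) -> Tuple[int, str, str]:
    """Run a command and return (exit status, stdout, stderr) without raising."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in for a failing command so the sketch runs anywhere. To see the real
# message, pass the argument list from the report instead, e.g.
# ["git", "annex", "addurl", "--file=f.mp4", "--raw", "--relaxed", URL].
demo = [sys.executable, "-c", "import sys; sys.exit('stand-in failure')"]
status, out, err = run_capture(demo)
print("exit status:", status)
print("stderr:", err.strip())
```

<p>Without <code>check=True</code>, a non-zero exit status does not raise <code>CalledProcessError</code>, so the captured stderr can be inspected or logged first.</p>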
comment 10http://git-annex.branchable.com/git-annex-addurl/comment_10_66051cf22495e661828f2ef3291f8578/joey2021-03-02T22:09:45Z2021-02-01T16:24:23Z
<p>@m15 this page is not a bug tracking system. File bug reports over at
<a href="http://git-annex.branchable.com/bugs/">bugs</a>.</p>
Hashes for files added via addurlhttp://git-annex.branchable.com/git-annex-addurl/comment_11_a09a85bbabc112369ab661b7ac5f277d/tomdhunt2022-07-16T20:12:20Z2022-07-16T20:12:20Z
<p>If you add a file to your repo first via <code>addurl --fast</code>, it writes the filename as a symlink to a file that incorporates the URL, rather than the file hash. This is expected, since git-annex can't know the file hash until it's actually downloaded the file.</p>
<p>If you then <code>git annex get</code> that file, it downloads the file to the path that uses the URL. Is the hash ever recorded for these files? If you were to drop and re-download the file, would git-annex accept a different file?</p>
checksums and addurl --fasthttp://git-annex.branchable.com/git-annex-addurl/comment_12_f75e35ae6f739e98aeb15c3f8708be8a/Ilya_Shlyakhter2022-07-18T16:01:41Z2022-07-18T16:01:41Z
<blockquote><p>Is the hash ever recorded for these files? If you were to drop and re-download the file, would git-annex accept a different file?</p></blockquote>
<p>Hash is not recorded, but file size is. You can disable the size check with <code>--relaxed</code>. See <a href="http://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/">using the web as a special remote</a>.
After <a href="http://git-annex.branchable.com/git-annex-get/"><code>git-annex-get</code></a>ting the file, you can use <a href="http://git-annex.branchable.com/git-annex-migrate/"><code>git-annex-migrate</code></a> to record it under a new checksum-based hash, then use <a href="http://git-annex.branchable.com/git-annex-unused/"><code>git-annex-unused</code></a> to find and remove the old key.</p>
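<p>The get, migrate, unused sequence above might be scripted roughly as follows. This is a sketch under assumptions: the helper name <code>rehash_steps</code> is made up, <code>SHA256E</code> is only an example target backend, and the commands are built as argument lists rather than executed.</p>

```python
from typing import List


def rehash_steps(path: str, backend: str = "SHA256E") -> List[List[str]]:
    """Return the commands that move a URL-keyed file to a checksum key."""
    return [
        # Download the content so that it can be checksummed.
        ["git", "annex", "get", path],
        # Record the file under a key from the chosen checksum backend.
        ["git", "annex", "migrate", f"--backend={backend}", path],
        # List keys that no file refers to any more, including the old key;
        # `git annex dropunused` can then remove the ones that are listed.
        ["git", "annex", "unused"],
    ]


if __name__ == "__main__":
    for argv in rehash_steps("video.mp4"):  # placeholder file name
        print(" ".join(argv))
```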
<p>Sometimes you can get the hash without downloading the file, e.g. if the hash is stored next to the file at <code>http://my/file.md5</code>, or if the file is stored in the Google Cloud. Then you can use the plumbing commands <a href="http://git-annex.branchable.com/git-annex-registerurl/"><code>git-annex-registerurl</code></a> to associate the checksum-based key with the URL, and <a href="http://git-annex.branchable.com/git-annex-setpresentkey/"><code>git-annex-setpresentkey</code></a> to record the key's presence in the (web) remote.</p>
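<p>A rough sketch of the plumbing steps just described, assuming an MD5 checksum and the file size are already known. The key layout <code>MD5-s&lt;size&gt;--&lt;hash&gt;</code> and the web remote's UUID reflect my understanding of git-annex and should be checked against its documentation. The helper names are hypothetical, and the commands are only built, not run.</p>

```python
import re
from typing import List

# Assumption: the UUID git-annex uses for its built-in web special remote.
WEB_REMOTE_UUID = "00000000-0000-0000-0000-000000000001"


def md5_key(md5_hex: str, size: int) -> str:
    """Build an MD5-backend key from a known checksum and file size."""
    md5_hex = md5_hex.lower()
    if not re.fullmatch(r"[0-9a-f]{32}", md5_hex):
        raise ValueError("expected a 32-digit hexadecimal MD5 checksum")
    if size < 0:
        raise ValueError("size must not be negative")
    return f"MD5-s{size}--{md5_hex}"


def register_steps(key: str, url: str) -> List[List[str]]:
    """Return the plumbing commands that tie a checksum key to a URL."""
    return [
        # Associate the key with the URL it can be downloaded from.
        ["git", "annex", "registerurl", key, url],
        # Record that the key is present ("1") in the web remote.
        ["git", "annex", "setpresentkey", key, WEB_REMOTE_UUID, "1"],
    ]


if __name__ == "__main__":
    # Placeholder checksum (the MD5 of empty input) and the example URL above.
    key = md5_key("d41d8cd98f00b204e9800998ecf8427e", 0)
    for argv in register_steps(key, "http://my/file"):
        print(" ".join(argv))
```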
<p>Related discussion: <a href="http://git-annex.branchable.com/todo/alternate_keys_for_same_content/">alternate keys for same content</a></p>
Limitations on allowed characters in and length of URLshttp://git-annex.branchable.com/git-annex-addurl/comment_13_f413cc1febfa70e3ad885c6fa031a209/matthias.risze2023-01-04T11:56:35Z2023-01-04T11:56:35Z
Are there any limitations on which characters are allowed in a URL, and on how long a URL can be? I am working on a special remote which kind of abuses this feature by saving information about how to get the file in the URL with a specific scheme. For now it seems like I can use URLs of the form <code>scheme:<arbitrary json></code>, but I am not sure if this might become an issue later. I could also encode the data with base64 or something similar, in which case size limitations, if there are any, would still be relevant. The json variant does have the added benefit of being much more easily readable in whereis output.
comment 14http://git-annex.branchable.com/git-annex-addurl/comment_14_5719924fbda2daabe83ef2fde447d620/joey2023-01-16T19:13:03Z2023-01-16T19:09:35Z
<p>@matthias.risze length is not an issue. You should avoid characters that
are not usually in urls, particularly whitespace and newline.</p>
<p>It seems to me though that your special remote would perhaps
be better served by using the SETSTATE and GETSTATE commands
(see <a href="http://git-annex.branchable.com/design/external_special_remote_protocol/">external special remote protocol</a>)</p>
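<p>One way to follow the advice about avoiding whitespace and newlines, while keeping the JSON approach from the question, is to percent-encode the payload. This is a minimal sketch: the scheme name <code>myremote</code> and both helper names are hypothetical, and the encoded form is less readable in whereis output than raw JSON.</p>

```python
import json
from urllib.parse import quote, unquote

SCHEME = "myremote"  # hypothetical scheme name for the special remote


def encode_url(info: dict) -> str:
    """Pack retrieval info into a URL with no whitespace or newlines."""
    payload = json.dumps(info, separators=(",", ":"), sort_keys=True)
    # safe="" percent-encodes everything except letters, digits and "_.-~".
    return f"{SCHEME}:{quote(payload, safe='')}"


def decode_url(url: str) -> dict:
    """Recover the retrieval info from a URL made by encode_url."""
    scheme, _, payload = url.partition(":")
    if scheme != SCHEME:
        raise ValueError(f"not a {SCHEME} URL: {url!r}")
    return json.loads(unquote(payload))


if __name__ == "__main__":
    url = encode_url({"bucket": "my data", "path": "dir/file 1.bin"})
    print(url)
    print(decode_url(url))
```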
securehashesonly conflicts with addurlhttp://git-annex.branchable.com/git-annex-addurl/comment_15_460d474cb8ef32d41eae71ee070de0b3/jt2023-03-25T03:22:50Z2023-03-25T03:22:50Z
<p>Turning on <code>securehashesonly</code> seems to disable the <code>addurl</code> command:</p>
<pre><code class="console">% git config --get annex.securehashesonly
true
% git annex addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html
addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html
annex.securehashesonly blocked transfer of URL key
failed
addurl: 1 failed
% git annex addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html --relaxed
addurl https://www.gutenberg.org/cache/epub/2591/pg2591-images.html (to www.gutenberg.org_cache_epub_2591_pg2591-images.html) ok
(recording state in git...)
% ls -l www.gutenberg.org_cache_epub_2591_pg2591-images.html
www.gutenberg.org_cache_epub_2591_pg2591-images.html -> .git/annex/objects/gg/kG/URL--https&c%%www.gutenberg.org%cache%epub%2591%pg2591-images.html/URL--https&c%%www.gutenberg.org%cache%epub%2591%pg2591-images.html
</code></pre>
<p>Does this have something to do with the URL prefix that the annex object has?</p>
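<p>For readers puzzling over the object path in the listing, here is a small sketch that undoes the escaping in the key's file name and extracts the backend name. The escaping rules used (<code>%</code> for <code>/</code>, <code>&amp;c</code> for <code>:</code>, <code>&amp;s</code> for <code>%</code>, <code>&amp;a</code> for <code>&amp;</code>) are my reading of how git-annex writes keys to disk and should be treated as an assumption. Both function names are made up.</p>

```python
def decode_key_filename(name: str) -> str:
    """Undo the escaping applied to a key when it is used as a file name."""
    # Order matters: "&a" is undone last so that an escaped ampersand is not
    # misread as the start of another escape sequence.
    return (
        name.replace("%", "/")
        .replace("&c", ":")
        .replace("&s", "%")
        .replace("&a", "&")
    )


def key_backend(key: str) -> str:
    """Return the backend name, which is the part before the first '-'."""
    return key.split("-", 1)[0]


if __name__ == "__main__":
    name = ("URL--https&c%%www.gutenberg.org%cache%epub%2591"
            "%pg2591-images.html")
    key = decode_key_filename(name)
    print(key)
    print(key_backend(key))
```

<p>The backend of this key is <code>URL</code>, which names the source rather than a checksum of the content. That would be consistent with the "blocked transfer of URL key" message in the output.</p>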
Re: securehashesonly conflicts with addurlhttp://git-annex.branchable.com/git-annex-addurl/comment_16_c6e1743647bb4d45b5a1b237f53d77a6/joey2023-03-27T19:13:23Z2023-03-27T17:59:35Z
Opened a bug report: <a href="http://git-annex.branchable.com/bugs/securehashesonly_conflicts_with_addurl/">securehashesonly conflicts with addurl</a>