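<p>todo/done (git-annex, <a href="http://git-annex.branchable.com/todo/done/">http://git-annex.branchable.com/todo/done/</a>), generated by ikiwiki, updated 2024-03-01T20:53:17Z</p>
<h1><a href="http://git-annex.branchable.com/todo/verified_relaxed_urls/">verified relaxed urls</a></h1>
<p>Updated 2024-03-01T20:53:17Z, published 2024-02-10T15:19:49Z</p>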
<p>Adding a relaxed url currently prevents verifying the content of the object
when transferring it between repositories. This risks data corruption going
unnoticed. Could the hash be recorded when a relaxed url is downloaded from
the web, and then used for verification of other transfers?</p>
<p>This would need a new per-key file in the git-annex branch that can list
one or more hash-based keys. Then implement verification for url keys that
checks whether any hash-based keys are recorded in the file, and if so, uses
them to verify the content.</p>
<p>The web special remote can hash the content as it's downloading it from the
web, and record the resulting hash-based key.</p>
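<p>A minimal sketch of hashing content while it is being saved, so the download is only read once. It assumes GNU coreutils <code>sha256sum</code>; the function name <code>hash_while_saving</code> is hypothetical and not part of git-annex. It prints a key made of the backend name, <code>-s</code> plus the size in bytes, <code>--</code>, and the hex digest, which is the layout git-annex uses for SHA256 keys. Recording that key in the git-annex branch is not shown.</p>
<pre><code class="shell">#!/bin/sh
# Hypothetical helper: copy stdin to the file named by $1 while hashing the
# same bytes, then print a key of the form SHA256-sSIZE--DIGEST.
# Assumes GNU coreutils sha256sum (macOS would need `shasum -a 256`).
hash_while_saving() {
  dest=$1
  # tee saves the stream to $dest; sha256sum hashes the copy tee passes on
  digest=$(tee "$dest" | sha256sum | cut -d ' ' -f 1)
  # size of what was actually saved; tr strips padding some wc versions add
  size=$(wc -c < "$dest" | tr -d ' ')
  printf 'SHA256-s%s--%s\n' "$size" "$digest"
}

# usage sketch (curl is only an example downloader):
#   key=$(curl -sSL "$url" | hash_while_saving downloaded.tmp)
</code></pre>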
<blockquote><p><span class="selflink">done</span> --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<h2>handling upgrades</h2>
<p>A repository that currently contains the content of a relaxed url needs to
keep working. So, the object stored in the repository has to be treated as
valid, even though it does not correspond to any hash-based keys listed for
the url key.</p>
<p>This presents a problem; there is no way to tell the difference between
a valid object in such a repository and an object that was downloaded from
the web with its hash recorded, but has since gotten corrupted.</p>
<p>Seems that the only possible way to resolve this problem is to change to a
new type of url key, that is known to always have its hash recorded on
download from the web. (Call this a verifiable url key: a VURL.)
And handle all existing relaxed url keys as before.</p>
<p>That would leave it up to the user to migrate their URL keys to
VURL keys, if desired. Now that distributed migration is
implemented, that seems sufficiently easy.</p>
<p>See <a href="http://git-annex.branchable.com/todo/migration_to_VURL_by_default/">migration to VURL by default</a></p>
<h2>addurl --fast</h2>
<p>Using addurl --fast rather than --relaxed records the size but doesn't
hash. So it has the same problem that data corruption can go unnoticed,
only here the corruption has to involve bit flips rather than truncation,
since truncation would change the recorded size.</p>
<p>So it seems that --fast ought to also be handled. The difference is that
a url added with --fast is expected to always be the same on re-download
from the web, while a url added with --relaxed may change its content on
re-download from the web while still being considered the same object.</p>
<p>This can also use a VURL key, but include the size in it. When downloading
a sized VURL, the web special remote will hash the content, and verify
that either no hash has been recorded before (and record the hash when the
size matches), or that it matches the previously recorded hash.</p>
<p>Note that if a url is added with --fast, that gets committed and
pulled by another repo, and later both repos download the content
from the web, the web could serve different content to the two,
and in that case either hash would be treated as valid.</p>
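<p>The decision for a sized VURL download can be sketched as a shell function. This is an illustration only: the name <code>verify_sized_vurl</code> and its argument order are hypothetical, and it assumes every recorded key uses the same hash backend as the freshly computed one, so keys can be compared as strings. Accepting a match against any one of several recorded keys reflects the case where different repositories were served different content.</p>
<pre><code class="shell">#!/bin/sh
# Hypothetical check for a sized VURL after downloading from the web.
# $1 = size recorded in the VURL, $2 = size actually downloaded,
# $3 = hash-based key computed while downloading,
# remaining args = hash-based keys already recorded for this VURL.
# Prints what to do; returns 0 to accept the content, 1 to reject it.
verify_sized_vurl() {
  expected_size=$1
  actual_size=$2
  actual_key=$3
  shift 3
  if [ "$#" -eq 0 ]; then
    # nothing recorded yet: only record the hash when the size matches
    if [ "$actual_size" = "$expected_size" ]; then
      echo "record $actual_key"
      return 0
    fi
    echo "reject: size mismatch"
    return 1
  fi
  # otherwise the new hash must match one of the recorded ones
  for recorded in "$@"; do
    if [ "$recorded" = "$actual_key" ]; then
      echo "ok"
      return 0
    fi
  done
  echo "reject: hash mismatch"
  return 1
}
</code></pre>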
<h2>other special remotes</h2>
<p>If the web special remote is what takes care of hashing the content on
download and recording the hash-based key, what about other special remotes
that claim an url?</p>
<p>This could also be implemented in the bittorrent special remote
(though ), but what
about external special remotes?</p>
<p>An alternative would be to add a downloadVerifiedUrl that is called instead
of retrieveKeyFile and returns a hash-based key (allowing hashing the
download on the fly). Then git-annex would take
care of recording the hash-based key. The external special remote interface
could be extended to include that.</p>
<blockquote><p>For now, gonna punt on this. It would be possible to support other
special remotes later, but for now it is implemented only in the web
special remote. When using <code>git-annex addurl --verified</code> with other
special remotes, it creates a VURL but never generates a hash key, so the
VURL works just like a URL key.</p></blockquote>
<h2>hash-based key choice</h2>
<p>Should annex.backend gitconfig be used to pick which hash-based key to use?
The risk is that the config changes and several different hash-based keys get
recorded for a VURL. (Not really a problem, but it would increase the
size of the git-annex branch unnecessarily, and require extra work when
verifying the key.)</p>
<blockquote><p>It will let annex.backend gitconfig and --backend be used,
but it didn't seem worth supporting annex.backend gitattribute, or even
really appropriate to, since this is not really related to any particular
work tree file. --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<p>What if annex.backend uses WORM or something that is not hash-based?
Seems it ought to fall back to SHA256 or something then.</p>
<p>To support annex.securehashesonly it would be good if only
cryptographically secure hashes were recorded for a VURL. But of course,
which hashes are considered secure can change. Still, let's start by
only allowing currently secure hashes to be used for VURLs. This way,
when there are multiple hashes recorded for a VURL, they will all be
cryptographically secure normally, and so the VURL can be considered
cryptographically secure itself. If any of the hashes later becomes
broken, the VURL will no longer be treated as cryptographically secure,
because the broken hash can be used to verify its content.
In that case, the user would probably just migrate to a hash-based key,
although perhaps something VURL-specific could be built to upgrade its
hashes.</p>
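<p>A sketch of the backend choice and of the rule that a VURL is only as secure as its weakest recorded hash. The list of secure backends in <code>is_secure_backend</code> is an illustrative assumption, not git-annex's actual list, and all function names are hypothetical. Keys are assumed to start with their backend name followed by <code>-</code>, as in <code>SHA256E-s6--...</code>. Treating a VURL with no recorded hashes as not secure is also an assumption.</p>
<pre><code class="shell">#!/bin/sh
# Illustrative list of backends treated as cryptographically secure
# (an assumption for this sketch; which hashes count as secure can change).
is_secure_backend() {
  case "$1" in
    SHA256|SHA256E|SHA384|SHA384E|SHA512|SHA512E) return 0 ;;
    SHA3_256|SHA3_256E|SHA3_512|SHA3_512E) return 0 ;;
    BLAKE2B256|BLAKE2B256E|BLAKE2B512|BLAKE2B512E) return 0 ;;
    *) return 1 ;;
  esac
}

# Pick the backend to hash a VURL's content with: use the configured
# backend (e.g. from annex.backend or --backend) if it is a secure hash,
# otherwise fall back to SHA256 (e.g. for WORM, SHA1 or MD5).
choose_vurl_backend() {
  if is_secure_backend "$1"; then
    echo "$1"
  else
    echo SHA256
  fi
}

# A VURL counts as secure only if every recorded hash-based key uses a
# secure backend, since a broken hash could be used to verify bad content.
vurl_is_secure() {
  [ "$#" -gt 0 ] || return 1   # assumption: no recorded hashes, not secure
  for key in "$@"; do
    # the backend name is everything before the first "-"
    is_secure_backend "${key%%-*}" || return 1
  done
  return 0
}
</code></pre>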
<h2>use for other types of keys</h2>
<p>It would also be possible to use these new git-annex branch log files
for other types of keys, like WORM, or perhaps SHA1. But the same upgrade
problem would apply. So there's probably no benefit in doing that,
compared with migrating the key to the desired hash backend.</p>
<p>There does seem to be some chance this could help with implementing
<a href="http://git-annex.branchable.com/todo/wishlist_degraded_files/">wishlist degraded files</a>.</p>
<h2>security</h2>
<p>There is the potential for a loop, where a VURL has recorded an equivalent
key that is the same VURL, or another VURL in a loop. A crafted
git-annex branch could exploit this to DoS git-annex.</p>
<p>To avoid this, any VURL in equivalent keys will be ignored.</p>
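<p>The loop-avoidance rule amounts to filtering VURL keys out of the equivalent keys before using them for verification. A minimal sketch, assuming the equivalent keys are available one per line on stdin (how they are stored in the git-annex branch is not assumed) and that VURL keys start with <code>VURL-</code>; the function name is hypothetical.</p>
<pre><code class="shell">#!/bin/sh
# Hypothetical filter: pass through equivalent keys read on stdin, one per
# line, dropping any VURL key so verification can never recurse into a VURL.
filter_equivalent_keys() {
  # grep exits 1 when nothing is left; that is not an error here
  grep -v '^VURL-' || true
}

# usage sketch: printf '%s\n' "$key1" "$key2" | filter_equivalent_keys
</code></pre>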
<h1><a href="http://git-annex.branchable.com/todo/detect_and_handle_submodules_after_path_changed_by_mv/">detect and handle submodules after path changed by mv</a></h1>
<p>Updated 2024-02-22T16:54:19Z, published 2024-02-22T15:22:05Z</p>
<p>This is an enhancement request for handling submodules, which I use to manage data together with its associated projects.</p>
<p>I would like <code>git-annex</code> to detect submodule paths that were changed on disk, whether by <code>mv</code> or by a file explorer.
If the user runs the <code>git-annex-assist daemon</code> or the <code>git-annex-assist</code> command directly after an <code>mv</code>, the submodules end up totally broken.</p>
<p>Currently, the workaround is to run <code>git-mv</code> on each submodule manually.</p>
<p>I made a shell script to test this.</p>
<pre><code class="shell">#!/bin/bash
# This is a test script for changing submodule paths.
# set -e

USE_GIT_MV=false # USE_GIT_MV=true works correctly
cd /tmp/
mkdir -p test_sub/{archive/projects,projects/2023_01_personal_some_cool_project,resources}
cd test_sub
git init
git annex init
git annex version
cd projects/2023_01_personal_some_cool_project

echo "NOTE: Add some data and sub-projects for testing"
touch README.md 01_dataset_lists.csv 09_reports.md
git submodule add https://github.com/Lykos153/git-annex-remote-googledrive.git
git submodule add https://github.com/alpernebbi/git-annex-adapter.git
git submodule status # check it
git annex assist
echo

echo "NOTE: Rename projects to 01_Projects for sorting order"
cd /tmp/test_sub
if $USE_GIT_MV; then
    git mv projects 01_Projects
else
    # NOTE: A plain rename breaks the submodules (same directory depth)
    mv projects 01_Projects
    (
        cd 01_Projects/2023_01_personal_some_cool_project/git-annex-adapter
        git status # it shows 'No such file or directory'
    )
fi
git submodule status # check it
git annex assist
echo

echo "NOTE: Rename a submodule to mark it as reference code for work"
cd /tmp/test_sub/01_Projects/2023_01_personal_some_cool_project
if $USE_GIT_MV; then
    git mv git-annex-adapter ref_sample_adapter_code
else
    # NOTE: A plain rename breaks the submodules (same directory depth)
    mv git-annex-adapter ref_sample_adapter_code
fi
git submodule status # check it
git annex assist
echo

echo "NOTE: Now archive the old project"
cd /tmp/test_sub
if $USE_GIT_MV; then
    git mv 01_Projects/2023_01_personal_some_cool_project archive/projects/2023_01_personal_some_cool_project
else
    # NOTE: A plain rename breaks the submodules (directory depth changes)
    mv 01_Projects/2023_01_personal_some_cool_project archive/projects/2023_01_personal_some_cool_project
fi
git submodule status # check it
git annex assist
echo

echo "test done"
</code></pre>
<blockquote><p><span class="selflink">wontfix</span> as it is out of scope --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
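<p>For the detection part, a minimal sketch (not a git-annex feature) that lists submodules whose path recorded in <code>.gitmodules</code> no longer has a checked-out submodule, as happens after a plain <code>mv</code>. It assumes it is run from the top of the superproject's work tree, and the function name is hypothetical. It only reports such paths and does not repair anything.</p>
<pre><code class="shell">#!/bin/sh
# Hypothetical helper: print each submodule path listed in .gitmodules that
# no longer has a checked-out submodule (a .git file or directory) on disk.
# Run from the top of the superproject's work tree.
missing_submodules() {
  [ -f .gitmodules ] || return 0
  # output lines look like: submodule.NAME.path PATH
  git config -f .gitmodules --get-regexp '^submodule\..*\.path$' |
    while read -r _key path; do
      [ -e "$path/.git" ] || printf '%s\n' "$path"
    done
}
</code></pre>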
<h1><a href="http://git-annex.branchable.com/todo/make_annex___34__respect__34___.git__47__hooks__47__prepare-commit-msg/">make annex "respect" .git/hooks/prepare-commit-msg</a></h1>
<p>By yoh. Updated 2024-02-12T18:45:49Z, published 2024-02-02T17:07:51Z</p>
<p>I am trying to work out a helper for creating more meaningful automated commit messages for reprostim, where we collect videos and the git-annex assistant moves them to the server.</p>
<p><details>
<summary>I was about to file this TODO report with more detail before I realized that, in principle, git has a hook I could use</summary></p>
<p>At the moment (git-annex 10.20231227-1~ndall+1), the git-annex assistant just commits with a generic, nondescript commit message (apart from the location), e.g.</p>
<pre><code>git-annex in reprostim@reproiner:/data/reprostim
</code></pre>
<p>and figuring out what git-annex actually did takes at least rerunning <code>git log --stat</code>, while in a git GUI of some kind it takes looking at each and every commit, or specific filtering, to identify the one of interest (e.g. the one that modified <code>config.yaml</code>).</p>
<p>It would help if commits were more descriptive. Since the git-annex assistant is quite reactive and (at least in our case) mostly commits one file change at a time, I would appreciate commit messages like</p>
<pre><code>Edited config.yaml
</code></pre>
<p>or <code>Added</code> or <code>Removed</code>. If the file is under a folder and the path is longer than some limit, the path could be abbreviated to something like <code>.../config.yaml</code>.</p>
<p>When there are multiple files per "action", they could be summarized in numbers, possibly again with a hint about location:</p>
<pre><code>Edited config.yaml, added 10 files under Videos/
</code></pre>
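<p>To illustrate the kind of summary described above, a sketch of a helper that a <code>prepare-commit-msg</code> hook could call. It reads <code>git diff --cached --name-status</code> output on stdin and prints a one-line subject such as <code>Edited config.yaml, added 2 files under Videos/</code>. The function name is hypothetical; it treats every status other than added (A) and deleted (D) as an edit, omits the directory hint when files span several directories, and does not abbreviate long paths.</p>
<pre><code class="shell">#!/bin/sh
# Hypothetical helper: turn `git diff --cached --name-status` output (stdin)
# into a one-line commit subject.
summarize_changes() {
  awk -F '\t' '
    function verb(s) {
      return (s == "A") ? "added" : ((s == "D") ? "removed" : "edited")
    }
    {
      st = substr($1, 1, 1)                  # first letter of the status
      if (st != "A" && st != "D") st = "M"   # treat everything else as an edit
      path = $NF              # for renames/copies the last field is the new path
      dir = path
      sub("[^/]*$", "", dir)  # directory part, empty for top-level files
      if (!(st in n)) { keys[++k] = st; d[st] = dir }
      else if (d[st] != dir) d[st] = "*"     # files in several directories
      n[st]++
      file[st] = path
    }
    END {
      out = ""
      for (i = 1; i <= k; i++) {
        st = keys[i]
        if (n[st] == 1) part = verb(st) " " file[st]
        else {
          part = verb(st) " " n[st] " files"
          if (d[st] != "*" && d[st] != "") part = part " under " d[st]
        }
        out = (out == "") ? part : out ", " part
      }
      print toupper(substr(out, 1, 1)) substr(out, 2)
    }'
}

# usage sketch: git diff --cached --name-status | summarize_changes
</code></pre>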
<p>Moreover, I would still keep the location information that git-annex reports now, but move it into the extended part of the commit message below the subject line, and add version info. Altogether it could look like</p>
<pre><code>Edited config.yaml, added 10 files under Videos/
---
# git annex metadata
repository: reprostim@reproiner:/data/reprostim
version: 10.20231227-1~ndall+1
</code></pre>
<p>Even more ideally, if it could pick up an optional file when present (e.g. <code>.git/ANNEX_COMMIT_META.yaml</code>) or some <code>git config</code> field to add to the commit, I could then add information about the system/software I care about, e.g.</p>
<pre><code>reprostim-version: 0.20240202.0
</code></pre>
<p>so that the commit carries metadata about the version that produced those video files. At the moment the video capture is completely unaware of git-annex, so I cannot annotate those files with metadata... but that would be a different issue ;)</p>
<p>As a testament to the feasibility and usefulness of such commits, you could browse the commits of our automated DataLad dandisets, e.g. <a href="https://github.com/dandisets/000108/commits/draft/">000108</a>, where, without any extra tools, I can tell where we added or removed files or just modified some metadata. That is indeed custom and more specific to our use case, but I think what is described above would already be an improvement.</p>
<p></details></p>
<p><details>
<summary>But when I created this experimental version of the script to see what info I have available</summary></p>
<pre><code class="shell">#!/bin/sh
#
# A hook to provide custom commit messages for
# changes in the repo which would be better than default ones git-annex provides.
COMMIT_MSG_FILE=$1
COMMIT_SOURCE=$2
SHA1=$3
if [ -n "$COMMIT_MSG_FILE" ]; then
orig_msg=$(cat "$COMMIT_MSG_FILE")
else
orig_msg=NONE
fi
cat <<EOF >| "$COMMIT_MSG_FILE"
Custom commit msg for source=$COMMIT_SOURCE sha1=$SHA1
orig_msg: $orig_msg
git-annex version: `git annex version --raw`
environment:
`export`
EOF
</code></pre>
<p></details></p>
<p>I saw that <code>git-annex</code> somehow commits while avoiding this hook entirely!</p>
<pre><code>❯ rm random && dd if=/dev/random of=random count=1 && git annex add random && git commit -m "Added random to annex without custom commit msg"; git show git-annex | head -n 5
1+0 records in
1+0 records out
512 bytes copied, 0.000295152 s, 1.7 MB/s
add random
ok
(recording state in git...)
[master 470d6ce] Custom commit msg for source=message sha1=
1 file changed, 1 insertion(+), 1 deletion(-)
commit 97423356f56f1f16b6a9646614af1d3d4d3d8717
Author: Yaroslav Halchenko <debian@onerussian.com>
Date: Fri Feb 2 12:03:18 2024 -0500
update
</code></pre>
<p>so the hook worked for the commit to <code>master</code>, but the commit to the <code>git-annex</code> branch went through without it. FWIW, if I use annex.commitmessage, that one works, but it is inflexible:</p>
<pre><code class="shell">❯ rm random && dd if=/dev/random of=random count=1 && git -c annex.commitmessage="custom one" annex add random && git commit -m "Added random to annex WITH custom commit msg"; git show git-annex | head -n 5
1+0 records in
1+0 records out
512 bytes copied, 8.0667e-05 s, 6.3 MB/s
add random
ok
(recording state in git...)
[master 65d4400] Custom commit msg for source=message sha1=
1 file changed, 1 insertion(+), 1 deletion(-)
commit dff6b4833f4d0e2193d43576c834212f84e54f49
Author: Yaroslav Halchenko <debian@onerussian.com>
Date: Fri Feb 2 12:05:08 2024 -0500
custom one
</code></pre>
<p>I think it would be great if I could just create a "generic" prepare-commit-msg hook I could use for any branch, git-annex included.</p>
<p>The same applies to the changes the git-annex assistant commits to master: it seems to bypass that hook as well, since even with the hook present I got</p>
<pre><code class="shell">commit 78c5dcb6bf91b056cba7dc4ee93dd5b31f15f297 (HEAD -> master, synced/master)
Author: Yaroslav Halchenko <debian@onerussian.com>
Date: Fri Feb 2 12:06:41 2024 -0500
git-annex in yoh@bilena:~/proj/datalad/trash/try-commit-message
</code></pre>
<blockquote><p><span class="selflink">done</span> --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<h1><a href="http://git-annex.branchable.com/todo/git-annex-migrate_using_git-replace/">git-annex-migrate using git-replace</a></h1>
<p>Updated 2024-02-10T15:19:49Z, published 2019-04-22T01:26:53Z</p>
<p>Currently, git-annex-migrate leads to content (and metadata) being stored under both the old and new keys. git-annex-unused can drop the content under the old key, but then you can't access the content if you check out an older commit. Maybe an option could be added to migrate keys using <a href="https://git-scm.com/docs/git-replace">git-replace</a>? You'd git-replace the blob .git/annex/objects/old_key with the blob .git/annex/objects/new_key, the blob ../.git/annex/objects/old_key with the blob ../.git/annex/objects/new_key, etc. You could then also have a setting to auto-migrate non-checksum keys to checksum keys whenever the content gets downloaded.</p>
<p>More generally, git-annex-replace could be implemented this way, doing what git-replace does, but for git-annex keys rather than git hashes. <a href="http://git-annex.branchable.com/git-annex-pre-commit/">git-annex-pre-commit</a> might need to be changed to implement replacement of keys added later.</p>
<blockquote><p><span class="selflink">wontfix</span>, use distributed migration instead --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<h1><a href="http://git-annex.branchable.com/todo/documentation__58___improve_on_special_remotes/">documentation: improve on special remotes</a></h1>
<p>By yoh. Updated 2024-02-08T17:37:38Z, published 2024-01-30T16:20:44Z</p>
<p>Even though there is a <a href="https://git-annex.branchable.com/special_remotes/">special_remotes</a> page, the description there is quite superficial and does not provide a good overview of various aspects of special remotes:</p>
<ul>
<li>life cycle, e.g. how they relate to Git remotes -- that they become listed among <code>git remote</code>s whenever enabled, that git remotes get the corresponding <code>annex-uuid</code> assigned when "sensed" or get annex-ignore, how to "disable" an enabled remote (just <code>git remote remove</code>)</li>
<li>clearly list commands to operate on the special remotes (initremote, enableremote) or to interrogate them (e.g. how to figure out whether there is already a special remote with a target uuid that may not be enabled yet, etc.)</li>
</ul>
<p>Without such documentation it is hard to onboard new git-annex users and developers.</p>
<blockquote><p>Hard to say that documentation is ever done, but I've made some
improvements and I guess am going to call this <span class="selflink">done</span> --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<h1><a href="http://git-annex.branchable.com/todo/yt-dlp__58___parse__47__handle___40__error__41_____34__Video_unavailable__34__/">yt-dlp: parse/handle (error) "Video unavailable"</a></h1>
<p>Updated 2024-02-05T19:48:47Z, published 2024-01-29T22:03:45Z</p>
<p>At the moment, <code>git-annex addurl</code> happily downloads HTML from YouTube when a video was removed from YouTube or made private, instead of erroring out, which would probably be the better behavior. E.g.</p>
<p><details>
<summary>here is a detailed git annex --debug addurl output</summary></p>
<pre><code class="shell">$> git annex --debug addurl --file Комендантські_балачки/2024-01-28-___Стрім_20_30_28.01.2024_чат_рулетка_LIVE_стрим._Андрій_Попик.mkv https://www.youtube.com/watch?v=jy01CnsQ9ec
[2024-01-29 16:55:36.511153059] (Utility.Process) process [2205617] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2024-01-29 16:55:36.515415282] (Utility.Process) process [2205617] done ExitSuccess
[2024-01-29 16:55:36.515888181] (Utility.Process) process [2205618] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2024-01-29 16:55:36.519652103] (Utility.Process) process [2205618] done ExitSuccess
[2024-01-29 16:55:36.521989271] (Utility.Process) process [2205619] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
addurl https://www.youtube.com/watch?v=jy01CnsQ9ec
[2024-01-29 16:55:36.546921281] (Utility.Url) Request {
host = "www.youtube.com"
port = 443
secure = True
requestHeaders = [("Accept-Encoding",""),("User-Agent","git-annex/10.20231227-1~ndall+1")]
path = "/watch"
queryString = "?v=jy01CnsQ9ec"
method = "HEAD"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
proxySecureMode = ProxySecureWithConnect
}
[2024-01-29 16:55:36.767345074] (Utility.Process) process [2205641] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","symbolic-ref","-q","HEAD"]
[2024-01-29 16:55:36.771989935] (Utility.Process) process [2205641] done ExitSuccess
[2024-01-29 16:55:36.772619847] (Utility.Process) process [2205642] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","refs/heads/master"]
[2024-01-29 16:55:36.777260151] (Utility.Process) process [2205642] done ExitSuccess
[2024-01-29 16:55:36.778775886] (Utility.Process) process [2205643] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","check-attr","-z","--stdin","annex.backend","annex.largefiles","annex.numcopies","annex.mincopies","--"]
[2024-01-29 16:55:36.78290662] (Utility.Url) Request {
host = "www.youtube.com"
port = 443
secure = True
requestHeaders = [("Accept-Encoding","identity"),("User-Agent","git-annex/10.20231227-1~ndall+1")]
path = "/watch"
queryString = "?v=jy01CnsQ9ec"
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
proxySecureMode = ProxySecureWithConnect
}
[2024-01-29 16:55:36.982919787] (Utility.Process) process [2205644] read: yt-dlp ["https://www.youtube.com/watch?v=jy01CnsQ9ec","--get-filename","--no-warnings","--no-playlist"]
[2024-01-29 16:55:38.035161914] (Utility.Process) process [2205644] done ExitFailure 1
[2024-01-29 16:55:38.036097212] (Utility.Process) p(to Комендантські_балачки/2024-01-28-___Стрім_20_30_28.01.2024_чат_рулетка_LIVE_стрим._Андрій_Попик.mkv) "--stdin","--verbose","--non-matching"]
[2024-01-29 16:55:38.054030685] (Messages.explain) [ Комендантські_балачки/2024-01-28-___Стрім_20_30_28.01.2024_чат_рулетка_LIVE_стрим._Андрій_Попик.mkv does not match annex.largefiles: mimeencoding=binary[FALSE] ]
(non-large file; adding content to git repository)
[2024-01-29 16:55:38.054895156] (Utility.Process) process [2205761] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","hash-object","-w","--no-filters","--stdin-paths"]
[2024-01-29 16:55:38.072695867] (Utility.Process) process [2205762] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","-c","filter.annex.smudge=","-c","filter.annex.clean=","-c","filter.annex.process=","write-tree"]
[2024-01-29 16:55:38.078372317] (Utility.Process) process [2205762] done ExitSuccess
[2024-01-29 16:55:38.07887887] (Utility.Process) process [2205763] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/annex/last-index"]
addurl https://www.youtube.com/watch?v=jy01CnsQ9ec (to Комендантські_балачки/2024-01-28-___Стрім_20_30_28.01.2024_чат_рулетка_LIVE_стрим._Андрій_Попик.mkv) (non-large file; adding content to git repository) ok
(recording state in git...)
[2024-01-29 16:55:38.083368582] (Utility.Process) process [2205764] feed: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","update-index","-z","--index-info"]
[2024-01-29 16:55:38.129969795] (Utility.Process) process [2205764] done ExitSuccess
[2024-01-29 16:55:38.130183526] (Database.Handle) commitDb start
[2024-01-29 16:55:38.130668768] (Database.Handle) commitDb done
[2024-01-29 16:55:38.131339954] (Utility.Process) process [2205619] done ExitSuccess
[2024-01-29 16:55:38.131872474] (Utility.Process) process [2205643] done ExitSuccess
[2024-01-29 16:55:38.132363417] (Utility.Process) process [2205761] done ExitSuccess
[2024-01-29 16:55:38.132786755] (Utility.Process) process [2205760] done ExitFailure 1
$> file -L Комендантські_балачки/2024-01-28-___Стрім_20_30_28.01.2024_чат_рулетка_LIVE_стрим._Андрій_Попик.mkv
Комендантські_балачки/2024-01-28-___Стрім_20_30_28.01.2024_чат_рулетка_LIVE_стрим._Андрій_Попик.mkv: HTML document, ASCII text, with very long lines (56909)
</code></pre>
<p></details></p>
<p>whereas that video is now shown as Private if you go to that YouTube URL, and <code>yt-dlp</code> says</p>
<pre><code class="shell">❯ yt-dlp https://www.youtube.com/watch?v=jy01CnsQ9ec --get-filename --no-warnings --no-playlist
ERROR: [youtube] jy01CnsQ9ec: Video unavailable. This video has been removed by the uploader
❯ yt-dlp --version
2023.11.16
</code></pre>
<p>Note that if I fake up that YouTube URL by changing the last letter to the next one (<code>d</code>), the response does not include any reason:</p>
<pre><code class="shell">❯ yt-dlp https://www.youtube.com/watch?v=jy01CnsQ9ed --get-filename --no-warnings --no-playlist
ERROR: [youtube] jy01CnsQ9ed: Video unavailable
</code></pre>
<blockquote><p>I don't consider this a bug because with --no-raw it will do what you
want. <span class="selflink">done</span> --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<h1><a href="http://git-annex.branchable.com/todo/addurl_--raw-except_REMOTEs__63__/">addurl --raw-except REMOTEs?</a></h1>
<p>By yoh. Updated 2024-02-05T19:18:28Z, published 2024-01-31T20:35:02Z</p>
<p>In <a href="https://github.com/dandi/backups2datalad/pull/21#issuecomment-1917911205">backups2datalad</a> we ran into the behavior that <code>addurl --raw</code> not only skips all the fancy handling for YouTube and .torrent files, but also disregards our (<code>git-annex-remote-datalad</code>) external special remote (although git-annex still sends it CLAIMURL first).
I think generally we would still prefer to use <code>--raw</code> to avoid possible side effects when someone manages to add a <code>.torrent</code> file that we want to add as a file rather than have it downloaded. But we would like to explicitly allow interactions with our special remote. That is why I think the most viable solution would be to provide <code>--raw-except</code>, which would act like <code>--raw</code> but allow explicitly listed special remotes (or hardcoded keywords like <code>:torrents:</code>, <code>:youtube:</code>) to handle the URL if encountered.</p>
<blockquote><p><span class="selflink">done</span> --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<h1><a href="http://git-annex.branchable.com/todo/record_ETag_when_using_addurl_--fast/">record ETag when using addurl --fast</a></h1>
<p>Updated 2024-01-25T18:08:45Z, published 2022-07-18T16:43:26Z</p>
<p>Many websites return an ETag in the HTTP response header, indicating the version of the resource. Could the ETag (or a checksum of it) be recorded in the URL- key, the way size is now? Then e.g. <code>fsck --from web</code> could do a stronger check that the same file is still downloadable from the web, and situations where different remotes have different versions of a file with the same URL- key could be better prevented.</p>
<blockquote><p>Closing as this does not seem like a useful idea. <span class="selflink">done</span> --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<h1><a href="http://git-annex.branchable.com/todo/the_same_path_looked_up_3_times_for_libpcre__42__.so/">the same path looked up 3 times for libpcre*.so</a></h1>
<p>Updated 2024-01-25T18:08:45Z, published 2023-08-31T13:54:19Z</p>
<p>I have upgraded our build environment for https://github.com/datalad/git-annex from bullseye (now oldstable) to bookworm (now stable). Docker images are at https://hub.docker.com/repository/docker/datalad/buildenv-git-annex/tags (prior ones at https://hub.docker.com/repository/docker/datalad/buildenv-git-annex-buster/tags).</p>
<p>The git-annex (<code>10.20230828+git2-g50300a47fe-1~ndall+1</code>) built fine and all tests passed, but one <a href="https://github.com/datalad/git-annex/actions/runs/6029623735/job/16366573859#step:6:33">extra test failed</a>:</p>
<pre><code>>> nfailed version 'libpcre.*so'
>> strace -f git-annex version
>> tee /dev/fd/2
>> wc -l
>> awk '/libpcre.*so.*ENOENT/{print}'
[pid 1843] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v4/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 1843] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 1843] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 1843] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/tls/haswell/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 1843] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/tls/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 1843] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/haswell/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> test 6 -lt 5
Error: Process completed with exit code 1.
</code></pre>
<p><a href="https://github.com/datalad/git-annex/actions/runs/6003338461/job/16282128434">previously (<code>10.20230828-1~ndall+1</code> was the last successful build) it was just 3</a>:</p>
<pre><code>>> nfailed version 'libpcre.*so'
>> strace -f git-annex version
>> tee /dev/fd/2
>> wc -l
>> awk '/libpcre.*so.*ENOENT/{print}'
[pid 1932] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/tls/haswell/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 1932] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/tls/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 1932] openat(AT_FDCWD, "/usr/lib/git-annex.linux//lib/x86_64-linux-gnu/haswell/libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> test 3 -lt 5
</code></pre>
<p>Where are those 3 extra (identical) lookups coming from now?</p>
<p>P.S. Since this is not critical, I will work around it for now by raising the maximum count to 7.</p>
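<p>Judging from the trace, the extra test counts failed (ENOENT) lookups of <code>libpcre*.so</code> while running <code>git-annex version</code> under <code>strace</code>, and passes only while that count stays below a threshold. A reconstruction as a shell sketch, reusing the awk pattern visible in the trace; the function names and the threshold argument are hypothetical and may differ from the actual <code>nfailed</code> helper in the datalad/git-annex CI.</p>
<pre><code class="shell">#!/bin/sh
# Count strace lines (stdin) for libpcre*.so lookups that failed with ENOENT,
# using the same awk pattern as the traced test.
count_failed_pcre_lookups() {
  awk '/libpcre.*so.*ENOENT/ { n++ } END { print n + 0 }'
}

# Succeed only if there are fewer than $1 such failed lookups.
check_failed_pcre_lookups() {
  n=$(count_failed_pcre_lookups)
  test "$n" -lt "$1"
}

# usage sketch (strace writes its trace to stderr):
#   strace -f git-annex version 2>&1 | check_failed_pcre_lookups 7
</code></pre>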
<blockquote><p><span class="selflink">done</span> per my earlier comment --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>
<h1><a href="http://git-annex.branchable.com/todo/Make_webapp_port_configurable/">Make webapp port configurable</a></h1>
<p>Updated 2024-01-25T18:08:45Z, published 2021-01-29T19:42:48Z</p>
<h3>Please describe the problem.</h3>
<p>This is more of a feature request than a bug.</p>
<p>It would be nice and more intuitive if the webapp <code>--listen</code> parameter also accepted a port specifier, allowing the port to be configured.</p>
<p>For my workflow, I thought I would contain all of git-annex's dependencies inside a Docker image, since I'm quite comfortable with Docker (and emerge on Gentoo took a long time and finally failed). With an unconfigurable dynamic port, though, running the webapp subcommand in Docker is not really viable, since I don't want to use docker run's port range mapping feature, which would lock all those ports.</p>
<h3>What steps will reproduce the problem?</h3>
<p>Dockerfile:</p>
<pre><code>ARG DEBIAN_TAG=buster-slim
FROM debian:${DEBIAN_TAG}

RUN set -ex \
    && apt-get update \
    && apt-get install -y \
        git \
        git-annex
</code></pre>
<pre><code class="shell">docker build -t git-annex:test .
docker run --rm -it git-annex:test git annex webapp --listen 0.0.0.0:8888
</code></pre>
<h3>What version of git-annex are you using? On what operating system?</h3>
<pre><code>git-annex version: 7.20190129
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.20 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.0.0 ghc-8.4.4 http-client-0.5.13.1 persistent-sqlite-2.8.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
</code></pre>
<h3>Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)</h3>
<p>git annex seems awesome with the little bit of testing I've done. It seems like the perfect tool for what I want to accomplish. Thanks!</p>
<blockquote><p><span class="selflink">done</span> via --port option --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p></blockquote>