Recent comments posted to this site:

comment 1

Unfortunately it seems like I can't even remove the size after the import:

$ git switch --detach importdir/import
HEAD is now at 6d3aa43 import from importdir
$ git annex migrate --remove-size
migrate data.bin 
git-annex: failed creating link from old to new key
failed
migrate: 1 failed
[ble: exit 1]
Comment by matrss
FTR commands to check and "fix up"

Out of fear of modifying files in the git-annex branch directly, here are the commands to 'check':

$> f=Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv
$> key=$(readlink "$f" | xargs basename); alog=$(git ls-tree -r git-annex | grep "$key" | awk '/.web$/{print $4;}'); git show "git-annex:$alog"
1772708470s 1 https://www.youtube.com/watch?v=0fcKYGsBZxU

First I tried to fix via re-addurl, and we do get some difference:

$> git rm "$f"; git annex addurl --no-raw --file "$f" "$url"
rm 'Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv'
addurl https://www.youtube.com/watch?v=0fcKYGsBZxU (using yt-dlp) (to Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv) ok
(recording state in git...)
$> git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv

$> git diff --cached
diff --git a/Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv b/Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv
index e59a58c35..e12bb1280 120000
--- a/Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv
+++ b/Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv
@@ -1 +1 @@
-../.git/annex/objects/KQ/x1/URL-s0--https&c%%www.youtube.com%watch,63v,610fcKYGsBZxU/URL-s0--https&c%%www.youtube.com%watch,63v,610fcKYGsBZxU
\ No newline at end of file
+../.git/annex/objects/wq/jM/URL--yt&chttps&c%%www.youtube.com%watch,63v,610fcKYGsBZxU/URL--yt&chttps&c%%www.youtube.com%watch,63v,610fcKYGsBZxU
\ No newline at end of file


I did not really care about the key change, as long as I got the file and the metadata was transferred, but it wasn't:

$> git commit -m 'redownloaded "unlucky" video for which no yt: was added' $f
[master 379d379ea] redownloaded "unlucky" video for which no yt: was added
 1 file changed, 1 insertion(+), 1 deletion(-)
$> git annex metadata "$f"
metadata Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv

ok

Also, when I used a recent version (after 202601), which auto-upgrades to VURL keys, the difference in the key was a switch from URL to VURL. Could you please point me to where I can read up on VURLs and their benefit for relaxed URLs?

Then I tried the dance with unregisterurl, rmurl, and addurl, which ended up with:

$> key=$(readlink "$f" | xargs basename); alog=$(git ls-tree -r git-annex | grep "$key" | awk '/.web$/{print $4;}'); git show "git-annex:$alog"
1772750261s 0 https://www.youtube.com/watch?v=0fcKYGsBZxU
1772750309s 1 yt:https://www.youtube.com/watch?v=0fcKYGsBZxU
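
For anyone wanting to script this check, here is a minimal sketch that reads such a log and reports which URLs are currently recorded. It assumes, based only on the output shown above, that each line is `<timestamp>s <0|1> <url>` and that the newest entry per URL wins, with 1 meaning "present". The function names are made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_url_log(text: str) -> List[Tuple[float, bool, str]]:
    """Parse lines of the assumed form '<timestamp>s <0|1> <url>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, flag, url = line.split(" ", 2)
        entries.append((float(stamp.rstrip("s")), flag == "1", url))
    return entries


def current_urls(text: str) -> List[str]:
    """Return URLs whose most recent entry marks them as present."""
    latest: Dict[str, Tuple[float, bool]] = {}
    for stamp, present, url in parse_url_log(text):
        if url not in latest or stamp >= latest[url][0]:
            latest[url] = (stamp, present)
    return sorted(url for url, (_, present) in latest.items() if present)


log = """1772750261s 0 https://www.youtube.com/watch?v=0fcKYGsBZxU
1772750309s 1 yt:https://www.youtube.com/watch?v=0fcKYGsBZxU"""
print(current_urls(log))
```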

and with which I still was not able to get the file:

$> git annex get "$f"
get Чат_рулетка/2026-03-05-_армянин_за_путина._Армянин_из_россии_Воевал_против_Украины.mkv (from web...)
  Verification of content failed

  Unable to access these remotes: web

  No other repository is known to contain the file.

  (Note that these git remotes have annex-ignore set: origin)
failed
get: 1 failed
git annex get "$f"  8.22s user 3.63s system 112% cpu 10.505 total

although I think it did fetch it. But I guess the failure is because of the -s0 in the original key! So the original approach with git rm + addurl was kind of legitimate, since it also fixed up the URL, BUT it lost the metadata for the key.
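
To spot such keys in bulk, a small sketch like the following could pull the size field out of a key name. It assumes the layout visible in the diff above, `BACKEND[-sSIZE...]--NAME`, and ignores any other header fields; `parse_key` and `KeyInfo` are hypothetical helpers, not part of git-annex.

```python
from typing import NamedTuple, Optional


class KeyInfo(NamedTuple):
    backend: str
    size: Optional[int]  # None when the key carries no -sSIZE field
    name: str


def parse_key(key: str) -> KeyInfo:
    """Split a key into backend, optional size, and name (assumed layout)."""
    header, sep, name = key.partition("--")
    if not sep:
        raise ValueError(f"not a key: {key!r}")
    backend, *fields = header.split("-")
    size = None
    for field in fields:
        # Only the size field is of interest here; other fields are skipped.
        if field.startswith("s") and field[1:].isdigit():
            size = int(field[1:])
    return KeyInfo(backend, size, name)


old = parse_key("URL-s0--https&c%%www.youtube.com%watch,63v,610fcKYGsBZxU")
new = parse_key("URL--yt&chttps&c%%www.youtube.com%watch,63v,610fcKYGsBZxU")
print(old.size, new.size)
```

A key reporting size 0 for a file that is clearly not empty would be a candidate for the fix-up discussed here.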

Is there a quick way to copy metadata from another key? (like it does internally for the same path?)

Or is there a better way to 'fix up the URL/key' that you would recommend, Joey, so I could retain the metadata?
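
One possible workaround, sketched here and untested against a real repository: take the JSON that `git annex metadata --json` prints for the old key and rewrite it into a request for the new key that could be fed to `git annex metadata --batch --json`. The sketch assumes the JSON has a "fields" object mapping field names to lists of values, and that the `lastchanged` fields are bookkeeping that should not be copied. The function name is made up.

```python
import json


def metadata_copy_request(metadata_json_line: str, new_key: str) -> str:
    """Build a batch request that sets the old key's fields on new_key."""
    record = json.loads(metadata_json_line)
    fields = {
        name: values
        for name, values in record.get("fields", {}).items()
        # Skip the timestamp bookkeeping fields (assumed naming).
        if name != "lastchanged" and not name.endswith("-lastchanged")
    }
    return json.dumps({"key": new_key, "fields": fields}, ensure_ascii=False)


sample = json.dumps({
    "command": "metadata",
    "key": "URL-s0--example",
    "fields": {
        "author": ["someone"],
        "author-lastchanged": ["2026-03-05@10-00-00"],
        "lastchanged": ["2026-03-05@10-00-00"],
    },
    "success": True,
})
print(metadata_copy_request(sample, "URL--yt&cexample"))
```

The old key would have to be looked up from history (for example from the symlink target before the commit), since the path now points at the new key.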

Comment by yarikoptic
comment 9

Started developing this in the ephemeral branch.

It seems to also make sense to allow DELEGATE as a response to WHEREIS.

I'm on the fence about delegating GETORDERED. Probably most remotes won't bother to respond to GETORDERED at all, and the only time it makes sense to delegate it is when always delegating to the same type of special remote. If delegating to different special remotes at different times, it doesn't make sense to delegate it to a single one of them.

Similarly I don't think it makes sense to delegate GETINFO unless only delegating to a single special remote. Will probably wait to see if someone has a use case before supporting GETINFO, GETAVAILABILITY, CLAIMURL, etc.

Comment by joey
simplified design with better name

Add to external special remote protocol, enabled by the DELEGATE extension:

DELEGATE type=whatever ephemeral=yes|no [params]

Which can be used as a response to TRANSFER, REMOVE, CHECKPRESENT, TRANSFEREXPORT, CHECKPRESENTEXPORT, REMOVEEXPORT, REMOVEEXPORTDIRECTORY, RENAMEEXPORT

This initializes a delegate special remote in a private namespace, and uses it to perform the operation.

Subsequent uses of DELEGATE with the same configuration avoid the overhead of reinitialization.

With ephemeral=yes, the delegate is automatically removed when the external special remote program shuts down (unless another one is using it.) With ephemeral=no, the delegate remains initialized for use next time.
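
To make the message shape concrete, here is a rough sketch of how an external special remote written in Python might build such a reply. The DELEGATE message is only a proposal at this point, and the handling of values containing whitespace is not specified above, so this sketch simply refuses them. The function name and example parameters are made up.

```python
from typing import Dict


def delegate_message(remote_type: str, ephemeral: bool, params: Dict[str, str]) -> str:
    """Format a DELEGATE reply line in the proposed key=value form."""
    parts = [
        "DELEGATE",
        f"type={remote_type}",
        f"ephemeral={'yes' if ephemeral else 'no'}",
    ]
    for name, value in sorted(params.items()):
        # Quoting rules are not defined in the proposal; reject whitespace.
        if any(ch.isspace() for ch in name + value):
            raise ValueError(f"whitespace not supported in {name!r}")
        parts.append(f"{name}={value}")
    return " ".join(parts)


print(delegate_message("directory", True, {"directory": "/tmp/store"}))
```

Sorting the parameters keeps the line identical across calls with the same configuration, which may help with the "same configuration avoids reinitialization" behaviour described above.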

Comment by joey
Re: comment 6

Is a non-ephemeral aspect visible/accessible outside the context of the special remote that set it up? Would it appear as a regular special remote for a CLI user, as if they ran initremote?

I think it would be best for it not to be visible to the user. Since these remotes can still set their own git configs though, they will necessarily show up in git remote list. (Any git config remote.foo.bar setting is enough for that.) It would be possible for git-annex to not treat them as valid remotes when used outside of the aspect context though. Easiest would be to set annex-ignore on them.

It would be possible to point GIT_CONFIG at a different config file when setting up and using the ephemeral special remote. That would have the problem though that if the special remote looks at some user-set git configs, it wouldn't see them. An example that comes to mind that a special remote would be expected to see is the "credential.helper" configuration. Maybe git-annex could merge .git/config into the ephemeral remote's version when using it? Seems complex and potentially slow though.

(BTW, Even ephemeral aspects will be user-visible while git-annex is running.)

At which stage would INITASPECT (have to) be used? The PREPARE stage, I guess.

I think it could be used at any point.

How expensive could INITASPECT be? Would it (immediately) trigger init/prepare of the aspect-remote?

As expensive as git-annex initremote initially, but subsequently close to a noop when the remote configuration includes ephemeral=no.

Also, calling it repeatedly in the same session with the same configuration should be a noop after the first time. So you could call it immediately before USEASPECT.

That does suggest a simplification: Rather than having a separate INITASPECT command:

USEASPECT type=whatever ephemeral=yes|no [params]

Neat, this avoids needing to name the aspect! And avoids any problem with the aspect name having been used before with a different config.

It also means that any failure to initialize will necessarily make the USEASPECT response be an error message, so error handling takes care of itself.

git-annex would still need a remote name internally; it could eg hash the configuration to get a name.
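
As an illustration of the hashing idea only (git-annex itself is not written in Python, and the real naming scheme may end up different), a name could be derived like this. The `aspect-` prefix and the 16-character truncation are arbitrary choices for the sketch.

```python
import hashlib
import json
from typing import Dict


def aspect_name(config: Dict[str, str], prefix: str = "aspect") -> str:
    """Derive a stable internal remote name from a configuration."""
    # Sorting the keys makes the name independent of parameter order.
    canonical = json.dumps(config, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:16]}"


a = aspect_name({"type": "directory", "directory": "/tmp/store", "ephemeral": "no"})
b = aspect_name({"ephemeral": "no", "directory": "/tmp/store", "type": "directory"})
c = aspect_name({"type": "directory", "directory": "/tmp/other", "ephemeral": "no"})
print(a == b, a == c)
```

The same configuration always maps to the same name, and a changed configuration gets a new name, so a name is never reused with a different config.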

I'm inclined to go with this simplification.

Do I understand correctly that it would be possible to set the active aspect on a per-key and per-operation basis?

It's per-operation. If you want different aspects for different types of keys it would be up to you to pick between them.

Comment by joey
comment 6

I concur with your reasoning, in particular with the observation that making this about URLs would be a mistake. I was already trying to make the "redirect" approach do things it was not meant to be used for.

Here is my understanding of the proposed design:

I could use this to implement an "orchestration" special remote that, rather than implementing store and retrieve procedures, is focused on what other implementations shall be used. For this, it can rely on the full set of special remotes available on a system. It would be possible to have a single remote (using this new feature) abstract a data holding site that can be talked to via various protocols, and the specific access approach can be selected dynamically. This would, therefore, include the ability to use a redirect special remote for URL-based downloads.

A few questions which I could not answer with confidence:

  • Is a non-ephemeral aspect visible/accessible outside the context of the special remote that set it up? Would it appear as a regular special remote for a CLI user, as if they ran initremote?
  • At which stage would INITASPECT (have to) be used? The PREPARE stage, I guess.
  • How expensive could INITASPECT be? Would it (immediately) trigger init/prepare of the aspect-remote?
  • INITASPECT-OK|INITASPECT-FAILURE are responses sent by the main git-annex process to the special remote, right? Any implementation would need to implement some kind of error handling (try another aspect, or error also).
  • Do I understand correctly that it would be possible to set the active aspect on a per-key and per-operation basis?
Comment by mih
proposed design

Here I'm going with the name "aspect" to refer to a sameas remote that is in a private namespace belonging to the external special remote that uses it. This name is a bit of a placeholder, but I think some name is needed, because it would be surprising if "INITREMOTE" did a different thing than git-annex initremote.

Add to external special remote protocol, enabled by the REDIRECTREMOTE extension:

INITASPECT name type=whatever ephemeral=yes|no [params]
INITASPECT-OK
INITASPECT-FAILURE reason

Add response to TRANSFER, REMOVE, CHECKPRESENT, TRANSFEREXPORT, CHECKPRESENTEXPORT, REMOVEEXPORT, REMOVEEXPORTDIRECTORY, RENAMEEXPORT:

USEASPECT name

With ephemeral=yes, the aspect is automatically removed when the external special remote program shuts down (unless another one is using it.) With ephemeral=no, the aspect remains initialized for use next time.

Note that INITASPECT will successfully do nothing if the aspect already exists with the same config. If an aspect exists with that name but a different config, it will fail. I earlier thought it could remove the old one and make a new one, but that risks removing an aspect that is still in use by another process, which could result in unexpected behavior when that aspect reads its git config or cached creds or etc. It should be easy enough in most cases to avoid reusing the same aspect name for two different configs.
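
The rule in the previous paragraph can be sketched as a tiny in-memory model. This is only an illustration of the described semantics, not how git-annex would implement it; the class name and the failure text are made up.

```python
from typing import Dict


class AspectRegistry:
    """Toy model of the INITASPECT name/config rule."""

    def __init__(self) -> None:
        self._configs: Dict[str, Dict[str, str]] = {}

    def init_aspect(self, name: str, config: Dict[str, str]) -> str:
        existing = self._configs.get(name)
        if existing is None:
            # First use of this name: remember its configuration.
            self._configs[name] = dict(config)
            return "INITASPECT-OK"
        if existing == config:
            # Same name and same config: successfully do nothing.
            return "INITASPECT-OK"
        # Same name, different config: fail rather than replace an
        # aspect that another process may still be using.
        return "INITASPECT-FAILURE aspect exists with a different config"


registry = AspectRegistry()
print(registry.init_aspect("blah", {"type": "directory", "ephemeral": "no"}))
print(registry.init_aspect("blah", {"type": "directory", "ephemeral": "no"}))
print(registry.init_aspect("blah", {"type": "web", "ephemeral": "no"}))
```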

Comment by joey
comment 4

Continuing my line of thought, REDIRECT_REMOTE would I guess be provided with a remote name, not a uuid, since with --sameas the remote would have the same uuid.

While special remote "foo" could use "foo-bar", "foo-baz" etc as the name of its not-really-ephemeral helper remotes, that is not entirely satisfactory, since the user might have their own "foo-bar" remote. Or the user might notice "foo-bar" exists, and start using it, and then it would be painful if "foo" later removes it.

And a new protocol command like initremote does seem to be needed, because if a special remote runs git-annex initremote itself, the git-annex process that is using the special remote won't know about the new remote.

If there's an initremote-like protocol command, the special remotes it inits could be in a separate namespace, and REDIRECT_REMOTE could automatically use that namespace.

For example:

INITREMOTE blah type=blah url=whatever
INITREMOTE-OK
[...]
REDIRECT_REMOTE blah
[...]
REMOVEREMOTE blah

That might make a remote named eg "foo-$foouuid-blah" where $foouuid is the uuid of the special remote foo that owns it. So there is no possibility of collision. That would be in .git/config for the reasons I discussed earlier.

Depending on the type of remote, it might be cheap enough to INITREMOTE and REMOVEREMOTE in the same session. That makes it ephemeral, although with some disk writes happening behind the scenes to update the git config etc. Or, the REMOVEREMOTE could be skipped to leave it set up for the next session. Then an INITREMOTE with the same settings would be optimised to a no-op.

That would have git remote remove foo leave behind the configs for the not-so-ephemeral remotes that it set up. Not a big problem, the user can go in and delete them or a git-annex removeremote could handle it, as well as deleting .git/annex/journal-private/remote.log, cached creds, etc.

Comment by joey
comment 3

In a sparse checkout, most git commands behave as if they were run in a worktree that contains only the files in the sparse checkout, and not other files. Since git-annex uses git commands extensively when identifying files to work on, its commands skip over files not in the sparse checkout.

There are exceptions, the main one seems to be git ls-files, which does list files not in the sparse checkout. All commands that operate on annexed files and that use git ls-files to enumerate files though feed the files into git cat-file --batch, and that will say a file is not found when it's not part of the sparse checkout. So git-annex skips those.

The only exception I can find, and possibly the only one, is that git-annex add will add files that are in a subdirectory that is not included in the sparse checkout. (It uses git ls-files without git cat-file.) That is a different behavior than git add, which refuses to add such files (though the --sparse option overrides and causes them to be added).

I don't know if this git-annex add behavior would be a problem. The documentation for --sparse says that the reason git add doesn't default to it is because, after it adds such a file, it could get removed from the worktree without warning. Which would make it hard to get the file's content back if it didn't get committed first.

If that were a problem, it could be fixed by making git-annex run git ls-files with the --sparse option, which is supposed to filter out files not in the sparse checkout... Except that doesn't seem to work right when I try it. Maybe a bug in git (2.51.0)?

Anyway, my impression is that this would all need playing with to determine if it happens to meet your needs. Bearing in mind that sparse checkout is itself an experimental feature (for 6+ years?) that is documented to be subject to future behavior changes.

Comment by joey