Please describe the problem.

With URLs that are handled by the web remote the URL will be kept through a migration, i.e.

git annex addurl --relaxed <some-http-url>
git annex get <the-added-file>
git annex migrate

will migrate the key of the file to be hash based, and keep the URL associated to that key. If I do the same with a URL whose scheme is handled by a custom special remote (this one specifically is what I got the issue with: https://github.com/matrss/datalad-cds/blob/main/src/datalad_cds/cds_remote.py, it registers itself for the cds: scheme), the URL seems to be dropped from the key (i.e. whereis no longer shows it and git annex can no longer fetch it from the special remote).

What steps will reproduce the problem?

The steps mentioned above for a URL whose scheme is handled by an (external?) special remote. Specifically, I saw it with datalad-cds, it might happen for other special remotes as well.

What version of git-annex are you using? On what operating system?

git-annex version: 10.20231129
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24.1 bloomfilter-2.0.1.2 crypton-0.32 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.8 http-client-0.7.15 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
local repository version: 10

Installed with nix from nixpkgs on an ubuntu system.

Please provide any additional information below.

It works with the web remote:

# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
$ datalad create test-ds
create(ok): [...] (dataset)
$ cd test-ds/
$ git annex addurl --relaxed "https://git-annex.branchable.com/"
addurl https://git-annex.branchable.com/ (to git-annex.branchable.com_) ok
(recording state in git...)
$ git annex whereis git-annex.branchable.com_ 
whereis git-annex.branchable.com_ (1 copy) 
    00000000-0000-0000-0000-000000000001 -- web

  web: https://git-annex.branchable.com/
ok
$ ls -l
[...] git-annex.branchable.com_ -> '.git/annex/objects/pJ/v4/URL--https&c%%git-annex.branchable.com%/URL--https&c%%git-annex.branchable.com%'
$ git annex get git-annex.branchable.com_ 
get git-annex.branchable.com_ (from web...) 
ok
(recording state in git...)
$ git annex migrate
migrate git-annex.branchable.com_ (checksum...) ok
(recording state in git...)
$ ls -l
[...] git-annex.branchable.com_ -> .git/annex/objects/31/qv/MD5E-s12462--6b956d66f8352205df79936ada326ec3/MD5E-s12462--6b956d66f8352205df79936ada326ec3
$ git annex whereis git-annex.branchable.com_ 
whereis git-annex.branchable.com_ (2 copies) 
    00000000-0000-0000-0000-000000000001 -- web
    60079e0e-42e4-492e-a7b1-dde764d069eb -- [here]

  web: https://git-annex.branchable.com/
ok
# End of transcript or log.

It doesn't work with the custom special remote:

$ datalad create test-ds
create(ok): [...] (dataset)
$ cd test-ds/
$ datalad download-cds --lazy --nosave --path "2022-01-01.grib" "$(cat <<EOF
{
    "dataset": "reanalysis-era5-complete",
    "sub-selection": {
        "class": "ea",
        "date": "2022-01-01",
        "expver": "1",
        "levelist": "1",
        "levtype": "ml",
        "param": "130",
        "stream": "oper",
        "time": "00:00:00/06:00:00/12:00:00/18:00:00",
        "type": "an",                                 
        "grid": ".3/.3"
    }
}
EOF
)"
cds(ok): [...] (dataset)
$ ls -l
[...] 2022-01-01.grib -> '.git/annex/objects/Mx/4G/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a'
$ git annex whereis 2022-01-01.grib 
whereis 2022-01-01.grib (1 copy) 
    923e2755-e747-42f4-890a-9c921068fb82 -- [cds]

  cds: {"dataset":"reanalysis-era5-complete","sub-selection":{"class":"ea","date":"2022-01-01","expver":"1","levelist":"1","levtype":"ml","param":"130","stream":"oper","time":"00:00:00/06:00:00/12:00:00/18:00:00","type":"an","grid":".3/.3"}}
  cds: cds:v1-eyJkYXRhc2V0IjoicmVhbmFseXNpcy1lcmE1LWNvbXBsZXRlIiwic3ViLXNlbGVjdGlvbiI6eyJjbGFzcyI6ImVhIiwiZGF0ZSI6IjIwMjItMDEtMDEiLCJleHB2ZXIiOiIxIiwibGV2ZWxpc3QiOiIxIiwibGV2dHlwZSI6Im1sIiwicGFyYW0iOiIxMzAiLCJzdHJlYW0iOiJvcGVyIiwidGltZSI6IjAwOjAwOjAwLzA2OjAwOjAwLzEyOjAwOjAwLzE4OjAwOjAwIiwidHlwZSI6ImFuIiwiZ3JpZCI6Ii4zLy4zIn19
ok
$ git config --local remote.cds.annex-security-allow-unverified-downloads ACKTHPPT
$ git annex get 2022-01-01.grib 
get 2022-01-01.grib (from cds...) 
2024-04-06 11:37:05,250 INFO Welcome to the CDS
2024-04-06 11:37:05,251 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-complete
2024-04-06 11:37:05,340 INFO Request is queued
2024-04-06 11:37:06,400 INFO Request is running
2024-04-06 11:37:26,399 INFO Request is completed
2024-04-06 11:37:26,399 INFO Downloading https://download-0017.copernicus-climate.eu/cache-compute-0017/cache/data9/adaptor.mars.external-1712396225.5545986-18258-18-822e5b91-cf60-4dbd-a808-a1253d4fe109.grib to .git/annex/tmp/URL--cds&cv1-eyJkYXRhc2V0IjoicmVhbmFs-162a71d794c333f5e04b13283421a49a (5.5M)
  0%|          | 0.00/5.51M [00:00<?, ?B/s]  2%|▏         | 104k/5.51M [00:00<00:06, 866kB/s] 17%|█▋        | 942k/5.51M [00:00<00:00, 5.00MB/s] 57%|█████▋    | 3.15M/5.51M [00:00<00:00, 13.0MB/s] 99%|█████████▉| 5.48M/5.51M [00:00<00:00, 17.3MB/s]                                                    2024-04-06 11:37:27,322 INFO Download rate 6M/s
ok
(recording state in git...)
$ git annex migrate
migrate 2022-01-01.grib (checksum...) ok
(recording state in git...)
$ ls -l
[...] 2022-01-01.grib -> .git/annex/objects/KJ/6K/MD5E-s5774880--94a848eefd02d72952c8541c52a93550.grib/MD5E-s5774880--94a848eefd02d72952c8541c52a93550.grib
$ git annex whereis 2022-01-01.grib 
whereis 2022-01-01.grib (1 copy) 
    5dfef0c9-8e18-4ea2-9ee1-646830b5749b --  [here]
ok
$ git annex drop 2022-01-01.grib 
drop 2022-01-01.grib (unsafe) 
  Could only verify the existence of 0 out of 1 necessary copy

  Rather than dropping this file, try using: git annex move

  (Use --force to override this check, or adjust numcopies.)
failed
drop: 1 failed

I know that this is sort of abusing the URL handling in git-annex, but it was super easy to implement. You recommended me to use SETSTATE/GETSTATE from the external special remote protocol instead already at some point, but I didn't get around to reworking it for that yet.

Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

Yes! It is absolutely great, thank you for it.