Recent comments posted to this site:
Downloading from the web special remote with just an URL key is allowed because it avoids the specific security holes that annex-security-allow-unverified-downloads was added to deal with.
The urls are actually being copied to the new key. But whereis only displays
an url when it's located in a remote that contains a key. After the migration,
git-annex does not think that your special remote contains the new key,
so it does not display urls that are claimed by that special remote.
For the web, there is a special case that handles this, recording that the new key is present in the web special remote. (setUrlPresent does it)
I don't think it makes sense to make that be done for other types of
special remotes generally. Consider that git-annex get
would request the special remote get the new key. If the special remote
is a key/value store, the new key is not located on it, so the get would
fail. Similarly, git-annex fsck --from specialremote would detect a
problem.
Maybe there are other special remotes that are also url based in a way that it would make sense to update the location log when migrating a key. The bittorrent special remote comes to mind.
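To make the above concrete, here is a toy model of the behavior described. It is only a sketch: the dicts stand in for the url log and location log, and `migrate`, `whereis_urls`, `claimed_by`, `myremote` and the `myscheme:` url are all hypothetical names for illustration, not git-annex internals.

```python
# Toy model only: these dicts stand in for git-annex's per-key url log and
# location log. All names here are hypothetical, not git-annex internals.

WEB = "web"


def claimed_by(url):
    # Hypothetical claim rule: a made-up scheme belongs to a made-up special
    # remote, everything else is claimed by the web special remote.
    return "myremote" if url.startswith("myscheme:") else WEB


def migrate(old_key, new_key, urls, locations, claimed_by):
    """Copy urls to the new key; only record presence for web urls."""
    urls[new_key] = list(urls.get(old_key, []))
    locations.setdefault(new_key, set())
    for url in urls[new_key]:
        if claimed_by(url) == WEB:
            # The special case for the web: mark the new key present there.
            locations[new_key].add(WEB)


def whereis_urls(key, urls, locations, claimed_by):
    """Return only the urls whose claiming remote is recorded as holding key."""
    present = locations.get(key, set())
    return [u for u in urls.get(key, []) if claimed_by(u) in present]


urls = {"OLDKEY": ["https://example.com/file", "myscheme:file"]}
locations = {"OLDKEY": {WEB, "myremote"}}
migrate("OLDKEY", "NEWKEY", urls, locations, claimed_by)
```

In this model both urls are copied to NEWKEY, but `whereis_urls("NEWKEY", ...)` only returns the web url, since `myremote` is not recorded as containing NEWKEY.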
Once botan includes the blake3 hash, blake3 hash support could happen more or less for free.
Another route might be to use botan, once it supports blake3.
There is some work toward using botan in "git-annex is slow at reading file content", and once it's used, adding blake3 would just be adding another hash.
Botan3 can also accelerate sha1 and sha3 (on machines with AVX-512). https://botan.randombit.net/handbook/hardware_acceleration.html
I think it would make sense at this point to have a botan build flag that uses botan for all hashing. It may be that it's faster at other hashes than crypton too.
The build flag would need to not be default, since it needs the C++ library to be installed. They seem to be moving toward vendoring the C++ library in the haskell package (https://github.com/haskell-cryptography/botan/issues/98), at which point the build flag could become the default.
I have some early work toward this in the botan branch. A proof of
concept implementation for SHA256 keys benchmarked roughly
twice as fast on my laptop.
I have done some work adjacent to this todo, implementing a --wanted
option and a git-annex put command.
Now, if someone wants the equivalent of git-annex pull --json
$someremote, they can run:
git-annex pull --no-content $someremote
git-annex get --wanted --json --from $someremote
git-annex drop --wanted --json
The git-annex pull above does not have json output, but outputs the
usual git pull messages for the user to deal with as they see fit.
And, if someone wants the equivalent of git-annex push --json
$someremote, they can run:
git-annex copy --wanted --json --to $someremote
git-annex drop --wanted --json --from $someremote
git-annex push --no-content $someremote
The git-annex push above does not have json output, but outputs the
usual git push messages for the user to deal with as they see fit.
Similarly, the equivalent of git-annex pull --json with no remote
specified:
git-annex pull --no-content
git-annex get --wanted --json
git-annex drop --wanted --json
And, the equivalent of git-annex push without a remote specified:
git-annex put --wanted --json
git-annex drop --wanted --json
git-annex push --no-content
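For anyone scripting this, the four sequences above can be generated by a small wrapper. This is only a sketch: the function names are made up, and it only uses the commands and options shown above. It runs the commands in order and stops at the first one that fails; it does not try to separate the json output from the git pull/push messages.

```python
import subprocess


def pull_equivalent(remote=None):
    """Command sequence standing in for `git-annex pull --json [remote]`."""
    name = [remote] if remote else []
    from_remote = ["--from", remote] if remote else []
    return [
        ["git-annex", "pull", "--no-content"] + name,
        ["git-annex", "get", "--wanted", "--json"] + from_remote,
        ["git-annex", "drop", "--wanted", "--json"],
    ]


def push_equivalent(remote=None):
    """Command sequence standing in for `git-annex push --json [remote]`."""
    if remote:
        return [
            ["git-annex", "copy", "--wanted", "--json", "--to", remote],
            ["git-annex", "drop", "--wanted", "--json", "--from", remote],
            ["git-annex", "push", "--no-content", remote],
        ]
    return [
        ["git-annex", "put", "--wanted", "--json"],
        ["git-annex", "drop", "--wanted", "--json"],
        ["git-annex", "push", "--no-content"],
    ]


def run_sequence(commands, run=subprocess.run):
    """Run each command in order; check=True raises on the first failure."""
    for argv in commands:
        run(argv, check=True)
```

For example, `run_sequence(pull_equivalent("origin"))` would run the three commands of the first sequence in a git-annex repository.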
So, the argument for adding --json to pull/push now seems to be reduced. Here are all the arguments I can think of for still doing that:
These command sequences won't behave completely identically to pull/push in all configurations, eg they don't look at remote.<name>.annex-pull and remote.<name>.annex-push configs.
A single git-annex push or pull (or sync) does less work than several git-annex commands. In the command sequences above, git-annex has to traverse the tree twice. That is a pretty small difference in overhead though most of the time.
git-annex get picks which remote to use, and falls back as needed to
another remote if the first is not available, and of course does nothing
if the content is present already.
It would be perhaps most symmetric with that if git-annex put picked one
remote to send content to (ie, the lowest cost one that wants it), fell
back to the next best remote if that one was not available, and avoided
sending any content for files that are in some other repository already.
As well as just being symmetric, that feels like a useful behavior that is not currently possible to get from any git-annex command.
That's in tension with the idea that git-annex put --json would
send to the same remotes that git-annex push would. Maybe that
behavior should be an option? Or maybe that belongs in yet another command.
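The "1 copy" selection described above could be sketched roughly like this. It is not how any git-annex command currently behaves; `Remote` and `pick_put_target` are hypothetical, the costs are illustrative numbers, and availability is modeled as a flag where in practice it would only be discovered by trying the transfer.

```python
from dataclasses import dataclass


@dataclass
class Remote:
    name: str
    cost: int                  # lower cost is preferred
    wants: bool                # whether the remote wants this content
    available: bool            # whether it can be reached right now
    has_content: bool = False  # whether it already has the content


def pick_put_target(remotes):
    """Return the single remote to send to, or None if nothing should be sent."""
    # Avoid sending content that is in some other repository already.
    if any(r.has_content for r in remotes):
        return None
    # Lowest cost remote that wants the content comes first.
    candidates = sorted((r for r in remotes if r.wants), key=lambda r: r.cost)
    # Fall back to the next best remote when one is not available.
    for r in candidates:
        if r.available:
            return r
    return None


origin = Remote("origin", cost=200, wants=True, available=True)
usb = Remote("usb", cost=100, wants=True, available=False)
```

With `usb` unavailable, `pick_put_target([origin, usb])` falls back to `origin`. As soon as `usb.available` becomes true, the same call picks `usb` instead, so where the content goes depends on which remotes happen to be reachable.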
Just how useful would the 1 copy behavior be? One indication maybe is that
no one has ever asked for that behavior. And it seems like it would be easy
for the content to go to an unexpected place and break a workflow. Eg,
suppose a user starts using git-annex put, which sends the content
to origin and makes it available to others. But they also have a remote
for a local USB drive, which has been disconnected all that time. When
they one day reconnect that drive, it has a lower cost, and so their
puts start going there, preventing others from accessing the files.
Also it's worth noting that pull picks a remote to get from, but
push sends to all remotes that want it. So this particular symmetry
is not maintained all the way up. So perhaps it's not a useful symmetry.
Overall, it seems like something that could be an option and not the default, if someone has a good use case.
FWIW, I've split updateBranches between pull and push now.
On git-annex push all it does is propagate adjusted branches
changes back to the original branch.
On git-annex pull it handles updating the view branch and/or
propagating changes from the original branch to the adjusted branch.
Also, git-annex push was fixed to not merge synced/master into master
and to not update the adjusted branch when the original branch has changed.
I'm surprised it responds to HEAD at all. It's not a documented part of the p2phttp API, and the implementation is only a GET endpoint. I guess that servant makes GET endpoints also support HEAD? Urk.
Yes, I think all of the "higher-level http server frameworks" I've encountered (definitely Flask and the construct Forgejo is using, but also others) automatically support HEAD for all GET endpoints, because a properly implemented HEAD is a subset of GET anyway. I'd expect servant to do the same.
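As a rough illustration of what "HEAD is a subset of GET" can look like inside a framework, here is a minimal sketch. It is not how servant, Flask, or Forgejo implement it; `with_auto_head` and `get_key` are hypothetical names, and the handler signature is made up for the example.

```python
def with_auto_head(get_handler):
    """Wrap a GET handler so it also answers HEAD with the same status and headers."""
    def handler(method, path):
        if method not in ("GET", "HEAD"):
            return 405, {"Allow": "GET, HEAD"}, b""
        # The GET handler runs in full for HEAD too; only the body is dropped.
        status, headers, body = get_handler(path)
        headers = dict(headers)
        headers.setdefault("Content-Length", str(len(body)))
        if method == "HEAD":
            body = b""
        return status, headers, body
    return handler


def get_key(path):
    # Hypothetical GET endpoint returning some fixed content.
    return 200, {"Content-Type": "application/octet-stream"}, b"file content"


serve = with_auto_head(get_key)
```

In a design like this, a HEAD request exercises the same server-side code path as a GET, and the response is discarded before it is sent to the client.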
(I do think it could have also happened without HEAD with just the right timing of the client hanging up on GET, still have not verified that. Of course, we had a whole bug about p2phttp can get stuck with interrupted clients that was dealt with previously, but maybe we missed it back then.)
At least I didn't get the p2phttp server stuck with interrupted clients while investigating this issue (that was my initial guess on what was causing the server to get stuck in the first place), but I did see a different bug that I didn't yet report which caused the p2phttp server to exit with exit code 141 if a client was interrupted at the "right" time. This one might already be fixed by https://git-annex.branchable.com/bugs/SIGPIPE_behavior_change/ though.
I've also documented HEAD /git-annex/$uuid/key/$key as supported by p2phttp because if you give a HTTP client an URL, I suppose it may try HEAD.
The initial use-case by mih was to point git annex addurl at this key endpoint, and that does try HEAD, which triggered the bug. So even git-annex itself does it; it just fell out of the report when I reduced the reproducer as far as possible.
Fixed this.
Thank you!
Fixed this.
(I do think it could have also happened without HEAD with just the right timing of the client hanging up on GET, still have not verified that. Of course, we had a whole bug about p2phttp can get stuck with interrupted clients that was dealt with previously, but maybe we missed it back then.)
I've also documented HEAD /git-annex/$uuid/key/$key as supported by
p2phttp because if you give a HTTP client an URL, I suppose it may try
HEAD.
I would rather that the versioned GET endpoints not also support HEAD,
just because it's not part of the interface git-annex uses. If I find a way
to prevent servant from automatically supporting HEAD for those, I will
use it.