Recent comments posted to this site:

I'd like to set a few additional configurations so that all clones treat a special remote similarly.

Specifically, I'd like to set the trust level and tracking branch for an exporttree special remote so that any clone that enables this remote also has these configurations enabled. In particular, this is justified for a certain remote of mine because it exports to a version-controlled environment that I trust, so it would be nice not to have to run git config remote.name.annex-tracking-branch and git annex semitrust name for every clone.

Of course, are git annex config --set remote.name.annex-trustlevel "semitrusted" and git annex config --set remote.name.annex-tracking-branch "main" (called once) any easier than the above called multiple times? Maybe not, but it would be slightly less mental overhead to not do the above.

Offhand, can you imagine any caveats that would preclude adding these settings to the list of those supported by this command? I agree that only some settings make sense for all clones to see, rather than anything one can set in git config, but of course that restriction requires manually adding the config cases that do make sense. Maybe this is one of them.
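
For comparison, the per-clone route can at least be wrapped in a small function. This is only a sketch of the workaround, not of the requested feature: configure_export_remote and the DRY_RUN variable are made-up names, the clone paths are placeholders, and it assumes git annex semitrust is the subcommand that sets the semitrusted level.

```shell
# Hypothetical helper: apply the same settings for remote NAME in every
# listed clone. With DRY_RUN set, commands are printed instead of executed.
configure_export_remote() {
  name=$1
  branch=$2
  shift 2
  for clone in "$@"; do
    ${DRY_RUN:+echo} git -C "$clone" config "remote.$name.annex-tracking-branch" "$branch" &&
    ${DRY_RUN:+echo} git -C "$clone" annex semitrust "$name" || return
  done
}
```

Something like DRY_RUN=1 configure_export_remote name main ~/clone1 ~/clone2 prints the commands it would run, which makes it easy to check before running it for real.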

Comment by Spencer Wed Oct 9 23:10:17 2024

What's the reason for not supporting importtree via webdav?

Would be nice to keep a tree in sync on my nextcloud and sync to my phone etc.

Comment by annex Wed Oct 9 06:42:40 2024

To access the manifest and bundles, one needs the UUID of the special remote initially configured. Then one can run

git clone 'annex::<UUID>?type=directory&encryption=none&directory=/path/to/space%20sanitized%20directory'

This is a bit tedious, both because every setting has to be typed out (even those not shown by the remote helper when pushing from the initial repo, in this case the directory; in other cases, all the settings required to init the remote in the first place) and because any characters not allowed in a URL have to be percent-encoded. But it is doable.
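
The encoding step can be scripted. The following is a minimal sketch, assuming an ASCII-only path and the same three settings as in the URL above; urlencode and annex_url are made-up helper names, not part of git-annex.

```shell
# Percent-encode a string for use as a URL parameter value.
# Letters, digits, ".", "_", "~", "-" and "/" are kept as-is.
# Assumption: the input is plain ASCII.
urlencode() {
  rest=$1
  out=
  while [ -n "$rest" ]; do
    tail=${rest#?}        # everything after the first character
    c=${rest%"$tail"}     # the first character itself
    rest=$tail
    case $c in
      [a-zA-Z0-9._~/-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build the clone URL for a directory special remote without encryption.
# $1 is the special remote's UUID, $2 the unencoded directory path.
annex_url() {
  printf 'annex::%s?type=directory&encryption=none&directory=%s\n' \
    "$1" "$(urlencode "$2")"
}
```

With these defined, git clone "$(annex_url "$uuid" '/path/to/space sanitized directory')" should produce the same URL as typed by hand above, where $uuid holds the special remote's UUID.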

The other option would be to clone manually by initializing a new empty repo, then adding the special remote the normal git annex way. This doesn't work right just yet because --uuid is not an allowed option for initremote. It would be nice if this were an option, simply to avoid the tedium of typing the URL as above (one could copy and paste the output of git --no-pager show git-annex:remote.log into initremote).
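
In the initial repo, the UUID itself can be pulled out of remote.log rather than copied by eye. A rough sketch, assuming each remote.log line starts with the UUID followed by whitespace-separated key=value pairs (which is how I understand the format) and that the remote's name contains no spaces; remote_uuid is a made-up helper name.

```shell
# Print the UUID of the special remote called NAME, reading remote.log
# lines on stdin. Only the first matching line is used.
remote_uuid() {
  awk -v want="name=$1" '
    { for (i = 2; i <= NF; i++) if ($i == want) { print $1; exit } }
  '
}
```

Usage would be along the lines of git --no-pager show git-annex:remote.log | remote_uuid myremote, where myremote is a placeholder for the remote's name.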

Despite the URL tedium, an exciting result of the current system is that any number of repos and file annexes can share one directory! Like an entire organization (or repo group) in one folder. Datalad has a similar archetype (remote indexed archives) which offers (slightly) improved user friendliness by filing each repo UUID into meaningfully-named folders (unhashed first three/remaining is nice for being actually the UUID but it still doesn't let me easily copy/paste the UUID for cloning). Although I kind of like how git-annex's implementation encourages a single unified "annex" (rather than RIA's UUID/Annex, which gives each UUID a separate annex) and of course bundles over loose git files, especially for cloud special remotes, which can be slow to upload each and every loose file.

Looking forward to seeing how this feature develops!

Comment by Spencer Mon Oct 7 20:00:24 2024
This bug report might have been a bit premature. I just noticed that p2phttp returns a 403 for the file locking endpoints in the case of unauth-readonly, while my forgejo-aneksajo implementation returns a 401. This seems to make git-annex prompt for a password instead of falling back to just checking presence.
Comment by matrss Mon Oct 7 14:12:23 2024
Have the same issue
Comment by sng Sun Oct 6 21:42:13 2024

Perhaps Joey can help me out here a bit with some background knowledge:

I've been seeing sporadic corruption with this setup:

  • chunking
  • encryption
  • old helper program git-annex-remote-rclone
  • rclone's pcloud backend

It seems that, for the pcloud backend, rclone keeps partial files under the name of the full file when a transfer is interrupted. (This is for rclone <= 1.67.0; 1.68.0 has changes for pcloud, which may fix this.) My theory of how the corruption might have happened:

  • First interrupted run of git-annex uploads chunks A and a partial(!) chunk B
  • Second run skips chunks A and B(!) and proceeds to upload the rest of the chunks (C and D)
  • At the end we have uploaded A, C and D and a corrupted/partial chunk B

Joey: Is this a possible error scenario?
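
To make the theory concrete, here is a toy reproduction. It is only a model of the suspected failure mode, not of what git-annex or rclone actually do: the "remote" is a local temp directory, an "upload" is a file copy, and the resume step is assumed to trust presence by name alone.

```shell
# Toy model: four chunks, a remote that keeps partial files under the
# final name, and a resume step that only checks whether a name exists.
work=$(mktemp -d)
mkdir "$work/local" "$work/remote"
for c in A B C D; do
  printf 'chunk-%s-payload\n' "$c" > "$work/local/$c"
done

# upload NAME [MAXBYTES]: with MAXBYTES, simulate an interrupted transfer
# that leaves only the first MAXBYTES bytes on the remote.
upload() {
  if [ -n "${2:-}" ]; then
    head -c "$2" "$work/local/$1" > "$work/remote/$1"
  else
    cp "$work/local/$1" "$work/remote/$1"
  fi
}

# Resume by name: skip every chunk whose name is already on the remote.
resume_by_name() {
  for c in A B C D; do
    [ -e "$work/remote/$c" ] || upload "$c"
  done
}

upload A          # first run: chunk A completes
upload B 5        # first run: chunk B is cut off after 5 bytes
resume_by_name    # second run: skips A and B, uploads C and D

# Compare every chunk on the remote with the local original.
corrupt=
for c in A B C D; do
  cmp -s "$work/local/$c" "$work/remote/$c" || corrupt=$corrupt$c
done
rm -rf "$work"
echo "corrupt chunks: ${corrupt:-none}"   # prints "corrupt chunks: B"
```

In this model, checking the size or a checksum of an existing chunk before skipping it would catch the partial B; whether the real code path behaves like resume_by_name is exactly the open question.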

Comment by mike Fri Sep 27 12:18:41 2024
@adehnert: Setting default preferred content expressions is an open todo, and Joey acknowledges that it's useful, but he hasn't implemented it yet. You could voice your motivation for this feature over in that todo, to keep everything sorted. I'm fully with you that this is very much needed. I always fall into the trap of running git annex assist directly after a git clone, wondering why I'm getting a million files shoved into my face, CTRL+C'ing it, and being left with a weird unclean work tree for the download-aborted unlocked files, so I have to git restore . again and then configure git annex wanted present before I continue.
Comment by nobodyinperson Wed Sep 25 09:25:42 2024

Is there any way to set a default preferred content setting -- either used when a new clone is made or whenever a repo doesn't specify one?

I've got an annex that has a couple servers with all the content, and several clients[1] -- which I create more often and more manually -- that just want the content I pick. Basically every time I set up another client, I run git annex sync --content, am surprised to see a bunch of get ... lines, go kill the sync, set group and preferred content to be manual/standard, and run the sync again. It'd be handy if I could set up the repo in advance to just configure that by default. (I guess I could make an alias that does like git clone $server/$repo && cd $repo && git annex wanted . standard && git annex group . manual, but it'd be nice if I could just do the git clone I'm used to and it would all work.)
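
The alias idea from the parenthesis could look roughly like this as a shell function. It is a sketch only: annex_clone and DRY_RUN are made-up names, and it uses git -C instead of cd so the caller's working directory is left alone.

```shell
# Hypothetical helper: clone a repo, then set preferred content to
# "standard" and put the new clone in the "manual" group.
# With DRY_RUN set, commands are printed instead of executed.
annex_clone() {
  url=$1
  dir=${2:-$(basename "$url" .git)}   # default directory name, like git clone
  run=${DRY_RUN:+echo}
  $run git clone "$url" "$dir" &&
  $run git -C "$dir" annex wanted . standard &&
  $run git -C "$dir" annex group . manual
}
```

Running DRY_RUN=1 annex_clone "$server/$repo" first shows the three commands without touching anything.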

[1] AIUI, the "client" group means "get every file referenced in HEAD, unless it's in archive/, and skip older versions"? I guess that makes sense for like a software project with some media assets. I've mostly used git-annex for situations where most files aren't being actively worked with and clients only have a few of them, which is where it seems to really shine over GitLFS. I've always been vaguely surprised by how the client group works as a result. Any sense of how commonly people use it for different use cases? It is excellent for the sparse checkout case though.

Comment by adehnert Tue Sep 24 00:02:20 2024

Thanks to Thowz for the above solution.

There are a couple of scaling issues with large numbers of files (100K+ in my case): the solution runs slowly and actually exceeds the command-line length limit ("Argument list too long").

Here's my modification for the first two commands:

# Enable write permissions on *directories* containing misfiled items
find -xtype l -printf "%l\0" \
  |sed -z -r "s#.*(\.git/annex/objects)/(.)(.)/(.)(.)/([^/]*).*#\1/[\L\2\U\2][\L\3\U\3]/[\L\4\U\4][\L\5\U\5]/\E\6#" \
  |sort -z \
  |uniq -z \
  |xargs -0 -I foo bash -c "chmod u+w foo"

# Reinject the *files* into the annex (note different sed pattern)
find -xtype l -printf "%l\0" \
  |sed -z -r "s#.*(\.git/annex/objects)/(.)(.)/(.)(.)/(.*)#\1/[\L\2\U\2][\L\3\U\3]/[\L\4\U\4][\L\5\U\5]/\E\6#" \
  |sort -z \
  |uniq -z \
  |xargs -0 -I foo bash -c "git-annex reinject --known foo"

If you used bsdtar (or some other method that attempts to copy over Apple metadata resource forks), you'll see a ton of files with ._ prepended to their names in your archive. If you're using this on Linux going forward, know you don't actually want any of that metadata, and want the directory cleanups below to succeed, you'll want to delete these files with something like this:

find .git/annex/objects -name ._\* -print0 |xargs -0 rm

You can then continue with his last two commands:

# Remove empty directories (rmdir will fail on the non-empty directories)
find .git/annex/objects -mindepth 3 -maxdepth 3 -type d -exec rmdir {} \;
find .git/annex/objects -mindepth 2 -maxdepth 2 -type d -exec rmdir {} \;

# And, if you want to be thorough, add this one...
find .git/annex/objects -mindepth 1 -maxdepth 1 -type d -exec rmdir {} \;
Comment by AaronBrooks Sun Sep 22 22:21:05 2024

Another place this came up is https://git-annex.branchable.com/design/passthrough_proxy/#index14h2 where a proxy to an encrypted special remote necessarily does encryption server side, but the user may not want the server to see their unencrypted files.

There I suggested "adding a special remote that does its own client-side encryption in front of the proxy". Such a layered special remote could also be used with a git remote. There would be some complexity cost, since you would have two remote names, one used for git and the other for git-annex.

Implementing object encryption in git remotes is certainly possible, but it would be a special case and the existing code for encrypting special remotes (particularly Remote.Helper.Special.specialRemote) would not be able to be reused.

There's also the problem that, if such a git repository is added as a regular remote, and the git-annex branch that indicates that it is encrypted has not yet been pulled, git-annex would not realize that it is supposed to be encrypted, so would send unencrypted objects to it. This seems like an easy situation to accidentally get into, e.g.:

git remote add foo http://example.com/
git annex move --to foo # oops unencrypted

Overall, I prefer the idea of layering an encrypted special remote to complicating the git remote with encryption. Enabling that special remote could make git-annex treat the underlying remote as annex-ignore, to prevent accidentally sending unencrypted objects to it.

There could also be situations where you want to store some files unencrypted on a git hosting site to let them be accessible via its UI, but encrypt other files, and the layered special remote also allows for that kind of thing.

Comment by joey Wed Sep 18 12:17:02 2024