Recent comments posted to this site:

I tried to construct a test case that I can run from these transcripts.

key=XDLRA--refs
mkdir $base/repo
cd $base/repo
git -c diff.ignoreSubmodules=none -C $base/repo init --bare
git config -z -l --show-origin
git -c diff.ignoreSubmodules=none config annex.private true
git -c diff.ignoreSubmodules=none annex init
git config -z -l --show-origin
git annex version --raw
git -c diff.ignoreSubmodules=none annex initremote origin type=directory directory=$remote encryption=none -c annex.dotfiles=true
git config -z -l --show-origin
git -c diff.ignoreSubmodules=none cat-file blob git-annex:remote.log
git -c diff.ignoreSubmodules=none annex fsck -f origin --fast --key $key -c annex.dotfiles=true
git -c diff.ignoreSubmodules=none annex drop --force --key $key -c annex.dotfiles=true
git -c diff.ignoreSubmodules=none annex get --key $key -c annex.dotfiles=true
git -c diff.ignoreSubmodules=none annex contentlocation $key -c annex.dotfiles=true
git -c diff.ignoreSubmodules=none annex drop --force --all -c annex.dotfiles=true

Now, this would normally fail at the fsck step, since no object file exists for XDLRA--refs. I suppose that you must have pre-populated the special remote with it, somehow, and left that part out. The fsck would then learn that the object does exist there.

It seems to me that how you prepopulated it might be a crucial detail, since that object file gets copied from the special remote, and perhaps something about its permissions etc. is what later prevents deleting the copy.

You're also using an external backend for whatever reason, and it would simplify the test case if that external backend were not needed and it just used a built-in backend. Using key=WORM--refs seems like it should behave the same, unless your external backend is doing something very strange. If the external backend is necessary to reproduce it, I will need some kind of minimal version of it.

So, I've tried pre-populating the file in the remote like this:

mkdir -p $remote/06b/f75/WORM--refs
echo hi > $remote/06b/f75/WORM--refs/WORM--refs

With that, I've gotten as far as getting the test case to succeed on Linux.

But I don't think it's worth the bother of getting a Windows environment set up and running these commands on it, without first knowing how you prepopulated the special remote with the object file.
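
A sketch of the setup those commands assume. The values of `$base` and `$remote` are placeholders (the transcript never defines them), and the `06b/f75` hash directories are simply the ones used in the commands above:

```shell
#!/bin/sh
# Hypothetical scratch locations; the transcript does not say what
# $base and $remote actually pointed to.
base=$(mktemp -d)
remote=$base/remote
key=WORM--refs

# Put an object file for the key into the directory special remote,
# using the hash directories from the commands above.
mkdir -p "$remote/06b/f75/$key"
echo hi > "$remote/06b/f75/$key/$key"

# Show ownership and permissions, since those may be what matters later.
ls -l "$remote/06b/f75/$key"
```

With these variables set, the transcript's commands can be run as-is, starting from `mkdir $base/repo`.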

Comment by joey Wed Jun 22 16:42:25 2022
My main use-case is actually the glob patterns; the first two use-cases. The other ones would just come as a bonus thanks to git-annex-matching-expressions.
Comment by Atemu Wed Jun 22 12:34:45 2022

The one caveat here is git annex sync --content --no-pull seems to ignore the no-pull option and tries to redownload all the raw files back to the laptop. How to do a one-way sync (laptop -> remote only with content)?

AFAICT, pull only concerns git actions, not git-annex actions like transferring content.

What your laptop's repo tries to copy depends on what content it wants and numcopies.

The latter probably isn't your problem since you should have at least 2 copies of everything distributed over the drives.
The former should ideally be solved with a proper customised groups setup, but you can also set a wanted expression or use the built-in groups, which is probably your best option. Your laptop's repo should be set to client or manual.

See https://git-annex.branchable.com/git-annex-preferred-content/ and https://git-annex.branchable.com/preferred_content/standard_groups/
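
As a rough sketch, putting the laptop's repo into one of the built-in groups could look like the following, run inside that repo. Whether `client` or `manual` fits better depends on your setup, so treat the choice here as an assumption:

```shell
# "here" refers to the repository the commands are run in.
git annex group here client      # or: git annex group here manual
git annex wanted here standard   # use the preferred content of the standard group

# Inspect what is now configured.
git annex group here
git annex wanted here
```
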

XMP sidecars that are produced are tracked in regular git

Have you set annex.largefiles to anything?

Sidecar files are just XML AFAICT, so a filter that would only add files with binary mimeencoding to annex would exclude these and add them to regular git.

If you haven't touched that setting (double check with git config), that's a bug.
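
A minimal sketch of checking and setting that option with git config. It uses a scratch repository so nothing real is touched, and `mimeencoding=binary` is just the kind of filter described above, not necessarily what you have. Note that `git config` will not show a value set in `.gitattributes` or with `git annex config`:

```shell
#!/bin/sh
# Scratch repository, used only for illustration.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Check whether annex.largefiles is set in git config.
git config --get annex.largefiles || echo "annex.largefiles is not set"

# Only files detected as binary go to the annex; text files such as
# XMP sidecars would be added to regular git instead.
git config annex.largefiles 'mimeencoding=binary'

largefiles=$(git config --get annex.largefiles)
echo "annex.largefiles = $largefiles"
```
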

Try to sync with any available remote (this is the part I can't seem to figure out. I wish git-annex could detect and try to push to all available remotes instead of me having to track which remotes are available). git annex [copy|move] have the --to=here option, which will copy/move files from remotes to the local repo. It would be exceptionally useful to have a --to=reachable option to send files to any reachable remote instead of having to copy/move to each remote individually.

By default, sync syncs with all available remotes.

How exactly are your repo's remotes set up?

What does it do when syncing that you don't want it to?

Comment by Atemu Wed Jun 22 12:24:21 2022
Thank you Joey! I will try to gain better insight into our dance in that test of ours. I am also curious why it worked before but doesn't work now. FWIW with datalad clone we do try the URL as is, and then also with /.git, to make users' lives easier so they can just e.g. datalad clone https://datasets.datalad.org.
Comment by yarikoptic Tue Jun 14 18:47:24 2022

The change that broke your test uses the exact same url uuid detection code that git-annex usually uses. So if you were using an url without .git before, it seems like it would have failed to detect a uuid after enabling the special remote.

If that reasoning does not hold, I'm going to need a test case to understand why. Unfortunately, it's hard for me to understand what your test case is doing.

I can't even get git to clone a non-bare git repo served over plain http without using .git in the url, so I don't understand how your test case is doing that.

When I manually make a git remote with a http url without .git (which git cannot pull from), I see the git-annex uuid detection code failing, as I would expect:

git remote add test http://localhost/~joey/tmp/foo/
git config annex.security.allowed-ip-addresses all
git-annex info

  Remote test not usable by git-annex; setting annex-ignore

  http://localhost/~joey/tmp/foo//config download failed: Not Found

And the initremote behavior is consistent with that:

git-annex initremote bar type=git location=http://localhost/~joey/tmp/foo/
initremote bar
  http://localhost/~joey/tmp/foo//config download failed: Not Found

git-annex: git repository does not have an annex uuid
failed

(It's worth noting that the change was made to handle another datalad use case, the bug "git is not working for unkn reason". I'd consider that use case unusual, and marginally worth supporting. I could of course revert the change, but then I'd have to close that bug as wontfix.)

Comment by joey Tue Jun 14 15:51:57 2022
Ok, thank you.
Comment by KachoOji Tue Jun 14 15:32:48 2022
The "heart" of the issue is that in initremote we provide a URL without the /.git suffix. That used to work just fine, and our test, which runs git annex get on the clone, passed.
Comment by yarikoptic Mon Jun 13 23:52:48 2022

Most of this can be done with either git-annex adjust --hide-missing or views.

The only thing that cannot is limiting files to those matching a glob, though views can limit files to the contents of directories.

Unfortunately, adjusted branches and views don't compose. There is a todo about that, unify adjust with view. I think that is what you're looking for, or very close.

Comment by joey Mon Jun 13 17:39:41 2022

Well, this is a case where git add is failing for some reason. It seems very likely that it output some error message to stderr, so I'd look for that in the log. It's not clear to me if your grep would show that.

If git add was silent on stderr, then I'd move on to thinking it must have crashed (or have a weird bug). The 123 exit code is from xargs and only indicates that git add exited nonzero, so we don't know what the actual exit code was. But it seems that git add did not die of a segfault; if it had, the exit code from xargs would be 125.
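
The two xargs exit codes can be compared with a small sketch like this one. It assumes GNU xargs (other implementations may report different codes) and uses `sh -c` as a stand-in for git add:

```shell
#!/bin/sh
# Assumes GNU xargs (findutils). "sh -c" stands in for git add.

# Case 1: the command exits nonzero (3 is an arbitrary choice).
echo file | xargs sh -c 'exit 3'
status_nonzero=$?

# Case 2: the command is killed by a signal (it sends SIGSEGV to itself).
echo file | xargs sh -c 'kill -s SEGV $$'
status_signal=$?

echo "command exited nonzero  -> xargs status $status_nonzero"
echo "command killed by signal -> xargs status $status_signal"
```

In the first case xargs hides the command's own exit code, which is why the 123 in the log says nothing about how git add actually exited.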

Since git-annex is batching up some number of files and sending them all to a single git add, the files that come last would fail to be added if any earlier file somehow caused git add to give up. So that could explain why it happens more often for some files.

Comment by joey Mon Jun 13 17:14:42 2022

There is not currently an easy way to do that. Although I suppose you could write a script to rename the files based on the filenames whereused outputs.

I do think this would be a good enhancement. I've made a todo git-annex-addunused-historical.

Comment by joey Mon Jun 13 17:06:21 2022