Add annex.orig-uuid config for 'ephemeral clones'

In DataLad we have a special mode for cloning git-annex repos, --reckless=ephemeral, which we discussed with you, Joey, a while back as a solution for throw-away temporary copies of repos used for processing, so that we would not need to fetch TBs of data that are already present on the local drive.

One gotcha is that when the clone populates .git/annex with new keys, the original repository is not informed of those changes. We then eventually need to run git annex fsck in the original location so that it realizes it now has all those possibly new keys, and that can at times take quite a while.

I wondered if maybe git-annex could gain some "native" support for this use case, which would avoid the need for annex fsck and could immediately reflect changes in availability, either in the reckless clone (e.g. if it knows the UUID of the original repo, stored in an annex.orig-uuid config), or even in the original repo (by following the symlink, or via a dedicated annex.orig-path config variable). WDYT, Joey?
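To make the proposal a bit more concrete, here is a minimal sketch of what the clone would need to record. annex.orig-uuid and annex.orig-path are only the names proposed above: current git-annex does not read them, so setting them changes nothing by itself. record_orig is a hypothetical helper.

```shell
#!/bin/bash
# Sketch only: annex.orig-uuid / annex.orig-path are PROPOSED config names,
# current git-annex does not understand them.
set -eu

# record_orig ORIG_DIR CLONE_DIR  (hypothetical helper)
# Copies the original repo's annex.uuid and its absolute path into the
# clone's .git/config under the proposed keys.
record_orig() {
    local orig clone uuid
    orig=$(cd "$1" && pwd)                      # absolute path of the original
    clone=$2
    uuid=$(git -C "$orig" config annex.uuid)    # set by `git annex init`
    git -C "$clone" config annex.orig-uuid "$uuid"
    git -C "$clone" config annex.orig-path "$orig"
}
```

For the layout in the script from comment 1 it would be called from the temporary directory as `record_orig orig ephemeral`. Storing an absolute path keeps annex.orig-path valid regardless of where git-annex is later invoked from.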


comment 1

Doh, forgot to add an example of the mode of operation I am talking about.

Here is the script:

```shell
#!/bin/bash
set -ex

cd "$(mktemp -d "${TMPDIR:-/tmp}/dl-XXXXXXX")"

mkdir orig
( cd orig; git init; git annex init; echo 123 > 123; git annex add 123; git commit -m 123 123; )

git clone orig ephemeral

(
cd ephemeral

# making it ephemeral: share the original's .git/annex object store
ln -s ../../orig/.git/annex .git/annex

git annex init

echo 124 > 124
git annex add 124
git commit -m 124 124

git annex whereis 124
)

(
cd orig

git remote add ephemeral ../ephemeral
# assumes the default branch is named master
git pull ephemeral master

: it would still not know that it got 124
git annex whereis 124 || echo "exited with $?"

git annex fsck

: but would know now after fsck
git annex whereis 124
)
```

Running it produces, at the end:

```
+ : it would still not know that it got 124
+ git annex whereis 124
whereis 124 (0 copies) failed
whereis: 1 failed
+ echo 'exited with 1'
exited with 1
+ git annex fsck
fsck 123 (checksum...) ok
fsck 124 (fixing location log) (checksum...) ok
(recording state in git...)
+ : but would know now after fsck
+ git annex whereis 124
whereis 124 (1 copy) 
    a813ca99-ce43-4e57-b7d9-c3a1456c6b55 -- yoh@lena:~/.tmp/dl-hlNOqBM/orig [here]
ok
```

where file 124 was annex-added in the reckless clone.

Comment by yarikoptic — Fri Mar 10 02:52:08 2023


comment 2

Is there any reason you don't initialize the clone with the same uuid as the parent remote? That seems to me like it would make sense, since they are the same git-annex repository.
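One way to act on that suggestion, sketched with the orig/ephemeral layout from the script in comment 1: instead of `git annex init`, the clone is initialized with `git annex reinit` and the original's UUID, so keys added in the clone get logged as present in the original repository. make_ephemeral is a hypothetical helper, and the sketch assumes the original has already been `git annex init`ed.

```shell
#!/bin/bash
set -eu

# make_ephemeral ORIG_DIR CLONE_DIR  (hypothetical helper)
make_ephemeral() {
    local orig=$1 clone=$2 uuid
    uuid=$(git -C "$orig" config annex.uuid)
    git clone -q "$orig" "$clone"
    # Share the object store: the clone's .git/annex points at the original's.
    rm -rf "$clone/.git/annex"
    ln -s "$(cd "$orig" && pwd)/.git/annex" "$clone/.git/annex"
    # Reuse the original's UUID instead of generating a new one.
    (cd "$clone" && git annex reinit "$uuid")
}
```

Afterwards, keys added in the clone are recorded under the original's UUID, so once the clone's git-annex branch reaches the original (e.g. via `git fetch ephemeral` there, which git-annex then merges automatically by default), `whereis` there should report them without an fsck.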

Can you refresh my memory of where we discussed this --reckless=ephemeral hack? I can't find it discussed by that name anywhere in git-annex or the mail archives. I just want to understand the motivation for doing that, and why other approaches were not considered.

Comment by joey — Fri Mar 10 15:43:58 2023


comment 3

Occurs to me that this is very similar to git worktree. The difference is that you can make whatever changes to git branches in this "ephemeral clone" without affecting the parent repository. But as far as git-annex is concerned, in both cases there are two git repositories that share their .git/annex and so are essentially the same git-annex repository.
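For comparison, a sketch of the same experiment done with a worktree, assuming a git-annex version with git worktree support: the worktree uses the main repository's .git/annex and its git-annex branch directly, so the original knows about the new key without any fsck or merge. demo_worktree, the branch name scratch and the directory names are placeholders.

```shell
#!/bin/bash
set -eu

# demo_worktree ORIG_DIR WORK_DIR  (hypothetical helper)
demo_worktree() {
    local orig=$1 work=$2 key
    # New branch "scratch" checked out in a separate worktree; it shares
    # orig's object store and git-annex branch.
    git -C "$orig" worktree add -b scratch "$work"
    (cd "$work" && echo 124 > 124 && git annex add -q 124 && git commit -q -m 124 124)
    key=$(git -C "$work" annex lookupkey 124)
    # Asked from the original repository; exits non-zero if 0 copies are known.
    git -C "$orig" annex whereis --key="$key"
}
```

Because file 124 only exists on the scratch branch, the original is queried by key rather than by filename.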