In DataLad we have a special mode for cloning git-annex repos, `--reckless=ephemeral`, which we discussed with you, Joey, a while back. It is a solution for throwaway temporary copies of repos used for processing, so that we do not need to fetch all the TBs of data already present on the local drive.
One gotcha is that populating `.git/annex` with new keys in such a clone does not inform the original repo about those changes. We then eventually need to run `git annex fsck` in the original location so that it realizes it has all those possibly new keys. At times that can take quite a while.
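For illustration only, here is a minimal Python sketch of the cheap part of what that `fsck` run has to discover: which keys now have content files in the shared object store. It assumes the usual non-bare layout, where each object lives at `.git/annex/objects/XX/YY/KEY/KEY`. The function names and the `recorded_keys` input (whatever the original repo already believes it has) are hypothetical. Unlike `fsck`, it does not verify checksums.

```python
import os


def keys_on_disk(objects_dir):
    """Return the set of annex keys whose content file exists under objects_dir.

    Assumes the non-bare layout .git/annex/objects/XX/YY/KEY/KEY, i.e. each
    content file sits in a directory named after its own key.
    """
    keys = set()
    for dirpath, _dirnames, filenames in os.walk(objects_dir):
        key_dir = os.path.basename(dirpath)
        # A content file named like its parent directory is an annex object.
        if key_dir in filenames:
            keys.add(key_dir)
    return keys


def unrecorded_keys(objects_dir, recorded_keys):
    """Hypothetical helper: keys present on disk but not yet recorded as present.

    recorded_keys is an assumed input: whatever key set the original repo
    already knows about, obtained by some other means.
    """
    return keys_on_disk(objects_dir) - set(recorded_keys)
```

This only walks directories, so it should be much faster than re-hashing TBs of content. Any keys it reports could then be recorded in the original repo, e.g. with git-annex's `setpresentkey` plumbing command. The trade-off is that this skips the content verification that `fsck` performs.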
I wondered if maybe git-annex could gain some "native" support for this use case. That would avoid the need for `annex fsck` and could immediately reflect changes in availability. This could happen either in the reckless clone (e.g. if it knows the UUID of the original, as stored in an `annex.orig-uuid` config) or even in the original repo (by following the symlink, or via some dedicated `annex.orig-path` config variable). WDYT, Joey?
doh, forgot to add an example of the mode of operation I am talking about.
Here is the script,
which at the end produces
where the
124
file was annex-added in the reckless clone.

Is there any reason you don't initialize the clone with the same UUID as the parent remote? That seems to me like it would make sense, since they are the same git-annex repository.
Can you refresh my memory of where we discussed this `--reckless=ephemeral` hack? I can't find it discussed by that name anywhere in git-annex or in the mail archives. I just want to understand the motivation for doing that, and why other approaches were not considered.
It occurs to me that this is very similar to `git worktree`. The difference is that you can make whatever changes you like to git branches in this "ephemeral clone" without affecting the parent repository. But as far as git-annex is concerned, in both cases there are two git repositories that share their `.git/annex` and so are essentially the same git-annex repository.