I have two sets of git annex repositories:
A: Contains a tree of some files and then a huge chunk of files
B: Already contains a ton of files
Both repository sets have their own set of repos on different machines that are connected to one another.
I now want the huge chunk of content in A to be in B instead of A but the rest of the content should remain in A.
On the git side, I think what this basically boils down to is moving the git subtree to the other repo, which ought to be possible by simply git add-ing the files. I might want to get fancy and cherry-pick the commit from the other repo in a way that keeps it but moves the content into another subdirectory, but that's the simple part.
On the git-annex side, I want all the keys that were exclusively referenced in this set of files to be marked as dead and dropped from the remotes that contain them. I effectively want to remove those keys from git-annex's consideration.
The keys should be added to B. They should start out with a blank location tracking state as B should not know about A's repositories. They should however retain any other metadata they previously had in A.
How could I achieve this?
I do this quite often because I use a monorepo approach with regular refactoring of subtrees into their own submodules. I have yet to find a bulletproof way to do this on the git-annex side.
The first step is as simple as running git annex unannex in A (or adding --include "*" if pattern matching is easier).

On the git side, this logs the files as deleted from the main repo (src, let's call her). This is ideal, since it leaves you a record (with a descriptive commit message) of where you've moved your files to.

On the git-annex side (once you commit), the file data will eventually become "unused": you'll have to do some combination of git annex push and git annex sync [--cleanup] to ensure that no branch still references those files (including remote branches and synced/* branches).

Now the question is: how do we get the data into the new repo (dst) and safely drop it from src?

1. Add dst as a remote of src and pull only dst's git-annex branch, which (after moving, re-annexing, and committing the unannexed files to dst) now shows as having a copy of those files. (Warning: this has bad side-effects.)
2. Merge src's git-annex branch into dst so that dst can move any (used) files from src. (Warning: this has bad side-effects.)
3. Add dst as a remote and move the unused files over. (This requires an already clean unused stack, and doing the push/sync steps correctly and fully before the files can be released.)
4. Copy the files from dst back to src first, then move them over to dst. (Required because, per dst's knowledge, it has no record of src having any keys. I find it logical, albeit sad, that git-annex can't dynamically poll local repos' annexes for file content.)

Conclusions
- Keep your unused stack empty (i.e. git annex unused gives nothing) as much as you can, and clean it out before testing any sort of move/drop operation like this.
- Option 4, step by step (gx being an alias for git annex): gx unannex in src; add src as a remote in dst; mv the files into dst; gx add the files in dst; gx copy the files from dst back to src; then run gx move -f <src> in dst.
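The Option 4 sequence above can be sketched in shell as two hypothetical helpers, retire_in_src and import_into_dst (illustrative names, not git-annex commands). This is a minimal sketch under a few assumptions: src is a non-bare repo reachable from dst by a local path, the remote in dst is called src, and git-annex can discover src's UUID from that path. It uses git annex sync followed by git annex sync --cleanup as one possible "push/sync" combination, and spells gx move -f with the long form --from.

```sh
# Sketch of Option 4. retire_in_src and import_into_dst are hypothetical helper names.

# Run from the root of src. $1 = subtree to hand over, relative to src's root.
retire_in_src() (
  set -e
  subtree=$1
  # Put plain files back in place of the annex symlinks; git records the files as deleted
  git annex unannex "$subtree"
  git commit -m "Move $subtree out to its own repository"
  # Propagate the commit, then delete the synced/* branches that could keep the keys "used"
  git annex sync
  git annex sync --cleanup
  # The handed-over keys should now show up here, unless some branch still references them
  git annex unused
)

# Run from the root of dst.
# $1 = path to src, $2 = subtree path inside src, $3 = new location in dst (must not exist yet)
import_into_dst() (
  set -e
  src_path=$1
  subtree=$2
  target=$3
  git remote add src "$src_path"
  # Bring the now-plain files over and annex them in dst
  mv "$src_path/$subtree" "$target"
  git annex add "$target"
  git commit -m "Import $subtree from src"
  # src's object store still holds these keys, so this should mostly just record that src has them...
  git annex copy --to src "$target"
  # ...which lets this drop them from src while dst keeps its copy
  git annex move --from src "$target"
)
```

For example, (cd src && retire_in_src big-chunk) followed by (cd dst && import_into_dst ../src big-chunk big-chunk), where big-chunk stands in for the real subtree name.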
If it so happens that one of these files is actually duplicate data with something you also want to keep in src, the final move will drop it and leave no record in src of where it went (besides your git commit message).

As described, there are still side effects with Option 4, but it's the best option I've devised so far. Oh, and if you want to keep src around as a remote on dst, e.g. to remind yourself of various relations, make sure you configure it in .git/config with remote.<name>.annex-sync=false, which makes git annex sync skip it. You may also want to remove its remote.<name>.fetch refspec, or add remote.<name>.skipFetchAll=true; this keeps git fetch (in particular git fetch --all) from pulling in all of its branches and unrelated objects.

Now, what happens if a side-effect does happen and it looks like you lost some content without knowing where it went? git annex whereis is no help. Instead, you have to extract the key from the now-broken symlink and run find <> -type f -iname "<KEY>".
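That search can be wrapped in a small hypothetical helper, find_annex_key. It assumes the usual annex symlink layout, where the last component of the link target is the key itself (…/objects/<hash dirs>/<KEY>/<KEY>), and it takes the search root as an argument because the find command above leaves it open as <>; -iname is kept from that command.

```sh
# Hypothetical helper: locate the content behind a broken annex symlink.
# $1 = the broken symlink, $2 = directory tree to search (e.g. another repo's .git/annex/objects)
find_annex_key() (
  # readlink prints the link target even when it no longer resolves;
  # the last path component of an annex symlink target is the key
  key=$(basename "$(readlink "$1")")
  # Objects are stored as files named after their key (inside a directory of the same name),
  # so -type f skips the directory and any symlinks
  find "$2" -type f -iname "$key"
)
```

For example, find_annex_key path/to/lost-file ../dst/.git/annex/objects prints the path of any object file with that key under dst.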
Easy enough, but kind of scary when it happens to you.

Side-Effects of Option 1+2: git-annex synchronization

DON'T DEAD OPEN INSIDE
While this is currently the only way to propagate annex key information, it has bad side-effects:

- Both repos' remotes and location history get mixed together through the git-annex branch. For me this is a no-go because I have redundant remotes (an exporttree called dropbox, in my case).
- If you mark these remotes or repos dead and, by coincidence, the git-annex branch is later absorbed in the other direction, chaos ensues (dead is propagated, and remote annex key history is killed: especially gross for export/import tree remotes).
- Getting out of that takes dead, then forget --drop-dead, then semitrust UUID. Many steps, potentially undefined condition. Gross.

Potential Feature Requests
Ideally, I would wish git-annex could intelligently scan another repo's annex and populate information about what keys it has, based simply on which keys are objectively in its .git/annex/objects. This would pull in the information we care about without cluttering it with information relevant only to each respective repo. Then, presuming you've set up a remote (dst) pointing to this repo (src) and run git annex info, src should have a list of keys that are inside dst, gx whereis from src will identify the keys inside dst, and drop will happily drop them.

- Such a remote would be an acquaintance repo that is not allowed to be synced, pulled, fetched or pushed to.
- On gx forget, the list of keys is wiped.

Turns out, the answer is simple:
1. In A: git rm -r --cached "B"
2. In the new repo (B): git add .
3. git remote add tmp.parent <relpath/from/B/root/to/A/root>
4. git annex get
5. git remote remove tmp.parent

That's all it takes if you just need the files moved around.
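The five steps above can be strung together as a hypothetical helper, split_subtree, run from A's root. It is a sketch under stated assumptions: B is a plain subdirectory of A that is not yet its own repository (hence the git init and git annex init lines, which the steps above do not list), and git add . is used for the git add step. It relies on the observation below that git annex get can find the content in tmp.parent even though B's location log knows nothing about it. Committing in A and B is left to you.

```sh
# Hypothetical helper: split subdirectory B of repo A into its own annex repo.
# $1 = B, relative to A's root; $2 = path from B's root back to A's root (e.g. "..")
split_subtree() (
  set -e
  b=$1
  to_a=$2
  # In A: stop tracking B but keep its files in the work tree (-r because B is a directory)
  git rm -r --cached "$b"
  cd "$b"
  # Assumption: B is not a repository yet, so create one with an annex
  git init
  git annex init
  # In B: track the files
  git add .
  # Point B at A temporarily and fetch the content from there
  git remote add tmp.parent "$to_a"
  git annex get .
  git remote remove tmp.parent
)
```

For example, split_subtree B .. when B sits directly inside A.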
I haven't used metadata, so I can't comment on how to move it around, but you might have to rely on something akin to my first comment. In my brief testing, because metadata is stored in the git-annex branch on a per-key level, transferring it does in fact require merging the git-annex branch somehow.

In short:
git-annex can get file content in both an informed and an uninformed way. If git-annex knows about content in a repo because of historic moves/copies to it or merging of git-annex branches, it has informed knowledge of what's in certain remotes. If it does not, it can still do an uninformed query for potential file content. In this way, e.g., git annex info and git annex list may show file content as not being in a particular remote, while a git annex get or git annex move may actually still work.
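To see the difference between informed and uninformed knowledge for a single file, one can compare what git annex whereis reports (it consults the location log) with what is physically present in another local repo's annex. holds_content below is a hypothetical helper, not a git-annex command. It assumes the other repo is a non-bare local repo whose objects live under .git/annex/objects, and it uses git annex lookupkey to get the file's key.

```sh
# Hypothetical helper: succeed if a local repo physically holds a file's annexed content,
# whatever the current repo's location log says.
# $1 = annexed file in the current repo, $2 = path to the other (non-bare) repo
holds_content() (
  key=$(git annex lookupkey "$1") || exit 1
  # Each object is stored in a file named after its key
  [ -n "$(find "$2/.git/annex/objects" -type f -name "$key" 2>/dev/null)" ]
)
```

For example, run git annex whereis some/file and then holds_content some/file ../A && echo "A has it": the first may not list A while the second still succeeds.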