I have a large git repository with binary files scattered over different branches. I want to switch to git-annex, mainly for performance reasons, but I don't want to lose my history.
I tried to rewrite the (cloned) repository with git-filter-branch but failed miserably for several reasons:
- --tree-filter performs its operations in a temporary directory (.git-rewrite/t/) so the symlinks point to the wrong destination (../../.git/annex/).
- annex log files are stored in .git-annex/ instead of .git-rewrite/t/.git-annex/ so the filter operation misses them
Any suggestions on how to proceed?
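The first problem can be reproduced with plain directories, without git-annex or filter-branch. This is only a sketch: the directory layout mimics .git-rewrite/t/, and KEY is a made-up stand-in for an annexed object.

```shell
#!/bin/sh
# Reproduce the dangling-symlink problem with plain directories.
set -e
# Nest the fake repository two levels deep so "../.." stays inside the temp dir.
repo=$(mktemp -d)/work/repo
mkdir -p "$repo/.git/annex" "$repo/.git-rewrite/t"
echo "payload" > "$repo/.git/annex/KEY"   # stand-in for an annexed object

# Link as created inside the temporary tree: two levels up is the repo top.
ln -s ../../.git/annex/KEY "$repo/.git-rewrite/t/foo.rpm"
# The same link text at the repo top: two levels up leaves the repository.
ln -s ../../.git/annex/KEY "$repo/foo.rpm"

in_tmp=broken; [ -e "$repo/.git-rewrite/t/foo.rpm" ] && in_tmp=ok
at_top=broken; [ -e "$repo/foo.rpm" ] && at_top=ok
echo "in .git-rewrite/t: $in_tmp"
echo "at repository top: $at_top"
```

The link resolves inside the temporary tree but dangles once the same link text sits at the top of the repository.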
EDIT 3/2/2010
I finally got it working for my purposes. The hardest part was preserving the branches while injecting the new git annex setup base commit.
Clone repository
git clone original migrate
cd migrate
git checkout mybranch
git checkout master
git remote rm origin
Inject git annex setup base commit and repair branches
git symbolic-ref HEAD refs/heads/newroot
git rm --cached *
git clean -f -d
git annex init master
echo \*.rpm annex.backend=SHA1 >> .gitattributes
git commit -m "store rpms in git annex" .gitattributes
git cherry-pick $(git rev-list --reverse master | head -1)
git rebase --onto newroot newroot master
git rebase --onto master mybranch~1 mybranch
git branch -d newroot
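Before starting the migration it may be worth checking that every branch now descends from the injected base commit. A possible check is sketched here; same_root is a hypothetical helper name, and it assumes a git version whose rev-list supports --max-parents.

```shell
#!/bin/sh
# same_root BRANCH...: succeed when all given branches share exactly one
# root commit (a commit without parents).
same_root() {
    count=$(for b in "$@"; do
                git rev-list --max-parents=0 "$b"
            done | sort -u | wc -l)
    # Arithmetic expansion strips the padding some wc implementations add.
    [ "$((count))" -eq 1 ]
}
```

For the branches used above, "same_root master mybranch" should succeed once both rebases have gone through.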
Migrate repository
mkdir .temp
cp .git-annex/* .temp/
MYWORKDIR=$(pwd) git filter-branch \
--tag-name-filter cat \
--tree-filter '
    # Restore the annex log files collected so far into the temporary tree.
    mkdir -p .git-annex;
    cp "${MYWORKDIR}"/.temp/* .git-annex/;
    for rpm in $(git ls-files | grep "\.rpm$"); do
        echo;
        git annex add "$rpm";
        annexdest=$(readlink "$rpm");
        log=$(basename "$annexdest").log;
        if [ -e ".git-annex/$log" ]; then
            echo "FOUND $log";
        else
            echo "COPY $log";
            cp "${MYWORKDIR}/.git-annex/$log" .git-annex/;
            cp "${MYWORKDIR}/.git-annex/$log" "${MYWORKDIR}/.temp/";
        fi;
        # Drop the two extra levels added by running inside .git-rewrite/t/.
        ln -sf "${annexdest#../../}" "$rpm";
    done;
    git reset HEAD .git-rewrite;
    :
' -- $(git branch | cut -c 3-)
rm -rf .temp
git reset --hard
TODO:
- Find a way to repair branches automatically (detect branch points and run the appropriate git rebase commands)
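A possible starting point for the branch-point detection is git merge-base. The following is a rough sketch, not a finished solution: list_branch_points is a hypothetical helper, it assumes there are no merges between the branches, and it has to run before master is rewritten, while the branches still share history with it.

```shell
#!/bin/sh
# list_branch_points [BASE]: for every local branch except BASE (default:
# master), print "<branch> <commit where it forked from BASE>".
list_branch_points() {
    base=${1:-master}
    git for-each-ref --format='%(refname:short)' refs/heads/ |
    while read -r branch; do
        [ "$branch" = "$base" ] && continue
        # Skip branches that share no history with the base.
        fork=$(git merge-base "$base" "$branch") || continue
        echo "$branch $fork"
    done
}
```

Each output line names the upstream argument for a "git rebase --onto" call. Mapping the old fork point to its counterpart on the rebased master is left open here.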
I'll be happy to try any suggestions to improve this migration script.
P.S. Is there a way to edit comments?
I don't know how to approach this yet, but I support the idea -- it would be great if there was a tool that could punch files out of git history and put them in the annex. (Of course with typical git history rewriting caveats.)
Sounds like it might be enough to add a switch to git-annex that overrides where it considers the top of the git repository to be?
My current workflow looks like this (I'm still experimenting):
Create backup clone for migration
Inject git annex initialization at repository base
Start migration with tree filter
There are still some drawbacks:
It should be sufficient to honor the GIT_DIR/GIT_WORK_TREE/GIT_INDEX_FILE environment variables. git filter-branch sets GIT_WORK_TREE to ., but this can be mitigated by starting the filter script with 'GIT_WORK_TREE=$(pwd $GIT_WORK_TREE)'. For example, with GIT_DIR=/home/tyger/repo/.git and GIT_WORK_TREE=/home/tyger/repo/.git-rewrite/t, git annex should be able to compute the correct relative path, or perhaps use absolute paths in symlinks.
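One way to compute a link target that does not depend on where the temporary work tree lives is to derive it from the file's path relative to the repository top. A minimal sketch: annex_link_prefix is a hypothetical helper, not part of git-annex, KEY is a placeholder, and the .git/annex/ location is taken from the symlink destination quoted in the original post.

```shell
#!/bin/sh
# annex_link_prefix PATH: given a file path relative to the repository top,
# print the "../" prefix that leads from the file's directory back to the top.
annex_link_prefix() {
    path=$1
    prefix=
    # Each remaining "/" is one directory level to climb.
    while [ "${path#*/}" != "$path" ]; do
        prefix="../$prefix"
        path=${path#*/}
    done
    printf '%s' "$prefix"
}

# KEY is a placeholder for the annexed object name.
for f in foo.rpm rpms/x86_64/foo.rpm; do
    echo "$f -> $(annex_link_prefix "$f").git/annex/KEY"
done
```

A file at the top level gets the target .git/annex/KEY, and a file two directories down gets ../../.git/annex/KEY, regardless of the directory the filter runs in.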
Another problem I observed is that git annex add automatically commits the symlink; this behaviour doesn't work well with --tree-filter. git annex commits the wrong path (.git-rewrite/t/LINK instead of LINK). Also, --tree-filter doesn't expect the filter script to commit anything; new files in the temporary work tree are committed by filter-branch on each iteration of the filter script (and missing files are removed).