I do not know yet whether this is possible, since I have only cursory knowledge of the former grafts mechanism in git and of what came to replace it.
The use case: we have a git-annex repo (heudiconv converted MRI data) where some files with sensitive information were added directly to git rather than to git-annex. We do not want to rewrite the entire history, since that repository has already seen a good number of clones and forks. I wondered if we could:
- move those files under git-annex
- establish a new history from that new tree object
- "graft" the new history commit onto the old "sensitive" one, so as to establish correspondence between the two
- I would then expect something like "git pull --ff-only" to work for people, seamlessly jumping between those points
- git push only the new history to github, thus not revealing the old history with sensitive data under git
Joey, WDYT? any words of wisdom?
Well, git replace does turn out to be able to do this, with some caveats.
I started with this history:
Then amended 6df78e, following the usual process to convert git to annexed, as outlined in largefiles, producing commit f36b42e.
Then I ran
git replace 6df78e f36b42e
and the log changed to:
Note that, when further commits have been made on top of the bad commit, they all need to be replaced with amended commits as well.
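To make the effect of git replace concrete, here is a minimal sketch in a throwaway repository. The file names and contents are made up, and overwriting secret.txt with a placeholder only stands in for the real git-annex conversion, so treat it as an illustration rather than the exact procedure used above.

```shell
#!/bin/sh
# Sketch only: "secret.txt", "data.txt" and their contents are hypothetical,
# and overwriting the file is a stand-in for converting it to annexed.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "scan data" > data.txt
git add data.txt
git commit -q -m "add data"

# The bad commit: sensitive content goes straight into git.
echo "patient name" > secret.txt
git add secret.txt
git commit -q -m "add metadata"
bad=$(git rev-parse HEAD)

# Amend the bad commit so it no longer carries the sensitive content.
echo "placeholder" > secret.txt
git commit -q --amend -a -m "add metadata"
fixed=$(git rev-parse HEAD)

# From now on, reads of the bad commit return the fixed commit instead.
git replace "$bad" "$fixed"

git replace -l                               # lists the replaced object id
git cat-file -p "$bad"                       # content of the fixed commit
git --no-replace-objects cat-file -p "$bad"  # the original bad commit
```

The bad commit object is still present in the repository; the replace ref only changes what git shows when that object is read.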
I'd already pushed to origin before this. And origin still showed the bad commit in its log. But pushing the replace refs fixed that:
Now looking at the origin repo showed the same amended history.
In another clone, the old history is still visible, but that can be fixed by running:
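The exact commands did not survive in this thread, so the following is a sketch of one way to publish replace refs and pick them up in another clone, using an explicit refs/replace/* refspec. The repository layout, file name and contents are hypothetical, and the overwrite again stands in for the annex conversion.

```shell
#!/bin/sh
# Sketch only: a local bare repo plays the role of origin; names are made up.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git remote add origin "$work/origin.git"

echo "patient name" > secret.txt
git add secret.txt
git commit -q -m "add metadata"
bad=$(git rev-parse HEAD)
git push -q origin master            # the bad commit is already published

echo "placeholder" > secret.txt      # stand-in for converting to annexed
git commit -q --amend -a -m "add metadata"
fixed=$(git rev-parse HEAD)
git replace "$bad" "$fixed"

# Publish the replace refs; a plain push of master does not include them.
git push -q origin 'refs/replace/*:refs/replace/*'

# A fresh clone does not fetch refs/replace/* by default.
cd "$work"
git clone -q "$work/origin.git" other
cd other
git replace -l                       # prints nothing yet

# Fetch the replace refs explicitly to see the amended history.
git fetch -q origin 'refs/replace/*:refs/replace/*'
git replace -l                       # now lists the bad commit's id
```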
Then I deleted the git object for 6df78e from the origin repo, along with the blob object that contained the content of the file accidentally added to git.
This is where it fell down, because cloning from that origin repo then fails:
But, that can be worked around. Just need to make an additional commit on top of the replaced commit, so git clone will see a commit that has not been deleted. I did that, and:
So, you can do this if you're ok with clones needing to manually fetch the replace refs in order to access the replaced history.
And of course, existing clones need to be manually updated to fetch the replace refs. And probably ought to have the bad objects deleted out of their .git/objects/ to avoid accidental data leakage.
I'd also caution that, if the history you rewrite with git replace contains a lot of commits, the number of refs in refs/replace/* could get large, and a large number of git refs can be inefficient in various ways.
Oh, and I doubt github lets you delete files under .git/objects/ from a repo hosted there, so you might need to delete the repo and re-create it.
Given how github shares git object stores among forks, even that might not be enough to eliminate the objects from it.
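To illustrate the earlier point about the number of replace refs: when commits already sit on top of the bad one, each of them needs its own replace ref. The sketch below rebuilds such a chain with git rebase --onto and pairs old and new commits by position. That pairing is my assumption about how one might do it, not something taken from the steps above, and all file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: made-up files; overwriting secret.txt stands in for annexing.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "scan data" > data.txt
git add data.txt
git commit -q -m "add data"
base=$(git rev-parse HEAD)

echo "patient name" > secret.txt
git add secret.txt
git commit -q -m "add metadata"
bad=$(git rev-parse HEAD)

# Two further commits made on top of the bad one.
echo "more" >> data.txt
git commit -q -a -m "update data"
echo "even more" >> data.txt
git commit -q -a -m "update data again"
oldtip=$(git rev-parse HEAD)

# Amended copy of the bad commit on a side branch.
git checkout -q -b fixed "$bad"
echo "placeholder" > secret.txt
git commit -q --amend -a -m "add metadata"

# Replay the later commits on top of the amended commit.
git rebase -q --onto fixed "$bad" master

# Pair old and new commits in order (before any replace ref exists)
# and create one replace ref per rewritten commit.
old=$(git rev-list --reverse "$base..$oldtip")
new=$(git rev-list --reverse "$base..master")
set -- $new
for o in $old; do
    git replace "$o" "$1"
    shift
done

# One ref per rewritten commit; this count grows with the history.
git replace -l | wc -l
```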
A simpler approach is to make a redacted history, publish that, and locally replace the redacted history with the unredacted history.
git merge --squash unredacted-master
(and then convert the problem file to annexed before committing)
git replace master unredacted-master
That last step lets you access all your unredacted history locally, but pushes of further changes to master will still push the redacted history.
You can do the same git replace in each existing clone of the repository and keep on referring to the unredacted history in those, while publishing the redacted history.
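Here is a sketch of the redacted-history approach in a throwaway repository. Note the swap: instead of git merge --squash it builds the single redacted commit with git commit-tree, and the conversion of the problem file happens in a commit on unredacted-master beforehand, so both histories end in the same tree. File names are hypothetical and the overwrite stands in for the annex conversion.

```shell
#!/bin/sh
# Sketch only: made-up names; git commit-tree is swapped in for the
# "git merge --squash" step, and a local bare repo plays origin.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "patient name" > secret.txt
git add secret.txt
git commit -q -m "add metadata"
secret_blob=$(git rev-parse HEAD:secret.txt)

# Stand-in for converting the problem file to annexed.
echo "placeholder" > secret.txt
git commit -q -a -m "convert secret.txt"

# Keep the full history under another name.
git branch -m master unredacted-master

# One parentless commit carrying the final (already converted) tree.
redacted=$(git commit-tree -m "redacted history" "unredacted-master^{tree}")
git branch master "$redacted"
git checkout -q master

# Locally, reads of the redacted commit return the unredacted tip.
git replace master unredacted-master

git log --format=%s master                        # full history, locally
git --no-replace-objects log --format=%s master   # what actually gets pushed

# Pushing transfers the real objects, so only the redacted commit goes out.
git push -q "$work/origin.git" master
```

Because the replace ref stays in refs/replace/ and is never pushed here, the bare repository ends up with just the single redacted commit and never receives the sensitive blob.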
I wrote up a tip with an even more polished version of that: redacting history by converting git files to annexed.