Please describe the problem.
I have a main annex with ~2TB of data. In the past I was using SHA256, then I migrated to SHA256E. Recently it was becoming quite full, so I took some spare HDs, cloned it, and moved data from the main annex to the spares. To my surprise, the main annex disk usage did not go down a bit.
It took me some time to understand why. The problem is exemplified by the shell script http://mennucc1.debian.net/git-annex/git-annex-no-dedup.sh .
In short, if an annex is migrated to a new backend and afterwards files are moved, then the hardlinks are broken, and disk usage doubles.
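The effect can be illustrated outside git-annex with plain files. This is only a minimal Python sketch, not the linked shell script: `old_key` and `new_key` are hypothetical stand-ins for the SHA256 and SHA256E objects, and the copy steps stand in for content leaving the repository and coming back.

```python
import os
import shutil
import tempfile


def same_inode(a, b):
    """Return True if both paths are names for the same file on disk."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo():
    with tempfile.TemporaryDirectory() as d:
        old_key = os.path.join(d, "old_key")  # hypothetical SHA256 object
        new_key = os.path.join(d, "new_key")  # hypothetical SHA256E object
        with open(old_key, "wb") as f:
            f.write(b"x" * 4096)

        # After migration: two names, one inode, content stored once.
        os.link(old_key, new_key)
        linked_after_migrate = same_inode(old_key, new_key)

        # Moving the new key away copies the content and removes one name;
        # the old name still pins the data, so no space is freed here.
        away = os.path.join(d, "away")
        shutil.copy2(new_key, away)
        os.remove(new_key)
        nlink_after_move = os.stat(old_key).st_nlink

        # Bringing the content back creates a fresh inode: stored twice now.
        shutil.copy2(away, new_key)
        os.remove(away)
        linked_after_return = same_inode(old_key, new_key)

        return linked_after_migrate, nlink_after_move, linked_after_return


if __name__ == "__main__":
    print(demo())
```

The first value shows the shared inode right after the simulated migration, the second shows that the old name still holds the data once the new name is gone, and the third shows that the returned copy no longer shares storage with the old name.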
What steps will reproduce the problem?
Run the script linked above.
What version of git-annex are you using? On what operating system?
5.20141125 on Debian Jessie amd64
Please provide any additional information below.
Of course, a simple solution would be to drop all unused files. This is ugly, though, because it does not distinguish between (1) unused files that are previous copies of files I care about and (2) unused files that are due to the problem described in the example, and that I do not care about.
A more complex but more elegant solution would be:
(a) when a file is migrated, the old and new objects in the annex are hardlinked; moreover, two symlinks should be created, so that git-annex knows at a glance which two files are hardlinked (see http://mennucc1.debian.net/git-annex/cross_links.txt for an example)
(b) when moving or copying files, all hardlinked versions would be moved/copied
(c) when dropping, an option may be used to specify whether all hardlinked versions should be dropped together
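As a rough illustration of how old and new keys could be matched without extra symlinks, here is a Python sketch that pairs key names by size and digest. It assumes the usual key layout `BACKEND-s<size>--<hexdigest>` with an optional extension for SHA256E, and it ignores keys carrying other fields; the function name `pair_migrated_keys` is made up for this example and is not part of git-annex.

```python
import re
from collections import defaultdict

# Assumed key layout: BACKEND-s<size>--<64 hex chars>[.extension]
KEY_RE = re.compile(r"^(SHA256E?)-s(\d+)--([0-9a-f]{64})(\.[^/]*)?$")


def pair_migrated_keys(key_names):
    """Group SHA256 and SHA256E key names that describe the same content."""
    groups = defaultdict(list)
    for name in key_names:
        m = KEY_RE.match(name)
        if m is None:
            continue  # not a key this sketch understands
        _backend, size, digest, _ext = m.groups()
        groups[(int(size), digest)].append(name)
    # Only groups with more than one name are candidates for linking.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}
```

Each returned group lists the key names that should share one inode if the hardlinks were still intact.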
bye
and thanks, A.
ps
I tried to attach two files to this bug report but failed
I don't feel that migration is an important enough feature to complicate the rest of git-annex with special handling of multiple keys that point to the same content.
You could have used mv in your use case to move the repo to the new drive while preserving hard links.
What might be useful is for git annex migrate to write a list of the old keys someplace. These could then be dropped when the user wants to get rid of them, without mixing them up with other unused files. Although if you care about old versions of files and don't want to drop them as unused, it seems to me you'd also want to be able to access the old keys from before the migration.
Hi Joey, thanks for your interest;
I created a script to address this problem
http://mennucc1.debian.net/git-annex/git-annex_de-re-link_hash-E
It can relink keys that are not hardlinked anymore (-L option); it can use an unused key to recreate a missing key (-R option); it can also drop redundant keys (-D option).
a.
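For readers who want to see what relinking amounts to at the filesystem level, here is a minimal Python sketch. It is not the script linked above and does not reproduce its options; the function name `relink` and the temporary file suffix are made up for this example. It also does not handle the read-only permissions git-annex sets on object files and directories, which would have to be lifted first.

```python
import filecmp
import os


def relink(keep, duplicate):
    """Replace `duplicate` with a hard link to `keep` if the contents match.

    Returns True if a link was made, False if both were already one inode.
    """
    sk, sd = os.stat(keep), os.stat(duplicate)
    if (sk.st_dev, sk.st_ino) == (sd.st_dev, sd.st_ino):
        return False  # already hardlinked
    if sk.st_dev != sd.st_dev:
        raise OSError("hard links cannot cross filesystems")
    # Compare full contents, not just size and mtime, before discarding data.
    if not filecmp.cmp(keep, duplicate, shallow=False):
        raise ValueError("contents differ; refusing to relink")
    tmp = duplicate + ".relink-tmp"  # hypothetical temporary name
    os.link(keep, tmp)               # new name for the kept inode
    os.replace(tmp, duplicate)       # atomically swap it into place
    return True
```

Linking to a temporary name and then renaming means `duplicate` always exists, even if the process is interrupted halfway.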