It's quite hard to delete a file from a git repository once it's checked in and pushed to origin. This is normally ok, since git repositories contain mostly small files, and a good thing since losing hard work stinks.
With git-annex this changes some: Very large files can be managed with git-annex, and it's not uncommon to be done with such a file and want to delete it. So, git-annex provides a number of ways to handle this, while still trying to avoid accidental foot shooting that would lose the last copy of an important file.
the garbage collecting method
In this method, you just remove annexed files whenever you want, and commit the changes. This is probably the most natural way to go.
You can do this the same way you would in a regular git repository. For example, git rm foo; git commit -m "removed foo"
. This leaves the contents of the files still in the annex, not really deleted yet.
Either way, deleting files can leave some garbage lying around in either the local repository, or other repositories that contained a copy of the content of the file you deleted. Eventually you'll want to free up some disk space used by one of these repositories, and then it's time to take out the garbage.
To collect the garbage, you can run git annex unused
inside the repository which you want to slim down. That will list files stored in the annex that are not used by any git branches or tags. Followed by git annex dropunused 1-10
to delete a range of the unused files from the annex.
In recent versions of git-annex, git annex dropunused
checks that enough other copies of a file's content exist in other repositories before deleting it, so this won't ever delete the last copy of some file. This is a good default, because these unused files are still referred to by some commits in the git history, and you might want to retain the full history of every version of a file.
But, let's say you don't care about that, you only want to keep files that are in use by branches and tags. Then you can use git annex dropunused --force
with a range of files, which will delete them even if it's the last copy.
Finally, sometimes you want to remove unused files from a special remote. To accomplish this, pass --from remotename
to the unused and dropunused commands, and they will act on
files stored in that remote, rather than on the local repository.
let the assistant take care of it
If you're using the git-annex assistant, you don't normally need to worry about this. Just delete files however you normally would. The assistant will try to migrate unused file contents away from your local repository and store them in whatever backup repositories you've set up.
delete all the copies method
You have a file. You want that file to immediately vanish from the face of the earth to the best of your abilities.
Note that, since git-annex deduplicates files by default, any files with the same content will be removed by these commands.
git annex drop --force file
git annex whereis file
git annex drop --force file --from $repo
repeat for each repository listed by the whereis commandrm file; git annex sync
Of course, if you have offline backup repositories that contain this file, you'll have to bring them online before you can drop it from them, etc.
Hey there, love git annex!
I have a (silly?) question:
git annex dropunused
tells me that there are not enough copies. I agree it is a good thing or at least might be in some cases.The problem is this: I have synced and used
git annex copy --to
for all my repositories and still I have unused files thatdropunused
will not delete, since it finds no copies.Am I doing something wrong here? Why are these (now) unused files not copied and therefore okay to delete locally?
git annex sync/copy/move/etc.
don't operate on old/unused files by default. You have to use the--all
option to operate on every file (even old/unused ones) or the--unused
option to operate only on files found bygit annex unused
.