Normally, when you clone a git-annex repository, and use git-annex in it, and then push or otherwise send changes back to origin, information gets committed to the git-annex branch about your clone. Things like the annexed files that are in it, its description, etc.
If you don't want the world to know about your clone, either for privacy reasons or simply because the clone is a temporary copy of the repository, here's how.
Recently git-annex got a new config setting, annex.private.
Set it before you start using git-annex in a repository, and git-annex
will avoid recording any information about the repository into the
git-annex branch.
git clone ssh://... myclone
cd myclone
git config annex.private true
git annex init
Now you can use git-annex as usual, adding files to the repository, getting the contents of files, etc.
When you push changes back to origin, do still push the git-annex branch, since git-annex still uses it to record anything it needs to keep track of that does not involve your private repository.
And be sure, when adding or editing annexed files, that you git annex copy
them to a publicly accessible repository. Otherwise, to everyone else,
there will seem to be no copies of that file available anywhere, since they
won't know about your private repo's copy.
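The sharing step could be wrapped up roughly like this. It is only a sketch: share_from_private_clone is a made-up helper name, it assumes origin is the publicly accessible repository, and it assumes the files passed in have not been committed yet.

```shell
# Sketch only: share_from_private_clone is a hypothetical helper name.
# Assumes "origin" is a publicly accessible git-annex repository.
share_from_private_clone() {
    # Add the given files to the annex and commit them.
    git annex add "$@" || return 1
    git commit -m "add files" || return 1
    # Send the content somewhere that other people can find it.
    git annex copy --to origin "$@" || return 1
    # Push the current branch together with the git-annex branch.
    git push origin HEAD git-annex
}
```

Calling share_from_private_clone bigfile.dat (a hypothetical file name) from inside the private clone runs the four steps in order and stops at the first one that fails.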
private special remotes
You can also make private special remotes, by using git annex initremote --private.
As with a private repository, git-annex avoids storing any information about a private special remote in the git-annex branch. The remote will only be available in the repository where it was created.
Bear in mind that, if you lose the repository where the private special remote was created, you'll lose the information git-annex needs to access that special remote, and that will likely mean you'll not be able to recover any files stored in it.
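For illustration, here is one way a private directory special remote might be set up. The helper name, the remote name, and the directory are all assumptions; type=directory and encryption=none are ordinary initremote parameters, chosen here only to keep the example small.

```shell
# Sketch only: init_private_directory_remote is a hypothetical helper name.
# $1 is the name for the new special remote, $2 the directory it stores into.
init_private_directory_remote() {
    # --private keeps the remote's details out of the git-annex branch.
    git annex initremote --private "$1" type=directory directory="$2" encryption=none
}
```

For example, init_private_directory_remote scratch /mnt/scratch would create a private remote named scratch, with both the name and the path being placeholders.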
private git remotes
When the git config "remote.name.annex-private" is set, git-annex will avoid
recording anything in the git-annex branch about the remote. This is
set by git-annex initremote --private, and could also be set for
git remotes. This may be useful if, for example, you are trying to deduplicate content,
split a repository in two, or reinject content using a temporary annex as a staging area.
git-annex is well suited to these tasks because it naturally hashes all file content,
so if a copy appears in one repo that belongs in another, you can drop
its content or move it to deduplicate. However, in this case, having git-annex log a relationship
between the two repos is undesirable. Especially if the repos are otherwise unrelated or
one of them is temporary (to be deleted once emptied), setting remote.name.annex-private is preferable to declaring
the repo dead and doing a git annex forget --drop-dead --force operation.
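Marking an existing git remote as private comes down to a single git config call. A minimal sketch, in which mark_remote_private is a made-up helper name and the annex-private key is the remote setting that git-annex documents:

```shell
# Sketch only: mark_remote_private is a hypothetical helper name.
# $1 is the path of the repository, $2 the name of one of its git remotes.
mark_remote_private() {
    git -C "$1" config "remote.$2.annex-private" true
}
```

For a temporary staging annex added as a remote named staging (a placeholder name), mark_remote_private . staging would set remote.staging.annex-private in the current repository.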
where the data is actually stored
The private data gets stored in .git/annex/journal-private/ rather than in the git-annex branch.
The instructions indicate that annex.private should be set in the local repository configuration. However, the following approach is also a possibility:
It seems this repository was in private mode when it was initialized (expected). What is the implication of the switch not being permanent in the config? And by extension: what are the implications of removing the switch later in the lifetime of a repository clone?
I'm sure that the private information will not leak out from .git/annex/journal-private/ into the git-annex branch after annex.private is unset. The design ensures this because, when making a change to the branch, it only reads the private journal file when the repository whose information is being changed is private.
However, when git-annex does not have any private repositories configured, an optimisation makes it skip trying to read from the private journal. So information about the repositories that used to be private will no longer be read.
This effect is easy to see, for example:
I think this could be improved, e.g. it could check once if the private journal exists and, if so, read from it even when no private uuids are currently configured. A single stat to support this would be ok; the goal was to avoid checking nonexistent files on every branch read when private repositories are not used.
Configuring any remote with annex-private can be used to work around that problem; that lets it read information about all previously-private repositories as well.
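To tell whether a clone is in that situation, a rough check along these lines may help. It is a sketch built on assumptions: private_journal_is_ignored is a made-up helper name, the journal is assumed to live under the repository's git directory as described above, and "private repositories configured" is taken to mean annex.private or some remote's annex-private being true.

```shell
# Sketch only: private_journal_is_ignored is a hypothetical helper name.
# Succeeds (returns 0) when the repository at $1 has a private journal
# directory but no private setting that is true; returns 1 otherwise,
# or 2 when $1 is not a git repository.
private_journal_is_ignored() {
    gitdir=$(git -C "$1" rev-parse --absolute-git-dir) || return 2
    # No private journal means there is nothing to be skipped.
    [ -d "$gitdir/annex/journal-private" ] || return 1
    # annex.private still set: the journal is still being read.
    if [ "$(git -C "$1" config --bool annex.private 2>/dev/null)" = "true" ]; then
        return 1
    fi
    # Any remote with annex-private set also keeps the journal in use.
    if git -C "$1" config --bool --get-regexp '^remote\..*\.annex-private$' 2>/dev/null | grep -q ' true$'; then
        return 1
    fi
    return 0
}
```

If private_journal_is_ignored . succeeds, setting annex-private on any remote, as described in the comment above, should make the old private information readable again.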