cloning a repository privately

Normally, when you clone a git-annex repository, use git-annex in it, and then push or otherwise send changes back to origin, information about your clone gets committed to the git-annex branch, such as which annexed files it contains and its description.

If you don't want the world to know about your clone, either for privacy reasons or simply because the clone is a temporary copy of the repository, here's how.

Recently git-annex got a new config setting, annex.private. Set it before you start using git-annex in a repository, and git-annex will avoid recording any information about the repository into the git-annex branch.

git clone ssh://... myclone
cd myclone
git config annex.private true
git annex init

Now you can use git-annex as usual, adding files to the repository, getting the contents of files, etc.

When you push changes back to origin, do still push the git-annex branch, since git-annex still uses it to record anything it needs to keep track of that does not involve your private repository.

And be sure, when adding or editing annexed files, that you git-annex copy them to a publicly accessible repository. Otherwise, to everyone else, there will seem to be no copies of those files available anywhere, since they won't know about your private repo's copy.

private special remotes

You can also make private special remotes, by using git annex initremote --private.

As with a private repository, git-annex avoids storing any information about a private special remote in the git-annex branch. The remote will only be available in the repository where it was created.

Bear in mind that, if you lose the repository where the private special remote was created, you'll lose the information git-annex needs to access that special remote, and that will likely mean you'll not be able to recover any files stored in it.
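As a rough sketch of the above, a private directory special remote could be created like this. The remote name `scratchstore`, the scratch paths, and the choice of the `directory` special remote type with `encryption=none` are assumptions made for illustration; the git-annex steps only run when git-annex is installed.

```shell
# Scratch locations; every path and name here is hypothetical.
scratch=$(mktemp -d)
mkdir "$scratch/store"            # directory that will back the special remote
git init -q "$scratch/repo"
cd "$scratch/repo"

if command -v git-annex >/dev/null 2>&1; then
    git annex init
    # --private keeps the remote's configuration out of the git-annex branch.
    git annex initremote --private scratchstore \
        type=directory directory="$scratch/store" encryption=none
    # Show the private flag that git-annex recorded for the remote.
    git config --get remote.scratchstore.annex-private
fi
```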

private git remotes

When the git config "remote.<name>.annex-private" is set, git-annex will avoid recording anything in the git-annex branch about the remote. This is set by git-annex initremote --private, and could also be set for git remotes.

This may be useful if, for example, you are trying to deduplicate content, bifurcate repositories, or reinject content using a temporary annex as a staging area. git-annex is well suited to these tasks because it hashes all file content, so if a copy appears in one repo that should belong in another, you can drop its content or move it to deduplicate.

However, in this case, having git-annex log a relationship between the two repos is undesirable. Especially if the repos are otherwise unrelated, or one of them is temporary (to be deleted once emptied), setting annex-private on the remote is preferable to declaring the repo dead and doing a forget --drop-dead --force operation.
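A minimal sketch of marking an ordinary git remote as private follows. It assumes the per-remote key is spelled `remote.<name>.annex-private`, as documented in the git-annex man page; the remote name `staging` and the scratch paths are hypothetical. Presumably, as with annex.private, the setting should be in place before git-annex first records anything about the remote.

```shell
# Two scratch repositories; names and paths are hypothetical.
scratch=$(mktemp -d)
git init -q "$scratch/main"
git init -q "$scratch/staging"    # temporary annex used as a staging area

cd "$scratch/main"
git remote add staging "$scratch/staging"

# Ask git-annex not to record anything about this remote in the git-annex branch.
git config remote.staging.annex-private true

# Confirm the setting.
git config --get remote.staging.annex-private
```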

where the data is actually stored

The private data gets stored in .git/annex/journal-private/ rather than in the git-annex branch.
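To see this in a scratch repository, something like the following could be used. The paths are hypothetical, and the git-annex steps are skipped when git-annex is not installed.

```shell
# Scratch repository; the path is hypothetical.
scratch=$(mktemp -d)
git init -q "$scratch/priv"
cd "$scratch/priv"
git config annex.private true     # set before running git annex init

if command -v git-annex >/dev/null 2>&1; then
    git annex init
    # Private records accumulate here instead of in the git-annex branch.
    ls .git/annex/journal-private/
fi
```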


What about temporary annex.private declaration?

The instructions indicate that annex.private should be set in the local repository configuration.

However, the following approach is also a possibility:

❯ mkdir priv
❯ cd priv
❯ git init
Initialized empty Git repository in /tmp/priv/.git/

❯ git -c annex.private=1 annex init
init  ok

❯ ls .git/annex/journal-private
uuid.log

❯ cat .git/config
[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
[annex]
        uuid = 955373ac-6044-493e-a696-1a706437b542
        version = 10
[filter "annex"]
        smudge = git-annex smudge -- %f
        clean = git-annex smudge --clean -- %f
        process = git-annex filter-process

It seems this repository was in private mode when it was initialized (expected). What is the implication of the switch not being permanent in the config? And by extension: what are the implications of removing the switch later in the lifetime of a repository clone?

Comment by mih — Tue Nov 7 15:49:47 2023


Re: What about temporary annex.private declaration?

I'm sure that the private information will not leak out from .git/annex/journal-private/ into the git-annex branch after annex.private is unset. The design ensures this because, when making a change to the branch, it only reads the private journal file when the repository whose information is being changed is private.

However, when git-annex does not have any private repositories configured, an optimisation makes it skip trying to read from the private journal. So information about repositories that were formerly private will no longer be read.

This effect is easy to see, for example:

joey@darkstar:~/tmp/xxx>git-annex whereis
whereis foo (1 copy)
    ff1f0bbd-7be6-45ff-8c90-fd322820b717 -- joey@darkstar:~/tmp/xxx [here]
ok
joey@darkstar:~/tmp/xxx>git config annex.private false
joey@darkstar:~/tmp/xxx>git-annex whereis
whereis foo (0 copies) failed
whereis: 1 failed

I think this could be improved, e.g. it could check once whether the private journal exists and, if so, read from it even when no private uuids are currently configured. A single stat to support this would be ok; the goal was to avoid checking nonexistent files on every branch read when private repositories are not used.

Configuring any remote with annex-private can be used to work around that problem; that lets it read information about all previously-private repositories as well.

Comment by joey — Wed Nov 29 16:50:05 2023


"Buyer's remorse"

If you initialise a remote as private, and then change your mind, is there any way to promote that remote and its metadata back to a public scope?

Comment by beryllium — Mon Apr 28 05:53:47 2025
