Annexed data is stored inside your git repository's .git/annex
directory.
Some special remotes can store annexed data elsewhere.
It's important that data not get lost by an ill-considered git-annex drop
command. So, git-annex can be configured to try to keep a number of copies
of a file's content available across all repositories.
By default, it keeps 1 copy; this is configured by running git-annex
numcopies N
, or can be overridden on a per-file-type basis by the
annex.numcopies setting in .gitattributes
files. The --numcopies switch
allows temporarily using a different value.
When dropping content, git-annex checks with remotes to make sure If enough other repositories cannot be verified to have copies, it will refuse to drop it, avoid data loss.
In unusual situations, involving special remotes that do not support
locking, and concurrent drops of the same content from multiple
repositories, git-annex may violate the numcopies setting. It still
guarantees at least 1 copy is preserved. This can be configured by
running git-annex mincopies N
or can be overridden on a per-file-type
basis by the annex.mincopies setting in .gitattributes
files.
The --mincopies switch allows temporarily using a different value.
Note that [trusted repositories|trust]] are assumed to continue to contain content, so checking them is skipped. So dropping content from trusted repositories does risk numcopies and mincopies later being violated.
To express more detailed requirements about which repositories contain which content, see required content.
example
For example, consider three repositories: Server, Laptop, and USB. Both
Server and USB have a copy of a file, and numcopies is 1. If on Laptop, you
git-annex get $file
, this will transfer it from either Server or USB
(depending on which is available), and there are now 3 copies of the file.
Suppose you want to free up space on Laptop again, and you git-annex drop
the file there. If USB is connected, or Server can be contacted, git-annex
can check that it still has a copy of the file, and the content is removed
from Laptop. But if USB is currently disconnected, and Server also cannot
be contacted, it can't verify that it is safe to drop the file, and will
refuse to do so.
With numcopies of 2, in order to drop the file content from Laptop, it would need access to both USB and Server.
I'm looking for a way to express that remotes may be of differing reliability. E.g. some remotes are on RAIDs that are robust to disk failures, but others are on single, possibly external disks. I would like copies of a file on a reliable remote to count for more than those on an unreliable one, so that e.g. git-annex requires two copies on single disks but allows one on a RAID.
Is there a way to do this via the copies logic? Or some other method?
git annex can't have multiple copies of the same file in one repo. And if your disk fails, a second copy doesn't really help, does it? But you can achieve what you want with either:
git annex untrust repo-on-normal-disk
, this will assume your normal disks might fail anytime and will put files elsewhere as wellgit annex group mydisk1 baddisks
andgit annex group mydisk2 baddisks
git annex groupwanted baddisks ...
git annex wanted myrepo1 groupwanted
trust
could be like implemented in a way similar tocost
. It might open up possibilities for setting a required level of trust for a given file, adding up the various remotes to a sum.@strmd, there is a todo relevant to this, repositories that count as more than one copy.
I think it's a reasonable idea, but there are probably details that need to be thought through.