NAME
git-annex fsck - find and fix problems
SYNOPSIS
git annex fsck [path ...]
DESCRIPTION
This command checks annexed files for consistency, and warns about or
fixes any problems found. This is a good complement to git fsck.
The default is to check all annexed files in the current directory and subdirectories. With parameters, only the specified files are checked.
The problems fsck finds include files that have gotten corrupted, files whose content has somehow become lost, files that do not have the configured number of copies yet made, and keys that can be upgraded to a better format.
OPTIONS
--from=remote
Check a remote, rather than the local repository.
Note that by default, files will be copied from the remote to check their contents. To avoid this expensive transfer, and only verify that the remote still has the files that are expected to be on it, add the --fast option.
--fast
Avoids expensive checksum calculations (and expensive transfers when fscking a remote).
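A minimal sketch of checking several remotes cheaply, assuming a POSIX shell. The helper name fsck_remotes_fast is hypothetical; only the --from and --fast options come from this page. The combined exit status assumes that git annex fsck exits non-zero when a check fails.

```shell
# Hypothetical helper: confirm that each named remote still has the files
# it is expected to have. --fast skips downloading and checksumming, so
# this is a presence check only. Every remote is checked even if an
# earlier one fails; the exit status is 1 if any check failed.
# The subshell body keeps the status variable local to the function.
fsck_remotes_fast() (
  status=0
  for remote in "$@"; do
    git annex fsck --from="$remote" --fast || status=1
  done
  exit "$status"
)
```

For example, fsck_remotes_fast tapeA1 tapeB1 (remote names are placeholders). Dropping --fast makes each check download the files and verify their checksums, which is the expensive transfer described above.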
--incremental
Start a new incremental fsck pass. An incremental fsck can be interrupted at any time, for example with ctrl-c.
--more
Resume the last incremental fsck pass, where it left off.
Resuming may redundantly check some files that were checked before. Any files that fsck found problems with before will be re-checked on resume. Also, checkpoints are made every 1000 files or every 5 minutes during a fsck, and it resumes from the last checkpoint.
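A minimal sketch of the start/resume cycle, assuming a POSIX shell. The helper name fsck_incremental and its "resume" argument are hypothetical; the underlying options are --incremental and --more as described above.

```shell
# Hypothetical helper: start a fresh incremental pass, or resume the last
# one when the first argument is "resume". Either run may be interrupted
# and picked up later by calling the helper again with "resume".
fsck_incremental() {
  if [ "${1:-}" = resume ]; then
    git annex fsck --more
  else
    git annex fsck --incremental
  fi
}
```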
--incremental-schedule=time
This makes a new incremental fsck be started only a specified time period after the last incremental fsck was started.
The time is in the form "10d" or "300h".
Maybe you'd like to run a fsck for 5 hours at night, picking up each night where it left off. You'd like this to continue until all files have been fscked. And once it's done, you'd like a new fsck pass to start, but no more often than once a month. Then put this in a nightly cron job:
git annex fsck --incremental-schedule 30d --time-limit 5h
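The same command wrapped for use from a nightly script, as a sketch assuming a POSIX shell. The function name nightly_fsck is hypothetical, and the repository path is whatever you pass as the first argument.

```shell
# Hypothetical cron wrapper: run the scheduled incremental fsck inside the
# repository given as $1. The function body is a subshell, so the cd does
# not change the caller's working directory; a bad path returns status 1.
nightly_fsck() (
  cd "$1" || exit 1
  git annex fsck --incremental-schedule 30d --time-limit 5h
)
```

A nightly cron job can then call nightly_fsck with the repository path. Each night's run stops after about 5 hours and the next night's run continues the same pass; a new pass starts only once 30 days have passed since the last one began.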
--numcopies=N
Override the normally configured number of copies.
To verify data integrity only while disregarding the required number of copies, use --numcopies=1.
--all
-A
Normally only the files in the currently checked out branch are fscked. This option causes all versions of all files to be fscked.
This is the default behavior when running git-annex in a bare repository.
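A sketch combining the two options above, assuming a POSIX shell: check every version of every annexed file for corruption while disregarding the configured number of copies. The helper name fsck_all_integrity is hypothetical.

```shell
# Hypothetical helper: integrity check across all versions of all files
# (--all), overriding the configured number of copies with 1 so that the
# run focuses on data integrity rather than copy counts.
fsck_all_integrity() {
  git annex fsck --all --numcopies=1
}
```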
--branch=ref
Operate on files in the specified branch or treeish.
--unused
Operate on files found by the last run of git-annex unused.
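A sketch of the two-step workflow this implies, assuming a POSIX shell. The helper name fsck_unused is hypothetical; git annex unused records the list that --unused then operates on.

```shell
# Hypothetical helper: refresh the list of unused content, then fsck
# exactly that content. The fsck step is skipped if "unused" fails.
fsck_unused() {
  git annex unused && git annex fsck --unused
}
```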
--key=keyname
Use this option to fsck a specified key.
matching options
The git-annex-matching-options(1) can be used to control what to fsck.
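One way to restrict fsck to a subset of files, as a sketch assuming a POSIX shell. It uses the --include=glob option from git-annex-matching-options(1); the helper name fsck_glob is hypothetical.

```shell
# Hypothetical helper: fsck only annexed files whose names match the glob
# given as $1. The glob is passed through to git-annex unexpanded.
fsck_glob() {
  git annex fsck --include="$1"
}
```

For example, fsck_glob '*.jpg'. Quoting the pattern at the call site keeps the shell from expanding it before git-annex sees it.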
--jobs=N
-JN
Runs multiple fsck jobs in parallel. For example: -J4
Setting this to "cpus" will run one job per CPU core.
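A sketch that defaults to one job per CPU core, assuming a POSIX shell. The helper name fsck_parallel is hypothetical.

```shell
# Hypothetical helper: run fsck with the job count given as $1, or one
# job per CPU core ("cpus") when no count is given.
fsck_parallel() {
  git annex fsck --jobs="${1:-cpus}"
}
```

Called with no argument it uses --jobs=cpus; fsck_parallel 4 is equivalent to -J4.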
--json
Enable JSON output. This is intended to be parsed by programs that use git-annex. Each line of output is a JSON object.
--json-error-messages
Messages that would normally be output to standard error are included in the JSON instead.
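A sketch of consuming the JSON output, assuming a POSIX shell with grep. It assumes each record is one line of compact JSON carrying a "success" field set to false for failed checks; verify this against real output from your git-annex version before relying on it. The filter name fsck_failures is hypothetical.

```shell
# Hypothetical filter: read fsck --json output on stdin and print only
# the records that report "success":false. Matching text is fragile; a
# real JSON parser would be more robust. "|| true" keeps the exit status
# zero when there are no failures (grep returns 1 on no match).
fsck_failures() {
  grep '"success":false' || true
}
```

For example: git annex fsck --json | fsck_failures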
--quiet
As with all git-annex commands, this option causes only error and warning messages to be displayed. This is particularly useful with fsck, which normally displays every file it checks, even when there is no problem with it.
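A sketch of using --quiet in a script, assuming a POSIX shell and assuming that git annex fsck exits with a non-zero status when it finds a problem. The helper name fsck_quiet_report is hypothetical; any paths given are passed through to fsck.

```shell
# Hypothetical helper: run fsck quietly (only errors and warnings are
# shown) and print a one-line summary based on its exit status.
fsck_quiet_report() {
  if git annex fsck --quiet "$@"; then
    echo "fsck: no problems found"
  else
    echo "fsck: problems found (see warnings above)" >&2
    return 1
  fi
}
```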
Also the git-annex-common-options(1) can be used.
SEE ALSO
git-annex(1)
AUTHOR
Joey Hess id@joeyh.name
Warning: Automatically converted into a man page by mdwn2man. Edit with care.
I have old readonly backup media, say something like:
tapeA1/apples.txt
tapeA2/apples.txt
tapeB1/earth.svg
tapeB2/earth.svg
I use git-annex special directory remotes to be able to navigate the directory tree that lives on those media (e.g. to decide whether I need to dig out a medium, and which one, to copy a file I need). I added the remotes like so (they are too big to import with content):
At some point I may buy new hardware and recreate those backup media as proper git-annex remotes, but wouldn't it be great to keep the existing backups as long as they show no sign of bitrot and together hold enough copies?
However, git-annex fsck behaves unexpectedly: it seems I cannot force-trust these remotes, and --numcopies=0 --mincopies=0 does not have the desired effect either. Concretely, when calling
git annex fsck --from=tapeA1 --numcopies=0 --mincopies=0 --trust=tapeA1 --force
, for every file that is still intact on tapeA1, git-annex fsck reports a failure as follows:
I would be happy to (semi)trust tapeA1, or to accept no copies whatsoever. So does fsck ignore --trust=tapeA1 --force and/or --numcopies=0 --mincopies=0, which are common git-annex options that should work for fsck?
Ideally, I would be able to (semi)trust my readonly tape remotes (which likely should be behind a --force, as it may lead to data loss in classical directory remote settings). Then I could use git-annex to index those tapes, and also to monitor their health via fsck (so that over the years I can replace the tapes that show signs of corruption).
As for the corruption, I emulated bitrot on a test directory remote, which then leads to a fsck failure as follows:
This suffices to detect tapes that should be replaced, and it's kinda expected that files cannot be dropped.
Somehow fsck does not work as I would expect -- am I misunderstanding the numcopies/mincopies arguments here? Is there really no way to force-trust a directory remote, which to me seems appropriate in this case? Is there another way to achieve what I have in mind with git-annex?
Thanks for this great piece of software. I also use the assistant in another day-to-day use case, and it's simply great!
appendonly=yes for the special directory remote would likely help in my scenario.
importtree=yes remotes are always untrusted. The reason is that something else is assumed to be writing to those remotes, which is what populates them with files. And that could delete or change any file at any time. So if git-annex didn't untrust the remote, and relied on it to hold the only copy of a file, such a change would cause data loss.
There would need to be a new config setting to add the concept of guaranteed readonly importtree=yes remotes.
git-annex does not allow --numcopies to be set to 0 as that can cause data loss.