Recent comments posted to this site:
Using offline drives as remotes makes it easy to enable encryption. I can rely on git-annex to encrypt the annexed files instead of setting up block device or file system encryption. The git repo does not need to be cloned to the drive.
Can I move annexed files off my laptop so that they live only on archival drives? I have multiple drives plugged in via USB to store multiple copies. But it seems git-annex doesn't consider this safe, because copies can't be locked down with directory remotes? I'm only moving files onto these drives with a single instance of git-annex, so I'm pretty sure files won't be concurrently removed from the drives. Can I move the files without entirely disabling all safety checks?
In a non-export S3 bucket with versioning, fsck also cannot recover from a corrupted object, due to the same problem with the versionId. The same method should work to handle this case.
Note that it would also be possible for a valid object to be sent, but then get corrupted in the remote storage. I don't think that's what happened here.
If that did happen, a similar recovery process is also needed.
I think this shows that focusing on a recovery process, rather than on prevention, is more useful.
yes -- that one is embargoed (as can be seen at https://dandiarchive.org/dandiset/000675)
And when you replicated the problem from the backup, were you using it in the configuration where it cannot access those?
if I understood the question right (I do not recall now): judging from my use of ( source .git/secrets.env; git-annex import master... I think I was using credentials that allow access to them (hence no errors while importing)
Do you have annex.largefiles configured in this repository, and are all of the affected files non-annexed files?
yes
(venv-annex) dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ grep largefiles .gitattributes
**/.git* annex.largefiles=nothing
* annex.largefiles=((mimeencoding=binary)and(largerthan=0))
and it seems they all go into git
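To see why small JSON and log files end up in git under this rule, here is a rough shell approximation of ((mimeencoding=binary)and(largerthan=0)). git-annex asks libmagic for the MIME encoding; this sketch instead assumes that "binary" means "contains a NUL byte", which is only a heuristic, and the function name would_be_annexed is hypothetical.

```shell
#!/bin/sh
# Rough approximation of
#   annex.largefiles=((mimeencoding=binary)and(largerthan=0))
# Assumption: "binary" is approximated as "contains a NUL byte";
# git-annex itself uses libmagic, which can classify files differently.
would_be_annexed() {
    size=$(wc -c < "$1" | tr -d ' ')
    # largerthan=0: empty files never match, so they go to git.
    [ "$size" -gt 0 ] || return 1
    # mimeencoding=binary (approximated): stripping NUL bytes shrinks the file.
    no_nul=$(tr -d '\000' < "$1" | wc -c | tr -d ' ')
    [ "$no_nul" -ne "$size" ]
}
```

Under this approximation, text output such as _info.json or a .log file fails the binary test and falls through to git, which matches what the listing shows.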
(venv-annex) dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ git annex list
here
|s3-dandiarchive (untrusted)
||web
|||bittorrent
||||
(the list is empty)
yes -- small files, go to git
no, it is a small number of files being created/renamed. In this case it is a set of 4 files that are pre-created empty and closed; then 3 of the 4 are opened for writing by duct and closed at the end of the process, and the remaining one (_info.json) is reopened for writing to dump the record and then closed. Then the outside tool that ran duct takes all of them and renames them to filenames containing the end timestamp. git-annex manages to detect that the original 0-sized _info.json gets removed, but does not pick up the new one, which gets renamed to a longer name very quickly.
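A minimal reproduction sketch of that file lifecycle, which might help trigger the race without running duct. Assumptions: the target directory is inside a repository watched by the git-annex assistant, the file names only mimic the git log that follows, the contents are placeholders, and the pause length is arbitrary. The function name reproduce_duct_sequence is hypothetical.

```shell
#!/bin/sh
# Hypothetical reproduction of the duct file lifecycle described above.
# Assumptions: DIR is inside a repository watched by the assistant;
# names mimic the git log, contents are placeholders, PAUSE is arbitrary.
set -eu

start=2025.12.16-09.30.29.570
end=2025.12.16-09.44.28.225
suffixes=".mkv .mkv.duct_info.json .mkv.duct_usage.json .mkv.log"

reproduce_duct_sequence() {
    dir=$1
    pause=${2:-2}

    # 1. Pre-create all four files empty and close them.
    for s in $suffixes; do
        : > "$dir/$start--$s"
    done
    # Give the assistant time to commit the empty _info.json.
    sleep "$pause"

    # 2. Three files get written (held open by duct in the real run).
    for s in .mkv .mkv.duct_usage.json .mkv.log; do
        printf 'placeholder\n' >> "$dir/$start--$s"
    done

    # 3. _info.json is reopened to dump the record, then closed.
    printf '{"placeholder": true}\n' > "$dir/$start--.mkv.duct_info.json"

    # 4. The outer tool immediately renames everything to the end timestamp.
    for s in $suffixes; do
        mv "$dir/$start--$s" "$dir/$start--$end$s"
    done
}

# Run only when a directory is given: sh this-script.sh DIR [PAUSE]
if [ $# -gt 0 ]; then
    reproduce_duct_sequence "$@"
fi
```

After running it against a watched directory, the resulting commits can be compared with the git log entries quoted here to see whether the renamed _info.json is picked up.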
In git log it looks like:
commit 65e9f13a882ef78d743fbe634c8e05f9dcb32c45
Author: ReproStim User <changeme@example.com>
Date: Tue Dec 16 09:44:30 2025 -0500
git-annex in reprostim@reproiner:/data/reprostim
Videos/2025/12/2025.12.16-09.30.29.570--.mkv.duct_info.json | 0
Videos/2025/12/2025.12.16-09.30.29.570--2025.12.16-09.44.28.225.mkv | 1 +
Videos/2025/12/2025.12.16-09.30.29.570--2025.12.16-09.44.28.225.mkv.duct_usage.json | 1 +
Videos/2025/12/2025.12.16-09.30.29.570--2025.12.16-09.44.28.225.mkv.log | 1 +
4 files changed, 3 insertions(+)
commit 3fe4710fc058e7d1433637c9af538b3bb9e5ebed
Author: ReproStim User <changeme@example.com>
Date: Tue Dec 16 09:30:31 2025 -0500
git-annex in reprostim@reproiner:/data/reprostim
Videos/2025/12/2025.12.16-09.30.29.570--.mkv.duct_info.json | 0
1 file changed, 0 insertions(+), 0 deletions(-)
commit f6bb6137c81ef36387ded229a4d8592964530bc8
Author: ReproStim User <changeme@example.com>
Date: Tue Dec 16 09:30:23 2025 -0500
git-annex in reprostim@reproiner:/data/reprostim
Videos/2025/12/2025.12.16-09.29.32.681--.mkv.duct_info.json | 0
Videos/2025/12/2025.12.16-09.29.32.681--2025.12.16-09.30.21.889.mkv | 1 +
Videos/2025/12/2025.12.16-09.29.32.681--2025.12.16-09.30.21.889.mkv.duct_usage.json | 1 +
Videos/2025/12/2025.12.16-09.29.32.681--2025.12.16-09.30.21.889.mkv.log | 1 +
4 files changed, 3 insertions(+)
commit 00444920167e17b429d10fa29df8f1947930152c
Author: ReproStim User <changeme@example.com>
Date: Tue Dec 16 09:29:34 2025 -0500
git-annex in reprostim@reproiner:/data/reprostim
Videos/2025/12/2025.12.16-09.29.32.681--.mkv.duct_info.json | 0
1 file changed, 0 insertions(+), 0 deletions(-)
Here is a copy of the log from the current process: https://www.oneukrainian.com/tmp/daemon-20251216.log
I'm trying to pass additional flags to rclone, such as --bwlimit, but I'm not sure how to do that. The --whatelse output says they should simply be passed through:
> git annex initremote hetzner type=rclone rcloneremotename=hetzner rcloneprefix=someprefix encryption=shared chunk=500MiB --whatelse
embedcreds
embed credentials into git repository
(yes or no)
onlyencryptcreds
only encrypt embedded credentials, not annexed files
(yes or no)
mac
how to encrypt filenames used on the remote
(HMACSHA1 or HMACSHA224 or HMACSHA256 or HMACSHA384 or HMACSHA512)
keyid
gpg key id
keyid+
add additional gpg key
keyid-
remove gpg key
*
all other parameters are passed to rclone
I tried --bwlimit 3000 and bwlimit=3000, but they give me, respectively, an invalid option error plus the help text, and git-annex: Unexpected parameters: bwlimit
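One possible workaround, not a git-annex feature: rclone documents that every command-line option can also be set through an environment variable, named by dropping the leading dashes, turning the remaining dashes into underscores, uppercasing, and prefixing RCLONE_ (so --bwlimit becomes RCLONE_BWLIMIT). This assumes the rclone process started by the special remote inherits git-annex's environment. A small sketch of the name mapping; rclone_flag_to_env is a hypothetical helper:

```shell
#!/bin/sh
# Map an rclone flag such as --bwlimit to the environment variable rclone
# reads for it: strip the leading dashes, turn the remaining dashes into
# underscores, uppercase the result, and prefix RCLONE_.
rclone_flag_to_env() {
    printf 'RCLONE_%s\n' "$(printf '%s' "${1#--}" |
        tr 'abcdefghijklmnopqrstuvwxyz-' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_')"
}
```

For example, if the environment is indeed passed through, RCLONE_BWLIMIT=3000 git annex copy --to hetzner would apply the limit to that transfer without touching the remote's configuration.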
This is not a bug. While it could be moved to todo, anyone can write an external special remote to use this or any other storage system.
So I am closing this bug report.
Actually I have gone ahead and implemented some
git-annex-matching-options that will be useful
in finding content to drop from the trashbin:
--presentsince --lackingsince --changedsince
You might use, for example:
git-annex drop --force --from trashbin \
--presentsince=trashbin:7d --and --not --changedsince=here:7d
That will match files that were moved to the trashbin at least 7 days ago, and that have not re-entered the current repository since then.
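If the cleanup is run periodically, the same command can be wrapped with the age and remote name as parameters. A sketch only: trashbin_cleanup and the GIT_ANNEX override are hypothetical, the defaults (7d, a remote named trashbin) come from the example above, and it uses only the options shown there.

```shell
#!/bin/sh
# Hypothetical wrapper around the trashbin drop command shown above.
# $1: age in the format --presentsince accepts (default 7d)
# $2: name of the trashbin remote (default "trashbin")
# GIT_ANNEX can be overridden (e.g. GIT_ANNEX=echo) for a dry run.
trashbin_cleanup() {
    age=${1:-7d}
    remote=${2:-trashbin}
    ${GIT_ANNEX:-git-annex} drop --force --from "$remote" \
        --presentsince="$remote:$age" --and --not --changedsince="here:$age"
}
```

GIT_ANNEX=echo trashbin_cleanup 30d prints the command it would run; trashbin_cleanup 30d runs the actual drop.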
FWIW, the dynamically linked binary is no good either:
[yoh@dbic-mrinbox ~]$ wget https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-amd64.tar.gz
[yoh@dbic-mrinbox ~]$ tar -xzvf git-annex-standalone-amd64.tar.gz
[yoh@dbic-mrinbox ~]$ cd git-annex.linux/
[yoh@dbic-mrinbox ~/git-annex.linux]$ ls
LICENSE exe git-annex git-core git-remote-tor-annex lib logo_16x16.png templates
README extra git-annex-shell git-receive-pack git-shell lib64 magic trustedkeys.gpg
bin gconvdir git-annex-webapp git-remote-annex git-upload-pack libdirs runshell usr
buildid git git-annex.MANIFEST git-remote-p2p-annex i18n logo.svg shimmed
[yoh@dbic-mrinbox ~/git-annex.linux]$ ./git-annex
ELF binary type "3" not known.
exec: /usr/home/yoh/git-annex.linux/exe/git-annex: Exec format error
I will try to assemble build commands later...
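The kernel message refers to the OS/ABI byte of the ELF header (e_ident[EI_OSABI], at offset 7), and value 3 is ELFOSABI_LINUX. Reading the error as "this host refuses Linux-branded binaries, presumably because Linux binary compatibility is not enabled" is an inference, not something the output confirms. A small sketch for checking that byte with POSIX od; elf_osabi and osabi_name are hypothetical helpers:

```shell
#!/bin/sh
# Print the EI_OSABI byte (offset 7 of the ELF header) as a decimal number.
elf_osabi() {
    od -A n -t u1 -j 7 -N 1 "$1" | tr -d ' '
}

# Name a few common EI_OSABI values from the ELF specification.
osabi_name() {
    case $1 in
        0) echo "SYSV" ;;
        3) echo "Linux" ;;
        9) echo "FreeBSD" ;;
        *) echo "other ($1)" ;;
    esac
}
```

Inside git-annex.linux, osabi_name "$(elf_osabi exe/git-annex)" should print Linux if the reading above is right.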
annex.trashbin is implemented.
I am going to close this todo; if it turns out there is some preferred
content improvement that would help with cleaning out the trash, let's talk
about that on a new todo. But I'm guessing you'll make do with find.
I think I would deliberately want this to be invisible to the user, since I wouldn't want anyone to actively start relying on it.
With a private remote it's reasonably invisible. The very observant user might notice a drop time that scales with the size of the file being dropped, and guess that this feature is being used. Also, if there is an error when it tries to move the object to the remote, the drop will fail, and the error message in that case cannot really hide the fact that annex.trashbin is configured.