Recent comments posted to this site:

Directory remotes in offline drives for archiving?

Using offline drives as remotes makes it easy to enable encryption. I can rely on git-annex to encrypt the annexed files instead of setting up block device or file system encryption. The git repo does not need to be cloned to the drive.

Can I move annexed files off my laptop so that they live only on archival drives? I have multiple drives plugged in via USB to store multiple copies. But git-annex doesn't seem to consider this safe, apparently because copies can't be locked down with directory remotes. I only move files onto these drives from a single git-annex instance, so I'm pretty sure files won't be concurrently removed from the drives. Can I move the files without entirely disabling all safety checks?

Comment by wzhd
comment 2

In a non-export S3 bucket with versioning, fsck also cannot recover from a corrupted object, due to the same problem with the versionId. The same method should work to handle this case.

Comment by joey
comment 1

Note that it would also be possible for a valid object to be sent, but then get corrupted in the remote storage. I don't think that's what happened here.

If that did happen, a similar recovery process is also needed.

I think this suggests that focusing on a recovery process, rather than on prevention, is more useful.

Comment by joey
comment 4

yes -- that one is embargoed (can be seen by going to https://dandiarchive.org/dandiset/000675)

And when you replicated the problem from the backup, were you using it in the configuration where it cannot access those?

If I understood the question correctly: I do not recall for certain, but judging from my use of ( source .git/secrets.env; git-annex import master... I think I was using credentials that allowed access to them (hence no errors while importing).

Do you have annex.largefiles configured in this repository, and are all of the affected files non-annexed files?

yes

(venv-annex) dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ grep largefiles .gitattributes
**/.git* annex.largefiles=nothing
* annex.largefiles=((mimeencoding=binary)and(largerthan=0))

and it seems all go into git

(venv-annex) dandi@drogon:/mnt/backup/dandi/dandiset-manifests$ git annex list
here          
|s3-dandiarchive (untrusted)
||web         
|||bittorrent 
||||          

is empty
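
For what it's worth, the two .gitattributes rules above can be approximated in plain shell to check by hand where a given file would end up. This is only a sketch, not how git-annex evaluates the expression: would_annex is a made-up helper, the .git* check only looks at the basename, and the encoding is assumed to be whatever file -b --mime-encoding reports (pass it as a second argument to skip that call).

```shell
# Rough approximation of the two annex.largefiles rules shown above.
# would_annex is a hypothetical helper, not part of git-annex.
# Returns 0 if the file would be annexed, 1 if it would go into git.
would_annex() {
    path=$1
    # Rule 1: **/.git* annex.largefiles=nothing (basename check only).
    case "$(basename "$path")" in
        .git*) return 1 ;;
    esac
    # Rule 2a: largerthan=0, so empty files are never annexed.
    [ -s "$path" ] || return 1
    # Rule 2b: mimeencoding=binary. Assumption: use the encoding given as
    # the optional second argument, else ask file(1).
    encoding=${2-$(file -b --mime-encoding "$path")}
    [ "$encoding" = "binary" ]
}
```

Something like would_annex path/to/file && echo annex || echo git can then be run per file. Text files typically report an encoding such as us-ascii or utf-8 rather than binary, which would be consistent with everything landing in git here.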

Comment by yarikoptic
comment 3

yes -- small files, go to git

no, it is a small number of files created/renamed. In this case it is a set of 4 files that are pre-created empty and closed; 3 of the 4 are then opened for writing by duct and closed at the end of the process, and the remaining one (_info.json) is reopened for writing to dump the record, then closed. The outside tool that ran it then takes all of them and renames each to a filename that includes the end timestamp. git-annex manages to detect that the original 0-sized _info.json gets removed, but does not pick up the new one, which is rapidly renamed to a longer name.

In git log it looks like:

commit 65e9f13a882ef78d743fbe634c8e05f9dcb32c45
Author: ReproStim User <changeme@example.com>
Date:   Tue Dec 16 09:44:30 2025 -0500

    git-annex in reprostim@reproiner:/data/reprostim

 Videos/2025/12/2025.12.16-09.30.29.570--.mkv.duct_info.json                         | 0
 Videos/2025/12/2025.12.16-09.30.29.570--2025.12.16-09.44.28.225.mkv                 | 1 +
 Videos/2025/12/2025.12.16-09.30.29.570--2025.12.16-09.44.28.225.mkv.duct_usage.json | 1 +
 Videos/2025/12/2025.12.16-09.30.29.570--2025.12.16-09.44.28.225.mkv.log             | 1 +
 4 files changed, 3 insertions(+)

commit 3fe4710fc058e7d1433637c9af538b3bb9e5ebed
Author: ReproStim User <changeme@example.com>
Date:   Tue Dec 16 09:30:31 2025 -0500

    git-annex in reprostim@reproiner:/data/reprostim

 Videos/2025/12/2025.12.16-09.30.29.570--.mkv.duct_info.json | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

commit f6bb6137c81ef36387ded229a4d8592964530bc8
Author: ReproStim User <changeme@example.com>
Date:   Tue Dec 16 09:30:23 2025 -0500

    git-annex in reprostim@reproiner:/data/reprostim

 Videos/2025/12/2025.12.16-09.29.32.681--.mkv.duct_info.json                         | 0
 Videos/2025/12/2025.12.16-09.29.32.681--2025.12.16-09.30.21.889.mkv                 | 1 +
 Videos/2025/12/2025.12.16-09.29.32.681--2025.12.16-09.30.21.889.mkv.duct_usage.json | 1 +
 Videos/2025/12/2025.12.16-09.29.32.681--2025.12.16-09.30.21.889.mkv.log             | 1 +
 4 files changed, 3 insertions(+)

commit 00444920167e17b429d10fa29df8f1947930152c
Author: ReproStim User <changeme@example.com>
Date:   Tue Dec 16 09:29:34 2025 -0500

    git-annex in reprostim@reproiner:/data/reprostim

 Videos/2025/12/2025.12.16-09.29.32.681--.mkv.duct_info.json | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

Here is a copy of the log from the current process: https://www.oneukrainian.com/tmp/daemon-20251216.log
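
The create/write/rename sequence described above could be approximated with a small script, which might help in trying to reproduce the missed rename. This is a sketch under assumptions: simulate_session is a made-up name, the four suffixes are simply the ones that appear in the log above, the file contents are placeholders, and the real timing between steps is not modeled.

```shell
# Hypothetical reproduction sketch; names and contents are placeholders.
# Usage: simulate_session DIR START_TIMESTAMP END_TIMESTAMP
simulate_session() {
    dir=$1
    start=$2
    end=$3
    mkdir -p "$dir"
    # 1. Pre-create four empty files (each is closed right away).
    for suffix in .mkv .mkv.duct_info.json .mkv.duct_usage.json .mkv.log; do
        : > "$dir/$start--$suffix"
    done
    # 2. Three of the four get written during the run.
    for suffix in .mkv .mkv.duct_usage.json .mkv.log; do
        echo "placeholder data" > "$dir/$start--$suffix"
    done
    # 3. The info file is reopened at the end to dump the record.
    echo '{"placeholder": true}' > "$dir/$start--.mkv.duct_info.json"
    # 4. All four are renamed right away to include the end timestamp.
    for suffix in .mkv .mkv.duct_info.json .mkv.duct_usage.json .mkv.log; do
        mv "$dir/$start--$suffix" "$dir/$start--$end$suffix"
    done
}
```

Running it against a directory watched by the assistant, for example simulate_session Videos/2025/12 2025.12.16-09.30.29.570 2025.12.16-09.44.28.225, and then checking git log --stat should show whether the renamed info file gets picked up.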

Comment by yarikoptic
passing additional flags to rclone

I'm trying to pass additional flags to rclone, like --bwlimit for example. Not sure how to do that, though. The --whatelse flag tells me they should just be passed by default:

> git annex initremote hetzner type=rclone rcloneremotename=hetzner rcloneprefix=someprefix  encryption=shared chunk=500MiB  --whatelse
embedcreds
    embed credentials into git repository
    (yes or no)
onlyencryptcreds
    only encrypt embedded credentials, not annexed files
    (yes or no)
mac
    how to encrypt filenames used on the remote
    (HMACSHA1 or HMACSHA224 or HMACSHA256 or HMACSHA384 or HMACSHA512)
keyid
    gpg key id
keyid+
    add additional gpg key
keyid-
    remove gpg key
*
    all other parameters are passed to rclone

I tried --bwlimit 3000 and bwlimit=3000, but the first gives me an invalid option error plus the help text, and the second gives git-annex: Unexpected parameters: bwlimit.

Comment by nadir
comment 1

This is not a bug. While it could be moved to todo, anyone can write an external special remote to use this or any other storage system.

So I am closing this bug report.

Comment by joey
comment 6

Actually I have gone ahead and implemented some git-annex-matching-options that will be useful in finding content to drop from the trashbin: --presentsince --lackingsince --changedsince

You might use, for example:

git-annex drop --force --from trashbin \
    --presentsince=trashbin:7d --and --not --changedsince=here:7d

That will match files that were moved to the trashbin 7 days ago, and that have not re-entered the current repository in the time since then.

Comment by joey
comment 5

FWIW, dynamically linked binary is no good either:

[yoh@dbic-mrinbox ~]$ wget https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-amd64.tar.gz
[yoh@dbic-mrinbox ~]$ tar -xzvf git-annex-standalone-amd64.tar.gz 
[yoh@dbic-mrinbox ~]$ cd git-annex.linux/
[yoh@dbic-mrinbox ~/git-annex.linux]$ ls
LICENSE  exe       git-annex           git-core              git-remote-tor-annex  lib       logo_16x16.png  templates
README   extra     git-annex-shell     git-receive-pack      git-shell             lib64     magic           trustedkeys.gpg
bin      gconvdir  git-annex-webapp    git-remote-annex      git-upload-pack       libdirs   runshell        usr
buildid  git       git-annex.MANIFEST  git-remote-p2p-annex  i18n                  logo.svg  shimmed
[yoh@dbic-mrinbox ~/git-annex.linux]$ ./git-annex
ELF binary type "3" not known.
exec: /usr/home/yoh/git-annex.linux/exe/git-annex: Exec format error

I will try to assemble build commands later...

Comment by yarikoptic
comment 5

annex.trashbin is implemented.

I am going to close this todo; if it turns out there is some preferred content improvement that would help with cleaning out the trash, let's talk about that on a new todo. But I'm guessing you'll make do with find.

I think I would deliberately want this to be invisible to the user, since I wouldn't want anyone to actively start relying on it.

With a private remote it's reasonably invisible. The very observant user might notice a drop time that scales with the size of the file being dropped and be able to guess this feature is being used. And, if there is some error when it tries to move the object to the remote, the drop will fail. The error message in that case cannot really obscure the fact that annex.trashbin is configured.

Comment by joey