I'm looking for a way to count/track annex data copied to LTO tapes.
In my use case I'm splitting many terabytes of data across separate repositories on several hard drives, backing them up to tapes, and keeping track of all this with remotes in a central repository (with lots of dropped content). All good with the HDs, but I'm at a loss about how to handle the copies on tapes.
Three alternatives I've come up with so far:
- Simply tar the repositories from the HDs to the tapes. Problem: no way to notify git annex of the existence of these manual copies. Or is there?
- Remote (special or normal) on LTFS (linear, POSIX-compatible file system on top of tape). Problems:
  - `git annex get`ing a dropped directory from there would cause files to be accessed in random order, right? Or is the retrieval guaranteed to happen in the same order as the files in the directory were written by `git annex copy`?
  - LTFS has a big block size (512 KB) => wasted space when there are lots of small files. (Not a major problem, though.)
- Write a special (read-only) remote hook for tar. Problem: `git annex get` would make one hook RETRIEVE request per file, leading to random access again, while the only efficient way would be to get a list of all files to be retrieved and then return them in the order they turn up in the tar archive (or even ingest the whole tar file into .git/annex/).
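
To make the last alternative concrete, here is a minimal Python sketch of the "one sequential pass" idea. It is not a git-annex hook and uses no git-annex API: the function names `extract_in_tape_order` and `build_demo_archive` are made up for illustration, and it assumes the wanted files can be identified by their member names inside the tar archive. How such a list of names would be obtained from git annex is exactly the open question.

```python
import io
import tarfile
from typing import BinaryIO, Dict, Iterable, List, Tuple


def extract_in_tape_order(tar_stream: BinaryIO, wanted: Iterable[str]) -> Dict[str, bytes]:
    """Read the archive once, front to back, and collect the wanted members.

    The result dict is filled in the order the members appear in the
    archive, regardless of the order in which they were requested.
    """
    remaining = set(wanted)
    found: Dict[str, bytes] = {}
    # Mode "r|" treats the input as a non-seekable stream, so members are
    # visited strictly in the order they were written (no seeking back).
    with tarfile.open(fileobj=tar_stream, mode="r|") as archive:
        for member in archive:
            if not remaining:
                break  # everything requested has been found
            if member.isfile() and member.name in remaining:
                fileobj = archive.extractfile(member)
                found[member.name] = fileobj.read()
                remaining.discard(member.name)
    return found


def build_demo_archive(files: List[Tuple[str, bytes]]) -> io.BytesIO:
    """Build a small in-memory tar archive standing in for a tape (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    tape = build_demo_archive([("a", b"1"), ("b", b"22"), ("c", b"333")])
    # Requested in "random" order, returned in archive order.
    result = extract_in_tape_order(tape, ["c", "a"])
    print(list(result))  # prints ['a', 'c']
```

A real setup would read from the tape device instead of an in-memory buffer and write the contents to disk rather than holding them in memory, but the point stands: the archive is traversed once, and the request order does not matter.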