forum/Use on large media collection without modifying it | git-annex | http://git-annex.branchable.com/forum/Use_on_large_media_collection_without_modifying_it/ | generator: ikiwiki | updated 2023-11-05T21:32:17Z
comment 1 | Lukey | 2022-09-03T21:32:29Z | http://git-annex.branchable.com/forum/Use_on_large_media_collection_without_modifying_it/comment_1_76307d95cf46992fbc5f084f9c056edc/
<p>First of all, you really want to look into (or migrate to) a reflink-capable filesystem such as XFS or btrfs.</p>
<p>I don't see why you'd need the rsync special remote for case #2. Create a git-annex repo on your USB drive,
add the existing collection as a directory special remote with <code>importtree=yes</code>, and import everything. Then clone the
repo to your laptop and <code>git annex sync</code>, <code>get</code>, or <code>copy</code> from the USB drive however you like. I think you can even <code>git annex enableremote</code> the
import special remote on your laptop, and then git-annex will get files directly from it. You could even run <code>git annex import --no-content</code>,
which imports only the file metadata without storing any content in git-annex, and then selectively <code>git annex get</code> files directly from the special remote.</p>
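<p>A minimal sketch of the workflow described above. The paths, the remote name <code>collection</code>, the repository descriptions, and the branch name <code>master</code> are placeholders chosen for illustration, not taken from this thread. By default the script only prints the commands; set <code>DRY_RUN=0</code> to execute them. Older git-annex releases may also require <code>exporttree=yes</code> alongside <code>importtree=yes</code>, so check the manpage for your version.</p>

```shell
#!/bin/sh
# Sketch only: every path and name below is a hypothetical placeholder.
COLLECTION=${COLLECTION:-/mnt/usb/media}    # existing media tree, left as it is
REPO=${REPO:-/mnt/usb/annex}                # new git-annex repository on the drive
LAPTOP_REPO=${LAPTOP_REPO:-$HOME/annex}     # clone on the laptop
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On the USB drive: create the repo and import the collection into it.
import_collection() {
    run git init "$REPO"
    run cd "$REPO"
    run git annex init usb-drive
    run git annex initremote collection type=directory directory="$COLLECTION" encryption=none importtree=yes
    run git annex import master --from collection
    run git annex merge --allow-unrelated-histories collection/master
}

# On the laptop: clone the repo and fetch content from the USB drive.
clone_to_laptop() {
    run git clone "$REPO" "$LAPTOP_REPO"
    run cd "$LAPTOP_REPO"
    run git annex init laptop
    run git annex get .
}
```

<p>Call <code>import_collection</code> and then <code>clone_to_laptop</code> to see the command sequence. Adding <code>--no-content</code> to the <code>git annex import</code> line gives the metadata-only variant mentioned above.</p>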
<p>Also, you may want to set <code>git annex config --set annex.dotfiles true</code> on each of your repos. All of these options are documented in the <a href="http://git-annex.branchable.com/git-annex/">git-annex</a> manpage (also look at the <a href="http://git-annex.branchable.com/git-annex-config/">git-annex-config</a> manpage).</p>
comment 2 | jgoerzen | 2022-09-03T22:45:42Z | http://git-annex.branchable.com/forum/Use_on_large_media_collection_without_modifying_it/comment_2_b89b598844b0709a5b6709d0fb2ef60c/
<p>Thank you for these thoughts!</p>
<p>I should have mentioned that I intend the USB drives to often live offsite, so they would be disconnected. You are quite correct, though, that if they are onsite I could think of them as the sort of "hub" repository and do everything from them like that.</p>
<p>Running <code>enableremote</code> for the directory special remote on the laptop does require it to be mounted as a filesystem there, hence my mention of sshfs. That can work but is a bit clunky.</p>
comment 3 | jgoerzen | 2022-09-03T23:26:27Z | http://git-annex.branchable.com/forum/Use_on_large_media_collection_without_modifying_it/comment_3_2d707dc516dad666fb2a647f65fcedcf/
Forgot to mention: I'm on ZFS, which, while it is a CoW filesystem, doesn't support cp --reflink. For various reasons, a migration to XFS or btrfs isn't very practical for me.
comment 4 | unqueued | 2023-11-05T21:32:17Z | http://git-annex.branchable.com/forum/Use_on_large_media_collection_without_modifying_it/comment_4_4529364f2919bd05f53da94cf8ba4268/
<p>Just putting this out there, but if you are on ZFS or BTRFS, you can just duplicate the subvolume/dataset, remove what you want, and send it. It will by default verify your data integrity, and it is often faster.</p>
<p>On BTRFS, it is easy: <code>btrfs sub create send.RW; cp -a --reflink=always .git/annex/objects send.RW; btrfs sub snap -r send.RW send.RO; btrfs sub del send.RW</code></p>
<p>Then, on the target, I can reflink copy into the target repo's <code>.git/annex/objects</code> and then run <code>git annex fsck --all --fast</code>, since the send operation already verified the integrity.</p>
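<p>The steps above, including the send itself, could be sketched roughly as follows. The host name and receive directory are hypothetical placeholders, and <code>btrfs send</code> and <code>btrfs receive</code> normally need root. The script prints the commands by default; set <code>DRY_RUN=0</code> to run them. It assumes <code>send.RW</code> is created on the same btrfs filesystem as the repository, since reflink copies do not work across filesystems.</p>

```shell
#!/bin/sh
# Sketch only: host name and paths are hypothetical placeholders.
TARGET_HOST=${TARGET_HOST:-backuphost}      # machine that receives the stream
RECV_DIR=${RECV_DIR:-/mnt/pool/incoming}    # btrfs directory on the target
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Source side: run from the top of the source repository (on btrfs).
make_send_snapshot() {
    run btrfs subvolume create send.RW
    run cp -a --reflink=always .git/annex/objects send.RW/
    run btrfs subvolume snapshot -r send.RW send.RO
    run btrfs subvolume delete send.RW
}

# Stream the read-only snapshot to the target over ssh.
send_snapshot() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "btrfs send send.RO | ssh $TARGET_HOST btrfs receive $RECV_DIR"
    else
        btrfs send send.RO | ssh "$TARGET_HOST" btrfs receive "$RECV_DIR"
    fi
}

# Target side: run from the top of the target repository.
# git-annex keeps object directories read-only, so keys that already exist
# on the target may make cp report permission errors.
merge_received_objects() {
    run cp -a --reflink=always "$RECV_DIR/send.RO/objects/." .git/annex/objects/
    run git annex fsck --all --fast
}
```

<p>The read-only snapshot exists because <code>btrfs send</code> only accepts read-only subvolumes, which is why the writable <code>send.RW</code> is snapshotted and then deleted.</p>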
<p>Sometimes, if the target repo does not exist yet, I take a snapshot of an entire repo, enter it, re-init it with the target uuid (<code>git annex reinit</code>), force drop what I don't want, and then send it. If you're dealing with hundreds of thousands of files, that can be more practical.</p>
<p>If you want to verify the integrity of an annexed file on ZFS or BTRFS, all you have to do is read it, and let the filesystem verify the checksums for you.</p>
<p>If you want a nice progress display, you can just do <code>pv myfile > /dev/null</code></p>
<p>I considered making a git-annex-scrub script that would check if the underlying fs supports integrity verification, then just read the file and update the log.</p>
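<p>A rough sketch of what such a script could look like. The function names are made up for illustration, and this is not an existing git-annex command. It assumes GNU <code>stat</code>, whose <code>-f -c %T</code> output names the filesystem type, and it lists only btrfs and zfs because those are the two filesystems discussed here. Updating the git-annex fsck log is left out, since the thread does not say how that would be done.</p>

```shell
#!/bin/sh
# Sketch of the "git-annex-scrub" idea; all function names are hypothetical.

# Succeeds if the given filesystem type name is one that checksums file data.
is_checksumming_fs_type() {
    case "$1" in
        btrfs|zfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the type of the filesystem that holds a path (GNU coreutils stat).
fs_type_of() {
    stat -f -c %T -- "$1"
}

# Read a file end to end and discard the data. On a checksumming filesystem
# a checksum failure surfaces as a read error, so cat exits non-zero.
# Returns 2 when the filesystem gives no such guarantee.
scrub_file() {
    if ! is_checksumming_fs_type "$(fs_type_of "$1")"; then
        echo "skip (no data checksums): $1" >&2
        return 2
    fi
    cat -- "$1" > /dev/null
}
```

<p>One way to use it would be <code>git annex find | while IFS= read -r f; do scrub_file "$f" || echo "FAILED: $f"; done</code>. Note that recently read files may be served from the page cache rather than from disk, and btrfs files written with checksumming disabled (for example under <code>nodatacow</code>) are not covered.</p>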
<p>BTRFS uses hardware-accelerated crc32c by default, which is fine for detecting bitrot but does not protect against intentional tampering.</p>