<p>git-annex <a href="http://git-annex.branchable.com/tips/">tips</a> (ikiwiki feed, updated 2024-02-10T16:50:02Z)</p>
<h1><a href="http://git-annex.branchable.com/tips/redacting_history_by_converting_git_files_to_annexed/">redacting history by converting git files to annexed</a></h1>
<p>Published 2024-02-10T16:50:02Z.</p>
<p>git-annex can be used to let people clone a repository, without being
able to access the content of annexed files whose content you want to keep
private.</p>
<p>But what if you're using a repository like that, and accidentally
add a file to git that you intended to keep private? And you didn't notice
for a while, and have since made other changes?</p>
<p>Here's a way to recover from that mistake without throwing away the commits
you made. It creates a separate, redacted history in which the private
file (<code>$privatefile</code>) is an annexed file, and uses <code>git replace</code> so that you can
locally keep using the unredacted history.</p>
<p>Start by identifying the parent of the commit that added the
private file to git (<code>$lastsafecommit</code>).</p>
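<p>If it is not obvious which commit that is, here is a minimal Python sketch for finding it. The helper names are made up for this example. It runs <code>git log --diff-filter=A</code> to list the commits on the current branch that added <code>$privatefile</code>, takes the oldest one, and asks <code>git rev-parse</code> for its parent. It assumes a mostly linear history and that the file was not added in the root commit (which has no parent).</p>

```python
import subprocess
import sys


def oldest_adding_commit(log_output: str) -> str:
    """Return the oldest hash from `git log --format=%H` output.

    git log lists newest first, so the oldest commit is the last line.
    """
    hashes = [line.strip() for line in log_output.splitlines() if line.strip()]
    if not hashes:
        raise ValueError("no commit on this branch adds that file")
    return hashes[-1]


def last_safe_commit(privatefile: str) -> str:
    """Return the parent of the first commit that added privatefile."""
    log = subprocess.run(
        ["git", "log", "--diff-filter=A", "--format=%H", "--", privatefile],
        check=True, capture_output=True, text=True,
    ).stdout
    added = oldest_adding_commit(log)
    # "<commit>^" names the first parent; rev-parse fails for a root commit.
    return subprocess.run(
        ["git", "rev-parse", added + "^"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


if __name__ == "__main__":
    print(last_safe_commit(sys.argv[1]))
```

<p>Run it from the top of the repository with master checked out, before the reset below, passing the path of the private file as its only argument.</p>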
<p>Reset back to <code>$lastsafecommit</code> and squash in all changes made since then:</p>
<pre><code>git branch unredacted-master master
git reset --hard $lastsafecommit
git merge --squash unredacted-master
</code></pre>
<p>Then convert <code>$privatefile</code> to an annexed file:</p>
<pre><code>git rm --cached $privatefile
git annex add --force-large $privatefile
</code></pre>
<p>Commit the redacted version of master, and locally replace it with your
original unredacted history.</p>
<pre><code>git commit
git replace master unredacted-master
</code></pre>
<p>Now you can push master to other remotes, and it will push the redacted
form of master:</p>
<pre><code>git push --force origin master
</code></pre>
<p>(Note that, if you already pushed the unredacted commits to origin, this
push will not necessarily completely delete the private content from it.
Making a new origin repo and pushing to that is an easy way to be sure.)</p>
<p>If you do want to share the unredacted history with any other repositories,
you can, by fetching the replacement refs into them:</p>
<pre><code>git fetch myhost:myrepo 'refs/replace/*:refs/replace/*'
git fetch myhost:myrepo unredacted-master:unredacted-master
</code></pre>
<p>Note that the above example makes the redacted history contain a single
squashed commit, but this technique is not limited to that. You can make
redacted versions of individual commits too, and build up whatever form of
redacted history you want to publish.</p>
<h1><a href="http://git-annex.branchable.com/tips/using_nested_git_repositories/">using nested git repositories</a></h1>
<p>Published 2022-01-13T18:08:24Z.</p>
<p>Git cannot store a nested git repository (including its <code>.git</code> directory) inside another repository, and the same limitation applies to git-annex. However, here is a good workaround that I found:</p>
<p>Rename the <code>.git</code> directory of the nested repo to <code>dotgit</code> (or similar), <code>git annex add</code> it and then create a symbolic link from <code>.git</code> to <code>dotgit</code>. It's important that the link is created only after the nested repo has been <code>git annex add</code>'ed. Also, the link needs to be created manually on each clone. Finally you'll need to hide the <code>dotgit</code> directory from the nested repo itself by adding <code>/dotgit</code> to <code>dotgit/info/exclude</code>.</p>
<pre><code>mv nested/.git nested/dotgit; echo "/dotgit" >>nested/dotgit/info/exclude
git annex add nested; git commit -m "add nested"
cd nested; ln -s dotgit .git # needs to be done on every clone
</code></pre>
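<p>Since the <code>.git</code> link has to be recreated by hand in every clone, a small script can do it for all nested repositories at once. This is a hypothetical Python sketch assuming the <code>dotgit</code> naming used above. It creates a relative <code>.git</code> symlink next to every <code>dotgit</code> directory that does not already have a <code>.git</code> entry. The nested repository is only usable once the annexed content of its <code>dotgit</code> directory has been retrieved, for example with <code>git annex get</code>.</p>

```python
import os
from pathlib import Path
from typing import List


def relink_nested(top: str = ".") -> List[Path]:
    """Create nested/.git -> dotgit symlinks for every nested repo under top.

    Returns the symlinks that were created. Directories that already
    have a .git entry (file, directory or symlink) are left alone.
    """
    created = []
    for dirpath, dirnames, _filenames in os.walk(top):
        # Never descend into git metadata, ours or the nested repos'.
        dirnames[:] = [d for d in dirnames if d not in (".git", "dotgit")]
        here = Path(dirpath)
        if (here / "dotgit").is_dir() and not os.path.lexists(here / ".git"):
            # Relative target, the same as `ln -s dotgit .git`.
            (here / ".git").symlink_to("dotgit")
            created.append(here / ".git")
    return created


if __name__ == "__main__":
    for link in relink_nested():
        print("created", link)
```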
<h1><a href="http://git-annex.branchable.com/tips/Using_git-annex_on_NTFS_with_WSL1/">Using git-annex on NTFS with WSL1</a></h1>
<p>Published 2021-10-22T22:13:45Z; updated 2022-10-13T15:29:12Z.</p>
<p>The following steps were tested on Windows 10 21H1 with Ubuntu 20, and are designed to allow use of the annexed files through both WSL and Windows.</p>
<p><strong> Limitations </strong></p>
<ul>
<li>The repository must be created with <code>annex.tune.objecthashlower=true</code>.</li>
<li><code>git annex adjust --unlock</code> will not work. Avoid <code>annex.addunlocked=true</code> and do not add multiple unlocked files to the index.</li>
</ul>
<p><strong>Setup</strong></p>
<ul>
<li>Enable Developer mode in Windows settings so that symlinks can be created without elevated privileges.</li>
<li>Mount the NTFS drive with the <code>metadata</code> option. Either use <a href="https://docs.microsoft.com/en-us/windows/wsl/wsl-config"><code>/etc/wsl.conf</code></a>, or add a line such as <code>C: /mnt/c drvfs metadata</code> to <code>/etc/fstab</code>.</li>
<li>Follow these steps in order when creating a new repository.
<ul>
<li><code>git config annex.sshcaching false</code></li>
<li><code>git annex init</code></li>
<li>git-annex should not detect the filesystem as crippled, so now set it manually: <code>git config annex.crippledfilesystem true</code></li>
</ul>
</li>
<li>To keep locked files safe, configure these settings and make the two scripts below available on your <code>PATH</code>.
<ul>
<li><code>git config annex.freezecontent-command 'wsl-freezecontent %path'</code></li>
<li><code>git config annex.thawcontent-command 'wsl-thawcontent %path'</code></li>
</ul>
</li>
</ul>
<p><details>
<summary>wsl-freezecontent</summary></p>
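<pre><code>#!/usr/bin/env bash
# Deny write/append (and, for annex objects and directories, delete)
# rights to Authenticated Users on the given path, via Windows ACLs.
if [ -f "$1" ]; then
    if [[ "$1" == *.git/annex/objects/* ]]; then
        # annexed object file: deny delete, write and append
        PERM='(DE,WD,AD)'
    else
        # any other file: deny write and append
        PERM='(WD,AD)'
    fi
elif [ -d "$1" ]; then
    # directory: deny delete, delete child, add file and add subdirectory
    PERM='(DE,DC,WD,AD)'
else
    exit 0
fi
OUTPUT="$(icacls.exe "$(wslpath -w "$1")" /deny "Authenticated Users:$PERM")"
if [ "$?" -ne 0 ]; then
    echo "$OUTPUT"
    exit 1
fi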
</code></pre>
<p></details></p>
<p><details>
<summary>wsl-thawcontent</summary></p>
<pre><code>#!/usr/bin/env bash
# Grant Authenticated Users delete/write/append rights on the given
# file or directory, via Windows ACLs.
if [ -f "$1" ]; then
    PERM='(DE,WD,AD)'
elif [ -d "$1" ]; then
    PERM='(DE,DC,WD,AD)'
else
    exit 0
fi
OUTPUT="$(icacls.exe "$(wslpath -w "$1")" /grant "Authenticated Users:$PERM")"
if [ "$?" -ne 0 ]; then
    echo "$OUTPUT"
    exit 1
fi
</code></pre>
<p></details></p>
<p><strong> Patches </strong></p>
<p>These patches may introduce problems when there are multiple independent processes writing to the repository. Use at your own risk.</p>
<p><details>
<summary>Create symlink to annexed objects in-place. The add, addunused, lock, and rekey commands will create symlinks in-place instead of in a temporary directory.</summary></p>
<pre><code>From d871289d22d2e86cb62776841343baf6c0f83484 Mon Sep 17 00:00:00 2001
From: Reiko Asakura <asakurareiko@protonmail.ch>
Date: Wed, 12 Oct 2022 17:13:55 -0400
Subject: [PATCH 2/3] Create symlink to annexed objects in-place
---
Annex/Ingest.hs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Annex/Ingest.hs b/Annex/Ingest.hs
index 89dc8acea..ec35fb15d 100644
--- a/Annex/Ingest.hs
+++ b/Annex/Ingest.hs
@@ -301,7 +301,7 @@ restoreFile file key e = do
makeLink :: RawFilePath -> Key -> Maybe InodeCache -> Annex LinkTarget
makeLink file key mcache = flip catchNonAsync (restoreFile file key) $ do
l <- calcRepo $ gitAnnexLink file key
- replaceWorkTreeFile file' $ makeAnnexLink l . toRawFilePath
+ makeAnnexLink l file
-- touch symlink to have same time as the original file,
-- as provided in the InodeCache
--
2.30.2
</code></pre>
<p></details></p>
<p><details>
<summary>Recreate symlinks after remote transfer. The copy, move, get, sync commands will recreate the symlink after transferring the file from a remote.</summary></p>
<pre><code>From 82ea0ffb02fbc5e4003a466a216c8d1030b7d70a Mon Sep 17 00:00:00 2001
From: Reiko Asakura <asakurareiko@protonmail.ch>
Date: Wed, 12 Oct 2022 19:10:07 -0400
Subject: [PATCH 3/3] Recreate symlinks after remote transfer
---
Annex/Link.hs | 7 +++++++
Command/Get.hs | 3 ++-
Command/Move.hs | 3 ++-
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/Annex/Link.hs b/Annex/Link.hs
index 1a344d07e..e0f172d1d 100644
--- a/Annex/Link.hs
+++ b/Annex/Link.hs
@@ -96,6 +96,13 @@ getAnnexLinkTarget' file coresymlinks = if coresymlinks
then mempty
else s
+relinkAssociatedFile :: AssociatedFile -> Bool -> Annex ()
+relinkAssociatedFile (AssociatedFile (Just file)) True =
+ getAnnexLinkTarget file >>= \case
+ Just target -> makeAnnexLink target file
+ _ -> noop
+relinkAssociatedFile _ _ = noop
+
makeAnnexLink :: LinkTarget -> RawFilePath -> Annex ()
makeAnnexLink = makeGitLink
diff --git a/Command/Get.hs b/Command/Get.hs
index a25fd8bf1..e16362f79 100644
--- a/Command/Get.hs
+++ b/Command/Get.hs
@@ -12,6 +12,7 @@ import qualified Remote
import Annex.Transfer
import Annex.NumCopies
import Annex.Wanted
+import Annex.Link
import qualified Command.Move
cmd :: Command
@@ -95,7 +96,7 @@ getKey' key afile = dispatch
showNote "not available"
showlocs []
return False
- dispatch remotes = notifyTransfer Download afile $ \witness -> do
+ dispatch remotes = observe (relinkAssociatedFile afile) $ notifyTransfer Download afile $ \witness -> do
ok <- pickRemote remotes $ \r -> ifM (probablyPresent r)
( docopy r witness
, return False
diff --git a/Command/Move.hs b/Command/Move.hs
index 55fed5c37..d733a7cbb 100644
--- a/Command/Move.hs
+++ b/Command/Move.hs
@@ -20,6 +20,7 @@ import Logs.Presence
import Logs.Trust
import Logs.File
import Annex.NumCopies
+import Annex.Link
import qualified Data.ByteString.Char8 as B8
import qualified Data.ByteString.Lazy as L
@@ -241,7 +242,7 @@ fromPerform src removewhen key afile = do
then dispatch removewhen deststartedwithcopy True
else dispatch removewhen deststartedwithcopy =<< get
where
- get = notifyTransfer Download afile $
+ get = observe (relinkAssociatedFile afile) $ notifyTransfer Download afile $
download src key afile stdRetry
dispatch _ deststartedwithcopy False = do
--
2.30.2
</code></pre>
<p></details></p>
<p><details>
<summary>Allow git-annex fix on crippled filesystem</summary></p>
<pre><code>From 65fe6e362dfbf2f54c8da5ca17c59af26de5ff83 Mon Sep 17 00:00:00 2001
From: Reiko Asakura <asakurareiko@protonmail.ch>
Date: Sat, 23 Oct 2021 17:13:50 -0400
Subject: [PATCH 1/2] Allow git-annex fix on crippled filesystem
---
Command/Fix.hs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Command/Fix.hs b/Command/Fix.hs
index 39853c894..2d66c1461 100644
--- a/Command/Fix.hs
+++ b/Command/Fix.hs
@@ -31,7 +31,7 @@ cmd = noCommit $ withAnnexOptions [annexedMatchingOptions] $
paramPaths (withParams seek)
seek :: CmdParams -> CommandSeek
-seek ps = unlessM crippledFileSystem $
+seek ps =
withFilesInGitAnnex ww seeker =<< workTreeItems ww ps
where
ww = WarnUnmatchLsFiles
--
2.30.2
</code></pre>
<p></details></p>
<p><strong> Usage tips </strong></p>
<ul>
<li>WSL1 will not create symlinks that work in Windows if they are created before the target file exists. This can be fixed by recreating them by any method, such as deleting them and running <code>git checkout</code>. Also see the patches above, which make git-annex recreate symlinks automatically.</li>
</ul>
<p><details>
<summary>Sample script to recreate all symlinks under the current directory</summary></p>
<pre><code>#!/usr/bin/env python3
import pathlib
import os

def do(p):
    # Walk the directory tree, skipping .git directories.
    for c in list(p.iterdir()):
        if c.is_symlink() and c.resolve().exists():
            # Recreate only links whose target now exists.
            target = os.readlink(c)  # use readlink here to get the relative link path
            c.unlink()
            c.symlink_to(target)
        elif c.is_dir() and c.name != '.git':
            do(c)

do(pathlib.Path('.'))
</code></pre>
<p></details></p>
<p><strong> Related bugs </strong></p>
<ul>
<li><a href="http://git-annex.branchable.com/bugs/WSL_adjusted_braches__58___smudge_fails_with_sqlite_thread_crashed_-_locking_protocol/">WSL adjusted braches: smudge fails with sqlite thread crashed - locking protocol</a></li>
<li><a href="http://git-annex.branchable.com/bugs/WSL1__58___git-annex-add_fails_in_DrvFs_filesystem/">WSL1: git-annex-add fails in DrvFs filesystem</a></li>
</ul>
<h1><a href="http://git-annex.branchable.com/tips/enable_tor_on_nixos/">enable tor on nixos</a></h1>
<p>Published 2021-08-24T04:43:35Z.</p>
<p>On NixOS, tor is run with a <code>torrc</code> directly in <code>/nix/store</code>, but <code>git-annex enable-tor</code> attempts to both read and modify <code>/etc/tor/torrc</code>.</p>
<p>This behavior can be accommodated by making a copy:</p>
<pre><code class="sh">torrc=$( ps -ef | egrep -o '(\S*?torrc)$' )
sudo mkdir -p /etc/tor
sudo cp $torrc /etc/tor/torrc
</code></pre>
<p>This should allow you to run:</p>
<pre><code class="sh">git-annex enable-tor
</code></pre>
<p>without seeing an error, but the edited <code>torrc</code> will have no effect, so
git-annex will keep waiting for the hidden service to come online. While it
waits, check which lines were added:</p>
<pre><code class="sh">diff -u $torrc /etc/tor/torrc
</code></pre>
<p>and then add a hidden service to your <code>configuration.nix</code>:</p>
<pre><code class="nix"> # add a service for the repository
services.tor.relay.onionServices.git-annex-5e77c94c-5907-4f43-96bf-282ae233b240 = {
# this is where git annex configures it, which works fine, but doesn't
# actually seem necessary, so it could be left empty
path = "/var/lib/tor/tor-annex_1000_5e77c94c-5907-4f43-96bf-282ae233b240";
# the HiddenServicePort directive requires both tor and the git-annex remotedaemon
# to be able to access the socket, which is why git-annex places it in a separate
# directory, but that directory also needs to be made visible to tor
map = [ {
port = 12345;
target.unix = "/var/lib/tor-annex/1000_5e77c94c-5907-4f43-96bf-282ae233b240/s";
} ];
};
# make the sockets directory visible to the otherwise sandboxed tor daemon
systemd.services.tor.serviceConfig.BindPaths = [ "/var/lib/tor-annex" ];
</code></pre>
<p>Note that without the <code>BindPaths</code> setting, the tor daemon will not be able to access the
sockets and connections will be rejected (this can be diagnosed by sending tor a
<code>SIGUSR2</code> to enable debug logging).</p>
<p>You should now be able to run <code>nixos-rebuild switch</code> and git-annex will
detect that the hidden service is running.</p>
<h1><a href="http://git-annex.branchable.com/tips/cloning_a_repository_privately/">cloning a repository privately</a></h1>
<p>Published 2021-04-23T18:54:30Z; updated 2021-04-23T20:01:59Z.</p>
<p>Normally, when you clone a git-annex repository, and use git-annex in it,
and then push or otherwise send changes back to origin, information gets
committed to the git-annex branch about your clone. Things like the annexed
files that are in it, its description, etc.</p>
<p>If you don't want the world to know about your clone, either for privacy
reasons or only because the clone is a temporary copy of the repository,
here's how.</p>
<p>Recently git-annex got a new config setting, <code>annex.private</code>.
Set it before you start using git-annex in a repository, and git-annex
will avoid recording any information about the repository into the
git-annex branch.</p>
<pre><code>git clone ssh://... myclone
cd myclone
git config annex.private true
git annex init
</code></pre>
<p>Now you can use git-annex as usual, adding files to the repository,
getting the contents of files, etc.</p>
<p>When you push changes back to origin, do still push the git-annex branch,
since git-annex still uses it to record anything it needs to keep track of
that does not involve your private repository.</p>
<p>And be sure, when adding or editing annexed files, that you <code>git-annex copy</code>
them to a publicly accessible repository. Otherwise, to everyone else,
there will seem to be no copies of that file available anywhere, since they
won't know about your private repo's copy.</p>
<h2>private special remotes</h2>
<p>You can also make private special remotes, by using <code>git annex initremote --private</code>.</p>
<p>As with a private repository, git-annex avoids storing any information about
a private special remote in the git-annex branch. That information will only be available in
the repository where the special remote was created.</p>
<p>Bear in mind that, if you lose the repository where the private special
remote was created, you'll lose the information git-annex needs to access
that special remote, and that will likely mean you'll not be able to
recover any files stored in it.</p>
<h2>private git remotes</h2>
<p>When the git config "remote.name.private" is set, git-annex will avoid
recording anything in the git-annex branch about the remote. This is
set by <code>git-annex initremote --private</code>, and could also be set for
git remotes. This could be useful, perhaps. Update this tip if you have a
good way to use it.</p>
<h2>where the data is actually stored</h2>
<p>The private data gets stored in <code>.git/annex/journal-private/</code> rather
than in the git-annex branch.</p>
<h1><a href="http://git-annex.branchable.com/tips/using_borg_for_efficient_storage_of_old_annexed_files/">using borg for efficient storage of old annexed files</a></h1>
<p>Published 2020-12-28T21:05:12Z; updated 2021-06-14T17:05:24Z.</p>
<p>If your git-annex repository contains 10 versions of a 100 megabyte file,
it will need 1000 megabytes of disk space to store them all. To save space
those old versions can be moved to a remote, but most remotes also don't
store similar versions efficiently.</p>
<p><a href="https://www.borgbackup.org/">Borg</a> is a deduplicating archiver
with compression and encryption. This makes it a good solution to this
problem, only the differences between the old versions of the file will be
stored by borg.</p>
<p>Borg can be used with git-annex as an unusual kind of remote.
git-annex is not able to store files in borg itself. Instead the way this
works is you use borg to store your git-annex repository, and then
<code>git-annex sync</code> scans the borg repository to find out what annexed files are
stored in it.</p>
<p>Let's set that up. Run this from the top directory of your git-annex repository
to create a borg repository next to it that stores all the files in it, and
let git-annex treat it as a remote.</p>
<pre><code># borg init --encryption=keyfile ../borgrepo
# git annex initremote borg type=borg borgrepo=../borgrepo
# borg create ../borgrepo::{now} `pwd`
# git annex sync borg
</code></pre>
<p>Now git-annex knows that all the files in the repository, including all the
old versions, have been stored in borg. But when you try to drop a file,
you'll find that git-annex does not trust the borg repository.</p>
<pre><code>drop file (unsafe)
Could only verify the existence of 0 out of 1 necessary copies
Also these untrusted repositories may contain the file:
ca863c47-9ded-4dd0-bd7d-9b65e5624171 -- [borg]
</code></pre>
<p>Why is this? Well, you could use <code>borg delete</code> or <code>borg prune</code> to delete
the content of the file from the borg repository at any time, so git-annex
defaults to not trusting it. This is fine when you're using borg to take
backups, and need to delete old borg archives to free up space on the
backup drive. And it can be useful to use git-annex with such borg backups.
But our goal is instead to move old versions of files to borg. So, you need
to tell git-annex that you will only use borg to append to the borg
repository, not to delete things from it.</p>
<pre><code># git annex enableremote borg appendonly=yes
</code></pre>
<p>Now all the old versions of files can be dropped from the git-annex
repository, freeing up disk space.</p>
<pre><code># git annex unused
# git annex drop --unused
</code></pre>
<p>You can continue running <code>borg create</code> and <code>git-annex sync</code> to store
changed files in borg and let git-annex know what's stored there.</p>
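<p>That periodic step could be scripted. Here is a minimal Python sketch, assuming borg 1.x command-line syntax (the archive is named as <code>REPOSITORY::ARCHIVE</code>, followed by the paths to store), the <code>../borgrepo</code> location and the remote name <code>borg</code> used above. The function names are hypothetical.</p>

```python
import os
import subprocess
from typing import List


def backup_commands(borgrepo: str, worktree: str, remote: str = "borg") -> List[List[str]]:
    """Commands for one round: archive the repository, then let git-annex scan it."""
    return [
        # borg 1.x: REPOSITORY::ARCHIVE, then the paths to archive.
        # {now} is a placeholder that borg itself expands to a timestamp.
        ["borg", "create", f"{borgrepo}::{{now}}", worktree],
        # git-annex sync scans the borg repository for annexed files.
        ["git", "annex", "sync", remote],
    ]


def run_backup(borgrepo: str = "../borgrepo", worktree: str = ".", remote: str = "borg") -> None:
    top = os.path.abspath(worktree)  # like `pwd` in the commands above
    for cmd in backup_commands(borgrepo, top, remote):
        # check=True stops before syncing if borg create failed.
        subprocess.run(cmd, check=True, cwd=top)


if __name__ == "__main__":
    run_backup()
```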
<p>It's possible to access the same borg repository from another clone of the
git-annex repository too. Just run <code>git annex enableremote borg</code> in that
clone to set it up. This uses the same <code>borgrepo</code> value that was passed
to initremote, but you can override it, if, for example, you want to access
the borg repository over ssh from this new clone.</p>
<h1><a href="http://git-annex.branchable.com/tips/using_git-annex_from_your_program/">using git-annex from your program</a></h1>
<p>Published 2020-09-16T17:03:37Z.</p>
<p>Want to write a program that uses git-annex? Lots of people have, see
<a href="http://git-annex.branchable.com/related_software/">related software</a> for a list. This tip is an overview of ways git-annex
facilitates being used by other programs.</p>
<h2>batch mode communication</h2>
<p>Many git-annex commands have a <code>--batch</code> option. This is handy if your
program is operating on a lot of files; rather than running git-annex once
per file, you can construct batch pipelines, and it will probably run a lot
faster.</p>
<p>The way the <code>--batch</code> option typically works is that it makes the command read
lines from standard input. Each line is a filename or whatever other thing
the command operates on. The command replies with the same output it would
normally produce.</p>
<p>For example:</p>
<pre><code>git-annex get --batch
foo
get foo (from origin...)
(checksum...) ok
bar
baz
get baz (from origin...)
(checksum...) ok
</code></pre>
<p>Notice the blank line it replies in response to "bar"? That's because,
in this example, the file "bar" is already present, and it does not need to
do anything to get it. Normally, git-annex silently skips files it does not
need to operate on, but in batch mode, it will reply with a blank line when
there's nothing to do for a given input line.</p>
<h2>JSON output</h2>
<p>Notice that git-annex happened to output 2 lines per file in the example
above. But it could output any number of lines. How can your program know
when the output for one batch item is complete? Let alone parse it to
determine if it succeeded or failed? Some git-annex commands don't have
this problem, and are documented to output exactly one line, in a specific
format, in batch mode.</p>
<p>For the rest, the answer is <code>--json</code>. Use it with <code>--batch</code> and now each
batch mode request results in a json object being output:</p>
<pre><code>git-annex get --batch --json
foo
{"command":"get","note":"from origin...\nchecksum...","success":true,"input":["foo"],"key":"SHA256E-s30--f888da7dd6c0d6f37e3847f390d848c9a8e1e2d876865a91aca7e5a6a83715e0","file":"foo"}
bar
baz
{"command":"get","note":"from origin...\nchecksum...","success":true,"input":["baz"],"key":"SHA256E-s1048576000--da87281c9f9ab6cef8f9362935f4fc864db94606d52212614894f1253461a762","file":"baz"}
</code></pre>
<p>Notice that it still outputs a blank line when there is nothing to do
for a request, so be prepared for that in your JSON parser.</p>
<p>There are also <code>--json-progress</code>, which adds more JSON messages giving
progress of transfers, and <code>--json-error-messages</code> which makes some error
messages be included in JSON objects instead of going to stderr.</p>
<p>The format of git-annex's JSON output is not documented in full,
because it varies from command to command. The shape is typically the same,
but a few commands have a more custom JSON. Try a command and see
what JSON it outputs and go from there. Fields won't be removed or renamed,
but new ones might be added from time to time.</p>
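<p>Putting <code>--batch</code> and <code>--json</code> together, a program can send one request at a time and read exactly one reply line for it. This is a minimal Python sketch of that, not an official client. The helper names are hypothetical, and it assumes no <code>-J</code> and no <code>--json-progress</code>, so that there is exactly one reply line per request, in order. It only looks up the fields it needs, so newly added fields do no harm.</p>

```python
import json
import subprocess
import sys
from typing import Dict, Iterable, Optional


def parse_reply(line: str) -> Optional[dict]:
    """Parse one reply line from `git-annex <command> --batch --json`.

    Returns None for a blank line, which means there was nothing to do.
    """
    line = line.strip()
    if not line:
        return None
    return json.loads(line)


def batch_get(files: Iterable[str]) -> Dict[str, Optional[dict]]:
    """Get files through one long-running `git-annex get --batch --json`."""
    results: Dict[str, Optional[dict]] = {}
    proc = subprocess.Popen(
        ["git-annex", "get", "--batch", "--json"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    try:
        for name in files:
            proc.stdin.write(name + "\n")
            proc.stdin.flush()  # git-annex must see the line before replying
            line = proc.stdout.readline()
            if line == "":  # EOF, as opposed to a blank "\n" reply
                raise RuntimeError("git-annex exited early")
            results[name] = parse_reply(line)
    finally:
        proc.stdin.close()  # end of input lets git-annex exit
        proc.wait()
    return results


if __name__ == "__main__":
    for name, reply in batch_get(sys.argv[1:]).items():
        if reply is None:
            print(name, "nothing to do")
        else:
            print(name, "ok" if reply.get("success") else "failed")
```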
<h2>concurrency and batch mode</h2>
<p>Some git-annex commands support using <code>-J</code> together with <code>--batch</code>,
and will handle multiple batch requests concurrently.</p>
<p>Suppose you want to get files concurrently in batch mode as the user clicks
on them, and display to the user in your GUI when each file transfer is
complete. But a problem: How to know which file a JSON reply corresponds
to, now that they are not always in the same order as you sent the
requests?</p>
<pre><code>git-annex get --batch -J3 --json
./foo
bar
baz
{"command":"get","note":"from origin...\nchecksum...","success":true,"input":["baz"],"key":"SHA256E-s1048576000--da87281c9f9ab6cef8f9362935f4fc864db94606d52212614894f1253461a762","file":"baz"}
{"command":"get","note":"from origin...\nchecksum...","success":true,"input":["./foo"],"key":"SHA256E-s30--f888da7dd6c0d6f37e3847f390d848c9a8e1e2d876865a91aca7e5a6a83715e0","file":"foo"}
</code></pre>
<p>You might think the "file" field is the thing to look at. And it can work.
But, the example above shows a way it can fail to work. <code>./foo</code> was
requested, but that got normalized internally, and the response has
<code>"file":"foo"</code></p>
<p>And looking at the "file" field won't help with other git-annex commands,
such as addurl, where you don't request filenames.</p>
<p>What will always work is to look at the "input" field. This is
always the exact input that git-annex was operating on when it output a
JSON object. (Older versions of git-annex don't include that field, but
those old versions don't run <code>--batch</code> concurrently either,
so if it's omitted, you can assume the JSON objects are in the same order
as you made requests.)</p>
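<p>Here is a minimal Python sketch of matching concurrent replies back to requests using the "input" field. The function name is hypothetical. Blank lines carry no input, so with <code>-J</code> they cannot be tied to a particular request. Assuming each request still produces exactly one reply line, once as many lines have been read as requests were sent, any request without a JSON object is one that had nothing to do.</p>

```python
import json
from typing import Dict, Iterable, List, Tuple


def match_replies(requests: List[str],
                  reply_lines: Iterable[str]) -> Tuple[Dict[str, dict], List[str]]:
    """Match `--batch --json -J` replies to requests by their "input" field.

    Returns (replies keyed by request line, requests that got no JSON
    object, i.e. only a blank "nothing to do" line).
    """
    replies: Dict[str, dict] = {}
    for line in reply_lines:
        line = line.strip()
        if not line:
            continue  # a blank line does not say which request it answers
        obj = json.loads(line)
        # In batch mode "input" holds the exact request line, e.g. ["./foo"].
        replies[obj["input"][0]] = obj
    missing = [r for r in requests if r not in replies]
    return replies, missing
```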
<h1><a href="http://git-annex.branchable.com/tips/using_Backblaze_B2/">using Backblaze B2</a></h1>
<p>Published 2020-03-05T02:33:30Z; updated 2020-06-17T01:18:32Z.</p>
<p>For using Backblaze B2 as a special remote, there are currently three
choices:</p>
<ul>
<li><p>Using <a href="http://git-annex.branchable.com/special_remotes/rclone/">rclone</a><br />
(Actively maintained)</p></li>
<li><p>Backblaze B2 supports the same API as Amazon S3, so
git-annex's built-in <a href="http://git-annex.branchable.com/special_remotes/S3/">S3 special remote</a> can be used
with it.</p>
<p>However, it needs S3 version 4 signatures, which are only supported by
git-annex 8.20200508 and newer.</p>
<p>Here is how to set up the special remote:</p>
<pre><code> git annex initremote backblaze type=S3 signature=v4 host=$endpoint bucket=$bucketid protocol=https
</code></pre>
<p>Remember to replace $endpoint with the actual backblaze endpoint and $bucketid with
the bucketid.</p></li>
<li><p>A dedicated special remote, <a href="https://github.com/encryptio/git-annex-remote-b2">https://github.com/encryptio/git-annex-remote-b2</a><br />
(Last updated 2016)</p></li>
</ul>
<p>At this time it's not clear which is better, so if you find one works
better than the other, please comment below.</p>
<h1><a href="http://git-annex.branchable.com/tips/using_Google_Drive/">using Google Drive</a></h1>
<p>Published 2020-03-05T02:33:30Z; updated 2020-06-17T01:18:32Z.</p>
<p>For using <a href="https://google.com/drive">Google Drive</a>
as a special remote, there are currently two choices:</p>
<ul>
<li>A dedicated special remote,
<a href="https://github.com/Lykos153/git-annex-remote-googledrive">https://github.com/Lykos153/git-annex-remote-googledrive</a><br />
Includes support for exporttree and other features.</li>
<li>Using <a href="http://git-annex.branchable.com/special_remotes/rclone/">rclone</a></li>
</ul>
<p>If you find one works best, please comment below.</p>
<h1><a href="http://git-annex.branchable.com/tips/using_Hubic/">using Hubic</a></h1>
<p>Published 2020-03-05T02:33:30Z; updated 2020-06-17T01:18:32Z.</p>
<p>For using Hubic as a special remote, there are currently two choices:</p>
<ul>
<li>Using <a href="http://git-annex.branchable.com/special_remotes/rclone/">rclone</a><br />
Actively maintained.</li>
<li>A dedicated special remote,
<a href="https://github.com/Schnouki/git-annex-remote-hubic">https://github.com/Schnouki/git-annex-remote-hubic</a><br />
Not actively maintained.</li>
</ul>
<p>If you find one works best, please comment below.</p>
<h1><a href="http://git-annex.branchable.com/tips/finding_which_file_matches_a_key/">finding which file matches a key</a></h1>
<p>Published 2019-10-21T15:40:54Z; updated 2020-06-17T01:18:32Z.</p>
<p>I have a music file which makes my music player unhappy. Unfortunately, it (the music player) only shows me the target of the symlink, the "key" of the file, e.g. <code>SHA256E-s16279847--ce02487cd9f78f5944cbc1acb6622d270f7c16172d0fa12ae1330a4d9c3144a0.mp3</code>. There's a way to find which remotes have that key:</p>
<pre><code>$ git annex whereis --key=SHA256E-s16279847--ce02487cd9f78f5944cbc1acb6622d270f7c16172d0fa12ae1330a4d9c3144a0.mp3
whereis SHA256E-s16279847--ce02487cd9f78f5944cbc1acb6622d270f7c16172d0fa12ae1330a4d9c3144a0.mp3 (7 copies)
059b8bdb-2716-4ac9-b06e-9b1176af361d -- anarcat@curie:~/mp3 [here]
ok
</code></pre>
<p>But that doesn't show me which file(s) actually point to it. <a href="http://git-annex.branchable.com/git-annex-list/">git-annex-list</a> and <a href="http://git-annex.branchable.com/git-annex-find/">git-annex-find</a> don't have the <code>--key</code> parameter and <a href="http://git-annex.branchable.com/git-annex-matching-options/">git-annex-matching-options</a> doesn't have it either, so it makes it difficult to find which file points to that key.</p>
<p>The only way I found to do this was to use the <code>find</code> command, like this:</p>
<pre><code>find -lname '*SHA256E-s16279847--ce02487cd9f78f5944cbc1acb6622d270f7c16172d0fa12ae1330a4d9c3144a0.mp3'
</code></pre>
<p>You can also use:</p>
<pre><code>git log --stat -1 -SKEY
</code></pre>
<p>to find the commits (and therefore the files) that link to the given key. That said, <code>git-annex</code> does not have any knowledge that would let it do better than either of these commands, at least not reliably. -- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
<h1><a href="http://git-annex.branchable.com/tips/multiple_remotes_accessing_the_same_data_store/">multiple remotes accessing the same data store</a></h1>
<p>Published 2019-10-14T20:09:23Z; updated 2020-09-02T16:26:20Z.</p>
<p>A remote configures git-annex with a way to access a particular data
store. Sometimes, there's more than one way for git-annex to access the
same underlying data store. It then makes sense to have multiple remotes.</p>
<p>The most common and easy case of this is when the data store is itself a
git-annex repository. For example, using git-annex on your laptop, you
might add an ssh remote that works from anywhere. But when you're located
on the same LAN as the server, it might be faster or simpler to use a
different ssh url.</p>
<pre><code>git remote add server someserver.example.com:/data
git remote add lanserver someserver.lan:/data
</code></pre>
<p>git-annex is able to realize automatically that these two remotes both
access the same repository (by contacting the repository and looking up its
git-annex uuid). You can use git-annex to access whichever remote you
want to at a given time.</p>
<p><a href="http://git-annex.branchable.com/special_remotes/">Special remotes</a> don't store data in a git-annex repository. It's
possible to configure two special remotes that access the same underlying
data store, at least in theory. Whether it will work depends on how the
two special remotes store their data. If they don't use the same filename
(or whatever), it might not work.</p>
<p>The case that is almost guaranteed to work is two special remotes with the same type
and other configuration, but with different urls that point to the same
data store. For example, a <a href="http://git-annex.branchable.com/special_remotes/git-lfs/">git-lfs</a> repository can be
accessed using http or ssh. So two git-lfs special remotes can be set up,
and both access the same data.</p>
<pre><code>git annex initremote lfs-ssh type=git-lfs encryption=shared url=git@example.com:repo
git annex initremote --sameas=lfs-ssh lfs-http type=git-lfs url=https://example.com/repo
</code></pre>
<p>The <code>--sameas</code> parameter tells git-annex that the new special remote
uses the same data store as an existing special remote. Note that
the encryption= parameter, which is usually mandatory, is omitted.
The two necessarily encrypt data in the same way, so it will
inherit the encryption configuration. Other configuration is not inherited,
so if you need some parameter to initremote to make a special remote behave
a certain way, be sure to pass it to both.</p>
<p>Finally, it's sometimes possible to access the same data stored in two
special remotes with different types. One combination that works is
a <a href="http://git-annex.branchable.com/special_remotes/directory/">directory</a> special remote
and a <a href="http://git-annex.branchable.com/special_remotes/rsync/">rsync</a> special remote.</p>
<pre><code>git annex initremote dir type=directory encryption=none directory=/foo
git annex initremote --sameas=dir rsync type=rsync rsyncurl=localhost:/foo
</code></pre>
<p>If a combination does not work, git-annex will be unable to access files
in one remote or the other, and it could get into a scrambled mess. So it's
best to test a combination carefully before you start using it for real.
If you find combinations that work, please edit this page to list them.</p>
<h2>known working combinations</h2>
<ul>
<li>directory and rsync</li>
<li>httpalso and directory</li>
<li>httpalso and rsync</li>
<li>httpalso and rclone (any layout except for frankencase)</li>
<li>httpalso and any special remote that uses exporttree=yes</li>
</ul>
storing data in git-lfs (http://git-annex.branchable.com/tips/storing_data_in_git-lfs/, updated 2020-06-17T01:18:32Z, published 2019-08-05T17:44:42Z)
<p>git-annex can store data in <a href="https://git-lfs.github.com/">git-lfs</a>
repositories, using the <a href="http://git-annex.branchable.com/special_remotes/git-lfs/">git-lfs special remote</a>.</p>
<p>You do not need the git-lfs program installed to use it, just a recent
enough version of git-annex.</p>
<h2>getting started</h2>
<p>Here's how to initialize a git-lfs special remote on GitHub.</p>
<pre><code>git annex initremote lfs type=git-lfs encryption=none url=https://github.com/yourname/yourrepo
</code></pre>
<p>In this example, the remote will not be encrypted, so anyone who can access
it can see its contents. It is possible to encrypt everything stored in a
git-lfs remote; see <a href="http://git-annex.branchable.com/tips/fully_encrypted_git_repositories_with_gcrypt/">fully encrypted git repositories with gcrypt</a>.</p>
<p>Once the git-lfs remote is set up, git-annex can store and retrieve
content in the usual ways:</p>
<pre><code>git annex copy * --to lfs
git annex get --from lfs
</code></pre>
<p>But, git-annex <strong>cannot delete anything</strong> from a git-lfs special remote,
because the protocol does not support deletion.</p>
<p>A git-lfs special remote also functions as a regular git remote. You can
use things like <code>git push</code> and <code>git pull</code> with it.</p>
<h2>enabling existing git-lfs special remotes</h2>
<p>There are two different ways to enable a git-lfs special
remote in another clone of the repository.</p>
<p>Of course, you can use <code>git annex enableremote</code> to enable a git-lfs special
remote, the same as you would enable any other special remote.
For example, for the "lfs" remote initialized above:</p>
<pre><code>git annex enableremote lfs
</code></pre>
<p>But perhaps more simply, if git-annex sees a git remote that matches
the url that was provided to initremote earlier, it will <em>automatically</em>
enable that git remote as a git-lfs special remote.</p>
<p>So you can just git clone from the url, and the "origin" remote will be
automatically used as a git-lfs special remote.</p>
<pre><code>git clone https://github.com/yourname/yourrepo
cd yourrepo
git-annex get --from origin
</code></pre>
<p>Nice and simple, and much the same as git-annex handles its regular
remotes.</p>
<p>(Note that git-annex versions 7.20191115 and older didn't remember the url
provided to initremote, so you'll need to pass the url= parameter
to enableremote in that case. Newer versions of git-annex will then
remember the url.)</p>
<h2>multiple urls</h2>
<p>Often there are multiple urls that can access the same git repository.
You can set up git-lfs remotes for each url. For example,
to add a remote accessing the GitHub repository over ssh:</p>
<pre><code>git annex initremote lfs-ssh --sameas=lfs url=git@github.com:yourname/yourrepo.git
</code></pre>
<p>The <code>--sameas</code> parameter tells git-annex that this is the same as the "lfs"
repository, so it will understand that anything it stores in one remote can
also be accessed with the other remote.</p>
android sync with adb (http://git-annex.branchable.com/tips/android_sync_with_adb/, updated 2020-08-10T19:38:51Z, published 2019-04-09T22:01:16Z)
<p>While git-annex can be <a href="http://git-annex.branchable.com/Android/">installed on your Android device</a>,
it might be easier not to install it there, but run it on your computer
using <code>adb</code> to pull and push changes to the Android device.</p>
<p>A few reasons for going this route:</p>
<ul>
<li>Easier than installing git-annex on Android.</li>
<li>Avoids needing to type commands into a terminal on Android.</li>
<li>Avoids problems with putting a git-annex repository on Android's <code>/sdcard</code>,
which is crippled by not supporting hard links etc.</li>
</ul>
<p>All you should need is a USB cable (or adb over wifi), and the <code>adb</code>
command.</p>
<h2>setting it up</h2>
<p>First, initialize your git-annex repository on your computer, if you haven't
already.</p>
<p>Then, in that repository, set up an adb special remote:</p>
<pre><code>git-annex initremote android type=adb androiddirectory=/sdcard encryption=none exporttree=yes importtree=yes
</code></pre>
<p>The above example syncs with the /sdcard directory of the
Android device. That can be a lot of files, so you may want a more
limited directory. See the sample workflows below for some more examples.</p>
<p>Next, configure how trees of files imported from it will be merged into your
git repository.</p>
<pre><code>git config remote.android.annex-tracking-branch master:android
</code></pre>
<p>Setting "master:android" makes the phone be treated as containing a branch
of the master branch, and makes all its files appear to be inside a
subdirectory named <code>android</code>. If you want its files to not be in a
subdirectory, set it to "master" instead.</p>
<p>Finally, you may want to configure a preferred content expression for the
remote. That will limit both what is exported to it, and what is imported
from it. If you want to fully sync all files, you don't need to do this.</p>
<p>For example, to limit the files that get imported and exported to sound files:</p>
<pre><code>git annex wanted android 'include=*.mp3 or include=*.ogg'
</code></pre>
<h2>syncing with it</h2>
<pre><code>git annex sync --content android
</code></pre>
<p>This command does a bi-directional sync with the phone, first importing
new and changed files from it, merging that into the master branch,
and then exporting from the master branch back to the android device so any
modifications you have made get synced over to it.</p>
<p>See <a href="http://git-annex.branchable.com/git-annex-import/">git-annex-import</a>, <a href="http://git-annex.branchable.com/git-annex-export/">git-annex-export</a>, and <a href="http://git-annex.branchable.com/git-annex-sync/">git-annex-sync</a>
for more details, and bear in mind that you can also use commands like
these to only import from or export to the android device:</p>
<pre><code>git annex import master:android --from android
git annex merge android/master
git annex export master:android --to android
</code></pre>
<h2>sample workflows</h2>
<h3>photos</h3>
<p>Set up the remote to use the /sdcard/DCIM directory where the phone's
camera stores them.</p>
<pre><code>git-annex initremote android type=adb androiddirectory=/sdcard/DCIM encryption=none exporttree=yes importtree=yes
</code></pre>
<p>The annex-tracking-branch can be the same as before, to limit
the files that are synced to those in an android directory:</p>
<pre><code>git config remote.android.annex-tracking-branch master:android
</code></pre>
<p>If you don't want to keep old photos on your Android device, you can simply
<code>git mv</code> the files from the android directory to another directory, and
the next sync with the phone will delete them from the Android device:</p>
<pre><code>git mv android/* .
git annex sync --content
</code></pre>
<h3>music and podcasts</h3>
<p>You could set up the remote to use the /sdcard/Music directory.
But, I sometimes download music to other locations, and perhaps you do too.
Let's instead limit the remote to mp3 and ogg files:</p>
<pre><code>git-annex initremote android type=adb androiddirectory=/sdcard encryption=none exporttree=yes importtree=yes
git annex wanted android 'include=*.mp3 or include=*.ogg'
</code></pre>
<p>The annex-tracking-branch can be the same as before, to limit
the files that are synced to those in an android directory:</p>
<pre><code>git config remote.android.annex-tracking-branch master:android
</code></pre>
<p>And then do an initial sync:</p>
<pre><code>git annex sync --content android
</code></pre>
<p>Now, you can copy music and podcasts you want to listen
to over to the Android device, by first copying them to the android
directory of your git-annex repo:</p>
<pre><code>cp -a podcasts/LibreLounge/Episode_14__Secure_Scuttlebutt_with_Joey_Hess.ogg android/
git annex add android
git annex sync --content android
</code></pre>
<p>That will also import any new sound files from the Android device into
your git-annex repo.</p>
<p>Once you're done with listening to something on the Android device, you can
simply delete it from the device, and the next time git-annex syncs, it
will get removed from the android directory. Or, you can delete it from the
android directory and the next sync will delete it from the Android device.</p>
Using git-worktree with annex (http://git-annex.branchable.com/tips/Using_git-worktree_with_annex/, updated 2019-01-21T15:42:51Z, published 2018-11-28T09:02:10Z)
<p><a href="https://git-scm.com/docs/git-worktree">Git worktrees</a> are supported since version 6.20180719.</p>
<p>Git normally makes a <code>.git</code> <strong>file</strong> in a
worktree, that points to the real git repository under <code>.git/worktrees/</code>.
This presents problems for git-annex. So, when used in a worktree,
git-annex will automatically replace the <code>.git</code> file with a symlink
pointing at the git repository. It also places an appropriate <code>annex</code> link
to <code>.git/worktrees/<name>/annex</code> to point to the object store. I don't know
how crippled filesystems are handled.</p>
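<p>To make the difference between a <code>.git</code> file and a <code>.git</code> symlink concrete, here is a small illustrative sketch. It is not git-annex's actual code. A gitfile holds a single line that starts with <code>gitdir: </code> and is followed by a path. The hypothetical <code>gitfile_to_symlink</code> helper reads that path and replaces the file with a symlink to the same directory. git-annex already does this for you, so only try the sketch on a scratch directory.</p>

```shell
#!/bin/sh
# Illustration only: a hypothetical helper showing what replacing a
# worktree's ".git" gitfile with a symlink amounts to. This is NOT
# git-annex's code; git-annex does this automatically.
gitfile_to_symlink() {
    wt=$1
    # Only act on a regular ".git" file, not a directory or a symlink.
    if [ ! -f "$wt/.git" ] || [ -L "$wt/.git" ]; then
        return 1
    fi
    # A gitfile holds a single line of the form "gitdir: <path>".
    IFS= read -r line < "$wt/.git"
    case $line in
        "gitdir: "*) target=${line#gitdir: } ;;
        *) return 1 ;;
    esac
    # A relative gitdir path is relative to the worktree directory, and so
    # is a relative symlink target created there, so reuse it unchanged.
    rm -- "$wt/.git" && ln -s -- "$target" "$wt/.git"
}
```

<p>Running the helper a second time on the same directory fails, because <code>.git</code> is then already a symlink.</p>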
<p>Getting, dropping, and syncing content works fine in a worktree. However,
when the tree changes, syncing updates the branches checked out in worktrees
but not the worktrees themselves or their indices. This differs from the
handling of the main working directory, which either gets updated or, if there
is a conflict, is left behind along with its branch.</p>
<p>With worktree support in its current state, I use git-worktree to copy symlinks across branches and run <code>git annex fix</code>
on them. I only use temporary worktrees due to the syncing behavior.</p>
hiding missing files (http://git-annex.branchable.com/tips/hiding_missing_files/, updated 2019-03-22T13:51:53Z, published 2018-10-20T18:23:29Z)
<p>Annexed files can have their content either present in the repository, or
not locally present (but stored in other repositories). Normally such
missing files are represented by broken symlinks or pointer files.</p>
<p>Sometimes it can be useful to hide the missing files, so you can focus on
only the files whose content is available to use. This is possible to do,
but it needs some different workflows of using git-annex.</p>
<h2>getting started</h2>
<p>To get started, your repository needs to be upgraded to v7, since the
feature does not work in v5 repositories.</p>
<pre><code>git annex upgrade
</code></pre>
<p>The <a href="http://git-annex.branchable.com/git-annex-adjust/">git-annex adjust</a> command sets up an adjusted form
of a git branch; in this case, we'll ask it to hide missing files.</p>
<pre><code>git annex adjust --hide-missing
</code></pre>
<p>And now the working tree only contains annexed files whose content is
present. Files with missing content are gone (but not forgotten).</p>
<p>The command switched to a branch with a name like "master(hidemissing)". Since
it's a regular git branch, you can switch back and forth between it and the
full branch at any time:</p>
<pre><code>git checkout master
...
git checkout "master(hidemissing)"
</code></pre>
<h2>git commands in the adjusted branch</h2>
<p>When in the adjusted branch, you can use the usual git commands, adding files,
renaming them, and deleting them, and committing. But bear in mind you're not
in the master branch, and so your commits won't touch the master branch. So,
you need a way to update the master branch with the changes you made to the adjusted
branch. That's easy, just sync:</p>
<pre><code>touch new-file
git annex add new-file
git commit -m 'added a file'
git annex sync --no-push --no-pull
</code></pre>
<p>That sync updated the master branch, cherry-picking the commit into it:</p>
<pre><code>> git log --stat master -n 1
commit 175ce2309a9a6f61b2c918f0878ea3060eab10ea
Author: Joey Hess <joeyh@joeyh.name>
Date:   Sat Oct 20 12:12:00 2018 -0400

    added a file

 new-file | 1 +
 1 file changed, 1 insertion(+)
</code></pre>
<p>Similarly, you can delete a file and sync, and it will be removed from the master
branch.</p>
<div class="notebox">
<p>A tricky point worth mentioning here is that when you <code>git annex drop</code>
a file, then delete it, and sync, it <em>won't</em> be removed from the master branch.</p>
<p>Why not? Well, the adjusted branch hides missing files; after dropping, the file
is missing, and after deletion it's hidden. And you generally don't want to
remove hidden files from the master branch in a sync from the adjusted branch.</p>
<p>If that seems complicated, don't worry, the behavior will probably
make sense when you encounter this situation.</p>
</div>
<p>You can also use <code>git annex sync</code> to pull changes from remotes into the adjusted
branch. It will automatically filter out missing files while merging the other
changes.</p>
<p>So that's all the usual git operations covered; you can use regular git commands
on the working tree and to commit files, and you use <code>git annex sync</code> to push
and pull. Now we need to talk about git-annex operations that get or drop
content, which can be tricky since missing files are hidden.</p>
<h2>getting and dropping files</h2>
<p>So, you're in the adjusted branch, missing files are hidden, and you want git-annex to get
some file. What do you do?</p>
<pre><code>> git annex get some_file
git-annex: some_file not found
git-annex: get: 1 failed
</code></pre>
<p>Well of course, that doesn't work, the file's pointer is not in the working tree;
it's been hidden. Asking git-annex to get a whole directory won't work either;
all files in the working tree are present so it won't find any missing ones to
operate on. (This might be improved later, but it's how things are currently.)</p>
<p>What will work is to use <code>git annex sync</code>, which knows you're in an adjusted branch
and can get hidden files.</p>
<pre><code>git annex sync --content-of some_file --no-push --no-pull
</code></pre>
<p>Unlike getting files that are hidden, dropping files is no problem, since
the file you want to drop will be present. But, after dropping a file,
it won't be hidden right away. This is because updating the adjusted branch to
hide the dropped file is a bit expensive. Here's how to drop and then hide
files:</p>
<pre><code>git annex drop some_file
git annex adjust --hide-missing
</code></pre>
<p>Re-running <code>git annex adjust</code> while in the adjusted branch updates the branch
to hide any newly missing files, and unhide any files whose content is
now present. (Running <code>git annex sync</code> also does that, along with the other
syncing.)</p>
<p>If this seems a bit of a pain, read on for a simpler way ...</p>
<h2>a simple workflow</h2>
<p>Here's how I use this for my podcasts repository. I <a href="http://git-annex.branchable.com/tips/downloading_podcasts/">use git-annex to download
podcasts</a> to a server. I want to keep all the podcasts,
but on my laptop or phone, I mostly want to only see podcasts I've not already
listened to.</p>
<p>I set up the repository like this:</p>
<pre><code>git clone server:/path/to/podcasts
cd podcasts
git annex upgrade
git annex adjust --hide-missing
git annex group here client
git annex wanted here standard
</code></pre>
<p>The last two commands make the repository use the
<a href="http://git-annex.branchable.com/preferred_content/standard_groups/">standard preferred content setting for client repositories</a>,
so it wants to get a copy of all files except for files inside "archive"
directories. When I'm done with listening to a podcast, I'll move it into an
"archive" directory to indicate I'm done with it.</p>
<p>To download all the new podcasts and make their files visible,
and to drop the archived podcasts and hide their files, I now
only need to run one command:</p>
<pre><code>git annex sync --content
</code></pre>
<p>Later, when I want to revisit an old podcast, I can simply check
out the master branch to make all the old files appear, and
<code>git annex get</code> the one I want.</p>
local caching of annexed files (http://git-annex.branchable.com/tips/local_caching_of_annexed_files/, updated 2018-08-03T18:10:28Z, published 2018-08-01T19:35:58Z)
<p>Here's how to set up a local cache of annexed files, that can be used
to avoid repeated downloads.</p>
<p>An example use case: Your CI system is operating on a git-annex repository,
so every time it runs it makes a fresh clone of the repository and uses
<code>git-annex get</code> to download a lot of data into it.</p>
<p>We'll create a cache repository, set it as a remote of the other git-annex
repositories, and configure git-annex to check the cache first before other
more expensive ways of retrieving content. The cache can be cleaned out
whenever you like with simple unix commands.</p>
<p>Some other nice properties: when used on a filesystem like BTRFS with copy-on-write
support, content from the cache can populate multiple other repositories
without using any additional disk space. And, git-annex repositories that
are otherwise unrelated can share use of the cache if they happen to
contain a common file.</p>
<p>You'll need git-annex 6.20180802 or newer to follow these instructions.</p>
<h2>creating the cache</h2>
<p>First let's create a new, empty git-annex repository. It will be put in
~/.annex-cache in the example, but for best results, put it in the same
filesystem as your other git-annex repositories.</p>
<pre><code>git init --bare ~/.annex-cache
cd ~/.annex-cache
git annex init
git config annex.hardlink true
git annex untrust here
</code></pre>
<p>The cache does not need to be a git annex repository; any kind of special
remote can be used as a cache too. But, using a git repository lets
annex.hardlink be used to make hard links between the cache and
repositories using it.</p>
<p>The cache is made untrusted, because its contents can be cleaned at any
time; other repositories should not trust it to retain content.</p>
<h2>making repositories use the cache</h2>
<p>Now in each git-annex repository that you want to use the cache, add it as
a remote, and configure it as follows:</p>
<pre><code>cd my-repository
git remote add cache ~/.annex-cache
git config remote.cache.annex-speculate-present true
git config remote.cache.annex-cost 10
git config remote.cache.annex-pull false
git config remote.cache.annex-push false
git config remote.cache.fetch do-not-fetch-from-this-remote:
</code></pre>
<p>The annex-speculate-present setting is the essential part. It makes
git-annex know that the cache repository may contain the content of any
annexed file. So, when getting a file, git-annex will try the cache
repository first.</p>
<p>The low annex-cost makes git-annex try to get content from the cache remote
before any other remotes.</p>
<p>The annex-pull and annex-push settings prevent <code>git-annex sync</code> from
pulling and pushing to the remote, and the remote.cache.fetch setting
further prevents git commands from fetching from it or pushing to it. The
cache repository will remain an empty git repository (except for the
content of annexed files). This means that the same cache can be used with
multiple different git-annex repositories, without intermingling their git
data.</p>
<h2>populating the cache</h2>
<p>For the cache to be used, you need to get file contents into it somehow.
A simple way to do that is, in a git-annex repository that already
contains the content of files:</p>
<pre><code>git annex copy --to cache
</code></pre>
<p>You could run that anytime after you get content. There are also ways to
automate it, but getting some files into the cache manually is a good
enough start.</p>
<h2>cleaning the cache</h2>
<p>You can safely remove content from the cache at any time to free up disk
space.</p>
<p>To remove everything:</p>
<pre><code>cd ~/.annex-cache
git annex drop --force
</code></pre>
<p>To remove files that have not been requested from the cache for the past day:</p>
<pre><code>cd ~/.annex-cache
git annex drop --force --not --accessedwithin=1d
</code></pre>
<h2>automatically populating the cache</h2>
<p>The assistant can be used to automatically populate the cache with files
that git-annex downloads into a repository.</p>
<h2>more caches</h2>
<p>The example above used a local cache on the same system. However, it's also
possible to have a cache repository shared among computers on a LAN.</p>
hashdeep integration (http://git-annex.branchable.com/tips/hashdeep_integration/, updated 2020-12-10T15:53:36Z, published 2018-06-18T12:45:31Z)
<h2>What is hashdeep</h2>
<p><a href="http://md5deep.sourceforge.net/">hashdeep</a> is a handy tool that allows you to check file integrity
across whole directory trees. It can detect renames and missing files,
for example.</p>
<h2>How to use it with git-annex</h2>
<p>The general working principle of hashdeep is that it iterates over a
set of files and produces a manifest that looks like this:</p>
<pre><code>$ hashdeep -r *
%%%% HASHDEEP-1.0
%%%% size,md5,sha256,filename
## Invoked from: /home/jessek
## $ hashdeep -r archives bin lib doc
21508,6178d221a1714b7e2089565e997d6ad1,92caa3f5754b22ca792e4f8626362d2ef39596b080abfcfed951a86bee82bec3,/home/jessek/archives/foo-1.2.1.tar.gz
12292,116e77a5dc6af0996597f7bc1b9252a2,c2afc6aa8d5c094a7226db1695d99a37fa858548f5d09aad9e41badfc62b1d27,/home/jessek/archives/bar-0.9.tar.bz2
145684,4409c1e0b5995c290c2fc3d1d6d74bac,f56881fb277358c95ed3ddf64f28c4ff3f3937e636e17d6a26d42822b16fd4ed,/home/jessek/bin/ls
</code></pre>
<p>Then this manifest can be used to check consistency of the files
later. Because git-annex also uses hashes to identify files, it fits
nicely with this pattern and I have used it to verify files that were
outside of git-annex's control yet still from the repository. First,
we produce the manifest file:</p>
<pre><code>(
echo '%%%% HASHDEEP-1.0'
echo '%%%% size,sha256,filename'
git annex find --format '${bytesize},${keyname},${file}\n' | sed 's/\.[^,]*,/,/'
) > manifest.txt
</code></pre>
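<p>The <code>sed</code> step above deletes the first stretch of text that starts with a dot and ends at a comma. When a key has no extension, that stretch can fall inside the filename instead: a file named <code>v1.2,final.txt</code> would become <code>v1,final.txt</code>. Below is a slightly more careful variant. It is a sketch that assumes the <code>--format</code> output is exactly <code>bytesize,keyname,file</code>, and it uses awk to edit only the second field. The function name <code>annex_find_to_manifest</code> is made up for this example.</p>

```shell
#!/bin/sh
# Sketch: turn `git annex find` output of the form
#   <bytesize>,<keyname>,<file>
# (read on stdin) into a hashdeep manifest (written to stdout).
annex_find_to_manifest() {
    echo '%%%% HASHDEEP-1.0'
    echo '%%%% size,sha256,filename'
    # For SHA256E keys the keyname is "<hash>.<ext>", so drop everything
    # from the first dot, but only in field 2. Since FS and OFS are both
    # ",", commas inside the filename are written back unchanged.
    awk -F, -v OFS=, '{ sub(/\..*/, "", $2); print }'
}

# In a real repository (requires git-annex):
#   git annex find --format '${bytesize},${keyname},${file}\n' \
#       | annex_find_to_manifest > manifest.txt
```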
<p>Then this can be used to verify an external fileset with the following
command:</p>
<pre><code>hashdeep -k manifest.txt -a -vv -e -r /mnt/ > result
</code></pre>
<p>This will create a listing of every file that was moved, that is
missing and so on. I have used this to audit corrupted files on my
phone's microSD card as it turned out that about half of the files
were corrupted for some mysterious reason:</p>
<pre><code>hashdeep: Audit failed
Input files examined: 0
Known files expecting: 0
Files matched: 0
Files partially matched: 0
Files moved: 3411
New files found: 2179
Known files not found: 42117
</code></pre>
<p>The non-zero numbers are interesting: 3411 files were detected as
being sane and just the filenames had changed. 2179 files were "new"
which means that they were not in the original set. Since files were
supposed to <em>only</em> come from the original set, this means those files
were corrupt. Actually, that's not completely true: some files (JPG
image files, namely) <em>were</em> created in the external fileset so I had
to be careful to exclude those false positives by hand. The 42117
"known files not found" were files that were simply not transferred
over to the phone for lack of space.</p>
<p>This way, I was able to quickly find which files were corrupt and
remove them. This command lists the files to remove:</p>
<pre><code>grep 'No match' result | grep -v '\.jpg' | sed 's/: No match$//'
</code></pre>
<p>And I used the following loop to remove the files one by one:</p>
<pre><code>grep 'No match' result | grep -v '\.jpg' | sed 's/: No match$//' | while IFS= read -r file; do rm -- "$file"; done
</code></pre>
<p>Note the above is actually quite dangerous and you might want to
insert an <code>echo</code> in there to avoid shenanigans, especially if you do
not trust the filesystem.</p>
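<p>Following that advice, here is a minimal sketch of a more cautious version of the loop. The function name <code>remove_no_match</code> and its <code>--delete</code> switch are made up for this example. By default the function only prints what it would remove. It assumes the <code>result</code> file contains one <code>path: No match</code> line per unmatched file, as the commands above expect. It also skips JPG files case-insensitively, which goes slightly beyond the original <code>grep -v</code>.</p>

```shell
#!/bin/sh
# Hypothetical helper: remove files that a hashdeep audit reported as
# "No match", skipping JPG files. Dry run unless "--delete" is given.
remove_no_match() {
    result_file=$1
    mode=${2:-dry-run}
    grep ': No match$' "$result_file" \
        | sed 's/: No match$//' \
        | grep -vi '\.jpg$' \
        | while IFS= read -r file; do
            if [ "$mode" = --delete ]; then
                rm -- "$file"
            else
                printf 'would remove: %s\n' "$file"
            fi
        done
}
```

<p>Run it once without <code>--delete</code> and check the list. Only then run it again with <code>--delete</code>.</p>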
<h2>How else this might work</h2>
<p>Naturally, I could have imported all the files into git-annex and worked
only with git-annex to do this. But because the files were
renamed to some canonical version by the software transferring the file
(<a href="https://f-droid.org/en/packages/github.daneren2005.dsub/">dSub</a> and <a href="https://airsonic.github.io/">Airsonic</a>), it would have been difficult to make a
diff with the original set. This is on a (ex)fat filesystem too, which
might make git-annex operation difficult. Yet I can't help but think
this is something that <a href="http://git-annex.branchable.com/git-annex-export/">git-annex-export</a> should be able to do, but
I am not sure it could deal with the renames. And I must say I have
found it a little inconvenient to have to <code>initremote</code> to be able to
use what are essentially ephemeral storage mountpoints.</p>
<p>The above procedure combines the best of both worlds: hashdeep does the
fuzzy matching and git-annex provides the catalog of files.</p>
<h2>Future improvements</h2>
<p>It would be nice if <a href="http://git-annex.branchable.com/git-annex-find/">git-annex-find</a> would allow listing only the
checksum, which would remove a potentially error-prone pattern
substitution above (<code>sed 's/\.[^,]*,/,/'</code>). This is necessary because
<code>${keyname}</code> includes the file extension which is expected with the
<code>SHA256E</code> backend, but it is somewhat inconvenient to deal with. Of
course, it would be pretty awesome if git-annex could output
hashdeep-compatible catalogs out of the box: it would improve
interoperability here. And the icing on the cake would be a git-annex
command (a variation of <a href="http://git-annex.branchable.com/git-annex-import/">git-annex-import</a>?) that would audit an
external, non-annexed repository for consistency in the same way.</p>
<p>Also note that hashdeep can operate in "chunk" mode which means that
it can work across file boundaries, detecting partial matches, for
example. This is something that, as far as I know, is impossible in
git-annex as checksums are only file-based. This would be useful in
eliminating the false positives by distinguishing the "this file is
completely new" and "this file is corrupt" cases.</p>
<h2>Comments</h2>
<p>Those notes were provided by <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a> but would gladly welcome
corrections and improvements.</p>
install on Android in Termux (http://git-annex.branchable.com/tips/install_on_Android_in_Termux/, updated 2019-01-21T15:42:51Z, published 2018-04-25T17:50:16Z)
<p>The content that was here has moved to <a href="http://git-annex.branchable.com/Android/">Android</a>.</p>
Announcing recastex - (re)podcast from your annex (http://git-annex.branchable.com/tips/Announcing_recastex_-___40__re__41__podcast__from_your_annex/, updated 2018-04-06T06:51:34Z, published 2018-04-06T06:51:34Z)
<p>Hi all,</p>
<p>I've written a simple tool in Python to re-podcast from an annex: <a href="https://github.com/stewart123579/recastex">recastex</a></p>
<p>Starting with the <a href="https://git-annex.branchable.com/tips/downloading_podcasts/">downloading podcasts</a> page, I've got a number of podcasts on my laptop, but they were not really synced to my podcast app on my phone. Not a problem any longer.</p>
<p>The app uses the metadata associated with <em>locally available</em> files to generate feeds for each of your "subscribed" podcasts - and collects anything else you have (like individual files) into a catch-all feed.</p>
<p>It's designed with git-annex + limited network + privacy in mind separating the public internet queries from the things that can be done over git-annex.</p>
<p><em>(As the author of <a href="https://git-annex.branchable.com/tips/a_gui_for_metadata_operations/">git-annex-metadata-gui</a> said...)</em> I hope these can be useful to someone other than myself.</p>
Splitting a git-annex repository (http://git-annex.branchable.com/tips/splitting_a_repository/, updated 2023-07-13T23:58:44Z, published 2017-04-24T13:29:23Z)
<p>I have a <a href="https://git-annex.branchable.com/">git annex</a> repo for all my media
that has grown to 57866 files and git operations are getting slow, especially
on external spinning hard drives, so I decided to split it into separate
repositories.</p>
<p>Here is how to split out a repository that contains a subset of the files
in the larger repository. The larger repository is left as-is, but similar
methods can be used to remove the files from it. Or, it can be deleted
once it gets split up into several smaller repositories.</p>
<p>(This is the reverse of the tip about migrating two separate disconnected directories
to git annex.)</p>
<p>Suppose the old big repo is at <code>~/oldrepo</code>, and you want to split out
photos from it, and those are all located inside <code>~/oldrepo/photos</code>.</p>
<p>First, let's create a new empty repo.</p>
<pre><code>mkdir ~/photos
cd ~/photos
git init
</code></pre>
<p>Now to populate the new repo with the files we want from the old repo. We
can use <code>git filter-branch</code> to create a git branch that contains only the
history of the files in <code>photos</code>. That command has a <em>lot</em> of options and
ways to use it, but here is one simple way:</p>
<pre><code>cd ~/oldrepo
# create a filtered branch with only the files wanted by the new repository
git branch split-master master
git filter-branch --prune-empty --subdirectory-filter photos split-master
# replace the new repo's master branch with the filtered branch
git push ~/photos split-master
git branch -D split-master
cd ~/photos
git reset --hard split-master
git branch -d split-master
</code></pre>
<p>Next, the git-annex branch needs to be filtered to include only
the files in <code>photos</code>, and that filtered branch sent to the new repository.
That can be done with the <a href="http://git-annex.branchable.com/git-annex-filter-branch/">git-annex-filter-branch</a>(1) command.</p>
<pre><code>cd ~/oldrepo
annexrev=$(git annex filter-branch photos --include-all-key-information --include-all-repo-config --include-global-config)
git push ~/photos $annexrev:refs/heads/git-annex
</code></pre>
<p>Next, initialize git-annex on the new repository. This uses
the same annex.uuid as was in the old repository. That's ok, because
the repository that's been split off will never have the old repository
as a remote.</p>
<pre><code>cd ~/photos
git annex reinit $(git config --file ~/oldrepo/.git/config annex.uuid)
</code></pre>
<p>Finally the annexed file contents need to be copied to the new repository:</p>
<pre><code>cd ~/photos
# Hardlink all the annexed data from the old repo
cp -rl ~/oldrepo/.git/annex/objects .git/annex/
# Remove unneeded hard links
git annex unused --quiet
git annex drop --unused --force
# Fix up annex links to content and make sure it's all ok.
git annex fsck
</code></pre>
<p>Warning: This method of copying the annexed file contents and dropping
the unused ones causes the git-annex branch to log information.</p>
<h1>alternative older method</h1>
<p>Here is another way to do it. Suppose the old big repo is at <code>~/oldrepo</code>:</p>
<pre><code># Create a new repo for photos only
mkdir ~/photos
cd ~/photos
git init
git annex init laptop
# Hardlink all the annexed data from the old repo
cp -rl ~/oldrepo/.git/annex/objects .git/annex/
# Regenerate the git annex metadata
git annex fsck --fast
# Also split the repo on the usb key
cd /media/usbkey
git clone ~/photos
cd photos
git annex init usbkey
cp -rl ../oldrepo/.git/annex/objects .git/annex/
git annex fsck --fast
# Connect the annexes as remotes of each other
git remote add laptop ~/photos
cd ~/photos
git remote add usbkey /media/usbkey/photos
</code></pre>
<p>At this point, I went through all repos doing standard cleanup:</p>
<pre><code># Remove unneeded hard links
git annex unused
git annex dropunused --force 1-12345
# Sync
git annex sync
</code></pre>
<p>To make sure nothing was missing, I used <code>git annex find --not --in=here</code>
to check whether, for example, the usbkey repository, which should have everything,
was missing anything.</p>
<p>Update: Antoine Beaupré pointed me to
<a href="http://git-annex.branchable.com/tips/Repositories_with_large_number_of_files/">this tip about Repositories with large number of files</a>
which I will try next time one of my repositories grows enough to hit a performance issue.</p>
<blockquote><p>This document was originally written by <a href="http://www.enricozini.org/blog/2017/debian/splitting-a-git-annex-repository/">Enrico Zini</a> and added to this wiki by <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a>.</p></blockquote>
Faster bash autocompletion with big annex reposhttp://git-annex.branchable.com/tips/Faster_bash_autocompletion_with_big_annex_repos/2017-04-14T22:11:55Z2017-04-14T20:19:29Z
<p>I'm currently using git annex to manage my entire file collection
(including tons of music and books) and I noticed how slow
autocompletion has become for files in the index (say for git add).
The main offender is a while-read-case-echo bash loop in
<code>__git_index_files</code> that can be readily substituted with a much faster
sed invocation. Here is my benchmark:</p>
<pre><code>__git_index_files ()
{
    local dir="$(__gitdir)" root="${2-.}" file;
    if [ -d "$dir" ]; then
        __git_ls_files_helper "$root" "$1" | while read -r file; do
            case "$file" in
                ?*/*)
                    echo "${file%%/*}"
                    ;;
                *)
                    echo "$file"
                    ;;
            esac;
        done | sort | uniq;
    fi
}
time __git_index_files > /dev/null
real 0m0.830s
user 0m0.597s
sys 0m0.310s

__git_index_files ()
{
    local dir="$(__gitdir)" root="${2-.}" file;
    if [ -d "$dir" ]; then
        __git_ls_files_helper "$root" "$1" | \
            sed -r 's@/.*@@' | uniq | sort | uniq
    fi
}
time __git_index_files > /dev/null
real 0m0.075s
user 0m0.083s
sys 0m0.010s
</code></pre>
<p>10 times faster! So you might redefine <code>__git_index_files</code> as above in your .bashrc after sourcing the git autocomplete script.</p>
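<p>To see what the faster pipeline does, here is a small sketch that feeds a made-up list of paths (standing in for <code>__git_ls_files_helper</code> output) through the same <code>sed</code> expression, without <code>-r</code>, which this pattern does not need. It keeps only the first path component of each line; the first <code>uniq</code> cheaply drops adjacent repeats before the <code>sort</code>, and the final <code>uniq</code> removes any that remain. <code>LC_ALL=C</code> is an addition here, only to make the sort order predictable.</p>
<pre><code># Made-up paths standing in for __git_ls_files_helper output
paths='notes.txt
docs/a.txt
docs/b.txt
src/x/y.c'

# Keep only the first path component, then de-duplicate:
# the first uniq drops adjacent repeats, sort | uniq removes the rest.
# LC_ALL=C only makes the sort order predictable for this demo.
top=$(printf '%s\n' "$paths" | sed 's@/.*@@' | uniq | LC_ALL=C sort | uniq)
printf '%s\n' "$top"
</code></pre>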
semi-synchronized remoteshttp://git-annex.branchable.com/tips/semi-synchronized_remotes/2017-03-27T19:50:43Z2017-03-27T16:23:13Z
<p>In general, git-annex repositories that are "synchronized" (e.g. with
the <a href="http://git-annex.branchable.com/git-annex-sync/">git-annex-sync</a> command, whatever the backend) have a global
namespace. Repositories will eventually converge to have exactly
the same content, generally using git's push/pull/merge
mechanisms.</p>
<p>What if we do <em>not</em> want exactly the same content across all
repositories, but still want to share some objects?</p>
<p>An example use case here is sharing content (e.g. <code>.git/annex/objects</code> blobs)
without having to deliberately collaborate over a globally
consistent set of objects in the <code>master</code> branch. Think of a
decentralized <a href="https://github.com/RichiH/conference_proceedings">conference proceedings</a> repository where each
conference could add their own content to a conference-specific
repository, while at the same time allowing a unified view in another,
more centralized repository, or allowing users to pick and choose
which conference they would want content from.</p>
<p>While each repository could have its own distinct branch, all
repositories will see all those branches and this may affect content
retention, as git-annex may consider files to be "in use" because they
are on some remote branch, for example. Furthermore, I consider git
branching to be a rather advanced topic in git usage. While git-annex
uses those mechanisms (e.g. the <code>git-annex</code> and <code>sync/*</code> branches),
those are generally hidden from the user until something goes
wrong. Therefore I looked into providing a more straightforward
approach to this problem for my users and myself.</p>
<p>In my use case, I have the following repositories:</p>
<ul>
<li>repoA: my own curated media collection</li>
<li>repoB: a third-party media collection</li>
</ul>
<p>I do not wish for my local curated collection (repoA) to be completely
synchronized with the third-party collection (repoB). This is because
we may have different tastes and retention policies: while I archive
everything, there are certain media I am not interested in. On the
other hand repoB might keep only (say) the last month of media and
discard older content, but have a more varied collection, of which only a
subset is interesting to me. Yet I still want to access some of that
content!</p>
<p>So I did the following to add the third party repository:</p>
<pre><code>git remote add repoB example.net:repoB
git annex sync --no-push repoB
git annex get --from=repoB
</code></pre>
<p>This works well: I get the files from repoB locally. Of course, if
repoB expires some files, those removals will show up locally too, but I can
always revert them without conflict, because I do not push
my changes back.</p>
<p>The downside of the <code>--no-push</code> option in <a href="http://git-annex.branchable.com/git-annex-sync/">git-annex-sync</a> is that
it needs to be made explicit at each invocation of the
command. Furthermore, this option is not supported by the assistant,
which will happily sync the master branch to all remotes by default.</p>
<p>An alternative is to manually fetch and merge content:</p>
<pre><code>git fetch repoB
git annex merge repoB
git reset HEAD^
# revert any possible changes upstream we don't want
git commit
</code></pre>
<p>Needless to say this quickly becomes quite messy, but it's the amazing
level of control git and git-annex provide, which obviously comes
with its price in complexity. Such a method will also be ignored by
the assistant and further <code>sync</code> commands.</p>
<p>To make sure those principles are respected in the assistant or a
plain <code>git annex sync</code> that may mistakenly be run in that repository,
I need some special setting. These are the options I considered, in
<a href="https://manpages.debian.org/git-config.1.en.html">.gitconfig</a> or <a href="http://git-annex.branchable.com/git-annex/">git-annex</a>'s config options:</p>
<ul>
<li><code>remote.<name>.annex-ignore=true</code>: <code>sync</code> and <code>assistant</code> will not
sync <em>content</em> to the repository, but explicit <code>get --from=repoB</code>
will still work.</li>
<li><code>remote.<name>.annex-sync=false</code>: <code>sync</code> (and <code>assistant</code>?) will
not sync the git repository with the remote</li>
<li><code>remote.<name>.push=nothing</code>: git won't push by default, unless
branches are explicitly given, which may actually be the case for
git-annex, so unlikely to work.</li>
<li><p><code>remote.<name>.pushurl=/dev/null</code>: will completely disable any push
functionality to that remote. Any sync will yield the following
error:</p>
<pre><code>fatal: '/dev/null' does not appear to be a git repository
[...]
git-annex: sync: 1 failed
</code></pre></li>
<li><p><code>remote.<name>.pushurl=.</code>: will push to the local repo
instead. A crude hack that may confuse the hell out of git-annex, but
at least it doesn't yield errors.</p></li>
</ul>
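<p>All of the options above are ordinary git configuration keys for a remote. As a minimal sketch, this sets <code>annex-sync</code> for <code>repoB</code> in a throwaway repository; the temporary directory and the <code>example.net:repoB</code> URL are placeholders, and in a real repository you would run only the <code>git config</code> line.</p>
<pre><code>set -eu
# Throwaway repository (placeholder), so this sketch leaves real repositories alone
repo=$(mktemp -d)
cd "$repo"
git init -q
git remote add repoB example.net:repoB

# Keep git annex sync (and the assistant) from syncing with repoB by default
git config remote.repoB.annex-sync false
# Alternative: skip repoB for content unless it is named explicitly
#   git config remote.repoB.annex-ignore true

# Show what ended up in .git/config for this remote
git config --get-regexp '^remote\.repoB\.'
</code></pre>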
<p>A similar approach to hacking the <code>pushurl</code> is to make <code>repoB</code>
read-only to the user. This, however, may trigger the activation of
<code>annex-ignore</code> by git-annex and will otherwise yield the same warnings
as the <code>pushurl=/dev/null</code> hack.</p>
<p>Right now, I am using <code>annex-sync = false</code> in <code>.git/config</code>. I have
also configured the repository to be in the "manual" <a href="http://git-annex.branchable.com/preferred_content/standard_groups/">standard
group</a> which will avoid copying
files into that repository:</p>
<pre><code>$ git annex group repoB manual
group repoB ok
(recording state in git...)
$ git annex wanted repoB standard
wanted repoB ok
(recording state in git...)
</code></pre>
<p>This is roughly equivalent to setting <code>annex-ignore = true</code>, yet it
allows for more flexibility. I could, for example, create custom
content expressions to sync certain folders automatically.</p>
<p>A disadvantage of the <code>annex-sync</code> setting is that it affects both
directions (push and pull), not just push, which is what I am interested
in. That said, restricting both is arguably fine here,
because we want to manually review changes when we pull from
those remotes anyway.</p>
<p>The best approach may be to have git-annex respect the
<code>remote.<name>.push=nothing</code> setting. Another approach would be to add
<code>remote.<name>.annex-push</code> and <code>remote.<name>.annex-pull</code> settings
that would match the <code>sync --[no-]push --[no-]pull</code> flags.</p>
<p>Note that this is similar in concept to
<a href="http://git-annex.branchable.com/todo/Bittorrent-like_features/">Bittorrent-like features</a>, although here we assume you
already have some transport to share anything you need, yet still have
to address the question of semi-synchronized git repositories in some
way.</p>
<p>I would obviously welcome additional comments and questions on this
approach. -- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
using signed git commitshttp://git-annex.branchable.com/tips/using_signed_git_commits/2017-02-27T20:18:38Z2017-02-27T20:12:00Z
<p>Git uses SHA1, which is becoming increasingly broken. Using git-annex
and signed commits, we can work around the weaknesses of SHA1, and
let anyone who clones a repository verify that the data they receive
is the same data that was originally committed to it.</p>
<p>This is recommended if you are storing any kind of binary
files in a git repository.</p>
<h2>Configuring git-annex</h2>
<p>You need git-annex 6.20170228. Upgrade if you don't have it.</p>
<p>git-annex can use many types of <a href="http://git-annex.branchable.com/backends/">backends</a> and not all of them are
secure. So, you need to configure git-annex to only use
cryptographically secure hashes.</p>
<pre><code>git annex config --set annex.securehashesonly true
</code></pre>
<p>Each new clone of the repository will then inherit that configuration.
But, any existing clones will not, so this should be run in them:</p>
<pre><code>git config annex.securehashesonly true
</code></pre>
<h2>Signed commits</h2>
<p>It's important that all commits to the git repository are signed.
Use <code>git commit --gpg-sign</code>, or enable the commit.gpgSign configuration.</p>
<p>Use <code>git log --show-signature</code> to check the signatures of commits.
If the signature is valid, it guarantees that all annexed files
have the same content that was originally committed.</p>
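<p>A minimal sketch of turning on signing for every commit in a repository, rather than remembering <code>--gpg-sign</code> each time. It uses a throwaway repository; <code>KEYID</code> is a placeholder for your own gpg key, and actually signing and verifying commits needs a working gpg setup, which this sketch does not exercise.</p>
<pre><code>set -eu
# Throwaway repository (placeholder) for the demo
repo=$(mktemp -d)
cd "$repo"
git init -q
# Sign every new commit in this repository (add --global to apply everywhere)
git config commit.gpgSign true
# Optionally choose which key signs; KEYID is a placeholder
#   git config user.signingKey KEYID
# After committing, check the signatures with:
#   git log --show-signature
</code></pre>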
<h2>Why is this more secure than git alone?</h2>
<p>SHA1 collisions exist now, and can be produced using a common-prefix
attack. See <a href="https://shattered.io/">https://shattered.io/</a>. Let's also assume that a chosen-prefix
attack against SHA1 will become feasible. However, a full preimage
attack still seems unlikely, so we won't consider such attacks in the
analysis below.</p>
<p>The reason that git-annex can work around git's problematic use of SHA1 is
that git-annex uses other, <a href="http://git-annex.branchable.com/backends/">stronger hashes</a> of the contents of
annexed files. For example, an annexed file may be a symlink to
".git/annex/objects/Ab/Cd/SHA256--eb45a55eb8756646e244e6c5f47349294568d58a9321244f4ee09a163da23a27".</p>
<p>Such a symlink is stored as a git blob object. The SHA1s of the git blobs
are listed in a git tree object, and the git commit object contains the
SHA1 of the tree. Finally, the commit object is gpg signed.</p>
<p>So, by checking the signature of a commit (<code>git log --show-signature</code>),
you can verify that this is the same commit that was originally made
to the repository. As far as the git developers know, there is no way
to produce multiple colliding git tree objects (at least not without
creating files with spectacularly ugly and long names), so you
know that the tree object pointed to by the signed commit is the original one.</p>
<p>Now, what about the blob objects that the tree lists? If these blobs
were regular git files, a SHA1 collision could mean your git repository
does not contain the same file that was originally committed, and the signed
commit would not help.</p>
<p>But, if the blob object is a git-annex symlink target, it has to contain the
strong hash of the file content. If a SHA1 collision swaps in some other
blob object, it will need to contain the strong hash of a different file's
content. The current common-prefix attack cannot do that.</p>
<p>A chosen-prefix attack could make two blobs containing different strong
hashes have the same SHA1, but it would need to include additional data
after the hash to do it. Since git-annex version 6.20170224, there is no
place for an attacker to put such data in a git-annex symlink target. (See
<span class="createlink"><a href="http://git-annex.branchable.com/ikiwiki.cgi?do=create&from=tips%2Fusing_signed_git_commits&page=todo%2Fsha1_collision_embedding_in_git-annex_keys" rel="nofollow">?</a>sha1 collision embedding in git-annex keys</span> for details
of how this was prevented.)</p>
<p>So, we have a SHA1 chain from the gpg signature to the git-annex symlink target,
and at no point in the chain is a SHA1 collision attack feasible.
Finally, git-annex verifies the strong hash when transferring
the content of a file into the repository (and <code>git annex fsck</code> verifies it
too), and so the content that the symlink is pointing to must be the same
content that was originally committed.</p>
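<p>The last step of that chain can be sketched in a few lines of shell. The helper below, <code>verify_annex_object</code>, is hypothetical and only a rough approximation of the check <code>git annex fsck</code> does for SHA256 and SHA256E keys: it takes the key from the last component of a symlink target, strips the backend and size fields and any file extension, and compares the remaining hash to the content's actual SHA256. It assumes GNU coreutils <code>sha256sum</code>, and the object path used in the demo is made up.</p>
<pre><code>set -eu
# Hypothetical helper: $1 is an annex symlink target, $2 a file with the content.
# Succeeds only if the content's SHA256 matches the hash embedded in the key.
verify_annex_object() {
    key=${1##*/}       # the last path component is the key
    hash=${key##*--}   # drop everything up to the last "--" (backend, size)
    hash=${hash%%.*}   # SHA256E keys may end with the file's extension
    actual=$(sha256sum "$2" | cut -d' ' -f1)
    [ "$hash" = "$actual" ]
}

# Demo with a made-up object path for a 6-byte file
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/content"
h=$(sha256sum "$tmp/content" | cut -d' ' -f1)
target=".git/annex/objects/Ab/Cd/SHA256E-s6--$h.txt/SHA256E-s6--$h.txt"

verify_annex_object "$target" "$tmp/content" && echo "content matches key"
printf 'tampered\n' > "$tmp/content"
verify_annex_object "$target" "$tmp/content" || echo "content does not match key"
</code></pre>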
making a remote repo update when changes are pushed to ithttp://git-annex.branchable.com/tips/making_a_remote_repo_update_when_changes_are_pushed_to_it/2017-02-17T19:58:47Z2017-02-17T19:44:11Z
<p>Normally, pushing a change into a remote git repository does not update its
working tree. But it can be very convenient to only need to <code>git push</code>
(or <code>git annex sync --content</code>) to a remote to update the files checked out
there.</p>
<p>Git has a way to let you do this, by setting <code>receive.denyCurrentBranch</code>
to <code>updateInstead</code> in the remote repository. For example:</p>
<pre><code>ssh remote
cd /path/to/repo
git config receive.denyCurrentBranch updateInstead
</code></pre>
<p>Now after a push to the remote, its working tree will be updated.</p>
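<p>The effect is easy to try out with plain git (no git-annex involved) in two scratch repositories. In this minimal sketch, <code>remote</code> plays the role of the repository on the remote host and <code>laptop</code> is a clone that pushes to it; the directory names and the placeholder identity are assumptions made only so the demo runs anywhere.</p>
<pre><code>set -eu
tmp=$(mktemp -d)
cd "$tmp"
# Placeholder identity so the demo can commit on any machine
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# The "remote": a non-bare repository with a checked-out working tree
git init -q remote
cd remote
echo v1 > file
git add file
git commit -q -m v1
git config receive.denyCurrentBranch updateInstead
cd ..

# A clone that pushes a change back to the checked-out branch
git clone -q remote laptop
cd laptop
echo v2 > file
git commit -q -a -m v2
git push -q origin HEAD
cd ..

# The remote's working tree now has the pushed content
cat remote/file
</code></pre>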
<p>Changes in the remote's working tree can prevent this update from working;
normally you'll want to avoid manually changing the remote's working tree,
and only push changes into it in this configuration.</p>
<p>When the remote is using <a href="http://git-annex.branchable.com/direct_mode/">direct mode</a> or
<a href="http://git-annex.branchable.com/git-annex-adjust/">adjusted branches</a>, you need the
<a href="http://git-annex.branchable.com/git-annex-post-receive/">git-annex post-receive</a>
hook to be set up for pushes to update the remote's working tree.
This is a new feature in git-annex 6.20170217. If the remote was
initialized with an older version of git-annex, you will need to re-run
<code>git annex init</code> in the remote after upgrading git-annex.</p>
mc menu integrationhttp://git-annex.branchable.com/tips/mc_menu_integration/2017-01-31T16:53:56Z2017-01-31T16:53:56Z
<p>Put the following in your ~/.config/mc/menu to map g and G to
git-annex-get and git-annex-drop in the Midnight Commander (mc) file manager
(the command lines must be indented):</p>
<pre><code>+ ! t t
g git annex get
    git annex get %f
+ t t
g git annex get
    git annex get %u
+ ! t t
G git annex drop
    git annex drop %f
+ t t
G git annex drop
    git annex drop %u
</code></pre>
antipatternshttp://git-annex.branchable.com/tips/antipatterns/2018-07-18T18:28:44Z2017-01-17T19:22:48Z
<p>This page collects Really Bad Ideas people have had with
git-annex in the past that can lead to catastrophic data loss, abusive
disk usage, improper swearing and other unfortunate experiences.</p>
<p>This could also be called the "git annex worst practices", but it differs from
<a href="http://git-annex.branchable.com/not/">what git annex is not</a> in that it covers normal
use cases of git-annex, just implemented in the wrong way. Ideally,
git-annex makes these things as hard as possible to do, but
sometimes people still figure out the worst
possible way of doing things.</p>
<hr />
<h1><strong>Symlinking the <code>.git/annex</code> directory</strong></h1>
<p>Symlinking the <code>.git/annex</code> directory, in the hope of saving
disk space, is a horrible idea. The general antipattern is:</p>
<pre><code>git clone repoA repoB
mv repoB/.git/annex repoB/.git/annex.bak
ln -s repoA/.git/annex repoB/.git/annex
</code></pre>
<p>This is bad because git-annex will believe it has two copies of the
files and then would let you drop the single copy, therefore leading
to data loss.</p>
<h2>Proper pattern</h2>
<p>The proper way of doing this is through git-annex's hardlink support,
by cloning the repository with the <code>--shared</code> option:</p>
<pre><code>git clone --shared repoA repoB
</code></pre>
<p>This will set up repoB as an "untrusted" repository and use hardlinks
to copy files between the two repos, using space only once. This
works, of course, only on filesystems that support hardlinks, but
that's usually the case for filesystems that support symlinks.</p>
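<p>A small sketch of why hard links use the space only once: after <code>cp -rl</code> (GNU cp; the directory names are placeholders, not real repositories), both paths name the same inode, which the shell's <code>-ef</code> test detects. Removing one name then leaves the other intact, unlike a symlinked <code>.git/annex</code>, where both repositories share a single directory entry.</p>
<pre><code>set -eu
tmp=$(mktemp -d)
# Placeholder object stores for two repositories
mkdir -p "$tmp/repoA/objects" "$tmp/repoB"
printf 'some annexed content\n' > "$tmp/repoA/objects/key1"

# -l makes hard links instead of copying file data
cp -rl "$tmp/repoA/objects" "$tmp/repoB/objects"

# -ef is true when both names refer to the same device and inode
if [ "$tmp/repoA/objects/key1" -ef "$tmp/repoB/objects/key1" ]; then
    echo "one copy of the data, two names"
fi

# Removing one name leaves the other name and its data in place
rm "$tmp/repoA/objects/key1"
cat "$tmp/repoB/objects/key1"
</code></pre>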
<p>Alternatively, <code>git worktree</code> can be used to add another worktree to a git
repository. This way, multiple worktrees can share the same git-annex
object store.</p>
<h2>Real world cases</h2>
<ul>
<li><a href="http://git-annex.branchable.com/forum/share_.git__47__annex__47__objects_across_multiple_repositories_on_one_machine/">share .git/annex/objects across multiple repositories on one machine</a></li>
<li>at least one IRC discussion</li>
</ul>
<h2>Fixes</h2>
<p>Probably no way to fix this in git-annex - if users want to shoot
themselves in the foot by messing with the backend, there's not much
we can do to change that in this case.</p>
<hr />
<h1><strong>Reinit repo with an existing uuid without <code>fsck</code></strong></h1>
<p>To quote the <a href="http://git-annex.branchable.com/git-annex-reinit/">git-annex-reinit</a> manpage:</p>
<blockquote><p>Normally, initializing a repository generates a new, unique
identifier (UUID) for that repository. Occasionally it may be useful
to reuse a UUID -- for example, if a repository got deleted, and
you're setting it back up.</p></blockquote>
<p><a href="http://git-annex.branchable.com/git-annex-reinit/">git-annex-reinit</a> can be used to reuse UUIDs for deleted
repositories. But what happens if you reuse the UUID of an <em>existing</em>
repository, or a repository that hasn't been properly emptied before
being declared dead? This can lead to git-annex getting confused
because, in that case, git-annex may think some files are still
present in the revived repository (while they may not actually be).</p>
<p>This should never result in data loss, because git-annex does not
trust its records about the contents of a repository, and checks
that it really contains files before dropping them from other
repositories. (The one exception to this rule is trusted repositories,
whose contents are never checked. See the next two sections for more
about problems with trusted repositories.)</p>
<h2>Proper pattern</h2>
<p>The proper way of using reinit is to make sure you run
<a href="http://git-annex.branchable.com/git-annex-fsck/">git-annex-fsck</a> (optionally with <code>--fast</code> to save time) on the
revived repo right after running reinit. This will ensure that at
least the location log will be updated, and git-annex will notice if
files are missing.</p>
<h2>Real world cases</h2>
<ul>
<li><a href="http://git-annex.branchable.com/bugs/remotes_disappeared/">remotes disappeared</a></li>
</ul>
<h2>Fixes</h2>
<p>An improvement to git-annex here would be to allow
<span class="createlink"><a href="http://git-annex.branchable.com/ikiwiki.cgi?do=create&from=tips%2Fantipatterns&page=todo%2Freinit_should_work_without_arguments" rel="nofollow">?</a>reinit to work without arguments</span>
to at least not encourage UUID reuse.</p>
<h1><strong>Deleting data from trusted repositories</strong></h1>
<p>When you use <a href="http://git-annex.branchable.com/git-annex-trust/">git-annex-trust</a> on a repository, you disable
some very important sanity checks that make sure that git-annex
never loses the content of files. So trusting a repository
is a good way to shoot yourself in the foot and lose data. Like the
man page says, "Use with care."</p>
<p>When you have made git-annex trust a repository, you can lose data
by dropping files from that repository. For example, suppose file <code>foo</code> is
present in the trusted repository, and also in a second repository.</p>
<p>Now suppose you run <code>git annex drop foo</code> in both repositories.
Normally, git-annex will not let both copies of the file be removed,
but if the trusted repository is able to verify that the second
repository has a copy, it will delete its copy. Then the drop in the second
repository will <em>trust</em> the trusted repository still has its copy,
and so the last copy of the file gets deleted.</p>
<h2>Proper pattern</h2>
<p>Either avoid using trusted repositories, or avoid dropping content
from them, or make sure you <code>git annex sync</code> just right, so
other repositories know that data has been removed from a trusted repository.</p>
<h1><strong>Deleting trusted repositories</strong></h1>
<p>Another way trusted repositories are unsafe is that even after they're
deleted, git-annex will trust that they contained the files they
used to contain.</p>
<h2>Proper pattern</h2>
<p>Always use <a href="http://git-annex.branchable.com/git-annex-dead/">git-annex-dead</a> to tell git-annex when a repository has
been deleted, especially if it was trusted.</p>
<h1>Other cases</h1>
<p>Feel free to add your lessons in catastrophe here! It's educational
and fun, and will improve git-annex for everyone.</p>
<p>PS: should this be a toplevel page instead of being drowned in the
<span class="selflink">tips</span> section? Where should it be linked to? -- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
peer to peer network with torhttp://git-annex.branchable.com/tips/peer_to_peer_network_with_tor/2020-06-17T01:18:32Z2016-12-07T19:51:20Z
<p>git-annex has recently gotten support for running as a
<a href="https://torproject.org/">Tor</a> hidden service. This is a nice secure
and easy to use way to connect repositories in different
locations. No account on a central server is needed; it's peer-to-peer.</p>
<h2>dependencies</h2>
<p>To use this, you need to get Tor installed and running. See
<a href="https://torproject.org/">their website</a>, or try a command like:</p>
<pre><code>sudo apt-get install tor
</code></pre>
<p>You also need to install <a href="https://github.com/warner/magic-wormhole">Magic Wormhole</a> -
here are <a href="https://magic-wormhole.readthedocs.io/en/latest/welcome.html#installation">the installation instructions</a>.</p>
<p><em>Important:</em></p>
<ul>
<li><p>At the time of writing, you need to install Magic Wormhole under Python 2,
because <a href="https://magic-wormhole.readthedocs.io/en/latest/tor.html">Tor support is only available under python2.7</a>.</p></li>
<li><p>The installation process must make a <code>wormhole</code> executable available
somewhere on your <code>$PATH</code>. Some distributions may only install executables
which reference the Python version, e.g. <code>wormhole-2.7</code>, in which case you
will need to manually create a symlink (and maybe file a bug with your distribution).</p></li>
<li><p>You need git-annex version 6.20180705. Older versions of git-annex
unfortunately had a bug that prevents this process from working correctly.</p></li>
</ul>
<h2>pairing two repositories</h2>
<p>You have two git-annex repositories on different computers, and want to
connect them together over Tor so they share their contents. Or, you and a
friend want to connect your repositories together. Pairing is an easy way
to accomplish this.</p>
<p>(The instructions below use the command line. If you or your friend would
rather avoid using the command line, follow the
<a href="http://git-annex.branchable.com/assistant/share_with_a_friend_walkthrough/">webapp walkthrough</a>. It's fine
for one person to use the command line and the other to use the webapp.)</p>
<p>In each git-annex repository, run these commands:</p>
<pre><code>git annex enable-tor
git annex remotedaemon
</code></pre>
<p>The enable-tor command may prompt for the root password, since it
configures Tor. Now git-annex is running as a Tor hidden service, but
it will only talk to peers after pairing with them.</p>
<p>In both repositories, run this command:</p>
<pre><code>git annex p2p --pair
</code></pre>
<p>This will print out a pairing code, like "11-incredible-tumeric",
and prompt for you to enter the other repository's pairing code.</p>
<p>So you have to get in contact with your friend to exchange codes.
See the section below "how to exchange pairing codes" for tips on
how to do that securely.</p>
<p>Once the pairing codes are exchanged, the two repositories will be securely
connected to one another via Tor. Each will have a git remote, with a name
like "peer1", which connects to the other repository.</p>
<p>Then, you can run commands like <code>git annex sync peer1 --content</code> to sync
with the paired repository.</p>
<p>Pairing connects just two repositories, but you can repeat the process to
pair with as many other repositories as you like, in order to build up
larger networks of repositories.</p>
<h2>example session</h2>
<p>Here's how it all looks:</p>
<pre><code>$ git annex enable-tor
enable-tor
You will be prompted for root's password
Password:
Tor hidden service is configured. Checking connection to it. This may take a few minutes.
Tor hidden service is working.
ok
$ git annex remotedaemon
$ git annex p2p --pair
p2p pair peer1 (using Magic Wormhole)
This repository's pairing code is: 11-incredible-tumeric
Enter the other repository's pairing code: 1-revenue-icecream
Exchanging pairing data...
Successfully exchanged pairing data. Connecting to peer1...
ok
$ git annex sync peer1 --content
commit
On branch master
nothing to commit, working tree clean
ok
pull peer1
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 8 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (8/8), done.
From tor-annex::wa3i6wgttmworwli.onion:5162
452db22..a894c60 git-annex -> peer1/git-annex
c0ac431..44ca7f6 master -> peer1/master
Updating c0ac431..44ca7f6
Fast-forward
amazing_file | 1 +
1 file changed, 1 insertion(+)
create mode 120000 amazing_file
ok
(merging peer1/git-annex into git-annex...)
get amazing_file (from peer1...)
(checksum...) ok
</code></pre>
<h2>how to exchange pairing codes</h2>
<p>When pairing with a friend's repository, you have to exchange
pairing codes. How to do this securely?</p>
<p>The pairing codes can only be used once, so it's ok to exchange them in
a way that someone else can access later. However, if someone can overhear
your exchange of codes in real time, they could trick you into pairing
with them.</p>
<p>Here are some suggestions for how to exchange the codes,
with the most secure ways first:</p>
<ul>
<li>In person.</li>
<li>In an encrypted message (gpg signed email, Off The Record (OTR)
conversation, etc).</li>
<li>By a voice phone call.</li>
</ul>
<h2>starting git-annex remotedaemon on boot</h2>
<p>Notice the <code>git annex remotedaemon</code> being run in the above examples.
That command runs the Tor hidden service so that other peers
can connect to your repository over Tor.</p>
<p>So, you may want to arrange for the remotedaemon to be started on boot.
You can do that with a simple cron job:</p>
<pre><code>@reboot cd ~/myannexrepo && git annex remotedaemon
</code></pre>
<p>If you use the git-annex assistant, and have it auto-starting on boot, it
will take care of starting the remotedaemon for you.</p>
<h2>speed of large transfers</h2>
<p>Tor prioritizes security over speed, and the Tor network only has so much
bandwidth to go around. So, distributing large quantities (gigabytes)
of data over Tor may be slow, and should probably be avoided.</p>
<p>One way to avoid sending much data over Tor is to set up an encrypted
<a href="http://git-annex.branchable.com/special_remotes/">special remote</a> someplace. git-annex knows that Tor is
rather expensive to use, so if a file is available on a special remote as
well as over Tor, it will download it from the special remote.</p>
<p>You can contribute to the Tor network by
<a href="https://www.torproject.org/getinvolved/relays.html.en">running a Tor relay or bridge</a>.</p>
<h2>onion addresses and authentication</h2>
<p>You don't need to know about this, but it might be helpful to understand
how it works.</p>
<p>git-annex's Tor support uses an onion address as the address of a git remote.
You can <code>git pull</code>, push, etc with those onion addresses:</p>
<pre><code>git pull tor-annex::eeaytkuhaupbarfi.onion:4412
git remote add peer1 tor-annex::eeaytkuhaupbarfi.onion:4412
</code></pre>
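<p>The <code>tor-annex::</code> prefix uses git's <code>transport::address</code> syntax for remote helpers: git hands the rest of the URL to a helper program named <code>git-remote-tor-annex</code>, which git-annex provides. What follows is the onion address and a port. A minimal sketch that splits the example address above into those parts with shell parameter expansion, for illustration only; git-annex does its own parsing.</p>
<pre><code># Example address from above; the split below is illustrative only
url='tor-annex::eeaytkuhaupbarfi.onion:4412'
helper=git-remote-${url%%::*}   # program git runs for this transport
addr=${url#*::}                 # eeaytkuhaupbarfi.onion:4412
onion=${addr%:*}                # onion address of the hidden service
port=${addr##*:}                # port of the hidden service
printf '%s\n' "$helper" "$onion" "$port"
</code></pre>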
<p>Onion addresses are semi-public. When you add a remote, they appear in your
<code>.git/config</code> file. For security, there's a second level of authentication
that git-annex uses to make sure that only people you want to can access
your repository over Tor. That takes the form of a long string of numbers
and letters, like "7f53c5b65b8957ef626fd461ceaae8056e3dbc459ae715e4".</p>
<p>The addresses generated by <code>git annex p2p --gen-addresses</code>
combine the onion address with the authentication data.</p>
<p>When you run <code>git annex p2p --link</code>, it sets up a git remote using
the onion address, and it stashes the authentication data away in a file in
<code>.git/annex/creds/</code></p>
<p>When you pair repositories, these addresses are exchanged using
<a href="https://github.com/warner/magic-wormhole">Magic Wormhole</a>.</p>
<h2>security</h2>
<p>Tor hidden services can be quite secure. But this doesn't mean that using
git-annex over Tor is automatically perfectly secure. Here are some things
to consider:</p>
<ul>
<li><p>Anyone who learns the onion address and authentication data of a peer
can connect to that peer, download the whole history of the git repository,
and any available annexed files. They can also upload new files to the peer,
and even remove annexed files from the peer. So consider ways that the
authentication data of a peer might be exposed.</p></li>
<li><p>While Tor can be used to anonymize who you are, git defaults to recording
your name and email address in every git commit. So if you want an
anonymous git-annex repository, you'll need to configure git not to do
that.</p></li>
<li><p>Using Tor prevents listeners from decrypting your traffic. But, they'll
probably still know you're using Tor. Also, by traffic analysis,
they may be able to guess if you're using git-annex over Tor, and even
make guesses about the sizes and types of files that you're exchanging
with peers.</p></li>
<li><p>There have been past attacks on the Tor network that have exposed
who was running Tor hidden services.
<a href="https://blog.torproject.org/blog/tor-security-advisory-relay-early-traffic-confirmation-attack">https://blog.torproject.org/blog/tor-security-advisory-relay-early-traffic-confirmation-attack</a></p></li>
<li><p>An attacker who can connect to the git-annex Tor hidden service, even
without authenticating, can try to perform denial of service attacks.</p></li>
<li><p>Magic Wormhole is pretty secure, but the code phrase could be guessed
(unlikely) or intercepted. An attacker gets just one chance to try to enter
the correct code phrase, before pairing finishes. If the attacker
successfully guesses/intercepts both code phrases, they can MITM the
pairing process.</p>
<p>If you don't want to use magic wormhole, you can instead manually generate
addresses with <code>git annex p2p --gen-addresses</code> and send them over an
authenticated, encrypted channel (such as OTR) to a friend to add with
<code>git annex p2p --link</code>. This may be more secure, if you get it right.</p></li>
</ul>
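<p>For the point above about git recording your identity, here is a minimal sketch of giving one repository a pseudonymous identity, so that commits made in it do not carry the name and email address from your global git configuration. The function name <code>anonymize_repo</code> and the placeholder identity are hypothetical, and this only covers git's author and committer fields, not anything else that might identify you.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: set a per-repository pseudonymous identity.
# Repository-local settings override ~/.gitconfig, so commits made in this
# repository use these values instead of your usual name and email.
# Note: GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_COMMITTER_NAME and
# GIT_COMMITTER_EMAIL, if set in the environment, take precedence over this.
anonymize_repo() {
    local repo="$1"
    # Placeholder identity; ".invalid" is a reserved top-level domain.
    git -C "$repo" config user.name "anonymous"
    git -C "$repo" config user.email "anonymous@example.invalid"
}

# Usage (hypothetical path):
#   anonymize_repo ~/annex
```

<p>Setting this per repository, rather than globally, keeps your other repositories unaffected.</p>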
a gui for metadata operationshttp://git-annex.branchable.com/tips/a_gui_for_metadata_operations/2016-12-05T19:34:54Z2016-12-05T19:34:54Z
<p>Hey everyone.</p>
<p>I wrote a GUI for git-annex metadata in Python: <a href="https://github.com/alpernebbi/git-annex-metadata-gui">git-annex-metadata-gui</a>.
It shows the annexed files in the current branch, arranged in their folder hierarchy.
Keys that are in the repository but not in the current branch are shown in a separate tab.
You can view, edit or remove fields for individual files with support for multiple values for fields.
There is a file preview for image and text files as well.
I uploaded some screenshots in the repository to show it in action.</p>
<p>While making it, I decided to move the git-annex calls into their own Python package,
which became <a href="https://github.com/alpernebbi/git-annex-adapter">git-annex-adapter</a>.</p>
<p>I hope these can be useful to someone other than myself as well.</p>
Systemd unithttp://git-annex.branchable.com/tips/Systemd_unit/2016-11-10T06:34:43Z2016-11-08T10:10:10Z
<hr />
<hr />
<p><img src="http://git-annex.branchable.com/smileys/alert.png" alt="/!\" /> <strong>THIS PAGE IS A DRAFT</strong> <img src="http://git-annex.branchable.com/smileys/alert.png" alt="/!\" /></p>
<hr />
<hr />
<h2>Introduction</h2>
<p>Systemd is a suite of tools for system and user daemon management.</p>
<p>It can be used as an alternative to XDG autostart files to start the git-annex daemon and optionally the webapp, either at startup (system service) or when a user logs in (user service).</p>
<hr />
<h2>Setup</h2>
<h3>User service</h3>
<p>Sample unit file (<code>/etc/systemd/user/git-annex.service</code>):</p>
<pre><code>[Unit]
Description=git-annex assistant daemon
[Service]
ExecStart=/usr/bin/git-annex assistant --autostart --foreground
Restart=on-failure
[Install]
WantedBy=default.target
</code></pre>
<p>Commands for enabling and starting the service as the current user:</p>
<pre><code>systemctl --user enable git-annex.service
systemctl --user start git-annex.service
</code></pre>
<p>Services in <code>default.target</code> usually start during login (note that they also <em>delay</em> the login process). If you enable "linger" via <code>loginctl</code>, however, these services start at boot instead.</p>
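<p>If you cannot write to <code>/etc/systemd/user/</code>, systemd also reads user units from <code>~/.config/systemd/user/</code>. The following is a sketch, assuming git-annex is installed at <code>/usr/bin/git-annex</code> as in the unit above; the function names <code>write_user_unit</code> and <code>enable_user_unit</code> are hypothetical. It writes the same unit into the per-user directory and then enables and starts it, without needing root.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the git-annex user unit into a directory,
# by default the per-user systemd unit directory (~/.config/systemd/user).
write_user_unit() {
    local dir="${1:-${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user}"
    mkdir -p "$dir"
    cat > "$dir/git-annex.service" <<'EOF'
[Unit]
Description=git-annex assistant daemon
[Service]
ExecStart=/usr/bin/git-annex assistant --autostart --foreground
Restart=on-failure
[Install]
WantedBy=default.target
EOF
}

# Hypothetical helper: make systemd notice the new file, then enable and
# start the service for the current user.
enable_user_unit() {
    systemctl --user daemon-reload
    systemctl --user enable --now git-annex.service
    # Optional: start user services at boot instead of at login.
    # loginctl enable-linger "$USER"
}

# Usage:
#   write_user_unit && enable_user_unit
```

<p>Afterwards, <code>systemctl --user status git-annex.service</code> shows whether the assistant is running.</p>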
<h3>System service</h3>
<p>If for some reason you cannot use <code>systemd --user</code>, the other option is to have system-wide services:</p>
<p>Sample unit file (<code>/etc/systemd/system/git-annex@.service</code>):</p>
<pre><code>[Unit]
Description=git-annex assistant daemon
After=network.target
[Service]
User=%i
ExecStart=/usr/bin/git-annex assistant --autostart --foreground
Restart=on-failure
[Install]
WantedBy=multi-user.target
</code></pre>
<p>Commands for enabling and starting the service as user <code>u</code>:</p>
<pre><code>systemctl enable git-annex@u.service
systemctl start git-annex@u.service
</code></pre>
<hr />
<h2>Considerations</h2>
<h3>Webapp</h3>
<p>The webapp may be started instead of the assistant by changing the command in the unit file. The webapp starts the assistant automatically. However, it may also attempt to open a potentially unwanted browser window.</p>
<pre><code>ExecStart=/usr/bin/git-annex webapp
</code></pre>
<p><em>TODO: try this.</em></p>
<h3>Encrypted home directory</h3>
<p>Users may store their keyring and repositories in their encrypted home directory mounted at login. This may break a system service running at boot.</p>
<p><em>TODO: try this.</em></p>
<h3>Common daemon</h3>
<p>One daemon may be used to sync repositories for multiple users. For this, it might be helpful to make use of ACLs to access other user directories.</p>
<hr />
<h2>References</h2>
<ul>
<li><a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html">systemd.unit man page</a></li>
<li><a href="https://wiki.archlinux.org/index.php/Systemd">systemd on ArchWiki</a></li>
<li><a href="https://docs.syncthing.net/users/autostart.html#using-systemd">TL;DR Syncthing with systemd documentation</a></li>
<li><a href="https://github.com/syncthing/syncthing/tree/master/etc/linux-systemd">Syncthing systemd unit files</a></li>
</ul>
git-annex extensions for ranger, the file managerhttp://git-annex.branchable.com/tips/git-annex_extensions_for_ranger__44___the_file_manager/2016-08-11T03:12:02Z2016-08-06T22:17:56Z
<p>If you use <a href="https://github.com/ranger/ranger">ranger</a>, the console-based file manager, you can use its plugin system to integrate it with git-annex. As far as I know, there are two main types of extensions:</p>
<h2>Custom commands</h2>
<p>ranger lets you <a href="https://github.com/ranger/ranger/wiki/Commands">define custom commands</a>. You could easily write proxy ranger commands for all the git-annex commands you use. <a href="https://github.com/fiatjaf/dotfiles/blob/master/ranger-commands.py">Here</a> I have written:</p>
<ul>
<li><code>:ga_whereis</code> for outputting whereis information;</li>
<li><code>:ga_set</code> and <code>:ga_tag</code> for metadata changing;</li>
<li><code>:ga_get</code> and <code>:ga_drop</code> for quickly fetching files from remotes and dropping them locally (really useful).</li>
</ul>
<h2>Linemodes</h2>
<p>The small string of information that shows on each file line, aligned to the right, is called an <code>infostring</code>. <a href="https://github.com/ranger/ranger/wiki/Custom-linemodes">Linemodes in ranger</a> let you switch between the default linemodes or add your own custom ones. I have written two git-annex linemodes:</p>
<ul>
<li><a href="https://github.com/fiatjaf/dotfiles/blob/5087963cead99f65afee153be672c8e5e624d638/ranger-plugins/linemode_gitannex.py#L8-L51">git-annex-metadata</a>, which shows tags and metadata fields from git-annex; and</li>
<li><a href="https://github.com/fiatjaf/dotfiles/blob/5087963cead99f65afee153be672c8e5e624d638/ranger-plugins/linemode_gitannex.py#L54-L104">git-annex-whereis</a>, which shows the names of the repositories that have each file (except the current repository, as that should be clear from ranger's colours).</li>
</ul>
<p>To switch linemodes, just type <code>:linemode git-annex-whereis</code> or <code>:linemode git-annex-metadata</code>.</p>
<p>You can also set <code>default_linemode path=/your/annex/path/.* git-annex-whereis</code>, for example, to have that linemode automatically set whenever you browse your git-annex folder on ranger.</p>
<p>Beware of folders with many files: the linemode reads git-annex output for every file, so ranger can freeze for a few seconds.</p>
<hr />
<p>As I didn't have any better place to put the code, everything here is referenced in my <a href="https://github.com/fiatjaf/dotfiles">dotfiles repository on GitHub</a>. Just copy the two referenced files to your <code>~/.config/ranger/plugins/</code> folder and <code>~/.config/ranger/commands.py</code> file to get this working.</p>
<p>Many other interesting commands and plugins could still be added. Modify this page if you come up with other ideas.</p>
playlist fetchhttp://git-annex.branchable.com/tips/playlist_fetch/2016-03-31T14:56:30Z2016-03-31T14:52:42Z
<p>I have made a small script to fetch a specific set of songs from a
playlist. It just iterates through an <a href="https://en.wikipedia.org/wiki/M3U">M3U</a> playlist and makes sure
that git-annex has a copy of every file in the list.</p>
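<p>As a rough illustration of the idea only (this is not the author's script), here is a minimal bash sketch, assuming the playlist contains one path per line relative to the current directory, with <code>#</code> lines as comments or <code>#EXTINF</code> metadata. The function names <code>playlist_entries</code> and <code>get_playlist</code> are hypothetical.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the file entries of an M3U playlist, one per line.
# Skips blank lines and lines starting with "#" (#EXTM3U, #EXTINF, comments),
# strips a trailing carriage return (CRLF playlists), and also handles a
# last line without a newline.
playlist_entries() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        line="${line%$'\r'}"
        case "$line" in
            ''|'#'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$1"
}

# Hypothetical helper: ask git-annex to fetch every entry, reporting failures
# but continuing with the rest. stdin is redirected so git-annex cannot
# consume the list of entries.
get_playlist() {
    local entry status=0
    while IFS= read -r entry; do
        if ! git annex get "$entry" </dev/null; then
            echo "failed to get $entry" >&2
            status=1
        fi
    done < <(playlist_entries "$1")
    return "$status"
}

# Usage, from the root of the annex:
#   get_playlist ~/playlists/Favoris.m3u
```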
<p>Sample run:</p>
<pre><code>[1041]anarcat@angela:Music1$ ~/bin/get-playlist -p1 -v ~/playlists/Favoris.m3u
git-annex: Bach/Unknown Album/Concerto for 2 Violins in D.mp3 not found
git-annex: get: 1 failed
git annex failed to get Bach/Unknown Album/Concerto for 2 Violins in D.mp3 (originally espresso/Bach/Unknown Album/Concerto for 2 Violins in D.mp3)
get Groovy Aardvark/Oryctérope/05 - Téléthargique.flac (from marcos...)
SHA256E-s26735079--13c04501b9c6fa5ddda02438484d569f4d3d9b1f0bcdd8740f3b927ab756c968.flac
26,735,079 100% 10.00MB/s 0:00:02 (xfr#1, to-chk=0/1)
(checksum...) ok
Groovy Aardvark/Oryctérope/05 - Téléthargique.flac
[...]
merge git-annex ok
</code></pre>
<p>I use this to synchronize specific playlists to my phone, instead of
the whole music collection, because of the limited space of the
device.</p>
<p>The source is AGPL and available in my
<a href="http://src.anarc.at/scripts.git/blob_plain/HEAD:/get-playlist">personal git repository</a>. Unfortunately, it is written in Python
and probably cannot be merged into git-annex, but since it is so
specific, I figured it wouldn't be anyway. -- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
centralised repository: starting from nothinghttp://git-annex.branchable.com/tips/centralised_repository__58___starting_from_nothing/2016-03-12T16:58:09Z2016-03-12T16:58:09Z
<p>If you are starting from nothing (no existing <code>git</code> or <code>git-annex</code> repository) and want to use a server as a centralised repository, try the following steps.</p>
<p>On the server where you'll hold the "master" repository:</p>
<pre><code>server$ cd /one/git
server$ mkdir m
server$ cd m
server$ git init --bare
Initialized empty Git repository in /one/git/m/
server$ git annex init origin
init origin ok
server$
</code></pre>
<p>Clone that to the laptop:</p>
<pre><code>laptop$ cd /other
laptop$ git clone ssh://server//one/git/m
Cloning into 'm'...
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 5 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (5/5), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.
laptop$ cd m
laptop$ git annex init laptop
init laptop ok
laptop$
</code></pre>
<p>Add some content:</p>
<pre><code>laptop$ git annex addurl http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg
addurl kitenet.net_~joey_screencasts_git-annex_coding_in_haskell.ogg (downloading http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg ...) --2011-12-15 08:13:10-- http://kitenet.net/~joey/screencasts/git-annex_coding_in_haskell.ogg
Resolving kitenet.net (kitenet.net)... 2001:41c8:125:49::10, 80.68.85.49
Connecting to kitenet.net (kitenet.net)|2001:41c8:125:49::10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39362757 (38M) [audio/ogg]
Saving to: `/other/m/.git/annex/tmp/URL--http&c%%kitenet.net%~joey%screencasts%git-annex_coding_in_haskell.ogg'
100%[======================================>] 39,362,757 2.31M/s in 17s
2011-12-15 08:13:27 (2.21 MB/s) - `/other/m/.git/annex/tmp/URL--http&c%%kitenet.net%~joey%screencasts%git-annex_coding_in_haskell.ogg' saved [39362757/39362757]
(checksum...) ok
(Recording state in git...)
</code></pre>
<p>Don't forget to commit it:</p>
<pre><code>laptop$ git commit -m 'See Joey play.'
[master (root-commit) 106e923] See Joey play.
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 120000 kitenet.net_~joey_screencasts_git-annex_coding_in_haskell.ogg
laptop$
</code></pre>
<p>All fine, now push it back to the centralised master:</p>
<pre><code>laptop$ git push origin master
Counting objects: 20, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (18/18), 1.50 KiB, done.
Total 18 (delta 1), reused 1 (delta 0)
To ssh://server//one/git/m
3ba1386..ad3bc9e git-annex -> git-annex
laptop$
</code></pre>
<p>You'll probably want to use <code>git annex copy --to origin</code> to copy the
annexed file contents to the server. See the <a href="http://git-annex.branchable.com/walkthrough/">walkthrough</a> for details.</p>
<p>You can add more "client" repositories by following the <code>laptop</code>
sequence of operations.</p>
annex.largefiles: configuring mixed content repositorieshttp://git-annex.branchable.com/tips/largefiles/2020-10-26T15:35:32Z2016-02-02T20:51:22Z
<p>Normally commands like <code>git annex add</code> always add files to the annex,
while <code>git add</code> adds files to git.</p>
<p>Let's suppose you're developing a video game, written in C. You have
source code, and some large game assets. You want to ensure the source
code is stored in git -- that's what git's for! And you want to store
the game assets in the git annex -- to avoid bloating your git repos with
possibly enormous files, but still version control them.</p>
<p>You could take care to use <code>git annex add</code> after changes to the assets,
but it would be easy to slip up and <code>git commit -a</code> (which runs <code>git add</code>),
checking your large assets into git. Configuring annex.largefiles
saves you the bother of keeping things straight when adding files.
Once you've told git-annex what files are large, both <code>git annex add</code>
and <code>git add</code>/<code>git commit -a</code> will add the large files to the annex and the
small files to git.</p>
<p>Other commands that use the annex.largefiles configuration include
<code>git annex import</code>, <code>git annex addurl</code>, <code>git annex importfeed</code>, and
the assistant.</p>
<h2>examples</h2>
<p>For example, let's make only files larger than 100 kb be added to the annex,
and never <code>*.c</code> and <code>*.h</code> source code files.</p>
<pre><code>git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'
</code></pre>
<p>That is a local configuration, so will only apply to your clone of the
repository. To set a default that will apply to all clones, unless
overridden, do this instead:</p>
<pre><code>git annex config --set annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'
</code></pre>
<p>There's one other way to configure the same thing, you can put this in
the <code>.gitattributes</code> file:</p>
<pre><code>* annex.largefiles=largerthan=100kb
*.c annex.largefiles=nothing
*.h annex.largefiles=nothing
</code></pre>
<p>The syntax in .gitattributes is a bit different, because .gitattributes
does its own file matching, and attribute values cannot contain spaces.
So using .gitattributes for this is not recommended (but it does work for
older versions of git-annex, where the <code>git annex config</code> setting does
not). Any .gitattributes setting overrides the <code>git annex config</code> setting,
but will be overridden by the <code>git config</code> setting.</p>
<p>Another example. If you wanted <code>git add</code> to put all files in the annex
in your local repository:</p>
<pre><code>git config annex.largefiles anything
</code></pre>
<p>Or in all clones:</p>
<pre><code>git annex config --set annex.largefiles anything
</code></pre>
<h2>syntax</h2>
<p>See <a href="http://git-annex.branchable.com/git-annex-matching-expression/">git-annex-matching-expression</a> for details about the syntax.</p>
<h2>gitattributes format</h2>
<p>Here's that example <code>.gitattributes</code> again:</p>
<pre><code>* annex.largefiles=largerthan=100kb
*.c annex.largefiles=nothing
*.h annex.largefiles=nothing
</code></pre>
<p>The way that works is, <code>*.c</code> and <code>*.h</code> files have the annex.largefiles
attribute set to "nothing", and so those files are never treated as large
files. All other files use the other value, which checks the file size.</p>
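<p>To see which annex.largefiles value these rules assign to a particular path, <code>git check-attr</code> resolves the <code>.gitattributes</code> rules using git's own matching, without involving git-annex. A minimal sketch, assuming the example <code>.gitattributes</code> above; the helper name <code>largefiles_attr</code> is hypothetical.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the annex.largefiles attribute value that
# .gitattributes assigns to PATH in REPO.
# git check-attr prints "PATH: annex.largefiles: VALUE". Attribute values
# cannot contain whitespace, so the last ": "-separated field is the value
# ("unspecified" if no rule matches).
largefiles_attr() {
    local repo="$1" path="$2"
    git -C "$repo" check-attr annex.largefiles -- "$path" | awk -F': ' '{print $NF}'
}

# Usage, inside a repository with the example .gitattributes:
#   largefiles_attr . src/main.c        # prints: nothing
#   largefiles_attr . assets/intro.ogg  # prints: largerthan=100kb
```

<p>Because later lines in <code>.gitattributes</code> override earlier ones for the same attribute, the <code>*.c</code> and <code>*.h</code> rules win over the <code>*</code> rule for source files.</p>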
<p>Since git attribute values cannot contain whitespace, when you need
a more complicated annex.largefiles expression, you can instead
parenthesize the terms of the annex.largefiles attribute.
For example, this is the same as the git config shown earlier, shoehorned
into a single git attribute:</p>
<pre><code>* annex.largefiles=(largerthan=100kb)and(not((include=*.c)or(include=*.h)))
</code></pre>
<p>It's generally a better idea to use <code>git annex config</code> instead.</p>
<h2>temporarily override</h2>
<p>If you've set up an annex.largefiles configuration but want to force a file to
be stored in the annex, you can temporarily override the configuration like
this:</p>
<pre><code>git annex add --force-large smallfile
</code></pre>
<h2>converting git to annexed</h2>
<p>When you have a file that is currently stored in git, and you want to
convert that to be stored in the annex, here's how to accomplish that:</p>
<pre><code>git rm --cached file
git annex add --force-large file
git commit file
</code></pre>
<p>This first removes the file from git's index, and then adds it back
using git-annex. You can modify the file before the <code>git annex add</code> step,
perhaps replacing it with new larger content that necessitates git-annex.</p>
<p>The --force-large option needs git-annex version 7.20200202.7 or newer.</p>
<h2>converting annexed to git</h2>
<p>When you have a file that is currently stored in the annex, and you want to
convert that to be stored in git, here's how to accomplish that:</p>
<pre><code>git annex unlock file
git rm --cached file
git annex add --force-small file
git commit file
</code></pre>
<p>You can modify the file after unlocking it and before adding it to
git. And this is probably a good idea if it was really a big file,
so that you can replace its content with something smaller.</p>
<p>The --force-small option needs git-annex version 7.20200202.7 or newer.</p>
Accessing files in bare remotes without git-annexhttp://git-annex.branchable.com/tips/Accessing_files_in_bare_remotes_without_git-annex/2019-07-17T18:27:04Z2016-01-12T00:32:29Z
<p>git-annex is amazing for my large-file backup needs, but I am a bit worried about being able to access the data in the long term without git-annex.</p>
<p>For this reason I have prepared a small Python tool that accesses the content of bare git-annex repositories. The tool retrieves the locations of a file and the path to its content in the current annex. It works with the v7 version of the git-annex repository format.</p>
<p>This is needed because in the master branch the files are stored as symlinks to .git/annex/objects/XX/YY/KEY/KEY (which is correct for non-bare repos), while in a bare repository the content is stored in annex/objects/ZZ/WW/KEY/KEY.</p>
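<p>As a rough illustration of the bare-repository layout, here is a bash sketch that computes where a key's content would be found. It assumes that bare repositories use lowercase hash directories made of the first three and next three hex digits of the MD5 of the key (the same scheme the decryption script below uses for directory remotes). The function name <code>bare_object_path</code> is hypothetical, and keys with characters that git-annex escapes in file names are not handled.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path, relative to a bare repository, where
# the content of KEY is expected, assuming the layout
#   annex/objects/<md5[0:3]>/<md5[3:6]>/KEY/KEY
bare_object_path() {
    local key="$1" sum
    # md5sum prints "HEXDIGEST  -"; keep only the digest.
    sum="$(printf '%s' "$key" | md5sum | cut -d ' ' -f 1)"
    printf 'annex/objects/%s/%s/%s/%s\n' "${sum:0:3}" "${sum:3:3}" "$key" "$key"
}

# Usage, with the key taken from a symlink in a non-bare clone:
#   key="$(basename "$(readlink some_file)")"
#   ls -l "/path/to/bare.git/$(bare_object_path "$key")"
```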
<p>The source code is available here: <a href="https://gist.github.com/eruffaldi/924f6b53a63dede6e59f">https://gist.github.com/eruffaldi/924f6b53a63dede6e59f</a></p>
Decrypting files in special remotes without git-annexhttp://git-annex.branchable.com/tips/Decrypting_files_in_special_remotes_without_git-annex/2016-01-08T07:38:00Z2016-01-08T07:38:00Z
<p>One of the selling points of <code>git-annex</code> is that it uses standard tools like <code>git</code> and <code>gpg</code> to deal with files, so that years from now it should be possible to explore and get useful data out of an old annex repository (this helps with <a href="http://git-annex.branchable.com/future_proofing/">future proofing</a>). If for whatever reason you need to decrypt files on <a href="http://git-annex.branchable.com/special_remotes/">special remotes</a> that use <a href="http://git-annex.branchable.com/encryption/">encryption</a> without using <code>git-annex</code>, this can be done fairly easily using <code>gpg</code> (and <code>openssl</code> to compute the HMAC keys used to create the file names used on the special remote so you can look up the right file to decrypt). Here is an example script demonstrating how to compute the special remote file names and how to decrypt the special remote files.</p>
<pre><code>#!/usr/bin/env bash
usage() {
    echo "Usage: ga_decrypt.sh -r REMOTE [-k SYMLINK] [-d FILE]"
    echo ""
    echo " Either looks up the key on REMOTE for the annex file linked with SYMLINK"
    echo " or decrypts FILE encrypted for REMOTE."
    echo ""
    echo " -r: REMOTE is the special remote to use"
    echo " -k: SYMLINK is the symlink in the annex to print the encrypted special remote key for"
    echo " -d: FILE is the path to the special remote file to decrypt to STDOUT"
    echo ""
    echo "NOTES: "
    echo " * Run in an indirect git annex repo."
    echo " * Must specify -k or -d."
    echo " * -k prints the key including the leading directory names used for a "
    echo " directory remote (even if REMOTE is not a directory remote)"
    echo " * -d works on a locally accessible file. It does not fetch a remote file"
    echo " * Must have gpg and openssl"
}
decrypt_cipher() {
    cipher="$1"
    # Command substitution strips the trailing newline of gpg's output
    echo "$(echo -n "$cipher" | base64 -d | gpg --decrypt --quiet)"
}
lookup_key() {
    encryption="$1"
    cipher="$2"
    symlink="$3"
    if [ "$encryption" = "hybrid" ] || [ "$encryption" = "pubkey" ]; then
        cipher="$(decrypt_cipher "$cipher")"
    fi
    # Pull out MAC cipher from beginning of cipher
    if [ "$encryption" = "hybrid" ] ; then
        cipher="$(echo -n "$cipher" | head -c 256 )"
    elif [ "$encryption" = "shared" ] ; then
        cipher="$(echo -n "$cipher" | base64 -d | tr -d '\n' | head -c 256 )"
    elif [ "$encryption" = "pubkey" ] ; then
        # pubkey cipher includes a trailing newline which was stripped by
        # the command substitution in decrypt_cipher above; add it back
        cipher="$cipher"$'\n'
    fi
    annex_key="$(basename "$(readlink "$symlink")")"
    # Older OpenSSL prints "(stdin)= HASH"; newer versions prefix the
    # algorithm name. Keep only the hash in either case.
    hash="$(echo -n "$annex_key" | openssl dgst -sha1 -hmac "$cipher" | sed 's/^.*= //')"
    key="GPGHMACSHA1--$hash"
    # Directory remotes use the first 3 and next 3 hex digits of the key's MD5
    checksum="$(echo -n "$key" | md5sum)"
    echo "${checksum:0:3}/${checksum:3:3}/$key"
}
decrypt_file() {
    encryption="$1"
    cipher="$2"
    file_path="$3"
    if [ "$encryption" = "pubkey" ] ; then
        gpg --quiet --decrypt "${file_path}"
    else
        # The symmetric passphrase follows the 256-byte MAC cipher
        if [ "$encryption" = "hybrid" ] ; then
            cipher="$(decrypt_cipher "$cipher" | tail -c +257)"
        elif [ "$encryption" = "shared" ] ; then
            cipher="$(echo -n "$cipher" | base64 -d | tr -d '\n' | tail -c +257 )"
        fi
        gpg --quiet --batch --passphrase "$cipher" --output - --decrypt "${file_path}"
    fi
}
main() {
    OPTIND=1
    mode=""
    remote=""
    while getopts "r:k:d:" opt; do
        case "$opt" in
            r) remote="$OPTARG"
                ;;
            k) if [ -z "$mode" ] ; then
                    mode="lookup key"
                else
                    usage
                    exit 2
                fi
                symlink="$OPTARG"
                ;;
            d) if [ -z "$mode" ] ; then
                    mode="decrypt file"
                else
                    usage
                    exit 2
                fi
                file_path="$OPTARG"
                ;;
            *) # invalid option or missing argument
                usage
                exit 2
                ;;
        esac
    done
    if [ -z "$mode" ] || [ -z "$remote" ] ; then
        usage
        exit 2
    fi
    shift $((OPTIND-1))
    # Pull out config for desired remote name
    remote_config="$(git show git-annex:remote.log | grep 'name='"$remote ")"
    # Get encryption type and cipher from config
    encryption="$(echo "$remote_config" | grep -oP 'encryption\=.*? ' | tr -d ' \n' | sed 's/encryption=//')"
    cipher="$(echo "$remote_config" | grep -oP 'cipher\=.*? ' | tr -d ' \n' | sed 's/cipher=//')"
    if [ "$mode" = "lookup key" ] ; then
        lookup_key "$encryption" "$cipher" "$symlink"
    elif [ "$mode" = "decrypt file" ] ; then
        decrypt_file "$encryption" "$cipher" "${file_path}"
    fi
}
main "$@"
</code></pre>
unlocked fileshttp://git-annex.branchable.com/tips/unlocked_files/2023-10-23T17:56:24Z2015-12-27T21:18:51Z
<p>Normally, git-annex stores annexed files in the repository, locked down,
which prevents the content of the file from being modified.
That's a good thing, because it might be the only copy, you wouldn't
want to lose it in a fumblefingered mistake.</p>
<pre><code># git annex add some_file
add some_file
# echo oops > some_file
bash: some_file: Permission denied
</code></pre>
<p>Sometimes though you want to modify a file. Maybe once, or maybe
repeatedly. To support this, git-annex also supports unlocked files.
They are stored in the git repository differently, and they appear as
regular files in the working tree, instead of the symbolic links used for
locked files.</p>
<h2>using unlocked files</h2>
<p>You can unlock any annexed file:</p>
<pre><code># git annex unlock my_cool_big_file
</code></pre>
<p>That changes what's stored in git from a git-annex symlink
(locked) to a git-annex pointer file (unlocked). You can commit
the change, if you want that file to be unlocked in other clones of the
repository. To lock the file again, use <code>git annex lock</code>.</p>
<p>The nice thing about an unlocked file is that you can modify it
in place -- it's a regular file. And you can commit your changes.</p>
<pre><code># echo more stuff >> my_cool_big_file
# git commit -a -m "some changes"
[master 196c0e2] some changes
1 files changed, 1 insertion(+), 1 deletion(-)
</code></pre>
<p>Notice that <code>git commit -a</code> added the new content of the file to the annex,
and only committed a change to the pointer. That happened because git-annex
knows the file was already annexed. Git leaves the file unlocked, so
you can continue to make modifications to it.</p>
<p>By default, using git to add a file that has not been annexed before will
still add its contents to git, not to the annex. If you tell git-annex what
files are large, it will arrange for the large files to be added to the
annex, and the small ones to be added to git. This is done by configuring
annex.largefiles. See <a href="http://git-annex.branchable.com/tips/largefiles/">largefiles</a> for full documentation of that.</p>
<p>All the regular git-annex commands (find, get, drop, etc) can be used on
unlocked files as well as locked files. When you drop the content of
an unlocked file, it will be replaced by a pointer file, which
looks like "/annex/objects/...". So if you open a file and see
that, you'll need to use <code>git annex get</code>.</p>
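<p>Since a dropped unlocked file is just a small pointer file, a script can tell it apart from a file whose content is present. Here is a minimal bash sketch, assuming only what is described above (pointer files begin with <code>/annex/objects/</code>); the function names are hypothetical, and <code>git annex find --not --in=here</code> is the more direct way to ask git-annex itself which files lack their content.</p>

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE looks like an unlocked git-annex
# pointer file, i.e. a regular file whose content starts with
# "/annex/objects/" (15 bytes). A file stored in git that happens to start
# with that text would be a false positive.
is_annex_pointer() {
    [ -f "$1" ] && [ "$(head -c 15 -- "$1")" = "/annex/objects/" ]
}

# Hypothetical helper: list regular files under a directory that are still
# pointer files, skipping the .git directory.
list_pointer_files() {
    find "${1:-.}" -name .git -prune -o -type f -print |
        while IFS= read -r f; do
            if is_annex_pointer "$f"; then
                printf '%s\n' "$f"
            fi
        done
}

# Usage: fetch the content of every pointer file under the current directory.
#   list_pointer_files . | while IFS= read -r f; do git annex get "$f" </dev/null; done
```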
<p>Under the hood, unlocked files use git's smudge/clean filter interface,
and git-annex converts between the content of the big file and a pointer
file, which is what gets committed to git.</p>
<div class="notebox">
<p>By default, git-annex commands add files in locked mode, except on
a filesystem that does not support symlinks, where unlocked mode is used.
To make them always use unlocked mode, run:
<code>git config annex.addunlocked true</code><br />
<code>git add</code> always adds files in unlocked mode.</p>
</div>
<h2>adjusted branches</h2>
<p>If you want to mostly keep files locked, but be able to locally switch
to having them all unlocked, you can do so using <code>git annex adjust
--unlock</code>. See <a href="http://git-annex.branchable.com/git-annex-adjust/">git-annex-adjust</a> for details. This is particularly
useful when using filesystems like FAT, and OS's like Windows that don't
support symlinks. Indeed, <code>git-annex init</code> detects such filesystems and
automatically sets up a repository to use all unlocked files.</p>
<h2>finding unlocked files</h2>
<p>While it's easy to see when a file is a git-annex symlink, unlocked files
look the same as files stored in git. To see what files are unlocked or
locked, many git-annex commands support <code>--unlocked</code> and <code>--locked</code>
options.</p>
<pre><code>git annex find --unlocked
</code></pre>
<h2>imperfections</h2>
<p>Unlocked files mostly work very well, but there are a
few imperfections which you should be aware of when using them.</p>
<ol>
<li><p><code>git stash</code>, <code>git cherry-pick</code> and <code>git reset --hard</code> don't update
the working tree with the content of unlocked files. The files
will contain pointers, the same as if the content was not in the
repository. So after running these commands, you will need to manually
run <code>git annex smudge --update</code>.</p></li>
<li><p>When git-annex is running a command that gets or drops the content
of an unlocked file, git's index will briefly be locked, which might
prevent you from running a <code>git commit</code> at the same time.</p></li>
<li><p>Conversely, if you have a git commit in progress, running git-annex may
complain that the index is locked, though this will not prevent it from
working.</p></li>
<li><p>When an operation such as a checkout or merge needs to update a large
number of unlocked files, it can become slow. So can <code>git add</code> of
a large number of files (<code>git annex add</code> is faster).</p></li>
</ol>
<p>(The technical reasons behind these imperfections are explained in
detail in <a href="http://git-annex.branchable.com/todo/git_smudge_clean_interface_suboptiomal/">git smudge clean interface suboptiomal</a>.)</p>
<h2>using less disk space</h2>
<p>Unlocked files are handy, but they have one significant disadvantage
compared with locked files: On most filesystems, they use more disk space.</p>
<p>While only one copy of a locked file has to be stored, often
two copies of an unlocked file are stored on disk. One copy is in
the git work tree, where you can use and modify it,
and the other is stashed away in <code>.git/annex/objects</code> (see <a href="http://git-annex.branchable.com/internals/">internals</a>).</p>
<p>The reason for that second copy is to preserve the old version of the file,
when you modify the unlocked file in the work tree. Being able to access
old versions of files is an important part of git after all!</p>
<p>(Some filesystems including btrfs and xfs support reflinks, and on those,
the extra copy is a reflink, and takes up no additional space.)</p>
<p>So two copies is a good safe default. But there are ways to use git-annex that
make the second copy not be worth keeping:</p>
<ul>
<li>When you're using git-annex to sync the current version of files across
devices, and don't care much about previous versions.</li>
<li>When you have set up a backup repository, and use git-annex to copy
your files to the backup.</li>
</ul>
<p>In situations like these, you may want to avoid the overhead of the second
local copy of unlocked files. There's a config setting for that.</p>
<div class="notebox">
<p>Note that setting annex.thin only has an effect on systems that support
hard links. It works on Windows, but not on FAT filesystems.</p>
</div>
<pre><code>git config annex.thin true
</code></pre>
<p>After changing annex.thin, you'll want to fix up the work tree to
match the new setting:</p>
<pre><code>git annex fix
</code></pre>
<div class="notebox">
<p>When a <a href="http://git-annex.branchable.com/direct_mode/">direct mode</a> repository is upgraded, annex.thin is automatically
set, because direct mode made the same single-copy tradeoff.</p>
</div>
<p>Setting annex.thin can save a lot of disk space, but it's a tradeoff
between disk usage and safety.</p>
<p>Keeping files locked is safer and also avoids using unnecessary
disk space, but trades off easy modification of files.</p>
<p>Pick the tradeoff that's right for you.</p>
get git-annex-shell into PATHhttp://git-annex.branchable.com/tips/get_git-annex-shell_into_PATH/2015-08-09T22:09:28Z2015-08-09T22:09:28Z
<p>The <a href="http://git-annex.branchable.com/git-annex-shell/">git-annex-shell</a> program is a part of git-annex that is used when
accessing a git-annex repository on a remote server. The client runs
something like "ssh server git-annex-shell". For this to work,
git-annex-shell needs to be installed in PATH.</p>
<p>If you install git-annex on your server as root, using a distribution's
package manager, like apt-get, or otherwise installing it into /usr/bin, or
/usr/local/bin, then git-annex-shell will be in PATH, and you'll not have
any trouble (and can stop reading here).</p>
<p>But, if you need to install git-annex on a server without being root,
it can be tricky to get it into PATH. The bash shell doesn't source all of
its config files when ssh uses it to run a non-interactive command like
git-annex-shell, so even if git-annex-shell seems to be in PATH when you're
logged onto the server, "ssh server git-annex-shell" won't find it.</p>
<pre><code>bash: git-annex-shell: command not found; failed; exit code 127
</code></pre>
<hr />
<p>On some systems (when it's compiled with <code>SSH_SOURCE_BASHRC</code> set), bash will
load your <code>~/.bashrc</code> (but not your <code>~/.bash_profile</code>). So you can add to
PATH in the .bashrc.</p>
<p>Note that many .bashrc files start with something like this:</p>
<pre><code># If not running interactively, don't do anything
[ -z "$PS1" ] && return
</code></pre>
<p>So, make sure to make any PATH changes before such a guard. For example:</p>
<pre><code>PATH=$HOME/bin/:$PATH
# If not running interactively, don't do anything else
[ -z "$PS1" ] && return
</code></pre>
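<p>Here is a slightly more defensive variant. It assumes your private programs live in <code>$HOME/bin</code>, and <code>path_prepend</code> is a hypothetical helper. It only adds the directory if it is not already in PATH, so re-sourcing .bashrc doesn't keep growing PATH. Like the line above, it has to come before the interactive-shell guard.</p>
<pre><code># Hypothetical helper: prepend a directory to PATH unless it is already there.
# Put this in ~/.bashrc above the "not running interactively" guard.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already in PATH: nothing to do
        *) PATH="$1${PATH:+:$PATH}" ;;    # prepend, avoiding a stray ":"
    esac
}
path_prepend "$HOME/bin"
export PATH
</code></pre>
<p>To see the PATH that a non-interactive ssh session actually gets, run <code>ssh server 'echo $PATH'</code>. The single quotes make the remote shell expand the variable, not your local one.</p>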
<hr />
<p>On some systems, bash won't load <em>any</em> config files at all.
A few ways to deal with that:</p>
<ul>
<li><p>Move or symlink git-annex-shell into a directory, such as
/usr/bin, that is in the default PATH.</p></li>
<li><p>If you're not root, ask the system administrator to please install
git-annex system-wide.</p></li>
<li><p>As a last resort, you can configure the git repository that uses
the server to know where git-annex-shell is installed, by setting
<code>remote.<name>.annex-shell</code>.</p>
<p>For example, if git-annex-shell is installed in ~/bin/git-annex-shell
on the server, and the git remote named "annoyingserver" uses that server:</p>
<p><code>git config remote.annoyingserver.annex-shell /home/me/bin/git-annex-shell</code></p></li>
</ul>
git-annex on NFShttp://git-annex.branchable.com/tips/git-annex_on_NFS/2015-07-17T22:14:23Z2015-07-17T22:14:23Z
<p>Several reported issues are related to using git-annex on networked file systems. This page covers NFS, but some of the same problems may also affect SMB filesystems.</p>
<h1>Locking issues</h1>
<p>Here is the prior art:</p>
<ul>
<li><a href="http://git-annex.branchable.com/devblog/day_27__locking_fun/">day 27 locking fun</a></li>
<li><a href="http://git-annex.branchable.com/devblog/day_286-287__rotten_locks/">day 286-287 rotten locks</a></li>
<li><a href="http://git-annex.branchable.com/forum/Can__39__t_init_git_annex/">Can't init git annex</a></li>
<li><a href="http://git-annex.branchable.com/bugs/git-annex_merge_stalls/">git-annex merge stalls</a></li>
</ul>
<p>All of those issues but the first are related to locking on NFS filesystems, which is <a href="https://en.wikipedia.org/wiki/File_locking#Problems">notoriously bad</a>. However, the problems with it are not insurmountable and git-annex can actually be used, even if unreliably, on NFS filesystems.</p>
<p>The main problem I hit with NFS filesystems is unreliable locking. If both sides run similar platforms (for example, Linux on both; NFS locking doesn't work on BSD systems), locking <em>should</em> work, but it sometimes fails for no apparent reason. The problem and its solution are well described in <a href="http://serverfault.com/a/455080">this Server Fault answer</a>, taken from <a href="http://sophiedogg.com/lockd-and-statd-nfs-errors/">this excellent blog</a>. Basically, you need to restart a bunch of NFS daemons that get stuck on the server side, and then locking works again. This generally fixed it for me:</p>
<pre>
service nfs-kernel-server stop
service rpcbind stop
service nfs-common stop
service rpcbind start
service nfs-common start
service nfs-kernel-server start
</pre>
<p>This needs to be run as root on the server side. A simple test script that checks whether locking works is also useful. I use the following:</p>
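<pre>
#!/usr/bin/perl
# Minimal flock(2) smoke test: the child takes an exclusive lock and the
# parent a shared one; each holds its lock for 3 seconds.
use strict;
use warnings;
use Fcntl qw(LOCK_SH LOCK_EX LOCK_UN);

my $child = fork();
die "fork failed: $!" unless defined $child;
# Open after fork() so parent and child get separate open file
# descriptions, and therefore really compete for the lock.
open(my $lock, '>', 'testlock') or die "cannot open testlock: $!";
if ($child == 0) { # in child
    print "locking exclusively\n";
    flock($lock, LOCK_EX) or die "failed to lock exclusively: $!";
    print "holding exclusive lock for 3 seconds\n";
    sleep 3;
    flock($lock, LOCK_UN) or die "failed to unlock exclusively: $!";
    print "done locking exclusively\n";
} else { # in parent
    print "locking shared\n";
    flock($lock, LOCK_SH) or die "failed to lock shared: $!";
    print "holding shared lock for 3 seconds\n";
    sleep 3;
    flock($lock, LOCK_UN) or die "failed to unlock shared: $!";
    print "done locking shared, waiting for child to finish\n";
    wait;
}
</pre>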
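<p>The Perl script only checks that the flock calls succeed. It does not check whether the locks actually exclude each other. The shell sketch below tests that. It assumes the <code>flock</code> command from util-linux, so it is Linux only, and <code>nfs_lock_test</code> is a hypothetical name. Run it on a client and pass a directory on the NFS mount as the argument.</p>
<pre><code># Hypothetical helper: check that flock(2) exclusion works in directory $1.
# Requires flock(1) from util-linux.
nfs_lock_test() {
    lockfile="$1/testlock"
    : > "$lockfile" || return 2
    # Background job: take an exclusive lock and hold it for 3 seconds.
    flock -x "$lockfile" sleep 3 &
    holder=$!
    sleep 1   # give the background job time to acquire the lock
    # While the exclusive lock is held, a shared lock must NOT be granted;
    # -w 1 gives up (exit status 1) after waiting one second.
    if flock -s -w 1 "$lockfile" true; then
        echo "BROKEN: got a shared lock while an exclusive lock was held"
        result=1
    else
        echo "ok: exclusive lock blocked the shared lock"
        result=0
    fi
    wait "$holder"
    rm -f "$lockfile"
    return "$result"
}
</code></pre>
<p>A <code>BROKEN</code> result means the mount granted a shared lock while an exclusive lock was held. In other words, it is not enforcing flock(2) locking.</p>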
<p>Also note that the <a href="http://nfs.sourceforge.net/">NFS FAQ</a> (currently offline, thanks to Sourceforge; see <a href="https://archive.is/QMMO">this archive</a>) has interesting snippets about NFS locking. In short: it's a mess, but it can be worked around! -- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
<h1>Socket issues</h1>
<p>Another thing that may fail is the "ssh caching code". Examples:</p>
<ul>
<li><a href="http://git-annex.branchable.com/forum/git_annex_sync_dies___40__sometimes__41__/">git annex sync dies (sometimes)</a></li>
<li><a href="http://git-annex.branchable.com/forum/NTFS_usb_on_linux_unable_to_connect_to_ssh_remote/">NTFS usb on linux unable to connect to ssh remote</a></li>
<li><span class="createlink"><a href="http://git-annex.branchable.com/ikiwiki.cgi?do=create&from=tips%2Fgit-annex_on_NFS&page=todo%2Fgit-annex_ignores_GIT__95__SSH" rel="nofollow">?</a>git-annex ignores GIT_SSH</span></li>
<li><span class="createlink"><a href="http://git-annex.branchable.com/ikiwiki.cgi?do=create&from=tips%2Fgit-annex_on_NFS&page=bugs%2Fgit-annex-shell_doesn__39__t_work_as_expected" rel="nofollow">?</a>git-annex-shell doesn't work as expected</span></li>
</ul>
<p>As you can see, this affects much more than NFS, and on NFS it often just works. The underlying problem is that the SSH client may be unable to create the socket for the SSH connection multiplexing that git-annex uses. Normally, git-annex should detect that and fall back properly, but sometimes this fails, especially with older versions of git-annex. A workaround is to disable the feature:</p>
<pre><code>git config annex.sshcaching false
</code></pre>
<p>The tradeoff is that syncs are slower, but it works. -- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
<h1>Stray files issue</h1>
<p>This is a completely different issue, but it could be related to file locking: <span class="createlink"><a href="http://git-annex.branchable.com/ikiwiki.cgi?do=create&from=tips%2Fgit-annex_on_NFS&page=bugs%2Fhuge_multiple_copies_of___39__.nfs__42____39___and___39__.panfs__42____39___being_created" rel="nofollow">?</a>huge multiple copies of '.nfs*' and '.panfs*' being created</span>. Basically, git-annex leaves tons of files behind when it is run on an NFS server. It is still unclear how this happens and how to resolve it. However, it has been reproduced and could affect you, so it remains an open issue until it is resolved. -- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
Repositories with large number of fileshttp://git-annex.branchable.com/tips/Repositories_with_large_number_of_files/2017-04-24T13:40:59Z2015-06-17T08:28:13Z
<p>Just as git does not scale well with large files, it can also become painful to work with when you have a large <em>number</em> of files. Below are things I have found to minimise the pain.</p>
<h1>Using version 4 index files</h1>
<p>During operations that affect the index, git writes an entirely new index to .git/index.lock and then renames it over .git/index. With a large number of files, this index file can be quite large and take several seconds to write every time you manipulate the index!</p>
<p>This can be mitigated by changing it to version 4 which uses path compression to reduce the filesize:</p>
<pre><code>git update-index --index-version 4
</code></pre>
<p><em>NOTE: The git documentation warns that this version may not be supported by other git implementations like JGit and libgit2.</em></p>
<p>Personally, I saw a reduction from 516MB to 206MB (<em>40% of original size</em>) and got a much more responsive git!</p>
<p>It may also be worth doing the same to git-annex's index:</p>
<pre><code>GIT_INDEX_FILE=.git/annex/index git update-index --index-version 4
</code></pre>
<p>Though I didn't gain as much here with 89MB to 86MB (96% of original size).</p>
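<p>You can check which version an index file currently uses by reading its header. A git index starts with the signature <code>DIRC</code>, followed by a 4-byte big-endian version number. The <code>index_version</code> function below is a hypothetical helper based on that layout. It is not a git command.</p>
<pre><code># Hypothetical helper: print the version number stored in a git index file.
# Header layout: 4-byte signature "DIRC", then a 4-byte big-endian version.
index_version() {
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null)
    [ "$magic" = "DIRC" ] || { echo "not a git index: $1" >&2; return 1; }
    # Read bytes 4..7 as unsigned decimals and combine them big-endian.
    od -An -tu1 -j4 -N4 "$1" |
        awk '{ print $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 }'
}
</code></pre>
<p>For example, <code>index_version .git/index</code> should print 4 after the <code>update-index</code> command above. <code>index_version .git/annex/index</code> performs the same check on git-annex's index.</p>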
<h1>Packing</h1>
<p>I have automatic gc disabled, so that I control when it runs:</p>
<pre><code>git config gc.auto 0
</code></pre>
<p>As a result, I end up with a lot of loose objects, which also slow git down. I use</p>
<pre><code>git count-objects
</code></pre>
<p>to see how many loose objects I have. When I reach a threshold (~25000), I pack those loose objects and clean things up:</p>
<pre><code>git repack -d
git gc
git prune
</code></pre>
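<p>This routine is easy to script. The sketch below assumes the default <code>git count-objects</code> output format (<code>N objects, M kilobytes</code>) and uses the ~25000 threshold above as its default. <code>maybe_pack</code> is a hypothetical name, and you could run it from cron inside the repository.</p>
<pre><code># Hypothetical helper: pack loose objects only once their count reaches a
# threshold (default 25000).
loose_object_count() {
    # Plain `git count-objects` prints e.g. "12345 objects, 67890 kilobytes".
    git count-objects | awk '{ print $1 }'
}

maybe_pack() {
    threshold=${1:-25000}
    count=$(loose_object_count)
    if [ "$count" -ge "$threshold" ]; then
        echo "$count loose objects (threshold $threshold): packing"
        git repack -d && git gc && git prune
    else
        echo "$count loose objects (threshold $threshold): nothing to do"
    fi
}
</code></pre>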
<h1>File count per directory</h1>
<p>If listing the files in a directory takes a long time, git(-annex) will naturally be slowed down by the same bottleneck.</p>
<p>You can avoid this by keeping the number of files in each directory below a limit somewhere between 5000 and 20000 (the right limit depends on the filesystem and its settings).</p>
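<p>To find the directories that need splitting up, you can use something like the hypothetical <code>crowded_dirs</code> helper below. By default it uses the low end of the range above (5000). It skips <code>.git</code> and prints each oversized directory with its entry count.</p>
<pre><code># Hypothetical helper: list directories under ROOT (default ".") that contain
# more than LIMIT entries (default 5000), skipping .git directories.
crowded_dirs() {
    limit=${1:-5000}
    root=${2:-.}
    find "$root" -name .git -prune -o -type d -print |
    while IFS= read -r dir; do
        # Count entries, including dotfiles; tr strips padding some wc add.
        n=$(ls -A "$dir" | wc -l | tr -d ' ')
        if [ "$n" -gt "$limit" ]; then
            printf '%s\t%s\n' "$n" "$dir"
        fi
    done
}
</code></pre>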
<p><a href="https://sourceforge.net/projects/fpart/">fpart</a> can be a very useful tool to achieve this.</p>
<p>This sort of usage was discussed in <a href="http://git-annex.branchable.com/forum/Handling_a_large_number_of_files/">Handling a large number of files</a> and <a href="http://git-annex.branchable.com/forum/__34__git_annex_sync__34___synced_after_8_hours/">"git annex sync" synced after 8 hours</a>. -- <span class="createlink"><a href="http://git-annex.branchable.com/ikiwiki.cgi?do=create&from=tips%2FRepositories_with_large_number_of_files&page=CandyAngel" rel="nofollow">?</a>CandyAngel</span></p>
<h1>Forget tracking information</h1>
<p>In addition to keeping track of where files are, git-annex keeps a <em>log</em> that keeps track of where files <em>were</em>. This can take up space as well and slow down certain operations.</p>
<p>You can use the <a href="http://git-annex.branchable.com/git-annex-forget/">git-annex-forget</a> command to drop historical location tracking info for files.</p>
<p>Note: this was discussed in <a href="http://git-annex.branchable.com/forum/scalability_with_lots_of_files/">scalability with lots of files</a>. -- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
public Amazon S3 remotehttp://git-annex.branchable.com/tips/public_Amazon_S3_remote/2018-07-02T16:35:04Z2015-06-05T20:39:24Z
<p>Here's how to create an Amazon <a href="http://git-annex.branchable.com/special_remotes/S3/">S3 special remote</a> that
can be read by anyone who gets a clone of your git-annex repository,
without them needing Amazon AWS credentials.</p>
<p>If you want to publish files to S3 so they can be accessed without using
git-annex, see <a href="http://git-annex.branchable.com/tips/publishing_your_files_to_the_public/">publishing your files to the public</a>.</p>
<p>Note: Bear in mind that Amazon will charge the owner of the bucket
for public downloads from that bucket.</p>
<h2>create public remote</h2>
<p>First, export your Amazon AWS credentials:</p>
<pre><code># export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
# export AWS_SECRET_ACCESS_KEY="s3kr1t"
</code></pre>
<p>Now, create the remote:</p>
<pre><code># git annex initremote pubs3 type=S3 encryption=none public=yes
initremote pubs3 (checking bucket) (creating bucket in US) ok
</code></pre>
<p>The public=yes setting makes git-annex set an S3 ACL so the files in the bucket
are publicly readable. For git-annex to be able to download files from
that bucket without needing your AWS credentials, you then need to tell
it the url of the bucket. Find that url, and run:</p>
<pre><code># git annex enableremote pubs3 publicurl=...
</code></pre>
<p>In the above example, no encryption was used, but it will also work
if you enable encryption=shared. Then files will be encrypted on S3, and
anyone with a clone of the git repository will be able to download and
decrypt them.</p>
<p>It's also ok to enable chunking when setting up the remote.</p>
<p>Now, copy some files to the remote, in the usual way, and push your
git repository to someplace where someone else can access it.</p>
<h2>use public remote</h2>
<p>Once the S3 remote is set up, anyone who can clone the git repository
can get files from the remote, without needing any Amazon AWS credentials.</p>
<p>Start by checking out the git repository.</p>
<p>In the checkout, enable the S3 remote:</p>
<pre><code># git annex enableremote pubs3
enableremote pubs3 ok
</code></pre>
<p>Now, git-annex can be used as usual to download files from that remote.</p>
<h2>sharing urls</h2>
<p>You can also share urls to files stored in a public S3 remote to people
who are not using git-annex. To find the url, use <code>git annex whereis</code>.</p>
<hr />
<p>See <a href="http://git-annex.branchable.com/special_remotes/S3/">S3</a> for details about configuring S3 remotes.</p>
disabling a special remotehttp://git-annex.branchable.com/tips/disabling_a_special_remote/2020-06-17T01:18:32Z2015-05-31T16:26:23Z
<p>In our quest to find dumb replacements for <span class="createlink"><a href="http://git-annex.branchable.com/ikiwiki.cgi?do=create&from=tips%2Fdisabling_a_special_remote&page=todo%2Fwishlist__58_____39__get__39___queue_and_schedule." rel="nofollow">?</a>wishlist: 'get' queue and schedule.</span> (and a more complete <a href="http://git-annex.branchable.com/git-annex-schedule/">git-annex-schedule</a>), we set up a cronjob that starts and stops the assistant during certain time windows, so that it does not download during prime bandwidth time.</p>
<p>But that isn't exactly what we're looking for: we would like the assistant to continue doing its usual thing of adding and removing files, and even syncing the git branches. Just not get/move files around.</p>
<p>One way I thought of doing this was to disable a remote locally. <a href="http://git-annex.branchable.com/git-annex-dead/">git-annex-dead</a> comes to mind, of course, but it applies to all repositories, so it's not an option. If it were a regular git remote, I could just run <code>git remote rm origin</code> and later <code>git remote add origin</code> and be done with it. But this is the <em>web</em> remote, so it doesn't even show up in <code>git remote -v</code>.</p>
<p>That approach doesn't work with <a href="http://git-annex.branchable.com/special_remotes/">special remotes</a>, though. Another solution is
simply to use the <code>remote.name.annex-ignore</code> configuration documented
in the main <a href="http://git-annex.branchable.com/git-annex/">git-annex</a> manpage. For example, to disable the web
remote, you would use:</p>
<pre><code>git config remote.web.annex-ignore true
</code></pre>
<p>The result would be:</p>
<pre><code>joey@darkstar:~/tmp/a>git annex addurl --fast http://localhost
addurl localhost ok
(recording state in git...)
joey@darkstar:~/tmp/a>git config remote.web.annex-ignore true
joey@darkstar:~/tmp/a>git annex get localhost
get localhost (not available)
Try making some of these repositories available:
00000000-0000-0000-0000-000000000001 -- web
(Note that these git remotes have annex-ignore set: web)
failed
git-annex: get: 1 failed
joey@darkstar:~/tmp/a>git config remote.web.annex-ignore false
joey@darkstar:~/tmp/a>git annex get localhost
get localhost (from web...)
/home/joey/tmp/a/.g 100%[=====================>] 10 --.-KB/s in 0s
ok
</code></pre>
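<p>Because <code>annex-ignore</code> is just a git config setting, the cronjob could toggle it instead of stopping the assistant. Here is a sketch. It assumes the web remote is the one to pause, and the 08:00 to 20:00 peak window is a made-up example. Both function names are hypothetical.</p>
<pre><code># Hypothetical cron helpers: ignore the web remote during peak hours and
# use it again outside them. The 8..19 hour window is only an example.
is_peak_hour() {
    hour=$1
    [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]
}

update_web_ignore() {
    hour=${1:-$(date +%H)}   # optional argument, handy for testing
    if is_peak_hour "$hour"; then
        git config remote.web.annex-ignore true
    else
        git config remote.web.annex-ignore false
    fi
}
</code></pre>
<p>Running <code>update_web_ignore</code> hourly from cron, inside the repository, keeps the setting in line with the clock.</p>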
<p>The assistant (probably?) needs to be restarted for those changes to
take effect. --<a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a> and <span class="createlink"><a href="http://git-annex.branchable.com/ikiwiki.cgi?do=create&from=tips%2Fdisabling_a_special_remote&page=joeyh" rel="nofollow">?</a>joeyh</span>.</p>
transmission integrationhttp://git-annex.branchable.com/tips/transmission_integration/2015-01-05T20:12:56Z2015-01-05T20:12:56Z
<p><a href="http://git-annex.branchable.com/tips/transmission_integration/transmission_integration.sh/">This simple script</a> will make sure files downloaded by the
<a href="https://www.transmissionbt.com/">Transmission BitTorrent client</a> will
be added into git-annex.</p>
<p>To enable it, install it to /usr/local/bin and add the following to
your settings.json:</p>
<pre><code>"script-torrent-done-enabled": true,
"script-torrent-done-filename": "/usr/local/bin/transmission-git-annex-add",
</code></pre>
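<p>For reference, here is a minimal sketch of what such a hook does. It is not the linked script. It relies on Transmission passing the <code>TR_TORRENT_DIR</code> and <code>TR_TORRENT_NAME</code> environment variables to the done script. It also assumes the download directory is inside a git-annex repository, and it only runs <code>git annex add</code>, so committing is left to you.</p>
<pre><code>#!/bin/sh
# Minimal torrent-done hook sketch (not the linked script).
# TR_TORRENT_DIR is the download directory; TR_TORRENT_NAME is the
# downloaded file or top-level directory inside it.
annex_add_torrent() {
    cd "$TR_TORRENT_DIR" || return 1
    git annex add -- "$TR_TORRENT_NAME"
}

# Only act when actually invoked by Transmission.
if [ -n "${TR_TORRENT_DIR:-}" ] && [ -n "${TR_TORRENT_NAME:-}" ]; then
    annex_add_torrent
fi
</code></pre>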
<p>-- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
publishing your files to the publichttp://git-annex.branchable.com/tips/publishing_your_files_to_the_public/2020-11-25T20:31:22Z2014-11-28T18:00:45Z
<p>You have a git-annex repository, and you want to publish the files
in it to the public. One way is to
<a href="http://git-annex.branchable.com/tips/setup_a_public_repository_on_a_web_site/">setup a public repository on a web site</a>, but perhaps you don't have a
web server that can run git-annex, and you just want to publish the current
files, not the whole git-annex repository.</p>
<p>The <a href="http://git-annex.branchable.com/git-annex-export/">git-annex export</a> command is the solution. It lets
a tree of files from your git-annex repository be published to Amazon
<a href="http://git-annex.branchable.com/special_remotes/S3/">S3</a>, as well as other types of special remotes like
<a href="http://git-annex.branchable.com/special_remotes/webdav/">webdav</a> and <a href="http://git-annex.branchable.com/special_remotes/directory/">directory</a>.</p>
<h1>Publishing to Amazon S3</h1>
<p>Let's create a bucket in Amazon S3 named $BUCKET and a special remote
named public-s3. Exporting has to be enabled when setting up a special
remote for the first time.</p>
<p>Set up your special <a href="http://git-annex.branchable.com/special_remotes/S3/">S3 remote</a> with (at least) these options:</p>
<pre><code>git annex initremote public-s3 type=S3 encryption=none bucket=$BUCKET exporttree=yes public=yes
</code></pre>
<p>Be sure to replace $BUCKET with something like
"public-bucket-joey" when you follow along in your shell.</p>
<div class="notebox">
<p>Want to only export files in a subdirectory of the master branch?
Use <code>master:subdir</code>.</p>
<p>Any git treeish can be used with the export command, so you can also
export tags, etc.</p>
</div>
<p>Then export the files in the master branch to the remote:</p>
<pre><code>git annex export master --to public-s3
</code></pre>
<p>Each exported file will be available to the public at
<code>http://$BUCKET.s3.amazonaws.com/$FILE</code></p>
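<p>File names with spaces or other special characters must be percent-encoded in that URL. The hypothetical <code>s3_export_url</code> helper below builds the URL from the pattern above. It encodes every byte except unreserved characters and <code>/</code>, and it assumes the bucket is reachable at that virtual-hosted address.</p>
<pre><code># Hypothetical helper: public URL of an exported file, following the
# http://$BUCKET.s3.amazonaws.com/$FILE pattern. Each byte of FILE that is
# not an unreserved character or "/" becomes %XX (UTF-8 bytes one by one).
s3_export_url() {
    bucket=$1
    file=$2
    encoded=$(printf '%s' "$file" | od -An -v -tx1 | tr -s ' \n' '\n\n' | awk '
        BEGIN {
            # Map the hex code of each byte that may appear unencoded.
            for (c = 1; c != 128; c++) {
                ch = sprintf("%c", c)
                if (ch ~ "^[-A-Za-z0-9._~/]$") keep[sprintf("%02X", c)] = ch
            }
        }
        NF {
            h = toupper($1)
            printf "%s", ((h in keep) ? keep[h] : "%" h)
        }')
    printf 'http://%s.s3.amazonaws.com/%s\n' "$bucket" "$encoded"
}
</code></pre>
<p>For example, <code>s3_export_url public-bucket-joey 'dir/my file.txt'</code> prints <code>http://public-bucket-joey.s3.amazonaws.com/dir/my%20file.txt</code>.</p>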
<p>Note: Bear in mind that Amazon will charge the owner of the bucket
for public downloads from that bucket.</p>
<h1>Using <code>git-annex sync --content</code></h1>
<p>So far, the current contents of the master branch have been exported to
public-s3, and to update the export when the branch changes, you have to
remember to run <code>git annex export</code> again.</p>
<p>If you use a <code>git annex sync</code> workflow, it's useful to configure
it to also export changes to the remote. This is done by setting
the remote's <code>annex-tracking-branch</code> configuration:</p>
<pre><code>git config remote.public-s3.annex-tracking-branch master
</code></pre>
<p>That tells git-annex that the export should track changes to master.
When you run <code>git annex sync --content</code>, it will update all tracking
exports. The git-annex assistant also automatically updates tracking
exports.</p>
<p>Want to only export files in a subdirectory of the master branch?</p>
<pre><code>git config remote.public-s3.annex-tracking-branch master:subdir
</code></pre>
<h1>Amazon S3 indexes</h1>
<p>By default, there is no index.html file exported, so if you open
<code>http://$BUCKET.s3.amazonaws.com/</code> in a web browser, you'll see an
XML document listing the files.</p>
<p>For a nicer list of files, you can make an index.html file, check it into
git, and export it to the bucket. You'll need to configure the bucket to
use index.html as its index document, as
<a href="https://stackoverflow.com/questions/27899/is-there-a-way-to-have-index-html-functionality-with-content-hosted-on-s3">explained here</a>.</p>
<h1>Old method</h1>
<p>To use <code>git annex export</code>, you need git-annex version 6.20170909 or
newer. Before we had <code>git annex export</code> an <a href="http://git-annex.branchable.com/tips/publishing_your_files_to_the_public/old_method/">old method</a> was used instead.</p>
deleting unwanted fileshttp://git-annex.branchable.com/tips/deleting_unwanted_files/2020-06-17T01:18:32Z2014-10-13T22:15:12Z
<p>It's quite hard to delete a file from a git repository once it's checked in and pushed to origin. This is normally ok, since git repositories contain mostly small files, and a good thing since losing hard work stinks.</p>
<p>With git-annex this changes some: Very large files can be managed with git-annex, and it's not uncommon to be done with such a file and want to delete it. So, git-annex provides a number of ways to handle this, while still trying to avoid accidental foot shooting that would lose the last copy of an important file.</p>
<h2>the garbage collecting method</h2>
<p>In this method, you just remove annexed files whenever you want, and commit the changes. This is probably the most natural way to go.</p>
<p>You can do this the same way you would in a regular git repository. For example, <code>git rm foo; git commit -m "removed foo"</code>. This leaves the contents of the files still in the annex, not really deleted yet.</p>
<p>However you delete files, doing so can leave some garbage lying around in either the local repository or other repositories that contained a copy of the content of the file you deleted. Eventually you'll want to free up some disk space used by one of these repositories, and then it's time to take out the garbage.</p>
<p>To collect the garbage, you can run <code>git annex unused</code> inside the repository which you want to slim down. That will list files stored in the annex that are not used by any git branches or tags. Followed by <code>git annex dropunused 1-10</code> to delete a range of the unused files from the annex.</p>
<p>In recent versions of git-annex, <code>git annex dropunused</code> checks that enough other copies of a file's content exist in other repositories before deleting it, so this won't ever delete the last copy of some file. This is a good default, because these unused files are still referred to by some commits in the git history, and you might want to retain the full history of every version of a file.</p>
<p>But, let's say you don't care about that, you only want to keep files that are in use by branches and tags. Then you can use <code>git annex dropunused --force</code> with a range of files, which will delete them even if it's the last copy.</p>
<p>Finally, sometimes you want to remove unused files from a special remote. To accomplish this, pass <code>--from remotename</code> to the unused and dropunused commands, and they will act on
files stored in that remote, rather than on the local repository.</p>
<h2>let the assistant take care of it</h2>
<p>If you're using the git-annex assistant, you don't normally need to worry about this. Just delete files however you normally would. The assistant will try to migrate unused file contents away from your local repository and store them in whatever backup repositories you've set up.</p>
<h2>delete all the copies method</h2>
<p>You have a file. You want that file to immediately vanish from the face of the earth to the best of your abilities.</p>
<p>Note that, since git-annex deduplicates files by default, any files with
the same content will be removed by these commands.</p>
<ol>
<li><code>git annex drop --force file</code></li>
<li><code>git annex whereis file</code></li>
<li><code>git annex drop --force file --from $repo</code> repeat for each repository listed by the whereis command</li>
<li><code>rm file; git annex sync</code></li>
</ol>
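<p>Steps 1 and 3 can be wrapped in a small function. This sketch uses a hypothetical name, and you still have to supply the repository names yourself, taking them from the <code>whereis</code> output. Like the commands above, it uses <code>--force</code>, so it <em>will</em> delete the last copy.</p>
<pre><code># Hypothetical helper: drop FILE locally, then from every repository named
# after it. --force skips the numcopies check, so this can lose data.
drop_everywhere() {
    file=$1
    shift
    git annex drop --force -- "$file" || return
    for repo in "$@"; do
        git annex drop --force --from "$repo" -- "$file" || return
    done
}
</code></pre>
<p>For example, run <code>drop_everywhere file origin usbdrive</code> and then step 4.</p>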
<p>Of course, if you have offline backup repositories that contain this file, you'll have to bring them online before you can drop it from them, etc.</p>
dumb metadata extraction from xbmchttp://git-annex.branchable.com/tips/dumb_metadata_extraction_from_xbmc/2014-10-04T23:30:40Z2014-08-10T23:32:46Z
<p>I wanted to get the list of movies I haven't seen yet in XBMC, and I'm lazy. So I'll use <a href="http://git-annex.branchable.com/metadata/">metadata</a> to extract only those movies, for the road for example.</p>
<p>First I fiddled around with shell scripts to extract the list of those films, which in XBMC-speak means films that have a <code>NULL playCount</code>. XBMC can represent those files in two ways: in a <code>stack://</code> URL if the movie has multiple files, or not. So there are two scripts. For "stacked" movies:</p>
<pre><code>sqlite3 /home/video/.xbmc/userdata/Database/MyVideos75.db \
  "SELECT files.strFileName FROM movie JOIN files ON files.idFile=movie.idFile JOIN path ON path.idPath=files.idPath WHERE playCount IS NULL AND files.strFileName LIKE 'stack://%';" \
  | sed "s#stack://##;s/, /\n/g" | sed "s#/home/media/video/##"
</code></pre>
<p>And the rest:</p>
<pre><code>sqlite3 /home/video/.xbmc/userdata/Database/MyVideos75.db \
  "SELECT path.strPath || files.strFileName FROM movie JOIN files ON files.idFile=movie.idFile JOIN path ON path.idPath=files.idPath WHERE playCount IS NULL AND files.strFileName NOT LIKE 'stack://%';" \
  | sed "s#/home/media/video/##"
</code></pre>
<p>Notice also that I strip the absolute path prefix of the annex, so that I can refer to files by relative path.</p>
<p>So this quick and dirty hack could have been used to mark files as "new". Unfortunately, this won't unmark them when the playcount increases. So instead I think this should be a field, and we need to extract the playcount. Play around with shell scripting enough to get sick, get back into bad perl habits and you'll end up with this nasty script: <a href="http://git-annex.branchable.com/tips/dumb_metadata_extraction_from_xbmc/git-annex-xbmc-playcount.pl/">git-annex-xbmc-playcount.pl</a>.</p>
<p>After running the script, you can sort the files by play count with:</p>
<pre><code>git annex view "playCount=*"
</code></pre>
<p>Or just show the files that haven't been played yet:</p>
<pre><code>git annex view playCount=0
</code></pre>
<p>Use <code>git checkout master</code> to reset the view. Note that the above will flatten the tree hierarchy, which you may not want. Try this in that case:</p>
<pre><code>git annex view playCount=0 films/=*
</code></pre>
<p>For more information, see <a href="http://git-annex.branchable.com/tips/metadata_driven_views/">metadata driven views</a>.</p>
<p>-- <a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
Bup repositories in git-annexhttp://git-annex.branchable.com/tips/Bup_repositories_in_git-annex/2014-08-05T20:14:25Z2014-08-05T20:14:25Z
<p>I'd like to share my setup for keeping <a href="https://github.com/bup/bup/">bup</a> repositories in git-annex.¹
I'm not sure if this is a <em>good</em> tip, so comments are welcome.</p>
<p>The purpose of this setup is to (kind of) bring encryption to bup,
and make it easy to keep bup backups in untrusted storage by making use of the encryption modes and backends provided by git-annex.
This approach can be used to make encrypted <em>backups of bup repositories</em>;
it cannot replace encrypted filesystems such as EncFS or S3QL,
which wouldn't necessarily require local bup repositories but also can't be combined with storage like Amazon Glacier.</p>
<p>To add a bup repository to git-annex, initialize a regular indirect git-annex repository,
and make the bup repository a subdirectory of it.²
Then run <code>git annex add $BUP_REPO/objects/pack</code>, i.e. on the location of the large data files (.pack & .par2).
The rest of the bup repository should be tracked by Git (<code>git add $BUP_REPO</code>).³
This way the repository stays fully functional.</p>
<p>After a bup-save the following steps will synchronize all remotes:⁴</p>
<pre><code>git annex add $BUP_REPO/objects/pack
git add $BUP_REPO
git commit -m "Backup on $(date)"
git annex sync --content
</code></pre>
<p>In my current setup, the git-annex repositories are located on a local file server.
Various clients use bup to create backups on the server.
This server also makes backups of other servers.
Afterwards, it uploads the annexed data to Glacier
(via an <a href="http://git-annex.branchable.com/special_remotes/S3/">encrypted S3 special remote</a>),
and pushes the small Git repositories to an S3QL filesystem and another off-site server.
Using these repositories (and my GPG key) the bup repositories could be recovered.</p>
<p>It may be important to note that in order to be able to <em>access</em> a bup repository,
<em>all</em> files have to be available locally.
Bup will not function if any pack files are missing (maybe this can be improved?).</p>
<hr />
<p>¹) Not to be confused with git-annex's <a href="http://git-annex.branchable.com/special_remotes/bup/">bup special remote</a>.</p>
<p>²) You can't initialize git-annex repositories directly inside bup repositories
because git-annex will (rightfully) identify them as bare git repositories and set itself up accordingly.</p>
<p>³) I've come up with these .gitignore rules to exclude potentially large files not needed for recovery:</p>
<pre><code>/bup_repo/bupindex*
/bup_repo/objects/pack/bup.bloom
/bup_repo/objects/pack/midx*midx
/bup_repo/objects/tmp*.pack
/bup_repo/index-cache/
</code></pre>
<p>⁴) <code>git annex sync</code> might not be the safest command to use because it would merge changes from the remotes.
However, assuming normal bup usage, external changes to the bup repository are not to be expected.</p>
ZSH completionhttp://git-annex.branchable.com/tips/ZSH_completion/2014-05-27T22:45:19Z2014-05-27T22:45:19Z
<p>ZSH users, here's some good news: after 2 years of silence, the completion function for git-annex has been updated. It now supports <em>all</em> git-annex commands (as of 5.20140517) and has many improvements for completing arguments, remotes, groups, and backends.</p>
<p>To install it:</p>
<ol>
<li>make sure you have Python 3 installed (as <code>python3</code> somewhere in your <code>$PATH</code>; tested with 3.4, should work with 3.2+)</li>
<li>get it from <a href="https://github.com/Schnouki/git-annex-zsh-completion">GitHub</a></li>
<li>copy <code>_git-annex</code> to somewhere in your <code>$fpath</code> (I use <code>$HOME/.config/zsh/completion</code>)</li>
<li>run <code>autoload -U path/to/_git-annex</code></li>
<li>type <code>git annex <TAB></code></li>
</ol>
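<p>If completion doesn't show up, the usual zsh approach is to add the directory to <code>$fpath</code> before the completion system is initialized. Here is a sketch for <code>~/.zshrc</code>, assuming the directory from step 3. If your .zshrc already calls <code>compinit</code>, just add the <code>fpath</code> line before that call.</p>
<pre><code># ~/.zshrc sketch (zsh syntax): make _git-annex visible to the completion
# system, then initialize it.
fpath=("$HOME/.config/zsh/completion" $fpath)
autoload -U compinit
compinit
</code></pre>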
<p>This is very far from being perfect, but it's (IMHO) better than nothing. If you have any issue or suggestion, please <a href="https://github.com/Schnouki/git-annex-zsh-completion/issues">tell me</a>!</p>
<p>Many thanks to Frank Terbeck and Valentin Haenel, the original authors of this completion function (<a href="https://github.com/esc/git-annex-zsh-completion">source</a>).</p>
file manager integrationhttp://git-annex.branchable.com/tips/file_manager_integration/2023-01-01T22:11:13Z2014-03-22T19:52:10Z
<p>Integrating git-annex and your file manager provides an easy way to select
annexed files to get or drop. The file manager can also be used to undo
changes to files managed by git-annex.</p>
<h2>GNOME (nautilus)</h2>
<p>Recent git-annex comes with built-in integration for Nautilus.</p>
<p><a href="http://git-annex.branchable.com/assistant/nautilusmenu.png"><img src="http://git-annex.branchable.com/assistant/nautilusmenu.png" width="627" height="451" class="img" /></a></p>
<p><a href="http://git-annex.branchable.com/assistant/downloadnotification.png"><img src="http://git-annex.branchable.com/assistant/downloadnotification.png" width="426" height="49" class="img" /></a></p>
<p>This is set up by git-annex creating simple scripts in
<code>~/.local/share/nautilus/scripts</code>, with names like "git-annex get"</p>
<h2>KDE (Dolphin/Konqueror)</h2>
<p>Even more recent git-annex comes with built-in integration with Konqueror.</p>
<p><a href="http://git-annex.branchable.com/assistant/konquerormenu.png"><img src="http://git-annex.branchable.com/assistant/konquerormenu.png" width="433" height="306" class="img" /></a></p>
<p>This is set up by git-annex creating a
<code>$XDG_DATA_HOME/kservices5/ServiceMenus/git-annex.desktop</code> file.</p>
<h2>Xfce (Thunar)</h2>
<p>Xfce uses the Thunar file manager.</p>
<p>Install <a href="https://pypi.org/project/thunar-plugins/">https://pypi.org/project/thunar-plugins/</a> to use its integrated
git-annex support.</p>
<p>Alternatively, Thunar can easily be configured with custom
actions. Just go to the "Configure custom actions..." item in the "Edit"
menu, and create a custom action for get, drop, and undo with the following
commands:</p>
<pre><code>git-annex drop --notify-start --notify-finish -- %F
</code></pre>
<p>for drop, and for get:</p>
<pre><code>git-annex get --notify-start --notify-finish -- %F
</code></pre>
<p>and for undo:</p>
<pre><code>git-annex undo --notify-start --notify-finish -- %F
</code></pre>
<p>This gives me the following config on disk, in <code>~/.config/Thunar/uca.xml</code> (only the get and drop actions are shown):</p>
<pre><code><action>
<icon>git-annex</icon>
<name>git-annex get</name>
<unique-id>1396278104182858-3</unique-id>
<command>git-annex get --notify-start --notify-finish -- %F</command>
<description>get the files from a remote git annex repository</description>
<patterns>*</patterns>
<directories/>
<audio-files/>
<image-files/>
<other-files/>
<text-files/>
<video-files/>
</action>
<action>
<icon>git-annex</icon>
<name>git-annex drop</name>
<unique-id>1396278093174843-2</unique-id>
<command>git-annex drop --notify-start --notify-finish -- %F</command>
<description>drop the files from the local repository</description>
<patterns>*</patterns>
<directories/>
<audio-files/>
<image-files/>
<other-files/>
<text-files/>
<video-files/>
</action>
</code></pre>
<p>Complete instructions on how to set up custom actions are <a href="http://docs.xfce.org/xfce/thunar/custom-actions">in the Xfce documentation</a>.</p>
<h2>OS X (Finder) Full Integration</h2>
<p>Download and install the <a href="https://github.com/andrewringler/git-annex-turtle">git-annex-turtle</a> app (beta software). Provides Finder integration, badges and context menus.</p>
<h2>OS X (Finder) Context Menus</h2>
<p>For OS X, it is possible to get context menus in Finder. Due to how OS X
deals with symlinks, one needs to operate on folders.</p>
<ol>
<li>Open Automator and create a new Service.</li>
<li>Using the drop-down menus at the top, create the sentence "Service receives selected folders in Finder.app" so that it works on folders. For direct mode operation it is probably reasonable to select "files or folders".</li>
<li><p>Add a "Run Shell Script" action, set "Pass input" to "as arguments", and fill it in with the following script:</p>
<pre><code>#!/bin/bash
source ~/.bash_profile
for f in "$@"
do
    cd "$(dirname "$f")" && git-annex get "$f"
done
</code></pre></li>
</ol>
<p>The <code>source</code> line is there to get git-annex onto the <code>PATH</code>. The
for loop handles the case where multiple files or folders are selected
when running the context menu command.</p>
<p>Finally, save the workflow under the name that should be listed in the context menu.</p>
<h2>your file manager here</h2>
<p>Edit this page and add instructions!</p>
<h2>general</h2>
<p>If your file manager can run a command on a file, it should be easy to
integrate git-annex with it. A simple script will suffice:</p>
<pre><code>#!/bin/sh
git-annex get --notify-start --notify-finish -- "$@"
</code></pre>
<p>The --notify-start and --notify-finish options make git-annex display a
desktop notification. This is useful to give the user an indication that
their action took effect. Desktop notifications are currently only
implemented for Linux.</p>
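<p>If you would rather have one script for get, drop and undo, here is a minimal sketch that takes the action as its first argument. The <code>annex_action</code> name and the <code>GIT_ANNEX</code> override (handy for a dry run with <code>echo</code>) are hypothetical additions, not part of git-annex; the git-annex options are the same ones used above.</p>

```shell
#!/bin/sh
# Hypothetical helper: one entry point for file manager actions.
# Usage from the file manager: annex-action get|drop|undo FILE...
annex_action() {
    # Need an action plus at least one file.
    if [ "$#" -lt 2 ]; then
        echo "usage: annex-action get|drop|undo FILE..." >&2
        return 2
    fi
    action=$1
    shift
    case "$action" in
        get|drop|undo) ;;
        *)
            echo "usage: annex-action get|drop|undo FILE..." >&2
            return 2
            ;;
    esac
    # GIT_ANNEX is an assumed override; it defaults to the real git-annex.
    ${GIT_ANNEX:-git-annex} "$action" --notify-start --notify-finish -- "$@"
}

# Only run when called with arguments, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    annex_action "$@"
fi
```

<p>A file manager would then call it as, for example, <code>annex-action drop %F</code>; running it with <code>GIT_ANNEX=echo</code> prints the git-annex command instead of executing it.</p>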
automatically adding metadatahttp://git-annex.branchable.com/tips/automatically_adding_metadata/2016-02-28T15:12:22Z2014-03-02T22:01:07Z
<p>git-annex's <a href="http://git-annex.branchable.com/metadata/">metadata</a> works best when files have a lot of useful
metadata attached to them.</p>
<p>To make git-annex automatically set the year and month when adding files,
run: <code>git config annex.genmetadata true</code></p>
<h2>git commit hook</h2>
<p>A git commit hook can be set up to extract lots of metadata from files
like photos, mp3s, etc. Whenever annexed files are committed, their
metadata will be extracted and stored.</p>
<p>Download <a href="http://git-annex.branchable.com/tips/automatically_adding_metadata/pre-commit-annex">pre-commit-annex</a> and install it in your git-annex repository
as <code>.git/hooks/pre-commit-annex</code><br />
Remember to make the script executable! <code>chmod +x .git/hooks/pre-commit-annex</code></p>
<h3>using extract</h3>
<p>The git commit hook can use extract to get metadata.</p>
<p>Install it from <a href="http://www.gnu.org/software/libextractor/">http://www.gnu.org/software/libextractor/</a><br />
<code>apt-get install extract</code></p>
<p>Configure which metadata fields to ask extract for: <code>git config metadata.extract "artist album title camera_make video_dimensions"</code></p>
<p>To get a list of all possible fields, run: <code>extract -L | sed 's/ /_/g'</code></p>
<h3>using exiftool</h3>
<p>The git commit hook can also use exiftool to get metadata.</p>
<p>Install it from <a href="http://owl.phy.queensu.ca/~phil/exiftool/">http://owl.phy.queensu.ca/~phil/exiftool/</a><br />
<code>apt-get install libimage-exiftool-perl</code></p>
<p>Configure which metadata fields to ask exiftool for: <code>git config metadata.exiftool "Model ImageSize FocusRange GPSAltitude GPSCoordinates"</code></p>
<p>To get a list of all possible fields, run: <code>exiftool -list</code></p>
<h3>using both extract and exiftool</h3>
<p>If you want some metadata that extract knows about, and other metadata
that exiftool knows about, just install them both, and set both
<code>metadata.extract</code> and <code>metadata.exiftool</code>.</p>
<h3>overwriting existing metadata</h3>
<p>By default, if git-annex already has a metadata field for a file,
its value will not be overwritten with metadata taken from files.
To allow overwriting, run: <code>git config metadata.overwrite true</code></p>
remote webapp setuphttp://git-annex.branchable.com/tips/remote_webapp_setup/2014-03-01T04:41:29Z2014-03-01T02:39:06Z
<p>Here's the scenario: You have a remote server you can ssh into,
and you want to use the git-annex webapp there, displaying back on your local
web browser.</p>
<p>Sure, no problem! It can even be done securely!</p>
<p>Let's start by making the git-annex repository on the remote server.</p>
<pre><code>git init annex
cd annex
git annex init
</code></pre>
<p>Now, you need to generate a private key and a certificate for HTTPS.
These files are stored in <code>.git/annex/privkey.pem</code> and
<code>.git/annex/certificate.pem</code> inside the git repository. Here's
one way to generate those files, using a self-signed certificate:</p>
<pre><code>(umask 077 ; openssl genrsa -out .git/annex/privkey.pem 4096)
openssl req -new -x509 -key .git/annex/privkey.pem > .git/annex/certificate.pem
</code></pre>
<p>With those files in place, git-annex will automatically only accept HTTPS
connections. That's good, since HTTP connections are not secure over the
big bad internet.</p>
<p>All that remains is to make the webapp listen on the external interface
of the server. Normally, for security, git-annex only listens on localhost.
Tell it what hostname to listen on:</p>
<pre><code>git config annex.listen host.example.com
</code></pre>
<p>(If your hostname doesn't work, its IP address certainly will.)</p>
<p>When you run the webapp configured like that, it'll print out the
URL to use to open it. You can paste that into your web browser.</p>
<pre><code>git annex webapp
http://host.example.com:42232/?auth=ea7857ad...
</code></pre>
<p>Notice that the URL has a big jumble of letters at the end: this is a
secret token that the webapp uses to verify that it's you, so random attackers
can't find your webapp and do bad things with it.</p>
<p>If you like, you can make the server run <code>git annex assistant --autostart</code>
on boot.</p>
<p>To automate opening the remote server's webapp in your local browser,
just run this:</p>
<pre><code>firefox "$(ssh host.example.com git annex webapp)"
</code></pre>
metadata driven viewshttp://git-annex.branchable.com/tips/metadata_driven_views/2014-07-31T05:09:32Z2014-02-19T21:39:58Z
<p>git-annex now has support for storing
<a href="http://git-annex.branchable.com/metadata/">arbitrary metadata</a> about annexed files. For example, this can be
used to tag files, to record the author of a file, etc. The metadata is
synced around between repositories with the other information git-annex
keeps track of.</p>
<p>One nice way to use the metadata is through <strong>views</strong>. You can ask
git-annex to create a view of files in the currently checked out branch
that have certain metadata. Once you're in a view, you can move and copy
files to adjust their metadata further. Rather than the traditional
hierarchical directory structure, views are dynamic; you can easily
refine or reorder a view.</p>
<p>Let's get started by setting some tags on files. No views yet, just some
metadata:</p>
<div class="notebox">
<p>To avoid needing to manually tag files with the year (and month),
run <code>git config annex.genmetadata true</code>, and git-annex will do it for you
when adding files.</p>
</div>
<pre><code># git annex metadata --tag todo work/2014/*
# git annex metadata --untag todo work/2014/done/*
# git annex metadata --tag urgent work/2014/presentation_for_tomorrow.odt
# git annex metadata --tag done work/2013/* work/2014/done/*
# git annex metadata --tag work work
# git annex metadata --tag video videos
# git annex metadata --tag work videos/operating_heavy_machinery.mov
# git annex metadata --tag done videos/old
# git annex metadata --tag new videos/lotsofcats.ogv
# git annex metadata --tag sound podcasts
# git annex metadata --tag done podcasts/*/old
# git annex metadata --tag new podcasts/*/recent
</code></pre>
<p>So, you had a bunch of different kinds of files sorted into a directory
structure. But that didn't really reflect how you approach the files.
Adding some tags lets you categorize the files in different ways.</p>
<p>Ok, metadata is in place, but how to use it? Time to change views!</p>
<pre><code># git annex view tag=*
view (searching...)
Switched to branch 'views/_'
ok
</code></pre>
<div class="notebox">
<p>Notice that a single file may appear in multiple directories
depending on its tags. For example, <code>lotsofcats.ogv</code> is in
both <code>new/</code> and <code>video/</code>.</p>
</div>
<p>This searched for all files with any tag, and created a new git branch
that sorts the files according to their tags.</p>
<pre><code># tree -d
work
todo
urgent
done
new
video
sound
</code></pre>
<p>Ah, but you're at work now, and don't want to be distracted by cat videos.
Time to filter the view:</p>
<pre><code># git annex vfilter tag=work
vfilter
Switched to branch 'views/(work)/_'
ok
</code></pre>
<p>Now only the work files are in the view, and they're otherwise categorized
according to their other tags. So you can check the <code>urgent/</code> directory
to see what's next, and look in <code>todo/</code> for other work related files.</p>
<p>Now that you're in a tag based view, you can move files around between the
directories, and when you commit your changes to git, their tags will be
updated.</p>
<pre><code># git mv 'urgent/presentation_for_tomorrow_{work;2014}.odt' done/
# git commit -m "a good day's work"
metadata tag-=urgent
metadata tag+=done
</code></pre>
<p>You can return to a previous view by running <code>git annex vpop</code>. If you pop
all the way out of all views, you'll be back on the regular git branch you
originally started from. You can also use <code>git checkout</code> to switch between
views and other branches.</p>
<h2>fields</h2>
<p>Beyond simple tags and directories, you can add whatever kinds of metadata
you like, and use that metadata in more elaborate views. For example, let's
add a year field.</p>
<pre><code># git checkout master
# git annex metadata --set year=2014 work/2014
# git annex metadata --set year=2013 work/2013
# git annex view year=* tag=*
</code></pre>
<p>Now you're in a view with two levels of directories, first by year and then
by tag.</p>
<pre><code># tree -d
2014
|-- work
|-- todo
|-- urgent
`-- done
2013
|-- work
`-- done
</code></pre>
<p>Oh, did you want it the other way around? Easy!</p>
<pre><code># git annex vcycle
# tree -d
work
|-- 2014
`-- 2013
todo
`-- 2014
urgent
`-- 2014
done
|-- 2014
`-- 2013
</code></pre>
<h2>location fields</h2>
<p>Let's switch to a view containing only new podcasts. And since the
podcasts are organized into one subdirectory per show, let's
include those subdirectories in the view.</p>
<pre><code># git checkout master
# git annex view tag=new podcasts/=*
# tree -d
This_Developers_Life
Escape_Pod
GitMinutes
The_Haskell_Cast
StarShipSofa
</code></pre>
<p>That's an example of using part of the directory layout of the original
branch to inform the view. Every file gets fields automatically set up
corresponding to the directory it's in. So a file "foo/bar/baz/file" has
fields "/=foo", "foo/=bar", and "foo/bar/=baz". These location fields
can be used the same as other metadata to construct the view.</p>
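<p>The naming rule for location fields can be sketched as a small bash function. It only reproduces the rule described above, so you can predict which fields a path will get; it is not how git-annex computes them internally, and the function name is made up.</p>

```shell
#!/bin/bash
# Print the location fields described above for a path relative to the
# top of the branch: one "FIELD=VALUE" line per parent directory.
location_fields() {
    local path=$1 prefix="" dir
    local -a dirs
    # A file at the top of the branch has no directory, so no fields.
    [[ $path == */* ]] || return 0
    # Split the directory part ("foo/bar/baz") on slashes.
    IFS=/ read -r -a dirs <<< "${path%/*}"
    for dir in "${dirs[@]}"; do
        printf '%s/=%s\n' "$prefix" "$dir"
        prefix=${prefix:+$prefix/}$dir
    done
}
```

<p>For example, <code>location_fields podcasts/GitMinutes/episode.ogg</code> prints <code>/=podcasts</code> and <code>podcasts/=GitMinutes</code>, which is why the <code>podcasts/=*</code> view above gets one directory per show.</p>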
<p>This has probably only scratched the surface of what you can do with views.</p>
Shamir secret sharing and git-annexhttp://git-annex.branchable.com/tips/Shamir_secret_sharing_and_git-annex/2014-01-24T05:05:01Z2014-01-24T04:50:52Z
<p>Combining git-annex with <a href="http://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing">Shamir secret sharing</a>
is a useful way to securely back up highly sensitive files,
such as a gpg key or bitcoin wallet.</p>
<p>Shamir secret sharing creates N shares of a file, of which any M can be
used to reconstitute the original file. Anyone who has less than M shares
cannot tell anything about the original file, other than its size.</p>
<p>Where git-annex comes in is as a way to manage these shares. They can be
added to the annex, and then git-annex used to move one share to each clone
of the repository. Since git-annex keeps track of where each file is
stored, this can aid later finding the shares again when they're needed, as
well as making ongoing management of the shares easier.</p>
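<p>For instance, once the shares exist as files in the annex, a short bash sketch like the one below prints one <code>git annex move</code> command per share, each to a different remote, so you can review the plan before running it. The share file names, the remote names and the <code>plan_share_moves</code> function are all hypothetical; producing the shares is left to whatever Shamir secret sharing tool you use.</p>

```shell
#!/bin/bash
# Print (but do not run) one "git annex move" per share, sending each
# share to a different remote.
# Usage: plan_share_moves REMOTE... -- SHARE...
plan_share_moves() {
    local -a remotes=()
    local share i=0
    # Collect remote names up to the "--" separator.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        remotes+=("$1")
        shift
    done
    [ "$#" -gt 0 ] && shift    # drop the "--" separator
    if [ "$#" -gt "${#remotes[@]}" ]; then
        echo "need at least one remote per share" >&2
        return 1
    fi
    for share in "$@"; do
        printf 'git annex move %q --to %q\n' "$share" "${remotes[i]}"
        i=$((i + 1))
    done
}
```

<p>For example, <code>plan_share_moves usbdrive laptop server -- key.001 key.002 key.003</code> prints three move commands, one per remote.</p>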
<p>Note that this convenience comes at a price: Any attacker who gets a copy
of the git repository can use it to figure out where the shares are
located. While this is not a crippling flaw, and can be worked around, it
needs to be considered when implementing this technique.</p>
<p>Here is an example of this method being used for a ~/.gnupg directory:
<a href="http://git.kitenet.net/?p=gpg.git;a=blob;f=README.sss">http://git.kitenet.net/?p=gpg.git;a=blob;f=README.sss</a></p>
Synology NAS and git annexhttp://git-annex.branchable.com/tips/Synology_NAS_and_git_annex/2015-04-17T16:39:08Z2014-01-02T06:41:07Z
<h1>How to use git-annex on a Synology NAS</h1>
<p>This is known to work with DSM 4.3-3810 Update 1 and git-annex standalone version 5.20131224-g6ca5271.</p>
<h2>Installation Steps</h2>
<p>(1) In the DSM Package Center, install Git, which is available from Synology (no third-party repository needed).</p>
<p>(2) Download the latest <a href="http://git-annex.branchable.com/install/Linux_standalone/">standalone</a> git-annex build for Linux on armel.</p>
<p>(3) Extract it somewhere sensible (e.g. a <code>bin/</code> directory in your user's home directory).</p>
<p>(4) Go into the git-annex.linux directory and run <code>./runshell</code>. You can now run git-annex as you normally would.</p>
<h2>How to sync with the Synology NAS</h2>
<h3>On the Synology</h3>
<p>(1) Set up port forwarding and, if applicable, dynamic DNS. There are many good guides online for this.</p>
<p>(2) Set up SSH key-based authentication with the Synology for each computer you want to sync with it. You want a specific key that is used only by git-annex, for each computer. Again, there are many good guides online.</p>
<p>(3) In the Synology .ssh/authorized_keys file for your account, add (substituting your username)</p>
<div class="highlight-sh"><pre class="hl"><span class="hl kwb">command</span><span class="hl opt">=</span><span class="hl str">"/home/</span><span class="hl ipl">$yourusername</span><span class="hl str">/.ssh/git-annex-shell"</span>
</pre></div>
<p>to the beginning of the key's line. For example, it would look like this:</p>
<div class="highlight-sh"><pre class="hl"><span class="hl kwb">command</span><span class="hl opt">=</span><span class="hl str">"/home/greg/.ssh/git-annex-shell"</span> ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDT1yE96E<span class="hl opt">/</span>JQNPt0ziiNYJRvndCvLK4uG5h<span class="hl opt">/</span>SNYoAIBF1uH6L7VYAt3HWVqSyi3BcV70WDZ<span class="hl opt">/</span>yWgtNzbrcir46JpvEHMcvYaXLbANwoDGNjG<span class="hl opt">/</span>gsz7kP<span class="hl opt">/</span><span class="hl num">8</span>VUxZ6hG3P3ICuwnqVum5<span class="hl opt">+</span>rYXm6oj3xzWPfTRhhRoDZLOQdevSNpdGNaa<span class="hl opt">/</span>lSg8Vuq2suHwjQlQb8AIUuCZmS5cm6XwoUq<span class="hl opt">/</span>jJtN4LTuTPqMjzA6NkdhWM2Kigi9jPQBFborkYBPMphmZwBZiVnhsH1XpaOff<span class="hl opt">+</span>mP03D2gF<span class="hl opt">/</span>huC<span class="hl opt">+</span>b1vbWQstjuehUbY59rvJ4ijb3810Uq2ep7dwLagmILtX5GbL<span class="hl opt">+</span>GS64pAn9sIP annex-othercomputer
</pre></div>
<p>(4) The git-annex-shell script in your .ssh directory should have been created for you after your initial ./runshell.</p>
<p>(5) Double check that the script points to the directory where your extracted git-annex.linux lives.</p>
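<p>Step (3) can also be done from the shell. The sketch below is a hypothetical helper that prints a public key line with the <code>command=</code> prefix shown above; the username and key contents are placeholders to substitute with your own.</p>

```shell
#!/bin/sh
# Hypothetical helper: prefix a public key line with the command= restriction
# from step (3), so that key can only run git-annex-shell.
# $1: your username on the Synology, $2: the public key line
restrict_key() {
    printf 'command="/home/%s/.ssh/git-annex-shell" %s\n' "$1" "$2"
}
```

<p>On the Synology, after copying the other computer's <code>annex_rsa.pub</code> over, something like <code>restrict_key greg "$(cat annex_rsa.pub)" >> ~/.ssh/authorized_keys</code> appends the restricted entry.</p>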
<h3>On the other computers - the manual way</h3>
<p>(1) See step 2 above about creating the specific git-annex ssh keys.</p>
<p>(2) In your .ssh/config, create an alias for your Synology that specifies the right SSH key. For example, mine looks like:</p>
<div class="highlight-sh"><pre class="hl">Host synologyhost
HostName mydynamicdomain.no-ip.org
IdentityFile <span class="hl opt">/</span>home<span class="hl opt">/</span>greg<span class="hl opt">/</span>.ssh<span class="hl opt">/</span>annex_rsa
</pre></div>
<p>(3) Now when you clone the git repo from the Synology, or add it as a remote, do the following:</p>
<div class="highlight-sh"><pre class="hl">git clone greg@synologyhost<span class="hl opt">:/</span>absolute<span class="hl opt">/</span>path<span class="hl opt">/</span>to<span class="hl opt">/</span>annexname annexname
</pre></div>
<p>or</p>
<div class="highlight-sh"><pre class="hl">git remote add synology greg@synologyhost<span class="hl opt">:/</span>absolute<span class="hl opt">/</span>path<span class="hl opt">/</span>to<span class="hl opt">/</span>annexname
</pre></div>
<p>(4) Run <code>git annex sync</code>.</p>
<h3>On the other computers - Using the assistant</h3>
<p>(1) Use the webapp to add the remote. I'm not sure if there are any gotchas here as I have not done it this way yet.</p>
Crude Windows Synchttp://git-annex.branchable.com/tips/Crude_Windows_Sync/2013-11-27T22:47:37Z2013-11-21T12:37:48Z
<p>Here's a workaround to start syncing folders on Windows right now. It's a bit command line heavy, so you might need to set this up for your users. But I would much rather do this than use some other syncing solution and then have to migrate.</p>
<p>(1) Create a remote server git annex repository with the assistant on Linux or Mac.</p>
<p>(2) <a href="http://git-scm.com/">Install git</a> on the Windows machine.</p>
<p>(3) <a href="http://git-annex.branchable.com/install/Windows/">Install git-annex for Windows</a> on the Windows machine. Don't forget to run the installer as administrator.</p>
<p>(4) Run <em>Git Bash</em> from the system menu, and run these commands to clone your repository.</p>
<pre><code>ssh-keygen
cat .ssh/id_rsa.pub | ssh username@my-server.com "cat >> ~/.ssh/authorized_keys"
git clone username@my-server.com:/path/to/annex
cd annex
git annex init
</code></pre>
<p>(5) Create a script that will trigger a full sync:</p>
<pre><code>cat > sync.sh <<'EOF'
#!/bin/bash
git annex sync
git annex get .
git annex add .
git annex sync
git annex copy . --to origin
EOF
chmod +x sync.sh
./sync.sh
</code></pre>
<p>(6) Copy the "Git Bash" shortcut from your windows menu to your desktop, and change the link target to:</p>
<pre><code>"C:\Program Files\Git\bin\sh.exe" --login -i "annex/sync.sh"
</code></pre>
<p>Now ask your users to run this shortcut before and after they change files. You can also put it into the "autostart" folder to sync at boot.</p>
The perfect preferred content settings for my android phonehttp://git-annex.branchable.com/tips/The_perfect_preferred_content_settings_for_my_android_phone/2013-11-30T15:04:35Z2013-11-16T08:36:21Z
<p>I have an annex that syncs my personal files on all my computers. It works great. Phones are different.</p>
<p>For one, everything's a bit slower to sync, there's battery considerations, and I just don't need every last old file on my phone. Then there's some files I explicitly don't want on my phone in case it gets lost, like family pictures, passport scans, or private keys.</p>
<p>But I still want photos, videos and voice recordings I make on my phone to be synced to my server. A transfer repo would work, but I want to keep them. Then there's my PDF book collection; that would certainly be nice to always have around in case I have half an hour on a bus. And my music collection ought to be around as well.</p>
<p>So I came up with this solution, and I'm very happy with it.</p>
<pre><code>include=Music/* or include=Books/* or present
</code></pre>
<p>This will sync my music and book collections to my phone whenever I add something new on my computers, and it will sync and keep anything I add to the annex on my phone. Best of all worlds! I'm impressed by how flexible preferred content is. More full-sync folders can be added like this:</p>
<pre><code>include=Music/* or include=Books/* or include=Notes/* or present
</code></pre>
<p>To add them, I first had to figure out the UUID of my phone repo. So I opened a new tab in the Android terminal, and ran</p>
<pre><code>cd /sdcard/annex
git config annex.uuid
</code></pre>
<p>Then I went to one of my computers, and did</p>
<pre><code>git annex vicfg
</code></pre>
<p>And changed the line</p>
<pre><code>content [phone-uuid] = standard
</code></pre>
<p>to</p>
<pre><code>content [phone-uuid] = include=Music/* or include=Books/* or include=Notes/* or present
</code></pre>
<p>and commented out</p>
<pre><code>#group [phone-uuid] = client
</code></pre>
<p>And waited for it to sync.</p>
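<p>As an alternative to editing the configuration in <code>git annex vicfg</code>, <code>git annex wanted</code> can set a repository's preferred content expression directly. The bash sketch below builds the expression above from a list of folder names; the <code>phone_wanted_expr</code> function and the <code>phone_uuid</code> variable are placeholders, not part of git-annex.</p>

```shell
#!/bin/bash
# Build "include=A/* or include=B/* ... or present" from folder names.
phone_wanted_expr() {
    local dir result=""
    for dir in "$@"; do
        result+="include=$dir/* or "
    done
    printf '%s\n' "${result}present"
}

# Example (hypothetical UUID variable; needs a git-annex repository):
# git annex wanted "$phone_uuid" "$(phone_wanted_expr Music Books Notes)"
```

<p>Quoting the command substitution matters here, so that the shell passes the expression through as one argument instead of expanding the <code>*</code> globs.</p>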
recovering from a corrupt git repositoryhttp://git-annex.branchable.com/tips/recovering_from_a_corrupt_git_repository/2013-11-27T22:47:37Z2013-11-11T05:35:43Z
<p>I have found this the most reliable way to recover from a corrupt git repository. I have had a lot of them lately; there might be a regression in btrfs in Ubuntu's Linux 3.8.0-33 (!).</p>
<ol>
<li>Create a clone of a known good repository.</li>
<li>Add the clone as an object alternate to the broken repository.</li>
<li>Do a <code>git repack -a -d</code> to lift the external objects into repo-local packs.</li>
<li>Remove the clone, along with the <code>.git/objects/info/alternates</code> entry that points to it.</li>
</ol>
<div class="highlight-sh"><pre class="hl">$ <span class="hl kwb">cd</span> <span class="hl opt">/</span>tmp<span class="hl opt">/</span>
$ git clone good-host<span class="hl opt">:/</span>path<span class="hl opt">/</span>to<span class="hl opt">/</span>good-repo
$ <span class="hl kwb">cd</span> <span class="hl opt">/</span>home<span class="hl opt">/</span>user<span class="hl opt">/</span>broken-repo
$ <span class="hl kwb">echo</span> <span class="hl opt">/</span>tmp<span class="hl opt">/</span>good-repo<span class="hl opt">/</span>.git<span class="hl opt">/</span>objects<span class="hl opt">/ ></span> .git<span class="hl opt">/</span>objects<span class="hl opt">/</span>info<span class="hl opt">/</span>alternates
$ git repack <span class="hl kwb">-a -d</span>
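$ <span class="hl kwc">rm</span> .git<span class="hl opt">/</span>objects<span class="hl opt">/</span>info<span class="hl opt">/</span>alternates
$ <span class="hl kwc">rm</span> <span class="hl kwb">-rf</span> <span class="hl opt">/</span>tmp<span class="hl opt">/</span>good-repo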
</pre></div>
<p>... and push early, push often. <img src="http://git-annex.branchable.com/smileys/smile4.png" alt=";-)" /></p>
offline archive driveshttp://git-annex.branchable.com/tips/offline_archive_drives/2013-12-16T06:17:15Z2013-09-22T20:46:11Z
<p>After you've used git-annex for a while, you will have data in your repository
that you don't want to keep in the limited disk space of a laptop or a server,
but that you don't want to entirely delete.</p>
<p>This is where git-annex's support for offline archive drives shines.
You can move old files to an archive drive, which can be kept offline if
it's not practical to keep it spinning. Better, you can move old files to
two or more archive drives, in case one of them later fails to spin up.<br />
(One consideration when <a href="http://git-annex.branchable.com/future_proofing/">future proofing</a> your archive.)</p>
<p>To set up an archive drive, you can take any removable drive, format
it with a filesystem you'll be able to read some years later, and then follow
the <a href="http://git-annex.branchable.com/walkthrough/">walkthrough</a> to set up a repository on it that is a git remote of
the repository in your computer you want to archive. In short:</p>
<pre><code>cd /media/archive
git clone ~/annex
cd ~/annex
git remote add archivedrive /media/archive/annex
git annex sync archivedrive
</code></pre>
<p>Don't forget to tell git-annex this is an archive drive (or a backup
drive; see <a href="http://git-annex.branchable.com/preferred_content/">preferred content</a>). Also, give the drive a description that matches something you write on
its label, so you can find it later:</p>
<pre><code>git annex group archivedrive archive
git annex wanted archivedrive standard
git annex describe archivedrive "my first archive drive (SATA)"
</code></pre>
<p>Or you can use the assistant to set up the drive for you.<br />
(Nice video tutorial here: <a href="http://git-annex.branchable.com/videos/git-annex_assistant_archiving/">git-annex assistant archiving</a>)</p>
<p>(Keeping the archive drive in an offsite location? Consider encrypting
it! See <a href="http://git-annex.branchable.com/tips/fully_encrypted_git_repositories_with_gcrypt/">fully encrypted git repositories with gcrypt</a>.)</p>
<p>Then, when the archive drive is plugged in, you can easily copy files to
it:</p>
<pre><code>cd ~/annex
git-annex copy --auto --to archivedrive
</code></pre>
<p>Or, if you're using the assistant, it will automatically notice when the drive
gets plugged in and copy files that need to be archived.</p>
<p>When you want to get rid of the local file, leaving only the copy on the
archive, you can just:</p>
<pre><code>git annex drop file
</code></pre>
<p>The archive drive has to be plugged in for this to work, so git-annex
can verify it still has the file. If you had configured git-annex to
always store 2 <a href="http://git-annex.branchable.com/copies/">copies</a>, it will need 2 archive drives plugged in.
You may find it useful to configure a <a href="http://git-annex.branchable.com/trust/">trust</a> setting for the drive to
avoid needing to haul it out of storage to drop a file.</p>
<p>Now the really nice thing. When your archive drive gets filled up, you
can simply remove it, store it somewhere safe, and replace it with a new
drive, which can be mounted at the same location for simplicity. Set up
the new drive the same way described above, and use it to archive even more
files.</p>
<p>Finally, when you want to access one of the files you archived, you can
just ask for it:</p>
<pre><code>git annex get file
</code></pre>
<p>If necessary git-annex will tell you which archive drive you need to
pull out of storage to get the file back. This is where the description
you entered earlier comes in handy.</p>
shared git annex directory between multiple usershttp://git-annex.branchable.com/tips/shared_git_annex_directory_between_multiple_users/2015-05-27T20:56:05Z2013-09-10T23:13:59Z
<h1>Scenario</h1>
<p>You have a server where you want to welcome other people to push files, say for a family photo album. Each person has their own user account, so by default they will not be able to read from or write to each other's repositories, because git-annex uses strict permissions.</p>
<h1>Solution</h1>
<p>Setup a shared git repository:</p>
<pre><code>git init shared ; cd shared # you can also do this on an existing git annex repo
git config core.sharedrepository group
chmod g+rwX -R .
chgrp -R $group .
</code></pre>
<p>The idea here is to use the new (since version 4.20130909) support for git's <code>sharedRepository</code> configuration and restrict access to a specific group (instead of the default, a single user). You can also use this to make the files accessible to all users on the system:</p>
<pre><code>git config core.sharedrepository world
chmod a+rwX -R .
</code></pre>
<p>This will make sure that anyone can operate that git-annex repository remotely.</p>
<h2>Third party applications</h2>
<p>Now if another application that is not aware of git's <code>sharedRepository</code> configuration (say a bittorrent daemon) writes files there, you may want to make sure that the files it creates are also writable by everyone. This is trickier, but one way of doing it is with the setgid bit on directories:</p>
<pre><code>find . -type d -exec chmod g+s {} \;
</code></pre>
<p>You will also need to start the process with a proper umask (<code>002</code> instead of <code>022</code>).</p>
<p><img src="http://git-annex.branchable.com/smileys/idea.png" alt="(!)" /> I haven't actually tested this part. --<a href="http://git-annex.branchable.com/users/anarcat/">anarcat</a></p>
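<p>The permission part of this can at least be checked in a scratch directory. The bash sketch below (it assumes Linux and GNU coreutils <code>stat</code>) sets the setgid bit with the <code>find</code> command above, pointed at an explicit directory, then creates a subdirectory and a file under a <code>002</code> umask and prints their octal modes. It does not exercise git-annex or a real daemon, and the directory names are made up.</p>

```shell
#!/bin/bash
# Print the octal modes of a setgid directory, a subdirectory created in it,
# and a file created under umask 002 (Linux, GNU stat assumed).
show_shared_modes() {
    local demo
    demo=$(mktemp -d)
    (
        umask 002
        mkdir "$demo/album"
        find "$demo/album" -type d -exec chmod g+s {} \;
        mkdir "$demo/album/2014"          # inherits the setgid bit on Linux
        touch "$demo/album/2014/photo.jpg"
        stat -c %a "$demo/album" "$demo/album/2014" "$demo/album/2014/photo.jpg"
    )
    rm -rf "$demo"
}
```

<p>On Linux this should print <code>2775</code> for both directories and <code>664</code> for the file: the subdirectory picks up the setgid bit from its parent, and the umask leaves new files group-writable.</p>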
<h1>See also</h1>
<ul>
<li><a href="http://git-annex.branchable.com/tips/setup_a_public_repository_on_a_web_site/">setup a public repository on a web site</a></li>
<li>version 4.20130909</li>
<li><a href="http://git-annex.branchable.com/bugs/acl_not_honoured_in_rsync_remote/">acl not honoured in rsync remote</a>: why this does not work on encrypted remotes</li>
</ul>
migrating two seperate disconnected directories to git annexhttp://git-annex.branchable.com/tips/migrating_two_seperate_disconnected_directories_to_git_annex/2022-11-03T18:07:56Z2013-09-10T18:11:46Z
<p>Note: this is the reverse of <a href="http://git-annex.branchable.com/tips/splitting_a_repository/">splitting a repository</a>.</p>
<h2>Scenario</h2>
<p>You are a new git-annex user. You already have files spread around many computers and wish to migrate them into git-annex, without having to recopy all the files all over the place.</p>
<p>Let's say, for example, you have a server, named <code>marcos</code> and a workstation named <code>angela</code>. You have your audio collection stored in <code>/srv/mp3</code> in <code>marcos</code> and <code>~/mp3</code> on <code>angela</code>, but only <code>marcos</code> has all the files, and <code>angela</code> only has a subset.</p>
<p>We also assume that <code>marcos</code> has an SSH server.</p>
<p>How do you add all this stuff to git-annex?</p>
<h2>Create the biggest git-annex repository</h2>
<p>Start with <code>marcos</code>, with the complete directory:</p>
<pre><code>cd /srv/mp3
git init
git annex init
git annex add .
git commit -m "git annex yay"
</code></pre>
<p>This will checksum all files and add them to the <code>git-annex</code> branch of the git repository. Wait for this process to complete.</p>
<h2>Create the smaller repo and synchronise</h2>
<p>On <code>angela</code>, we want to synchronise the git annex metadata with <code>marcos</code>. We need to initialize a git repo with <code>marcos</code> as a remote:</p>
<pre><code>cd ~/mp3
git init
git remote add marcos marcos.example.com:/srv/mp3
git fetch marcos
git annex info # this should display the two repos
git annex add .
</code></pre>
<p>This will, again, checksum all files and add them to git annex. Once that is done, you can verify that the files are really the same as marcos with <code>whereis</code>:</p>
<pre><code>git annex whereis
</code></pre>
<p>This should display something like:</p>
<pre><code>whereis Orange Seeds/I remember.wav (2 copies)
b7802161-c984-4c9f-8d05-787a29c41cfe -- marcos (anarcat@marcos:/srv/mp3)
c2ca4a13-9a5f-461b-a44b-53255ed3e2f9 -- here (anarcat@angela)
ok
</code></pre>
<p>Once you are sure everything went okay, you can synchronise this with <code>marcos</code>:</p>
<pre><code>git annex sync --allow-unrelated-histories
</code></pre>
<p>This will push the metadata information to marcos, so it knows which files
are available on <code>angela</code>. From there on, you can freely get and move files
between the two repos!</p>
<h2>Importing files from a third directory</h2>
<p>Say that some files on <code>angela</code> are actually spread out outside of the <code>~/mp3</code> directory. You can use the <code>git annex import</code> command to add those extra directories:</p>
<pre><code>cd ~/mp3
git annex import ~/music/
</code></pre>
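<p><img src="http://git-annex.branchable.com/smileys/idea.png" alt="(!)" /> Make sure that <code>~/music</code> is not itself a git-annex repository. Also note that by default <code>git annex import</code> moves the files out of <code>~/music</code> into the repository; pass <code>--duplicate</code> if you want to leave the originals in place.</p>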
<h2>Deleting deleted files</h2>
<p>It is quite possible some files were removed (or renamed!) on <code>marcos</code> but not on <code>angela</code>, since the two were last synchronised some time ago. A good way to find those files is to combine the <code>--in</code> and <code>--not</code> options, for example, on <code>angela</code>:</p>
<pre><code>git annex whereis --in here --not --in marcos
</code></pre>
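<p>This will show files that are on <code>angela</code> and not on <code>marcos</code>. They could be new files that were only added on <code>angela</code>, so be careful! A manual analysis is necessary, but once you are certain those files are not relevant anymore, you can delete them from <code>angela</code>. Since no other copy of them is known, <code>git annex drop</code> will refuse to drop their content unless you pass <code>--force</code>; <code>git rm</code> then removes the file itself from the repository:</p>
<pre><code>git annex drop --force <file>
git rm <file>
</code></pre>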
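<p>If the list is long, a small helper can make the manual review easier: save the output of <code>git annex find --in here --not --in marcos</code> to a file outside the repository, delete the lines for files you want to keep, and then process what remains. This is only a sketch; <code>remove_listed</code> is a hypothetical name, and it assumes one path per line (no newlines in file names), relative to the top of the repository.</p>
<pre><code>#!/bin/bash
# remove_listed FILE: for every non-empty line (a path) in FILE, drop the
# annexed content and then remove the file from git.
remove_listed() {
  local list="$1" path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    # --force is needed because these files have no other known copy;
    # stdin is redirected so the commands cannot consume the list.
    git annex drop --force "$path" < /dev/null
    git rm --quiet -- "$path" < /dev/null
  done < "$list"
}
</code></pre>
<p>For example: <code>git annex find --in here --not --in marcos > /tmp/only-on-angela</code>, edit <code>/tmp/only-on-angela</code>, run <code>remove_listed /tmp/only-on-angela</code> from the top of the repository, and finish with <code>git annex sync</code> to commit the removals.</p>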
fully encrypted git repositories with gcrypthttp://git-annex.branchable.com/tips/fully_encrypted_git_repositories_with_gcrypt/2021-04-13T19:05:38Z2013-09-08T19:48:41Z
<p><a href="https://spwhitton.name/tech/code/git-remote-gcrypt/">git-remote-gcrypt</a>
adds support for encrypted remotes to git. Combine this with git-annex
encrypting the files it stores in a remote, and you can fully encrypt
all the data stored on a remote.</p>
<p>Here are some ways you can use this awesome stuff.</p>
<p>This page will show how to set it up at the command line, but the git-annex
<a href="http://git-annex.branchable.com/assistant/">assistant</a> can also be used to help you set up encrypted git
repositories.</p>
<h2>prerequisites</h2>
<ul>
<li><p>Install <a href="https://spwhitton.name/tech/code/git-remote-gcrypt/">git-remote-gcrypt</a>.</p></li>
<li><p>Set up a gpg key. You might consider generating a special purpose key
just for this use case, since you may end up wanting to put the key
on multiple machines that you would not trust with your main gpg key.</p>
<p>The examples below use "$mykey" where you should put your gpg keyid.</p></li>
</ul>
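<p>Before going further, it can help to check both prerequisites from a script. This is a minimal sketch; the function name is hypothetical, and it only checks that <code>git-remote-gcrypt</code> is on your <code>PATH</code> and that gpg has a secret key matching the key id you pass.</p>
<pre><code>#!/bin/bash
# check_gcrypt_prereqs KEYID: return non-zero (with a message on stderr)
# if git-remote-gcrypt is missing or gpg has no secret key for KEYID.
check_gcrypt_prereqs() {
  local keyid="$1"
  if ! command -v git-remote-gcrypt >/dev/null 2>&1; then
    echo "git-remote-gcrypt is not installed (or not on PATH)" >&2
    return 1
  fi
  if ! gpg --list-secret-keys "$keyid" >/dev/null 2>&1; then
    echo "gpg has no secret key matching $keyid" >&2
    return 1
  fi
}
</code></pre>
<p>For example, <code>check_gcrypt_prereqs "$mykey"</code> before running the <code>initremote</code> commands below.</p>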
<h2>encrypted backup drive</h2>
<p>Let's make a USB drive into an encrypted backup repository. It will contain
both the full contents of your git repository, and all the files you
instruct git-annex to store on it, and everything will be encrypted so that
only you can see it.</p>
<p>Here's how to set up the encrypted repository:</p>
<pre><code>git init --bare /mnt/encryptedbackup
git annex initremote encryptedbackup type=gcrypt gitrepo=/mnt/encryptedbackup keyid=$mykey
git annex sync encryptedbackup
</code></pre>
<p>(Remember to replace "$mykey" with the keyid of your gpg key.)</p>
<p>This uses the <a href="http://git-annex.branchable.com/special_remotes/gcrypt/">gcrypt special remote</a> to encrypt
pushes to the git remote, and git-annex will also encrypt the files it
stores there.</p>
<p>Now you can copy (or even move) files to the repository. After
sending files to it, you'll probably want to do a sync, which pushes
the git repository changes to it as well.</p>
<pre><code>git annex copy --to encryptedbackup ...
git annex sync encryptedbackup
</code></pre>
<p>Note that if you lose your gpg key, it will be <em>impossible</em> to get the
data out of your encrypted backup. You need to find a secure way to store a
backup of your gpg key. Printing it out and storing it in a safe deposit box,
for example.</p>
<p>You can actually specify keyid= as many times as you like to allow any one
of a set of gpg keys to access this repository. So you could add a friend's
key, or another gpg key you have.</p>
<p>To restore from the backup, just plug the drive into any machine that has
the gpg key used to encrypt it, and then:</p>
<pre><code>git clone gcrypt::/mnt/encryptedbackup restored
cd restored
git annex enableremote encryptedbackup gitrepo=/mnt/encryptedbackup
git annex get --from encryptedbackup
</code></pre>
<h2>encrypted git-annex repository on a ssh server</h2>
<p>If you have a server that has ssh and rsync installed on it, you can set up an
encrypted repository there. Works just like the encrypted drive except
without the cable.</p>
<p>This example uses rsync urls in a form supported by git-remote-gcrypt since
version 1.4. Older versions won't work with the urls used here, consult
its documentation if you have to use an old version.</p>
<p>First, on the server, run:</p>
<pre><code>git init --bare encryptedrepo
</code></pre>
<p>Now, in your existing git-annex repository, set up the encrypted remote:</p>
<pre><code>git annex initremote encryptedrepo type=gcrypt gitrepo=rsync://my.server/home/me/encryptedrepo keyid=$mykey
git annex sync encryptedrepo
</code></pre>
<p>(Remember to replace "$mykey" with the keyid of your gpg key.)</p>
<p>This uses the <a href="http://git-annex.branchable.com/special_remotes/gcrypt/">gcrypt special remote</a> to encrypt
pushes to the git remote, and git-annex will also encrypt the files it
stores there. Data is transferred using rsync over ssh.</p>
<p>If you're going to be sharing this repository with others, be sure to also
include their keyids, by specifying keyid= repeatedly.</p>
<p>Now you can copy (or even move) files to the repository. After
sending files to it, you'll probably want to do a sync, which pushes
the git repository changes to it as well.</p>
<pre><code>git annex copy --to encryptedrepo ...
git annex sync encryptedrepo
</code></pre>
<p>Anyone who has access to the repo and has one of the keys
used to encrypt it can check it out:</p>
<pre><code>git clone gcrypt::rsync://my.server/home/me/encryptedrepo myrepo
cd myrepo
git annex enableremote encryptedrepo gitrepo=rsync://my.server/home/me/encryptedrepo
git annex get --from encryptedrepo
</code></pre>
<h2>private encrypted git remote on a git-lfs hosting site</h2>
<p>Some git repository hosting sites do not support git-annex, but do support
the similar git-lfs for storing large files alongside a git repository.
git-annex can use the git-lfs protocol to store files in such repositories,
and with gcrypt, everything stored in the remote can be encrypted.</p>
<p>First, make a new, empty git repository on the hosting site.
Get the ssh clone url for the repository, which might look
like "git@github.com:username/somerepo.git"</p>
<p>Then, in your git-annex repository, set up the encrypted remote:</p>
<pre><code>git annex initremote lfstest type=git-lfs url=gcrypt::git@github.com:username/somerepo.git keyid=$mykey
</code></pre>
<p>(Remember to replace "$mykey" with the keyid of your gpg key.)</p>
<p>This uses the <a href="http://git-annex.branchable.com/special_remotes/git-lfs/">git-lfs special remote</a>, and the
<code>gcrypt::</code> prefix on the url makes pushes be encrypted with gcrypt.</p>
<h2>private encrypted git remote on a git hosting site</h2>
<p>You can use gcrypt to store your git repository in encrypted form on any
hosting site that supports git. Only you can decrypt its contents. Using it
this way, git-annex does not store large files on the hosting site; it's
only used to store your git repository itself.</p>
<pre><code>git remote add encrypted gcrypt::ssh://hostingsite/myrepo.git
git push encrypted master git-annex
</code></pre>
<p>Now you can carry on using git-annex with your new repository. For example,
<code>git annex sync</code> will sync with it.</p>
<p>To check out the repository from the hosting site, use the same gcrypt::
url you used when setting it up:</p>
<pre><code>git clone gcrypt::ssh://hostingsite/myrepo.git
</code></pre>
<h2>multiuser encrypted git remote on a git hosting site</h2>
<p>Suppose you and a friend want to share an encrypted git remote. Both of you
need to set up the remote, and configure gcrypt to encrypt it so that both
of you can decrypt it.</p>
<pre><code>git remote add sharedencrypted gcrypt::ssh://hostingsite/myrepo.git
git config remote.sharedencrypted.gcrypt-participants "$mykey $friendkey"
git push sharedencrypted master git-annex
</code></pre>
imapannexhttp://git-annex.branchable.com/tips/imapannex/2014-01-10T14:33:30Z2013-08-15T21:19:50Z
<h1>imapannex 0.2.0</h1>
<p>Hook program for gitannex to use imap as backend</p>
<h1>Requirements:</h1>
<pre><code>python2
</code></pre>
<h1>Install</h1>
<p>Clone the git repository in your home folder.</p>
<pre><code>git clone git://github.com/TobiasTheViking/imapannex.git
</code></pre>
<p>This should make a ~/imapannex folder</p>
<h1>Setup</h1>
<p>Make the file executable, and link it into PATH</p>
<pre><code>cd ~/imapannex; chmod +x git-annex-remote-imap; sudo ln -sf `pwd`/git-annex-remote-imap /usr/local/bin/git-annex-remote-imap
</code></pre>
<h1>Commands for gitannex:</h1>
<pre><code>USERNAME="username@provider.com" PASSWORD="password" git annex initremote imap type=external externaltype=imap encryption=shared folder=gitannex method="Normal password" ssl="SSL/TLS" host="imap.host.com" port="993"
git annex describe imap "the imap library"
</code></pre>
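<p>Passing the password on the command line, as above, leaves it in your shell history. Below is a minimal sketch of a wrapper that prompts for it instead. The function name and its two arguments are hypothetical; the <code>initremote</code> parameters are copied unchanged from the command above.</p>
<pre><code>#!/bin/bash
# init_imap_remote USER HOST: prompt for the IMAP password without echoing
# it, then run the same initremote command with it in the environment.
init_imap_remote() {
  local user="$1" host="$2" pass
  read -r -s -p "IMAP password for $user: " pass
  echo >&2   # -s also hides the newline typed after the password
  USERNAME="$user" PASSWORD="$pass" git annex initremote imap type=external externaltype=imap \
    encryption=shared folder=gitannex method="Normal password" ssl="SSL/TLS" \
    host="$host" port="993"
}
</code></pre>
<p>For example: <code>init_imap_remote username@provider.com imap.host.com</code></p>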
Git annex and Calibrehttp://git-annex.branchable.com/tips/Git_annex_and_Calibre/2020-06-17T01:18:32Z2013-08-13T16:30:17Z
<h1>The problem</h1>
<p><a href="http://calibre-ebook.com/">Calibre</a> is an ebook manager that is
available in <a href="http://packages.debian.org/sid/calibre">debian</a>. I use
it to maintain my library, but also to download an epub version of a French
newspaper every day and then put it on my kobo.</p>
<h1>Configuring git annex for this</h1>
<p>I wanted to use git-annex, so</p>
<pre><code>$ git init
$ git annex init "some useful name"
</code></pre>
<p>But I don't want everything in the annex, because Calibre uses some text
files to save metadata, so I used:</p>
<pre><code>$ git config annex.largefiles "include=* exclude=*.opf exclude=*.json"
</code></pre>
<p>Then let's add everything:</p>
<pre><code>$ git annex add *
$ git add *
$ git commit -m "first commit"
</code></pre>
<p>Calibre needs read and write access to its database, so let's unlock it:</p>
<pre><code>$ git annex unlock metadata.db
</code></pre>
<p>On my other computer I only need to do</p>
<pre><code>$ git clone $user@$host:Calibre\ library
$ cd Calibre\ library
$ git annex init "another useful name"
$ git annex get .
$ git annex unlock metadata.db
</code></pre>
<p>The problem is that every time you run <code>git annex sync</code>, git annex
will lock <code>metadata.db</code> again, so let's unlock it automatically. I
use a git hook; in <code>.git/hooks/post-commit</code> I have:</p>
<pre><code>#!/bin/bash
git annex unlock metadata.db
</code></pre>
<p>don't forget to make this file executable</p>
<pre><code>$ chmod a+x .git/hooks/post-commit
</code></pre>
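<p>If you only want the hook to act when the database actually needs unlocking, a guarded variant is sketched below. It assumes a repository where locked annexed files are symlinks (not direct mode or an adjusted branch); <code>unlock_if_locked</code> is a hypothetical helper name.</p>
<pre><code>#!/bin/bash
# .git/hooks/post-commit (sketch). Git runs hooks from the top of the
# work tree, so the relative path to metadata.db works here.
unlock_if_locked() {
  local f="$1"
  # A locked annexed file is a symlink; an unlocked one is a regular file.
  if [ -L "$f" ]; then
    git annex unlock "$f"
  fi
}
unlock_if_locked metadata.db
</code></pre>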
<h1>Day to day operation</h1>
<pre><code>$ git annex add .
</code></pre>
<p>Will put new files into the annex.</p>
<pre><code>$ git add .
</code></pre>
<p>Will take care of the files that should not go into the annex.</p>
<pre><code>$ git annex sync
</code></pre>
<p>Will make the repositories exchange information about all this, and
merge remote changes locally.</p>
<pre><code>$ git annex get .
</code></pre>
<p>Will make remote books locally available.</p>
<h2>Merge conflict</h2>
<p>You should not run calibre on the two computers simultaneously, or
without syncing first. If you do, you will get a conflict that
git-annex will automatically <em>solve</em> by renaming both of the files.</p>
<p>You can then either:</p>
<ul>
<li>Choose one. If no books have been changed or added on one of the
computers, using the other computer's <code>metadata.db</code> will not make you lose
any information.</li>
<li>Rebuild it. <code>calibredb restore_database</code> won't do it, but will tell
you how to do it.</li>
</ul>
<h2>Checking the library</h2>
<p>You can use <code>calibredb check_library</code> to check that your library is
correct. If you use git for it, it will always tell you that it is not
correct: there is an author ".git" it doesn't know about. You can safely
ignore that.</p>
<p>Maybe this can be solved by using <code>vcsh</code>, but apparently
<code>vcsh</code>+<code>git annex</code> is not well tested yet.</p>
<h2>Automatic stuff</h2>
<p>I use <code>mr</code> to automatically run all this, but some config could be
done (I believe) to have <code>git annex copy --auto</code> do what it should.</p>
<p>There is also the git annex assistant for this kind of automatic
synchronization of content, but I don't know whether my automatic
unlocking of one file would break it.</p>
<p>It might be interesting to find some way to unlock and lock the library
only when running calibre; a simple script to launch calibre would do
that. Note that each time you lock and unlock, you will get a
new commit in git.</p>
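<p>Such a launcher might look like the sketch below. It has not been tried against a real Calibre library: the function name is hypothetical, and whether the final <code>git annex add</code> leaves <code>metadata.db</code> locked or unlocked depends on your repository version.</p>
<pre><code>#!/bin/bash
# run_calibre_unlocked CMD [ARGS...]: run from the top of the library.
# Syncs first, unlocks the database while CMD runs, then re-adds the
# (possibly modified) database and syncs again. Returns CMD's exit status.
run_calibre_unlocked() {
  local status
  git annex sync || return 1               # pick up the other computer's changes first
  git annex unlock metadata.db || return 1
  "$@"                                     # e.g. calibre
  status=$?
  git annex add metadata.db                # stage the database changes
  git annex sync                           # commit and exchange with other repositories
  return "$status"
}
</code></pre>
<p>For example: <code>run_calibre_unlocked calibre</code></p>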
beware of SSD wear when doing fsck on large special remoteshttp://git-annex.branchable.com/tips/beware_of_SSD_wear_when_doing_fsck_on_large_special_remotes/2013-11-27T22:47:37Z2013-07-31T04:39:21Z
<p>When git annex does fsck on (for example) a GPG-encrypted directory special remote, it first transfers the whole file into the .git/annex/tmp directory.
If your annex is on an SSD, it's a good idea to make .git/annex/tmp a symlink to a directory that is not on the SSD (for example on a spinning disk or a tmpfs), so the SSD isn't worn down. /var/tmp only helps if it is not on the same SSD. This actually may be a better default.</p>
downloading podcastshttp://git-annex.branchable.com/tips/downloading_podcasts/2024-01-30T20:12:40Z2013-07-28T20:58:26Z
<p>You can use git-annex as a podcatcher, to download podcast contents.
No additional software is required, but your git-annex must be built
with the Feeds feature (run <code>git annex version</code> to check).</p>
<p>All you need to do is put something like this in a cron job:</p>
<p><code>cd somerepo && git annex importfeed http://url/to/podcast http://other/podcast/url</code></p>
<p>This downloads the urls, and parses them as RSS, Atom, or RDF feeds.
All enclosures are downloaded and added to the repository, the same as if you
had manually run <code>git annex addurl</code> on each of them.</p>
<p>git-annex will avoid downloading a file from a feed if its url has already
been stored in the repository before. So once a file is downloaded,
you can move it around, delete it, <code>git annex drop</code> its content, etc,
and it will not be downloaded again by repeated runs of
<code>git annex importfeed</code>. Just how a podcatcher should behave. (git-annex versions
since 2015 also track the podcast <code>guid</code> values, as metadata, to help avoid
duplication if the media file url changes; use <code>git annex metadata ...</code> to inspect them.)</p>
<h2>templates</h2>
<p>To control the filenames used for items downloaded from a feed,
there's a --template option. The default is
<code>--template='${feedtitle}/${itemtitle}${extension}'</code></p>
<p>Other available template variables:<br />
feedauthor, itemauthor, itemsummary, itemdescription, itemrights, itemid,
itempubdate, author, title.</p>
<h2>catching up</h2>
<p>To catch up on a feed without downloading its contents,
use <code>git annex importfeed --relaxed</code>, and delete the symlinks it creates.
Next time you run <code>git annex importfeed</code> it will only fetch new items.</p>
<h2>fast mode</h2>
<p>To add a feed without downloading its contents right now,
use <code>git annex importfeed --fast</code>. Then you can use <code>git annex get</code> as
usual to download the content of an item.</p>
<h2>storing the podcast list in git</h2>
<p>You can check the list of podcast urls into git right next to the
files it downloads. Just make a file named feeds and add one podcast url
per line.</p>
<p>Then you can run git-annex on all the feeds:</p>
<p><code>xargs git-annex importfeed < feeds</code></p>
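<p>If you would like the feeds file to hold comments and blank lines too, filter them out before handing the urls to git-annex. This is a minimal sketch; the comment syntax (lines starting with <code>#</code>) is an assumption of this example, not something git-annex itself understands, and <code>feed_urls</code> is a hypothetical name.</p>
<pre><code>#!/bin/bash
# feed_urls FILE: print the urls in FILE, skipping blank lines and lines
# whose first non-space character is '#'.
feed_urls() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' -- "$1"
}
</code></pre>
<p>Then <code>cd somerepo && feed_urls feeds | xargs git-annex importfeed</code> can go in the cron job, provided the script cron runs defines <code>feed_urls</code>.</p>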
<h2>recreating lost episodes</h2>
<p>If for some reason git-annex refuses to download files you are certain are in the podcast, it is quite possible that they were already downloaded at some point. Either way, you can use <code>--force</code> to redownload them:</p>
<p><code>git-annex importfeed --force http://example.com/feed</code></p>
<h2>distributed podcatching</h2>
<p>A nice benefit of using git-annex as a podcatcher is that you can
run <code>git annex importfeed</code> on the same url in different clones
of a repository, and <code>git annex sync</code> will sync it all up.</p>
<h2>centralized podcatching</h2>
<p>You can also have a designated machine which always fetches all podcasts
to local disk and stores them. That way, you can archive podcasts with
time-delayed deletion of upstream content. You can also work around slow
downloads upstream by podcatching to a server with ample bandwidth or work
around a slow local Internet connection by podcatching to your home server
and transferring to your laptop on demand.</p>
<h2>youtube channels</h2>
<p>You can also use <code>git annex importfeed</code> on youtube channels.
It will use yt-dlp to automatically download the videos.</p>
<p>You can either use <code>git-annex importfeed --scrape</code> with the url to the
channel, or you can find the RSS feed for the channel, and
<code>git-annex importfeed</code> that url (without <code>--scrape</code>).</p>
<p>Use of yt-dlp is disabled by default as it can be a security risk.
See the documentation of <code>annex.security.allowed-ip-addresses</code>
in <a href="http://git-annex.branchable.com/git-annex/">git-annex</a> for details.</p>
<h2>metadata</h2>
<p>As well as storing the urls for items imported from a feed, git-annex can
store additional <a href="http://git-annex.branchable.com/metadata/">metadata</a>, like the author, and itemdescription.
This can then be looked up later, used in <a href="http://git-annex.branchable.com/tips/metadata_driven_views/">metadata driven views</a>, etc.</p>
<p>To make all available metadata from the feed be stored:
<code>git config annex.genmetadata true</code></p>
yet another simple disk usage like utilityhttp://git-annex.branchable.com/tips/yet_another_simple_disk_usage_like_utility/2013-11-27T22:47:37Z2013-07-12T19:28:09Z
<p>Here's the annex-du script that I use:</p>
<pre><code>#!/bin/sh
git annex find "$@" --include '*' --format='${bytesize}\n' | awk '{ sum += $1; nfiles++; } END { printf "%d files, %.3f MB\n", nfiles, sum/1000000 }'
</code></pre>
<p>This one can be slow on a large number of files, but it has the advantage of being able to use all of the filtering available in <code>git annex find</code>.
For example, to figure out how much is stored in remote X, run:</p>
<pre><code>annex-du --in=X
</code></pre>
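<p>The summing step can also be kept separate from <code>git annex find</code>, which makes it easy to test on its own and to pick a unit automatically. A sketch, assuming decimal units (1 kB = 1000 bytes) like the script above; <code>summarize_bytes</code> is a hypothetical name.</p>
<pre><code>#!/bin/sh
# summarize_bytes: read one byte count per line on stdin and print the
# number of files and their total size in bytes, kB, MB or GB.
summarize_bytes() {
  awk '
    { sum += $1; nfiles++ }
    END {
      unit = "bytes"; div = 1
      if (sum >= 1e9)      { unit = "GB"; div = 1e9 }
      else if (sum >= 1e6) { unit = "MB"; div = 1e6 }
      else if (sum >= 1e3) { unit = "kB"; div = 1e3 }
      printf "%d files, %.3f %s\n", nfiles, sum / div, unit
    }'
}
</code></pre>
<p>Usage: <code>git annex find --in=X --include '*' --format='${bytesize}\n' | summarize_bytes</code>. The <code>--include '*'</code> matters because <code>git annex find</code> otherwise only lists files whose content is present locally.</p>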
Delay Assistant Startup on Loginhttp://git-annex.branchable.com/tips/Delay_Assistant_Startup_on_Login/2013-11-27T22:47:37Z2013-06-21T14:18:36Z
<h1>Problem</h1>
<p>I noticed that after installing git-annex assistant, my start up times greatly increased because the assistant does a startup scan while everything else is loading.</p>
<h1>Solution (for people using Gnome)</h1>
<p>The solution I came up with is to delay the assistant's startup, as well as to set its IO priority to idle. To do this in Gnome 3, run:</p>
<pre><code>gnome-session-properties
</code></pre>
<p>Find the "Git Annex Assistant" entry in the Startup Programs tab, then click edit. Change this (your location of git-annex may be different):</p>
<pre><code>/usr/local/bin/git-annex assistant --autostart
</code></pre>
<p>to this (replace <code>/usr/local/bin</code> with wherever git-annex is installed):</p>
<pre><code>bash -c "sleep 30; ionice -c3 /usr/local/bin/git-annex assistant --autostart"
</code></pre>
<p>The "sleep 30" command delays the startup of the assistant by 30 seconds, and "ionice -c3" sets git-annex's IO priority to "idle," the lowest level.</p>
owncloudannexhttp://git-annex.branchable.com/tips/owncloudannex/2020-06-17T01:18:32Z2013-06-01T08:42:56Z
<p>For using Owncloud and Nextcloud
as a special remote, there are currently three choices:</p>
<ul>
<li><p>Use git-annex's builtin <a href="http://git-annex.branchable.com/special_remotes/webdav/">webdav</a> support.</p></li>
<li><p>Alternatively, rclone supports them
so the <a href="http://git-annex.branchable.com/special_remotes/rclone/">rclone special remote</a> can be used.</p></li>
<li><p>Alternatively, there is a dedicated special remote,
<a href="https://github.com/TobiasTheViking/owncloudannex">https://github.com/TobiasTheViking/owncloudannex</a>
(Last updated 2014)</p></li>
</ul>
<p>At this time it's not clear which is better, so if you find one works
best, please comment below.</p>
skydriveannexhttp://git-annex.branchable.com/tips/skydriveannex/2014-01-11T13:25:48Z2013-05-27T21:26:21Z
<h1>skydriveannex 0.2.1</h1>
<p>Hook program for gitannex to use <a href="http://en.wikipedia.org/wiki/SkyDrive">skydrive</a> (previously <em>Windows Live SkyDrive</em> and <em>Windows Live Folders</em>) as backend</p>
<h1>Requirements:</h1>
<pre><code>python2
python-yaml
</code></pre>
<p>Credit for the Skydrive api interface goes to https://github.com/mk-fg/python-skydrive</p>
<h1>Install</h1>
<p>Clone the git repository in your home folder.</p>
<pre><code>git clone git://github.com/TobiasTheViking/skydriveannex.git
</code></pre>
<p>This should make a ~/skydriveannex folder</p>
<h1>Setup</h1>
<p>Make the file executable, and link it into PATH</p>
<pre><code>cd ~/skydriveannex; chmod +x git-annex-remote-skydrive; sudo ln -sf `pwd`/git-annex-remote-skydrive /usr/local/bin/git-annex-remote-skydrive
</code></pre>
<h1>Commands for gitannex:</h1>
<pre><code>git annex initremote skydrive type=external externaltype=skydrive encryption=shared folder=gitannex
</code></pre>
<p>An oauth authentication link should now be launched in the default browser. Authenticate, and use the last url as OAUTH key.</p>
<pre><code>OAUTH='URL after last redirect' git annex initremote skydrive type=external externaltype=skydrive encryption=shared folder=gitannex
git annex describe skydrive "the skydrive library"
</code></pre>
dropboxannexhttp://git-annex.branchable.com/tips/dropboxannex/2021-07-06T19:50:36Z2013-05-26T22:32:11Z
<p>For using Dropbox as a special remote, there are currently several choices:</p>
<ul>
<li>rclone supports the Dropbox API,
so the <a href="http://git-annex.branchable.com/special_remotes/rclone/">rclone special remote</a> can be used.<br />
(Actively maintained)</li>
<li>Alternatively, there are some dedicated special remotes:
<ul>
<li><a href="https://github.com/TobiasTheViking/dropboxannex">https://github.com/TobiasTheViking/dropboxannex</a><br />
(Last updated 2014)</li>
<li><a href="https://pypi.org/project/git-annex-remote-dbx/">https://pypi.org/project/git-annex-remote-dbx/</a></li>
</ul>
</li>
</ul>
<p>At this time it's not clear which is better, so if you find one works
best, please comment below.</p>
flickrannexhttp://git-annex.branchable.com/tips/flickrannex/2014-01-10T14:33:01Z2013-05-22T20:42:13Z
<h1>flickrannex</h1>
<p>Hook program for gitannex to use flickr as backend</p>
<h1>Requirements:</h1>
<pre><code>python2
</code></pre>
<p>Credit for the flickr api interface goes to: http://stuvel.eu/flickrapi
Credit for the png library goes to: https://github.com/drj11/pypng
Credit for the png tEXt patch goes to: https://code.google.com/p/pypng/issues/detail?id=65</p>
<h1>Install</h1>
<p>Clone the git repository in your home folder.</p>
<pre><code>git clone git://github.com/TobiasTheViking/flickrannex.git
</code></pre>
<p>This should make a ~/flickrannex folder</p>
<h1>Setup</h1>
<p>Make the file executable, and link it into PATH</p>
<pre><code>cd ~/flickrannex; chmod +x git-annex-remote-flickr; sudo ln -sf `pwd`/git-annex-remote-flickr /usr/local/bin/git-annex-remote-flickr
</code></pre>
<h1>Commands for gitannex:</h1>
<pre><code>USERNAME="username@provider.com" git annex initremote flickr type=external externaltype=flickr encryption=shared folder=gitannex
</code></pre>
<p>An oauth authentication link should now be launched in the default browser. The hook will wait for 30s for you to login and authenticate.</p>
<pre><code>git annex describe flickr "the flickr library"
</code></pre>
<h1>Notes</h1>
<h2>Unencrypted mode</h2>
<p>The photo name on flickr is currently the GPGHMACSHA1 version.</p>
<h2>Encrypted mode</h2>
<p>The current version base64 encodes all the data, which results in ~35% larger filesize.</p>
<h2>Including directories as tags</h2>
<p>This feature is currently disabled, if it gets implemented again it will most likely not require user action to enable it.</p>
<p>In this case the image:
/home/me/annex-photos/holidays/2013/Greenland/img001.jpg
would get the following tags: "holidays" "2013" "Greenland"
(assuming "/home/me/annex-photos" is the top level in the annex...)</p>
<p>Caveat Emptor - Tags will <em>always</em> be NULL for indirect repos - we don't (easily) know the human-readable file name.</p>
megaannexhttp://git-annex.branchable.com/tips/megaannex/2020-06-17T01:18:32Z2013-05-21T17:28:09Z
<p>For using <a href="https://mega.nz">Mega</a>
as a special remote, there are currently three choices:</p>
<ul>
<li>There is a dedicated special remote,
<a href="https://github.com/dxtr/megaannex-go">https://github.com/dxtr/megaannex-go</a><br />
Last updated 2016</li>
<li>Alternatively, rclone supports Mega,
so the <a href="http://git-annex.branchable.com/special_remotes/rclone/">rclone special remote</a> can be used.</li>
<li>Alternatively, there's an older dedicated special remote,
<a href="https://github.com/TobiasTheViking/megaannex">https://github.com/TobiasTheViking/megaannex</a><br />
Reported to be no longer working</li>
</ul>
<p>At this time it's not clear which is better, so if you find one works
best, please comment below.</p>
Using Git-annex as a web browsing assistanthttp://git-annex.branchable.com/tips/Using_Git-annex_as_a_web_browsing_assistant/2018-06-26T16:00:19Z2013-04-11T07:27:14Z
<p><a href="http://git-annex.branchable.com/todo/wishlist__58___an___34__assistant__34___for_web-browsing_--_tracking_the_sources_of_the_downloads/">wishlist: an "assistant" for web-browsing -- tracking the sources of the downloads</a> suggests using git-annex as a tool to store downloads tied
to their URLs. This also enables people to have their files stored offline,
while being able to git annex drop them at any time and redownload them
with git annex get. Additionally, a clone of the repo can be used to
download whatever files are desired from online.</p>
<p>This tip explains how to implement a similar system to the one described in
the linked wishlist with existing software and features of git-annex.</p>
<p>The first step is to install the Firefox plugin
<a href="http://flashgot.net/">FlashGot</a>. We will use it to provide the Firefox
shortcuts to add things to our annex.</p>
<p>Once we have installed all that, we need a script that has an interface
which FlashGot can treat as a downloader, but which calls git-annex to do
the actual downloading. Such a script is available from
<a href="https://gist.github.com/andyg0808/5342434">https://gist.github.com/andyg0808/5342434</a>. Download it and store it
somewhere it can live, or cut and paste:</p>
<div class="highlight-sh"><pre class="hl"><span class="hl slc">#!/bin/bash</span>
<span class="hl slc"># $1=folder to cd to (must be a git annex repo)</span>
<span class="hl slc"># $2=URL to download</span>
<span class="hl kwb">cd</span> <span class="hl str">"</span><span class="hl ipl">$1</span><span class="hl str">"</span> || exit 1
git-annex addurl <span class="hl str">"</span><span class="hl ipl">$2</span><span class="hl str">"</span>
</pre></div>
<p>Finally, we need to configure FlashGot to use the script as a downloader.
Go to Tools > Add-ons in Firefox. Click "Preferences" on FlashGot. Click
the Add button next to the list of download managers. Enter a name for the
git-annex downloader. Choose the script that was downloaded from the
"Locate executable file" dialog that appears. Now set the command line
arguments template to be "[FOLDER] [URL]" (you can find more substitution
expressions in the Placeholders dropdown above the Command line arguments
template field). You're done!</p>
<p>Go ahead and test it by trying to download a file using FlashGot. It should
offer as one of its available download managers the new manager you created
just above. Select it and have fun!</p>
replacing Sparkleshare or dvcs-autosync with the assistanthttp://git-annex.branchable.com/tips/replacing_Sparkleshare_or_dvcs-autosync_with_the_assistant/2016-02-02T20:51:22Z2013-03-29T21:06:36Z
<p>Sparkleshare and dvcs-autosync are tools to automatically commit your
changes to git and keep them in sync with other repositories. Unlike
git-annex, they don't store the file content on the side, but directly in
the git repository. Great for small files, less good for big files.</p>
<p>Here's how to use the <a href="http://git-annex.branchable.com/assistant/">git-annex assistant</a> to do the same
thing, but even better!</p>
<hr />
<p>Let's suppose you're developing a video game, written in C. You have
source code, and some large game assets. You want to ensure the source
code is stored in git -- that's what git's for! And you want to store
the game assets in the git annex -- to avoid bloating your git repos with
possibly enormous files, but still version control them.</p>
<p>All you need to do is configure git-annex to treat your C files
as small files. And treat any file larger than, say, 100kb as a large
file that is stored in the annex.</p>
<pre><code>git config annex.largefiles "largerthan=100kb and not (include=*.c or include=*.h)"
</code></pre>
<p>For more details about this configuration, see <a href="http://git-annex.branchable.com/tips/largefiles/">largefiles</a>.</p>
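<p>To get a rough preview of which files that expression would send to the annex, <code>find</code> can approximate it. This is only an approximation written for this example: it assumes <code>kb</code> means 1000 bytes and matches the <code>*.c</code>/<code>*.h</code> patterns against file names only, so treat git-annex itself as authoritative. The function name is hypothetical.</p>
<pre><code>#!/bin/bash
# preview_largefiles [DIR]: list files under DIR (default .) bigger than
# 100000 bytes that are not *.c or *.h files, skipping .git directories.
preview_largefiles() {
  find "${1:-.}" -name .git -prune -o \
    -type f -size +100000c ! -name '*.c' ! -name '*.h' -print
}
</code></pre>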
<hr />
<p>Now if you run <code>git annex add</code>, it will only add the large files to the
annex; small files will be stored in git.</p>
<p>Or, run <code>git annex assistant</code>. It will <em>automatically</em>
add the large files to the annex, and store the small files in git.
It'll notice every time you modify a file, and immediately commit it,
too. And sync it out to other repositories you configure using <code>git annex
webapp</code>.</p>
<hr />
<p>It's also possible to disable the use of the annex entirely, and just
have the assistant <em>always</em> put every file into git, no matter its size:</p>
<pre><code>git config annex.largefiles "exclude=*"
</code></pre>
Building git-annex on Debian OR %¤#"¤%&"# Haskell!http://git-annex.branchable.com/tips/Building_git-annex_on_Debian_OR___37____164____35____34____164____37____38____34____35___Haskell__33__/2013-11-27T22:47:37Z2013-03-13T00:58:31Z
<p>I've been wrestling with git-annex to try to make it build on Debian, or more specifically, wrestling with Haskell dependencies.</p>
<p>After a fair amount of futzing around, and pestering a bunch of people in the process (thanks for the help! <img src="http://git-annex.branchable.com/smileys/smile.png" alt=":)" /> ) I finally managed to make it build.</p>
<p>I figured I would post the steps here, since it's not completely trivial, and I expect that a few others might be interested in building newer versions as well.</p>
<p>There currently appear to be two methods:</p>
<ul>
<li>Debian packages on Wheezy plus Sid
<ul>
<li>Starting out on Wheezy, and then picking the rest from Sid (it seems at least libghc-safesemaphore-dev from Sid is critical for newer git-annex)</li>
<li>WebDAV support will not be available with this method</li>
</ul>
</li>
<li>Cabal packages</li>
</ul>
<h1>Debian packages on Wheezy plus Sid</h1>
<h2>Start off with a clean wheezy chroot</h2>
<pre><code>sudo debootstrap wheezy debian-wheezy
sudo chroot debian-wheezy
</code></pre>
<h2>Install some build tools</h2>
<pre><code>apt-get update
apt-get install devscripts git
</code></pre>
<h2>Get git-annex (either by cloning or simply moving the source into the chroot)</h2>
<pre><code>mkdir /src
cd /src
git clone git://git-annex.branchable.com/source.git git-annex
cd git-annex
</code></pre>
<h2>Remove WebDAV dependency which can't be satisfied anywhere</h2>
<pre><code>sed '/libghc-dav-dev/d' -i debian/control
</code></pre>
<h2>Create dummy build-depends package and install all available Wheezy dependencies using it</h2>
<pre><code>mk-build-deps
dpkg -i git-annex-build-deps*.deb
apt-get install -f
</code></pre>
<p>(this will remove the build-depends package)</p>
<h2>Add Sid sources and install all available Sid dependencies</h2>
<pre><code>echo "deb http://http.debian.net/debian sid main" >>/etc/apt/sources.list
apt-get update
dpkg -i git-annex-build-deps*.deb
apt-get install -f
</code></pre>
<p>(the build-depends package should now be fully installed)</p>
<h2>Disable the 'make test' that fails due to missing hothasktags</h2>
<pre><code>echo >>debian/rules
echo "override_dh_auto_test:" >>debian/rules
</code></pre>
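<p>The two <code>echo</code> commands append a blank line and an empty <code>override_dh_auto_test</code> target, so dh runs nothing for the test step. If you rerun the recipe, they append another copy each time. Below is a minimal idempotent sketch, assuming a POSIX shell. The function name <code>disable_dh_auto_test</code> is only illustrative.</p>

```shell
#!/bin/sh
# disable_dh_auto_test [FILE]
# Append an empty override_dh_auto_test target to FILE (default:
# debian/rules) unless one is already present. An override target
# with no recipe makes dh skip its dh_auto_test step.
disable_dh_auto_test() {
    rules=${1:-debian/rules}
    if ! grep -q '^override_dh_auto_test:' "$rules"; then
        printf '\noverride_dh_auto_test:\n' >> "$rules"
    fi
}
```

<p>Run <code>disable_dh_auto_test debian/rules</code> from the top of the source tree. It is safe to run as often as you like.</p>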
<h2>Build!</h2>
<pre><code>debuild -us -uc -Igit
</code></pre>
<h1>Cabal packages</h1>
<h2>Start off with a clean Sid(/Wheezy) chroot</h2>
<pre><code>sudo debootstrap sid debian-sid
sudo chroot debian-sid
</code></pre>
<h2>Install a smaller set of tools and build-depends from Debian (cabal needs these to compile the Haskell stuff)</h2>
<pre><code>apt-get update
apt-get install ghc cabal-install devscripts libz-dev pkg-config c2hs libgsasl7-dev libxml2-dev libgnutls-dev git debhelper ikiwiki perlmagick uuid rsync openssh-client fakeroot
</code></pre>
<h2>Get git-annex (either by cloning or simply moving the source into the chroot)</h2>
<pre><code>mkdir /src
cd /src
git clone git://git-annex.branchable.com/source.git git-annex
cd git-annex
</code></pre>
<h2>Install the Haskell build-dependencies from cabal</h2>
<pre><code>cabal update
cabal install --only-dependencies
</code></pre>
<h2>Optional step which doesn't work (might in the future)</h2>
<p>If we want to run the 'make test' after build we need hothasktags, which is only available via cabal</p>
<pre><code>apt-get install happy
cabal install hothasktags
export PATH=$PATH:~/.cabal/bin
</code></pre>
<p>But this currently fails silently inside make test->fast->tags, and if you dig a bit (manually edit the makefile to be more verbose) you see</p>
<pre><code>hothasktags: ./Command/AddUnused.hs: hGetContents: invalid argument (invalid byte sequence)
</code></pre>
<h2>Disable the 'make test' that fails</h2>
<pre><code>echo >>debian/rules
echo "override_dh_auto_test:" >>debian/rules
</code></pre>
<h2>Remove all Debian package haskell depends (taken care of by cabal instead)</h2>
<pre><code>sed '/\tlibghc/d' -i debian/control
</code></pre>
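<p>This <code>sed</code> expression deletes, in place, every line of <code>debian/control</code> that contains a tab followed by <code>libghc</code>. Those are the tab-indented Haskell library entries. To check what will go before you edit the file, use the same address with <code>-n</code> and <code>p</code> instead of <code>d</code>. That prints exactly the lines that would be deleted. A sketch, assuming GNU sed (which understands <code>\t</code>), as found on Debian:</p>

```shell
#!/bin/sh
# preview_haskell_deps [FILE]
# Print the lines that  sed '/\tlibghc/d' -i FILE  would delete,
# without modifying FILE (default: debian/control). Needs GNU sed for \t.
preview_haskell_deps() {
    sed -n '/\tlibghc/p' "${1:-debian/control}"
}
```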
<h2>Build!</h2>
<pre><code>debuild -us -uc -Igit
</code></pre>
using Google Cloud Storagehttp://git-annex.branchable.com/tips/using_Google_Cloud_Storage/2020-06-17T01:18:32Z2013-01-25T23:10:35Z
<p>For using <a href="https://cloud.google.com/products/cloud-storage">Google Cloud Storage</a>
as a special remote, there are currently three choices:</p>
<ul>
<li><p>Google Cloud Storage supports the same API as Amazon S3, so
git-annex's built-in <a href="http://git-annex.branchable.com/special_remotes/S3/">S3 special remote</a> can be used
with it. You may need to configure Google Cloud Storage to allow
"Interoperable Access". Here is how to set up the special remote:</p>
<pre><code>git annex initremote cloud type=S3 encryption=none host=storage.googleapis.com port=80
</code></pre></li>
<li><p>Alternatively, rclone supports Google Cloud Storage's native API,
so the <a href="http://git-annex.branchable.com/special_remotes/rclone/">rclone special remote</a> can be used.</p></li>
<li><p>Alternatively, there is a dedicated special remote,
<a href="https://github.com/bgilbert/gcsannex">https://github.com/bgilbert/gcsannex</a><br />
(Last updated 2016)</p></li>
</ul>
<p>At this time it's not clear which is better, so if you find one works
best, please comment below.</p>
How to retroactively annex a file already in a git repohttp://git-annex.branchable.com/tips/How_to_retroactively_annex_a_file_already_in_a_git_repo/2013-11-27T22:47:37Z2012-12-17T16:43:34Z
<p>I worked out how to retroactively annex a large file that had been checked into a git repo some time ago. I thought this might be useful for others, so I am posting it here.</p>
<p>Suppose you have a git repo where somebody checked in a large file that you would like to have annexed, but there are a bunch of commits after it. You don't want to lose history, but you also don't want everybody to have to retrieve the large file when they clone the repo. This rewrites history as if the file had been annexed when it was originally added.</p>
<p>This command works for me. It relies on git's current behavior of extracting the tree into a directory named <code>.git-rewrite/t/</code> at the top of the git tree. It will not be fast, and it rewrites history, so be sure that everybody who has a copy of your repo is OK with accepting the new history. If git's behavior changes, you can specify the directory to use with the <code>-d</code> option. Currently, the <code>t/</code> directory is created inside the directory you specify, so <code>-d ./.git-rewrite/</code> should be roughly equivalent to the default.</p>
<p>Enough with the explanation, on to the command:</p>
<pre>
git filter-branch --tree-filter 'for FILE in file1 file2 file3;do if [ -f "$FILE" ] && [ ! -L "$FILE" ];then git rm --cached "$FILE";git annex add "$FILE";ln -sf `readlink "$FILE"|sed -e "s:^../../::"` "$FILE";fi;done' --tag-name-filter cat -- --all
</pre>
<p>Replace <code>file1 file2 file3</code> with whatever paths you want retroactively annexed. For example, to retroactively annex bigfile1.bin in the top directory and subdir1/bigfile2.bin, try:</p>
<pre>
git filter-branch --tree-filter 'for FILE in bigfile1.bin subdir1/bigfile2.bin;do if [ -f "$FILE" ] && [ ! -L "$FILE" ];then git rm --cached "$FILE";git annex add "$FILE";ln -sf `readlink "$FILE"|sed -e "s:^../../::"` "$FILE";fi;done' --tag-name-filter cat -- --all
</pre>
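<p>Inside the tree filter, <code>git annex add</code> runs in the temporary checkout under <code>.git-rewrite/t/</code>, two directories below the top of the real repository. The symlink it creates therefore carries two extra <code>../</code> components, which is what the command above assumes. The <code>readlink | sed</code> step strips them, so the committed link resolves from the file's real location. The same transformation is pulled out below as a function, so you can check it before starting the slow rewrite. The name <code>fix_rewrite_link</code> is only illustrative.</p>

```shell
#!/bin/sh
# fix_rewrite_link TARGET
# Drop the leading "../../" that appears because the file was annexed
# inside .git-rewrite/t/, two levels below the repository top.
# Same sed expression the tree filter above uses.
fix_rewrite_link() {
    printf '%s\n' "$1" | sed -e 's:^../../::'
}
```

<p>For a file in <code>subdir1/</code>, the link starts with <code>../../../</code>. After stripping, <code>../.git/annex/...</code> remains, which is correct relative to <code>subdir1/</code>.</p>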
<p><strong>If your repo has tags</strong> then you should take a look at the git-filter-branch man page about the --tag-name-filter option and decide what you want to do. By default this will re-write the tags "nearly properly".</p>
<p>You'll probably also want to look at the git-filter-branch man page's section titled "CHECKLIST FOR SHRINKING A REPOSITORY" if you want to free up the space in the existing repo that you just changed history on.</p>
Decentralized repository behind a Firewallhttp://git-annex.branchable.com/tips/Decentralized_repository_behind_a_Firewall/2013-11-27T22:47:37Z2012-11-30T14:38:42Z
<p>If you're anything like me¹, you have a copy of your annex on a computer running at home², set up so you can access it from anywhere like this:</p>
<pre><code>ssh myhome.no-ip.org
</code></pre>
<p>This is totally great! Except, there is no way for your home computer to pull your changes, because there is no <em>on-the-go.no-ip.org</em>. You can get clunky and use a <em>bare git repository and git push</em>, but there is a better way.</p>
<p>First, install <em>openssh-server</em> on your <em>on-the-go</em> computer</p>
<pre><code>sudo apt-get install openssh-server # Adjust to your flavor of unix
</code></pre>
<p>Then, log into your <em>home</em> computer, with <em>port forwarding</em>:</p>
<pre><code>ssh me@myhome.no-ip.org -R 2201:localhost:22
</code></pre>
<p>Your <em>home</em> computer can now ssh into your <em>on-the-go</em> computer, as long as you keep the above shell running.</p>
<p>You can now add your <em>on-the-go</em> computer as a remote on your <em>home</em> computer. If you like, run these commands in the port-forwarding shell you just opened with the command above.</p>
<pre><code>ssh-keygen -t rsa
ssh-copy-id -p 2201 me@localhost
cd ~/annex
git remote add on-the-go ssh://me@localhost:2201/home/myuser/annex
</code></pre>
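<p>Instead of spelling out the port in every remote URL, you could give the tunnel an alias in <code>~/.ssh/config</code> on the home computer, the way the tip about using git-annex with no fixed hostname does. The sketch below assumes the alias <code>on-the-go</code>, the user <code>me</code> and port 2201. All three are placeholders.</p>

```shell
#!/bin/sh
# tunnel_host_stanza ALIAS PORT USER
# Print an ssh_config Host block that reaches a machine through a
# reverse tunnel listening on localhost:PORT.
tunnel_host_stanza() {
    printf 'Host %s\n' "$1"
    printf '    HostName localhost\n'
    printf '    Port %s\n' "$2"
    printf '    User %s\n' "$3"
}
```

<p>Append the stanza with <code>tunnel_host_stanza on-the-go 2201 me &gt;&gt; ~/.ssh/config</code>. The remote can then be added as <code>ssh://on-the-go/home/myuser/annex</code>. For a second computer, add another stanza with port 2202.</p>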
<p>Now you can run normal annex operations, as long as the port forwarding shell is running³.</p>
<pre><code>git annex sync
git annex get --from on-the-go some/big/file
git annex info
</code></pre>
<p>You can add more computers by repeating with a different port, e.g. 2202 or 2203 (or any other).</p>
<p>If you're security paranoid (like me), read on. If you're not, that's it! Thanks for reading!</p>
<hr />
<p>Paranoid Area</p>
<p>Note that you're granting your home computer passwordless access to your on-the-go computer. I believe that's all right, as long as:</p>
<ul>
<li>Your home computer is really in your home, and not at a friend's house or some datacenter</li>
<li>Your home computer can be accessed only by ssh, and not HTTP or Samba or NTP or (shoot me now!) FTP</li>
<li>Only you (and perhaps trustworthy family) have access to your home computer</li>
<li>You have reasonably strong passwords or key-only logins on both your home and on-the-go computers.</li>
<li>You regularly install security updates on both computers (sudo apt-get update && sudo apt-get upgrade)</li>
</ul>
<p>In any case, the setup is much, much, much more secure than Dropbox. With Dropbox, you have exactly the same setup, but:</p>
<ul>
<li>Your data is stored in some datacenter. It's supposed to be encrypted. It might not be.</li>
<li>Lots of people have routine access to your files, and plausible reason to. Bored employees might regularly be doing some 'maintenance work' involving your pictures.</li>
<li>The dropbox software can do anything it likes on your computer, and it's closed source so you don't know if it does. A disgruntled employee could put a trojan into it.</li>
<li>Dropbox might have a backdoor for employee access to any file on your computer. This might be done with the best of intentions, but a mal-intentioned or careless employee might still erase things or send sensitive files from your computer by email.</li>
<li>A truly huge number of eyes connected to incredibly smart brains have looked at openssh and found it secure. Everybody trusts openssh. With dropbox, there is, well, dropbox. Whoever that is.</li>
</ul>
<hr />
<p>¹ Me=Carlo, not Joey. I'm pretty sure doing what I wrote here is a good idea, but in case it turns out to be catastrophically dumb, it's my fault, not his.</p>
<p>² My always-on computer at home is a raspberry pi with a 32GB USB stick. Best self-hosted dropbox you could imagine.</p>
<p>³ You can forward the port without opening a shell by adding the -N option (<code>ssh -N me@myhome.no-ip.org -R 2201:localhost:22</code>). This could be useful for connecting on startup, e.g. in /etc/rc.local. I prefer to open the shell to forward the ports, maybe use it, and close it to stop the forwarding.</p>
using Amazon Glacierhttp://git-annex.branchable.com/tips/using_Amazon_Glacier/2021-03-02T22:09:45Z2012-11-20T20:43:58Z
<p>Amazon Glacier provides low-cost storage, well suited for archiving and
backup. But it takes around 4 hours to get content out of Glacier.</p>
<p>Recent versions of git-annex support Glacier. To use it, you need to have
<a href="http://github.com/basak/glacier-cli">glacier-cli</a> installed.</p>
<p>First, export your Amazon AWS credentials:</p>
<pre><code># export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
# export AWS_SECRET_ACCESS_KEY="s3kr1t"
</code></pre>
<p>Now, create a gpg key, if you don't already have one. This will be used
to encrypt everything stored in Glacier, for your privacy. Once you have
a gpg key, run <code>gpg --list-secret-keys</code> to look up its key id, something
like "2512E3C7"</p>
<p>Next, create the Glacier remote.</p>
<pre><code># git annex initremote glacier type=glacier keyid=2512E3C7
initremote glacier (encryption setup with gpg key C910D9222512E3C7) (gpg) ok
</code></pre>
<p>The configuration for the Glacier remote is stored in git. So to make another
repository use the same Glacier remote is easy:</p>
<pre><code># cd /media/usb/annex
# git pull laptop
# git annex enableremote glacier
initremote glacier (gpg) ok
</code></pre>
<p>Now the remote can be used like any other remote.</p>
<pre><code># git annex copy my_cool_big_file --to glacier
copy my_cool_big_file (gpg) (checking glacier...) (to glacier...) ok
</code></pre>
<p>But, when you try to get a file out of Glacier, it'll queue a retrieval
job:</p>
<pre><code># git annex get my_cool_big_file
get my_cool_big_file (from glacier...) (gpg)
glacier: queued retrieval job for archive 'GPGHMACSHA1--862afd4e67e3946587a9ef7fa5beb4e8f1aeb6b8'
Recommend you wait up to 4 hours, and then run this command again.
failed
</code></pre>
<p>Like it says, you'll need to run the command again later. Let's remember to
do that:</p>
<pre><code># at now + 4 hours
at> git annex get my_cool_big_file
</code></pre>
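<p>If you would rather keep a shell waiting than use <code>at</code>, a small retry loop does the same job. This is a sketch, not something git-annex provides, and the function name <code>retry_until_ok</code> is made up. It assumes that <code>git annex get</code> exits non-zero while the retrieval job is still pending, as the <code>failed</code> above suggests.</p>

```shell
#!/bin/sh
# retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it exits 0, sleeping DELAY_SECONDS between
# attempts and giving up after MAX_TRIES attempts.
# Returns 0 on success, otherwise COMMAND's last exit status.
retry_until_ok() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while :; do
        "$@" && return 0
        status=$?
        if [ "$try" -ge "$max_tries" ]; then
            return "$status"
        fi
        try=$((try + 1))
        sleep "$delay"
    done
}
```

<p>For example, <code>retry_until_ok 5 3600 git annex get my_cool_big_file</code> makes up to five attempts, an hour apart, starting immediately.</p>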
<p>Another oddity of Glacier is that git-annex is never entirely sure
if a file is still in Glacier. Glacier inventories take hours to retrieve,
and even when retrieved do not necessarily represent the current state.</p>
<p>So, git-annex plays it safe, and avoids trusting the inventory:</p>
<pre><code># git annex copy important_file --to glacier
copy important_file (gpg) (checking glacier...) (to glacier...) ok
# git annex drop important_file
drop important_file (gpg) (checking glacier...)
Glacier's inventory says it has a copy.
However, the inventory could be out of date, if it was recently removed.
(unsafe)
Could only verify the existence of 0 out of 1 necessary copies
</code></pre>
<p>To avoid this problem, you can either use <code>git annex move</code> to move
content to Glacier, or you can set the remote to be <a href="http://git-annex.branchable.com/trust/">trusted</a> (<code>git annex trust glacier</code>).</p>
<p>A final potential gotcha with Glacier is that glacier-cli keeps a local
mapping of file names to Glacier archives. If this cache is lost, or
you want to retrieve files on a different box than the one that put them in
glacier, you'll need to use <code>glacier vault sync</code> to rebuild this cache.</p>
<p>See <a href="http://git-annex.branchable.com/special_remotes/glacier/">Glacier</a> for details.</p>
setup a public repository on a web sitehttp://git-annex.branchable.com/tips/setup_a_public_repository_on_a_web_site/2022-09-13T19:09:12Z2012-09-27T22:39:45Z
<p>Let's say you want to distribute some big files to the whole world.
You can of course, just drop them onto a website. But perhaps you'd like to
use git-annex to manage those files. And as an added bonus, why not let
anyone in the world clone your site and use <code>git-annex get</code>!</p>
<p>My site like this is <a href="https://downloads.kitenet.net">downloads.kitenet.net</a>.
Here's how I set it up. --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p>
<ol>
<li>Set up a web site. I used Apache, and configured it to follow symlinks.
<code>Options FollowSymLinks</code></li>
<li>Put some files on the website. Make sure it works.</li>
<li><code>git init; git annex init</code></li>
<li><code>git config core.sharedrepository world</code> (Makes sure files
are always added with permissions that allow everyone to read them.)</li>
<li><code>git config receive.denyCurrentBranch updateInstead</code> (Makes the
<a href="http://git-annex.branchable.com/tips/making_a_remote_repo_update_when_changes_are_pushed_to_it/">working tree update when changes are pushed to it</a>.)</li>
<li>We want users to be able to clone the git repository over http, because
git-annex can download files from it over http as well. For this to
work, <code>git update-server-info</code> needs to get run on the server after
commits or pushes to it. The git <code>post-update</code> hook will take care of
this, you just need to enable the hook on the server.
<code>mv .git/hooks/post-update.sample .git/hooks/post-update</code></li>
<li><code>git annex add; git commit -m added</code></li>
<li>Make sure users can still download files from the site directly.</li>
<li>Instruct advanced users to clone a http url that ends with the "/.git/"
directory. For example, for downloads.kitenet.net, the clone url
is <code>https://downloads.kitenet.net/.git/</code></li>
</ol>
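<p>The <code>core.sharedrepository world</code> setting is what keeps newly added content readable by the web server. To double-check it, the sketch below lists regular files in the repository that lack the world-read bit, including annexed content under <code>.git/annex/objects</code>. It assumes Apache reads the files as an unrelated user and so depends on that bit. Directories also need the world-execute bit, and this does not check for it.</p>

```shell
#!/bin/sh
# unreadable_files [DIR]
# List regular files under DIR (default: current directory) whose
# "other" read bit (0004) is not set. Symlinks are not followed, so
# annexed content is checked where it lives, in .git/annex/objects.
unreadable_files() {
    find "${1:-.}" -type f ! -perm -004 -print
}
```

<p>Run it from the top of the repository on the server. Empty output means every file is world-readable.</p>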
<p>When users clone over http, and run git-annex, it will
automatically learn all about your repository and be able to download files
right out of it, also using http.</p>
emacs integrationhttp://git-annex.branchable.com/tips/emacs_integration/2016-08-31T18:43:40Z2012-08-31T02:07:31Z
<p>bergey has developed an emacs mode for browsing git-annex repositories,
dired style.</p>
<p><a href="https://gitorious.org/emacs-contrib/annex-mode">https://gitorious.org/emacs-contrib/annex-mode</a></p>
<p>Locally available files are colored differently, and pressing g runs
<code>git annex get</code> on the file at point.</p>
<hr />
<p>John Wiegley has developed a brand new git-annex interaction mode for
Emacs, which aims to integrate with the standard facilities
(C-x C-q, M-x dired, etc) rather than invent its own interface.</p>
<p><a href="https://github.com/jwiegley/git-annex-el">https://github.com/jwiegley/git-annex-el</a></p>
<p>He has also added support to org-attach; if
<code>org-attach-git-annex-cutoff</code> is non-nil and smaller than the size
of the file you're attaching then org-attach will <code>git annex add</code> the
file; otherwise it will <code>git add</code> it.</p>
<hr />
<p><a href="https://github.com/magit/magit-annex">magit-annex</a> adds git annex
operations to Magit.</p>
using box.com as a special remotehttp://git-annex.branchable.com/tips/using_box.com_as_a_special_remote/2019-02-04T14:40:00Z2012-03-04T14:49:28Z
<p><a href="http://box.com/">Box.com</a> is a file storage service.</p>
<p><strong> WebDAV access to box.com will be deprecated at some point in the near future (originally was scheduled to be January 31, 2019 - but it has been pushed back to a yet to be defined date). At that point, the method described on this page will no longer work. See <a href="https://community.box.com/t5/Box-Product-News/Deprecation-WebDAV-Support/ba-p/55684">this announcement</a> for further details. </strong></p>
<p>git-annex can use Box as a <a href="http://git-annex.branchable.com/special_remotes/">special remote</a>.
Recent versions of git-annex make this very easy to set up
and use.</p>
<h2>git-annex setup</h2>
<p>Create the special remote, in your git-annex repository.
<strong> This example is non-encrypted; fill in your gpg key ID for a securely
encrypted special remote! </strong></p>
<pre><code>WEBDAV_USERNAME=you@example.com WEBDAV_PASSWORD=xxxxxxx git annex initremote box.com type=webdav url=https://dav.box.com/dav/git-annex chunk=50mb encryption=none
</code></pre>
<p>Note the use of <a href="http://git-annex.branchable.com/chunking/">chunking</a>. Box has a limit on the maximum size of file
that can be stored there (currently 256 MB). git-annex can break up large
files into chunks to avoid the size limit. This needs git-annex version
3.20120303 or newer, which adds support for chunking.</p>
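<p>To estimate how many pieces a file will be split into, divide the file size by the chunk size and round up. A small arithmetic sketch follows. It assumes <code>chunk=50mb</code> means 50,000,000 bytes; check the <a href="http://git-annex.branchable.com/chunking/">chunking</a> page for exactly how git-annex interprets size units.</p>

```shell
#!/bin/sh
# chunks_needed SIZE_BYTES CHUNK_BYTES
# Ceiling division: how many chunks of CHUNK_BYTES a file of
# SIZE_BYTES needs.
chunks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```

<p>For example, <code>chunks_needed 700000000 50000000</code> gives 14. Each chunk stays under Box's per-file limit as long as the chunk size does.</p>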
<p>Now git-annex can copy files to box.com, get files from it, etc, just like
with any other special remote.</p>
<pre><code>% git annex copy bigfile --to box.com
bigfile (to box.com...) ok
% git annex drop bigfile
bigfile (checking box.com...) ok
% git annex get bigfile
bigfile (from box.com...) ok
</code></pre>
<h2>exporting trees</h2>
<p>By default, files stored in Box will show up there named
by their git-annex key, not the original filename. If the filenames
are important, you can run <code>git annex initremote</code> with an additional
parameter "exporttree=yes", and then use <a href="http://git-annex.branchable.com/git-annex-export/">git-annex-export</a> to publish
a tree of files to Box.</p>
<p>Note that chunking can't be used when exporting a tree of files,
so Box's 250 mb limit will prevent exporting larger files.</p>
<h1>old davfs2 method</h1>
<p>This method is deprecated, but still documented here just in case.
Note that the files stored using this method cannot reliably be retrieved
using the webdav special remote.</p>
<h2>davfs2 setup</h2>
<ul>
<li>First, install
the <a href="http://savannah.nongnu.org/projects/davfs2">davfs2</a> program,
which can mount Box using WebDAV. On Debian, just <code>sudo apt-get install davfs2</code></li>
<li>Allow users to mount davfs filesystems, by ensuring that
<code>/sbin/mount.davfs</code> is setuid root. On Debian, just <code>sudo dpkg-reconfigure davfs2</code></li>
<li><p>Add yourself to the davfs2 group.</p>
<pre><code> sudo adduser $(whoami) davfs2
</code></pre></li>
<li><p>Edit <code>/etc/fstab</code>, and add a line to mount Box using davfs.</p>
<pre><code> sudo mkdir -p /media/box.com
echo "https://dav.box.com/dav/ /media/box.com davfs noauto,user 0 0" | sudo tee -a /etc/fstab
</code></pre></li>
<li><p>Create <code>~/.davfs2/davfs2.conf</code> with some important settings:</p>
<pre><code> mkdir ~/.davfs2/
echo use_locks 0 > ~/.davfs2/davfs2.conf
echo cache_size 1 >> ~/.davfs2/davfs2.conf
echo delay_upload 0 >> ~/.davfs2/davfs2.conf
</code></pre></li>
<li><p>Create <code>~/.davfs2/secrets</code>. This file contains your Box.com login and password.
Your login is probably the email address you signed up with.</p>
<pre><code> echo "/media/box.com id@joeyh.name mypassword" > ~/.davfs2/secrets
chmod 600 ~/.davfs2/secrets
</code></pre></li>
<li><p>Now you should be able to mount Box, as a non-root user:</p>
<pre><code> mount /media/box.com
</code></pre></li>
</ul>
using assume-unchanged to speed up git with large trees of annexed fileshttp://git-annex.branchable.com/tips/assume-unstaged/2013-11-27T22:47:37Z2012-02-03T20:57:07Z
<p>Git update-index's assume-unchanged feature can be used to speed
up <code>git status</code> and similar commands, by not statting the whole tree looking for changed
files.</p>
<p>This feature works quite well with git-annex. Annexed files are
immutable, so they aren't going to change out from under git, which makes
this a nice fit. If you have a very large tree and <code>git status</code> is
annoyingly slow, you can turn it on:</p>
<pre><code>git config core.ignoreStat true
</code></pre>
<p>When <code>git mv</code> and <code>git rm</code> are used, those changes <em>do</em> get noticed, even
on assume-unchanged files. When new files are added, eg by <code>git annex add</code>,
they are also noticed.</p>
<p>There are two gotchas. Both occur because <code>git add</code> does not stage
assume-unchanged files.</p>
<ol>
<li>When an annexed file is moved to a different directory, it updates
the symlink, and runs <code>git add</code> on it. So the file will move,
but the changed symlink will not be noticed by git and it will commit a
dangling symlink.</li>
<li>When using <code>git annex migrate</code>, it changes the symlink and runs <code>git add</code>
on it. Again, this change won't be committed.</li>
</ol>
<p>These can be worked around by running <code>git update-index --really-refresh</code>
after performing such operations. I hope that <code>git add</code> will be changed
to stage changes to assume-unchanged files, which would remove this
complication. --<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p>
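<p>To see which files currently carry the assume-unchanged bit, look at <code>git ls-files -v</code>, which marks such entries with a lowercase status letter. The sketch below filters that output. The function name is only illustrative.</p>

```shell
#!/bin/sh
# assume_unchanged_paths
# Read `git ls-files -v` output on stdin and print the paths whose
# one-letter tag is lowercase, i.e. entries marked assume-unchanged.
assume_unchanged_paths() {
    sed -n 's/^[[:lower:]] //p'
}
```

<p>Usage: <code>git ls-files -v | assume_unchanged_paths</code></p>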
visualizing repositories with gourcehttp://git-annex.branchable.com/tips/visualizing_repositories_with_gource/2013-11-27T22:47:37Z2012-01-07T22:13:12Z
<p><a href="http://code.google.com/p/gource/">Gource</a> is an amazing animated
visualisation of a git repository.</p>
<p>Normally, gource shows files being added, removed, and changed in
the repository, and the user(s) making the changes. Of course it can be
used in this way in a repository using git-annex too; just run <code>gource</code>.</p>
<p>The other way to use gource with git-annex is to visualise the movement of
annexed file contents between repositories. In this view, the "users" are
repositories, and they move around the file contents that are being added
or removed from them with git-annex.</p>
<p><a href="http://git-annex.branchable.com/tips/visualizing_repositories_with_gource/screenshot.jpg"><img src="http://git-annex.branchable.com/tips/visualizing_repositories_with_gource/screenshot.jpg" width="1024" height="600" class="img" /></a></p>
<p>To use gource this way, first go into the directory you want to visualize,
and use <code>git annex log</code> to make an input file for <code>gource</code>:</p>
<pre><code>git annex log --gource | tee gource.log
sort gource.log | gource --log-format custom -
</code></pre>
<p>The <code>git annex log</code> command can take a while. To speed it up, you can use something
like <code>--after "4 months ago"</code> to limit how far back it goes.</p>
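<p>Because the log is plain text, you can also trim it before handing it to gource, for example to follow a single repository. The sketch below assumes each line uses gource's custom log format: pipe-separated fields with the repository name in the second field. Look at a few lines of your <code>gource.log</code> to confirm this. <code>laptop</code> is a placeholder.</p>

```shell
#!/bin/sh
# only_repo NAME
# Filter gource custom-format lines (timestamp|user|type|path) read
# on stdin, keeping those whose second field equals NAME.
only_repo() {
    awk -F'|' -v repo="$1" '$2 == repo'
}
```

<p>Usage: <code>only_repo laptop &lt; gource.log | sort | gource --log-format custom -</code></p>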
finding duplicate fileshttp://git-annex.branchable.com/tips/finding_duplicate_files/2013-11-27T22:47:37Z2011-12-23T04:36:25Z
<p>Maybe you had a lot of files scattered around on different drives, and you
added them all into a single git-annex repository. Some of the files are
surely duplicates of others.</p>
<p>While git-annex stores the file contents efficiently, it would still
help in cleaning up this mess if you could find, and perhaps remove
the duplicate files.</p>
<p>Here's a command line that will show duplicate sets of files grouped together:</p>
<pre><code>git annex find --include '*' --format='${file} ${escaped_key}\n' | \
sort -k2 | uniq --all-repeated=separate -f1 | \
sed 's/ [^ ]*$//'
</code></pre>
<p>Here's a command line that will remove one of each duplicate set of files:</p>
<pre><code>git annex find --include '*' --format='${file} ${escaped_key}\n' | \
sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//' | \
xargs -d '\n' git rm
</code></pre>
<p>--<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p>
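<p>The second pipeline deletes files, so it may be worth running it once without the final <code>xargs</code> stage to review what it would remove. The same stages are wrapped below as a function that you can try on sample <code>file key</code> lines first. This is a sketch and the name is only illustrative. Like the original, it assumes GNU uniq and filenames without blanks, since <code>sort -k2</code> and <code>uniq -f1</code> split fields on whitespace.</p>

```shell
#!/bin/sh
# first_of_each_duplicate_set
# Read "FILE KEY" lines on stdin and print the first FILE of every
# group that shares a KEY: the files the removal pipeline passes to git rm.
first_of_each_duplicate_set() {
    LC_ALL=C sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//'
}
```

<p>Feed it the <code>git annex find --include '*' --format='${file} ${escaped_key}\n'</code> output. Once the list looks right, pipe it on to <code>xargs -d '\n' git rm</code>. Note that a set of three copies only loses one per run.</p>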
using git annex with no fixed hostname and optimising sshhttp://git-annex.branchable.com/tips/using_git_annex_with_no_fixed_hostname_and_optimising_ssh/2015-05-21T05:32:27Z2011-12-08T22:42:29Z
<h2>Intro</h2>
<p>This tip is based on my (Matt Ford) experience of using <code>git annex</code> with my out-and-about netbook which hits many different wifi networks and has no fixed home or address.</p>
<p>I'm not using a bare repository that allows pushing (an alternative solution) nor do I fancy allowing <code>git push</code> to run against my desktop checked out repository (perhaps I worry over nothing?)</p>
<p>None of this is really <code>git annex</code> specific but I think it is useful to know...</p>
<h2>Dealing with no fixed hostname</h2>
<p>Essentially set up two repos as per the <a href="http://git-annex.branchable.com/walkthrough/">walkthrough</a>.</p>
<p>Desktop as follows:</p>
<pre><code>cd ~/annex
git init
git annex init "desktop"
</code></pre>
<p>And the laptop like this</p>
<pre><code>git clone ssh://desktop/~/annex
cd annex
git annex init "laptop"
</code></pre>
<p>Now we want to add the repos as remotes of each other.</p>
<p>For the laptop it is easy:</p>
<pre><code>git remote add desktop ssh://desktop/~/annex
</code></pre>
<p>However, adding the laptop as a remote of the desktop is a little tricky, because the laptop's hostname and address keep changing. We use a remote SSH tunnel to get around this. The laptop always knows its own name and address, and knows the address of the desktop. It creates a tunnel that starts on an arbitrary port on the desktop and leads back to the laptop's own SSH server port (22).</p>
<p>To do this make part of your laptop's SSH config look like this:</p>
<pre><code>Host desktop
User matt
HostName desktop.example.org
RemoteForward 2222 localhost:22
</code></pre>
<p>Now on the desktop to connect over the tunnel to the laptop's SSH port you need this:</p>
<pre><code>Host laptop
User matt
HostName localhost
Port 2222
</code></pre>
<p>So to add the laptop as a remote on the desktop:</p>
<p>a) From the laptop ensure the tunnel is up</p>
<pre><code>ssh desktop
</code></pre>
<p>b) From the desktop add the remote</p>
<pre><code>git remote add laptop ssh://laptop/~/annex
</code></pre>
<p>So now you can work on the train, pop on the wifi at work upon arrival, and sync up with a <code>git pull && git annex get</code>.</p>
<p>An alternative solution may be to use direct tunnels over OpenVPN.</p>
using the web as a special remotehttp://git-annex.branchable.com/tips/using_the_web_as_a_special_remote/2024-01-30T20:12:40Z2011-11-08T16:16:02Z
<h2>basic use</h2>
<p>The web can be used as a <a href="http://git-annex.branchable.com/special_remotes/">special remote</a> too.</p>
<pre><code># git annex addurl http://example.com/video.mpeg
addurl example.com_video.mpeg (downloading http://example.com/video.mpeg)
########################################################## 100.0%
ok
</code></pre>
<p>Now the file is downloaded, and has been added to the annex like any other
file. So it can be renamed, copied to other repositories, and so on.</p>
<p>To add a lot of urls at once, just list them all as parameters to
<code>git annex addurl</code>.</p>
<h2>trust issues</h2>
<p>Note that git-annex assumes that, if the web site does not 404, and has the
right file size, the file is still present on the web, and this counts as
one <a href="http://git-annex.branchable.com/copies/">copy</a> of the file. If the file still seems to be present
on the web, it will let you remove your last copy, trusting it can be
downloaded again:</p>
<pre><code># git annex drop example.com_video.mpeg
drop example.com_video.mpeg (checking http://example.com/video.mpeg) ok
</code></pre>
<p>If you don't <a href="http://git-annex.branchable.com/trust/">trust</a> the web to this degree, just let git-annex know:</p>
<pre><code># git annex untrust web
untrust web ok
</code></pre>
<p>With the result that it will hang onto files:</p>
<pre><code># git annex drop example.com_video.mpeg
drop example.com_video.mpeg (unsafe)
Could only verify the existence of 0 out of 1 necessary copies
Also these untrusted repositories may contain the file:
00000000-0000-0000-0000-000000000001 -- web
(Use --force to override this check, or adjust numcopies.)
failed
</code></pre>
<h2>attaching urls to existing files</h2>
<p>You can also attach urls to any file already in the annex:</p>
<pre><code># git annex addurl --file my_cool_big_file http://example.com/cool_big_file
addurl my_cool_big_file ok
# git annex whereis my_cool_big_file
whereis my_cool_big_file (2 copies)
00000000-0000-0000-0000-000000000001 -- web
27a9510c-760a-11e1-b9a0-c731d2b77df9 -- here
</code></pre>
<h2>configuring addurl filenames</h2>
<p>By default, <code>addurl</code> will generate a filename for you. You can use
<code>--file=</code> to specify the filename to use.</p>
<p>If you're adding a bunch of related files to a directory, or just don't
like the default filenames generated by <code>addurl</code>, you can use <code>--pathdepth</code>
to specify how many parts of the url are put in the filename.
A positive number drops that many paths from the beginning, while a negative
number takes that many paths from the end.</p>
<pre><code># git annex addurl http://example.com/videos/2012/01/video.mpeg
addurl example.com_videos_2012_01_video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
# git annex addurl http://example.com/videos/2012/01/video.mpeg --pathdepth=2
addurl 2012_01_video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
# git annex addurl http://example.com/videos/2012/01/video.mpeg --pathdepth=-2
addurl 01_video.mpeg (downloading http://example.com/videos/2012/01/video.mpeg)
</code></pre>
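<p>The <code>--pathdepth</code> rule can be modelled in a few lines of shell, which helps predict the filename before downloading. This is an illustration that reproduces the three examples above, not git-annex's own code. It does not attempt any of the filename sanitizing the real <code>addurl</code> may do.</p>

```shell
#!/bin/sh
# pathdepth_name URL [DEPTH]
# Model of addurl's --pathdepth naming: split the URL (minus scheme)
# on "/", drop DEPTH parts from the front if DEPTH >= 0, or keep only
# the last -DEPTH parts if DEPTH < 0, then join what is left with "_".
pathdepth_name() {
    rest=${1#*://}
    printf '%s\n' "$rest" | awk -F/ -v n="${2:-0}" '
        BEGIN { n = n + 0 }
        {
            start = (n >= 0) ? n + 1 : NF + n + 1
            if (start < 1) start = 1
            out = ""
            for (i = start; i <= NF; i++)
                out = out (out == "" ? "" : "_") $i
            print out
        }'
}
```

<p>With DEPTH 0, the model gives the same name as the first example above.</p>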
<h2>videos</h2>
<p><a name="videos"></a></p>
<p>There's support for downloading videos from sites like YouTube, Vimeo,
and many more. This relies on yt-dlp to download the videos.</p>
<p>When you have yt-dlp installed, you can just
<code>git annex addurl http://youtube.com/foo</code> and it will detect that
it is a video and download the video content for offline viewing.</p>
<p>(However, this is disabled by default as it can be a security risk.
See the documentation of annex.security.allowed-ip-addresses
in <a href="http://git-annex.branchable.com/git-annex/">git-annex</a> for details.)</p>
<p>Later, in another clone of the repository, you can run <code>git annex get</code> on
the file and it will also be downloaded with yt-dlp. This works
even if the video host has transcoded or otherwise changed the video
in the meantime; the assumption is that these video files are equivalent.</p>
<p>There is an <code>annex.youtube-dl-options</code> configuration setting that can be used
to pass parameters to yt-dlp. For example, you could set <code>git config
annex.youtube-dl-options "--format worst"</code> to configure it to download low
quality videos from YouTube.</p>
<p>To download all the videos in a YouTube channel, you can use
<code>git-annex importfeed --scrape</code> with the url to the
channel, or you can find the RSS feed for the channel, and
<code>git-annex importfeed</code> that url (without <code>--scrape</code>).</p>
<h2>bittorrent</h2>
<p>The <a href="http://git-annex.branchable.com/special_remotes/bittorrent/">bittorrent special remote</a> lets git-annex
also download the content of torrent files, and magnet links to torrents.</p>
<p>You can simply pass the url to a torrent to <code>git annex addurl</code>
the same as any other url.</p>
<p>You have to have <a href="http://aria2.sourceforge.net/">aria2</a>
and bittornado (or the original bittorrent) installed for this
to work.</p>
<h2>podcasts</h2>
<p>This is done using <code>git annex importfeed</code>. See <a href="http://git-annex.branchable.com/tips/downloading_podcasts/">downloading podcasts</a>.</p>
<h2>configuring which url is used when there are several</h2>
<p>An annexed file can have content at multiple urls that git-annex knows
about, and git-annex may use any of those urls for downloading a file.</p>
<p>If some urls are especially fast, or especially slow, you might want to
configure which urls git-annex prefers to use first, or should only use as
a last resort. To accomplish that, you can create additional web special
remotes that are configured to be used only for some urls, and that have a
different cost than the web special remote.</p>
<p>For example, suppose that you want to prioritize using urls on "fasthost.com".</p>
<pre><code>git-annex initremote --sameas=web fasthost type=web urlinclude='*//fasthost.com/*' cost=150
</code></pre>
<p>Now, <code>git-annex get</code> of a file that is on both fasthost.com and another url
will prefer to use the fasthost special remote, rather than the web special
remote (which has a higher cost of 200), and so will use the fasthost.com
url. If that url is not available, it will fall back to the web special
remote, and use the other url.</p>
<p>Suppose that you want to avoid using urls on "slowhost.com", except
as a last resort.</p>
<pre><code>git-annex initremote --sameas=web slowhost type=web urlinclude='*//slowhost.com/*' cost=300
</code></pre>
<p>Now, <code>git-annex get</code> of a file that is on both slowhost.com and another url
will first try the fasthost remote. If fasthost does not support the url,
it will next try the regular "web" remote, which avoids using
urls that match the configuration of either fasthost or slowhost.
Finally, if it's unable to get the file from some other url, it will
use the slowhost remote to get it from the slow url.</p>
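<p>The resulting order can be sketched in shell. This illustrates the cost rule under the example's assumptions (costs 150, 200 and 300, and the same url globs as the <code>initremote</code> commands above); it is not how git-annex selects remotes internally, and <code>url_cost</code> and <code>order_urls</code> are hypothetical names.</p>
<pre><code>#!/bin/sh
# Sketch (not git-annex code) of the cost ordering described above.
# Costs and patterns mirror the example initremote commands.
url_cost() {
    case $1 in
        *//fasthost.com/*) echo 150 ;;   # fasthost remote
        *//slowhost.com/*) echo 300 ;;   # slowhost remote
        *) echo 200 ;;                   # plain web special remote
    esac
}

# Print the given urls in the order they would be tried: lowest cost first.
order_urls() {
    for u in "$@"; do
        printf '%s %s\n' "$(url_cost "$u")" "$u"
    done | sort -n | cut -d' ' -f2-
}
</code></pre>
<p>Running <code>order_urls http://slowhost.com/f http://example.org/f http://fasthost.com/f</code> lists the fasthost.com url first and the slowhost.com url last. Urls with equal cost come out in <code>sort</code> order, which need not match git-annex's own tie-breaking.</p>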
centralized git repository tutorial http://git-annex.branchable.com/tips/centralized_git_repository_tutorial/ 2015-07-22T22:07:41Z 2011-11-07T17:08:47Z
<p>The <a href="http://git-annex.branchable.com/walkthrough/">walkthrough</a> builds up a decentralized git repository setup, but
git-annex can also be used with a centralized git repository.</p>
<p>We have separate tutorials depending on where the centralized git
repository is hosted.</p>
<ul>
<li><p><a href="http://git-annex.branchable.com/tips/centralized_git_repository_tutorial/on_GitHub/">On GitHub</a> --
However, GitHub does not currently let git-annex
store the contents of large files there. So, things get a little more
complicated when using it.</p></li>
<li><p><a href="http://git-annex.branchable.com/tips/centralized_git_repository_tutorial/on_GitLab/">On GitLab</a> --
This service is similar to GitHub, but supports
git-annex.</p></li>
<li><p><a href="http://git-annex.branchable.com/tips/centralized_git_repository_tutorial/on_your_own_server/">On your own server</a> --
use any unix system with ssh and git and git-annex installed.
A VPS, a home server, etc.</p></li>
</ul>
automatically getting files on checkout http://git-annex.branchable.com/tips/automatically_getting_files_on_checkout/ 2013-11-27T22:47:37Z 2011-11-02T19:09:19Z
<p>Normally git-annex does not retrieve file contents when checking out a
tree. In some use cases, it makes sense to always have the contents of
files available after a <code>git checkout</code>. This can be
accomplished by installing the following as <code>.git/hooks/post-checkout</code>
and making it executable:</p>
<pre><code>#!/bin/sh
# Uses git-annex to get all files in the specified directories
# (relative to the top of the repository) on checkout.
# Separate multiple directories with spaces, e.g. dirs="music photos".
dirs="."
top="$(git rev-parse --show-toplevel)"
for dir in $dirs; do
    git annex get "$top/$dir"
done
</code></pre>
<p>By default, all files in the whole repository will be made available. The
<code>dirs</code> setting can be configured if you only want to get files in certain
directories.</p>
what to do when a repository is corrupted http://git-annex.branchable.com/tips/what_to_do_when_a_repository_is_corrupted/ 2014-05-16T19:14:26Z 2011-10-29T19:41:51Z
<p>A git-annex repository on a removable USB drive is great, until the cable
falls out at the wrong time and git's repository gets trashed. The way
git checksums everything, combined with the poor quality of USB media, makes
this perhaps more likely than you would expect. If this happens to you,
here's a way to recover that makes the most of whatever data is left
on the drive.</p>
<ul>
<li>First, run <code>git fsck</code>. If it does not report any problems, your data
is fine, and you don't need to proceed further.</li>
<li>So <code>git fsck</code> says the git repository is corrupted. But probably the data
git-annex stored is fine. Your first step is to clone another copy
of the git repository from somewhere else. Let's call this clone
"$good", and the corrupted repository "$bad".</li>
<li>Preserve your git configuration changes, and the <code>annex.uuid</code> setting:
<code>mv $bad/.git/config $good/.git/config</code></li>
<li>Move annexed data into the new repository: <code>mkdir $good/.git/annex; mv
$bad/.git/annex/objects $good/.git/annex/objects</code></li>
<li>Reinitialize git-annex: <code>cd $good; git annex init</code></li>
<li>Check for any problems with the annexed data: <code>cd $good; git annex fsck</code></li>
<li>Now you can remove the corrupted repository; the new one is ready to use.</li>
</ul>
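<p>The manual steps above can be collected into a small shell function. This is a sketch that assumes you have already run <code>git fsck</code> and made the fresh clone; <code>recover_annex_repo</code>, <code>run</code> and the <code>DRYRUN</code> variable are hypothetical helpers, <code>git -C</code> is used instead of <code>cd</code>, and <code>mkdir -p</code> is used so the step is harmless if the directory already exists.</p>
<pre><code>#!/bin/sh
# Sketch of the manual recovery steps. Assumes $good is a fresh clone
# of the same repository and $bad is the corrupted one.
# Set DRYRUN=1 to print the commands without running them.
run() {
    echo "+ $*"
    if [ -z "$DRYRUN" ]; then
        "$@"
    fi
}

recover_annex_repo() {
    bad=$1
    good=$2
    # Keep the old git configuration, including annex.uuid.
    run mv "$bad/.git/config" "$good/.git/config" || return
    # Move the annexed data into the new clone.
    run mkdir -p "$good/.git/annex" || return
    run mv "$bad/.git/annex/objects" "$good/.git/annex/objects" || return
    # Reinitialize git-annex, then check the annexed data.
    run git -C "$good" annex init || return
    run git -C "$good" annex fsck
}
</code></pre>
<p>Run it first as <code>DRYRUN=1 recover_annex_repo "$bad" "$good"</code> to review the commands, then again without <code>DRYRUN</code> to carry them out. It stops at the first step that fails.</p>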
<p>Alternatively, recent versions of git-annex have a <code>git annex repair</code>
command that uses <a href="http://git-repair.branchable.com/">http://git-repair.branchable.com/</a> to repair a
repository in-place. The git-annex assistant will detect most corruptions
and offer to run the repair for you automatically.</p>
<p>--<a href="http://git-annex.branchable.com/users/joey/">Joey</a></p>
using gitolite with git-annex http://git-annex.branchable.com/tips/using_gitolite_with_git-annex/ 2015-04-17T16:39:08Z 2011-10-17T18:16:05Z
<p><a href="https://github.com/sitaramc/gitolite">Gitolite</a> is a git repository
manager. Here's how to add git-annex support to gitolite, so you can
<code>git annex copy</code> files to a gitolite repository, and <code>git annex get</code>
files from it.</p>
<p>A nice feature of using gitolite with git-annex is that users can be given
read-only access to a repository, and this allows them to <code>git annex get</code>
file contents, but not change anything.</p>
<p>First, you need new enough versions:</p>
<ul>
<li>the current <code>master</code> branch of gitolite works with git-annex (tested 2014-04-19),
but v3.5.3 and earlier v3.x require use of the <code>git-annex</code> branch.</li>
<li>gitolite 2.2 also works -- this version contains a git-annex-shell ADC
and supports "ua" ADCs.</li>
<li>git-annex 3.20111016 or newer needs to be installed on the gitolite
server. Don't install an older version; it would not be secure!</li>
</ul>
<h3>Instructions for gitolite <code>master</code> branch</h3>
<p>To setup gitolite to work with git-annex, you can follow the instructions on the gitolite website,
and just add <code>'git-annex-shell ua',</code> to the ENABLE list in <code>~/.gitolite.rc</code>.</p>
<p>Here are more detailed instructions:</p>
<p>1: Create a <code>git</code> user</p>
<pre>
sudo adduser \
--system \
--shell /bin/bash \
--gecos 'git version control' \
--group \
--disabled-password \
--home /home/git git
</pre>
<p>2: Copy a public SSH key for the user you want to be the gitolite administrator.
In the instructions below, I placed the key in a file named <code>/home/git/me.pub</code>.</p>
<p>3: Clone and install gitolite</p>
<p>First switch to the <code>git</code> user (e.g. <code>sudo su - git</code>) and then run:</p>
<pre>
cd
git clone https://github.com/sitaramc/gitolite.git
mkdir -p bin
./gitolite/install -ln
</pre>
<p>4: Add <code>~/bin</code> to <code>PATH</code></p>
<p>Make sure that <code>~/bin</code> is in the <code>PATH</code>, since that's where gitolite installed its binary. Do something like this:</p>
<pre>
echo 'export PATH=/home/git/bin:$PATH' >> .profile
export PATH=/home/git/bin:$PATH
</pre>
<p>5: Configure gitolite</p>
<p>Edit <code>~/.gitolite.rc</code> to enable the git-annex-shell command.
Find the <code>ENABLE</code> list and add this line in there somewhere:</p>
<pre>
'git-annex-shell ua',
</pre>
<p>Now run gitolite's setup:</p>
<pre>
gitolite setup -pk me.pub
rm me.pub
</pre>
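<p>If git-annex cannot use the repository afterwards, one thing worth checking is that the entry from step 5 actually made it into the rc file. The function below is a hypothetical helper, not part of gitolite; it only looks for the exact entry shown in step 5 on a line that is not commented out, and assumes the rc file is at <code>~/.gitolite.rc</code> unless you pass another path.</p>
<pre><code>#!/bin/sh
# Hypothetical check (not part of gitolite): succeed if the rc file
# has the git-annex-shell entry from step 5 on an uncommented line.
annex_enabled_in_rc() {
    rc=${1:-$HOME/.gitolite.rc}
    # Allow leading whitespace, but not a leading "#".
    grep -q "^[[:space:]]*'git-annex-shell ua'," "$rc"
}
</code></pre>
<p>As the <code>git</code> user: <code>if annex_enabled_in_rc; then echo enabled; else echo "not enabled"; fi</code>.</p>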
<h3>Instructions for gitolite 2.2</h3>
<p>Here's how to set it up. The examples are for gitolite as installed
on Debian with apt-get, but the changes described can be made to any
gitolite installation, just with different paths.</p>
<p>Set <code>$GL_ADC_PATH</code> in <code>.gitolite.rc</code>, if you have not already done so.</p>
<pre>
echo '$GL_ADC_PATH = "/usr/local/lib/gitolite/adc/";' >>~gitolite/.gitolite.rc
</pre>
<p>Make the ADC directory, and a "ua" subdirectory.</p>
<pre>
mkdir -p /usr/local/lib/gitolite/adc/ua
</pre>
<p>Install the git-annex-shell ADC into the "ua" subdirectory, copying it from a checkout of the gitolite repository (run this from the directory that contains the checkout):</p>
<pre>
cp gitolite/contrib/adc/git-annex-shell /usr/local/lib/gitolite/adc/ua/
</pre>
<p>Now all gitolite repositories can be used with git-annex just as any
ssh remote normally would be used. For example:</p>
<pre>
# git clone gitolite@localhost:testing
Cloning into testing...
Receiving objects: 100% (18/18), done.
# cd testing
# git annex init
init ok
# cp /etc/passwd my-cool-big-file
# git annex add my-cool-big-file
add my-cool-big-file ok
(Recording state in git...)
# git commit -m added
[master d36c8b4] added
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 120000 my-cool-big-file
# git push --all
Counting objects: 17, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (12/12), done.
Writing objects: 100% (14/14), 1.39 KiB, done.
Total 14 (delta 0), reused 1 (delta 0)
To gitolite@localhost:testing
c552a38..db4653e git-annex -> git-annex
29cd204..d36c8b4 master -> master
# git annex copy --to origin
copy my-cool-big-file (checking origin...) (to origin...)
WORM-s2502-m1318875140--my-cool-big-file
2502 100% 0.00kB/s 0:00:00 (xfer#1, to-check=0/1)
sent 2606 bytes received 31 bytes 1758.00 bytes/sec
total size is 2502 speedup is 0.95
ok
</pre>
<h3>Troubleshooting</h3>
<p>I got an error like this when setting up gitolite <em>after</em> setting up a local git repo and git annex:</p>
<pre>
git-annex-shell: First run: git-annex init
Command ssh ["git@git.example.com","git-annex-shell 'configlist' '/~/myrepo.git'"] failed; exit code 1
</pre>
<p>The cause was that I had forgotten to run <code>git push --all</code> after adding the new gitolite remote.</p>
Internet Archive via S3 http://git-annex.branchable.com/tips/Internet_Archive_via_S3/ 2019-01-21T15:42:51Z 2011-10-17T17:56:36Z
<p><a href="http://www.archive.org/">The Internet Archive</a> allows members to upload
collections using an Amazon S3
<a href="http://www.archive.org/help/abouts3.txt">compatible API</a>, and this can
be used with git-annex's <a href="http://git-annex.branchable.com/special_remotes/S3/">S3</a> support.</p>
<p>So, you can locally archive things with git-annex, define remotes that
correspond to "items" at the Internet Archive, and use git-annex to upload
your files there. Of course, your use of the Internet Archive must
comply with their <a href="http://www.archive.org/about/terms.php">terms of service</a>.</p>
<p>A nice added feature is that whenever git-annex sends a file to the
Internet Archive, it records its url, the same as if you'd run <code>git annex
addurl</code>. So any users who can clone your repository can download the files
from archive.org, without needing any login or password info.
The url to the content in the Internet Archive is also displayed by
<code>git annex whereis</code>. This makes the Internet Archive a nice way to
publish the large files associated with a public git repository.</p>
<h2>webapp setup</h2>
<p>Just go to "Add Another Repository", pick "Internet Archive",
and you're on your way.</p>
<h2>basic setup</h2>
<p>Sign up for an account, and get your access keys here:
<a href="http://www.archive.org/account/s3.php">http://www.archive.org/account/s3.php</a></p>
<pre><code># export AWS_ACCESS_KEY_ID=blahblah
# export AWS_SECRET_ACCESS_KEY=xxxxxxx
</code></pre>
<p>Specify <code>host=s3.us.archive.org</code> when doing <code>initremote</code> to set up
a remote at the Archive. This will enable a special Internet Archive mode:
Encryption is not allowed; you are required to specify a bucket name
rather than having git-annex pick a random one; and you can optionally
specify <code>x-archive-meta*</code> headers to add metadata as explained in their
<a href="http://www.archive.org/help/abouts3.txt">documentation</a>.</p>
<pre><code># git annex initremote archive-panama type=S3 \
host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
x-archive-meta-mediatype=texts x-archive-meta-language=eng \
x-archive-meta-collection=test_collection \
x-archive-meta-title="original Panama Canal lock design blueprints"
initremote archive-panama (Internet Archive mode) ok
# git annex describe archive-panama "a man, a plan, a canal: panama"
describe archive-panama ok
</code></pre>
<p>The above uploads to the <a href="https://archive.org/details/test_collection">test collection</a> where items are removed
after thirty days. Uploads can persist by changing to another writable
<a href="https://internetarchive.readthedocs.io/en/latest/metadata.html#collection">collection</a>.</p>
<p>Then you can annex files and copy them to the remote as usual:</p>
<pre><code># git annex add photo1.jpeg --backend=SHA256E
add photo1.jpeg (checksum...) ok
# git annex copy photo1.jpeg --fast --to archive-panama
copy (to archive-panama...) ok
</code></pre>
<h2>update lag</h2>
<p>It may take a while for archive.org to make files publicly visible after
they've been uploaded.</p>
<p>While files can be removed from the Internet Archive,
<a href="https://archive.org/help/derivatives.php">derived versions</a>
of some files may continue to be stored there for a while
after the originals were removed.</p>
<h2>exporting trees</h2>
<p>By default, files stored in the Internet Archive will show up there named
by their git-annex key, not the original filename. If the filenames
are important, you can run <code>git annex initremote</code> with an additional
parameter "exporttree=yes", and then use <a href="http://git-annex.branchable.com/git-annex-export/">git-annex-export</a> to publish
a tree of files to the Internet Archive.</p>
<p>Note that the Internet Archive may not support certain characters
in filenames (<a href="http://archive.org/about/faqs.php#1099">see FAQ</a>).
If exporting a filename fails due to such limitations, you would need
to rename it in your git annex repository in order to export it.</p>
migrating data to a new backend http://git-annex.branchable.com/tips/migrating_data_to_a_new_backend/ 2023-12-08T18:39:49Z 2011-10-17T17:56:36Z
<p>Maybe you started out using the SHA1 backend, and have now configured
git-annex to use SHA256. But files you added to the annex before still
use the SHA1 backend. There is a simple command that can migrate that
data:</p>
<pre><code># git annex migrate my_cool_big_file
migrate my_cool_big_file (checksum...) ok
</code></pre>
<p>This stages a change to the file, which you can <code>git commit</code> like any other
change.</p>
<p>You can only migrate files whose content is currently available. Other
files will be skipped.</p>
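<p>Before migrating, it can help to see which backend each annexed file currently uses. The backend is the part of a git-annex key before the first "-" (for example <code>WORM</code> in <code>WORM-s2502-m1318875140--my-cool-big-file</code>). The sketch below assumes locked files, which are symlinks whose target ends in the key; unlocked files are not symlinks and are skipped. <code>key_backend</code> and <code>file_backend</code> are hypothetical names.</p>
<pre><code>#!/bin/sh
# Print the backend part of a git-annex key: everything before the
# first "-" (for example WORM or SHA256E).
key_backend() {
    printf '%s\n' "${1%%-*}"
}

# Print the backend of a locked annexed file, assuming it is a symlink
# whose target ends in the key. Returns 1 for anything else, such as
# unlocked files, which are not symlinks.
file_backend() {
    if [ -L "$1" ]; then
        key_backend "$(basename "$(readlink "$1")")"
    else
        return 1
    fi
}
</code></pre>
<p>For example, <code>for f in *; do printf '%s: ' "$f"; file_backend "$f" || echo skipped; done</code> lists the backend of each file in the current directory.</p>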
<h2>distributed migration</h2>
<p>When you pull changes into your repository that include migration of files,
your repository then needs to be updated to follow the migration.</p>
<pre><code># git-annex migrate --update
migrate my_cool_big_file (checksum...) ok
</code></pre>
<p>This is done automatically by commands like <code>git-annex pull</code>.</p>
<h2>unused old content</h2>
<p>After migrating a file to a new backend, the old content in the old backend
will still be present. That is necessary because multiple files
can point to the same content. The <code>git annex unused</code> subcommand can be
used to clear up that detritus later. Note that hard links are used,
to avoid wasting disk space.</p>
powerful file matching http://git-annex.branchable.com/tips/powerful_file_matching/ 2013-11-27T22:47:37Z 2011-10-17T17:56:36Z
<p>git-annex has a powerful syntax for making it act on only certain files.</p>
<p>The simplest thing is to exclude some files, using wild cards:</p>
<pre><code>git annex get --exclude '*.mp3' --exclude '*.ogg'
</code></pre>
<p>But you can also exclude files that git-annex's <a href="http://git-annex.branchable.com/location_tracking/">location tracking</a>
information indicates are present in a given repository. For example,
if you want to populate newarchive with files, but not those already
on oldarchive, you could do it like this:</p>
<pre><code>git annex copy --not --in oldarchive --to newarchive
</code></pre>
<p>Without the --not, --in makes it act on files that <em>are</em> in the specified
repository. So, to remove files that are on oldarchive:</p>
<pre><code>git annex drop --in oldarchive
</code></pre>
<p>Or maybe you're curious which files have a lot of copies, and then
also want to know which files have only one copy:</p>
<pre><code>git annex find --copies 7
git annex find --not --copies 2
</code></pre>
<p>The above are simple examples of specifying what files git-annex
should act on. But you can specify anything you can dream up by combining
the things above, with --and --or -( and -). Those last two strange-looking
options are parentheses, for grouping other options. You will probably
have to escape them from your shell.</p>
<p>Here are the mp3 files that are in either of two repositories, but have
fewer than 3 copies:</p>
<pre><code>git annex find --not --exclude '*.mp3' --and \
-\( --in usbdrive --or --in archive -\) --and \
--not --copies 3
</code></pre>
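<p>To make the boolean logic of that expression concrete, here is a shell model of it applied to a single file. This is a hypothetical sketch, not git-annex code: the file is described by its name, its number of copies, and a space-separated list of repositories that have it, and each test corresponds to one part of the expression. Recall that <code>--copies 3</code> matches files with at least 3 copies, so <code>--not --copies 3</code> means fewer than 3.</p>
<pre><code>#!/bin/sh
# Hypothetical model of:
#   --not --exclude '*.mp3' --and
#   -( --in usbdrive --or --in archive -) --and
#   --not --copies 3
# Arguments: file name, number of copies, space-separated list of
# repositories that have the file. Succeeds if the file is selected.
selected() {
    name=$1
    copies=$2
    repos=$3
    # --not --exclude '*.mp3': only names ending in .mp3 pass.
    case $name in
        *.mp3) ;;
        *) return 1 ;;
    esac
    # -( --in usbdrive --or --in archive -)
    case " $repos " in
        *" usbdrive "*|*" archive "*) ;;
        *) return 1 ;;
    esac
    # --not --copies 3: fewer than 3 copies.
    [ "$copies" -lt 3 ]
}
</code></pre>
<p>For instance, <code>selected song.mp3 2 "usbdrive laptop"</code> succeeds, while the same file with 3 copies, or an <code>.ogg</code> file, does not.</p>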
recover data from lost+found http://git-annex.branchable.com/tips/recover_data_from_lost+found/ 2015-11-13T17:47:28Z 2011-10-17T17:56:36Z
<p>Suppose something goes wrong, and fsck puts all the files in lost+found.
It's actually very easy to recover from this disaster.</p>
<p>First, check out the git repository again. Then, in the new checkout:</p>
<pre><code>$ mkdir recovered-content
$ sudo mv ../lost+found/* recovered-content
$ sudo chown -R you:you recovered-content
$ chmod -R u+w recovered-content
$ git annex add recovered-content
$ git reset HEAD recovered-content
$ rm -rf recovered-content
$ git annex fsck
</code></pre>
<p>The way that works is that when git-annex adds the same content that was in
the repository before, all the old links to that content start working
again. So, this works as long as you're using one of the SHA* or other
checksumming backends, which is the default.</p>
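<p>This works because checksumming backends derive the key from the file's content, size and (for the <code>E</code> variants) extension, not from where the file was found. As a rough illustration, the function below builds a SHA256E-style key for a file; it is a simplified sketch with a hypothetical name, relies on GNU <code>sha256sum</code>, and ignores git-annex's extra rules about which extensions are kept.</p>
<pre><code>#!/bin/sh
# Simplified SHA256E-style key: backend, size in bytes, SHA-256 of the
# content, and the file's extension. Hypothetical helper; git-annex has
# extra rules about which extensions it keeps that are not modelled here.
sha256e_key() {
    size=$(wc -c "$1" | awk '{print $1}')
    hash=$(sha256sum "$1" | awk '{print $1}')
    base=$(basename "$1")
    case $base in
        *.*) ext=".${base##*.}" ;;
        *) ext="" ;;
    esac
    printf 'SHA256E-s%s--%s%s\n' "$size" "$hash" "$ext"
}
</code></pre>
<p>Two files with identical content and extension get identical keys, which is why re-adding the recovered files makes the old links work again.</p>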
untrusted repositories http://git-annex.branchable.com/tips/untrusted_repositories/ 2013-11-27T22:47:37Z 2011-10-17T17:56:36Z
<p>Suppose you have a USB thumb drive and are using it as a git annex
repository. You don't trust the drive, because you could lose it, or
accidentally run it through the laundry. Or, maybe you have a drive that
you know is dying, and you'd like to be warned if there are any files
on it not backed up somewhere else. Maybe the drive has already died
or been lost.</p>
<p>You can let git-annex know that you don't trust a repository, and it will
adjust its behavior to avoid relying on that repository's continued
availability.</p>
<pre><code># git annex untrust usbdrive
untrust usbdrive ok
</code></pre>
<p>Now when you do a fsck, you'll be warned appropriately:</p>
<pre><code># git annex fsck .
fsck my_big_file
Only these untrusted locations may have copies of this file!
05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive
Back it up to trusted locations with git-annex copy.
failed
</code></pre>
<p>Also, git-annex will refuse to drop a file from elsewhere just because
it can see a copy on the untrusted repository.</p>
<p>It's also possible to tell git-annex that you have an unusually high
level of trust for a repository. See <a href="http://git-annex.branchable.com/trust/">trust</a> for details.</p>
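<p>The effect of trust levels on counting copies can be sketched as follows. This is a simplified model, not git-annex code: each location is written as <code>name:trustlevel</code>, and only trusted and semitrusted (the default level) repositories are counted, while untrusted and dead ones are ignored. <code>counted_copies</code> is a hypothetical name.</p>
<pre><code>#!/bin/sh
# Sketch (not git-annex code): count the copies git-annex would rely
# on, given locations written as name:trustlevel.
# Untrusted and dead repositories are not counted.
counted_copies() {
    n=0
    for loc in "$@"; do
        case ${loc#*:} in
            trusted|semitrusted) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}
</code></pre>
<p>In the fsck example above, the only location is the untrusted USB drive, so the count is 0; that is the situation the "Only these untrusted locations may have copies" warning reports.</p>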
using Amazon S3 http://git-annex.branchable.com/tips/using_Amazon_S3/ 2019-05-01T18:45:23Z 2011-10-17T17:56:36Z
<p>git-annex extends git's usual remotes with some <a href="http://git-annex.branchable.com/special_remotes/">special remotes</a>, that
are not git repositories. This way you can set up a remote using say,
Amazon S3, and use git-annex to transfer files into the cloud.</p>
<p>First, export your Amazon AWS credentials:</p>
<pre><code># export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
# export AWS_SECRET_ACCESS_KEY="s3kr1t"
</code></pre>
<p>Now, create a gpg key, if you don't already have one. This will be used
to encrypt everything stored in S3, for your privacy. Once you have
a gpg key, run <code>gpg --list-secret-keys</code> to look up its key id, something
like "2512E3C7".</p>
<p>Next, create the S3 remote, and describe it.</p>
<pre><code># git annex initremote cloud type=S3 chunk=1MiB keyid=2512E3C7
initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
# git annex describe cloud "at Amazon's US datacenter"
describe cloud ok
</code></pre>
<p>The configuration for the S3 remote is stored in git. So to make another
repository use the same S3 remote is easy:</p>
<pre><code># export AWS_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
# export AWS_SECRET_ACCESS_KEY="s3kr1t"
# git pull laptop
# git annex enableremote cloud
enableremote cloud (gpg) (checking bucket) ok
</code></pre>
<p>Notice that to enable an existing S3 remote, you have to provide the Amazon
AWS credentials because they were not stored in the repository. (It is
possible to configure git-annex to do that, but not the default.)</p>
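<p>Since the credentials are not stored in the repository, a script that enables the remote in a new clone may want to check for them first. The guard below is a hypothetical helper, not part of git-annex; it only checks that the two environment variables used above are set and non-empty, not that they are valid.</p>
<pre><code>#!/bin/sh
# Hypothetical guard (not part of git-annex): fail early when the AWS
# credentials needed by "git annex enableremote" are not exported.
require_aws_creds() {
    missing=""
    for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # Read the variable named by $var (empty if unset).
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
}
</code></pre>
<p>For example: <code>if require_aws_creds; then git annex enableremote cloud; fi</code>.</p>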
<h2>further reading</h2>
<p>See <a href="http://git-annex.branchable.com/special_remotes/S3/">S3</a> for details about configuring S3 remotes.</p>
<p>See <a href="http://git-annex.branchable.com/tips/public_Amazon_S3_remote/">public Amazon S3 remote</a> for how to set up an Amazon S3 remote that
can be used by the public, without them needing AWS credentials.</p>
<p>If you want to publish files to S3 so they can be accessed without using
git-annex, see <a href="http://git-annex.branchable.com/tips/publishing_your_files_to_the_public/">publishing your files to the public</a>.</p>
what to do when you lose a repository http://git-annex.branchable.com/tips/what_to_do_when_you_lose_a_repository/ 2013-11-27T22:47:37Z 2011-10-17T17:56:36Z
<p>So you lost a thumb drive containing a git-annex repository. Or a hard
drive died or some other misfortune has befallen your data.</p>
<p>Unless you configured backups, git-annex can't get your data back. But it
can help you deal with the loss.</p>
<p>Go somewhere that knows about the lost repository, and mark it as
dead:</p>
<pre><code>git annex dead usbdrive
</code></pre>
<p>This retains the <a href="http://git-annex.branchable.com/location_tracking/">location tracking</a> information for the repository,
but avoids trying to access it, or list it as a location where files
are present.</p>
<p>If you later find the drive, you can let git-annex know it's found
like so:</p>
<pre><code>git annex semitrust usbdrive
</code></pre>