git-annex-reinject (git-annex comment feed, ikiwiki, http://git-annex.branchable.com/git-annex-reinject/, updated 2021-03-02T22:09:45Z)
Comment 1: "Like it's written: annex only" by stephane-gourichon-lpad, posted 2016-10-28T20:40:54Z (http://git-annex.branchable.com/git-annex-reinject/comment_1_070a87e0cb1bbc49088989293334e1fb/)
<h1>Summary</h1>
<p>Just to make it explicit: <code>--known</code> mode operates on the <em>annex only</em>. If you try to reinject a file whose content is stored in the regular git part of the repository, and is therefore effectively known, <code>git-annex-reinject</code> will consider it <em>not known</em>.</p>
<h1>Context</h1>
<p>I'm currently using <code>git-annex reinject --known</code> to tidy up a storage area that predates git-annex. As big files are reinjected, the area is progressively emptied, so the unknown files stand out in the otherwise deserted directory hierarchy.</p>
<p>Yet only files whose content is actually annexed get removed.</p>
<p>In my case the big files are pictures (NEF, JPG), and the regular git files are <code>xmp</code> metadata files that darktable (http://darktable.org/) uses to store processing parameters. So all the xmp files linger there, whether or not they were committed to git, and need separate handling.</p>
<h1>How to detect whether a file is known to the regular git repository (not the annex)</h1>
<p>There are probably several ways. Here is one I hacked together:</p>
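<pre><code># Compute the blob hash of the file's content (without -w, nothing is written)
HASH=$( git hash-object "$FILEPATH" )
# git cat-file -e exits with status 0 only if that object exists
if git cat-file -e "$HASH"
then
    echo "Known $FILEPATH"
else
    echo "Unknown $FILEPATH"
fi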
</code></pre>
<p>This can be wrapped into a helper function and used in a <code>find | ...</code> one-liner to remove any file already known to git.</p>
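<p>Here is a sketch of such a helper. It assumes a POSIX shell run from inside the repository. The function names <code>is_known_to_git</code> and <code>list_known_files</code> are hypothetical and not part of git or git-annex. The test is wrapped in a function and driven by <code>find</code>. As a dry run, this version only prints the matching paths and does not delete anything.</p>

```shell
# Hypothetical helpers (is_known_to_git, list_known_files); not part of git or git-annex.
# Run from inside the git repository whose objects should be consulted.

# Succeeds if the exact content of file "$1" already exists as a blob in the repository.
is_known_to_git() {
    # Compute the blob hash; without -w nothing is written to the object store.
    blob=$(git hash-object "$1") || return 1
    # Exit status 0 only if an object with that hash exists (reachable or not).
    git cat-file -e "$blob"
}

# Dry run: print each regular file under directory "$1" whose content git already has.
# Filenames containing newlines are not handled.
list_known_files() {
    find "$1" -type f | while IFS= read -r f; do
        if is_known_to_git "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

<p>For example, <code>list_known_files /path/to/old-storage</code> prints the candidates for removal. Review that list before deleting anything, because an existence check alone is subject to the reachability caveat described below.</p>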
<h2>Caveats</h2>
<p><code>git cat-file -e</code> considers known any file whose content is stored as a git object, even if that object exists only on a deleted branch or is otherwise unreachable. As a result, removing files based on this test may lose information. The loss is not immediate: it happens when some later <code>git gc</code> prunes the unreachable object.</p>
<p>Such a caveat is not surprising, since regular git content and annexed content have different "scopes" and lifetimes.</p>
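<p>One way to narrow this caveat is to search for the blob in the output of <code>git rev-list --objects --all</code>, rather than only testing whether the object exists. This sketch assumes that "known" should mean "reachable from some ref". The helper name <code>is_reachable_in_git</code> is hypothetical. Objects that are reachable only from reflogs, or only staged in the index, are treated as unknown. That errs on the side of keeping files.</p>

```shell
# Hypothetical helper (is_reachable_in_git); not part of git or git-annex.
# Succeeds only if the blob for file "$1" is reachable from some ref (branches, tags, ...).
# Objects reachable only from reflogs, or only staged in the index, count as unknown.
is_reachable_in_git() {
    # Blob hash of the file's content; nothing is written to the object store.
    blob=$(git hash-object "$1") || return 1
    # List every object reachable from any ref, keep just the object IDs,
    # and look for an exact, fixed-string match of the blob hash.
    git rev-list --objects --all | cut -d' ' -f1 | grep -qxF "$blob"
}
```

<p>This runs a full <code>rev-list</code> walk for every file. For many files, it is cheaper to save the list of reachable object IDs once and search that list instead.</p>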
<h1>Question</h1>
<p>Joey, is there an alternative to <code>git-annex-reinject --known</code> that considers regular git content too? Or is this a pure git issue, and therefore not part of git-annex's job?</p>
<p>A quick test of <code>git-annex-import --clean-duplicates</code> shows similar behavior.</p>
Comment 2: "Difference to import/add?" by AlbertZeyer, posted 2021-01-01T21:49:29Z, updated 2021-03-02T22:09:45Z (http://git-annex.branchable.com/git-annex-reinject/comment_2_d1a04e31fea877ae5fe873fbd01fdcaa/)
Considering <code>git annex reinject /tmp/foo.iso foo.iso</code>, what is the difference from <code>git annex import /tmp/foo.iso</code> or <code>cp /tmp/foo.iso .; git annex add foo.iso</code>?
Comment 3 by Lukey, posted 2021-01-02T15:16:04Z, updated 2021-03-02T22:09:45Z (http://git-annex.branchable.com/git-annex-reinject/comment_3_2dcdd82efbd6dcac0f3b729d55a09386/)
The difference between <code>git annex reinject</code> and <code>git annex import</code> (or <code>cp</code>/<code>mv</code> followed by <code>git annex add</code>) is that reinject only accepts file contents that are already known to the annex.