todo/option to `drop path` to not drop "all copies"yohhttp://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/git-annexikiwiki2021-05-25T20:37:46Zcomment 1http://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/comment_1_8f40b2663c9f48ddb07969af1b6632a8/joey2021-04-16T18:42:08Z2021-04-16T17:33:39Z
<p>Problem is git-annex does not keep track of the information it would need
in order to do this. Same problem as in
<span class="createlink">indeterminite preferred content state for duplicated file</span>.</p>
<p>Unlike that bug, I think it's actually rather ambiguous whether the user
wants the file to be dropped in this case. Obviously you want it not to,
the way your file tree is arranged, but others could
rely on the current behavior.</p>
<p>Here's one example: Imagine a repo storing music. It has directories for
albums, and also directories containing playlists, which are copies of
files from albums. If I was in a mood for Brazilian music, but have gotten
over it for now, I might want to drop Brazilian_playlist (which got very
long in my travels there) to free up some space. If it refused to drop
files because the same files were also in the corresponding album
directories, I would wonder why git-annex had gotten broken.</p>
<p>But the --not-used-elsewhere switch seems reasonable, if the needed info
was available. I suppose git-annex could scan the index for changes and
update state when this switch was used. Could be slow to update
that state though.</p>
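<p>A minimal sketch of what such a switch might compute, assuming a path-to-key
mapping is already at hand (which is exactly the info git-annex does not currently
track). The <code>droppable</code> helper, the key names, and the reading of "elsewhere"
as "outside the paths being dropped" are all illustrative assumptions, not git-annex
code:</p>

```python
from collections import defaultdict

def droppable(paths_to_keys, requested):
    """Return requested paths whose key is used by no path outside the request.

    paths_to_keys is a hypothetical {path: key} mapping for the work tree.
    """
    users = defaultdict(set)
    for path, key in paths_to_keys.items():
        users[key].add(path)
    requested = set(requested)
    return sorted(
        p for p in requested
        if p in paths_to_keys and users[paths_to_keys[p]] <= requested
    )

# Hypothetical music repo: one playlist track is a copy of an album track.
tree = {
    "albums/a/track1.mp3": "KEY1",
    "Brazilian_playlist/track1.mp3": "KEY1",
    "Brazilian_playlist/live.mp3": "KEY2",
}
playlist = [p for p in tree if p.startswith("Brazilian_playlist/")]
print(droppable(tree, playlist))
```

<p>Building the reverse mapping from key to paths needs a pass over every file in
the tree, which is the kind of scan that could make the switch slow.</p>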
drop --not-used-elsewherehttp://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/comment_2_32383ff5db43f0f4dc9e7470613f04fd/Ilya_Shlyakhter2021-04-17T22:31:52Z2021-04-17T22:31:52Z
<p>+1 for <code>drop --not-used-elsewhere</code>. Would be good if "elsewhere" included <a href="http://git-annex.branchable.com/tips/Using_git-worktree_with_annex/">linked worktrees</a>.
For unlocked files, could it just look at the hardlink count of the content file? (But it would be odd if it only worked for unlocked files.)</p>
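<p>A rough sketch of the hardlink-count check, assuming the unlocked file really is a
hard link to the content file. The file names and the <code>has_other_hardlinks</code>
helper are made up for illustration:</p>

```python
import os
import tempfile

def has_other_hardlinks(content_path):
    """True if more than one directory entry points at this file's inode."""
    return os.stat(content_path).st_nlink > 1

with tempfile.TemporaryDirectory() as d:
    obj = os.path.join(d, "object")          # stands in for the content file
    with open(obj, "w") as f:
        f.write("data")
    before = has_other_hardlinks(obj)
    os.link(obj, os.path.join(d, "worktree_file"))  # a second name for the same inode
    after = has_other_hardlinks(obj)
    print(before, after)
```

<p>The link count only says that some other name exists, not where it lives, so it
could not tell a file in this work tree from one in a linked worktree.</p>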
comment 3http://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/comment_3_2665f3fc268dff1de24d6ee9648c5d8c/Ilya_Shlyakhter2021-04-18T23:45:26Z2021-04-18T23:45:25Z
As a more general solution, suppose <a href="http://git-annex.branchable.com/git-annex-matching-options/">git-annex-matching-options</a> were extended with the expressions <code>--includeifany=glob</code> (true for a key if <em>any</em> file using that key matches <code>glob</code>), <code>--includeifall=glob</code> (true for a key if <em>all</em> files using that key match <code>glob</code>), and similarly <code>--excludeifany/all</code>. Then use <code>drop --includeifall=path/*</code>.
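<p>To make the proposed semantics concrete, here is a small sketch of the two
matchers over a key-to-files mapping. The mapping, the key names, and the use of
Python's <code>fnmatch</code> as the glob matcher are assumptions made for illustration:</p>

```python
from fnmatch import fnmatch

def include_if_any(files_for_key, glob):
    """True if any file using the key matches the glob."""
    return any(fnmatch(f, glob) for f in files_for_key)

def include_if_all(files_for_key, glob):
    """True if every file using the key matches the glob."""
    return all(fnmatch(f, glob) for f in files_for_key)

# Hypothetical mapping from key to the files that use it.
files_by_key = {
    "KEY1": ["path/a", "other/a"],   # also used outside path/
    "KEY2": ["path/b", "path/c"],    # only used inside path/
}
any_match = sorted(k for k, fs in files_by_key.items() if include_if_any(fs, "path/*"))
all_match = sorted(k for k, fs in files_by_key.items() if include_if_all(fs, "path/*"))
print(any_match)
print(all_match)
```

<p>Under this reading, <code>drop --includeifall=path/*</code> would leave KEY1 alone,
because one of its files lives outside <code>path/</code>.</p>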
comment 4http://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/comment_4_5fb82eb18c6d935ac07ec9220d9037c0/joey2021-05-21T20:50:15Z2021-05-21T17:29:09Z
<p>I have a plan over in <span class="createlink">indeterminite preferred content state for duplicated file</span>
and will be working on it over there.</p>
<p>Linked worktrees seem out of scope for this though.</p>
comment 5http://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/comment_5_c7b9cbdbc4be36ea4d56a1eec13c6f34/joey2021-05-25T15:45:15Z2021-05-25T15:14:49Z
<p>On the --includeifany=glob idea, that seems to suggest a preferred content
expression like includeifany=, analogous to how --include matches include=.</p>
<p>I'm feeling a bit cautious about adding a preferred content expression for
this brand new capability.</p>
<p>And also unfortunately, it turned out not to be possible to prevent the
associated files db from sometimes having stale filenames in it (see
<a href="http://source.git-annex.branchable.com/?p=source.git;a=commitdiff;h=c1b50282118520350d5328153fceedac2b8d8ed5">c1b50282118520350d5328153fceedac2b8d8ed5</a>). Which all current
uses of the associated files db deal with by checking the list of
associated files to see if all of them are in HEAD tree. A preferred
content expression would also have to deal with that, and that risks
slowing down evaluation of preferred content expressions generally.</p>
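<p>As a sketch of the kind of check involved, assuming the associated files db can
be read as a key-to-filenames mapping and HEAD as a path-to-key mapping. Both
structures, the <code>live_associated_files</code> helper, and the rule that an entry is
stale when HEAD no longer has that path pointing at the key are assumptions for
illustration:</p>

```python
def live_associated_files(associated, head_paths_to_keys, key):
    """Drop stale names: keep only files that HEAD still maps to this key."""
    return [f for f in associated if head_paths_to_keys.get(f) == key]

# Hypothetical db contents: "old_name" was renamed away but is still recorded.
associated_db = {"KEY1": ["foo", "old_name"]}
head = {"foo": "KEY1", "bar": "KEY2"}

live = live_associated_files(associated_db["KEY1"], head, "KEY1")
print(live)
```

<p>Each evaluation needs a lookup against HEAD for every recorded filename, which
is where the extra cost would come from.</p>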
<p>So I think it's best to not add a preferred content expression,
at least until there's a use case and this has had
some time to soak.</p>
comment 6http://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/comment_6_aa58624e8d0de174a06c6c2c0e33dc14/joey2021-05-25T15:45:15Z2021-05-25T15:32:40Z
<p>What happens if I run <code>git annex drop --not-used-elsewhere foo bar</code> and foo
and bar have the same content?</p>
<p>The content is not used except for in the files I listed, so it could be
argued that the --not-used-elsewhere does not apply, and it should drop it.
But of course, that becomes a problem when dropping large directory trees.</p>
<p>This also makes me think that not-used-elsewhere is too broad, maybe I want
to only avoid dropping content shared by files in bardir while dropping
foodir, and the option does not allow it.</p>
<p>So I do think @Ilya's on to something with his suggestion.
<code>git annex drop foo --excludeifany=bar</code> does not have the ambiguity.</p>
<p>I guess it's also useful for querying, potentially. Eg, if I have an inbox
and an outbox and think perhaps some things from the inbox are things I've
already dealt with before, I can find such files:</p>
<pre><code>git annex find inbox --includeifany='outbox/*'
</code></pre>
comment 7http://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/comment_7_150a87dea1bd24e3d4e18cf3bcfb937f/joey2021-05-25T20:37:46Z2021-05-25T16:44:56Z
Going with --includesamecontent=glob and --excludesamecontent=glob
comment 8http://git-annex.branchable.com/todo/option_to___96__drop_path__96___to_not_drop___34__all_copies__34__/comment_8_1e87a33fe71ec2d5f4df887b94dae7c6/joey2021-05-25T20:37:46Z2021-05-25T20:34:00Z
<p>Thinking about preferred content expressions some more, there should not
be much reason to use this in one. includesamecontent=foo would have the
same effect as include=foo, because when it operates on bar, it already
checks if foo is the same content and is preferred content, and if so
avoids dropping it. And when getting files, the effect would also be the
same, because include=foo makes it get foo, same as includesamecontent=foo
would.</p>