wishlist: `git annex drop --relaxed`

Also suggested during the first Gitify BoF during DebConf13:

git annex drop deletes immediately. In some situations a mechanism to tell git-annex "I would like to hold onto this data if possible, but if you need the space, please delete it" could be nice.

An obvious question would be how to do cleanups. With the assistant, that's easy. On CLI, at the very least git annex fsck should list, and optionally delete, that data.


How should this interact with the trust model and location tracking?

This could become complicated. AFAIU, right now git-annex keeps track of files as either present or absent. With this feature it's tempting to introduce a third state, 'potentially dropped' (or 'dropped in a relaxed fashion'), but do you then treat such files as dropped or not depending on whether they are in a trusted or untrusted repo? Or maybe a potentially dropped file in a trusted repo is treated like a file in a semitrusted repo? This becomes convoluted. You also need a command to undrop a file in case you decide that you really want to keep it, and in order to do that you need a command to see which files are up for relaxed dropping...

As an alternative approach maybe it makes sense to extend preferred content expressions to take file sizes and disk usage into account.

Comment by Chris Stork — Fri Oct 4 11:13:11 2013


comment 2

I don't think that a third state would be necessary. Actually dropping the file when it happens would need to do the same numcopies verification that git annex drop does now.

I agree it might be simpler to first improve the power of preferred content expressions. Unfortunately one thing that cannot be put in them is anything that probes the current state of the system. This is because repo A on machine X needs to be able to calculate the preferred content of repo B on machine Y. But I could certainly add file size as a preferred content term, since that info is known throughout the network.

Comment by joeyh.name — Fri Oct 4 20:17:07 2013


comment 3

I think things would be simpler if a "drop --relaxed" file were to look to the outside world just like one that was dropped without "--relaxed". In particular, even if a file is dropped with "--relaxed":

the file's work-tree symlink should be broken synchronously by the "drop --relaxed" command (as opposed to only being broken later, if and when the file physically goes away)
other repos should no longer see the file as available from this repo

Basically, the idea is to add a third state, but not a user-visible one. Rather, it should be a well hidden implementation detail, which doesn't affect the conceptual model (very much like git's own distinction between loose and packed objects). Thus, logically dropped would be a better name than potentially dropped.

Corollaries:

A logically (but not physically) dropped file should not count towards satisfying the numcopies limit, e.g. when some other repo is asked to drop the file too
That in turn means that "git annex drop --relaxed" needs to satisfy a numcopies check at the time the user runs it; it's not enough to only do the check later, at physical-deletion time. (At that point, there should probably be a second numcopies check. I don't know whether the model requires it, but even if not, paranoia is good.)
If the user wants to use the file again, they have to "git annex get" it again, just like usual -- but if the file hasn't been physically deleted yet, the "get" will be nearly instantaneous, since the data won't have to be copied
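To make these corollaries concrete, here is a toy model of the bookkeeping. It is only a sketch: the `Key` class, its `present` and `parked` sets, and both function names are invented for illustration and do not correspond to git-annex's real location log or code.

```python
from dataclasses import dataclass, field


@dataclass
class Key:
    """Toy location record for one annexed file (not git-annex's real format)."""
    numcopies: int
    present: set[str] = field(default_factory=set)  # repos advertising a copy
    parked: set[str] = field(default_factory=set)   # logically dropped, bytes still on disk


def relaxed_drop(key: Key, repo: str) -> bool:
    """Check numcopies up front; parked copies never count as copies."""
    others = key.present - {repo}
    if repo not in key.present or len(others) < key.numcopies:
        return False
    key.present.discard(repo)  # other repos stop seeing this copy immediately
    key.parked.add(repo)
    return True


def physically_delete(key: Key, repo: str) -> bool:
    """Second, paranoid numcopies check at physical-deletion time."""
    if repo not in key.parked or len(key.present) < key.numcopies:
        return False
    key.parked.discard(repo)
    return True
```

With numcopies set to 1 and copies in repos A and B, a relaxed drop in A succeeds, after which the same request in B is refused, because A's parked copy no longer counts.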

One possible implementation would be to have "drop --relaxed" behave almost identically to a non-relaxed drop -- do all the same safety checks, bookkeeping, etc. The only difference would be to have it rename the file at the end, rather than deleting it outright. (Logically dropped files could stay in their same directory, but with a distinguishing filename, or they could be moved to a parallel tree, e.g. .git/annex/dropped. I don't have an opinion on that choice; I've just picked one arbitrarily to keep talking about.)

"get" would simply search .git/annex/dropped before going off to remote repos, and if the file is found there, would move (not copy) it back into .git/annex/objects.
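A minimal sketch of that rename-based idea follows. It assumes a flat `objects/<key>` layout instead of git-annex's real hashed directories, and the `dropped` tree, the `fetch_remote` callback and all function names are hypothetical. Safety checks and location tracking are left out entirely.

```python
from pathlib import Path

OBJECTS = "objects"  # simplified stand-in for .git/annex/objects
DROPPED = "dropped"  # hypothetical parallel tree, standing in for .git/annex/dropped


def relaxed_drop(annex: Path, key: str) -> None:
    """Logically drop a key: move its content aside instead of deleting it."""
    src = annex / OBJECTS / key
    dst = annex / DROPPED / key
    dst.parent.mkdir(parents=True, exist_ok=True)
    src.rename(dst)  # same filesystem, so no data is copied


def get(annex: Path, key: str, fetch_remote) -> str:
    """Restore a key, preferring a logically dropped local copy."""
    target = annex / OBJECTS / key
    parked = annex / DROPPED / key
    target.parent.mkdir(parents=True, exist_ok=True)
    if parked.exists():
        parked.rename(target)  # move, not copy
        return "local"
    target.write_bytes(fetch_remote(key))  # fall back to a remote repo
    return "remote"


def reclaim_space(annex: Path) -> list[str]:
    """Physically delete everything that was logically dropped."""
    dropped = annex / DROPPED
    keys = sorted(p.name for p in dropped.iterdir()) if dropped.exists() else []
    for key in keys:
        (dropped / key).unlink()
    return keys
```

In this sketch, `reclaim_space` plays the role of whatever cleanup step frees the disk space; the rest of the code never looks at the `dropped` tree, which is the point of doing it as a rename.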

An alternative might be to set some kind of logically dropped flag, but that would probably be a much more intrusive change; a lot of places in the code would have to check the flag. Doing it as a file rename would make for a much more localized change; most of git-annex would completely ignore .git/annex/dropped, and just go about its business as it has always done.

(It might be tempting to think of (or even implement) .git/annex/dropped as a very low-cost remote, but that's not accurate; the semantics are different.)

I'm just starting to experiment with git-annex, so I can only hope that what I'm saying isn't completely silly...

Comment by erics — Sun May 4 00:48:55 2014


comment 4

erics, that all makes a lot of sense, except I don't know if there's actually a use case for a git-annex that behaves that way. It doesn't seem to solve the original use case.

I'd be inclined to instead use the new metadata support. A file could have a tag that indicates it's not strongly wanted, and if git-annex get doesn't have enough space it could seek out and drop such files.
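As a rough illustration of that tag-based approach, the selection step might look like the sketch below. The tag name `expendable`, the `AnnexedFile` record and the largest-first policy are all assumptions made up for this example; git-annex has no such built-in behaviour, and each chosen file would still have to pass the usual numcopies check before being dropped.

```python
from dataclasses import dataclass


@dataclass
class AnnexedFile:
    name: str
    size: int  # bytes
    tags: frozenset[str] = frozenset()


EXPENDABLE = "expendable"  # hypothetical tag meaning "not strongly wanted"


def pick_files_to_drop(files, needed: int) -> list[str]:
    """Choose tagged files, largest first, until `needed` bytes would be freed."""
    chosen, freed = [], 0
    candidates = sorted(
        (f for f in files if EXPENDABLE in f.tags),
        key=lambda f: f.size,
        reverse=True,
    )
    for f in candidates:
        if freed >= needed:
            break
        chosen.append(f.name)
        freed += f.size
    # Drop nothing if even all tagged files together would not be enough.
    return chosen if freed >= needed else []
```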

Comment by joeyh.name — Mon May 19 16:56:15 2014


comment 5

> It doesn't seem to solve the original use case.

It doesn't? The OP requested:

> I would like to hold onto this data if possible, but if you need the space, please delete it

It looks to me as though my suggestion does just that -- or am I misunderstanding what they asked for?

Comment by erics — Tue Jun 3 17:49:10 2014


Metadata vs "drop --relaxed"

[This isn't as much about my suggested implementation for "drop --relaxed" as about whether the feature is worth providing in the first place. I'm not arguing strongly for it, actually; just continuing the discussion.]

> I'd be inclined to instead use the new metadata support.

I see metadata as more for static attributes of a given file -- this thing is "a picture", "related to project X", "from Mary". Thus, the combination of metadata plus preferred-content settings seems to me more suitable for static preferences (likely ones that implement some kind of policy, however informal); e.g. "this repo wants pictures but not mp3s", or "Mary's stuff but not Alex's".

"drop --relaxed", on the other hand, would be good for more ad-hoc usage: "disk space is getting tight; hmm, I'm not using foo today, so git-annex, please delete my local copy of ${myrepo}/foo -- but only as much as you have to, because I'm going to want it again tomorrow".

One reason not to want to use metadata and preferred-content settings for such short-term, ad-hoc needs is that you then have to remember to go undo the changes later. That's even worse if you had to add ad-hoc metadata, and now have to go delete it all again. Undoing a "drop --relaxed", on the other hand, consists of a simple "git annex get".

Comment by erics — Tue Jun 3 18:20:50 2014



Last edited Sat Mar 12 16:58:09 2016