Before turning this into a 'todo' item, I'd like to discuss the possibilities.
The idea is the following:
On a laptop with a rather small SSD, or on some other mobile device, I'd like to move away files that are no longer needed. The first point is that the --to destination should be chosen semi-automatically, while ensuring that enough replicas exist:
git annex move --away <path..>
should pick suitable remotes (by configuration and/or by other rules, such as disk utilization on the remote side).
I am rather new to git-annex and wonder whether something already exists that gives similar results, especially without having to hand-pick the remotes to move files to.
Furthermore, there needs to be some way to find out which files are no longer needed. At first thought, filtering by 'atime' would be nice, but mounting with noatime/relatime is common nowadays, which would make this infeasible. To work around this, the assistant could (optionally) manage a lazy atime by setting inotify or fanotify watches (close_nowrite) on all annexed files in a repository and batching atime updates coarsely together. Atimes on disk would then be updated only lazily (after some time expires, when the queue becomes full, or at shutdown of the assistant). We can afford to lose some atime updates here in case of an unexpected shutdown (I rather wonder why the kernel has no lazy-atime option).
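To make the lazy-atime idea a bit more concrete, here is a minimal Python sketch of just the queueing part. It is not git-annex code: the class name `LazyAtimeQueue` and its parameters `max_pending` and `max_age` are made up for illustration, and the inotify/fanotify watching that would call `record()` is left out entirely.

```python
import os
import time


class LazyAtimeQueue:
    """Collect access events in memory and write atimes to disk in batches."""

    def __init__(self, max_pending=1000, max_age=600.0):
        self.max_pending = max_pending  # flush when this many paths are queued
        self.max_age = max_age          # flush when the oldest entry is this old (seconds)
        self.pending = {}               # path -> most recent access time
        self.oldest = None              # time of the first unflushed access

    def record(self, path, when=None):
        """Note an access (e.g. from a close_nowrite event); flush if due."""
        when = time.time() if when is None else when
        self.pending[path] = when
        if self.oldest is None:
            self.oldest = when
        if len(self.pending) >= self.max_pending or when - self.oldest >= self.max_age:
            self.flush()

    def flush(self):
        """Write queued atimes to disk, leaving each file's mtime untouched."""
        for path, atime in self.pending.items():
            try:
                mtime_ns = os.stat(path).st_mtime_ns
                os.utime(path, ns=(int(atime * 1e9), mtime_ns))
            except FileNotFoundError:
                pass  # file was dropped or moved in the meantime
        self.pending.clear()
        self.oldest = None
```

A daemon would also call `flush()` at shutdown; anything still queued after a crash is simply lost, which is the trade-off described above.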
Then the assistant (or a crontab entry) can schedule some regular maintenance. There are certainly plenty of options to consider here: for example, a mobile device might prefer to send files only when connected to WLAN, or someone might want to move files away until a certain threshold of free disk space is reached, etc.
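The "move files away until a free-space threshold is reached" rule could look roughly like the following Python sketch. The function name `files_to_move` and the `(path, atime, size)` tuple layout are assumptions for illustration only; nothing like this exists in git-annex as far as this thread establishes.

```python
def files_to_move(candidates, free_bytes, target_free_bytes):
    """Pick least recently accessed files until the free-space target is met.

    candidates: iterable of (path, atime, size_bytes) tuples.
    Returns the paths to move away, oldest access first.
    """
    chosen = []
    for path, _atime, size in sorted(candidates, key=lambda c: c[1]):
        if free_bytes >= target_free_bytes:
            break
        chosen.append(path)
        free_bytes += size  # space reclaimed once the local copy is gone
    return chosen
```

The current `free_bytes` value could be taken from `shutil.disk_usage(path).free`; the resulting paths would then be handed to 'git annex move'.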
While at it, the assistant could also watch (via fanotify) whether someone tries to open an unavailable file (a dead symlink), block that request, get the file, and then let the request proceed.
I think that you're conflating two different features.
First, there's the question of determining which remotes should store a file. The preferred content expressions are a way to let the user define this in a way that makes sense for their setup. There's also the annex.diskreserve setting, to allow a remote to reject a file if it doesn't have space. The git-annex assistant can automatically apply those, and arrange to transfer files to all remotes that want a copy. Or you can do it on the command line using --auto. Perhaps it would be nice to have a 'git annex copy --auto --to any', to avoid needing to run copy multiple times to send to different remotes.
Second, there's the concept of expiring files. I don't think relatime would prevent using atime for this (git-annex could just set the atime to 0 when it first gets a file to work around relatime), and I don't like the idea of inotify watching all files just to work around noatime. I suppose that the preferred content expressions could have an atime check operation added to them. Although I guess you could just as well use 'find'.
(Finally, I don't think there's a good way to block processes that are trying to open files that are not there, without using FUSE. And I don't know that a system that could block any program indefinitely waiting on some large files being pulled down from wherever would be one I'd want to use. This has prevented me from going down that path so far.)
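The relatime workaround mentioned above (setting the atime to 0 when a file is first obtained) can be sketched in Python as follows. Under relatime the kernel updates the atime on a read when the stored atime is not newer than the mtime or ctime, so an atime of 0 makes the next read visible. The helper names `reset_atime` and `accessed_since_reset` are hypothetical, and this does nothing for filesystems mounted with noatime.

```python
import os


def reset_atime(path):
    """Set atime to the epoch while keeping mtime, so relatime records the next read."""
    mtime_ns = os.stat(path).st_mtime_ns
    os.utime(path, ns=(0, mtime_ns))


def accessed_since_reset(path):
    """True once the atime has moved away from 0 after reset_atime()."""
    return os.stat(path).st_atime_ns != 0
```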
I agree, there are even more than two features involved.
I'd appreciate a feature like 'git annex copy --auto --to any'. The point is that this should not only copy data until diskreserve is hit, but distribute data in some (configurable) way across all remotes. Preferred content is one part of that, available disk space another. The user might also choose destinations depending on location and bandwidth, and balance load over multiple servers until enough replicas are distributed. Details have to be worked out.
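As a starting point for working out those details, here is a rough Python sketch of one possible selection rule: keep only remotes that want the file and would still respect their reserve, then take the cheapest ones until enough replicas are covered. The `Remote` fields and the `pick_remotes` function are invented for this example; in particular, `cost` stands in for whatever location or bandwidth weighting the user configures, and `wants_file` for the result of a preferred content check.

```python
from dataclasses import dataclass


@dataclass
class Remote:
    name: str
    free_bytes: int      # space left on the remote
    reserve_bytes: int   # space to keep free, in the spirit of annex.diskreserve
    cost: float          # lower is better (e.g. nearby or fast link)
    wants_file: bool     # outcome of the remote's preferred content check


def pick_remotes(remotes, file_size, copies_needed):
    """Return names of the cheapest eligible remotes, up to copies_needed."""
    eligible = [
        r for r in remotes
        if r.wants_file and r.free_bytes - file_size >= r.reserve_bytes
    ]
    # Cheapest first; among equal costs prefer the remote with more free space.
    eligible.sort(key=lambda r: (r.cost, -r.free_bytes))
    return [r.name for r in eligible[:copies_needed]]
```

If fewer remotes are returned than `copies_needed`, the caller would have to keep the local copy rather than move the file away.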
Expiring content is another matter. I also think it is most likely improper to add some atime-watching facility to git-annex. Instead, I am thinking of writing a dedicated daemon that handles atime updates in userspace; it could then add more rules to ignore accesses by other system tools (file indexers, users, etc.). This makes such an expiry facility completely independent of git-annex, and users can choose whether and how to use it.
Files that have not been accessed recently can then be selected with 'find' or something similar and piped into 'git annex move/copy/drop'.
As for your final note: fanotify can block access to files. It might be a bit ugly, and it is certainly not for everyone; perhaps it could be externalized into a watching daemon too, or, if integrated into git-annex, be treated very carefully and used only when explicitly configured.
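For the "something similar to find" part, a small Python sketch could look like this. The function name `stale_files` and its parameters are made up for illustration; it assumes annexed files are symlinks whose content atime is what matters, and it skips files whose content is not present locally.

```python
import os
import time


def stale_files(root, days, now=None):
    """Yield files under root whose atime is more than `days` days old."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # never descend into the repository itself
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)  # follows symlinks to the actual content
            except FileNotFoundError:
                continue  # dangling symlink: content is not present locally
            if st.st_atime < cutoff:
                yield path
```

The yielded paths can be printed one per line and passed to 'git annex move/copy/drop', for example through xargs.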