First mentioned here
TL;DR: Feature Proposal: A matching option to operate only on files within a git revision range
Motivation
I use git-annex for several repositories where content is added automatically, e.g.:
- my phone (pictures I take, selected pictures others send me, etc.)
- my research data repo (new files are added as data comes in)
Those repos don't necessarily have the assistant running as I want to control when and what's being added. I also have annex.synccontent=false
set because full availability of all files doesn't make sense. I imagine I am not the only one using git annex like this.
Most of the time, when I want to access content from such repos from another machine, it is often just the most recent content. Example: I take a picture on my phone and would like to have it now on my desktop. The workflow is:
yann@phone> git annex assist # takes ages for some reason, but only when --content functionality is active
yann@desktop> git annex assist # doesn't pull all content from server because I have annex.synccontent=false set (too much space otherwise)
# selectively get files I want (tedious, manual picking)
yann@desktop> git annex get file1 file2 file3 ...
Workaround
One can script the following to only sync the recently touched files:
# Specifying a git rev range by having `git diff` figure out the details
yann@desktop> git diff --name-only HEAD~20 | xargs -d'\n' git annex get
# Specifying a time range
yann@desktop> git log -p --since="1 week ago" | grep -e '^[+-]\{3\}' | cut -c5- | grep -vx '/dev/null' | cut -c3- | sort -u | xargs -d'\n' git annex get
This kinda works but...
- is fragile shell scripting
- doesn't like deleted files that much (they get handed to git-annex, which complains that those don't exist)
- is two very separate solutions for the same thing: only operating on recent files
Proposal
How about a --recent
or --since
or --revs
etc. option which you can hand either a commit(range) or a git log --since
-compatible string like 1 week ago
, which will cause git-annex to only consider those files for get
ing or drop
ing or sync
ing or whatnot?
I observed that git annex sync --content
is often very much slower than git annex sync --no-content
(without the time the actual syncing takes naturally), apparently because it needs to check a whole lot of files for syncing necessity. If that is the case, then a --since
option could result in a speed improvement as only a very small amount of files would need to be checked for.
Also, it would be awesome if one could say git annex assist|sync --since=yesterday
and one would end up with a perfectly synced repo and the files touched yesterday being available. This is a situation I find myself needing on a daily basis.
There's a plan for speeding up sync --content (and
assist
) at https://git-annex.branchable.com/todo/Incremental_git_annex_sync_--content_--all/That method is farly similar, and it wouldn't need the user to specify an additional matching option to get the speedup.
I do think this could be a useful option to have though. (It would not be possible to implement it as a matcher, but that's an implementation detail.)