`git annex find` currently makes for a great way to find which files are already local and don't need to be fetched with `git annex get`; obviously `ls` just shows me all the files in a given directory, disregarding git-annex (and without recursing into subdirectories). I think that adding a `--maxdepth` option to `git annex find` would make it much easier to use at directories high up in the directory structure, since currently `git annex find` necessarily recurses into all subdirectories, when I really just want to see whether or not there are git-annex files present under a given directory.

Obviously, since directories themselves are not git-annex objects, there is no way to say whether or not they are "present", but perhaps the most intuitive behavior would be to report whether or not any git-annex files under a given directory are present.
For example, if I have:
```
./
+-- subdir0/
|   +-- file0 (present in local git-annex repo)
|   +-- file1 (present in local git-annex repo)
+-- subdir1/
|   +-- file0 (not present in local git-annex repo)
|   +-- file1 (not present in local git-annex repo)
+-- file2 (present in local git-annex repo)
```
and I type `git annex find --maxdepth 1 .`, the output might look something like:
```
subdir0/
file2
```
rather than:
```
subdir0/file0
subdir0/file1
file2
```
`find --maxdepth` is a nice optimisation because it can short-circuit when it gets deep in the tree. However, `git annex find` is built on top of `git ls-files --cached`, which has no equivalent way to short-circuit. I am not sure if the format of the index makes it practical for it to get a `--maxdepth` option (it may need to traverse the whole index, or might be able to short-circuit). I don't see any point in adding a `--maxdepth` to git-annex if it doesn't actually make it any faster, so getting such a thing into
`git ls-files` would be the first step. So, I suggest filing a feature request on git.

I see your point, `git ls-files` may still have to walk the whole tree, precluding a speed advantage. But I guess the point of what I was saying was more that a way to summarize, from a high level, what is here and what is not would be nice. I certainly understand if this is not something you see as worthwhile, but if someone were inclined to write a patch (if ever I find the time) that would add a `--maxdepth`
option that would merely summarize the results of `git annex find`, would it be something you would be inclined to include in the main repo (providing, of course, that you find the behavior sensible)?

I think that `--maxdepth` has a well-defined meaning, and this summary option would need to be named something else.
I don't object to the idea of implementing it. However, I don't know that it would be very easy to implement either.
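The lack of short-circuiting mentioned above is easy to see directly: `git ls-files --cached` always enumerates the whole index recursively and has no depth-limiting option. A throwaway-repo sketch (not from the original discussion) illustrating this:

```shell
# `git ls-files --cached` lists every tracked path at every depth; there
# is no option to stop at a given directory level, which is why
# `git annex find` (built on top of it) cannot short-circuit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p subdir0 subdir1
touch subdir0/file0 subdir0/file1 subdir1/file0 subdir1/file1 file2
git add .
git ls-files --cached   # prints all five paths, including the deep ones
```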
You make another good point: `--maxdepth` is vague in this context... I guess if we were to decide to come up with a summary option, it would have to be named something else, like `--summary-depth`, where the default would be to list all files at whatever depth, and specifying the option would take the output that would otherwise come from `git annex find <opts>`, truncate the paths to a certain depth, and then make a set thereof (to remove the many duplicates); that way, any directory that had any files that would have been output by `git annex find <opts>`, and that would also be at or above a certain depth, would be listed.

I think if I get a chance I'll try to implement something like this.
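That post-processing step is easy to sketch outside git-annex with standard tools (a hypothetical helper; the name `summarize_depth` and the sample input are made up for illustration):

```shell
# summarize_depth N: read paths (as printed by `git annex find`) on
# stdin, truncate each to its first N components, and de-duplicate,
# so a directory containing any present files is listed exactly once.
summarize_depth() {
    cut -d/ -f1-"${1:-1}" | sort -u
}

# Piping the paths that `git annex find .` would print for the example
# tree above through the helper:
printf 'subdir0/file0\nsubdir0/file1\nfile2\n' | summarize_depth 1
# prints:
#   file2
#   subdir0
```

Note that, unlike the proposed output, this does not append a trailing `/` to distinguish directories from plain files.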
Occurs to me that you can do this with existing options, eg by filtering out files that are 3 or more levels deep.
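A sketch of such a filter, assuming git-annex's glob matching (where `*` in `--include` patterns can match across `/`, so `*/*/*` matches any path with at least three components):

```shell
# list present files, excluding any whose path is 3 or more levels deep
git annex find --not --include '*/*/*'
```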
It won't display subdirectories that contain filtered-out files, of course.
It would also be easy enough to write a wrapper around `git-annex find` that processed its output and generated output like that. So I'm going to close this.