similar to do not bug me about intermediate files - i feel that massive git annex get operations should have better progress information than the current individual rsync --progress bits. i wonder if this couldn't be accomplished with rsync --info=PROGRESS2, which gives overall rsync progress, combined with copying multiple files at once with rsync (which would have the side-effect of speeding up git annex get for large numbers of small files).
once this is done, it could be sent back to the webapp UI to give the user a global sense of the overall sync progress (as opposed to per-file progress). --anarcat
To display global progress, git-annex would have to make 2 passes over all the files to be processed. That is the main reason it does not try to do that.
could you add a simple file counter?
i.e. for a single "git annex copy/get/etc." operation, start a file counter and increment it on every examined file (transferred or not). this would give a very rough idea of progress on the whole set (the user should know the number of files in the annex).
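A minimal sketch of the counter idea, in Python rather than git-annex's own code. The name with_counter and the report callback are hypothetical, made up for illustration. The point is that a running count needs no extra pass over the files, but it also cannot show a total or a percentage.

```python
from typing import Callable, Iterable, Iterator


def with_counter(files: Iterable[str],
                 report: Callable[[str], None] = print) -> Iterator[str]:
    """Yield each file unchanged, reporting how many have been examined so far."""
    for count, path in enumerate(files, start=1):
        # Only a running count is known here; no total, so no extra pass.
        report(f"[{count}] {path}")
        yield path


if __name__ == "__main__":
    # Hypothetical file list; a real tool would stream these as it finds them.
    for _ in with_counter(["a.mkv", "b.mkv", "c.mkv"]):
        pass  # transfer (or skip) the file here
```

Because the wrapper is a generator, files are still processed one at a time as they are found.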
git-annex does not display rsync progress any longer, but you do still get the progress display on a per-file basis. This is at least a lot more compact than the rsync output.
Any kind of global progress display would require a separate pass to identify all the files that git-annex will be operating on. That would make it slower in large repos, and people already complain about seek speed in large repos.
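The two-pass tradeoff can be sketched roughly as follows. This is an illustration in Python, not how git-annex is implemented; transfer_with_global_progress and the (name, size) pairs are assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def transfer_with_global_progress(
    files: Iterable[Tuple[str, int]],
    transfer: Callable[[str], None],
) -> List[float]:
    """Transfer (name, size) pairs and return the fraction done after each file."""
    # Pass 1: materialise the whole work list to learn the total size.
    # This is the extra cost: nothing is transferred until the scan finishes.
    work = list(files)
    total = sum(size for _, size in work)

    # Pass 2: do the transfers, tracking bytes done against the known total.
    done = 0
    fractions = []
    for name, size in work:
        transfer(name)
        done += size
        fractions.append(done / total if total else 1.0)
    return fractions
```

The first pass is what makes a percentage possible, and it is also what delays the first transfer in a repository with many files.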
i understand, thanks for taking the time to explain the tradeoffs! :)
i know that rsync is not used anymore, but I figured I should document this here since it's the first thing it made me think of when i found out about it. as it turns out, rsync does have its own "global status", which has similar tradeoffs to git-annex's (namely that it doesn't work in "incremental mode"). from #debian-til on OFTC:
So rsync allows you to switch back to the "costly mode" which does a full scan before starting. It's slower, but it allows you to get global progress info. It's a nice tradeoff and it's especially useful to be able to enable it on demand. I understand this might be complicated to implement in git-annex because there are many places where such an option would be required (and it's unclear how it would be named), but it's something that would certainly be useful for my use cases, where some repos have large files but not so many of them, so they are fairly fast to scan (e.g. i could do this on my video repo, but not the music repo). -- anarcat
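For reference, a small sketch of what the on-demand switch could look like on the rsync side. It assumes the "costly mode" above refers to rsync's --no-inc-recursive option, which the comment does not name. rsync_command and its global_progress parameter are hypothetical, and the flag is spelled --info=progress2 as in the rsync man page. It only builds the argument list and does not run rsync.

```python
from typing import List, Sequence


def rsync_command(sources: Sequence[str], dest: str,
                  global_progress: bool = False) -> List[str]:
    """Build an rsync argument list that copies all sources in one invocation."""
    cmd = ["rsync", "-a"]
    if global_progress:
        # --info=progress2 reports whole-transfer progress; --no-inc-recursive
        # makes rsync scan the full file list before transferring anything.
        cmd += ["--info=progress2", "--no-inc-recursive"]
    else:
        # Default: per-file progress with incremental recursion left on.
        cmd.append("--progress")
    return cmd + list(sources) + [dest]
```

A caller could enable global_progress only for repositories known to be quick to scan, and keep the default elsewhere.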