[haiku-commits] Re: r35129 - haiku/trunk/src/kits/tracker

  • From: Stephan Assmus <superstippi@xxxxxx>
  • To: haiku-commits@xxxxxxxxxxxxx
  • Date: Mon, 18 Jan 2010 10:04:46 +0100

On 2010-01-18 at 00:25:03 [+0100], Ingo Weinhold <ingo_weinhold@xxxxxx> 
wrote:
> 
> On 2010-01-17 at 21:19:15 [+0100], Axel Dörfler <axeld@xxxxxxxxxxxxxxxx> 
> wrote:
> > Stephan Assmus <superstippi@xxxxxx> wrote:
> > > On 2010-01-17 at 20:00:35 [+0100], Axel Dörfler 
> > > <axeld@xxxxxxxxxxxxxxxx>
> > > > wrote:
> > > > superstippi@xxxxxx wrote:
> > > > > Implemented display of current size/total size copied and current 
> > > > > copy
> > > > > speed.
> > > > > It will begin to play after a short time (10 seconds).
> > > > Nice! But isn't 10 seconds much too long? Many operations won't 
> > > > even take
> > > > that long, and I would still like to know the copy speed in those 
> > > > cases -
> > > > it might already be more or less accurate much earlier (like 2 
> > > > seconds I
> > > > would guess).
> > > Unfortunately, it's quite erratic with our current I/O scheduler. 
> > > With 2
> > > seconds intervals, it may change for example between 70 MB/s and 7 
> > > MB/s
> > > (same harddisk). Also, if the whole process lasts less than 10 
> > > seconds anyway, I don't really see the point in knowing how fast it 
> > > is. In any
> > > case, it is possible to display it almost immediately by changing the 
> > > algorithm slightly. I'll play with that. It will certainly be at the 
> > > expense of raising false hopes in the beginning of a copy process... 
> > > :-D
> > 
> > Thanks! I'd say we could keep it like this for a while, and if we then 
> > decide waiting a bit more is the better idea after all, then we could 
> > just change it back (or find something inbetween that works good 
> > enough).
> 
> A few weeks ago I was thinking about how one could estimate reasonable 
> latencies for media nodes that need to do I/O. It would be relatively 
> easy for the I/O scheduler to collect statistics and to provide an API to 
> make those accessible to userland. Unfortunately it's not trivial to 
> infer from a given file path what the underlying responsible I/O 
> scheduler is. Even worse, for a network file system that doesn't help at 
> all. So while the I/O scheduler stats are already helpful info (also for 
> gadgets like ProcessController and ActivityMonitor), a FS interface 
> extension to report/estimate stats would be needed as well.
> 
> Back to the issue at hand: Such data provided by the file system could be 
> used to compute worst/average case estimates for the copy operation even 
> before starting. In fact those could be way more reliable than estimates 
> computed from measuring the first seconds of the process.
> 
> > The I/O scheduler and most importantly, the write back strategy will 
> > definitely change in the future, so that might bring more stable 
> > numbers, too.
> 
> The main problem will remain, though: If you start a copy operation with 
> empty caches and the source and target FS lie on different drives the 
> copy speed of the first seconds should be bound only by the read speed of 
> the source. With a lot of RAM those first seconds might be considerably 
> more than just a few seconds actually. Of course there are other 
> important factors -- like what kinds of files files are copied (small 
> ones with lots of attributes vs. huge ones) or whether other I/O on the 
> same disk is going on in parallel -- but I guess one will mostly get too 
> optimistic estimates when computing them based on the actual progress 
> made in the first few seconds.
> 
> Anyway, an FS performance estimate API doesn't exist yet, and since it 
> will be quite a bit of work, I've decided not to work something like this 
> for the time being. So unless someone else makes it happen, it won't be 
> available anytime soon. To improve the ETAs, the following could be 
> considered:
> 
> * Assume that the transfer rate for the first seconds is too optimistic 
> and 
> use a lower value (e.g. start with 1/2) for computing estimates. The 
> total memory size could be taken into account to guess when the measured 
> figures will not be influenced by short-term caching effects anymore.
> 
> * When the estimate suggests a short total time, I'd start earlier 
> displaying it, e.g. after 10% of the estimated time (with an absolute 
> minimum of 1 or 2 seconds).
> 
> * Tracker could store copy stats and use them for estimates for later 
> operations. Possibly even persistently, though that might be over the top.

A lot of factors go into how long such an operation will take, not only 
how many files/folders need to be created versus how much data is going 
to be moved and on which devices. The access pattern of writing back 
delayed blocks seems to have the greatest impact in making the speed 
measurements unstable. But even if you could take all of that into 
account at the beginning of the operation, you don't know what the user 
may do while the operation runs. Therefore, the numbers will always have 
to be adjusted during the operation.

The current algorithm can certainly be improved, but it tries to average 
the speed over roughly the past 20 seconds at any given time. The 
estimated finish time is a total average from when the operation started, 
or from the last time the operation was unpaused. This isn't ideal: when 
other factors begin to slow down the operation (possibly outside of 
Tracker's scope, like when you suddenly "svn stat" your tree on the same 
disk), the estimated finish time will constantly shift into the future. 
But that could even be considered acceptable, since there is a chance 
that the interfering operation will stop and the estimate will become 
stable again. Of course one could use a much bigger time window for the 
estimated finish time as well and disregard measurements from too long 
ago. It would be easy to adjust the algorithm like that. However, 
striving for completely accurate numbers is a waste of time, IMHO, 
since, like I said, you cannot know what will happen during the 
operation anyway, so you have to work with some sort of time window and 
base your estimate on that.
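
For reference, a sliding-window speed estimate along those lines could 
look like the sketch below (invented names, not the actual Tracker code; 
bigtime_t and system_time() are Haiku's microsecond clock from <OS.h>):

// Sketch: average the copy speed over only the last ~20 seconds, so an
// interfering operation stops hurting the numbers once it is over.
#include <deque>
#include <stdint.h>

#include <OS.h>

class SpeedWindow {
public:
	SpeedWindow(bigtime_t windowUsecs = 20000000LL)		// ~20 s window
		: fWindow(windowUsecs) {}

	void AddSample(bigtime_t now, uint64_t totalBytesCopied)
	{
		fSamples.push_back({ now, totalBytesCopied });
		// Drop samples that have fallen out of the time window, but
		// always keep at least two, so a speed can be computed.
		while (fSamples.size() > 2
			&& now - fSamples.front().time > fWindow) {
			fSamples.pop_front();
		}
	}

	// Average bytes/s over the window; 0 until two samples exist.
	double CurrentSpeed() const
	{
		if (fSamples.size() < 2)
			return 0.0;
		const Sample& oldest = fSamples.front();
		const Sample& newest = fSamples.back();
		bigtime_t span = newest.time - oldest.time;
		if (span <= 0)
			return 0.0;
		return (newest.bytes - oldest.bytes) * 1000000.0 / span;
	}

private:
	struct Sample {
		bigtime_t	time;	// microseconds, e.g. from system_time()
		uint64_t	bytes;	// total bytes copied so far
	};

	bigtime_t			fWindow;
	std::deque<Sample>	fSamples;
};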

Best regards,
-Stephan
