[haiku-commits] Re: r35129 - haiku/trunk/src/kits/tracker

On 2010-01-17 at 21:19:15 [+0100], Axel Dörfler <axeld@xxxxxxxxxxxxxxxx> wrote:
> Stephan Assmus <superstippi@xxxxxx> wrote:
> > On 2010-01-17 at 20:00:35 [+0100], Axel Dörfler <axeld@xxxxxxxxxxxxxxxx> wrote:
> > > superstippi@xxxxxx wrote:
> > > > Implemented display of current size/total size copied and current
> > > > copy speed. It will begin to play after a short time (10 seconds).
> > > Nice! But isn't 10 seconds much too long? Many operations won't even
> > > take that long, and I would still like to know the copy speed in those
> > > cases - it might already be more or less accurate much earlier (like 2
> > > seconds I would guess).
> > Unfortunately, it's quite erratic with our current I/O scheduler. With 2
> > second intervals, it may change for example between 70 MB/s and 7 MB/s
> > (same harddisk). Also, if the whole process lasts less than 10 seconds
> > anyway, I don't really see the point in knowing how fast it is. In any
> > case, it is possible to display it almost immediately by changing the
> > algorithm slightly. I'll play with that. It will certainly be at the
> > expense of raising false hopes in the beginning of a copy process... :-D
> Thanks! I'd say we could keep it like this for a while, and if we then
> decide waiting a bit more is the better idea after all, then we could
> just change it back (or find something in between that works well
> enough).

A few weeks ago I was thinking about how one could estimate reasonable 
latencies for media nodes that need to do I/O. It would be relatively easy 
for the I/O scheduler to collect statistics and to provide an API to make 
those accessible to userland. Unfortunately it's not trivial to infer from 
a given file path what the underlying responsible I/O scheduler is. Even 
worse, for a network file system that doesn't help at all. So while the I/O 
scheduler stats are already helpful info (also for gadgets like 
ProcessController and ActivityMonitor), a FS interface extension to 
report/estimate stats would be needed as well.
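To make the idea concrete, here is a rough sketch of what such an I/O scheduler statistics API could look like. No such API exists in Haiku today; the struct and function names are entirely invented for illustration:

```cpp
// Hypothetical sketch only -- no such API exists; all names are invented.
#include <cstdint>

// Per-scheduler counters the I/O scheduler could collect and expose
// to userland (e.g. for ProcessController or ActivityMonitor).
struct io_scheduler_stats {
	uint64_t bytesRead;       // total bytes read since boot
	uint64_t bytesWritten;    // total bytes written since boot
	uint64_t readTimeUsecs;   // time spent servicing read requests
	uint64_t writeTimeUsecs;  // time spent servicing write requests
};

// Average sustained read throughput in bytes/s, derived from the counters.
inline uint64_t
average_read_throughput(const io_scheduler_stats& stats)
{
	if (stats.readTimeUsecs == 0)
		return 0;
	return stats.bytesRead * 1000000ULL / stats.readTimeUsecs;
}
```

The hard part, as noted above, wouldn't be collecting these numbers but mapping a file path to the responsible scheduler (or to nothing, in the network FS case).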

Back to the issue at hand: Such data provided by the file system could be 
used to compute worst/average case estimates for the copy operation even 
before starting. In fact those could be way more reliable than estimates 
computed from measuring the first seconds of the process.
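Assuming the file systems involved reported such figures, the pre-copy estimate could look roughly like this (again a sketch with invented names; the slower side of source/target bounds the transfer):

```cpp
// Hypothetical: if each file system reported expected throughput figures,
// a copy ETA range could be computed before the first byte is transferred.
#include <algorithm>
#include <cstdint>

struct fs_performance_estimate {
	uint64_t averageBytesPerSec;  // typical sustained throughput
	uint64_t worstBytesPerSec;    // throughput under load / with small files
};

struct eta_range {
	uint64_t averageSecs;
	uint64_t worstSecs;
};

// The copy can go no faster than the slower of reading the source
// and writing the target.
inline eta_range
estimate_copy_eta(uint64_t totalBytes, const fs_performance_estimate& source,
	const fs_performance_estimate& target)
{
	uint64_t avg = std::min(source.averageBytesPerSec,
		target.averageBytesPerSec);
	uint64_t worst = std::min(source.worstBytesPerSec,
		target.worstBytesPerSec);
	return { avg == 0 ? 0 : totalBytes / avg,
		worst == 0 ? 0 : totalBytes / worst };
}
```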

> The I/O scheduler and most importantly, the write back strategy will
> definitely change in the future, so that might bring more stable
> numbers, too.

The main problem will remain, though: If you start a copy operation with 
empty caches and the source and target FS lie on different drives, the copy 
speed of the first seconds will be bound only by the read speed of the 
source. With a lot of RAM those first seconds might actually be considerably 
more than just a few seconds. Of course there are other important 
factors -- like what kinds of files are copied (small ones with lots 
of attributes vs. huge ones) or whether other I/O on the same disk is going 
on in parallel -- but I guess one will mostly get too optimistic estimates 
when computing them based on the actual progress made in the first few 
seconds.

Anyway, an FS performance estimate API doesn't exist yet, and since it will 
be quite a bit of work, I've decided not to work on something like this for 
the time being. So unless someone else makes it happen, it won't be 
available anytime soon. To improve the ETAs, the following could be done:

* Assume that the transfer rate for the first seconds is too optimistic and 
use a lower value (e.g. start with 1/2) for computing estimates. The total 
memory size could be taken into account to guess when the measured figures 
will not be influenced by short-term caching effects anymore.

* When the estimate suggests a short total time, I'd start displaying it 
earlier, e.g. after 10% of the estimated time (with an absolute minimum of 
1 or 2 seconds).

* Tracker could store copy stats and use them for estimates for later 
operations. Possibly even persistently, though that might be over the top.
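The first two points above could be sketched roughly as follows. All constants (the 1/2 starting factor, the 10% threshold, the 2-second minimum) are the guesses from the list, not tuned values:

```cpp
// Sketch of the two heuristics above; all constants are guesses.
#include <algorithm>

// Discounted throughput estimate: start at half the measured rate and
// fade the discount out as the elapsed time approaches the point where
// caching effects should no longer dominate (a value one might derive
// from the total memory size).
inline double
discounted_rate(double measuredBytesPerSec, double elapsedSecs,
	double cacheWarmupSecs)
{
	double trust = std::min(1.0, elapsedSecs / cacheWarmupSecs);
	double factor = 0.5 + 0.5 * trust;  // 1/2 at start, 1 after warm-up
	return measuredBytesPerSec * factor;
}

// Show the ETA once 10% of the estimated total time has passed,
// but never earlier than 2 seconds into the copy.
inline bool
should_display_eta(double elapsedSecs, double estimatedTotalSecs)
{
	return elapsedSecs >= std::max(2.0, 0.1 * estimatedTotalSecs);
}
```

The third point (remembering stats from earlier copy operations) would mainly supply a better initial rate to feed into such a function before any measurement exists.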

CU, Ingo
