[argyllcms] Re: bin/average: averaging and possible outlier elimination for three or more .ti3 sets?

  • From: "Alastair M. Robinson" <profiling@xxxxxxxxxxxxxxxxxxxxxxx>
  • To: argyllcms@xxxxxxxxxxxxx
  • Date: Sat, 29 Aug 2009 12:25:08 +0100

Hi :)

Craig Ringer wrote:

> Here's an example error spike:
>
> 5: 83.070051 1.346918 2.895409 <=> 82.505987 1.360226 2.779120  de 0.576080
> 6: 82.110832 1.536077 2.992189 <=> 81.607727 1.437567 2.733480  de 0.574238
> 7: 40.886825 4.316704 18.048722 <=> 82.063784 1.539669 2.906388  de 43.960711
>    **** Huge error spike ****

Very likely to be a reading glitch - we saw similar problems when Robert from the Gutenprint project tried out his i1Pro and GPLin. I've been meaning since then to put together something myself to help with outlier elimination, but never got that far.
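For reference, the "de" column in that output is just the CIE76 colour difference, i.e. the Euclidean distance between the two Lab readings - patch 5's 0.576080 falls straight out of the Lab values shown. In C:

#include <math.h>

/* CIE76 delta E: Euclidean distance between two Lab values.
 * This is what the "de" column in the output above reports. */
static double delta_e76(const double lab1[3], const double lab2[3])
{
	double dL = lab1[0] - lab2[0];
	double da = lab1[1] - lab2[1];
	double db = lab1[2] - lab2[2];
	return sqrt(dL * dL + da * da + db * db);
}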

> I know I can average .ti3 sample sets using bin/average. However, it
> only seems to accept a pair of .ti3 inputs at a time, and averaging
> consecutively isn't going to produce an ideal result.

No, I guess average is intended to account for process variation, rather than spikes due to chart misreads.

> Also, is there any good built-in way to eliminate outliers in the .ti3
> files, or will I need to roll my own? For that matter, is it wise to do
> outlier elimination at all?

While I've not used an i1 myself, I've heard it said that it's not just wise but vital. Having said that, you don't want to eliminate process variation - just misread patches.

> If there's no existing method I'm missing, and if what I want to do
> actually seems like a good idea to the folks here, I'm thinking of
> seeing if I can extend `average.c' to handle more than two input files.

I think that would be extremely useful. What I was planning, but never got around to, was either to extend average or to create an analogous utility that takes three or more files and finds the median rather than the mean.
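A minimal sketch of the core of that, assuming the readings for one channel of one patch have already been pulled out of the N input files (the layout and names here are just for illustration):

#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;
	return (x > y) - (x < y);
}

/* Median of n readings of one channel of one patch.
 * For even n, average the two middle values. */
static double median(const double *vals, int n)
{
	double *tmp = malloc(n * sizeof(double));
	double m;
	memcpy(tmp, vals, n * sizeof(double));
	qsort(tmp, n, sizeof(double), cmp_double);
	m = (n & 1) ? tmp[n / 2] : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
	free(tmp);
	return m;
}

With three input files the median automatically discards a single glitched reading per channel, with no threshold to tune.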

> I'd try to add basic outlier elimination for when it has three or more
> inputs, with the outlier elimination threshold shrinking as the number
> of input files grows. At this point I'm thinking that any sample more
> than three (maybe even two) standard deviations from the mean is
> probably a reasonable candidate for outlier elimination.

So where you have an error spike you'd prefer to cull that sample from all files rather than pick one to use?
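Either way, the sigma test itself is simple enough. A sketch, with the threshold and data layout as placeholders:

#include <math.h>

/* Flag readings more than nsigma standard deviations from the mean
 * of the n readings for one channel of one patch; outlier[i] is set
 * to 1 for readings that should be culled. */
static void flag_outliers(const double *vals, int n, double nsigma, int *outlier)
{
	double mean = 0.0, var = 0.0;
	int i;

	for (i = 0; i < n; i++)
		mean += vals[i];
	mean /= n;
	for (i = 0; i < n; i++)
		var += (vals[i] - mean) * (vals[i] - mean);
	var /= n;	/* population variance */
	for (i = 0; i < n; i++)
		outlier[i] = fabs(vals[i] - mean) > nsigma * sqrt(var);
}

One caveat: by Samuelson's inequality no reading can lie further than sqrt(n-1) population standard deviations from the mean, so with only three files even a two sigma threshold can never fire - which is another argument for the median.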

Another thing that might work if you don't have enough samples to use a median is to fit an RSPL to each .ti3 file (which is really easy, thanks to Argyll's libraries), then compare each data point against the value interpolated from the RSPL, and see which set of data has the best fit at that point.
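Roughly like this - interp_lab() is just a hypothetical stand-in for building an rspl from one file's data and looking it up at a device coordinate (see rspl/rspl.h in the Argyll source for the real interface):

#include <math.h>

#define NFILES 3	/* illustration only */

/* Hypothetical stand-in: return in lab[] the value that the surface
 * fitted to file f predicts at device coordinates dev[]. In Argyll
 * this would wrap the rspl creation, fit and interpolation calls. */
extern void interp_lab(int f, const double *dev, double lab[3]);

/* For one patch, pick the file whose measured Lab value agrees best
 * (smallest delta E) with its own fitted surface at that point. */
static int best_fit_file(const double *dev, double lab[NFILES][3])
{
	int f, best = 0;
	double bestde = 1e9;

	for (f = 0; f < NFILES; f++) {
		double fit[3], de;
		interp_lab(f, dev, fit);
		de = sqrt((lab[f][0] - fit[0]) * (lab[f][0] - fit[0])
		        + (lab[f][1] - fit[1]) * (lab[f][1] - fit[1])
		        + (lab[f][2] - fit[2]) * (lab[f][2] - fit[2]));
		if (de < bestde) {
			bestde = de;
			best = f;
		}
	}
	return best;
}

You'd then keep that file's reading for the patch, or flag the others as suspect.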

Hope this gives you food for thought!

All the best,
--
Alastair M. Robinson
