[argyllcms] Re: DeviceLink Profile refining

  • From: Gerhard Fuernkranz <nospam456@xxxxxx>
  • To: argyllcms@xxxxxxxxxxxxx
  • Date: Fri, 16 Feb 2007 00:34:05 +0100

marcel nita wrote:
> On 2/14/07, Graeme Gill <graeme@xxxxxxxxxxxxx> wrote:
>> Looking at the PremiumGlossyPhoto results you posted, these
>> don't look unreasonable to me. Average and peak errors have halved,
>> so overall that's a pretty good result. I can understand that it's
>> not so good that the white has got worse, but I guess this is
>> the influence of trying to correct colors near white. You might
>> try adding several white test patches as a way of increasing
>> the weighting of the white error (the latest version of refine
>> will give the lightest patch an increased weight of 5 automatically.)
>> Another thing you could try is to leave out test patches of colors
>> that get worse, 

My feeling is that we should first clarify whether they really get
worse WITH STATISTICAL SIGNIFICANCE. I think there is a
misunderstanding about how measurements are to be interpreted. We must
always keep in mind that we are dealing with RANDOM VARIABLES (see
http://en.wikipedia.org/wiki/Random_variable), and samples of random
variables cannot be compared deterministically. Noisy measurements
are, after all, samples of random variables.

Let's illustrate this with an example. Assume a perfect, error-free
profile prof_A, and another profile prof_B with a systematic error of
1 dE. Let's further assume that the printer's repeatability and the
measurements together contribute a random error of 1 dE RMS (or
0.92 dE on average; for simplicity of the simulation I'm assuming
trivariate, uncorrelated, i.i.d. Gaussian noise in CIELAB space, i.e.
a standard deviation of about 0.58 on each of the L*, a* and b* axes).
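
The relation between 1 dE RMS and 0.92 dE average follows from the
noise model: the length of a 3-component i.i.d. Gaussian vector is
Maxwell distributed. The following is only a small sketch to check the
figure, in plain Python; the variable names are mine and it is not
part of any ArgyllCMS tool.

```python
import math
import random

# Assumption: noise is i.i.d. Gaussian on L*, a*, b* with a total RMS of 1 dE.
RMS_DE = 1.0
sigma = RMS_DE / math.sqrt(3.0)          # per-axis standard deviation

# Closed form: mean of a Maxwell-distributed length with scale sigma.
mean_de_exact = 2.0 * sigma * math.sqrt(2.0 / math.pi)

# Monte Carlo cross-check of the same quantities.
rng = random.Random(1)
n = 100_000
lengths = [math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(3)))
           for _ in range(n)]
mean_de_sim = sum(lengths) / n
rms_de_sim = math.sqrt(sum(d * d for d in lengths) / n)

print(f"per-axis sigma  = {sigma:.3f}")
print(f"mean dE (exact) = {mean_de_exact:.3f}")
print(f"mean dE (sim)   = {mean_de_sim:.3f}")
print(f"RMS dE (sim)    = {rms_de_sim:.3f}")
```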

Given the above assumptions, we know that the true error of prof_B is
exactly 1 dE larger than the true error of prof_A.

However, if we MEASURE the printed result using prof_A and the result
using prof_B, which errors will we observe?

If we repeatedly make a print with prof_A and a print with prof_B,
measure them, and compare each pair to the reference, then we will
find that measured_error(A) < measured_error(B) is observed in only
74% of all cases. In the other 26% of the cases we will observe
measured_error(A) > measured_error(B), although prof_A's error is
known to be 1 dE LOWER than prof_B's error!
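
This kind of experiment is easy to reproduce with a small Monte Carlo
simulation. The sketch below is a possible reconstruction under the
noise model stated above, not the exact script behind the numbers; the
function names and the direction chosen for prof_B's systematic error
are my own assumptions (with isotropic noise the direction does not
matter).

```python
import math
import random

SIGMA = 1.0 / math.sqrt(3.0)   # per-axis noise for 1 dE RMS total (assumed model)
BIAS_A = (0.0, 0.0, 0.0)       # prof_A: no systematic error
BIAS_B = (1.0, 0.0, 0.0)       # prof_B: systematic 1 dE error, arbitrary direction

def measured_error(bias, rng, sigma=SIGMA):
    """dE between the reference and one noisy measurement with the given bias."""
    return math.sqrt(sum((b + rng.gauss(0.0, sigma)) ** 2 for b in bias))

def fraction_a_better(trials=100_000, seed=1):
    """Fraction of print pairs in which prof_A shows the smaller measured error."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        err_a = measured_error(BIAS_A, rng)
        err_b = measured_error(BIAS_B, rng)
        if err_a < err_b:
            wins += 1
    return wins / trials

frac = fraction_a_better()
print(f"measured_error(A) < measured_error(B) in {100 * frac:.1f}% of pairs")
```

The printed fraction should come out close to the 74% quoted above;
the remaining pairs are the ones where the better profile looks worse
purely by chance.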

What I want to say is this: if we have just a single measurement for
each patch with the old profile and a single measurement for each
patch with the refined profile, and we observe new_error >
previous_error for a particular patch, then this single pair of
observations is not statistically significant enough to conclude that
the error of the refined profile is really larger than the error of
the old profile for this patch (unless the observed difference is
huge compared to the printer's repeatability).

We can indeed improve the significance of our measurements, but only at
the cost of making more prints and more measurements.

If we were to make, e.g., 100 prints with the old profile and average
the measurements, make 100 prints with the refined profile and average
those measurements too, and then compare the errors of the averaged
old and new measurements w.r.t. the reference, we could make a
comparison with much better significance. Averaging 100 samples
reduces the observed random error by a factor of 10 (the square root
of the number of samples), but does not affect the systematic error of
the profile.

Applied to the above example, it would now be extremely unlikely to
observe error(A) > error(B) for the averaged measurements at all. In
99% of all cases we would now observe error(B)-error(A) > 0.74 dE, and
only in 1% of all cases a difference smaller than 0.74 dE.

Compared to the non-averaged measurements this is a big improvement,
since with a single measurement per profile we can only guarantee
error(B)-error(A) > -1.1 dE at the 99% level!
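
The two bounds can be compared with the same kind of simulation. Again
this is only a sketch under the assumed noise model; the helper names
are hypothetical. Instead of drawing 100 samples and averaging them,
it draws the averaged noise directly with standard deviation
sigma/sqrt(n), which is equivalent for Gaussian noise.

```python
import math
import random

SIGMA = 1.0 / math.sqrt(3.0)   # per-axis noise of a single measurement (assumed)
BIAS_B = (1.0, 0.0, 0.0)       # prof_B's systematic 1 dE error, arbitrary direction

def error_difference(n_avg, rng):
    """error(B) - error(A) after averaging n_avg prints per profile.

    The mean of n_avg Gaussian samples is itself Gaussian with standard
    deviation SIGMA / sqrt(n_avg), so the averaged noise is drawn directly.
    """
    s = SIGMA / math.sqrt(n_avg)
    err_a = math.sqrt(sum(rng.gauss(0.0, s) ** 2 for _ in range(3)))
    err_b = math.sqrt(sum((b + rng.gauss(0.0, s)) ** 2 for b in BIAS_B))
    return err_b - err_a

def lower_bound_99(n_avg, trials=100_000, seed=1):
    """Value exceeded by error(B) - error(A) in about 99% of the trials."""
    rng = random.Random(seed)
    diffs = sorted(error_difference(n_avg, rng) for _ in range(trials))
    return diffs[trials // 100]    # empirical 1% quantile

single = lower_bound_99(1)
averaged = lower_bound_99(100)
print(f"n_avg=1:   error(B)-error(A) > {single:.2f} dE in 99% of cases")
print(f"n_avg=100: error(B)-error(A) > {averaged:.2f} dE in 99% of cases")
```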

So, in order to get a feeling for the magnitude of the printer's
repeatability error, and for how the observed error can be
approximately decomposed into the systematic error of the profile and
the random repeatability error of the printer (where the latter cannot
be eliminated by any profile), I would suggest printing N copies of
the target using the SAME profile, preferably each copy with a
different spatial randomization of the patches in order to account for
spatial variations of the printer, and then evaluating the resulting N
measurement sets statistically. The larger N, the better, of course.
N=100 or more might be good, but I guess you won't have that much
patience :-) But even with just N=2 it should be possible to estimate
at least the magnitude of the printer's average error over all patches
(of course N=2 is not at all sufficient for individual noise estimates
for each patch).

So, to get a first clue, you can start by printing, measuring and
comparing just two copies of the target, using the same profile for
both prints (but preferably a different spatial randomization of the
patches for the two prints, if possible). Then compare each of the two
measurement sets with the reference, but above all compare them with
each other. This would give a zero error for each patch if the
printer's repeatability and the measurements were perfect. Of course
they aren't, so expect to observe a significant difference between
prints made with the SAME profile too.
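
For the N=2 case, the comparison of the two copies can be turned into
a rough repeatability figure. The sketch below assumes plain CIE76 dE
and independent noise of equal size in both copies, in which case the
difference of two readings carries twice the noise variance. The
function names are hypothetical, and the synthetic data merely stands
in for two real measurement sets.

```python
import math
import random

def delta_e76(lab1, lab2):
    """Euclidean distance between two CIELAB triples (CIE76 dE)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))

def repeatability_rms(copy1, copy2):
    """Estimate the RMS random error of a single print + measurement.

    copy1 and copy2 hold the measured Lab values of the same patches, in
    the same patch order (re-sort by patch ID first if the two prints used
    different spatial randomizations), printed with the SAME profile.
    The difference of two independent noisy readings carries twice the
    noise variance, hence the division by sqrt(2).
    """
    sq = [delta_e76(a, b) ** 2 for a, b in zip(copy1, copy2)]
    return math.sqrt(sum(sq) / len(sq)) / math.sqrt(2.0)

# Synthetic stand-in for two measured copies: 500 patches, 1 dE RMS noise.
rng = random.Random(1)
sigma = 1.0 / math.sqrt(3.0)
true_lab = [(rng.uniform(20, 95), rng.uniform(-60, 60), rng.uniform(-60, 60))
            for _ in range(500)]

def noisy(lab):
    """Return one simulated measurement of a patch with true value lab."""
    return tuple(v + rng.gauss(0.0, sigma) for v in lab)

copy1 = [noisy(p) for p in true_lab]
copy2 = [noisy(p) for p in true_lab]

estimate = repeatability_rms(copy1, copy2)
print(f"estimated repeatability: {estimate:.2f} dE RMS")
```

With real data, the estimate gives a yardstick: per-patch changes
between the old and the refined profile that are not clearly larger
than this figure should not be read as real regressions.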

Btw, Graeme, it might also be interesting if verify had an option to
print the error summary excluding out-of-gamut patches, since they
should be considered outliers which cannot be corrected anyway. As it
is, they obscure how well the refinement works for the correctable
in-gamut patches. In the -v output they could possibly be marked as
out-of-gamut too. I just don't know whether this would be easy to
implement.


>> or edit the test results for those patches, making
>> them the same as the target, thereby selectively "turning off"
>> further corrections for those points. Clumsy, but it might help.
> I will try this as soon as I get the chance. It seems like a good
> idea, because this is what I need: to stop samples going further away
> from reference.
> Adding more white patches sounds better, but testing will tell.
> Thank you.
>> Having said all that, I've had a bit of a play with refine, and
>> think I've struck upon a slightly better scheme to deal with out of
>> gamut points, that allows efforts to correct them without the
>> correction "running away"
>> and causing things to get worse. I still notice some regressions for
>> dark out of gamut points though. The overall improvement is slight in
>> my tests, but may be worthwhile in improving behaviour for the critical
>> near white colors. If you're running MSWindows, you can try out this
>> version of refine here <http://www.argyllcms.com/refine.zip>.
> Is this in the development tree? Because I am not using the
> executables directly, but a small program which contains icclink +
> refine + an interface (measurement statistics and target displayed
> visually).
> If it is not, then I will have to wait until you put it there.
> Thank you again,
> Marcel.
