Hi Chris - Is 2 ps of Rj realistic? I mean - have we reached the theoretical limits of our hardware and have to jump to light transmission on a PWB??

-----Original Message-----
From: si-list-bounce@xxxxxxxxxxxxx
To: 'Mike Williams '; 'si-list@xxxxxxxxxxxxx '
Sent: 5/8/2005 12:14 AM
Subject: [SI-LIST] Re: Comments on "Do you really ship products at BER 10e-xx ?"

Mike,

Welcome to the list. It is good to hear from you and Art from the vendor point of view. I am particularly intrigued by Art's reference paper on how different instruments start to diverge under strong Dj, which I believe is the true challenge in characterizing a real running system rather than some PLL in a test fixture. I also agree that it is not the instrument vendors' fault, but rather a spec imposed by a certain committee that, without care and full understanding by engineers, can extrapolate to unrealistic pessimism.

I have learned a lot from both the on- and offline discussion on this subject. However, one offline comment from a good friend still haunts me. At ~2-4 Gb/s, an Rj of 2 ps is still manageable, but what will life be like if we go to 10 Gb/s without PAM at 2 ps Rj? I don't have the answer, but my suspicion is that half the industry will probably look the other way and come up with another spec that conveniently defines it to work. This is like tester guardband in the '90s. For a while, everyone insisted they had to triple or quadruple the tester guardband in their timing equations, and somehow, after the turn of the millennium, as cycle times got smaller and smaller, that guardband shrank to a negligible amount. But at a certain point (say 10 Gb/s), the 2 ps of Rj (if it is truly unbounded and Gaussian) that I conveniently ignore may come back and bite me. And I cannot envision myself shipping, nor my management allowing me to ship, a product that flags a CRC error at a BER of 10e-12.
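[To put a number on what shipping at a given BER means in wall-clock terms, here is a back-of-envelope sketch; the arithmetic and the example line rates are mine, not from the thread. The mean time between raw bit errors on a fully loaded link is simply 1 / (line rate x BER).]

```python
def mean_time_between_errors(bit_rate_hz: float, ber: float) -> float:
    """Expected seconds between bit errors on a link running at `ber`."""
    return 1.0 / (bit_rate_hz * ber)

# Illustrative rates only: 2.5 Gb/s and 10 Gb/s serial links.
for rate in (2.5e9, 10e9):
    for ber in (1e-12, 1e-15, 1e-18):
        t = mean_time_between_errors(rate, ber)
        print(f"{rate / 1e9:>4.1f} Gb/s @ BER {ber:.0e}: "
              f"one error every {t:,.0f} s (~{t / 86400:.2f} days)")
```

At 10 Gb/s, a BER of 1e-12 corresponds to one raw bit error roughly every 100 seconds on a fully loaded link; only down around 1e-18 does the mean time between errors stretch to about three years.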
I am not talking about a BERT analyzer extrapolating a bathtub curve; I am talking about real CRC error flags every few days that the service engineer will see. But that's exactly what some of these standards allow. On the other hand, if we have to make management really happy, we are really designing to 10e-18 or 10e-20, as some responses to the thread mention. That's a really long Rj tail! Do we really have unbounded jitter that will grow to infinity given enough time?

Good discussion - hopefully it can continue.

Chris

-----Original Message-----
From: Mike Williams
To: si-list@xxxxxxxxxxxxx
Sent: 5/5/2005 9:54 PM
Subject: [SI-LIST] Comments on "Do you really ship products at BER 10e-xx ?"

[SI-LIST] Do you really ship products at BER 10e-xx ?
* From: Chris Cheng <Chris.Cheng@xxxxxxxxxxxx>
* To: si-list@xxxxxxxxxxxxx
* Date: Tue, 12 Apr 2005 13:49:09 -0700

I've been shipping Gb/s serial products for a while and have had my share of failed parts. However, I have yet to see a physical channel that is not either working like a charm or falling on its face and barfing errors like crazy. Sure, chips or disks can fail and generate errors, but I have seen no flaky channels that spit out an error every other hour or day. To me, a channel either has a BER that is near 1 (barfing errors like crazy) or near 0 (never failing, or at least lasting the life of the product it is attached to). Are we just kidding ourselves with these fancy BER analyzers and jitter instruments? Do you really let a machine run at, say, a BER of 10e-12 and say "ah ha, it only fails once a day, let's ship it"? Is BER really meant for IEEE spec committees and not for real engineers who actually have to ship a product?

Pasted from <//www.freelists.org/archives/si-list/04-2005/msg00131.html>

Hi Chris,

I'm not a regular participant in this list.
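[On the "how long is the tail, really" question: under the common dual-Dirac convention, a purely Gaussian Rj contributes 2 * Q(BER) * sigma of total jitter, where Q is the one-sided Gaussian tail quantile. The sketch below is my own arithmetic under that convention, not taken from any particular spec; it shows both why 2 ps hurts at 10 Gb/s and how slowly the tail grows as the target BER tightens.]

```python
import math

def q_from_ber(ber: float) -> float:
    """Solve 0.5 * erfc(q / sqrt(2)) = ber for q by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > ber:
            lo = mid      # tail still too fat: push q up
        else:
            hi = mid
    return 0.5 * (lo + hi)

RJ_PS = 2.0      # the 2 ps Rj under discussion
UI_PS = 100.0    # one unit interval at 10 Gb/s
for ber in (1e-12, 1e-15, 1e-18):
    q = q_from_ber(ber)
    eye_cost = 2.0 * q * RJ_PS   # dual-Dirac total-jitter contribution
    print(f"BER {ber:.0e}: Q = {q:.2f}, Rj costs {eye_cost:.1f} ps "
          f"({100.0 * eye_cost / UI_PS:.0f}% of a 10 Gb/s UI)")
```

Q only moves from about 7.0 at 1e-12 to about 8.8 at 1e-18, so chasing six more decades of BER widens the Gaussian contribution by just a few picoseconds; the pain is that at 10 Gb/s the 1e-12 figure already eats roughly 28% of the UI.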
A colleague forwarded your post to me as part of an ongoing discussion we have been having about the legitimacy (or actual lack thereof) of the thought models in popular use for understanding jitter in serial data signals, and another colleague recently brought the SI list itself to my attention. I wrote the following a couple of weeks ago but absent-mindedly forgot to post it. Let's see if it still works.

The question you pose is a great one, and it gets to the core of a matter I am very close to. I am simultaneously a long-time harsh critic of statistical jitter decomposition (Rj/Dj/...) and the owner of a company that makes a widely used set of statistical jitter decomposition tools; in that capacity, I invest a large part of our R&D dollars and engineering bandwidth in expanding our understanding of this kind of analysis. While those don't sound like they go together, I find that to be effective in this situation, one has to accept that there are two contradictory yet essential philosophies in play.

Regarding your questions, I agree intensely with what is implied by them: that to a mind trained to reason as an engineer, the current approaches engineers are being told to employ raise more questions than they answer. Statistical jitter decomposition (I'll use SJD from here on to save typing) is one of many ways of abstracting the timing behavior of a signal. In the span of just a few years, it has moved from near-total obscurity among practicing engineers to being the pervasive and virtually unchallenged approach to measuring jitter in serial data links. SJD has quickly achieved the status of an entrenched orthodoxy despite a formidable body of reasons to question its validity even as a general approach. That list of concerns is too expansive to treat in any detail in a casual exchange like this.
I have my own little personal taxonomy I use just to keep all the issues organized:

Theoretical issues/concerns - These center around Gaussian/central-limit abuse, misuse or misunderstanding of how Nyquist applies in the modulation domain, the fact that the "standard buckets" in the Rj/Dj/... taxonomy don't always hold up even in common situations, and the appropriateness of using SJD to abstract important common pathologies with dynamics that simply can't be seen or represented by that method.

Practical issues/concerns - The "mathematical machinery" employed to grind these results out can instill large and unpredictable effects in the final numbers; the suitability of the BER "thought model"; HUGE correlation issues; significant and uncontrolled (at the spec level) implementation differences between one solution and another; and the fact that it's a moving target - this "stuff" is being made up as the industry goes along.

Philosophical concerns - A measurement problem being turned into a combined political-math problem; the unfortunate everyday necessity of balancing concerns about methodological validity against the reality that "everything requires SJD" to get out the door; abstraction and opacity (the "you don't know what you don't know" effect) among implementers and spec-makers; and the validation of methods, with the range of pitfalls built into that as well.

Again, this is a long list, and my intent is not to do a deep dive here but to frame agreement with your observation as well as explain one perspective on navigating the morass. Under the most commonly cited document that blesses these things, there are numerous platforms and algorithmic approaches listed, resulting in a cross-product of perhaps 30 different ways to do it, and that figure is further elevated by the fact that numerous implementers each have their own method purporting to derive from one of those blessed combinations.
Under a broad range of applied jitter types, each of those approaches differs, often dramatically, from the others. Different answers. Different convergence/divergence behaviors. Differing abilities to even see common pathologies (e.g. "BER blooming"). Different repeatabilities and accuracies. So here comes the first of many clues that illustrate the sanity rating of the current SJD mindset in the industry: THEY ARE ALL "RIGHT". As un-engineering-like as it sounds, they're all "blessed," so go ahead and pick the one that gets your parts out the door. I'm not representing that as the kind of engineering I advocate, but "the engineering cops" couldn't write you a ticket for operating in that mode.

To get a little of the flavor of the problems built into where SJD is today and where it seems to be headed, let me dive deeper on just one of the points raised above: the implementation mechanics of SJD. Having been in clocks and their measurement for 25 years, I have been a close observer of the SJD trend going back to when it started (I should actually say "restarted" - you can go back to at least the '50s in the active engineering literature, with a sort of renaissance having taken place in that literature in the late '70s and early '80s). As noted, we have spent a great deal of time and energy studying the mechanics of the various suggested Rj/Dj/Tj/Pj/... methods for several years now. All of the suggested methods "work" at the level of a quickie whiteboard lecture: at the big-picture level, the results look like they should be what is sought. At this high level, they make sense. As you dig in deeper, you find that there is a fascinating range of "gotchas" - a vast range of behaviors and effects that can have a significant, unanticipated impact on the final result. An entire layer of impact completely unaddressed by the spec-makers as well as the vast majority of solution providers.
It's virtually not on the radar screen at all, other than in a few very small pockets at some of the larger customer-side companies. Of all the issues in the lists above, to my eye, this has proven by far to be the broadest. Some of the uncontrolled effects of SJD mechanics to which I am referring are:

1. Complex/unanticipated interactions between the signal dynamics and the algorithm's mechanics
2. Convergence/divergence effects
3. Misrepresentation due to "measure-predict cycling"
4. Encountering jitter behaviors not anticipated by your SJD implementation
5. Embedded unanticipated behaviors, and the impact of implicit assumptions

One example from a lecture I gave on this at one of our customers recently, which seemed to illustrate it well for them, was this. Consider that in many approaches, you are building up your model of overall system timing from the rarest events seen. For example, you might have many millions of events in a measurement population, but the curve-fit process only applies maybe a thousand points to the actual result. The consequence is that even a small change entering the rare-event population (i.e. finally "seeing" something that fills in the tail a little better) can have a stark impact on the BER estimate and dynamics.

The "issues and concerns" side is enormous, and while we have only talked about the tips of the icebergs, you hopefully at least get the flavor. Let's shift gears now, because there IS another important side to this. That is, some engineer somewhere got to work this morning, and a piece of paper told him he HAD to measure Rj/Dj/... on his part in order to get it out the door. His reality - many engineers' reality - is that SJD as it is now imagined is a very real part of life in the lab. For the foreseeable future, that's all that really matters in their orbit.
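[The rare-event sensitivity described above is easy to reproduce in a toy model. This is purely illustrative and not any instrument's actual algorithm: estimate a known Gaussian sigma once from the whole population and once from only the ten most extreme points, the way a tail fit effectively does, and compare how much each estimate wanders from acquisition to acquisition.]

```python
import math
import random
import statistics

def z_from_tail_prob(p: float) -> float:
    """One-sided Gaussian tail quantile: solve 0.5*erfc(z/sqrt(2)) = p."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

TRUE_SIGMA = 2.0      # ps, the "unknown" the fit is trying to recover
N, K = 100_000, 10    # population size; extreme points the "fit" relies on

full_ests, tail_ests = [], []
for seed in range(20):                       # twenty independent acquisitions
    rng = random.Random(seed)
    tie = [rng.gauss(0.0, TRUE_SIGMA) for _ in range(N)]
    full_ests.append(statistics.stdev(tie))  # uses all N points
    x_k = sorted(tie)[-K]                    # K-th largest observation
    tail_ests.append(x_k / z_from_tail_prob(K / N))  # sigma from tail only

full_spread = max(full_ests) - min(full_ests)
tail_spread = max(tail_ests) - min(tail_ests)
print(f"full-population estimate: spread {full_spread:.3f} ps over 20 runs")
print(f"tail-only estimate:       spread {tail_spread:.3f} ps over 20 runs")
```

Both estimators land near the true sigma on average, but the tail-only estimate typically scatters far more from run to run: exactly the "small change in the rare-event population, stark change in the answer" effect.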
As a resource on timing and timing measurement for our clients and partners, we can and do try to refine the "why" behind our criticism of the appropriateness of SJD as an abstraction of the realistic kinds of jitter one can reasonably encounter. However, as a solution provider, we have to start from a different philosophy: that it's accepted by the industry, and that we have to provide a product that addresses as many of the gotchas as possible. Your question asks whether the products are meant for real engineers trying to ship a product, so here's one provider's view of that. I feel that IF you are going to go down the SJD path, it IS possible to provide a method that can deliver numbers that are accurate and repeatable UNDER THE Rj/Dj/... BELIEF SYSTEM - without pain - IF it is used properly.

Our own approach is as follows; I'll be brief and try to remain at a general, product-nonspecific level. We have studied, modeled and analyzed the suggested approaches for several years as a primary focus of our everyday engineering work. In this work, we have collaborated with outside experts in rare-event prediction as well as one of the individuals credited with having developed the mathematics that underlie decomposition in general back in the '70s. I've actually known him since that time but only discovered that side of his career a few years ago. Small world. We have identified a significant range of effects/mechanics built into the suggested methods that will, to a certainty, cause problems. We have used that knowledge to craft an independent approach that steers clear of the known issues (e.g. accuracy, convergence, repeatability and stability effects). In concert with that, we built a synthesizer that can create all of the various kinds of stationary jitter one can possibly expect to encounter - the universe of stationary jitter under the standard SJD thought model.
This synthesis system originally broke that universe down into 15,000 regions, but the current model, which pushes the edges of that universe out a bit further, breaks it down into just shy of 10 million regions. We use the synthesizer to push all 10 million flavors of jitter through an unmodified version of the decomposition method we fashioned. Looking at the results that emerge, you would definitely see places where the results differ quite significantly from what was synthesized, and you would also note that they are extremely consistent and repeatable. I attribute this repeatability primarily to avoiding the algorithmic pitfalls referred to above.

The next step is that we submit both the expected and actual results to a neural-network-based system of our own design that attempts to calibrate the difference between expected and actual down to as small a value as we can make it over the vast majority of the error surface. In reality, we don't rely on the neural approach alone, because after staring at the underlying mechanics for so many years, we have some engineers who are pretty good at recognizing ways to reduce the error beyond what the neural approach can do on its own. In the end it's iterative: let the network create a calibration scheme, study and tweak it (i.e. tweak how the network operates), and then grind away some more. The process is constantly revealing new insights. Most recently, we've found some dependencies on jitter behaviors that can be further improved by moving from static to dynamic calibration. That is, not just creating a huge cal scheme that is built into the product when it ships, but one that also does some dynamic calibration as it runs.
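[At its very simplest, a stationary-jitter synthesizer of the kind described composes a time-interval-error (TIE) stream from the standard SJD buckets. The sketch below is my own toy under those assumptions - Gaussian Rj, sinusoidal Pj, and a crude one-tap data-dependent term - and the real system obviously carves the space far more finely than three knobs.]

```python
import math
import random
import statistics

def synthesize_tie(n, rj_sigma_ps, pj_amp_ps, pj_period_ui, isi_ps, seed=0):
    """Toy stationary-jitter synthesizer: one TIE sample per unit interval."""
    rng = random.Random(seed)
    data = [rng.randint(0, 1) for _ in range(n)]   # random bit stream
    tie = []
    for i in range(n):
        rj = rng.gauss(0.0, rj_sigma_ps)                             # random jitter
        pj = pj_amp_ps * math.sin(2.0 * math.pi * i / pj_period_ui)  # periodic jitter
        dj = isi_ps if data[i - 1] else -isi_ps    # one-tap ISI (wraps at i = 0)
        tie.append(rj + pj + dj)
    return tie

tie = synthesize_tie(50_000, rj_sigma_ps=2.0, pj_amp_ps=5.0,
                     pj_period_ui=1000, isi_ps=3.0)
print(f"{len(tie)} TIE samples, mean {statistics.mean(tie):+.3f} ps, "
      f"rms {statistics.pstdev(tie):.3f} ps")
```

A stream like this, fed through a candidate decomposition with the synthesis knobs known exactly, is what lets you score the decomposition's answers against ground truth - the expected-versus-actual comparison that a calibration step can then work on.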
I would say the improvements attributable to that will be felt less in the sort of repetitive short patterns that seem common now among our customers; they will make an observable difference especially on live data, with a special impact on data that has moderate to significant ISI. It all counts. The calibration addresses something really obvious, but something that is not even considered at the spec level. There, it's unnecessary: any of the blessed methods are fine under the spec. What it means for the "real engineer trying to get parts out the door" is that if the measured number is too high, a result from a calibrated and fully validated system is significantly more likely to indicate that the part really is doing something undesirable, rather than an over-representation by the SJD.

There are other things that can be set in the SJD mechanics/process for which there is no rational reason to set them up one way versus another. For example, some engineers want to see when BER blooming occurs. BER blooming occurs even in very expensive and stable sources, and it's more common than you might think even in well-fixtured devices, though many SJD schemes can't see that kind of dynamism. It's useful, I suppose, to an engineer in debug mode, but we also see engineers who want it "blended out," which is more rational than it may seem on the surface, since the way the rare-events math reacts to the blooming (sharply at times, as mentioned above) can be distracting. So since the specs go nowhere near this level of consideration, yet it impacts the usability of the tool, you have to give the choice to the operator and pick a default state that is guaranteed to tick off half of the people who use it. Lots of design decisions fall into this category of having to make choices that the specs should have made for us.

So: are the tools for the spec-makers, or are they really for engineers who have to ship a product?
My opinion is that the spec-makers have too much influence over what ultimately appears in the tools, and not nearly enough skin in the game in terms of having to use what they specify. There are more marketing titles and commercially driven interests on these committees than you'd probably want to see, and so you get what you get. But keep in mind that, in the end, your company and your industry sector decided to accept whatever specs you are forced to treat as gospel. That makes it a tool for those engineers, I guess. But not the only tool, either. When they don't hit their numbers, they will need additional tools too. SJD is in the same class as dice when it comes to debug utility (note to self: file for patents on Rj/Dj dice). Good debug tools (I contend that nothing can match the modulation domain for effective timing debug) will also help you understand the reason your parts are not hitting their numbers: is it the tool or the device? And that's actually where using a validated approach can save a bunch of time as well.

The industry is presently in a very strange place - a place that is hard to defend. We accept at face value things that provide almost boundless reasons to pursue other choices, but revert to believing what we're told: it's in the spec, so it must be true. The result is the significant yet unnecessary daily frustration that follows from the bad choice. Yet it is incredibly uncommon to see the bad choices actually identified and challenged as the root of why frustrations in this space are mounting. A mindset we consider crucial to being able to look at statistical jitter decomposition objectively is a refusal to accept as unquestionable that Rj/Dj separation is correct - and a recognition that the analysis of timing in serial data streams is a leadership problem, a philosophy problem, an epistemological problem, and most definitely a measurement problem. But it's not a math problem.
If someone's solution involves invoking yet one more thing they learned in stochastic processes, they're part of the problem, not part of the solution, in my considered opinion. What we have is broken, and better ways are needed. The solution seems obvious, but the "they don't know what they don't know" force is strong. I'm sure better methods will materialize quite quickly as soon as enough people are willing to openly articulate what the emperor is wearing.

If you're interested, I've written a few papers about these matters: "A Discussion of Rj/Dj/... Compliance Measurements," and another on the work we've done over the last year on calibrating the error surface for each of the various jitter terms of our own Rj/Dj/... tool set to within 2 ps or 5% over the universe of stationary jitter. The second one will be revised in a few weeks to include the expanded jitter space and dynamic calibration, but the original still covers the core of what we have done in this area pretty well. Just send an email to "info AT amherst-systems DOT com" and ask for the "Rj/Dj paper" and/or the "validation paper," and someone will get them right out to you. I'm not big on email myself.

Chris, I didn't have time to write something short. I hope that, after all these words, I have provided some further insight into the situation. Good luck.

Best regards,
Mike

--
Mike Williams
Pres. ASA Corp.
www.TheJitterSolution.com

------------------------------------------------------------------
To unsubscribe from si-list:
si-list-request@xxxxxxxxxxxxx with 'unsubscribe' in the Subject field

or to administer your membership from a web page, go to:
//www.freelists.org/webpage/si-list

For help:
si-list-request@xxxxxxxxxxxxx with 'help' in the Subject field

List FAQ wiki page is located at:
http://si-list.org/wiki/wiki.pl?Si-List_FAQ

List technical documents are available at:
http://www.si-list.org

List archives are viewable at:
//www.freelists.org/archives/si-list
or at our remote archives:
http://groups.yahoo.com/group/si-list/messages
Old (prior to June 6, 2001) list archives are viewable at:
http://www.qsl.net/wb6tpu
------------------------------------------------------------------