
[SI-LIST] Comments on "Do you really ship products at BER 10e-xx ?"
- From: "Mike Williams" <mike.williams@xxxxxxxxxxxxxxxxxx>
- To: <si-list@xxxxxxxxxxxxx>
- Date: Fri, 6 May 2005 00:54:28 -0400
[SI-LIST] Do you really ship products at BER 10e-xx ?
* From: Chris Cheng <Chris.Cheng@xxxxxxxxxxxx>
* To: si-list@xxxxxxxxxxxxx
* Date: Tue, 12 Apr 2005 13:49:09 -0700
I've been shipping Gb/s serial products for a while and have had my share of
failed parts. However, I have yet to see a physical channel that is not either
working like a charm or just falling on its face and barfing errors like crazy.
Sure, chips or disks can fail and generate errors, but no flaky channels that
spit an error every other hour or day. To me, the channel either has a
BER that is near 1 (barfing errors like crazy) or near 0 (never fails, or at
least approaching the life of the product it is attached to). Are we just
kidding ourselves with these fancy BER analyzers or jitter
instruments? Do you really let a machine run at, say, BER 10e-12 and say "ah
ha, it only fails once a day, let's ship it"? Is BER really meant for
IEEE spec committees and not for real engineers who actually have to ship a
product?
Pasted from
<http://www.freelists.org/archives/si-list/04-2005/msg00131.html>
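For scale, the arithmetic behind a quoted BER figure can be sketched in a few
lines. This is a minimal illustration, assuming "BER 10e-12" is meant as 1e-12
and assuming a link that carries bits continuously; the function name and the
bit rates below are hypothetical examples, not figures from the thread.

```python
def mean_seconds_between_errors(ber: float, bit_rate_bps: float) -> float:
    """Average time between bit errors for a link that runs continuously."""
    if ber <= 0 or bit_rate_bps <= 0:
        raise ValueError("ber and bit_rate_bps must be positive")
    # Errors per second = BER * bits per second; invert for seconds per error.
    return 1.0 / (ber * bit_rate_bps)


# Hypothetical example bit rates (assumptions, not taken from the post).
for rate_bps in (1e9, 2.5e9, 10e9):
    seconds = mean_seconds_between_errors(1e-12, rate_bps)
    print(f"{rate_bps / 1e9:g} Gb/s at BER 1e-12: "
          f"one error every {seconds:.0f} s on average")
```

Under these assumptions, a 1 Gb/s link at BER 1e-12 averages one error roughly
every 1000 seconds, so "once a day" at that rate would correspond to a BER
nearer 1e-14.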
Hi Chris,
I'm not a regular participant in this list. A colleague forwarded your post
on to me as part of an ongoing discussion we have been holding about the
legitimacy (or actually lack thereof) of the thought models in popular use
around understanding jitter in serial data signals, and another brought the
SI list itself to my attention recently. I wrote the following a couple of
weeks ago but absent-mindedly forgot to post it. Let's see if this still
works.
The question you pose is a great one, and it gets to the core of a matter I
am very close to. I am a long-time harsh critic of statistical jitter
decomposition (Rj/Dj/...), yet I also own a company that makes a widely used
set of statistical jitter decomposition tools, and in that capacity I invest
a large part of our R&D dollars and engineering bandwidth in expanding our
understanding of this kind of analysis. While those don't sound like they go
together, I find that to be effective in this situation, one has to accept
that there are two contradictory yet essential philosophies in play.
Regarding your questions, I agree intensely with what is implied by them:
that to a mind trained to reason as an engineer, the current approaches
which engineers are being told to employ raise more questions than they
answer. Statistical jitter decomposition (I'll use SJD from here to save
typing) is one of many ways of abstracting the timing behavior of a signal.
In the span of just a few years, it has moved from near-total obscurity
among practicing engineers to being pervasive and the virtually unchallenged
approach to how one must measure jitter in serial data links. SJD has
quickly achieved the status of an entrenched orthodoxy despite the existence
of a formidable body of reasons to question its validity even as a general
approach.
That list of concerns is just too expansive to give any sort of detailed
treatment in a casual exchange like this. I have my own little personal
taxonomy I use just to keep all the issues organized:
Theoretical Issues/Concerns - which center around Gaussian/Central-Limit
abuse, misuse/misunderstanding of how Nyquist applies in the modulation
domain, the fact that the "standard buckets" in the Rj/Dj/... taxonomy don't
always hold up even in common situations, and the appropriateness of how SJD
is employed in abstracting important common pathologies with dynamics that
just can't be seen or represented using that method.
Practical Issues/Concerns - The "mathematical machinery" employed to grind
these results out can introduce large and unpredictable effects in the final
numbers, the suitability of the BER "thought model", HUGE correlation issues
and significant and uncontrolled (at the spec level) implementation
differences that exist between one solution and another, and the fact that
it's a moving target: this "stuff" is being made up as the industry goes
along.
Philosophical Concerns - A measurement problem being turned into a
combination political-math problem, the unfortunate everyday necessity to
balance concerns around methodological validity with the reality that
"everything requires SJD" to get out the door, abstraction and lack of
transparency (the "you don't know what you don't know" effect) among the
implementers and spec-makers, and validation of methods and the range of
pitfalls built into that as well.
Again, this is a long list and my intent is not to do a deep dive here but
to frame my agreement with your observation and to explain one
perspective on navigating the morass. Under the most commonly cited document
that blesses these things, there are numerous platforms and algorithmic
approaches listed, resulting in a cross-product of perhaps 30 different ways
to do it, and that figure is further elevated by the fact that numerous
implementers each have their own method that purports to derive from one of
those blessed combinations. Under a broad range of applied jitter types,
those approaches differ, often dramatically, from each other.
Different answers. Different convergence/divergence behaviors. Differing
abilities to even see common pathologies (e.g. "BER blooming"). Different
repeatabilities and accuracies. So here comes the first of many clues that
illustrate the sanity rating of the current SJD mindset in the industry.
THEY ALL ARE "RIGHT". As un-engineering-like as it sounds, they're all
"blessed", so go ahead and pick the one that gets your parts out the door.
I'm not representing that as the kind of engineering I advocate, but "the
engineering cops" couldn't write you a ticket for operating in that mode.
To get a little bit of the flavor of the problems built in to where SJD is
today and where it seems to be headed, let me dive in deeper on just one of
the points raised above: the implementation mechanics of SJD. Having been in
clocks and their measurement for 25 years, I have been a close observer of
the SJD trend going back to when it started (I should actually say
"restarted": you can go back to at least the 50's in the active engineering
literature, with sort of a renaissance having taken place in that literature
in the late 70's and early 80's). As noted, we have spent a great deal of
time and energy studying the mechanics of the various suggested
Rj/Dj/Tj/Pj/... methods for several years now. All of the suggested methods
"work" at the level of a quickie whiteboard lecture: you can see how, at the
big picture level, the results look like they should be what is sought. At
this high level, they make sense. As you dig in deeper, you find that there
is a fascinating range of "gotchas": a vast range of behaviors and effects
that can have significant unanticipated impact on the final result. It is an
entire layer of impact completely unaddressed by the spec-makers as well as
the vast majority of solution providers. It's virtually not on the radar
screen at all other than in a few very small pockets at some of the larger
customer-side companies. Of all of the issues in the lists above, to my eye,
this has proven by far to be the broadest. Some of the uncontrolled effects
of SJD mechanics to which I am referring are:
1. Complex/unanticipated interactions between the signal dynamics and
algorithm's mechanics
2. Convergence/divergence effects
3. Misrepresentation due to "measure-predict cycling"
4. Encountering jitter behaviors not anticipated by your SJD
implementation
5. Embedded unanticipated behaviors, and the impact of implicit
assumptions
One example, from a lecture I recently gave on this at one of our customers,
seemed to illustrate it well for them. Consider that in many
approaches, you are building up your model of overall system timing from the
rarest events seen. For example, you might have many millions of events in a
measurement population but the curve-fit process only applies maybe a
thousand points to the actual result. The consequence is that even a small
change entering the rare-event population (i.e. finally "seeing"
something that fills in the tail a little better) can have a stark impact on
the BER estimate and dynamics.
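The sensitivity described above can be reproduced with a toy model. The sketch
below is not any vendor's algorithm: it assumes a purely Gaussian jitter
population, fits a straight line on a Q-scale to only the 1000 largest samples,
and extrapolates to the 1e-12 level. The function name tail_fit_edge, the
population size, and the five injected "rare events" are all hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the Q-scale transform


def tail_fit_edge(samples, n_tail=1000, target_prob=1e-12):
    """Fit value = mu + sigma * Q to the n_tail largest samples and
    extrapolate to the value exceeded with probability target_prob.

    Returns (extrapolated_edge, fitted_sigma).
    """
    ordered = sorted(samples)
    n = len(ordered)
    if n < n_tail:
        raise ValueError("not enough samples for the requested tail size")
    tail = ordered[-n_tail:]
    # Q value of each tail point from its empirical exceedance probability.
    qs = []
    for i in range(n_tail):
        rank_from_top = n_tail - i          # 1 for the largest sample
        p_exceed = (rank_from_top - 0.5) / n
        qs.append(-STD_NORMAL.inv_cdf(p_exceed))
    # Ordinary least squares on the (Q, value) pairs.
    q_mean = sum(qs) / n_tail
    x_mean = sum(tail) / n_tail
    sxx = sum((q - q_mean) ** 2 for q in qs)
    sxy = sum((q - q_mean) * (x - x_mean) for q, x in zip(qs, tail))
    sigma = sxy / sxx
    mu = x_mean - sigma * q_mean
    q_target = -STD_NORMAL.inv_cdf(target_prob)
    return mu + sigma * q_target, sigma


rng = random.Random(0)
# Hypothetical population: 200,000 edge displacements in ps, sigma = 1 ps.
population = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
edge_before, sigma_before = tail_fit_edge(population)

# Five hypothetical rare events that fill in the tail a little further out.
edge_after, sigma_after = tail_fit_edge(population + [5.5, 5.6, 5.7, 5.8, 5.9])

print(f"before: sigma = {sigma_before:.3f} ps, edge at 1e-12 = {edge_before:.2f} ps")
print(f"after:  sigma = {sigma_after:.3f} ps, edge at 1e-12 = {edge_after:.2f} ps")
```

In this sketch only five events out of roughly 200,000 change, yet they sit at
the high-leverage end of the fitted tail, so both the fitted sigma and the
extrapolated edge move outward.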
The "issues and concerns" side is enormous, and while we only talked about
the tips of the icebergs, you hopefully at least get the flavor. Let's shift
gears now because there IS another important side to this. That is, some
engineer, some place out there, got to work this morning and a piece of paper
told him he HAD to measure Rj/Dj/... on his part in order to get it out the
door. His reality, and many engineers' reality, is that SJD as it is now
imagined is a very real part of life in the lab. For the foreseeable future,
that's all that really matters in their orbit.
As a resource on timing and timing measurement for our clients and partners,
we can and do try to refine the "why" behind our criticism of the
appropriateness of SJD as an abstraction of the realistic kinds of jitters
one can reasonably encounter. However, as a solution provider, we have to
start with a different philosophy: that it's accepted by the industry, and
that we have to provide a product that addresses as many of the gotchas as
possible. Your question asks whether the products are meant for real
engineers trying to ship a product, so here's one provider's view of that. I
feel that IF you are going to go down the SJD path, it IS possible to
provide a method that can deliver numbers that are accurate and repeatable
UNDER THE Rj/Dj/... BELIEF SYSTEM, without pain, IF it is used properly.
Our own approach is as follows, and I'll be brief and try to remain at a
general, product-nonspecific level. We have studied, modeled and analyzed the
suggested approaches for several years as a primary focus of our everyday
engineering work. In this work, we have worked with outside experts in
rare-events prediction as well as one of the individuals credited with
having developed the mathematics that underlie decomposition in general back
in the 70's. I've actually known him since that time but only discovered a
few years ago that side of his career. Small world. We have identified a
significant range of effects/mechanics built in to the suggested methods
that will to a certainty cause problems. We have used that knowledge to
craft an independent approach that steers clear of the known issues (e.g.
accuracy, convergence, repeatability and stability effects).
In concert with that, we built up a synthesizer that can create all of the
various kinds of stationary jitter one can possibly expect to encounter: the
universe of stationary jitter under the standard SJD thought model. This
synthesis system originally broke that universe down into 15,000 regions but
the current model, which pushes the edges of that universe out a bit
further, breaks it down into just shy of 10 million regions. We use the
synthesizer to push all 10 million flavors of jitter through an unmodified
version of the decomposition method we fashioned. Looking at the results that
emerge from that, you would definitely see that there are places where the
results differ quite significantly from what was synthesized, and you would
also note that they are extremely consistent and repeatable. I attribute
this repeatability primarily to avoiding the algorithmic pitfalls referred
to above.
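A scaled-down sketch of this synthesize-then-measure loop follows. The actual
synthesizer and decomposition method are not described in this post, so both
are stand-ins here: the jitter model is a simple dual-Dirac one (Gaussian Rj
plus a two-valued Dj), the estimator is a plain moment-based one, and the grid
values, sample counts, and function names are hypothetical.

```python
import math
import random


def synthesize(rj_sigma, dj_pkpk, n, rng):
    """Dual-Dirac stand-in: each edge moves by +/- Dj/2 plus Gaussian Rj."""
    half = dj_pkpk / 2.0
    return [rng.choice((-half, half)) + rng.gauss(0.0, rj_sigma)
            for _ in range(n)]


def decompose(samples):
    """Moment-based estimate of (Rj sigma, Dj pk-pk) under the same model."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    # Fourth cumulant: zero for pure Gaussian, -2 * (Dj/2)**4 in this model.
    k4 = m4 - 3.0 * m2 * m2
    half = (max(-k4, 0.0) / 2.0) ** 0.25
    half = min(half, math.sqrt(m2))      # keep the Rj variance non-negative
    rj = math.sqrt(max(m2 - half * half, 0.0))
    return rj, 2.0 * half


def sweep(rj_values, dj_values, n=20_000, seed=1):
    """Push every (Rj, Dj) grid point through the estimator; record errors."""
    rng = random.Random(seed)
    rows = []
    for rj in rj_values:
        for dj in dj_values:
            rj_hat, dj_hat = decompose(synthesize(rj, dj, n, rng))
            rows.append((rj, dj, rj_hat - rj, dj_hat - dj))
    return rows


# Hypothetical six-point grid, in ps.
for rj, dj, rj_err, dj_err in sweep([1.0, 2.0], [0.0, 2.0, 10.0]):
    print(f"Rj={rj:4.1f} Dj={dj:5.1f}  Rj error={rj_err:+.3f}  Dj error={dj_err:+.3f}")
```

Even a grid this small tends to show the pattern described above: the stand-in
estimator tracks the synthesized values well where Dj dominates and is much
less trustworthy where Dj is small relative to Rj.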
The next step is that we then submit both the expected and actual results to
a neural network-based system of our own design that attempts to calibrate
the difference between expected and actual to as small a value as we can
make it over the vast majority of the error surface. In reality, we don't
just rely on the neural approach because after staring at the underlying
mechanics for so many years, we have some engineers that are pretty good at
recognizing ways to improve the error that exceed what the neural approach
can do on its own. In the end it's iterative: let the network create a
calibration scheme, study and tweak it (i.e. tweak how the network operates),
and then grind away some more. The process is constantly revealing new
insights. Most recently, we've found some dependencies on jitter behaviors
that can be further improved by moving from static to dynamic calibration.
That is, not just creating a huge cal scheme that is built into the product
when it ships, but also doing some dynamic calibration as it runs. I
would say that the improvements attributable to that will be less felt in
the sort of repetitive short patterns that seem common now among our
customers, and will make an observable difference on live data, with a
particular impact on data that has moderate to significant ISI. It all
counts.
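The basic idea of calibrating "actual" against "expected" can be shown with a
far simpler stand-in than the neural network and hand tuning described above.
The sketch below fits a single least-squares gain and offset; the function
names and the Rj readings are hypothetical, and nothing here reproduces the
real calibration scheme or its dynamic component.

```python
def fit_linear_correction(measured, expected):
    """Least-squares gain and offset so that expected ~= gain * measured + offset."""
    n = len(measured)
    if n < 2 or n != len(expected):
        raise ValueError("need at least two matched (measured, expected) pairs")
    m_mean = sum(measured) / n
    e_mean = sum(expected) / n
    sxx = sum((m - m_mean) ** 2 for m in measured)
    if sxx == 0.0:
        raise ValueError("measured values must not all be equal")
    sxy = sum((m - m_mean) * (e - e_mean) for m, e in zip(measured, expected))
    gain = sxy / sxx
    return gain, e_mean - gain * m_mean


def worst_error(measured, expected, gain=1.0, offset=0.0):
    """Largest absolute difference between corrected readings and truth."""
    return max(abs(gain * m + offset - e) for m, e in zip(measured, expected))


# Hypothetical Rj readings (ps) from an uncalibrated tool versus synthesized truth.
expected = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [1.3, 2.5, 3.6, 4.9, 6.0]

gain, offset = fit_linear_correction(measured, expected)
print(f"worst error before calibration: {worst_error(measured, expected):.3f} ps")
print(f"worst error after calibration:  "
      f"{worst_error(measured, expected, gain, offset):.3f} ps")
```

A single gain and offset only works when the error is nearly linear; the point
of a region-by-region or learned calibration is to handle error surfaces that
are not.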
The calibration addresses something really obvious, but which is not even
considered at the spec level. It's unnecessary there: any of the blessed
methods is fine under the spec. What it means for the "real engineer trying to get
parts out the door" is that if the measured number is too high, a result
from a calibrated and fully validated system is significantly more likely to
indicate a part really is doing something undesirable rather than an
over-representation by the SJD.
There are other things that can be set in the SJD mechanics/process, but
for which there is no rational reason to set them up one way versus another.
For example, some engineers want to see when BER blooming occurs. Even in very
expensive and stable sources, BER blooming does occur, and it's more common
than you might think even in well-fixtured devices, though many SJD schemes
can't see that kind of dynamism. It's useful, I suppose, to an engineer in
debug mode, but we also see engineers who want it "blended out", which is
more rational than it may seem on the surface since the way the rare-events
math reacts to the blooming (sharply at times, as mentioned above) can be
distracting. So since the specs go nowhere near this level of consideration,
yet it impacts the usability of the tool, you have to give the choice to the
operator and pick a default state that is guaranteed to tick off half of the
people who use it. Lots of design decisions fall into this category of
having to make choices that the specs should have made for us.
So, are the tools for the spec-makers or are they really for engineers that
have to ship a product? My opinion is that the spec-makers have too much
influence over what ultimately appears in the tools, and not nearly enough
skin in the game in terms of having to use what they specify. There are more
marketing titles and commercially driven interests on these committees than
you'd probably want to see, and so you get what you get. But keep in mind,
in the end, your company and your industry sector decided to accept whatever
specs you are forced to treat as the gospel. That makes it a tool for those
engineers, I guess. But not the only tool either. When they don't hit their
numbers, they will need additional tools too. SJD is in the same class as
dice when it comes to debug utility (note to self: file for patents on Rj/Dj
dice). Good debug tools (I contend that nothing can match the modulation
domain for doing effective timing debug) will also help you understand the
reason your parts are not hitting their number: is it the tool or the
device? And that's actually where using a validated approach can save a
bunch of time as well.
The industry is presently in a very strange place, a place that is hard to
defend. We accept at face value things that provide almost boundless reasons
to pursue other choices, but revert to believing what we're told: it's in
the spec so it must be true. The result is the significant yet
unnecessary daily frustrations that follow from the bad choice. Yet it is
incredibly uncommon to see the bad choices actually identified and
challenged as the root of why frustrations in this space are mounting.
A mindset that we consider crucial to being able to look at statistical
jitter decomposition objectively is a refusal to accept as unquestionable
that Rj/Dj separation is correct, and a recognition that the analysis of
timing in serial data streams is a leadership problem, a philosophy problem,
an epistemological problem, and most definitely a measurement problem. But it's
not a math problem. If someone's solution involves invoking yet one more
thing they learned in stochastic processes, they're part of the problem, not
part of the solution, in my considered opinion. What we have is broken and
better ways are needed. The solution seems obvious, but also, the "they
don't know what they don't know" force is strong. I'm sure better methods
will materialize quite quickly as soon as enough people are willing to
openly articulate what the emperor is wearing.
If you're interested, I've written a few papers about these matters: "A
Discussion of Rj/Dj/... Compliance Measurements", and another on the work
we've done over the last year w.r.t. calibrating the error surface for each
of the various jitter terms of our own Rj/Dj/... tool set to within 2ps or 5%
over the universe of stationary jitter. The second one will be revised in a
few weeks to include the expanded jitter space and dynamic calibration but
the original still covers the core of what we have done in this area pretty
well. Just send an email to "info AT amherst-systems DOT com" and ask for
the "Rj/Dj paper" and/or the "validation paper" and someone will get them
right out to you. I'm not big on email myself.
Chris, I didn't have time to write something short. I hope that, after all
these words, I have provided some further insight into the situation. Good luck.
Best regards,
Mike
--
Mike Williams
Pres.
ASA Corp.
www.TheJitterSolution.com <http://www.thejittersolution.com/>