[SI-LIST] Re: Comments on "Do you really ship products at BER 10e-xx ?"

  • From: "Alfred P. Neves" <al.neves@xxxxxxxxxxx>
  • To: <mike.williams@xxxxxxxxxxxxxxxxxx>, <si-list@xxxxxxxxxxxxx>
  • Date: Sun, 8 May 2005 09:48:14 -0700

Mike,

Interesting post.

Isn't the problem you discussed germane to every engineering endeavor?
NIST (and NBS before it) exists to provide traceable measurement
methods so that industry can achieve consistency. I know that there are
problems with the prevalent RJ/DJ extraction methods, and until some
very specific RJ/DJ/PJ golden standards are developed the industry
will be in a bit of a morass.

Accordingly, although I'm not advocating their solution, I see the work
at Agilent, specifically Ransom Stephens' work (and others'), as very
important: with the assistance of NIST, they are generating traceable
jitter quantities. Some of this work was published at DesignCon 2005.

Also, my understanding is that your software package runs on real-time
sampling oscilloscopes touted to have 3-6 ps RMS of fairly nasty
stochastic time-base error. How does your software get around this
inherent instrument limitation? One would expect you to have a 10 ps or
so solution; is this correct? It would be interesting to see how your
data for RJ/DJ extraction compares to other commercially available
solutions and to NBS-traceable values of RJ/DJ.

What algorithms are you using for RJ/DJ extraction, if not Dual-Dirac
extraction methods? Although there are problems with the Dual-Dirac
RJ/DJ model, the industry has embraced it and it is in the standards;
if you had issues with it, why were you not involved in the early
stages of the T11.2 committee?

Exactly for the reasons you discuss, Teraspeed Consulting suggests
using (while we evaluate other solutions) a Tektronix or Agilent DSO
with delay (not a real-time scope), a spectrum analyzer (for phase
noise analysis), a Wavecrest TIA, and an Agilent BERT measurement set.
Used together in a concerted methodology, these instruments produce a
net measurement result that agrees to within 200 femtoseconds. It is
the engineering method, along with a variety of software/hardware
approaches, that lends strength to the approach, not the simple
purchase of a software or hardware package.

In no way am I touting a vendor's solution; I am just asking you to
concisely back up your assertions.



Alfred P. Neves
Teraspeed Consulting Group LLC 
121 North River Drive 
Narragansett, RI 02882
 
Hillsboro Office
735 SE 16th Ave.
Hillsboro, OR, 97123
(503) 679 2429 Voice
(503) 210 7727 Fax
 
Main office
(401) 284-1827 Business
(401) 284-1840 Fax 
http://www.teraspeed.com
 
Teraspeed is the registered service mark 
of Teraspeed Consulting Group LLC
 


-----Original Message-----
From: si-list-bounce@xxxxxxxxxxxxx [mailto:si-list-bounce@xxxxxxxxxxxxx]
On Behalf Of Mike Williams
Sent: Thursday, May 05, 2005 9:54 PM
To: si-list@xxxxxxxxxxxxx
Subject: [SI-LIST] Comments on "Do you really ship products at BER
10e-xx ?"


  
[SI-LIST] Do you really ship products at BER 10e-xx ?

*       From: Chris Cheng <Chris.Cheng@xxxxxxxxxxxx> 
*       To: si-list@xxxxxxxxxxxxx 
*       Date: Tue, 12 Apr 2005 13:49:09 -0700 

I've been shipping Gb/s serial products for a while and have had my
share of failed parts. However, I have yet to see a physical channel
that isn't either working like a charm or falling on its face and
barfing errors like crazy. Sure, chips or disks can fail and generate
errors, but no flaky channels that spit an error every other hour or
day. To me, the channel either has a BER that is near 1 (barfing errors
like crazy) or near 0 (never fails, or at least not within the life of
the product it is attached to). Are we just kidding ourselves with
these fancy BER analyzers and jitter instruments? Do you really let a
machine run at, say, BER 10e-12 and say "ah ha, it only fails once a
day, let's ship it"? Is BER really meant for IEEE spec committees and
not for real engineers who actually have to ship a product?


 

Pasted from
<//www.freelists.org/archives/si-list/04-2005/msg00131.html>
Hi Chris,

 

I'm not a regular participant in this list. A colleague forwarded your
post to me as part of an ongoing discussion we have been holding about
the legitimacy (or actual lack thereof) of the thought models in
popular use for understanding jitter in serial data signals, and
another colleague recently brought the SI list itself to my attention.
I wrote the following a couple of weeks ago but absent-mindedly forgot
to post it. Let's see if it still works.

 

The question you pose is a great one, and it gets to the core of a
matter I am very close to. I am simultaneously a long-time harsh critic
of statistical jitter decomposition (Rj/Dj/...) and the owner of a
company that makes a widely used set of statistical jitter
decomposition tools; in that capacity, I invest a large part of our R&D
dollars and engineering bandwidth in expanding our understanding of
this kind of analysis. While those don't sound like they go together, I
find that to be effective in this situation, one has to accept that
there are two contradictory yet essential philosophies in play.

 

Regarding your questions, I agree intensely with what is implied by
them: to a mind trained to reason as an engineer, the current
approaches which engineers are being told to employ raise more
questions than they answer. Statistical jitter decomposition (I'll use
SJD from here on to save typing) is one of many ways of abstracting the
timing behavior of a signal. In the span of just a few years, it has
moved from near-total obscurity among practicing engineers to being the
pervasive and virtually unchallenged approach to how one must measure
jitter in serial data links. SJD has quickly achieved the status of an
entrenched orthodoxy despite a formidable body of reasons to question
its validity even as a general approach.

 

That list of concerns is just too expansive to give any sort of detailed
treatment in a casual exchange like this. I have my own little personal
taxonomy I use just to keep all the issues organized:

 

Theoretical Issues/Concerns - these center on Gaussian/Central-Limit
abuse, misuse/misunderstanding of how Nyquist applies in the modulation
domain, the fact that the "standard buckets" in the Rj/Dj/... taxonomy
don't always hold up even in common situations, and the appropriateness
of how SJD is employed in abstracting important common pathologies with
dynamics that simply can't be seen or represented using that method.

 

Practical Issues/Concerns - the "mathematical machinery" employed to
grind these results out can instill large and unpredictable effects in
the final numbers; the suitability of the BER "thought model"; HUGE
correlation issues; significant and uncontrolled (at the spec level)
implementation differences between one solution and another; and the
fact that it's a moving target: this "stuff" is being made up as the
industry goes along.
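
As an aside on the BER "thought model": the quoted question ("do you
really ship products at BER 10e-xx?") turns on how an error probability
maps to time between errors, which depends entirely on line rate. A
back-of-the-envelope sketch (the function and the line rates are my
illustration, not anything from a spec):

```python
def seconds_per_error(ber, bit_rate_hz):
    """Expected mean time between bit errors, assuming errors are
    independent with probability `ber` per bit (the standard, and
    itself debatable, assumption behind the BER thought model)."""
    return 1.0 / (ber * bit_rate_hz)

# At 2.5 Gb/s, BER 1e-12 means one error every ~400 seconds...
print(seconds_per_error(1e-12, 2.5e9))           # 400.0 seconds
# ...while 1e-15 at the same rate is one error every ~4.6 days.
print(seconds_per_error(1e-15, 2.5e9) / 86400)   # about 4.63 days
```

This is why "it only fails once a day" and "it meets BER 1e-12" can be
the same statement or wildly different ones, depending on the link.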

 

Philosophical Concerns - a measurement problem being turned into a
combined political-math problem; the unfortunate everyday necessity of
balancing concerns about methodological validity against the reality
that "everything requires SJD" to get out the door; abstraction and
opacity (the "you don't know what you don't know" effect) among the
implementers and spec-makers; and the validation of methods, with the
range of pitfalls built into that as well.

 

 

Again, this is a long list, and my intent is not to do a deep dive here
but to frame both agreement with your observation and one perspective
on navigating the morass. Under the most commonly cited document that
blesses these things, there are numerous platforms and algorithmic
approaches listed, resulting in a cross-product of perhaps 30 different
ways to do it, and that figure is further elevated by the fact that
numerous implementers each have their own method that purports to
derive from one of those blessed combinations. Under a broad range of
applied jitter types, those approaches differ, often dramatically, from
each other. Different answers. Different convergence/divergence
behaviors. Differing abilities to even see common pathologies (e.g.
"BER blooming"). Different repeatabilities and accuracies. So here
comes the first of many clues that illustrate the sanity rating of the
current SJD mindset in the industry: THEY ALL ARE "RIGHT". As
un-engineering-like as it sounds, they're all "blessed", so go ahead
and pick the one that gets your parts out the door. I'm not
representing that as the kind of engineering I advocate, but "the
engineering cops" couldn't write you a ticket for operating in that
mode.

 

To get a little of the flavor of the problems built into where SJD is
today and where it seems to be headed, let me dive deeper on just one
of the points raised above: the implementation mechanics of SJD. Having
been in clocks and their measurement for 25 years, I have been a close
observer of the SJD trend going back to when it started (I should
actually say "restarted": you can go back to at least the 50's in the
active engineering literature, with a renaissance having taken place in
that literature in the late 70's and early 80's). As noted, we have
spent a great deal of time and energy studying the mechanics of the
various suggested Rj/Dj/Tj/Pj/... methods for several years now. All of
the suggested methods "work" at the level of a quickie whiteboard
lecture: at the big-picture level, the results look like what is
sought. At this high level, they make sense. As you dig in deeper, you
find a fascinating range of "gotchas": a vast range of behaviors and
effects that can have significant unanticipated impact on the final
result. This is an entire layer of impact completely unaddressed by the
spec-makers as well as the vast majority of solution providers. It's
virtually not on the radar screen at all, other than in a few very
small pockets at some of the larger customer-side companies. Of all of
the issues in the lists above, to my eye, this has proven by far the
broadest. Some of the uncontrolled effects of SJD mechanics to which I
am referring are:

 

1.      Complex/unanticipated interactions between the signal dynamics
and the algorithm's mechanics
2.      Convergence/divergence effects
3.      Misrepresentation due to "measure-predict cycling"
4.      Encountering jitter behaviors not anticipated by your SJD
implementation
5.      Embedded unanticipated behaviors, and the impact of implicit
assumptions

 

One example from a lecture I gave recently at one of our customers
seemed to illustrate this well for them. Consider that in many
approaches, you are building up your model of overall system timing
from the rarest events seen. For example, you might have many millions
of events in a measurement population, but the curve-fit process
applies only perhaps a thousand points to the actual result. The
consequence is that even a small change entering the rare-event
population (i.e. finally "seeing" something that fills in the tail a
little better) can have a stark impact on the BER estimate and its
dynamics.


 

 

The "issues and concerns" side is enormous, and while we have only
touched the tips of the icebergs, you hopefully at least get the
flavor. Let's shift gears now, because there IS another important side
to this. Some engineer, someplace out there, got to work this morning,
and a piece of paper told him he HAD to measure Rj/Dj/... on his part
in order to get it out the door. His reality, and many engineers'
reality, is that SJD as it is now imagined is a very real part of life
in the lab. For the foreseeable future, that's all that really matters
in their orbit.

 

As a resource on timing and timing measurement for our clients and
partners, we can and do try to refine the "why" behind our criticism of
the appropriateness of SJD as an abstraction of the realistic kinds of
jitter one can reasonably encounter. However, as a solution provider,
we have to start with a different philosophy: SJD is accepted by the
industry, and we have to provide a product that addresses as many of
the gotchas as possible. Your question asks whether the products are
meant for real engineers trying to ship a product, so here's one
provider's view. I feel that IF you are going to go down the SJD path,
it IS possible to provide a method that delivers numbers that are
accurate and repeatable UNDER THE Rj/Dj/... BELIEF SYSTEM, without
pain, IF it is used properly.

 

Our own approach is as follows; I'll be brief and try to remain at a
general, product-nonspecific level. We have studied, modeled and
analyzed the suggested approaches for several years as a primary focus
of our everyday engineering work. Along the way, we have worked with
outside experts in rare-event prediction as well as one of the
individuals credited with having developed the mathematics that
underlie decomposition in general back in the 70's. (I've actually
known him since that time but only discovered that side of his career a
few years ago. Small world.) We have identified a significant range of
effects/mechanics built into the suggested methods that will, to a
certainty, cause problems. We have used that knowledge to craft an
independent approach that steers clear of the known issues (e.g.
accuracy, convergence, repeatability and stability effects).

 

In concert with that, we built a synthesizer that can create all of the
various kinds of stationary jitter one can possibly expect to
encounter: the universe of stationary jitter under the standard SJD
thought model. This synthesis system originally broke that universe
down into 15,000 regions, but the current model, which pushes the edges
of that universe out a bit further, breaks it down into just shy of 10
million regions. We use the synthesizer to push all 10 million flavors
of jitter through an unmodified version of the decomposition method we
fashioned. Looking at the results that emerge, you would definitely see
places where the results differ quite significantly from what was
synthesized, and you would also note that they are extremely consistent
and repeatable. I attribute this repeatability primarily to avoiding
the algorithmic pitfalls referred to above.

 

The next step is that we submit both the expected and actual results to
a neural network-based system of our own design that attempts to
calibrate the difference between expected and actual down to as small a
value as we can make it over the vast majority of the error surface. In
reality, we don't rely on the neural approach alone, because after
staring at the underlying mechanics for so many years, we have some
engineers who are pretty good at recognizing ways to improve the error
beyond what the neural approach can do on its own. In the end it's
iterative: let the network create a calibration scheme, study and tweak
it (i.e. tweak how the network operates), and then grind away some
more. The process is constantly revealing new insights. Most recently,
we've found some dependencies on jitter behaviors that can be further
improved by moving from static to dynamic calibration. That is, not
just creating a huge cal scheme that is built into the product when it
ships, but also doing some dynamic calibration as it runs. I would say
that the improvements attributable to that will be felt less in the
sort of repetitive short patterns that seem common now among our
customers, and will make an observable difference on live data, with a
special impact on data that has moderate to significant ISI. It all
counts.

 

The calibration addresses something really obvious, but something that
is not even considered at the spec level; there it's unnecessary, since
any of the blessed methods are fine under the spec. What it means for
the "real engineer trying to get parts out the door" is that if the
measured number is too high, a result from a calibrated and fully
validated system is significantly more likely to indicate that a part
really is doing something undesirable, rather than an
over-representation by the SJD.

 

There are other things that can be set in the SJD mechanics/process for
which there is no rational reason to choose one setting over another.
For example, some engineers want to see when BER blooming occurs. BER
blooming does occur even in very expensive and stable sources, and it's
more common than you might think even in well-fixtured devices, though
many SJD schemes can't see that kind of dynamism. It's useful, I
suppose, to an engineer in debug mode, but we also see engineers who
want it "blended out", which is more rational than it may seem on the
surface, since the way the rare-event math reacts to the blooming
(sharply at times, as mentioned above) can be distracting. So, since
the specs go nowhere near this level of consideration, yet it impacts
the usability of the tool, you have to give the choice to the operator
and pick a default state that is guaranteed to tick off half of the
people who use it. Lots of design decisions fall into this category of
having to make choices that the specs should have made for us.

 

So: are the tools for the spec-makers, or are they really for engineers
who have to ship a product? My opinion is that the spec-makers have too
much influence over what ultimately appears in the tools, and not
nearly enough skin in the game in terms of having to use what they
specify. There are more marketing titles and commercially driven
interests on these committees than you'd probably want to see, and so
you get what you get. But keep in mind that, in the end, your company
and your industry sector decided to accept whatever specs you are
forced to treat as gospel. That makes it a tool for those engineers, I
guess. But not the only tool, either. When they don't hit their
numbers, they will need additional tools too. SJD is in the same class
as dice when it comes to debug utility (note to self: file for patents
on Rj/Dj dice). Good debug tools (I contend that nothing can match the
modulation domain for effective timing debug) will also help you
understand the reason your parts are not hitting their number: is it
the tool or the device? And that's actually where using a validated
approach can save a bunch of time as well.

 

The industry is presently in a very strange place, a place that is hard
to defend. We accept at face value things that provide almost boundless
reasons to pursue other choices, but revert to believing what we're
told: it's in the spec, so it must be true. The result is the
significant yet unnecessary daily frustration that follows from the bad
choice. Yet it is incredibly uncommon to see the bad choices actually
identified and challenged as the root of why frustrations in this space
are mounting. A mindset that we consider crucial to being able to look
at statistical jitter decomposition objectively is a refusal to accept
as unquestionable that Rj/Dj separation is correct, and a recognition
that the analysis of timing in serial data streams is a leadership
problem, a philosophy problem, an epistemological problem, and most
definitely a measurement problem. But it's not a math problem. If
someone's solution involves invoking yet one more thing they learned in
stochastic processes, they're part of the problem, not part of the
solution, in my considered opinion. What we have is broken, and better
ways are needed. The solution seems obvious, but the "they don't know
what they don't know" force is strong. I'm sure better methods will
materialize quite quickly as soon as enough people are willing to
openly articulate what the emperor is wearing.

 

If you're interested, I've written a few papers about these matters: "A
Discussion of Rj/Dj/... Compliance Measurements", and another on the
work we've done over the last year calibrating the error surface for
each of the various jitter terms of our own Rj/Dj/... tool set to
within 2 ps or 5% over the universe of stationary jitter. The second
one will be revised in a few weeks to include the expanded jitter space
and dynamic calibration, but the original still covers the core of what
we have done in this area pretty well. Just send an email to "info AT
amherst-systems DOT com" and ask for the "Rj/Dj paper" and/or the
"validation paper", and someone will get them right out to you. I'm not
big on email myself.

 

Chris, I didn't have time to write something short. I hope that, after
all these words, it provides some further insight into the situation.
Good luck.

 

Best regards,

Mike 

 

--

Mike Williams
Pres.
ASA Corp.
www.TheJitterSolution.com





------------------------------------------------------------------
To unsubscribe from si-list:
si-list-request@xxxxxxxxxxxxx with 'unsubscribe' in the Subject field

or to administer your membership from a web page, go to:
//www.freelists.org/webpage/si-list

For help:
si-list-request@xxxxxxxxxxxxx with 'help' in the Subject field

List FAQ wiki page is located at:
                http://si-list.org/wiki/wiki.pl?Si-List_FAQ

List technical documents are available at:
                http://www.si-list.org

List archives are viewable at:     
                //www.freelists.org/archives/si-list
or at our remote archives:
                http://groups.yahoo.com/group/si-list/messages
Old (prior to June 6, 2001) list archives are viewable at:
                http://www.qsl.net/wb6tpu
  
