
[SI-LIST] Re: Comments on "Do you really ship products at BER 10e -xx ?"
- From: Istvan Novak <istvan.novak@xxxxxxx>
- To: Chris.Cheng@xxxxxxxxxxxx
- Date: Sun, 08 May 2005 10:11:57 -0400
Chris,
As you always said, there is no free lunch, so in my view the solution
will be to accept a little more latency and overhead in return for more
robust error detection and/or error correction.
Errors are a fact of life. Instead of trying to reduce them to 'zero', it
is more economical to cope with them so that they won't crash the system.
As you refer to the tester guard band issue in the 90's, it was the same
time period when many system designers strongly opposed data coding and
spread spectrum clocks, citing an unacceptable price in overhead. True,
these weren't an absolute must in the 90's, but they have become so
nowadays. The next bullet to bite is to accept more latency and
overhead, and step up error detection and correction. Once this happens,
the exact nature of the jitter distribution won't matter much; we just have
to make sure that jitter values that cause a (correctable) error are
rare enough that the performance penalty is acceptable.
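
To put rough numbers on "rare enough", here is a minimal Python sketch. It
rests on several assumptions that are not stated in this thread: the
"10e-xx" shorthand is read as 10^-xx, Rj is treated as purely Gaussian, the
peak-to-peak multiplier is taken as 2*Q with Q set by a one-sided Gaussian
tail equal to the BER, and the 1 microsecond cost per corrected error is a
made-up placeholder. All function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit sigma


def seconds_between_errors(ber, bit_rate_bps):
    """Mean time between bit errors at a given BER and line rate."""
    return 1.0 / (ber * bit_rate_bps)


def gaussian_pk_pk_multiplier(ber):
    """2*Q, where a one-sided Gaussian tail beyond Q sigma has probability ber."""
    return 2.0 * -STD_NORMAL.inv_cdf(ber)


def rj_fraction_of_ui(rj_rms_s, bit_rate_bps, ber):
    """Share of the unit interval consumed by Gaussian Rj at the target BER."""
    unit_interval_s = 1.0 / bit_rate_bps
    return gaussian_pk_pk_multiplier(ber) * rj_rms_s / unit_interval_s


def retry_overhead(ber, bit_rate_bps, retry_cost_s):
    """Fraction of link time lost if every error costs retry_cost_s.

    Only meaningful while retry_cost_s is much shorter than the error interval.
    """
    return retry_cost_s / seconds_between_errors(ber, bit_rate_bps)


if __name__ == "__main__":
    bit_rate = 10e9      # 10 Gb/s, the rate discussed in the thread
    rj_rms = 2e-12       # 2 ps rms Rj, the value discussed in the thread
    retry_cost = 1e-6    # hypothetical cost of one corrected error
    for ber in (1e-12, 1e-15, 1e-18):
        print(
            f"BER {ber:g}: one error every "
            f"{seconds_between_errors(ber, bit_rate):.3g} s, "
            f"Rj uses {rj_fraction_of_ui(rj_rms, bit_rate, ber):.1%} of the UI, "
            f"retry overhead {retry_overhead(ber, bit_rate, retry_cost):.1e}"
        )
```

Under these assumptions, 10 Gb/s at a BER of 10^-12 means one bit error
about every 100 seconds, and 2 ps rms of Gaussian Rj takes up roughly 28% of
the 100 ps unit interval. The retry overhead stays tiny as long as each
correction is cheap, which is the trade being proposed here.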
Regards,
Istvan Novak
SUN Microsystems
Chris Cheng wrote:
>Mike,
>Welcome to the list.
>It is good to hear from you and Art from the vendor point of view. I am
>particularly intrigued by Art's reference paper on how different instruments
>start to diverge under strong Dj, which I believe is the true
>challenge in characterizing a real running system rather than some PLL in a
>test fixture.
>I also agree that it is not the instrument vendor's fault; rather, it is a
>spec imposed by certain committees that, unless applied with care and fully
>understood by engineers, can extrapolate to unrealistic pessimism. I have
>learned a lot from both the on-line and off-line discussions on this subject.
>However, one offline comment from a good friend still haunts me. At
>~2-4Gb/s, an Rj of 2ps is still manageable, but what will life be like if we
>go to 10Gb/s without PAM at 2ps Rj? I don't have the answer, but my suspicion
>is that half the industry will probably look the other way and come up with
>another spec that conveniently specs it to work. This is like tester
>guardband in the 90's. For a while everyone insisted they had to triple or
>quadruple the tester guardband in timing equations, and somehow after the turn
>of the millennium, when cycle times kept getting smaller and smaller, that
>guardband just shrank to a negligible amount.
>But at a certain point (say 10Gb/s), that 2ps of Rj (if it is truly
>unbounded and Gaussian) that I conveniently ignore may come back and bite me.
>And I cannot envision myself shipping, nor will my management allow me to
>ship, a product that flags CRC errors at a BER of 10e-12. I am not talking
>about a BERT analyzer extrapolating a bathtub curve; I am talking about real
>CRC error flags every few days that the service engineer will see. But that's
>exactly what some of these standards allow. On the other hand, if we have to
>make management really happy, we are really designing for 10e-18 or 10e-20, as
>some responses to the thread mention. That's a really long Rj tail! Do we
>really have unbounded jitter that will grow to infinity given enough time?
>
>Good discussions, hopefully it can continue.
>Chris
>
>
>-----Original Message-----
>From: Mike Williams
>To: si-list@xxxxxxxxxxxxx
>Sent: 5/5/2005 9:54 PM
>Subject: [SI-LIST] Comments on "Do you really ship products at BER 10e-xx ?"
>
>
>[SI-LIST] Do you really ship products at BER 10e-xx ?
>
>* From: Chris Cheng <Chris.Cheng@xxxxxxxxxxxx>
>* To: si-list@xxxxxxxxxxxxx
>* Date: Tue, 12 Apr 2005 13:49:09 -0700
>
>I've been shipping Gb/s serial products for a while and have had my share of
>failed parts. However, I have yet to see a physical channel that is not either
>working like a charm or just falling on its face and barfing errors like
>crazy. Sure, chips or disks can fail and generate errors, but there are no
>flaky channels that spit out an error every other hour or day. To me, a
>channel either has a BER that is near 1 (barfing errors like crazy) or near 0
>(never fails, or at least approaches the life of the product it is attached
>to). Are we just kidding ourselves with these fancy BER analyzers or jitter
>instruments? Do you really let a machine run at, say, BER 10e-12 and say "ah
>ha, it only fails once a day, so let's ship it"? Is BER really meant for
>IEEE spec committees and not for real engineers who actually have to ship a
>product?
>
>Pasted from
><http://www.freelists.org/archives/si-list/04-2005/msg00131.html>
>
>Hi Chris,
>
>
>
>I'm not a regular participant in this list. A colleague forwarded your post
>on to me as part of an ongoing discussion we have been holding about the
>legitimacy (or actually lack thereof) of the thought models in popular use
>around understanding jitter in serial data signals, and another brought the
>SI list itself to my attention recently. I wrote the following a couple of
>weeks ago but absent-mindedly forgot to post it. Let's see if this still
>works.
>
>
>
>The question you pose is a great one, and it gets to the core of a matter I
>am very close to. I am simultaneously a long-time harsh critic of
>statistical jitter decomposition (Rj/Dj/...) and the owner of a company that
>makes a widely used set of statistical jitter decomposition tools, and in
>that capacity I invest a large part of our R&D dollars and engineering
>bandwidth in expanding our understanding of this kind of analysis. While
>those don't sound like they go together, I find that to be effective in this
>situation, one has to accept that there are two contradictory yet essential
>philosophies in play.
>
>
>
>Regarding your questions, I agree intensely with what is implied by them:
>that to a mind trained to reason as an engineer, the current approaches
>which engineers are being told to employ raise more questions than they
>answer. Statistical jitter decomposition (I'll use SJD from here to save
>typing) is one of many ways of abstracting the timing behavior of a signal.
>In the span of just a few years, it has moved from near-total obscurity
>among practicing engineers to being the pervasive and virtually unchallenged
>approach to how one must measure jitter in serial data links. SJD has
>quickly achieved the status of an entrenched orthodoxy despite the existence
>of a formidable body of reasons to question its validity even as a general
>approach.
>
>
>
>That list of concerns is just too expansive to give any sort of detailed
>treatment in a casual exchange like this. I have my own little personal
>taxonomy I use just to keep all the issues organized:
>
>
>
>Theoretical Issues/Concerns - these center around Gaussian/Central-Limit
>abuse, misuse/misunderstanding of how Nyquist applies in the modulation
>domain, the fact that the "standard buckets" in the Rj/Dj/... taxonomy don't
>always hold up even in common situations, and the appropriateness of how SJD
>is employed in abstracting important common pathologies with dynamics that
>just can't be seen or represented using that method.
>
>Practical Issues/Concerns - the "mathematical machinery" employed to grind
>these results out can instill large and unpredictable effects in the final
>numbers, the suitability of the BER "thought model", HUGE correlation issues
>and significant and uncontrolled (at the spec level) implementation
>differences that exist between one solution and another, and the fact that
>it's a moving target: this "stuff" is being made up as the industry goes
>along.
>
>Philosophical Concerns - a measurement problem being turned into a
>combination political-math problem, the unfortunate everyday necessity to
>balance concerns around methodological validity with the reality that
>"everything requires SJD" to get out the door, abstraction and lack of
>transparency (the "you don't know what you don't know" effect) among the
>implementers and spec-makers, and validation of methods and the range of
>pitfalls built into that as well.
>
>
>
>
>
>Again, this is a long list and my intent is not to do a deep dive here but
>to frame up agreement with your observation as well as explain one
>perspective on navigating the morass. Under the most commonly cited document
>that blesses these things, there are numerous platforms and algorithmic
>approaches listed, resulting in a cross-product of perhaps 30 different ways
>to do it, and that figure is further elevated by the fact that numerous
>implementers each have their own method that purports to derive from one of
>those blessed combinations. Under a broad range of applied jitter types,
>those approaches differ, often dramatically, from each other.
>Different answers. Different convergence/divergence behaviors. Differing
>abilities to even see common pathologies (e.g. "BER blooming"). Different
>repeatabilities and accuracies. So here comes the first of many clues that
>illustrate the sanity rating of the current SJD mindset in the industry:
>THEY ALL ARE "RIGHT". As un-engineering-like as it sounds, they're all
>"blessed", so go ahead and pick the one that gets your parts out the door.
>I'm not representing that as the kind of engineering I advocate, but "the
>engineering cops" couldn't write you a ticket for operating in that mode.
>
>
>
>To get a little bit of the flavor of the problems built in to where SJD is
>today and where it seems to be headed, let me dive in deeper on just one of
>the points raised above: the implementation mechanics of SJD. Having been in
>clocks and their measurement for 25 years, I have been a close observer of
>the SJD trend going back to when it started (I should actually say
>"restarted": you can go back to at least the 50's in the active engineering
>literature, with sort of a renaissance having taken place in that literature
>in the late 70's and early 80's). As noted, we have spent a great deal of
>time and energy studying the mechanics of the various suggested
>Rj/Dj/Tj/Pj/... methods for several years now. All of the suggested methods
>"work" at the level of a quickie whiteboard lecture: you can see how, at the
>big picture level, the results look like they should be what is sought. At
>this high level, they make sense. As you dig in deeper, you find that there
>is a fascinating range of "gotchas": a vast range of behaviors and effects
>that can have significant unanticipated impact on the final result. This is
>an entire layer of impact completely unaddressed by the spec-makers as well
>as the vast majority of solution providers. It's virtually not on the radar
>screen at all other than in a few very small pockets at some of the larger
>customer-side companies. Of all of the issues in the lists above, to my eye,
>this has proven by far to be the broadest. Some of the uncontrolled effects
>of SJD mechanics to which I am referring are:
>
>
>
>1. Complex/unanticipated interactions between the signal dynamics and the
>   algorithm's mechanics
>2. Convergence/divergence effects
>3. Misrepresentation due to "measure-predict cycling"
>4. Encountering jitter behaviors not anticipated by your SJD implementation
>5. Embedded unanticipated behaviors, and the impact of implicit assumptions
>
>
>
>One example from a lecture I gave on this at one of our customers recently
>that seemed to illustrate it well for them was this. Consider that in many
>approaches, you are building up your model of overall system timing from the
>rarest events seen. For example, you might have many millions of events in a
>measurement population, but the curve-fit process only applies maybe a
>thousand points to the actual result. The consequence is that even a small
>change entering in to the rare-event population (i.e. finally "seeing"
>something that fills in the tail a little better) can have a stark impact on
>the BER estimate and dynamics.
>
>
>
>
>
>The "issues and concerns" side is enormous, and while we only talked about
>the tips of the icebergs, you hopefully at least get the flavor. Let's shift
>gears now because there IS another important side to this. That is, some
>engineer, some place out there, got to work this morning and a piece of paper
>told him he HAD to measure Rj/Dj/... on his part in order to get it out the
>door. His reality, and many engineers' reality, is that SJD as it is now
>imagined is a very real part of life in the lab. For the foreseeable future,
>that's all that really matters in their orbit.
>
>
>
>As a resource on timing and timing measurement for our clients and partners,
>we can and do try to refine the "why" behind our criticism of the
>appropriateness of SJD as an abstraction of the realistic kinds of jitter
>one can reasonably encounter. However, as a solution provider, we have to
>start with a different philosophy: that it's accepted by the industry, and
>that we have to provide a product that addresses as many of the gotchas as
>possible. Your question asks whether the products are meant for real
>engineers trying to ship a product, so here's one provider's view of that. I
>feel that IF you are going to go down the SJD path, it IS possible to
>provide a method that can deliver numbers that are accurate and repeatable
>UNDER THE Rj/Dj/... BELIEF SYSTEM, without pain, IF it is used properly.
>
>
>
>Our own approach is as follows, and I'll be brief and try to remain at a
>general, product-nonspecific level. We have studied, modeled and analyzed the
>suggested approaches for several years as a primary focus of our everyday
>engineering work. In this work, we have worked with outside experts in
>rare-events prediction as well as one of the individuals credited with
>having developed the mathematics that underlie decomposition in general back
>in the 70's. I've actually known him since that time but only discovered
>that side of his career a few years ago. Small world. We have identified a
>significant range of effects/mechanics built in to the suggested methods
>that will to a certainty cause problems. We have used that knowledge to
>craft an independent approach that steers clear of the known issues (e.g.
>accuracy, convergence, repeatability and stability effects).
>
>
>
>In concert with that, we built up a synthesizer that can create all of the
>various kinds of stationary jitter one can possibly expect to encounter: the
>universe of stationary jitter under the standard SJD thought model. This
>synthesis system originally broke that universe down into 15,000 regions, but
>the current model, which pushes the edges of that universe out a bit
>further, breaks it down into just shy of 10 million regions. We use the
>synthesizer to push all 10 million flavors of jitter through an unmodified
>version of the decomposition method we fashioned. Looking at the results that
>emerge from that, you would definitely see that there are places where the
>results differ quite significantly from what was synthesized, and you would
>also note that they are extremely consistent and repeatable. I attribute
>this repeatability primarily to avoiding the algorithmic pitfalls referred
>to above.
>
>
>
>The next step is that we then submit both the expected and actual results to
>a neural network-based system of our own design that attempts to calibrate
>the difference between expected and actual to as small a value as we can
>make it over the vast majority of the error surface. In reality, we don't
>just rely on the neural approach because, after staring at the underlying
>mechanics for so many years, we have some engineers who are pretty good at
>recognizing ways to improve the error that exceed what the neural approach
>can do on its own. In the end it's iterative: let the network create a
>calibration scheme, study and tweak it (i.e. tweak how the network operates),
>and then grind away some more. The process is constantly revealing new
>insights. Most recently, we've found some dependencies on jitter behaviors
>that can be further improved by moving from static to dynamic calibration.
>That is, not just creating a huge cal scheme that is built into the product
>when it ships, but one which also does some dynamic calibration as it runs. I
>would say that the improvements attributable to that will be less felt in
>the sort of repetitive short patterns that seem common now among our
>customers, and will make an observable difference especially on live data,
>with a special impact on data that has moderate to significant ISI. It all
>counts.
>
>
>
>The calibration addresses something really obvious, but which is not even
>considered at the spec level. Under the spec, it's unnecessary: any of the
>blessed methods is fine. What it means for the "real engineer trying to get
>parts out the door" is that if the measured number is too high, a result
>from a calibrated and fully validated system is significantly more likely to
>indicate that a part really is doing something undesirable rather than an
>over-representation by the SJD.
>
>
>
>There are other things that can be set in the SJD mechanics/process, but
>for which there is no rational reason to set them up one way versus another.
>For example, some engineers want to see when BER blooming occurs. BER
>blooming does occur even in very expensive and stable sources, and it's more
>common than you might think even in well-fixtured devices, though many SJD
>schemes can't see that kind of dynamism. It's useful, I suppose, to an
>engineer in debug mode, but we also see engineers who want it "blended out",
>which is more rational than it may seem on the surface, since the way the
>rare-events math reacts to the blooming (sharply at times, as mentioned
>above) can be distracting. So since the specs go nowhere near this level of
>consideration, yet it impacts the usability of the tool, you have to give the
>choice to the operator and pick a default state that is guaranteed to tick
>off half of the people who use it. Lots of design decisions fall into this
>category of having to make choices that the specs should have made for us.
>
>
>
>So, are the tools for the spec-makers or are they really for engineers that
>have to ship a product? My opinion is that the spec-makers have too much
>influence over what ultimately appears in the tools, and not nearly enough
>skin in the game in terms of having to use what they specify. There are more
>marketing titles and commercially driven interests on these committees than
>you'd probably want to see, and so you get what you get. But keep in mind,
>in the end, your company and your industry sector decided to accept whatever
>specs you are forced to treat as the gospel. That makes it a tool for those
>engineers, I guess. But not the only tool either. When they don't hit their
>numbers, they will need additional tools too. SJD is in the same class as
>dice when it comes to debug utility (note to self: file for patents on Rj/Dj
>dice). Good debug tools (I contend that nothing can match the modulation
>domain for doing effective timing debug) will also help you understand the
>reason your parts are not hitting their number: is it the tool or the
>device? And that's actually where using a validated approach can save a
>bunch of time as well.
>
>
>
>The industry is presently in a very strange place, a place that is hard to
>defend. We accept at face value things that provide almost boundless reasons
>to pursue other choices, but revert to believing what we're told: it's in
>the spec so it must be true. The result is the significant yet
>unnecessary daily frustrations that follow from the bad choice. Yet it is
>incredibly uncommon to see the bad choices actually identified and
>challenged as the root of why frustrations in this space are mounting.
>A mindset that we consider crucial to being able to look at statistical
>jitter decomposition objectively is a refusal to accept as unquestionable
>that Rj/Dj separation is correct, and a recognition that the analysis of
>timing in serial data streams is a leadership problem, a philosophy problem,
>an epistemological problem, and most definitely a measurement problem. But
>it's not a math problem. If someone's solution involves invoking yet one more
>thing they learned in stochastic processes, they're part of the problem, not
>part of the solution, in my considered opinion. What we have is broken and
>better ways are needed. The solution seems obvious, but also, the "they
>don't know what they don't know" force is strong. I'm sure better methods
>will materialize quite quickly as soon as enough people are willing to
>openly articulate what the emperor is wearing.
>
>
>
>If you're interested, I've written a few papers about these matters: "A
>Discussion of Rj/Dj/.. Compliance Measurements", and another on the work
>we've done over the last year w.r.t. calibrating the error surface for each
>of the various jitter terms of our own Rj/Dj/... tool set to within 2ps or 5%
>over the universe of stationary jitter. The second one will be revised in a
>few weeks to include the expanded jitter space and dynamic calibration, but
>the original still covers the core of what we have done in this area pretty
>well. Just send an email to "info AT amherst-systems DOT com" and ask for
>the "Rj/Dj paper" and/or the "validation paper" and someone will get them
>right out to you. I'm not big on email myself.
>
>
>
>Chris, I didn't have time to write something short. I hope that after all
>these words, I have provided some further insight into the situation. Good
>luck.
>
>
>
>Best regards,
>
>Mike
>
>
>
>--
>
>Mike Williams
>Pres.
>ASA Corp.
>www.TheJitterSolution.com <http://www.thejittersolution.com/>
>
>
>