Hi Chris - Is 2 ps of Rj realistic? I mean - have we reached the theoretical limits of our hardware and have to jump to light transmission on a PWB??

-----Original Message-----
From: si-list-bounce@xxxxxxxxxxxxx
To: 'Mike Williams '; 'si-list@xxxxxxxxxxxxx '
Sent: 5/8/2005 12:14 AM
Subject: [SI-LIST] Re: Comments on "Do you really ship products at BER 10e-xx ?"

Mike,

Welcome to the list. It is good to hear from you and Art from the vendor point of view. I am particularly intrigued by Art's reference paper on how different instruments start to diverge under strong Dj, which I believe is the true challenge in characterizing a real running system rather than some PLL in a test fixture. I also agree that it is not the instrument vendors' fault, but rather a spec imposed by a certain committee that, without care and full understanding by engineers, can extrapolate to unrealistic pessimism.

I have learned a lot from both the on- and offline discussion on this subject. However, one offline comment from a good friend still haunts me. At ~2-4 Gb/s, an Rj of 2 ps is still manageable, but what will life be like if we go to 10 Gb/s without PAM at 2 ps Rj? I don't have the answer, but my suspicion is that half the industry will probably look the other way and come up with another spec that conveniently defines it to work. This is like tester guardband in the '90s. For a while, everyone insisted they had to triple or quadruple the tester guardband in their timing equations, and somehow, after the turn of the millennium, as cycle times got smaller and smaller, that guardband shrank to a negligible amount. But at a certain point (say 10 Gb/s), the 2 ps of Rj (if it is truly unbounded and Gaussian) that I conveniently ignore may come back and bite me. And I cannot envision myself shipping, nor my management allowing me to ship, a product that flags a CRC error at a BER of 10e-12.
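[To put a number on what shipping at a given BER means in wall-clock terms, here is a back-of-envelope sketch; the arithmetic and the example line rates are mine, not from the thread. The mean time between raw bit errors on a fully loaded link is simply 1 / (line rate x BER).]

```python
def mean_time_between_errors(bit_rate_hz: float, ber: float) -> float:
    """Expected seconds between bit errors on a link running at `ber`."""
    return 1.0 / (bit_rate_hz * ber)

# Illustrative rates only: 2.5 Gb/s and 10 Gb/s serial links.
for rate in (2.5e9, 10e9):
    for ber in (1e-12, 1e-15, 1e-18):
        t = mean_time_between_errors(rate, ber)
        print(f"{rate / 1e9:>4.1f} Gb/s @ BER {ber:.0e}: "
              f"one error every {t:,.0f} s (~{t / 86400:.2f} days)")
```

At 10 Gb/s, a BER of 1e-12 corresponds to one raw bit error roughly every 100 seconds on a fully loaded link; only down around 1e-18 does the mean time between errors stretch to about three years.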
I am not talking about a BERT analyzer extrapolating a bathtub curve; I am talking about real CRC error flags every few days that the service engineer will see. But that's exactly what some of these standards allow. On the other hand, if we have to make management really happy, we are really designing to 10e-18 or 10e-20, as some responses to the thread mention. That's a really long Rj tail! Do we really have unbounded jitter that will grow to infinity given enough time?

Good discussion - hopefully it can continue.

Chris

-----Original Message-----
From: Mike Williams
To: si-list@xxxxxxxxxxxxx
Sent: 5/5/2005 9:54 PM
Subject: [SI-LIST] Comments on "Do you really ship products at BER 10e-xx ?"

[SI-LIST] Do you really ship products at BER 10e-xx ?
* From: Chris Cheng <Chris.Cheng@xxxxxxxxxxxx>
* To: si-list@xxxxxxxxxxxxx
* Date: Tue, 12 Apr 2005 13:49:09 -0700

I've been shipping Gb/s serial products for a while and have had my share of failed parts. However, I have yet to see a physical channel that is not either working like a charm or falling on its face and barfing errors like crazy. Sure, chips or disks can fail and generate errors, but I have seen no flaky channels that spit out an error every other hour or day. To me, a channel either has a BER that is near 1 (barfing errors like crazy) or near 0 (never failing, or at least lasting the life of the product it is attached to). Are we just kidding ourselves with these fancy BER analyzers and jitter instruments? Do you really let a machine run at, say, a BER of 10e-12 and say "ah ha, it only fails once a day, let's ship it"? Is BER really meant for IEEE spec committees and not for real engineers who actually have to ship a product?

Pasted from <//www.freelists.org/archives/si-list/04-2005/msg00131.html>

Hi Chris,

I'm not a regular participant in this list.
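[On the "how long is the tail, really" question: under the common dual-Dirac convention, a purely Gaussian Rj contributes 2 * Q(BER) * sigma of total jitter, where Q is the one-sided Gaussian tail quantile. The sketch below is my own arithmetic under that convention, not taken from any particular spec; it shows both why 2 ps hurts at 10 Gb/s and how slowly the tail grows as the target BER tightens.]

```python
import math

def q_from_ber(ber: float) -> float:
    """Solve 0.5 * erfc(q / sqrt(2)) = ber for q by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > ber:
            lo = mid      # tail still too fat: push q up
        else:
            hi = mid
    return 0.5 * (lo + hi)

RJ_PS = 2.0      # the 2 ps Rj under discussion
UI_PS = 100.0    # one unit interval at 10 Gb/s
for ber in (1e-12, 1e-15, 1e-18):
    q = q_from_ber(ber)
    eye_cost = 2.0 * q * RJ_PS   # dual-Dirac total-jitter contribution
    print(f"BER {ber:.0e}: Q = {q:.2f}, Rj costs {eye_cost:.1f} ps "
          f"({100.0 * eye_cost / UI_PS:.0f}% of a 10 Gb/s UI)")
```

Q only moves from about 7.0 at 1e-12 to about 8.8 at 1e-18, so chasing six more decades of BER widens the Gaussian contribution by just a few picoseconds; the pain is that at 10 Gb/s the 1e-12 figure already eats roughly 28% of the UI.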
A colleague forwarded your post to me as part of an ongoing discussion we have been having about the legitimacy (or actual lack thereof) of the thought models in popular use for understanding jitter in serial data signals, and another colleague recently brought the SI list itself to my attention. I wrote the following a couple of weeks ago but absent-mindedly forgot to post it. Let's see if it still works.

The question you pose is a great one, and it gets to the core of a matter I am very close to. I am simultaneously a long-time harsh critic of statistical jitter decomposition (Rj/Dj/...) and the owner of a company that makes a widely used set of statistical jitter decomposition tools; in that capacity, I invest a large part of our R&D dollars and engineering bandwidth in expanding our understanding of this kind of analysis. While those don't sound like they go together, I find that to be effective in this situation, one has to accept that there are two contradictory yet essential philosophies in play.

Regarding your questions, I agree intensely with what is implied by them: that to a mind trained to reason as an engineer, the current approaches engineers are being told to employ raise more questions than they answer. Statistical jitter decomposition (I'll use SJD from here on to save typing) is one of many ways of abstracting the timing behavior of a signal. In the span of just a few years, it has moved from near-total obscurity among practicing engineers to being the pervasive and virtually unchallenged approach to measuring jitter in serial data links. SJD has quickly achieved the status of an entrenched orthodoxy despite a formidable body of reasons to question its validity even as a general approach. That list of concerns is too expansive to treat in any detail in a casual exchange like this.
I have my own little personal taxonomy I use just to keep all the issues organized:

Theoretical issues/concerns - These center around Gaussian/central-limit abuse, misuse or misunderstanding of how Nyquist applies in the modulation domain, the fact that the "standard buckets" in the Rj/Dj/... taxonomy don't always hold up even in common situations, and the appropriateness of using SJD to abstract important common pathologies with dynamics that simply can't be seen or represented by that method.

Practical issues/concerns - The "mathematical machinery" employed to grind these results out can instill large and unpredictable effects in the final numbers; the suitability of the BER "thought model"; HUGE correlation issues; significant and uncontrolled (at the spec level) implementation differences between one solution and another; and the fact that it's a moving target - this "stuff" is being made up as the industry goes along.

Philosophical concerns - A measurement problem being turned into a combined political-math problem; the unfortunate everyday necessity of balancing concerns about methodological validity against the reality that "everything requires SJD" to get out the door; abstraction and opacity (the "you don't know what you don't know" effect) among implementers and spec-makers; and the validation of methods, with the range of pitfalls built into that as well.

Again, this is a long list, and my intent is not to do a deep dive here but to frame agreement with your observation as well as explain one perspective on navigating the morass. Under the most commonly cited document that blesses these things, there are numerous platforms and algorithmic approaches listed, resulting in a cross-product of perhaps 30 different ways to do it, and that figure is further elevated by the fact that numerous implementers each have their own method purporting to derive from one of those blessed combinations.
Under a broad range of applied jitter types, each of those approaches differs, often dramatically, from the others. Different answers. Different convergence/divergence behaviors. Differing abilities to even see common pathologies (e.g. "BER blooming"). Different repeatabilities and accuracies. So here comes the first of many clues that illustrate the sanity rating of the current SJD mindset in the industry: THEY ARE ALL "RIGHT". As un-engineering-like as it sounds, they're all "blessed," so go ahead and pick the one that gets your parts out the door. I'm not representing that as the kind of engineering I advocate, but "the engineering cops" couldn't write you a ticket for operating in that mode.

To get a little of the flavor of the problems built into where SJD is today and where it seems to be headed, let me dive deeper on just one of the points raised above: the implementation mechanics of SJD. Having been in clocks and their measurement for 25 years, I have been a close observer of the SJD trend going back to when it started (I should actually say "restarted" - you can go back to at least the '50s in the active engineering literature, with a sort of renaissance having taken place in that literature in the late '70s and early '80s). As noted, we have spent a great deal of time and energy studying the mechanics of the various suggested Rj/Dj/Tj/Pj/... methods for several years now. All of the suggested methods "work" at the level of a quickie whiteboard lecture: at the big-picture level, the results look like they should be what is sought. At this high level, they make sense. As you dig in deeper, you find that there is a fascinating range of "gotchas" - a vast range of behaviors and effects that can have a significant, unanticipated impact on the final result. An entire layer of impact completely unaddressed by the spec-makers as well as the vast majority of solution providers.
It's virtually not on the radar screen at all, other than in a few very small pockets at some of the larger customer-side companies. Of all the issues in the lists above, to my eye, this has proven by far to be the broadest. Some of the uncontrolled effects of SJD mechanics to which I am referring are:

1. Complex/unanticipated interactions between the signal dynamics and the algorithm's mechanics
2. Convergence/divergence effects
3. Misrepresentation due to "measure-predict cycling"
4. Encountering jitter behaviors not anticipated by your SJD implementation
5. Embedded unanticipated behaviors, and the impact of implicit assumptions

One example from a lecture I gave on this at one of our customers recently, which seemed to illustrate it well for them, was this. Consider that in many approaches, you are building up your model of overall system timing from the rarest events seen. For example, you might have many millions of events in a measurement population, but the curve-fit process only applies maybe a thousand points to the actual result. The consequence is that even a small change entering the rare-event population (i.e. finally "seeing" something that fills in the tail a little better) can have a stark impact on the BER estimate and dynamics.

The "issues and concerns" side is enormous, and while we have only talked about the tips of the icebergs, you hopefully at least get the flavor. Let's shift gears now, because there IS another important side to this. That is, some engineer somewhere got to work this morning, and a piece of paper told him he HAD to measure Rj/Dj/... on his part in order to get it out the door. His reality - many engineers' reality - is that SJD as it is now imagined is a very real part of life in the lab. For the foreseeable future, that's all that really matters in their orbit.
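[The rare-event sensitivity described above is easy to reproduce in a toy model. This is purely illustrative and not any instrument's actual algorithm: estimate a known Gaussian sigma once from the whole population and once from only the ten most extreme points, the way a tail fit effectively does, and compare how much each estimate wanders from acquisition to acquisition.]

```python
import math
import random
import statistics

def z_from_tail_prob(p: float) -> float:
    """One-sided Gaussian tail quantile: solve 0.5*erfc(z/sqrt(2)) = p."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

TRUE_SIGMA = 2.0      # ps, the "unknown" the fit is trying to recover
N, K = 100_000, 10    # population size; extreme points the "fit" relies on

full_ests, tail_ests = [], []
for seed in range(20):                       # twenty independent acquisitions
    rng = random.Random(seed)
    tie = [rng.gauss(0.0, TRUE_SIGMA) for _ in range(N)]
    full_ests.append(statistics.stdev(tie))  # uses all N points
    x_k = sorted(tie)[-K]                    # K-th largest observation
    tail_ests.append(x_k / z_from_tail_prob(K / N))  # sigma from tail only

full_spread = max(full_ests) - min(full_ests)
tail_spread = max(tail_ests) - min(tail_ests)
print(f"full-population estimate: spread {full_spread:.3f} ps over 20 runs")
print(f"tail-only estimate:       spread {tail_spread:.3f} ps over 20 runs")
```

Both estimators land near the true sigma on average, but the tail-only estimate typically scatters far more from run to run: exactly the "small change in the rare-event population, stark change in the answer" effect.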
As a resource on timing and timing measurement for our clients and partners, we can and do try to refine the "why" behind our criticism of the appropriateness of SJD as an abstraction of the realistic kinds of jitter one can reasonably encounter. However, as a solution provider, we have to start from a different philosophy: that it's accepted by the industry, and that we have to provide a product that addresses as many of the gotchas as possible. Your question asks whether the products are meant for real engineers trying to ship a product, so here's one provider's view of that. I feel that IF you are going to go down the SJD path, it IS possible to provide a method that can deliver numbers that are accurate and repeatable UNDER THE Rj/Dj/... BELIEF SYSTEM - without pain - IF it is used properly.

Our own approach is as follows; I'll be brief and try to remain at a general, product-nonspecific level. We have studied, modeled and analyzed the suggested approaches for several years as a primary focus of our everyday engineering work. In this work, we have collaborated with outside experts in rare-event prediction as well as one of the individuals credited with having developed the mathematics that underlie decomposition in general back in the '70s. I've actually known him since that time but only discovered that side of his career a few years ago. Small world. We have identified a significant range of effects/mechanics built into the suggested methods that will, to a certainty, cause problems. We have used that knowledge to craft an independent approach that steers clear of the known issues (e.g. accuracy, convergence, repeatability and stability effects). In concert with that, we built a synthesizer that can create all of the various kinds of stationary jitter one can possibly expect to encounter - the universe of stationary jitter under the standard SJD thought model.
This synthesis system originally broke that universe down into 15,000 regions, but the current model, which pushes the edges of that universe out a bit further, breaks it down into just shy of 10 million regions. We use the synthesizer to push all 10 million flavors of jitter through an unmodified version of the decomposition method we fashioned. Looking at the results that emerge, you would definitely see places where the results differ quite significantly from what was synthesized, and you would also note that they are extremely consistent and repeatable. I attribute this repeatability primarily to avoiding the algorithmic pitfalls referred to above.

The next step is that we submit both the expected and actual results to a neural-network-based system of our own design that attempts to calibrate the difference between expected and actual down to as small a value as we can make it over the vast majority of the error surface. In reality, we don't rely on the neural approach alone, because after staring at the underlying mechanics for so many years, we have some engineers who are pretty good at recognizing ways to reduce the error beyond what the neural approach can do on its own. In the end it's iterative: let the network create a calibration scheme, study and tweak it (i.e. tweak how the network operates), and then grind away some more. The process is constantly revealing new insights. Most recently, we've found some dependencies on jitter behaviors that can be further improved by moving from static to dynamic calibration. That is, not just creating a huge cal scheme that is built into the product when it ships, but one that also does some dynamic calibration as it runs.
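[At its very simplest, a stationary-jitter synthesizer of the kind described composes a time-interval-error (TIE) stream from the standard SJD buckets. The sketch below is my own toy under those assumptions - Gaussian Rj, sinusoidal Pj, and a crude one-tap data-dependent term - and the real system obviously carves the space far more finely than three knobs.]

```python
import math
import random
import statistics

def synthesize_tie(n, rj_sigma_ps, pj_amp_ps, pj_period_ui, isi_ps, seed=0):
    """Toy stationary-jitter synthesizer: one TIE sample per unit interval."""
    rng = random.Random(seed)
    data = [rng.randint(0, 1) for _ in range(n)]   # random bit stream
    tie = []
    for i in range(n):
        rj = rng.gauss(0.0, rj_sigma_ps)                             # random jitter
        pj = pj_amp_ps * math.sin(2.0 * math.pi * i / pj_period_ui)  # periodic jitter
        dj = isi_ps if data[i - 1] else -isi_ps    # one-tap ISI (wraps at i = 0)
        tie.append(rj + pj + dj)
    return tie

tie = synthesize_tie(50_000, rj_sigma_ps=2.0, pj_amp_ps=5.0,
                     pj_period_ui=1000, isi_ps=3.0)
print(f"{len(tie)} TIE samples, mean {statistics.mean(tie):+.3f} ps, "
      f"rms {statistics.pstdev(tie):.3f} ps")
```

A stream like this, fed through a candidate decomposition with the synthesis knobs known exactly, is what lets you score the decomposition's answers against ground truth - the expected-versus-actual comparison that a calibration step can then work on.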
I would say the improvements attributable to that will be felt less in the sort of repetitive short patterns that seem common now among our customers; they will make an observable difference especially on live data, with a special impact on data that has moderate to significant ISI. It all counts. The calibration addresses something really obvious, but something that is not even considered at the spec level. There, it's unnecessary: any of the blessed methods are fine under the spec. What it means for the "real engineer trying to get parts out the door" is that if the measured number is too high, a result from a calibrated and fully validated system is significantly more likely to indicate that the part really is doing something undesirable, rather than an over-representation by the SJD.

There are other things that can be set in the SJD mechanics/process for which there is no rational reason to set them up one way versus another. For example, some engineers want to see when BER blooming occurs. BER blooming occurs even in very expensive and stable sources, and it's more common than you might think even in well-fixtured devices, though many SJD schemes can't see that kind of dynamism. It's useful, I suppose, to an engineer in debug mode, but we also see engineers who want it "blended out," which is more rational than it may seem on the surface, since the way the rare-events math reacts to the blooming (sharply at times, as mentioned above) can be distracting. So since the specs go nowhere near this level of consideration, yet it impacts the usability of the tool, you have to give the choice to the operator and pick a default state that is guaranteed to tick off half of the people who use it. Lots of design decisions fall into this category of having to make choices that the specs should have made for us.

So: are the tools for the spec-makers, or are they really for engineers who have to ship a product?
My opinion is that the spec-makers have too much influence over what ultimately appears in the tools, and not nearly enough skin in the game in terms of having to use what they specify. There are more marketing titles and commercially driven interests on these committees than you'd probably want to see, and so you get what you get. But keep in mind that, in the end, your company and your industry sector decided to accept whatever specs you are forced to treat as gospel. That makes it a tool for those engineers, I guess. But not the only tool, either. When they don't hit their numbers, they will need additional tools too. SJD is in the same class as dice when it comes to debug utility (note to self: file for patents on Rj/Dj dice). Good debug tools (I contend that nothing can match the modulation domain for effective timing debug) will also help you understand the reason your parts are not hitting their numbers: is it the tool or the device? And that's actually where using a validated approach can save a bunch of time as well.

The industry is presently in a very strange place - a place that is hard to defend. We accept at face value things that provide almost boundless reasons to pursue other choices, but revert to believing what we're told: it's in the spec, so it must be true. The result is the significant yet unnecessary daily frustration that follows from the bad choice. Yet it is incredibly uncommon to see the bad choices actually identified and challenged as the root of why frustrations in this space are mounting. A mindset we consider crucial to being able to look at statistical jitter decomposition objectively is a refusal to accept as unquestionable that Rj/Dj separation is correct - and a recognition that the analysis of timing in serial data streams is a leadership problem, a philosophy problem, an epistemological problem, and most definitely a measurement problem. But it's not a math problem.
If someone's solution involves invoking yet one more thing they learned in stochastic processes, they're part of the problem, not part of the solution, in my considered opinion. What we have is broken, and better ways are needed. The solution seems obvious, but the "they don't know what they don't know" force is strong. I'm sure better methods will materialize quite quickly as soon as enough people are willing to openly articulate what the emperor is wearing.

If you're interested, I've written a few papers about these matters: "A Discussion of Rj/Dj/... Compliance Measurements," and another on the work we've done over the last year on calibrating the error surface for each of the various jitter terms of our own Rj/Dj/... tool set to within 2 ps or 5% over the universe of stationary jitter. The second one will be revised in a few weeks to include the expanded jitter space and dynamic calibration, but the original still covers the core of what we have done in this area pretty well. Just send an email to "info AT amherst-systems DOT com" and ask for the "Rj/Dj paper" and/or the "validation paper," and someone will get them right out to you. I'm not big on email myself.

Chris, I didn't have time to write something short. I hope that, after all these words, I have provided some further insight into the situation. Good luck.

Best regards,
Mike

--
Mike Williams
Pres. ASA Corp.
www.TheJitterSolution.com

------------------------------------------------------------------
To unsubscribe from si-list:
si-list-request@xxxxxxxxxxxxx with 'unsubscribe' in the Subject field

or to administer your membership from a web page, go to:
//www.freelists.org/webpage/si-list

For help:
si-list-request@xxxxxxxxxxxxx with 'help' in the Subject field

List FAQ wiki page is located at:
http://si-list.org/wiki/wiki.pl?Si-List_FAQ

List technical documents are available at:
http://www.si-list.org

List archives are viewable at:
//www.freelists.org/archives/si-list
or at our remote archives:
http://groups.yahoo.com/group/si-list/messages
Old (prior to June 6, 2001) list archives are viewable at:
http://www.qsl.net/wb6tpu
------------------------------------------------------------------