[SI-LIST] Re: Do you really ship products at BER 10e-xx ?

  • From: Paul Levin <levinpa@xxxxxxxxxxxxx>
  • To: gtang@xxxxxxxx
  • Date: Wed, 13 Apr 2005 18:07:15 -0700

Dear George,

I don't know what "most" you work on, but the gigabit *storage* interfaces,
e.g., Fibre Channel, SATA and SAS, do not have built-in error correction, at
least not at the lower levels. Yet storage arrays are routinely built to
error rates approaching "none."

Consider a rack of eight shelves of 16 enterprise (i.e., dual I/O) drives.
With each link humming along at 2.125 GBaud (a Fibre Channel rate that has
been in mass production for years now), the aggregate data rate is just
over 1.1e12 bits per second. Obviously we couldn't tolerate the Fibre Channel
design BER of 1e-12: at that aggregate rate it would mean roughly one bit
error every second. The reason we haven't changed the number is that testing
to anything lower would kill us in terms of time and equipment cost.
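To put a rough number on the test-time cost, here is a minimal sketch. It
assumes the common zero-error confidence rule (error-free bits needed =
-ln(1 - CL) / BER), which is my assumption and not something the Fibre Channel
spec is being quoted on; the function name is made up for illustration.

```python
import math

def bits_for_confidence(ber_target, confidence=0.95):
    """Error-free bits needed to claim BER < ber_target at the given confidence.

    Assumes independent bit errors and a test that sees zero errors.
    """
    return -math.log(1.0 - confidence) / ber_target

LINE_RATE = 2.125e9  # baud, a single Fibre Channel link

for ber in (1e-12, 1e-13, 1e-15):
    seconds = bits_for_confidence(ber) / LINE_RATE
    print(f"BER < {ber:.0e}: {seconds / 3600:.2f} hours of error-free traffic")
```

Each decade of lower BER multiplies the test time by ten on a single link,
which is where the time and equipment cost comes from.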

We routinely experience error-free weeks, each of which has seen nearly 7e17
bits transmitted successfully.
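The arithmetic behind those figures can be sketched as follows. Counting both
ports of every drive and both directions of every link is my assumption about
how the aggregate was tallied; it lands close to the numbers quoted above.

```python
SHELVES = 8
DRIVES_PER_SHELF = 16
PORTS_PER_DRIVE = 2          # dual I/O enterprise drives
DIRECTIONS = 2               # assumption: transmit and receive both counted
LINE_RATE = 2.125e9          # baud per link
DESIGN_BER = 1e-12           # Fibre Channel design BER
SECONDS_PER_WEEK = 7 * 24 * 3600

links = SHELVES * DRIVES_PER_SHELF * PORTS_PER_DRIVE * DIRECTIONS
aggregate_rate = links * LINE_RATE                 # bits per second, whole rack
errors_per_second = aggregate_rate * DESIGN_BER    # if every link sat at the design BER
bits_per_week = aggregate_rate * SECONDS_PER_WEEK

print(f"aggregate rate:    {aggregate_rate:.3e} bits/s")
print(f"errors per second: {errors_per_second:.2f} at BER {DESIGN_BER:.0e}")
print(f"bits per week:     {bits_per_week:.2e}")
```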

Despite this, I am a firm believer in the existence of random noise and its
low-level effects on error rate. While there may be a lot of deterministic
error mechanisms, and while they are undoubtedly the ones that account for
most massive failures, there are really some truly random events that will
affect performance at very low levels.

If your link is 100 ohms differential, then at 300 K you have a thermal noise
source of about 1.26 nV/root-Hz. Over 2 GHz, that scales up to 57 microvolts rms.
The above figures are for a 0 dB noise figure. I would bet that most SERDES
receivers are closer to 10 dB, bringing the rms noise over 2 GHz to 179 uV.
Fourteen sigma peak-to-peak (+/-7 sigma, roughly 1e-12) brings this to 2.5 mV
at that error level.
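For anyone who wants to rerun these numbers, here is a minimal sketch using the
standard Johnson noise formula sqrt(4kTR) and a Gaussian tail for the BER. The
variable names are mine. At exactly 300 K the density comes out a couple of
percent above the figure quoted above, so the later values shift slightly too.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K

def thermal_noise_density(resistance_ohm, temp_k):
    """Johnson noise voltage density in V/root-Hz for a resistance at temp_k."""
    return math.sqrt(4.0 * K_BOLTZMANN * temp_k * resistance_ohm)

BANDWIDTH_HZ = 2e9
NOISE_FIGURE_DB = 10.0       # the guess for a typical SERDES receiver

density = thermal_noise_density(100.0, 300.0)
rms_0db = density * math.sqrt(BANDWIDTH_HZ)           # rms noise at 0 dB noise figure
rms_10db = rms_0db * 10 ** (NOISE_FIGURE_DB / 20.0)   # noise figure is a power ratio
span_14_sigma = 14.0 * rms_10db                       # +/-7 sigma, peak to peak

# One-sided Gaussian tail probability beyond 7 sigma
ber_at_7_sigma = 0.5 * math.erfc(7.0 / math.sqrt(2.0))

print(f"density:       {density * 1e9:.2f} nV/root-Hz")
print(f"rms at 0 dB:   {rms_0db * 1e6:.1f} uV")
print(f"rms at 10 dB:  {rms_10db * 1e6:.1f} uV")
print(f"14 sigma span: {span_14_sigma * 1e3:.2f} mV")
print(f"tail at 7 sigma: {ber_at_7_sigma:.2e}")
```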

Even if you powered your RefClk oscillator from a battery and added perfect
shielding, the fact that the circuit isn't operating at absolute zero
temperature means that there will be some random jitter. This random jitter
will be compounded by the frequency multiplying PLL in the SERDES chip,
even if it, too, were powered by a battery and shielded perfectly. Why do
I bring up jitter? For two reasons:
1) Because you could get a bit error by failing setup and hold; and
2) Because it is yet another source of vertical eye closure. Unless dV/dt=0,
jitter contributes vertical noise = jitter * (dV/dt).
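As a quick illustration of the second point, here is a small sketch that turns
timing jitter into equivalent voltage noise through the slew rate at the
sampling instant. The swing, edge time and jitter values are hypothetical,
picked only to show the units working out.

```python
def jitter_to_voltage_noise(jitter_rms_s, slew_rate_v_per_s):
    """Equivalent rms voltage noise (V) from rms timing jitter (s) at a given dV/dt (V/s)."""
    return jitter_rms_s * slew_rate_v_per_s

# Hypothetical edge: 400 mV swing traversed in 100 ps
SWING_V = 0.4
EDGE_TIME_S = 100e-12
slew_rate = SWING_V / EDGE_TIME_S          # V/s on the edge

JITTER_RMS_S = 2e-12                       # hypothetical 2 ps rms random jitter

noise_on_edge = jitter_to_voltage_noise(JITTER_RMS_S, slew_rate)
noise_on_flat = jitter_to_voltage_noise(JITTER_RMS_S, 0.0)   # dV/dt = 0, no contribution

print(f"on the edge:  {noise_on_edge * 1e3:.1f} mV rms")
print(f"on flat part: {noise_on_flat * 1e3:.1f} mV rms")
```

In a band-limited eye the waveform is rarely perfectly flat at the sampling
point, so some of the jitter always shows up as vertical closure.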

I hope that these observations help.

Regards,

Paul

P.S. Years ago, an engineer at a supercomputer company told me that he was
looking for a 1 Gb link with BER 1e-19, i.e., it would fail after thirty years!
________________

George Tang wrote:

>Most gigabit systems out there today have error correction algorithms
>built in that can correct either single-bit errors or a few bit errors.  To see
>the actual errors caused by the Gaussian distribution of RJ, you need to
>disable the error correction circuits.  Fortunately, I work at a SerDes
>company/department, and I have access to the low-level circuits of the
>transmitters and receivers.  BER in the range of 1e-10 to 1e-16 is a real
>and repeatable (thus measurable) phenomenon.  Some people believe that this
>type of error is totally drowned out by channel reflection, cross-talk, DJ,
>power supply switching noise, power/ground bounce, etc.  I very much agree.
>These factors typically contribute errors in the 1e-3 to 1e-8 range.  If you
>drive your transmitter/ receiver with a fixed data pattern, say PRBS7, which
>has 127 bits, and you have errors due to cross-talk or channel reflection of
>a certain bit pattern, you will see this error repeat every 127 bits.  And
>power supplies that switch at 400kHz will cause periodic jitter of the PLL
>and result in BER of some number *much* higher than 1e-12.  The same is true
>for ISI due to very poor channel design.  Anything that is caused by the
>deterministic factors (data pattern, channel cross talk, power noise)
>repeats at a rate much, much higher than once every 1e12 bits.  But if
>you solve all these problems to the extent that there is sufficient
>eye-opening inside the receiver, then you are dealing with errors caused by
>the second-order effects, mainly the RJ from TX PLL, RX PLL / CDR circuits.
>Different receiver CDR circuits use different algorithms to find the center
>of the eye while at the same time keeping track of the low-frequency drift or
>frequency offset (e.g., 100 PPM) between the TX PLL and RX PLL.  These
>algorithms work almost all the time, ALMOST.  Their error distribution
>is Gaussian in shape.  The TX PLL and RX PLL RJ distributions are also
>Gaussian.  You can display the eye diagram of the signal at the point right
>before it goes into the receiver and see that there is sufficient vertical
>eye opening (receiver sensitivity) and horizontal eye opening (receiver
>jitter tolerance).  Then you feed that signal into the receiver.  What you
>may find is that you get BER of 1e-11 or 1e-12.  It's working, almost but
>not quite.  You can tweak the PLL and CDR parameters a little bit, and the
>errors just go away.  You may think that was just coincidence, so you set
>the parameters back to what they were.  Sure enough, the errors appear
>again.  They are as repeatable as hitting the table with your fist and
>seeing your fist not penetrating the table.  Another way to induce or remove
>BER is to adjust the pre-emphasis settings to allow more ISI into the
>receiver to further close down the eye.  You can observe that a certain
>pre-emphasis setting results in a certain BER.  That is also very
>repeatable.  There are times that you would like to open the eye just a
>little more to improve the BER from 1e-12 to 1e-13, but you are bounded by
>the laws of physics and every test returns a BER of 1e-12.  This probability
>function is as real as life itself.  I would say BER down to 1e-15 is
>meaningful.  Beyond that, you should take it with a grain of salt, because
>something else will cause an error before this link.  BER in the 1e-2x range is
>meaningless since I probably won't live long enough to see the test
>complete.  Simulations can project BER to some degree and they are good for
>merit comparisons, but companies that measure and demonstrate BER are the
>ones that provide the true performance.
>
>
>George
>
>LSI Logic
>
>
>-----Original Message-----
>From: si-list-bounce@xxxxxxxxxxxxx
>[mailto:si-list-bounce@xxxxxxxxxxxxx]On Behalf Of Chris Cheng
>Sent: Wednesday, April 13, 2005 2:12 PM
>Cc: si-list@xxxxxxxxxxxxx
>Subject: [SI-LIST] Re: Do you really ship products at BER 10e-xx ?
>
>
>Al and Tom,
>
>Since both of your responses to me are similar, I will just answer one, but
>the point should be the same.
>
>Before we go further, let's also follow Ed's suggestion and not rat hole
>this to a DJ/RJ debate.
>
>I've played the Cal Lottery long enough to know my early retirement plan is not a
>probability function based on the lottery. Neither is my system design.
>
>At the end of the day, I believe most of the phenomena you mention below
>are either predictable/bounded or the probability distribution is so small
>that it will be dwarfed by the predictable noise.
>
>
>  
>
>>Any real system, with a real transmitter generating some very small
>>finite amount of Random Jitter, RJ, cannot operate "error free".  It is
>>an issue of probabilities.  By definition, RJ is unbounded, therefore
>>there is always some probability of a failure.
>>    
>>
>
>Definition is arbitrary, you can claim RJ is unbounded but we need a real
>example to show why a real system phenomenon is unbounded. Just because the
>spec says so is meaningless.
>
>  
>
>>The jitter issue, specifically regarding serial links such as
>>backplanes, can be broken down into 4 primary categories:
>>    
>>
>
>  
>
>>1.  RJ generated from the transmitter - due to Transmitters VCO and
>>Reference clock jitter transfer
>>    
>>
>
>I have done enough PLL analysis and testing in a digital environment to
>convince myself the jitter component induced by the supply noise dwarfs any
>reference clock jitter transfer. And the supply noise induced jitter is a
>very predictable phenomenon that is clearly not unbounded. You can both
>simulate and characterize the behavior (BUTT). One can argue about whether
>the supply noise can be predicted, but with proper filtering and power
>distribution, it can be limited to guarantee the jitter will not exceed
>a certain limit (once again, a bounded limit).
>
>  
>
>>2.  Deterministic jitter (DJ) due to Transmitter - Duty cycle distortion
>>and Intersymbol interference, periodic jitter due to power supply and
>>plane resonance
>>    
>>
>
>Once again, can be simulated, measured and bounded.
>
>  
>
>>3.  DJ due to the physical link - losses in the system (resonance, skin,
>>dielectric), impedance mismatches, crosstalk, resonances
>>    
>>
>
>Ditto.
>
>  
>
>>4.  Tolerance of the Receivers - BER measured with combinations of RJ,
>>DJ and swept PJ (T11.2 Annex A)
>>    
>>
>
>If the above phenomena are bounded, it simply becomes a question of whether you
>make your setup/hold time or not. Nothing nondeterministic about it.
>
>  
>
>>Of course, if a good transmitter and a well designed link and a receiver
>>with significant tolerance is incorporated into the design, the actual
>>BER will appear to be perfect, and it may be directly impractical to
>>measure.  In this case, it may be necessary to add jitter to see how the
>>system tolerates it with respect to the receivers tolerance.  A system
>>with low RJ and significant DJ, with steep bathtub curves will not start
>>to have a moderate 1E-8 BER type of problem, it will probably have
>>catastrophic loss of lock and BER problems.  Chris, Andy I think this is
>>the behavior you were describing, no?
>>    
>>
>
>Maybe, but it sure sounds like some instrument company or spec committee is
>trying to push some 5 sigma spec down my throat and say "ah ha, even though
>your measured jitter is blah blah blah and your system is working, 5x
>sigma later you are doomed so you need our help..."
>
>  
>
>>I would pose an interesting question for Chris - if his particular
>>system has 1ps RMS more jitter on the REFCLK for a 3.125Gbpsec
>>transmitter (if it had 1psec RMS initially, it has 1.414psec RMS
>>now), would it still meet BER performance for the full link?  What is
>>your confidence it still works?  How much BER testing would be required?
>>How well are his oscillator vendors testing their product for jitter and
>>phase noise?
>>    
>>
>
>I would say I don't care because I believe the jitter induced by supply
>noise will dwarf that input reference transfer. 1 or 2ps jitter is NOTHING
>compared with the jitter induced by supply noise at the right frequency.
>
>  
>
>>How about 30mV more peak-peak switching noise at 400kHz - how tolerant
>>are the PLL's from losing lock, multiply the higher freq components and
>>creating a serious PJ problem, how would this impact the Receiver
>>tolerance - would the system still work, would you now have occasional
>>failure?
>>    
>>
>
>Now that's an interesting thought, and I believe it is where most parts can fail.
>But is that an unbounded phenomenon? I don't think so. After all, the same
>30mV that will hit the PLL supply at, say, 100MHz will probably never fail the
>system no matter how long you wait. The behavior and response of the PLL can
>be simulated and predicted. And like I said above, that's why companies pay
>me peanuts to design power distribution systems that don't have 30mV of
>noise at 400KHz in the first place (or at least protect the PLL with enough
>filters that the VCO won't see that 30mV).
>
>  
>
>>This is not meant to be critical in any way, but unfortunately most BSEE
>>programs do not require a single class in Stochastic Processes (after
>>all who in their right mind would elect that class), and that is why a
>>lot of the engineering community grapples with abstract jitter issues.
>>We have not been trained to think "stochastically".
>>    
>>
>
>Sorry Al, I don't know jack about stochastic processes, but none of the above
>is nondeterministic, or at least big enough when compared with the predictable
>part.
>
>I would propose another explanation for these BER 10e-xx specs or bathtub
>curves for electrical physical channels. It is based on the laziness of the
>engineer who really doesn't want to dig down to analyze and predict
>effects such as ISI, PLL jitter or crosstalk, so he/she just sticks the probe
>at the receiver, measures the jitter and says "hmmm, I don't know where
>they come from so let's just call them noise/jitter and extrapolate 5x sigma
>to sandbag myself with enough margin and ship it." And that, I suspect, is
>why you will ultimately have those intermittent failures.
>
>And if you bring in Mr. Heisenberg, I am out of here.
>------------------------------------------------------------------
>To unsubscribe from si-list:
>si-list-request@xxxxxxxxxxxxx with 'unsubscribe' in the Subject field
>
>or to administer your membership from a web page, go to:
>//www.freelists.org/webpage/si-list
>
>For help:
>si-list-request@xxxxxxxxxxxxx with 'help' in the Subject field
>
>List FAQ wiki page is located at:
>                http://si-list.org/wiki/wiki.pl?Si-List_FAQ
>
>List technical documents are available at:
>                http://www.si-list.org
>
>List archives are viewable at:
>               //www.freelists.org/archives/si-list
>or at our remote archives:
>               http://groups.yahoo.com/group/si-list/messages
>Old (prior to June 6, 2001) list archives are viewable at:
>               http://www.qsl.net/wb6tpu
>
>
>
>
>

-- 
Paul A. Levin
Senior Principal Engineer
Xyratex, Manhattan Beach
(310) 372-7352 - home & office
(310) 291-8199 - cell


