Hi Mort, I spent most of my career in clock and timing engineering... supercomputers and mainframes in the early days, and "slumming it" on PCs and workstations after that:) The consults ranged from front-end pre-emptive tolerance management (i.e. designing the clock distribution/reception architecture and the state communication geometry to meet the need), mid-project distribution design-validation (did they have a valid belief system? did they implement to those beliefs if so?) and the fun part, late-design-stage rescue consults (why do we have clear timing faults when the tolerancing simulates/measures out OK?). That has led to about 30 years of exposure to how various design teams/cultures think about this sort of thing. I consider that what you have is more a philosophical problem than a technical problem. It's about your "belief system". The definition of "what is right" is fluid (though "what is wrong" is carved in stone:). I don't think I can give you an answer to where you should be on the continuum from "conservative" to "aggressive", but I can try to give you a sense of how I've seen other teams and individuals think about the problem. The mainframe and supercomputer "clock guys" I worked with through the 80's would definitely be at the very conservative end of that spectrum. Part of what explains that was the nature of the product... supercomputers and mainframes were expected not to fail randomly. Every copy of what you built had to reliably synch on every cycle of its operation during its service life. Contributing to the conservatism, the guys doing the clocks on those systems tended to be tremendously well-rounded and expert not just at the electrical stuff (SI) but at actively designing to manage tolerances. A 100% lost art today, IMO. So... they were conservative. But they also had a lot of experience in working sticky clock situations (e.g. 
what special design and validation methods do you need at a stall/no-stall boundary in a multiphase pipe?). I can recall at least two situations on big systems where relaxing specific assumptions would result in a more tractable simulation pass, and the way both situations were resolved was something I can't really imagine having occurred after ~1994 or so. Much consideration of the local governing dynamics of the tolerancing took place... lots of interesting pictures were drawn in notebooks and even more interesting thought models constructed... resolving in one case days later, and in the other, I think it was a few weeks, as the assumptions were relaxed, but only after being very thoroughly and expertly regarded. Coming at the problem from the other end of system space in the 90's were the PC guys. There was no culture of precision in their past (go ahead and look for any evidence of clocking sophistication in 486s, where they all came from). The terms I used to apply to PC and workstation guys back then (when they weren't in the room) were "swashbucklers" and "butchers". Utterly NO sense of how to do clocking, and they tended to be far too willing to make big changes in their belief system about tolerancing on the basis of a 2-minute conversation (which means they had no belief system to start with). One workstation team I met with (I believe a gentleman from that team visits here quite regularly) had NO... as in NONE... accommodation of jitter in their tolerance budget. The discussion there went from oops... to we should factor that in then... to OMG, nothing is shippable if we factor that in!... to my favorite recurring "idea" for how to weasel a tolerance budget into an apparently shippable condition: "Let's RMS it". (My advice on RMS is that it has no role in the avoidance of real pathologies on clocks... my 2 cents). 
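To put rough numbers on why "Let's RMS it" is so seductive, here's a tiny sketch (Python, with contributor names and values I've invented purely for illustration... they're not from any real budget or data sheet). Root-sum-square always produces a smaller stack than a straight worst-case sum, but it's only statistically defensible when every contributor really is an independent, random quantity; a systematic term like a fixed skew breaks that assumption, and that's exactly where the real pathologies hide.

```python
import math

# Hypothetical worst-case uncertainty contributors, in ns.
# Names and values are invented for illustration only.
contributors = {
    "distribution skew": 0.8,
    "cycle-to-cycle jitter": 0.5,
    "driver delay variation": 0.6,
    "receiver threshold walk": 0.4,
}

# Straight worst-case stack: assumes every contributor can hit its
# extreme on the same cycle. Conservative, but never wrong.
linear = sum(contributors.values())

# "Let's RMS it" (root-sum-square): only statistically meaningful if
# every term is an independent random variable -- which a fixed skew
# term, for example, is not.
rss = math.sqrt(sum(v * v for v in contributors.values()))

print(f"worst-case stack: {linear:.2f} ns")  # 2.30 ns
print(f"RSS stack:        {rss:.2f} ns")     # 1.19 ns
```

The gap between those two numbers is exactly the margin people hope to "find" in a budget review, which is why the idea keeps coming back.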
Personally, I've ended up suggesting in good faith that clients move both toward greater conservatism (grow reliability) and toward more aggressive relaxation of a worst-case assumption (grow performance). When these long philosophical discussions take place, I try to take into consideration all the knowledge we have... the mission of the system... how many do they intend to make (only ex-supercomputer guys think "1" is a possible answer to this:)? How deep and wide is the dist. network, and are the things we're worried about static, dynamic, or hybrid tolerances (you stack them differently depending on the answer)? Is the effect of the assumption we're thinking of relaxing systemic or local? Are there any "double taxing" effects in the things that are "too big"? Can this team actually do this? etc. When there was a rich set of choices from which you could build your distribution network, I worked to develop good relationships with the semiconductor manufacturers that made the high-precision distribution solutions. Sometimes it was possible to get a real feel for a device/family's personality far beyond the story on the data sheet... what direction certain tolerances moved under certain conditions... what the "lawyer factor" was... what the "I've never ever seen XYZ" factors from the characterization guys were... etc. And actually, derating factors were provided by some, like the Moto guys, whom I got to know very well. Back then, clock distribution and reception schemes were universally unique clean-sheet-of-paper designs, and you had lots of choices about how to color them in. I think this created an environment for a skillful practitioner to work the statistics in an informed way away from pure worst case. 
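Since I said static, dynamic, and hybrid tolerances stack differently, here's one common policy sketched in Python. The term names and numbers are invented for illustration, and this is one possible belief system, not a recommendation: static terms are fixed offsets in any one built unit, and since every copy has to ship, they stack linearly at worst case; genuinely independent dynamic terms vary cycle to cycle, so some teams combine those statistically.

```python
import math

# Hypothetical budget terms, in ns -- invented for illustration.
static_terms = [0.6, 0.3]    # e.g. routing mismatch, buffer-to-buffer process skew
dynamic_terms = [0.4, 0.25]  # e.g. PLL jitter, supply-noise delay modulation

# Static terms: fixed in any one built system, but you must ship every
# copy, so stack them linearly at worst case. Dynamic terms: if truly
# independent, some teams root-sum-square them instead.
budget = sum(static_terms) + math.sqrt(sum(d * d for d in dynamic_terms))
print(f"hybrid stack: {budget:.2f} ns")  # 1.37 ns
```

Note how the answer sits between a pure worst-case stack of everything and a pure RSS of everything... which is the whole point of asking the static/dynamic question first.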
Something else we had back on the huge systems was absolute control not only of every path through the CDN, but also of every state-to-state path, and given the design tools of the day, there were not nearly as many critical paths as there are today. Even though the systems were huge, the problems were tractable. On a modern system, there's been a LOT of simplification for the clock guy... You have 3 orders of magnitude fewer board-level loads, usually only a single level of distribution instead of 6-10 with 2 or more parallel phases... BUT... today you have no idea what's happening in the critical paths. They live inside devices you can't see into. What you do know is that modern design tools have made nearly every path a critical path. You can't work with the same kind of knowledge or total system visibility now that you could then. Reasoning out what direction a Moto "111" would swing under likely operating conditions was just a lot easier than predicting what some been-on-the-market-for-a-month double-PLL will do, with a Jack the Ripper personality that only comes out when an uncommon and hitherto-unseen memory-write mode launches an image current underneath it. It's different now. You have to trust that the behaviors you see in validation are all that are in there. "Skill" can't be applied in the same ways, and IMO isn't as useful to the modern challenge. At the end of my first decade with clocks, I would have confidently answered your question that a maximally conservative POV is always required, though careful and informed relaxation from worst case can still be tolerated (ROFL... clock joke!). At this point, approaching three decades, I've encountered this question/scenario more than a few times. I've seen really careful teams/cultures still get burned being careful because of an undocumented Jack the Ripper effect in a complex device, and I've seen guys who shouldn't be allowed to design flashlights hit it out of the park being more optimistic than Mary Poppins. 
What I think explains some of that is that challenging design situations draw in more proficient designers, and the designs that don't are probably carrying a LOT of undocumented margin. For example... take the first two generations of a certain well-known microprocessor class (not saying which since they have more lawyers inside) that, in my opinion, could NOT be implemented with legally timed designs using the devices available at the time, even with very experienced supercomputer-grade attention to detail. Yet... they worked, and I think that is because ALL of the risk margin on timing got pulled inside the processor to be available to those designers. You seemingly didn't have to actually legally time them. If you want to do something to crystallize your belief system, I would suggest getting out of the sim environment asap (simulation believers comprised a solid 15 years of rescue consults for me:) and building up a test program that gives you real data on what happens when you stress some assumption in a controlled way, and that ultimately provides you with a sense of the pertinent governing dynamics for the kind of systems your team makes. There's probably no time or budget to do that today, but that is still my suggestion. That is when you can start relaxing certain worst-case contributors to your overall tolerance budget in an informed way. Oh yeah... and when you first have that confidence and start to apply it, that is when you're most likely to screw up REALLY big:) Good luck with things. 
Mike Williams

_____________________///ASA\\\_____________________
Mike Williams
President
Principal Product Designer - M1 OT Family
ASA
+1.413.596.5354
mike.williams@xxxxxxxxxxxxxxxxxx
M1 Oscilloscope Tools: www.m1ot.com
ASA: www.amherst-systems.com
_____________________________________________________________

>From: "Harwood, Morton (GE Infra, Aviation, US)" Morton.Harwood@xxxxxx
>To: <si-list@xxxxxxxxxxxxx>
>Date: Fri, 20 Mar 2009 15:08:10 -0400
>
>What I don't really see, however, is how I can legitimately change the
>numbers "on paper" to show other than 85 MHz as the worst-case limit.
>This is what a certain group of people are asking me for, claiming that
>we need to be more "aggressive" -- perhaps by taking an RMS
>(root-mean-square) sum of delays or something of that nature -- instead
>of subtracting from the 10 ns path a straight sum of delays like I am
>doing. Another idea is to reduce the delays by some percentage to be
>less than worst-case. (I counter with the argument that even though I
>describe my timing analysis methodology as "worst-case", it actually
>doesn't even consider the effects of noise -- such as from the power
>distribution network or crosstalk. So I could make my timing analysis
>even more pessimistic by adding some noise margin to the data sheet
>values for vinl and vinh. 
>And regarding an RMS sum of timing delays, I
>think that's a fudge, not a valid method.)
>
>Well anyway I'm wondering what alternative timing analysis methodologies
>other people use, or how other people deal with this situation at their
>companies.
>
>Thanks very much if you have comments.
>
>Mort