Is there any way for you to slow down the system clock so that the
external bus runs slower to see if that affects it? It could be that a
driver or receiver was damaged to the point of being slightly slower,
thus causing your problem.

Chris

On 2/17/2011 1:41 PM, Dimiter Popoff wrote:
> Just tried it out - no, it does not boot but I have yet to
> investigate where it fails (not the same failure).
> I had not thought of trying it out because the code is huge
> and complex, chances are it will not boot with the cache
> off on a working machine either (will have to try that as well).
>
> But it may tell me something - which is what I am after at the
> moment.
>
> Thanks,
>
> Dimiter
>
>
>> Date: Thu, 17 Feb 2011 13:25:46 -0500
>> From: Chris Johnson<cjohnson@xxxxxxx>
>> To: Dimiter Popoff<dp@xxxxxxxxxxx>
>> Subject: Re: [SI-LIST] Re: OT: Overvoltage breakdown on 120 nm silicon?
>>
>> Does it work if you turn off cache?
>>
>> On 2/17/2011 1:15 PM, Dimiter Popoff wrote:
>>>> I assume that changing the power up sequence back to the way it was
>>>> initially makes no difference?
>>> Yep, forgot to mention that. Would have been nice :-).
>>>
>>>> Are there any Tantalum or electrolytic caps that are near their voltage
>>>> spec that could be an issue?
>>> No, just plenty of ceramic ones and 470uF/6.3V electrolytic. The power
>>> looks OK.
>>>
>>>> Could you be having a problem with what is being fetched from memory,
>>>> versus the CPU itself? Are you running out of the cache or internal
>>>> memory at the point that it crashes? If not, you could try to force the
>>>> code to be in internal memory and see if that changes the behavior.
>>> Well it is something like that but the problem appears to occur
>>> when just the cache is involved, the external DDR is fine.
>>> I even went to saving to a file the copy of the flash image from the DDR
>>> where it is moved upon reset and compared it to the original - no difference
>>> (I am doing the tests mostly booting off the flash as I know it to work).
>>>
>>> But the fact that a breakpoint - which causes lots of memory activity - does
>>> prevent the failure from occurring seems to suggest that something is wrong
>>> with the cache write (shortly after the return address is written
>>> to the stack another register is stacked - and sometimes the return
>>> address is wrong). But things like that occur zillions of times
>>> before this happens so I am just staring at it with pancake like
>>> eyes... :D
>>>
>>>
>>> Dimiter
>>>
>>>
>>>> Date: Thu, 17 Feb 2011 12:45:19 -0500
>>>> From: Chris Johnson<cjohnson@xxxxxxx>
>>>> To: Dimiter Popoff<dp@xxxxxxxxxxx>,
>>>> "si-list@xxxxxxxxxxxxx"<si-list@xxxxxxxxxxxxx>
>>>> Subject: [SI-LIST] Re: OT: Overvoltage breakdown on 120 nm silicon?
>>>>
>>>> Could you be having a problem with what is being fetched from memory,
>>>> versus the CPU itself? Are you running out of the cache or internal
>>>> memory at the point that it crashes? If not, you could try to force the
>>>> code to be in internal memory and see if that changes the behavior.
>>>>
>>>> Are there any Tantalum or electrolytic caps that are near their voltage
>>>> spec that could be an issue?
>>>>
>>>> I assume that changing the power up sequence back to the way it was
>>>> initially makes no difference?
>>>>
>>>> Chris
>>>>
>>>> On 2/17/2011 12:09 PM, Dimiter Popoff wrote:
>>>>> Well not that I could rule this out but it does not look much
>>>>> like it. It appears that the core is failing - which is powered
>>>>> off the 1.5V, this is internal only (no inputs/outputs).
>>>>> Then the consumption did not change.
>>>>> Then I have overvoltage protection on each of the power lines -
>>>>> 1.5, 2.5 and 3.3 (SCR with zener for the 2.5 and 3.3, the 1.5
>>>>> somewhat different but in effect that again - so the spikes were
>>>>> really well limited in both height and in time).
>>>>> And then the DDR works - 2.5V powered. So does the flash and the
>>>>> ATA interface - 3.3V powered...
>>>>>
>>>>> But I am really inexperienced with failed parts of that size/complexity
>>>>> so I don't know, I feel really clueless. I will replace the CPU at
>>>>> some point (when I get some, I am out of parts now) but it is
>>>>> just interesting to me what this can be, I have seen a CPU
>>>>> which failed at some opcode 25 years ago, once (a clone of
>>>>> the 6800). And while it cost me some time to catch that I could
>>>>> catch where it failed. On that PPC part now things are incomparably
>>>>> more complex, nothing is guaranteed to be in order, caches, MMU,
>>>>> you name it. But I have done all the low and high level stuff so
>>>>> I can say I have narrowed things down - yet I could
>>>>> not catch the access which fails. Putting a breakpoint within a section of
>>>>> say 20 opcodes prior to a certain location makes the return
>>>>> address on the stack correct (a breakpoint does an illegal opcode
>>>>> exception, tons of processing/memory i/o, possibly cache
>>>>> flushes etc.). Put it below a certain opcode - no opcode doing
>>>>> anything of interest - and the stacked return address is bad...
>>>>> It _does_ sound so much like a software issue yet it is limited to
>>>>> that board only.
>>>>> I spent over a day only recalling things so I could ensure there was
>>>>> no exception taking place to cause the failure. I think I have run
>>>>> out of ideas now though...
>>>>>
>>>>> Dimiter
>>>>>
>>>>> ------------------------------------------------------
>>>>> Dimiter Popoff Transgalactic Instruments
>>>>>
>>>>> http://www.tgi-sci.com
>>>>> ------------------------------------------------------
>>>>> http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
>>>>>
>>>>>
>>>>>> Subject: [SI-LIST] Re: OT: Overvoltage breakdown on 120 nm silicon?
>>>>>> From: Russel Hughes<russel.hughes@xxxxxxxxx>
>>>>>> To: Dimiter Popoff<dp@xxxxxxxxxxx>
>>>>>> Cc: si-list@xxxxxxxxxxxxx
>>>>>> Date: Thu, 17 Feb 2011 17:19:51 +0100
>>>>>>
>>>>>> ESD diodes on an input broken down? If you have put too much through them
>>>>>> and they have shorted out it may explain your problem.
>>>>>> Cheers
>>>>>>
>>>>>> Russel
>>>>>>
>>>>>> On 17 February 2011 16:44, Dimiter Popoff<dp@xxxxxxxxxxx> wrote:
>>>>>>
>>>>>>> I am facing an unbelievable reality at the moment.
>>>>>>> A processor which will not boot - although all tests I have
>>>>>>> done to it pass.
>>>>>>>
>>>>>>> I still refuse to believe I can have killed the CPU - but after
>>>>>>> 3 days of tracing of the boot process I seem to run out of
>>>>>>> other explanations (heck, I had to dig through code some of
>>>>>>> which I have written 15+ years ago...).
>>>>>>>
>>>>>>> The CPU (an MPC5200B) appears to work - monitor via UART, even disk
>>>>>>> I/O worked etc. - but it fails some way into the boot process.
>>>>>>> This happened after I fixed the power up sequencing closer to
>>>>>>> the specs :-).
>>>>>>>
>>>>>>> That board had been working for nearly a year before that, had survived
>>>>>>> the development process (lots of programming/debugging and power
>>>>>>> on/off).
>>>>>>> It had lived through all that with a nice spike on the 1.5V, 2.5V and
>>>>>>> 3.3V upon poweron, perhaps 1 to 5mS over the absolute maximum by
>>>>>>> perhaps 50%.
>>>>>>> I changed that now - and it won't boot, fails at more or less
>>>>>>> the same place (pulls the wrong return address from the stack if I am
>>>>>>> not tracing ....). This is after a few system calls have returned OK
>>>>>>> already. It looks unbelievable to me to have killed the CPU in such
>>>>>>> a subtle way - but I have not seen many killed ones.
>>>>>>>
>>>>>>> How likely is it that I have killed it? The only news about the
>>>>>>> spikes which I believe may have killed it is that I now know they
>>>>>>> used to exist...
>>>>>>> Not to speak of the other boards which keep on working fine :).
>>>>>>>
>>>>>>> I also made the CPU check almost all of the 64M DDRAM, write address
>>>>>>> to location/verify - works, did that with the written address rotated
>>>>>>> 0 to 31 times, also works.... And all that also misaligned,
>>>>>>> also works fine - it is pretty maddening really.
>>>>>>>
>>>>>>> I am simply clueless as to how likely it is to break a gate
>>>>>>> with say 2.5V instead of 1.5? I guess drain/source breakdown won't
>>>>>>> be an issue even if they break for a few ms (not enough energy
>>>>>>> to fry anything)?
>>>>>>>
>>>>>>> Hopefully people with more silicon inside knowledge can
>>>>>>> comment...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dimiter
>>>>>>>
>>>>>>> ------------------------------------------------------
>>>>>>> Dimiter Popoff Transgalactic Instruments
>>>>>>>
>>>>>>> http://www.tgi-sci.com
>>>>>>> ------------------------------------------------------
>>>>>>> http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
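
For readers following the thread, the DDR check described in the original post (write each location's own address into it, verify, repeat with the written value rotated 0 to 31 times, then repeat misaligned) could look roughly like the sketch below. This is not the code that ran on the board: it is a host-side simulation on a bytearray, and the names rotl32, write_pattern, verify_pattern and run_address_test are made up for illustration. The big-endian word layout is likewise an assumption, chosen to match a PowerPC target.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(value, count):
    """Rotate a 32-bit value left by count bits (count taken modulo 32)."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def write_pattern(mem, rotation, offset=0, base=0):
    """Fill mem with 32-bit words, each holding its own address rotated left."""
    # offset 1..3 makes every word access misaligned.
    for pos in range(offset, len(mem) - 3, 4):
        struct.pack_into(">I", mem, pos, rotl32(base + pos, rotation))


def verify_pattern(mem, rotation, offset=0, base=0):
    """Return a list of (address, expected, actual) for every mismatching word."""
    failures = []
    for pos in range(offset, len(mem) - 3, 4):
        expected = rotl32(base + pos, rotation)
        (actual,) = struct.unpack_from(">I", mem, pos)
        if actual != expected:
            failures.append((base + pos, expected, actual))
    return failures


def run_address_test(mem, offset=0, base=0):
    """Run the write/verify pass for all 32 rotations; return all mismatches."""
    failures = []
    for rotation in range(32):
        # The verify pass starts only after the whole region has been written,
        # so an aliased address line would overwrite an earlier word first.
        write_pattern(mem, rotation, offset, base)
        for address, expected, actual in verify_pattern(mem, rotation, offset, base):
            failures.append((rotation, address, expected, actual))
    return failures


if __name__ == "__main__":
    region = bytearray(1024)  # stand-in for the DDR region under test
    for misalignment in range(4):
        bad = run_address_test(region, offset=misalignment)
        print("offset", misalignment, "mismatches:", len(bad))
```

A test of this kind passing, as reported in the thread, says the DDR array and its address/data lines hold a static pattern; it does not exercise the cache write-back timing that the breakpoint experiments point at.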