[SI-LIST] Re: OT: Overvoltage breakdown on 120 nm silicon?

  • From: Chris Johnson <cjohnson@xxxxxxx>
  • To: Dimiter Popoff <dp@xxxxxxxxxxx>, "si-list@xxxxxxxxxxxxx" <si-list@xxxxxxxxxxxxx>
  • Date: Thu, 17 Feb 2011 14:26:44 -0500

Is there any way for you to slow down the system clock so that the 
external bus runs slower to see if that affects it?  It could be that a 
driver or receiver was damaged to the point of being slightly slower, 
thus causing your problem.

Chris

On 2/17/2011 1:41 PM, Dimiter Popoff wrote:
> Just tried it out - no, it does not boot but I have yet to
> investigate where it fails (not the same failure).
> I had not thought of trying it out because the code is huge
> and complex, chances are it will not boot with the cache
> off on a working machine either (will have to try that as well).
>
> But it may tell me something - which is what I am after at the
> moment.
>
> Thanks,
>
> Dimiter
>
>
>> Date: Thu, 17 Feb 2011 13:25:46 -0500
>> From: Chris Johnson<cjohnson@xxxxxxx>
>> To: Dimiter Popoff<dp@xxxxxxxxxxx>
>> Subject: Re: [SI-LIST] Re: OT: Overvoltage breakdown on 120 nm silicon?
>>
>> Does it work if you turn off cache?
>>
>> On 2/17/2011 1:15 PM, Dimiter Popoff wrote:
>>>> I assume that changing the power up sequence back to the way it was
>>>> initially makes no difference?
>>> Yep, forgot to mention that. Would have been nice :-).
>>>
>>>> Are there any Tantalum or electrolytic caps that are near their voltage
>>>> spec that could be an issue?
>>> No, just plenty of ceramic ones and 470uF/6.3V electrolytic. The power
>>> looks OK.
>>>
>>>> Could you be having a problem with what is being fetched from memory,
>>>> versus the CPU itself?  Are you running out of the cache or internal
>>>> memory at the point that it crashes?  If not, you could try to force the
>>>> code to be in internal memory and see if that changes the behavior.
>>> Well it is something like that but the problem appears to occur
>>> when just the cache is involved, the external DDR is fine. I even
>>> went to saving to a file the copy of the flash image from the DDR where
>>> it is moved upon reset and compared it to the original - no difference
>>> (I am doing the tests mostly booting off the flash as I know it to work).
>>>
>>> But the fact that a breakpoint - which causes lots of memory activity - does
>>> prevent the failure from occurring seems to suggest that something is wrong
>>> with the cache write (shortly after the return address is written
>>> to the stack another register is stacked - and sometimes the return
>>> address is wrong). But things like that occur zillions of times
>>> before this happens so I am just staring at it with pancake like
>>> eyes... :D
>>>
>>>
>>> Dimiter
>>>
>>>
>>>> Date: Thu, 17 Feb 2011 12:45:19 -0500
>>>> From: Chris Johnson<cjohnson@xxxxxxx>
>>>> To: Dimiter Popoff<dp@xxxxxxxxxxx>,
>>>> "si-list@xxxxxxxxxxxxx"<si-list@xxxxxxxxxxxxx>
>>>> Subject: [SI-LIST] Re: OT: Overvoltage breakdown on 120 nm silicon?
>>>>
>>>> Could you be having a problem with what is being fetched from memory,
>>>> versus the CPU itself?  Are you running out of the cache or internal
>>>> memory at the point that it crashes?  If not, you could try to force the
>>>> code to be in internal memory and see if that changes the behavior.
>>>>
>>>> Are there any Tantalum or electrolytic caps that are near their voltage
>>>> spec that could be an issue?
>>>>
>>>> I assume that changing the power up sequence back to the way it was
>>>> initially makes no difference?
>>>>
>>>> Chris
>>>>
>>>> On 2/17/2011 12:09 PM, Dimiter Popoff wrote:
>>>>> Well not that I could rule this out but it does not look much
>>>>> like it. It appears that the core is failing - which is powered
>>>>> off the 1.5V, this is internal only (no inputs/outputs).
>>>>> Then the consumption did not change. Then I have overvoltage
>>>>> protection on each of the power lines - 1.5, 2.5 and 3.3
>>>>> (SCR with zener for the 2.5 and 3.3, the 1.5 somewhat different
>>>>> but in effect that again - so the spikes were really well limited
>>>>> in both height and in time).
>>>>>     And then the DDR works - 2.5V powered. So does the flash and the
>>>>> ATA interface - 3.3V powered...
>>>>>
>>>>> But I am really inexperienced with failed parts of that size/complexity
>>>>> so I don't know, I feel really clueless. I will replace the CPU at
>>>>> some point (when I get some, I am out of parts now) but it is
>>>>> just interesting to me what this can be, I have seen a CPU
>>>>> which failed at some opcode 25 years ago, once (a clone of
>>>>> the 6800). And while it cost me some time to catch that I could
>>>>> catch where it failed. On that PPC part now things are incomparably
>>>>> more complex, nothing is guaranteed to be in order, caches, MMU,
>>>>> you name it. But I have done all the low and high level stuff so
>>>>> I can say I have narrowed things down  - yet I could
>>>>> not catch the access which fails. Putting a breakpoint within a section of
>>>>> say 20 opcodes prior to a certain location makes the return
>>>>> address on the stack correct (a breakpoint does an illegal opcode
>>>>> exception, tons of processing/memory i/o, possibly cache
>>>>> flushes etc.). Put it below a certain opcode - no opcode doing
>>>>> anything of interest - and the stacked return address is bad...
>>>>> It _does_ sound so much like a software issue yet it is limited to
>>>>> that board only.
>>>>> I spent over a day only recalling things so I could ensure there was
>>>>> no exception taking place to cause the failure. I think I have run
>>>>> out of ideas now though...
>>>>>
>>>>> Dimiter
>>>>>
>>>>> ------------------------------------------------------
>>>>> Dimiter Popoff               Transgalactic Instruments
>>>>>
>>>>> http://www.tgi-sci.com
>>>>> ------------------------------------------------------
>>>>> http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
>>>>>
>>>>>
>>>>>> Subject: [SI-LIST] Re: OT: Overvoltage breakdown on 120 nm silicon?
>>>>>> From: Russel Hughes<russel.hughes@xxxxxxxxx>
>>>>>> To: Dimiter Popoff<dp@xxxxxxxxxxx>
>>>>>> Cc: si-list@xxxxxxxxxxxxx
>>>>>> Date: Thu, 17 Feb 2011 17:19:51 +0100
>>>>>>
>>>>>> ESD diodes on an input broken down? If you have put too much through them
>>>>>> and they have shorted out it may explain your problem.
>>>>>> Cheers
>>>>>>
>>>>>> Russel
>>>>>>
>>>>>> On 17 February 2011 16:44, Dimiter Popoff<dp@xxxxxxxxxxx>    wrote:
>>>>>>
>>>>>>> I am facing an unbelievable reality at the moment.
>>>>>>> A processor which will not boot - although all tests I have
>>>>>>> done to it pass.
>>>>>>>
>>>>>>> I still refuse to believe I can have killed the CPU - but after
>>>>>>> 3 days of tracing of the boot process I seem to run out of
>>>>>>> other explanations (heck, I had to dig through code some of
>>>>>>> which I have written 15+ years ago...).
>>>>>>>
>>>>>>> The CPU (an MPC5200B) appears to work - monitor via UART, even disk
>>>>>>> I/O worked etc. - but it fails some way into the boot process.
>>>>>>> This happened after I fixed the power up sequencing closer to
>>>>>>> the specs :-).
>>>>>>>
>>>>>>> That board had been working for nearly a year before that, had survived
>>>>>>> the development process (lots of programming/debugging and power on/off).
>>>>>>> It had lived through all that with a nice spike on the 1.5V, 2.5V and 3.3V
>>>>>>> upon power-on, lasting perhaps 1 to 5 ms and exceeding the absolute
>>>>>>> maximum by perhaps 50%. I changed that now - and it won't boot, fails at more or less
>>>>>>> the same place (pulls the wrong return address from the stack if I am
>>>>>>> not tracing ....). This is after a few system calls have returned OK
>>>>>>> already. It looks unbelievable to me to have killed the CPU in such
>>>>>>> a subtle way - but I have not seen many killed ones.
>>>>>>>
>>>>>>> How likely is it that I have killed it? The only news about the
>>>>>>> spikes which I believe may have killed it is that I now know they
>>>>>>> used to exist...
>>>>>>> Not to speak of the other boards which keep on working fine :).
>>>>>>>
>>>>>>> I also made the CPU check almost all of the 64M DDRAM, write address
>>>>>>> to location/verify - works, did that with the written address rotated
>>>>>>> 0 to 31 times, also works.... And all that also misaligned,
>>>>>>> also works fine - it is pretty maddening really.
>>>>>>>
>>>>>>> I am simply clueless as to how likely it is to break a gate
>>>>>>> with say 2.5V instead of 1.5? I guess drain/source breakdown won't
>>>>>>> be an issue even if they break for a few ms (not enough energy
>>>>>>> to fry anything)?
>>>>>>>
>>>>>>> Hopefully people with more silicon inside knowledge can
>>>>>>> comment...
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dimiter
>>>>>>>
>>>>>>> ------------------------------------------------------
>>>>>>> Dimiter Popoff               Transgalactic Instruments
>>>>>>>
>>>>>>> http://www.tgi-sci.com
>>>>>>> ------------------------------------------------------
>>>>>>> http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
>
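
For reference, the DDR check described in the original post (write each location's own address into it, verify, then repeat with the written value rotated 0 to 31 times) can be sketched roughly as below. This is a host-side Python model, not the code that ran on the MPC5200B: the names rotl32, address_pattern_test and StuckBitMemory are made up for illustration, the "memory" is just a sequence of 32-bit words, and the misaligned variant mentioned in the post is not modelled.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, count):
    """Rotate a 32-bit value left by count bits (count taken modulo 32)."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def address_pattern_test(mem, base, word_bytes=4):
    """Write rotl32(address, r) to every word and read it back, for r = 0..31.

    mem is any mutable sequence of 32-bit words standing in for the RAM
    under test; base is the (hypothetical) bus address of mem[0].
    Returns a list of (rotation, address, expected, actual) mismatches.
    """
    failures = []
    for rotation in range(32):
        # Fill pass: each word holds its own address, rotated.
        for index in range(len(mem)):
            address = base + index * word_bytes
            mem[index] = rotl32(address, rotation)
        # Verify pass, run only after the whole region has been written,
        # so aliasing between locations would show up as a mismatch.
        for index in range(len(mem)):
            address = base + index * word_bytes
            expected = rotl32(address, rotation)
            actual = mem[index]
            if actual != expected:
                failures.append((rotation, address, expected, actual))
    return failures


class StuckBitMemory:
    """Hypothetical faulty RAM: one data bit of one word always reads 0."""

    def __init__(self, size, bad_index, bad_bit):
        self._words = [0] * size
        self._bad_index = bad_index
        self._mask = ~(1 << bad_bit) & MASK32

    def __len__(self):
        return len(self._words)

    def __setitem__(self, index, value):
        self._words[index] = value & MASK32

    def __getitem__(self, index):
        value = self._words[index]
        return value & self._mask if index == self._bad_index else value


if __name__ == "__main__":
    good = [0] * 256
    print("good memory failures:", len(address_pattern_test(good, 0x1000)))
    bad = StuckBitMemory(256, bad_index=5, bad_bit=0)
    for rotation, address, expected, actual in address_pattern_test(bad, 0x1000):
        print(f"rot={rotation:2d} addr=0x{address:08X} "
              f"expected=0x{expected:08X} got=0x{actual:08X}")
```

In this model the rotation is what exposes the stuck data bit: word addresses always have bit 0 clear, so the unrotated pass never drives that bit high. A test like this passing, as it did on the failing board, says the plain write/read path is sound; it does not exercise the timing-dependent cache behaviour discussed in the thread.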
