[Linux-Discussion] crash, moral of the story...

  • From: John Madden <weez@xxxxxxxxxxxxx>
  • To: linux-discussion@xxxxxxxxxxxxx
  • Date: Sat, 7 Apr 2001 13:33:39 -0500

Last night, FreeLists (the whole thing) crashed badly.  I couldn't figure 
out why at first, but given that it's running kernel 2.4.2, I feared that 
it may actually be a kernel issue.  

I rebooted (yuck, all those things I said about the lack of stability of 
Windows came rushing back to me...) and let it run overnight, 
successfully.  This morning, I went into the usual testing for bad cpu/ram 
by doing a rigorous kernel compile:

make clean; make -j2 bzImage

It actually succeeded the first couple of times, but on run #3, it died.  
To all who didn't know about this trick before: kernel compiles are very 
CPU- and memory-intensive.  If you want to test your system, compile the 
kernel multiple times with that -j option (-jN tells make to run up to N 
jobs in parallel, putting even more stress on the system).  
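If you'd rather not babysit the repeated compiles, here's a minimal shell sketch that automates them. The stress_loop name is hypothetical (my own helper, not a standard tool): it reruns whatever command you give it up to N times, stops at the first non-zero exit status, and reports which run died.

```shell
# stress_loop N CMD [ARGS...]
# Hypothetical helper: run CMD up to N times, stopping at the first
# non-zero exit status and reporting which run failed.
stress_loop() {
  local runs=$1 i=1
  shift
  while [ "$i" -le "$runs" ]; do
    if ! "$@"; then
      echo "run #$i failed"
      return 1
    fi
    i=$((i + 1))
  done
  echo "all $runs runs passed"
}

# Example (assumes you're in a configured kernel source tree):
#   stress_loop 5 sh -c 'make clean; make -j2 bzImage'
```

Stopping at the first failure leaves the tree as the failed run left it, so you can look at where the compile broke before trying again.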

So I shut down, took out one of the dimms, booted, and ran the tests 
again.  And again, the first compile was fine, subsequent compiles failed.  
Another reboot, swapped dimms again, boot, compile, success, compile, 
failure.  What's going on here?  *Both* dimms bad?!  Maybe the cpu?!

I headed over to advanced-cs.com to pick out a barebones system (btw, I 
have two of their systems, and I'd buy from them over and over again -- 
good prices, great systems), and while I was sitting there thinking about 
whether I should get a 1GHz Athlon in Thunderbird or Thunderbird2, some 
voice said "dude, the cpu might be overheating."  

The heat sink was so hot I couldn't even touch it, and as I'm sitting 
there looking at it wondering why it's so hot, I look at the cpu and can 
barely believe my eyes -- the damn cpu fan wasn't even spinning!!  In fact, 
it wasn't even plugged in, which is something I'm sure I'll kick myself 
for for a long time. :)  I must've unplugged it when moving disks around.  

There are a few things to learn from this: 
1) Linux is awesome: the box had run for... 23? days without a cpu fan, 
meaning that when there's light load on the system (90% of the day), the 
OS idles the cpu (the kernel halts it when there's nothing to run), which 
keeps it running cool.  
2) Never attribute to Linux that which can be explained by a) stupidity 
or b) bad hardware.  I keep repeating this one over and over to myself -- 
time and time again, the kernel's fine, but either the hardware's bad, or 
there's a software configuration issue.
3) Windows still sucks a lot. 

Anyway, I've run the compile a few times now, and all seems well.  Silly 
rabbit, fans are for wussies. 

John


-- 
# John Madden  weez@xxxxxxxxxxxxx ICQ: 2EB9EA
# FreeLists, Free mailing lists for all: //www.freelists.org
# UNIX Systems Engineer, Ivy Tech State College: http://www.ivy.tec.in.us
# Linux, Apache, Perl and C: All the best things in life are free!
=============================================================
Avenir Web's Linux Discussion List

List info: //www.freelists.org/cgi-bin/webpage?webpage_id=13
To unsubscribe: email linux-discussion-request@xxxxxxxxxxxxx
with 'unsubscribe' in the Subject line.

Administrative contact: weez@xxxxxxxxxxxxx
=============================================================
