Thanks for all the input...most helpful!!

Matthew Shrewsbury, MCSE+Internet MCSE 2000 CCA Server+
Network Manager

-----Original Message-----
From: thin-bounce@xxxxxxxxxxxxx [mailto:thin-bounce@xxxxxxxxxxxxx] On Behalf Of Rick Mack
Sent: Friday, March 24, 2006 7:03 PM
To: thin@xxxxxxxxxxxxx
Subject: RE: [THIN] PS4 Locks up

Hi Matthew,

Server lockups can be incredibly frustrating to sort out. Basically it's possible to see several different types of hangs or lockups.

The first may be due to software on the server stressing things to the limit. Examples that I can think of are:

- apps with severe memory leaks which cause the server to page itself to death
- cpu hogs that take out all your cpu resources
- applications that exhaust some system resource like file handles
- heavy registry updates

Then there are server hardware problems like flaky memory.

Corrupt user profiles have been a major cause of hangs on Server 2003 at times, but you're running 2000 Server so that's kind of unlikely. Using the latest version of UPHClean isn't a bad idea though.

You didn't state whether you're running 2000 SP4, but if you are, I'd suggest you look at hotfixes 324446, 816134, 817446, 821255, 823747, 823272 and 829485.

Software issues apart, though, the most common cause of hangs is back-end servers. By this I mean that TS systems are incredibly dependent on timely response from back-end servers (file/print, domain controllers). If the network I/O request queues fill up, the TS systems will either hang momentarily or stop altogether, depending on the amount of pending I/O.

MaxMpxCt and MaxWorkItems tuning helps a lot and can make the difference between a server that hangs and one that just goes slow when the back-end gets sluggish.

The best I can probably do for you is to give you an example.

Had a situation recently where a TS server was just going super slow at times and would hang for 2-3 minutes at a time. It was properly tuned and looked okay from a performance monitoring viewpoint.
Current Commands were a bit high but not excessive. In terms of when the hangs were happening, they didn't happen all the time but started mid-morning and kept happening until late afternoon.

I was fairly certain early on that the file/print server was the problem, but that's where the fun started.

The file server had so many things wrong with it that we barely knew where to start. Cleaned up a lot of crap (MP3s mostly) to free up some disk space, defragged and chkdsked the volumes, moved files around to spread the I/O, fixed the antivirus settings. Things got a lot better but we were still seeing the hangs.

Set up perfmon to look at just about everything and saw an interesting relationship between network I/O and server work queues. The network I/O baseline was fairly high all the time, but would drop down to zero for 2-3 minutes. At the same time, the server work queue count was climbing linearly up to 20-30, indicating that the CPUs were super busy (but cpu time didn't peak at the same time). After 2-3 minutes the work queues would drop to zero and the network I/O would resume.

Memory utilization was okay, cache hits were generally better than 95%, very little disk I/O, cpu utilization was ok etc. So something was making the server so busy that it wasn't responding to anything.

Poked around until I realised that there was something very peculiar about the network I/O throughput I was seeing with Task Manager. We're used to peaks and troughs in activity, but here there was a constant baseline of activity that, outside the hangs, never fell to zero. So what was going on?

Installed Ethereal and started looking at what was happening. I found that the baseline activity was due to 2 workstations on the network that were hammering the server. When we looked at the packet capture it was really interesting. The packets were MTU-sized SMB packets mostly filled with nulls, so we were looking at some sort of malformed SMB request.
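
The "who is generating the baseline traffic" step above can be roughed out in a few lines once a capture has been exported to rows of source address and frame length. This is only a sketch under that assumption: the function name top_talkers, the 25% threshold and the sample addresses are all made up for illustration and are not taken from the capture described here.

```python
from collections import Counter


def top_talkers(frames, share_threshold=0.25):
    """Sum bytes per source and return the sources that account for at
    least share_threshold of all captured bytes, largest first.

    frames: iterable of (source_address, frame_length_in_bytes) pairs,
    e.g. rows exported from a packet capture (hypothetical format).
    """
    totals = Counter()
    for source, length in frames:
        totals[source] += length

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    return [
        (source, sent, sent / grand_total)
        for source, sent in totals.most_common()
        if sent / grand_total >= share_threshold
    ]


# Made-up sample rows: two hosts sending full-size frames, two quiet hosts.
sample = [
    ("10.0.0.21", 1500), ("10.0.0.21", 1500), ("10.0.0.21", 1500),
    ("10.0.0.37", 1500), ("10.0.0.37", 1500),
    ("10.0.0.5", 200), ("10.0.0.8", 300),
]

for source, sent, share in top_talkers(sample):
    print(f"{source}: {sent} bytes ({share:.0%} of capture)")
```

On a real capture the threshold would need adjusting to the site; the point is only that a couple of hosts owning most of the bytes stands out immediately.
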
To cut a long story short, the 2 workstations were infected with a virus which wasn't being activated until the user logged on. Once the user logged off or the workstation was turned off, everything started working as designed. If the relevant users didn't come in that day, or arrived late or left early, the hang times would change.

Cleaned up the virus and the hangs disappeared.

Had a similar scenario where the culprits were 2 workstations where the users had set the antivirus package to check network drives.

Other causes can be backups left running during production time, or basically anything that slows down server responsiveness.

If you've got a lot of group policies, then group policy processing can also cause server hangs on logon if your domain controllers aren't performing well. If the server has just crashed and a lot of users are logging back on, it can just lock up.

The last scenario I can think of at the moment is where there's a NIC/switch port speed mismatch or autonegotiation problem. However, this is generally easy to diagnose, because if you copy a file to and from the server you can see a huge difference in the copy speeds.

Hopefully there's something relevant in this rambling ;-)

regards,

Rick

Ulrich Mack
Volante Systems

________________________________

From: thin-bounce@xxxxxxxxxxxxx on behalf of Matthew Shrewsbury
Sent: Sat 25/03/2006 8:29
To: thin@xxxxxxxxxxxxx
Subject: [THIN] PS4 Locks up

As you may have noticed, I have been posting a lot as of late. For some strange reason, out of the blue, I've been having a lot of issues. One of my two PS4 servers keeps locking up, although both have locked up, just not at the same time.

1) I've tried updating all firmware and drivers.
2) I've installed PSE400W2KR01.msp on the one server it would install on (but it locked up about 7 hours later).
3) I've searched event logs but don't see anything obvious.

Is there any good method of repairing a Citrix server?
I'm just not finding anything that points me to a problem.

Sunday I'm going to come in and try my best to find the problem. I plan to start with low-level hardware diagnostics, then proceed to virus scans, boot from PE and look for rootkits, and run some network sniffing tools.

If everything comes up clean, how should I try and repair my server? Should I try and repair PS4 through Add/Remove Programs?

I'm running an older version of User Profile Hive Cleanup...could this cause lockups?

Matthew Shrewsbury, MCSE+Internet MCSE 2000 CCA Server+
Network Manager