Nah... we just found out that the users had quit letting us know about them. We had to dig through the logs and found about 55 disconnects in two days on a 300-user farm. I'm somewhat relieved, considering I can't see how a file server could fix that problem anyway. So, the problem is still there... no clue how to proceed.

________________________________
From: thin-bounce@xxxxxxxxxxxxx on behalf of Steve Raffensberger
Sent: Thu 5/20/2004 8:12 PM
To: thin@xxxxxxxxxxxxx
Subject: [THIN] Re: Random ICA disconnects!

Gabe Knuth seems to have fixed his disconnection problem by replacing a file server. The following is a copy of a message from Rick Mack to this forum a few weeks ago. Rick mentions all the standard disconnect troubleshooting steps one should take.

Hope this helps,
Raff

-------------------- From Rick ------

Hi People,

Had fun on a site with lots of disconnections, and the fix turned out to be something we didn't suspect at all. Thought it might be interesting if I gave you a quick tour of what we did.

We had just consolidated a bunch of Citrix MetaFrame (Win2K SP2 / MF XPa FR2) servers into one location (previously each regional office had its own server). Gigabit backbone, 1-2 Mb ADSL connections to each office (8-30 users per office). WAN performance was a bit ordinary at times 'til we put in a ThinPrint gateway server and protocol queuing, getting away from ICA client-based printing.

Everything looked reasonably good, but there were a fair few disconnections happening on the WAN. Some users were getting disconnected up to 5-6 times a day and getting really annoyed.

We did the usual things: monitored WAN stability, turned on ICA keepalives and upped tcpmaxretransmissions so that sessions might ride out any transient comms problems, and so that disconnections were detected promptly enough that auto-reconnection worked most of the time. But the disconnections remained, even though the auto-reconnection made things a lot less aggravating for the users.
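As a rough illustration of why upping the retransmission count buys a session more time, here is a minimal sketch. It assumes a simplified model in which the TCP stack doubles its retransmission timeout after every unacknowledged attempt; real stacks derive the starting timeout from the measured round-trip time and may cap it, so the numbers are only indicative. The function name and parameters are made up for this example.

```python
def total_retransmit_window(initial_rto_s: float, max_retransmissions: int) -> float:
    """Seconds between the first send and the sender giving up.

    Assumed model (not taken from the original post): the first
    transmission waits initial_rto_s for an ACK, and each retransmission
    waits twice as long as the attempt before it.
    """
    return sum(initial_rto_s * 2 ** attempt
               for attempt in range(max_retransmissions + 1))


if __name__ == "__main__":
    # Compare a few retransmission counts for a hypothetical 1 second starting timeout.
    for count in (3, 5, 7):
        print(count, total_retransmit_window(1.0, count))
```

Under this model every extra retransmission roughly doubles how long a session survives a comms dropout, which is why a small change to the count makes a large difference.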
The disconnections were happening almost at random, except that the busier users got disconnected more often. But even idle sessions could get disconnected. We went through the servers with a fine-tooth comb, fixing up everything that was even slightly out. Word from the network guys was that, except for a very occasional dropout which disconnected a lot of sessions at once, the WAN links were fine.

So what was it?

We set up a network trace with Ethereal between a couple of the worst-affected ICA clients and a dedicated server. I used dumpel to trawl the server event logs for events 683 and 682 (disconnections/reconnections) so that we could accurately determine when disconnections were happening. Since these 2 client machines were getting 8-12 disconnections a day between them, we didn't have long to wait.

The results were a surprise. There were a lot of retransmissions, mostly on the ICA client side, and most of the TCP session disconnections were actually from the client end. It looked like the server was going offline (from a comms perspective) for up to 30-45 seconds, prompting a client disconnect. When we increased the TCP retransmission count at the client end (Win98), disconnections still happened, despite the TCP session timeout being extended to over 2 minutes. Packets and retransmissions were just getting lost.

It really looked like a LAN problem (a problem in the computer room). The servers had gigabit cards, so we tried dropping everything to 100 Mb, even replaced the gigabit cards with 10/100 cards and bypassed the gigabit switch with the server plugged into a 10/100 switch. Since the computer room had mistakenly been cabled with Cat 5, we even bypassed the existing cabling with Cat 6 cables direct to the switch. No improvement.

So we couldn't blame the NICs, the cabling or the gigabit network. But it sure looked like ICA packets were dropping down a black hole at times.
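The event-log trawl for 683/682 mentioned above could be post-processed along these lines. This is only a sketch: it assumes the log has been dumped to tab-separated text with the date in the first column, the time in the second and the event ID in the fifth, and that dates look like "5/18/2004 9:15:02 AM". Those column positions and the date format are assumptions for illustration, not dumpel's documented layout, so adjust them to whatever your dump actually contains.

```python
from collections import Counter
from datetime import datetime

DISCONNECT_ID = 683  # session disconnected
RECONNECT_ID = 682   # session reconnected


def summarise_session_events(lines, date_col=0, time_col=1, id_col=4, sep="\t"):
    """Return (sorted events, disconnects per hour) from dumped log lines.

    Column positions and the timestamp format are assumptions; lines that
    do not fit them are skipped rather than treated as errors.
    """
    events = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) <= max(date_col, time_col, id_col):
            continue  # too few columns to be an event record
        try:
            event_id = int(fields[id_col])
            when = datetime.strptime(
                f"{fields[date_col]} {fields[time_col]}", "%m/%d/%Y %I:%M:%S %p"
            )
        except ValueError:
            continue  # header line or unexpected format
        if event_id in (DISCONNECT_ID, RECONNECT_ID):
            events.append((when, event_id))
    events.sort()
    per_hour = Counter(
        when.strftime("%Y-%m-%d %H:00")
        for when, event_id in events
        if event_id == DISCONNECT_ID
    )
    return events, per_hour
```

Having exact disconnection timestamps is what lets you jump straight to the right spot in a packet trace instead of reading the whole capture.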
I happened to spot a Microsoft technote on a PMTU detection fault in Win2K SP2 that looked just about perfect. If you look at the IP flags on an ICA protocol packet, you'll find the "Don't fragment" bit is set. Considering that an ADSL link often uses a smaller MTU than Ethernet, it looked like we might have found our problem.

When we examined the network trace for large packets (> MTU (1440 bytes)), every single large packet was being retransmitted. When you did a "ping -f -l 1441" from the server to an ICA client all packets were dropped, and "ping -f -l 1440" had about a 25-50% drop rate. Smaller packets were okay.

Whoopee! And all you have to do is put in a registry entry to force a small packet size and things will be fixed.

Nope! So where was our black hole?

To absolutely exclude the LAN components, we set up a system with 3 NICs (one for remote access, 2 for monitoring). We set up 2 lots of simultaneous packet monitoring: between the WAN router and the core switch (input side), and between the switch and the server. That way we had 2 packet traces, one on each side of the switch, that were accurately synchronised by time offset (both Ethereal sessions on the same system, one on each monitoring NIC).

The results were pretty discouraging because both traces looked identical. That kind of suggested that our problem wasn't on the LAN.

But one of our network guys was finally convinced that it was a network issue, so he persisted in going through the disconnection traces packet by packet. About 50 packets upstream from the disconnection, he found something that shouldn't have been there.

We were looking at a packet trace where we were using a TCP/IP address filter, looking at packets between a single client and server. What he found was that the destination MAC address of packets going to the server was occasionally changing, just before a whole bunch of retransmissions and a disconnection from the client end.
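The check that finally exposed the fault (same destination IP, different destination MAC) is easy to miss by eye but simple to automate. Below is a minimal sketch, assuming the trace has already been exported as (frame number, destination IP, destination MAC) tuples; the export step, the function name and the addresses in any test data are hypothetical. It treats the first MAC seen for each IP as the expected one, which is an assumption that only holds if the capture starts during normal traffic.

```python
def find_mac_changes(packets):
    """Flag frames whose destination MAC differs from the first one seen
    for the same destination IP.

    packets: iterable of (frame_no, dst_ip, dst_mac) tuples.
    Returns a list of (frame_no, dst_ip, expected_mac, seen_mac).
    """
    first_seen = {}  # dst_ip -> first MAC observed for that IP
    changes = []
    for frame_no, dst_ip, dst_mac in packets:
        mac = dst_mac.lower()  # normalise case so AA:BB and aa:bb compare equal
        expected = first_seen.setdefault(dst_ip, mac)
        if mac != expected:
            changes.append((frame_no, dst_ip, expected, mac))
    return changes
```

Running something like this over the frames leading up to each logged disconnection would point at the misdirected packets without a packet-by-packet read.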
The router was actually sending packets with the IP address of the router to their PIX firewall (the default gateway), not the MetaFrame server. More importantly, as well as a packet getting redirected to the wrong place, all subsequent retransmissions of the lost packet were also getting sent to the PIX. This was happening in the midst of normal traffic and ACKs, all with the right MAC address and IP address. Since none of the client retransmissions were being acknowledged, the client eventually just gave up.

I guess I didn't mention that the router in question was a new-model Cisco router. One of the performance enhancements that Cisco have is CEF (Cisco Express Forwarding), which optimises packet retransmissions etc. by resending identical packets out of a buffer rather than handling slower retransmission from the WAN. If the same packet was being regenerated, it could explain why the retransmissions were also going to the same, wrong MAC address. It didn't explain why the router was getting confused, but it at least explained why it was being consistent.

When we disabled CEF, the disconnections went away.

Cisco will be getting a full bug report and we've got a happy customer. Just don't ask how many man-hours it took to find this feature :-(

Regards,
Rick

Ulrich Mack
Volante Systems
18 Heussler Terrace, Milton 4064
Queensland, Australia
tel +61 7 32467704
rmack@xxxxxxxxxxxxxx

----------------------------------------
********************************************************
This Week's Sponsor - Tarantella Secure Global Desktop
Tarantella Secure Global Desktop Terminal Server Edition
Free Terminal Service Edition software with 2 years maintenance.
http://www.tarantella.com/ttba
**********************************************************
Useful Thin Client Computing Links are available at:
http://thin.net/links.cfm
***********************************************************
For Archives, to Unsubscribe, Subscribe or set Digest or Vacation mode use the below link:
http://thin.net/citrixlist.cfm