Reading the section on file serving, it's like you worked here for the last couple of years. We redirect Favorites, Desktop, My Documents, and App Data, and we have gone through everything you mention below and made all of the same tuning tweaks, etc. We even had the chkdsk incident where we lost all of the ACLs, had to change everything to Everyone (maybe Full Control) to keep production running, and then spent hours and hours fixing it. Compression was also a huge performance killer, as admins would use it to address free-space issues. The problem escalated as the disks got too full and the defragmenting process became useless. Adding/expanding LUNs, uncompressing disks, and defragging had a huge effect on performance. Backups that took 26 hours to complete for a single LUN now take 15 hours. Since we use Windows clustering, I believe there is even a bit more tuning that had to be done, and it took us many, many calls with Microsoft to get it all ironed out. The hyperthreading comment is something that I will take back for investigation, as well as applying the security via policy instead of at the file level to address exactly what we saw with chkdsk. We are currently redesigning our back-end file serving to use DFS and Windows 2003 x64. I'm anxious to see how it works out. All great suggestions Rick. I have openings in Pittsburgh and Phoenix, just tell me when to have your office ready. ;p

From: Rick Mack <ulrich.mack@xxxxxxxxx>
Reply-To: <thin@xxxxxxxxxxxxx>
Date: Thu, 11 Jan 2007 21:20:40 +1000
To: <thin@xxxxxxxxxxxxx>
Subject: [THIN] Re: TSCALE or Appsense

Hi Angela,

> Rick, you misunderstood me on the pagefile point. I was thinking of
> rebooting the servers nightly to refresh the server resources just in case
> the memory is not being freed up fully once the applications close. We
> don't have the clearpagefileonexit option enabled (I made that mistake
> once). We currently reboot our servers weekly. Was interested in seeing if
> it's worth doing it nightly.
> On this point, do people also reboot their Web
> Interface servers or dedicated Zone Data Collectors, or simply the servers
> that farm the apps?

Nightly reboots don't hurt if you can fit them in. At least things will generally be as good as they can be the next day. Unless of course the servers don't reboot ;-)

(1) Is your page file contiguous (start and maximum the same size)? Check with PageDefrag (Sysinternals).

> Pagefile is contiguous, 4096 min and max. 1-2 servers run 2003 Enterprise
> and have more than 4 GB RAM. These servers have 6 GB pagefiles. I know
> Citrix won't really use more than 4 GB RAM. Is it best I reduce the
> pagefile to 4 GB?

You're presumably using the /PAE switch in boot.ini to use memory above 4 GB (or the servers have hot-swap memory or hardware-enforced DEP), but the big problem on 32-bit systems is that as you add more memory you also add to the kernel memory overhead, and you will eventually hit the wall. IBM published a study about 18 months ago that showed quite nicely that memory over 6-8 GB actually resulted in less scalability on a server running a lot of processes. So you end up tweaking the /MAXMEM switch to reduce the amount of RAM seen by the system until things are optimal, with the rest of the memory wasted :-( I guess that's where x64 comes in.

But back to your question: I really believe that you gain nothing in having a huge page file (or aggregate of page files). On 4 GB systems the working page file shouldn't be over 3.6 GB max, with a /MAXMEM switch on an alternate boot.ini entry to reduce memory to 3 GB so we can do a full memory dump if necessary. I've avoided having more than one pagefile on a physical drive because the use of 2 page files could cause unnecessary disk thrashing. So what I used to do was put the pagefile on a partition other than the system partition, to at least reduce directory overhead on the system disk. Of course that meant you couldn't ever do a full crash dump.
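As an aside, you can check and pin the pagefile size from the command line with WMIC (available on Windows 2003). A minimal sketch, using the 4096 MB min/max figures discussed above (the drive letter and size are just examples, and "system managed" sizing must be turned off first for the set to stick):

```shell
rem Show current pagefile configuration (initial/maximum sizes in MB)
wmic pagefileset list brief

rem Pin the pagefile to a fixed 4096 MB min/max so it can stay contiguous
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=4096,MaximumSize=4096
```

A reboot is needed before the new size takes effect, and PageDefrag can then make the file contiguous if it isn't already.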
However, it looks like you can have your cake and eat it too, according to Microsoft technote 197379. If it's correct, you can have 2 page files and actually use only the one on the least used partition. So you could presumably do your paging on the least used partition, and have a maximum-sized page file on the system disk to handle a memory dump.

(2) What have you done by way of minimal system tuning, optimise memory for applications, etc.? I've got a tuning policy template that will help you do this without any hacking.

> The farm was initially set up by Citrix so they used some of their own
> ADMs. There's a lot of customisations, but not sure if they are
> performance based. Were you interested in any particular settings?

There are a bunch of tweaks, but if Citrix did the job then I'd suspect everything that matters will be there. Maybe ;-)

(3) Do you defrag your system disks on a regular basis? Running a scheduled batch job to "defrag c:" at 3 AM every day is dead simple.

> No. We don't defrag our disks at all. Is this something that will make a
> noticeable performance difference? I had a look at a few servers (ie ran
> defrag manually) and it said it doesn't need defragging, so I'll assume
> this is OK for now, but I may schedule this if it's best practice. Do you
> defrag your servers daily/weekly/monthly?

Defragging does help, and it's a good test of your file system structure. All you have to do is run a scheduled "defrag c:" at 3 AM every day and it'll ensure the disk will generally perform as well as it can.

(4) How big are your users' profiles? If they get too big (over 6-8 MB) the extra system overhead from logins and logouts will really hurt.

> Profiles are between 1-2 MB. We redirect the Application Data, Desktop,
> My Pictures and My Documents paths to the user's TS home drive to keep
> profiles small.

That's just fine. However, be aware that depending on the application being used, redirecting Application Data can sometimes create a huge performance hit.
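The 3 AM "defrag c:" job from point (3) can be set up with the built-in schedulers; a sketch, with the task name and times as examples only:

```shell
rem Old-school: the AT scheduler (runs as SYSTEM by default)
at 03:00 /every:M,T,W,Th,F,S,Su defrag c:

rem Or with schtasks on Windows 2003 (time format is HH:MM:SS there)
schtasks /create /tn "Nightly Defrag" /tr "defrag.exe c:" /sc daily /st 03:00:00 /ru SYSTEM
```

Either way it runs unattended off-hours, which is the whole point: the disk is quiet at 3 AM so the defrag pass finishes quickly.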
(5) Have you tuned your back-end servers (file/print, domain controllers) to increase the network i/o queue size (ie MaxMpxCt/MaxWorkItems)? I've got a tuning policy template for this that you can have.

> I haven't tuned the file/print server or the DC. Would be interested in
> seeing your template.

I want to really stress this. Your terminal server farm's peak performance is totally dependent on the performance of your file server. If it's too busy servicing requests, your whole farm will suffer.

The default network i/o request queue size for terminal services is way too small, and it has to be increased by raising the lanmanserver MaxMpxCt and MaxWorkItems values on the file server. If you don't, once the number of pending i/o requests fills the queue, everything will stop and you will see momentary or even quite lengthy hangs. If the file server is busy enough it can hang your whole farm. Really. Increasing lanmanworkstation MaxCmds on the file server's clients (the terminal servers) doesn't increase the request queue size, though I do set it to match the MaxMpxCt/MaxWorkItems values.

Domain controllers are file servers too: they host sysvol, group policies etc., and group policy processing generates a huge number of small i/o requests. In a large TS environment, you can see TS server hangs on user login if the domain controllers aren't tweaked as well.

But I'd like to make a few more comments about file servers. As I've stressed, they are the heart of your TS environment, particularly if you have a significant amount of folder redirection. Every folder that's redirected increases the number of network I/O operations. This isn't about the data throughput capability of your NICs etc.; it's about the ability of the file server to service i/o requests, get data off disks and send the data where it's needed. Tuning your file server is the most profound thing you can do to improve farm performance.
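For reference, the lanmanserver/lanmanworkstation values above live under HKLM\SYSTEM\CurrentControlSet\Services. A sketch with reg.exe; the numbers are illustrative rather than prescriptive (test on one server first), and a reboot is needed for them to take effect:

```shell
rem On the file server / domain controller: enlarge the SMB request queue
reg add HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters /v MaxMpxCt /t REG_DWORD /d 2048 /f
reg add HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters /v MaxWorkItems /t REG_DWORD /d 8192 /f

rem On the terminal servers (the SMB clients): match MaxCmds to the server values
reg add HKLM\SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters /v MaxCmds /t REG_DWORD /d 2048 /f
```

Deploying the same values through a policy template, as Rick offers, has the advantage that they survive server rebuilds and are documented in one place.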
(a) Tune the network i/o parameters (MaxMpxCt/MaxWorkItems etc.). I'll email you the back-end server tuning template.

(b) Defrag your file server data volumes. Get a good defragging product (eg Winternals Defrag Commander) and use it regularly. Either that or use a Unix system as your file server ;-)

(c) Run chkdsk across the data volumes at regular intervals. I've seen farms grind to a halt because of corrupted security descriptors on a data volume.

(d) Don't let your data volumes get more than 80% full.

(e) Don't run backups during prime time, and avoid a too-aggressive virus checker (if I checked the file when I wrote it to disk, why check it again when I read it, or vice versa).

(f) Have a good hard think about not using hyperthreading on the CPUs in your file server. What would you think if I offered you an add-on for your car that could make it go 10-25% faster most of the time at no extra cost? And the only catch was that if you went up a steep enough hill your wheels would fall off. When hyperthreaded CPUs get too busy servicing multiple serial i/o streams, they start thrashing the shared cache and things get very slow very quickly. I use hyperthreading on my TS systems, but not on a file server that's going to be super busy. Because if it hits the wall, so do your TS systems. And it's no fun having a whole farm hang.

(g) Don't migrate your file server to VMware, because it's a great way to make sure your farm goes slower.

Special note: if (c) goes wrong when you run a "chkdsk /f" to fix security descriptors, you can lose all the security ACLs on your data volume. While you're desperately looking for your backup and giving everyone access to everything to buy time to fix things, you might consider adding the ACLs to folders using group policy. You get self-repairing ACLs, and the big plus is that the ACLs are set in concrete and documented.

(6) Is your network/switch port configuration set up properly?
If you take a large file (%systemroot%\Driver Cache\i386\drivers.cab is a handy one), does it take the same time to copy it to another system as to copy it from that system? That's an easy one for your network people if the speed isn't the same in both directions.

> According to the networks team they are set up OK. Copy speed is OK also.

Good. You sometimes see some really bizarre performance problems that boil down to a misconfigured switch port/NIC configuration.

(7) My memory is a bit lazy at the moment. Did you mention that the main apps are browser based? What are you running?

> The majority of our published applications are browser based. Some do use
> Java.

That means bloat, but what the heck, so does everything else. :-(

> I have installed a Smart Array write cache on one server as a test to see
> if it makes a difference before I upgrade all my servers.

It will definitely help.

regards,

Rick