[THIN] Re: TSCALE or Appsense

Hi Angela,


Rick, you misunderstood me on the pagefile point.  I was thinking of
rebooting the servers nightly to refresh the server resources just in case
the memory is not being freed up fully once the applications close.  We
don't have the clearpagefileonexit option enabled (I made that mistake
once).  We currently reboot our servers weekly and was interested in seeing
whether it's worth doing it nightly. On this point, do people also reboot their
Web Interface servers or dedicated Zone Data Collectors, or simply the servers
that farm the apps?


Nightly reboots don't hurt if you can fit them in. At least things will
generally be as good as they can be the next day. Unless of course the
servers don't reboot ;-)
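If you do decide to go nightly, here's a bare-bones sketch using nothing but the
in-box tools (the 4 AM time and c:\scripts\reboot.cmd are just placeholders, and
Citrix's own reboot scheduling may suit you better):

    rem reboot.cmd - force a restart with a 60 second warning to any stragglers
    shutdown /r /f /t 60 /c "Nightly maintenance reboot"

    rem schedule it for 4 AM every day (run this once on each server)
    at 04:00 /every:M,T,W,Th,F,S,Su "c:\scripts\reboot.cmd"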

(1) Is your page file contiguous (initial and maximum the same size)? Check with
PageDefrag (Sysinternals).

Pagefile is contiguous.  4096 min and max.  1-2 servers run 2003 Enterprise
and have more than 4 GB RAM.  These servers have 6 GB pagefiles. I know Citrix
won't really use more than 4 GB of RAM.  Is it best I reduce the pagefile to
4 GB?

You're presumably using the /PAE switch in boot.ini to use memory above 4 GB
(or the servers have hot-swap memory or hardware-enforced DEP), but the big
problem on 32-bit systems is that as you add more memory you also add to the
kernel memory overhead, and you will eventually hit the wall. IBM published a
study about 18 months ago that showed quite nicely that memory over 6-8 GB
actually resulted in less scalability on a server running a lot of
processes. So you end up tweaking the /MAXMEM switch to reduce the amount of
RAM seen by the system until things are optimal, with the rest of the memory
wasted :-(

I guess that's where X64 comes in.

But back to your question, I really believe that you gain nothing in having
a huge page file (or aggregate of page files). On 4 GB systems the working page
file shouldn't be over about 3.6 GB, with a /MAXMEM switch on an alternate
boot.ini entry to reduce memory to 3 GB so you can do a full memory dump
if necessary.
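For what it's worth, this is the sort of pair of boot.ini entries I mean; the ARC
path is whatever your existing entry already uses, and 3072 is just 3 GB expressed
in MB:

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /PAE
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003 (3 GB, full dump)" /fastdetect /MAXMEM=3072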


I've avoided having more than one pagefile on a physical drive because the
use of 2 page files could have caused unnecessary disk thrashing. So what I
used to do was put the pagefile on a partition other than the system
partition to at least reduce directory overhead on the system disk. Of
course that meant you couldn't ever do a full crash dump.

However it looks like you can have your cake and eat it too, according to
Microsoft technote 197379. If it's correct, you can have two page files and
actually use only the one on the least-used partition. So you could
presumably do your paging on the least-used partition, and have a maximum-sized
page file on the system disk to handle a memory dump.
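You'd normally set this through System Properties > Advanced > Performance >
Virtual Memory, but under the covers it ends up in the PagingFiles value, which
makes the two-pagefile idea easy to see (the exact sizes below are only
illustrative):

    HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
      PagingFiles (REG_MULTI_SZ):
        c:\pagefile.sys 4100 4100    <- system disk, big enough to hold a full memory dump (RAM plus a little)
        d:\pagefile.sys 4096 4096    <- least-used partition, where the everyday paging should end up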

(2) What have you done by way of minimal system tuning (optimise memory for
applications, etc.)?
I've got a tuning policy template that will help you do this without any
hacking.

The farm was initially set up by Citrix so they used some of their own ADMs.
There's a lot of customisations, but I'm not sure if they are performance-based.
Were you interested in any particular settings?

There are a bunch of tweaks, but if Citrix did the job then I'd suspect
everything that matters will be there. Maybe ;-)
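One easy thing to check against their ADMs: the "optimise memory for
applications" setting I mentioned is, as far as I know, just the Memory usage =
Programs option, which lands in this registry value (look before you touch it,
in case the Citrix ADMs already set it):

    rem 0 = favour programs (what you want on a TS box), 1 = favour the system cache
    reg query "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v LargeSystemCache

    rem set it explicitly only if the ADMs don't already
    reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v LargeSystemCache /t REG_DWORD /d 0 /f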

(3) Do you defrag your system disks on a regular basis?
- running a scheduled batch job to "defrag c:" at 3 AM every day is dead
simple

No.  We don't defrag our disks at all.  Is this something that will make a
noticeable performance difference?  I had a look at a few servers (ie ran
defrag manually) and it said they don't need defragging, so I'll assume this
is OK for now, but I may schedule it if it's best practice.  Do you defrag
your servers daily/weekly/monthly?

Defragging does help and it's a good test of your file system structure. All
you have to do is run a scheduled "defrag c:" at 3 AM every day and it'll
ensure the disk will generally perform as well as it can.
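The whole job is a one-liner with the built-in scheduler (3 AM and every day of
the week are just the example values, pick whatever suits your maintenance
window):

    rem run once on each server - nightly read/optimise pass on the system disk
    at 03:00 /every:M,T,W,Th,F,S,Su "defrag c:"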

(4) How big are your users' profiles?
- if they get too big (over 6-8 MB) the extra system overhead from logins
and logouts will really hurt.

Profiles are between 1-2 MB.  We redirect the Application Data, Desktop,
My Pictures and My Documents paths to the users' TS home drive to keep
profiles small.

That's just fine. However, be aware that depending on the application being
used, redirecting Application Data can sometimes create a huge performance
hit.

(5) Have you tuned your back-end servers (file/print, domain controllers) to
increase the network i/o queue size [ie MaxMpxCt/MaxWorkItems]?
- I've got a tuning policy template for this that you can have.

I haven't tuned the file/print server or the DC.  I'd be interested in
seeing your template.

*I want to really stress this*.

*Your terminal server farm's peak performance is totally dependent on the
performance of your file server.*

If it's too busy servicing requests, your whole farm will suffer. The
default network i/o request queue size is way too small for a terminal services
environment, and it has to be increased by raising the lanmanserver MaxMpxCt and
MaxWorkItems values on the file server. If you don't, once the number of
pending i/o requests fills the queue, everything will stop and you will see
momentary or even quite lengthy hangs. If the file server is busy enough it
can hang your whole farm. Really.

Increasing the lanmanworkstation MaxCmds value on the file server's clients (the
terminal servers) doesn't increase the request queue size, though I do set it to
match the MaxMpxCt/MaxWorkItems values.
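Until the template arrives, this is roughly what it boils down to. The 2048/8192
numbers are only the sort of starting values I use, not gospel; size them to your
environment, and reboot afterwards so they take effect:

    rem on the FILE SERVER (and DCs): raise the SMB server request queue
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters" /v MaxMpxCt /t REG_DWORD /d 2048 /f
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters" /v MaxWorkItems /t REG_DWORD /d 8192 /f

    rem on the TERMINAL SERVERS (the file server's clients): let the redirector issue a matching number of commands
    reg add "HKLM\SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters" /v MaxCmds /t REG_DWORD /d 2048 /f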

*Domain controllers are file servers*: they host SYSVOL, group policies etc.,
and group policy processing generates a huge number of small i/o requests.
In a large TS environment, you can see TS server hangs on user login if the
domain controllers aren't tweaked as well.

But I'd like to make a few more comments about file servers. As I've
stressed, they are the heart of your TS environment, particularly if you
have a significant amount of folder redirection. Every folder that's
redirected increases the amount of network I/O operations. This isn't about
the data throughput capability of your NICs etc, it's about the ability of
the file server to service i/o requests, get data off disks and send the
data where it's needed.

*Tuning your file server is the most profound thing you can do to improve
farm performance*.

*(a)* Tune the network i/o parameters (MaxMpxCt/MaxWorkItems etc.). I'll email
you the back-end server tuning template.
*(b)* Defrag your file server data volumes. Get a good defragging product
(eg Winternals Defrag Commander) and use it regularly. Either that or use a
unix system as your file server ;-)
*(c)* Run chkdsk across the data volumes at regular intervals. I've seen
farms grind to a halt because of corrupted security descriptors on a data
volume.
*(d)* Don't let your data volumes get more than 80% full.
*(e)* Don't run backups during prime time, and avoid a too-aggressive virus
checker (if I checked the file when I wrote it to disk, why check it again
when I read it, or vice versa).
*(f)* Have a good hard think about not using hyperthreading on the CPUs in
your file server.

*What would you think if I offered you an add-on for your car that could
make it go 10-25% faster most of the time at no extra cost? And the only
catch was that if you went up a steep enough hill your wheels would fall off.
*

When hyperthreaded CPUs get too busy servicing multiple serial i/o streams
then they start thrashing the shared cache and things get very slow very
quickly.

I use hyperthreading on my TS systems, but not on a file server that's
going to be super busy. Because if it hits the wall so do your TS systems.
And it's no fun having a whole farm hang.

*(g)* Don't migrate your file server to VMware; it's a great way to
make sure your farm goes slower.

Special Note: If (c) goes wrong when you run a "chkdsk /f" to fix security
descriptors, you can lose all the security ACLs on your data volume. While
you're desperately looking for your backup (and giving everyone access to
everything to buy time while you fix things), you might consider adding the ACLs
to folders using group policy. You get self-repairing ACLs, and the big plus
is that the ACLs are set in concrete and documented.
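On the chkdsk point, a read-only pass (no /f) is safe to run on a live volume and
will flag things like corrupt security descriptors before you're forced into a
risky fix. A rough sketch, with d: standing in for your data volume and
c:\logs being a made-up log location:

    rem weekly read-only consistency check of the data volume, logged for review in the morning
    at 02:00 /every:Su cmd /c "chkdsk d: > c:\logs\chkdsk_d.txt 2>&1"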

(6) Is your network/switch port configuration set up properly? If you
take a large file, does it take the same time to copy to and to copy from
another system? Use %systemroot%\Driver Cache\i386\drivers.cab.
- that's an easy one for your network people if the speed isn't the same in
both directions.

According to the networks team they are set up OK.  Copy speed is OK also.

Good. You sometimes see some really bizarre performance problems that boil
down to a misconfigured switch port/NIC configuration.
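If you ever want to re-check it, the quick-and-dirty version of the test is just
timing the same big file in both directions (\\someserver\share is obviously a
placeholder, and any large file you have handy will do):

    rem copy a large file out and back, eyeballing the elapsed time each way
    echo %time%
    copy "%systemroot%\Driver Cache\i386\drivers.cab" \\someserver\share\testcopy.cab
    echo %time%
    copy \\someserver\share\testcopy.cab "%temp%\testcopy.cab"
    echo %time%
    del \\someserver\share\testcopy.cab "%temp%\testcopy.cab"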

(7) My memory is a bit lazy at the moment. Did you mention that the main
apps are browser based? What are you running?

The majority of our published applications are browser based.  Some do use
Java.

That means bloat but what the heck, so does everything else. :-(

I have installed a Smart Array Write cache on one server as a test to see if
it makes a difference before I upgrade all my servers.

It will definitely help.

regards,

Rick
