Henry, have you thought of testing IO on both boxes? Something like
bonnie++ or SLOB could tell you the differences in the IO
characteristics of your systems. Also, if the underlying OS is Linux
newer than RH 5.x, you can use atop to see how much IO you are actually
doing on the systems.
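As a very rough first look before setting up bonnie++ or SLOB, something like the following times a sequential write followed by an fsync. It is a hypothetical helper of my own (seq_write_mb_per_s is not part of either tool), it exercises only one access pattern, and the default sizes are arbitrary assumptions, so treat the number as a sanity check to compare between the two boxes on the same kind of file system, not as a benchmark.

```python
import os
import tempfile
import time


def seq_write_mb_per_s(directory: str, total_mb: int = 64, block_kb: int = 1024) -> float:
    """Sequentially write about total_mb MB to a temporary file in `directory`,
    fsync it, and return the observed throughput in MB/s.

    Hypothetical helper: a crude sanity check, not a replacement for bonnie++ or SLOB.
    """
    block = os.urandom(block_kb * 1024)             # incompressible payload
    blocks = max(1, (total_mb * 1024) // block_kb)  # number of writes to issue
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())                    # make sure the data reached the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)                             # leave nothing behind
    return blocks * block_kb / 1024 / elapsed
```

Run it on each box against the file system holding the datafiles, e.g. print(seq_write_mb_per_s("/u01")) where /u01 is just a placeholder path.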
There is also a distinct possibility that the systems have different
memory types. DDR2, DDR3 and DDR4 are very different animals. You can
check the memory type using dmidecode --type 17. Here is the result
from my machine:
root@umajor:~# dmidecode --type 17
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x0043, DMI type 17, 34 bytes
Memory Device
	Array Handle: 0x0042
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 8192 MB
	Form Factor: DIMM
	Set: None
	Locator: ChannelA-DIMM0
	Bank Locator: BANK 0
	Type: DDR3
	Type Detail: Synchronous
	Speed: 1600 MHz
	Manufacturer: 1315
	Serial Number: 00000000
	Asset Tag: 9876543210
	Part Number: BLS8G3D1609DS1S00.
	Rank: 2
	Configured Clock Speed: 1600 MHz

Handle 0x0044, DMI type 17, 34 bytes
Memory Device
	Array Handle: 0x0042
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 8192 MB
	Form Factor: DIMM
	Set: None
	Locator: ChannelA-DIMM1
	Bank Locator: BANK 1
	Type: DDR3
	Type Detail: Synchronous
	Speed: 1600 MHz
	Manufacturer: 1315
	Serial Number: 00000000
	Asset Tag: 9876543210
	Part Number: BLS8G3D1609DS1S00.
	Rank: 2
	Configured Clock Speed: 1600 MHz

Handle 0x0045, DMI type 17, 34 bytes
Memory Device
	Array Handle: 0x0042
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 8192 MB
	Form Factor: DIMM
	Set: None
	Locator: ChannelB-DIMM0
	Bank Locator: BANK 2
	Type: DDR3
	Type Detail: Synchronous
	Speed: 1600 MHz
	Manufacturer: 1315
	Serial Number: 00000000
	Asset Tag: 9876543210
	Part Number: BLS8G3D1609DS1S00.
	Rank: 2
	Configured Clock Speed: 1600 MHz

Handle 0x0046, DMI type 17, 34 bytes
Memory Device
	Array Handle: 0x0042
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 8192 MB
	Form Factor: DIMM
	Set: None
	Locator: ChannelB-DIMM1
	Bank Locator: BANK 3
	Type: DDR3
	Type Detail: Synchronous
	Speed: 1600 MHz
	Manufacturer: 1315
	Serial Number: 00000000
	Asset Tag: 9876543210
	Part Number: BLS8G3D1609DS1S00.
	Rank: 2
	Configured Clock Speed: 1600 MHz
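To compare the two boxes without eyeballing dozens of lines, a small parser can reduce the dmidecode output to one line per DIMM. This is a sketch of my own (parse_dimms, peak_gb_per_s and summary are hypothetical names), assuming the field names shown above. It also computes the theoretical peak rate as transfers per second times bytes per transfer; note the assumption that the "1600 MHz" dmidecode 3.0 prints for DDR memory is really the transfer rate in MT/s.

```python
from typing import Dict, List, Optional


def parse_dimms(dmidecode_text: str) -> List[Dict[str, str]]:
    """Split `dmidecode --type 17` output into one dict per Memory Device.

    Lines are stripped first, so this works whether or not the original
    tab indentation survived copy/paste.
    """
    dimms: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in dmidecode_text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            # A new DMI structure starts; close the previous one.
            if current:
                dimms.append(current)
            current = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        dimms.append(current)
    return dimms


def _speed(dimm: Dict[str, str]) -> Optional[str]:
    # Assumption: newer dmidecode releases rename this field "Configured Memory Speed".
    return dimm.get("Configured Memory Speed") or dimm.get("Configured Clock Speed")


def peak_gb_per_s(dimm: Dict[str, str]) -> float:
    """Theoretical peak of one populated DIMM in GB/s (MT/s x bytes per transfer)."""
    mt_per_s = int(_speed(dimm).split()[0])   # "1600 MHz" is treated as 1600 MT/s
    bytes_per_transfer = int(dimm["Data Width"].split()[0]) // 8
    return mt_per_s * 1e6 * bytes_per_transfer / 1e9


def summary(dimms: List[Dict[str, str]]) -> List[str]:
    """One comparable line per DIMM, suitable for diffing two servers."""
    return [f'{d.get("Locator")}: {d.get("Size")} {d.get("Type")} @ {_speed(d)}' for d in dimms]
```

Feed the output from each server to parse_dimms and diff the summary lines. For the output above, every DIMM is 8192 MB of DDR3 at 1600, i.e. about 12.8 GB/s theoretical peak per 64-bit channel (DIMMs on the same channel share that).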
On my system, I have four 8 GB DIMMs of DDR3 memory, one in each of four
banks. The output also shows the clock speed, which can significantly
influence memory access speed. You should also check the cache sizes on
your machine:
root@umajor:~# lshw -C memory
  *-firmware
       description: BIOS
       vendor: American Megatrends Inc.
       physical id: 0
       version: F6
       date: 06/17/2014
       size: 64KiB
       capacity: 15MiB
       capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
  *-cache:0
       description: L1 cache
       physical id: 3e
       slot: CPU Internal L1
       size: 256KiB
       capacity: 256KiB
       capabilities: synchronous internal write-back
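lshw reports a single L1 figure, which (my assumption) may be a total across cores rather than a per-core size. On Linux the kernel also exposes per-CPU cache details under /sys/devices/system/cpu/cpuN/cache/indexM/, with files named level, type and size. A minimal sketch that reads them for one CPU (cpu_caches is a hypothetical helper name), so the two boxes can be compared side by side:

```python
import os
from typing import List, Tuple


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def cpu_caches(cache_dir: str = "/sys/devices/system/cpu/cpu0/cache") -> List[Tuple[int, str, str]]:
    """Return (level, type, size) for each cache of one CPU, e.g. (1, 'Data', '32K').

    Linux only: each indexN subdirectory describes one cache (L1d, L1i, L2, L3, ...).
    """
    caches = []
    for entry in sorted(os.listdir(cache_dir)):
        if not entry.startswith("index"):
            continue  # skip non-cache entries that may sit next to the indexN dirs
        idx = os.path.join(cache_dir, entry)
        caches.append((int(_read(os.path.join(idx, "level"))),
                       _read(os.path.join(idx, "type")),
                       _read(os.path.join(idx, "size"))))
    return caches
```

Printing cpu_caches() on both servers shows directly whether the per-core L1, L2 and L3 sizes differ.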
Level 1 cache is the most significant. If the data at a memory address is
cached in L1, the CPU doesn't have to go out to main memory to fetch it. One
system having a significantly larger L1 cache than the other would also mean
much faster memory access on average. Basically, the logic is very simple:
your system has three main components (CPU, memory and disks). If the CPUs
are the same, you should compare IO performance (using bonnie++) and memory
speed. My assumption is that the systems differ in both of those factors.
However, before venturing into that, check paging and swapping on both
systems. Paging and swapping are performance killers, and you may have them
on one of your systems. Different file systems can also account for the
speed degradation. Finally, I wish you good luck. You'll need it.
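vmstat (the si/so columns) or sar -W will show swapping directly. If you'd rather script the check, a minimal sketch reads the kernel's cumulative counters in /proc/vmstat twice and turns them into per-second rates: pswpin and pswpout count pages swapped in and out, and pgmajfault counts major page faults. The function names and the 10-second default interval are my own choices.

```python
import time
from typing import Dict

WATCHED = ("pswpin", "pswpout", "pgmajfault")


def parse_vmstat(text: str) -> Dict[str, int]:
    """Turn /proc/vmstat's 'name value' lines into a dict of counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def rates(before: Dict[str, int], after: Dict[str, int], seconds: float) -> Dict[str, float]:
    """Per-second deltas of the watched counters between two snapshots."""
    return {k: (after.get(k, 0) - before.get(k, 0)) / seconds for k in WATCHED}


def sample(seconds: float = 10.0) -> Dict[str, float]:
    """Snapshot /proc/vmstat twice, `seconds` apart, while the slow workload runs (Linux only)."""
    with open("/proc/vmstat") as f:
        before = parse_vmstat(f.read())
    time.sleep(seconds)
    with open("/proc/vmstat") as f:
        after = parse_vmstat(f.read())
    return rates(before, after, seconds)
```

Sustained non-zero pswpin/pswpout on only the slow box would point at memory pressure rather than at the CPU or the Oracle code path.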
On 08/09/2017 05:46 PM, Henry Poras wrote:
I have two identical servers (or so I am told), but application work is running 2-3 times slower on one than the other. Using Tanel's snapper, I see that all active sessions are on CPU. Viewing top shows me the same thing: each session pegs a CPU. We also found that it wasn't particular SQL that slowed down across servers; it looked like everything was slow. A select count(*) from dba_objects showed this behavior, as did Jonathan Lewis's kill_cpu script. This gave me something to test with. Running a 10046 trace, I saw the same amount of resource utilization (parse count, fetch count, cr count, ...) and no contention (wait events), but one server finished 2.5 times faster than the other. Looking at session stats through snapper, I see that the number of session logical reads per sec (~all of which are consistent reads) is ~2.5 times higher on one server than the other. That explains why it takes one longer to finish.
So, now what?? Why is one server giving me 350k consistent gets per second and the other ~800k? Is it hardware? /proc/cpuinfo shows the same CPU for each box. Is it hidden in the Oracle code path? I realize that not all LIOs are created equal, but how do I check this? I am running on SE 12.1.0.1.
Any and all thoughts welcome.
Henry