Re: LIO/sec per CPU limit? Is it Hardware or Oracle code?

  • From: Mladen Gogala <gogala.mladen@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Thu, 10 Aug 2017 19:06:59 -0400

Henry, have you thought of testing IO on both boxes? Something like bonnie++ or SLOB could show you the differences in the IO characteristics of the two systems. Also, if the underlying OS is Linux newer than RH 5.x, you can use atop to see how much IO you are actually doing on each system.

There is also a distinct possibility of the systems having different memory types. DDR2, DDR3 and DDR4 are very different animals. You can check the memory types using dmidecode --type 17. Here is the result from my machine:

root@umajor:~# dmidecode --type 17
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.

Handle 0x0043, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0042
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: ChannelA-DIMM0
    Bank Locator: BANK 0
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MHz
    Manufacturer: 1315
    Serial Number: 00000000
    Asset Tag: 9876543210
    Part Number: BLS8G3D1609DS1S00.
    Rank: 2
    Configured Clock Speed: 1600 MHz

Handle 0x0044, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0042
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: ChannelA-DIMM1
    Bank Locator: BANK 1
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MHz
    Manufacturer: 1315
    Serial Number: 00000000
    Asset Tag: 9876543210
    Part Number: BLS8G3D1609DS1S00.
    Rank: 2
    Configured Clock Speed: 1600 MHz

Handle 0x0045, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0042
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: ChannelB-DIMM0
    Bank Locator: BANK 2
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MHz
    Manufacturer: 1315
    Serial Number: 00000000
    Asset Tag: 9876543210
    Part Number: BLS8G3D1609DS1S00.
    Rank: 2
    Configured Clock Speed: 1600 MHz

Handle 0x0046, DMI type 17, 34 bytes
Memory Device
    Array Handle: 0x0042
    Error Information Handle: Not Provided
    Total Width: 64 bits
    Data Width: 64 bits
    Size: 8192 MB
    Form Factor: DIMM
    Set: None
    Locator: ChannelB-DIMM1
    Bank Locator: BANK 3
    Type: DDR3
    Type Detail: Synchronous
    Speed: 1600 MHz
    Manufacturer: 1315
    Serial Number: 00000000
    Asset Tag: 9876543210
    Part Number: BLS8G3D1609DS1S00.
    Rank: 2
    Configured Clock Speed: 1600 MHz
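
To compare the two boxes without reading the listings by eye, the output above can be reduced to one line per DIMM configuration. This is only a minimal sketch in Python: the function name summarize_dimms is made up for illustration, and it assumes the output of dmidecode --type 17 was saved to a text file on each server. Empty slots, which dmidecode reports with a Size of "No Module Installed", are skipped.

```python
import re
from collections import Counter


def summarize_dimms(dmidecode_text):
    """Count populated DIMMs by (size, type, speed).

    Expects the text printed by `dmidecode --type 17`.
    summarize_dimms is a hypothetical helper, not part of any tool.
    """
    dimms = Counter()
    # Each "Memory Device" record is separated from the next by a blank line.
    for block in re.split(r"\n\s*\n", dmidecode_text):
        if "Memory Device" not in block:
            continue
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        size = fields.get("Size", "")
        # Skip records without a size and empty slots.
        if not size or size.startswith("No Module"):
            continue
        dimms[(size, fields.get("Type", "?"), fields.get("Speed", "?"))] += 1
    return dimms
```

Running it on the saved output from each server and comparing the two results should show at a glance whether the memory type, size or speed differs.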

On my system, I have four 8 GB DDR3 DIMMs. The output also reports the clock speed, which can significantly influence memory access speed. You should also check the cache sizes on your machine:

root@umajor:~# lshw -C memory
  *-firmware
       description: BIOS
       vendor: American Megatrends Inc.
       physical id: 0
       version: F6
       date: 06/17/2014
       size: 64KiB
       capacity: 15MiB
       capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
  *-cache:0
       description: L1 cache
       physical id: 3e
       slot: CPU Internal L1
       size: 256KiB
       capacity: 256KiB
       capabilities: synchronous internal write-back
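
On Linux, the same cache information is also exposed under sysfs, which makes it easy to collect from both servers with a script. The following is a rough sketch, assuming the standard /sys/devices/system/cpu/cpu0/cache layout with level, type and size files in each index directory; the function name read_cpu_caches is made up for illustration.

```python
import os


def read_cpu_caches(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return a sorted list of (level, type, size) tuples for one CPU.

    read_cpu_caches is a hypothetical helper. It reads the level,
    type and size files found in each index* directory under cache_dir.
    """
    caches = []
    for entry in sorted(os.listdir(cache_dir)):
        if not entry.startswith("index"):
            continue
        values = []
        for name in ("level", "type", "size"):
            with open(os.path.join(cache_dir, entry, name)) as f:
                values.append(f.read().strip())
        caches.append((int(values[0]), values[1], values[2]))
    return sorted(caches)
```

Calling read_cpu_caches() on each box and comparing the two lists would show whether the cache hierarchy really is identical.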

Level 1 cache is the most significant. If the memory address is cached in L1 cache, the CPU doesn't have to go out to the slower cache levels or to main memory to fetch it. One system having a significantly larger L1 cache than the other would also mean much faster memory access on average.

Basically, the logic is very simple: your system has 3 main components: CPU, memory and disks. If the CPU is the same, you should compare IO performance using bonnie++ and memory speed. My assumption is that there is a difference in both of those factors. However, before venturing into that, check paging and swapping on both systems. Paging and swapping are performance killers and you may have them on one of your systems. Different file systems can also account for the speed degradation. Finally, I wish you good luck. You'll need it.
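
One way to do the paging and swapping check is to sample the kernel's swap counters twice and look at the difference. This is only a sketch, assuming a Linux kernel that exposes the pswpin and pswpout counters in /proc/vmstat; the function names and the 5 second default interval are made up for illustration.

```python
import time

# Pages swapped in and out since boot, as reported by /proc/vmstat.
SWAP_COUNTERS = ("pswpin", "pswpout")


def parse_vmstat(text):
    """Turn the "name value" lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two parsed samples."""
    return {name: after.get(name, 0) - before.get(name, 0)
            for name in SWAP_COUNTERS}


def sample_swap_activity(interval_seconds=5, path="/proc/vmstat"):
    """Read the counters twice, interval_seconds apart, and return the delta."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_delta(before, after)
```

Running sample_swap_activity() on each server while the test query executes should return zeros on a healthy box. Non-zero values on the slow server would point at swapping rather than at CPU or memory hardware.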


On 08/09/2017 05:46 PM, Henry Poras wrote:

I have two identical servers (or so I am told), but application work is running 2-3 times slower on one than the other. Using Tanel's snapper, I see that all active sessions are on CPU. Viewing top shows me the same thing: each session pegs a CPU. We also found that it wasn't a particular SQL that slowed down across servers, but it looked like everything was slow. A select count(*) from dba_objects showed this behavior, as did Jonathan Lewis's kill_cpu script. This gave me something to test with. Running a 10046, I saw the same amount of resource utilization (parse count, fetch count, cr count, ...), no contention (wait events), but one server finished 2.5 times faster than the other. Looking at session stats through snapper, I see that the number of session logical reads per sec (~all of which are consistent reads) is ~2.5 times higher on one server than the other. That explains why it takes one longer to finish.

So, now what? Why is one server giving me 350k consistent gets per second and the other ~800k? Is it hardware? /proc/cpuinfo shows the same CPU for each box. Is it hidden in the Oracle code path? I realize that not all LIO are created equal, but how do I check this? I am running on SE 12.1.0.1.

Any and all thoughts welcome.

Henry

--
Mladen Gogala
Oracle DBA
Tel: (347) 321-1217
