Why never finished deploying oracle data guard physical standy by rman duplicate?

From: Quanwen Zhao <quanwenzhao@xxxxxxxxx>
To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
Date: Sun, 21 May 2023 21:40:27 +0800

Hello my oracle friends :-),

One of my customers wanna delopy oracle data guard physical standby by rman
duplicate using my current oracle 11.2.0.4.0 with single instance, after a
series of the tedious DG parameters configuration both on primary and
physical standby database I started using rman duplicate to push memory
script from primary to physical standby.

Yes, my current oracle 11.2.0.4.0 has the volume size of *2.0 t* for data
files, the hardware resource is as below:
Logic CPUs: 64c, Physical Memory: 512 GB, Disk IO for the read and write
speed is probably a bit of bottleneck.

Firstly, I use the following shell script to deploy it.

cat duplicate_dg.sh

rman target sys/xxxxxx@aa auxiliary sys/xxxxxx@bb << EOF >
/home/oracle/duplicate_dg_`date +%Y%m%d_%H%M%S`.log
run {
allocate channel p1 type disk;
allocate channel p2 type disk;
allocate channel p3 type disk;
allocate channel p4 type disk;
allocate channel p5 type disk;
allocate channel p6 type disk;
allocate auxiliary channel d1 type disk;
allocate auxiliary channel d2 type disk;
allocate auxiliary channel d3 type disk;
allocate auxiliary channel d4 type disk;
allocate auxiliary channel d5 type disk;
allocate auxiliary channel d6 type disk;
duplicate target database for standby nofilenamecheck from active database;
release channel p6;
release channel p5;
release channel p4;
release channel p3;
release channel p2;
release channel p1;
release channel d6;
release channel d5;
release channel d4;
release channel d3;
release channel d2;
release channel d1;
}
exit;
EOF

But after *3.5 hours*, the log file of rman duplicate has the following
error:

......

executing command: SET NEWNAME

Starting backup at 2023-05-19 23:14:31

channel p1: starting datafile copy
input datafile file number=00041
name=/oracle/oradata/xxxxxx/datafile/data_1.dbf
channel p2: starting datafile copy
input datafile file number=00042
name=/oracle/oradata/xxxxxx/datafile/data_2.dbf
channel p3: starting datafile copy
input datafile file number=00043
name=/oracle/oradata/xxxxxx/datafile/data_3.dbf
channel p4: starting datafile copy
input datafile file number=00044
name=/oracle/oradata/xxxxxx/datafile/data_4.dbf
channel p5: starting datafile copy
input datafile file number=00045
name=/oracle/oradata/xxxxxx/datafile/data_5.dbf
channel p6: starting datafile copy
input datafile file number=00046
name=/oracle/oradata/xxxxxx/datafile/data_6.dbf
output file name=/oracle/oradata/xxxxxxx/datafile/data_7.dbf
tag=TAG20230519T231432
channel p1: datafile copy complete, elapsed time: 00:42:08
channel p1: starting datafile copy
input datafile file number=00007
name=/oracle/oradata/xxxxxx/datafile/data_7.1005681629
output file name=/oracle/oradata/xxxxxx/datafile/data_9.dbf
tag=TAG20230519T231432
channel p5: datafile copy complete, elapsed time: 00:42:07
channel p5: starting datafile copy
input datafile file number=00008
name=/oracle/oradata/xxxxxx/datafile/data_8.1005682331
output file name=/oracle/oradata/xxxxxx/datafile/data_10.dbf
tag=TAG20230519T231432
channel p6: datafile copy complete, elapsed time: 00:42:07
......

*RMAN-03009: failure of backup command on p1 channel at 05/20/2023
02:45:38ORA-00603: ORACLE server session terminated by fatal
errorORA-00239: timeout waiting for control file enqueue: held by 'inst 1,
osid 8262' for more than 900 seconds*
......

My initial thought is reducing rman channel number and limit speed, so I
changed the previous shell script to be as follows:

cat duplicate_dg.sh

rman target sys/xxxxxx@aa auxiliary sys/xxxxxx@bb << EOF >
/home/oracle/duplicate_dg_`date +%Y%m%d-%H%M%S`.log
run {

*allocate channel p1 type disk maxpiecesize 16g maxopenfiles 4 rate
40M;allocate auxiliary channel d1 type disk maxpiecesize 16g maxopenfiles 4
rate 40M;*
duplicate target database for standby nofilenamecheck from active database;
release channel d1;
release channel p1;
}
exit;
EOF

Re-running that shell script, never showing the error ORA-00239 again, but
after 8 hours (finished copying *1.3 t* for data files), my primary oracle
database has been crashed directly, I've seen the following error from
*/var/log/messages*.

......

May 21 16:03:19 aaaaaa kernel: [112149.846996] *oracle invoked
oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0*
May 21 16:03:19 aaaaaa kernel: [112149.847013] oracle cpuset=/
mems_allowed=0
May 21 16:03:19 aaaaaa kernel: [112149.847036] CPU: 37 PID: 48996 Comm:
oracle Not tainted 4.1.12-61.1.28.el6uek.x86_64 #2
May 21 16:03:19 aaaaaa kernel: [112149.847038] Hardware name: Xen HVM
domU, BIOS 4.7.4-1.16 07/13/2018
May 21 16:03:19 aaaaaa kernel: [112149.847040]  0000000000000000
ffff88011c157528 ffffffff816c6e40 ffff887dd8bb3800
May 21 16:03:19 aaaaaa kernel: [112149.847043]  0000000000000000
ffff88011c157578 ffffffff8118c15e ffff880100000000
May 21 16:03:19 aaaaaa kernel: [112149.847046]  ffffffff000200da
ffff880148932a00 ffff880d0e7c3800 ffff880d0e7c42b0
May 21 16:03:19 aaaaaa kernel: [112149.847048] Call Trace:
May 21 16:03:19 aaaaaa kernel: [112149.847057]  [<ffffffff816c6e40>]
dump_stack+0x63/0x83
May 21 16:03:19 aaaaaa kernel: [112149.847062]  [<ffffffff8118c15e>]
dump_header+0x8e/0xe0
May 21 16:03:19 aaaaaa kernel: [112149.847065]  [<ffffffff8118c767>]
oom_kill_process+0x1d7/0x3c0
May 21 16:03:19 aaaaaa kernel: [112149.847071]  [<ffffffff812a8485>] ?
security_capable_noaudit+0x15/0x20
May 21 16:03:19 aaaaaa kernel: [112149.847083]  [<ffffffff8108d707>] ?
has_capability_noaudit+0x17/0x20
May 21 16:03:19 aaaaaa kernel: [112149.847089]  [<ffffffff8118cc18>]
__out_of_memory+0x2c8/0x370
May 21 16:03:19 aaaaaa kernel: [112149.847094]  [<ffffffff8118ce09>]
out_of_memory+0x69/0x90
May 21 16:03:19 aaaaaa kernel: [112149.847099]  [<ffffffff8119209f>]
__alloc_pages_slowpath+0x6af/0x760
May 21 16:03:19 aaaaaa kernel: [112149.847102]  [<ffffffff81192401>]
__alloc_pages_nodemask+0x2b1/0x2d0
May 21 16:03:19 aaaaaa kernel: [112149.847105]  [<ffffffff810b61c8>] ?
sched_clock_cpu+0xa8/0xc0
May 21 16:03:19 aaaaaa kernel: [112149.847109]  [<ffffffff811de104>]
alloc_pages_vma+0xd4/0x230
May 21 16:03:19 aaaaaa kernel: [112149.847113]  [<ffffffff811cceed>]
read_swap_cache_async+0xfd/0x160
May 21 16:03:19 aaaaaa kernel: [112149.847115]  [<ffffffff811cd05e>]
swapin_readahead+0x10e/0x1c0
May 21 16:03:19 aaaaaa kernel: [112149.847118]  [<ffffffff811a148e>]
shmem_swapin+0x5e/0x90
May 21 16:03:19 aaaaaa kernel: [112149.847121]  [<ffffffff816c6f3d>] ?
io_schedule_timeout+0xdd/0x110
May 21 16:03:19 aaaaaa kernel: [112149.847124]  [<ffffffff811fad5e>] ?
swap_cgroup_record+0x4e/0x60
May 21 16:03:19 aaaaaa kernel: [112149.847127]  [<ffffffff8131c703>] ?
radix_tree_lookup_slot+0x13/0x30
May 21 16:03:19 aaaaaa kernel: [112149.847129]  [<ffffffff81187d6e>] ?
find_get_entry+0x1e/0xa0
May 21 16:03:19 aaaaaa kernel: [112149.847132]  [<ffffffff81189788>] ?
pagecache_get_page+0x38/0x1c0
May 21 16:03:19 aaaaaa kernel: [112149.847135]  [<ffffffff811a47a0>]
shmem_getpage_gfp+0x540/0x820
May 21 16:03:19 aaaaaa kernel: [112149.847137]  [<ffffffff811a54ba>]
shmem_fault+0x6a/0x1c0
May 21 16:03:19 aaaaaa kernel: [112149.847141]  [<ffffffff8129a0ae>]
shm_fault+0x1e/0x20
May 21 16:03:19 aaaaaa kernel: [112149.847144]  [<ffffffff811b877d>]
__do_fault+0x3d/0xa0
May 21 16:03:19 aaaaaa kernel: [112149.847149]  [<ffffffff810ebfd7>] ?
current_fs_time+0x27/0x30
May 21 16:03:19 aaaaaa kernel: [112149.847153]  [<ffffffff810c8783>] ?
__wake_up+0x53/0x70
May 21 16:03:19 aaaaaa kernel: [112149.847155]  [<ffffffff811b89c5>]
do_read_fault+0x1e5/0x300
May 21 16:03:19 aaaaaa kernel: [112149.847157]  [<ffffffff811b8cfc>] ?
do_shared_fault+0x19c/0x1d0
May 21 16:03:19 aaaaaa kernel: [112149.847159]  [<ffffffff811bc703>]
handle_pte_fault+0x1e3/0x230
May 21 16:03:19 aaaaaa kernel: [112149.847163]  [<ffffffff810738a0>] ?
pte_alloc_one+0x30/0x50
May 21 16:03:19 aaaaaa kernel: [112149.847165]  [<ffffffff811b7e27>] ?
__pte_alloc+0xd7/0x190
May 21 16:03:19 aaaaaa kernel: [112149.847167]  [<ffffffff811bc90c>]
__handle_mm_fault+0x1bc/0x330
May 21 16:03:19 aaaaaa kernel: [112149.847169]  [<ffffffff811bcb32>]
handle_mm_fault+0xb2/0x1a0
May 21 16:03:19 aaaaaa kernel: [112149.847171]  [<ffffffff8106ddf3>] ?
__do_page_fault+0xe3/0x480
May 21 16:03:19 aaaaaa kernel: [112149.847173]  [<ffffffff8106de7c>]
__do_page_fault+0x16c/0x480
May 21 16:03:19 aaaaaa kernel: [112149.847175]  [<ffffffff8106e337>]
do_page_fault+0x37/0x90
May 21 16:03:19 aaaaaa kernel: [112149.847177]  [<ffffffff816c7b9a>] ?
schedule_user+0x1a/0x60
May 21 16:03:19 aaaaaa kernel: [112149.847181]  [<ffffffff816cdb18>]
page_fault+0x28/0x30
May 21 16:03:19 aaaaaa kernel: [112149.847183] Mem-Info:
May 21 16:03:19 aaaaaa kernel: [112149.847193] active_anon:18878013
inactive_anon:30402260 isolated_anon:576
May 21 16:03:19 aaaaaa kernel: [112149.847193]  active_file:4697
inactive_file:4559 isolated_file:613
May 21 16:03:19 aaaaaa kernel: [112149.847193]  unevictable:0 dirty:0
writeback:0 unstable:0
May 21 16:03:19 aaaaaa kernel: [112149.847193]  slab_reclaimable:242933
slab_unreclaimable:49152
May 21 16:03:19 aaaaaa kernel: [112149.847193]  mapped:16529750
shmem:47921098 pagetables:2963894 bounce:0
May 21 16:03:19 aaaaaa kernel: [112149.847193]  free:532442 free_pcp:223
free_cma:1
May 21 16:03:19 aaaaaa kernel: [112149.847197] Node 0 DMA free:15828kB
min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:15988kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB
mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:68kB
kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? yes
May 21 16:03:19 aaaaaa kernel: [112149.847202] lowmem_reserve[]: 0 3455
515685 515685
May 21 16:03:19 aaaaaa kernel: [112149.847206] Node 0 DMA32 free:2049148kB
min:436kB low:544kB high:652kB active_anon:0kB inactive_anon:28kB
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:3915776kB managed:3540104kB mlocked:0kB
dirty:0kB writeback:0kB mapped:12kB shmem:8kB slab_reclaimable:8240kB
slab_unreclaimable:1700kB kernel_stack:32kB pagetables:248kB unstable:0kB
bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:4 all_unreclaimable? yes
May 21 16:03:19 aaaaaa kernel: [112149.847210] lowmem_reserve[]: 0 0
512229 512229
May 21 16:03:19 aaaaaa kernel: [112149.847214] Node 0 Normal free:64792kB
min:65092kB low:81364kB high:97636kB active_anon:75512308kB
inactive_anon:121608500kB active_file:18788kB inactive_file:18236kB
unevictable:0kB isolated(anon):2304kB isolated(file):2452kB
present:532930560kB managed:524523364kB mlocked:0kB dirty:0kB writeback:0kB
mapped:66118988kB shmem:191684384kB slab_reclaimable:963492kB
slab_unreclaimable:194840kB kernel_stack:19120kB pagetables:11855328kB
unstable:0kB bounce:0kB free_pcp:892kB local_pcp:0kB free_cma:4kB
writeback_tmp:0kB pages_scanned:312092 all_unreclaimable? yes
May 21 16:03:19 aaaaaa kernel: [112149.847218] lowmem_reserve[]: 0 0 0 0
May 21 16:03:19 aaaaaa kernel: [112149.847221] Node 0 DMA: 1*4kB (U) 2*8kB
(U) 2*16kB (U) 1*32kB (U) 0*64kB 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB
(U) 1*2048kB (M) 3*4096kB (M) = 15828kB
May 21 16:03:19 aaaaaa kernel: [112149.847233] Node 0 DMA32: 26*4kB (UM)
25*8kB (UM) 33*16kB (UEM) 40*32kB (UEM) 17*64kB (UEM) 4*128kB (EM) 6*256kB
(UEM) 2*512kB (M) 5*1024kB (UEM) 3*2048kB (UMR) 496*4096kB (M) = 2049152kB
May 21 16:03:19 aaaaaa kernel: [112149.847246] Node 0 Normal: 16336*4kB
(UMC) 81*8kB (UM) 11*16kB (UR) 9*32kB (UR) 1*64kB (R) 0*128kB 2*256kB (R)
1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 68568kB
May 21 16:03:19 aaaaaa kernel: [112149.847258] Node 0 hugepages_total=0
hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
May 21 16:03:19 aaaaaa kernel: [112149.847259] Node 0
hugepages_total=153641 hugepages_free=153641 hugepages_surp=0
hugepages_size=2048kB
May 21 16:03:19 aaaaaa kernel: [112149.847261] 48948485 total pagecache
pages
May 21 16:03:19 aaaaaa kernel: [112149.847262] 1018682 pages in swap cache
May 21 16:03:19 aaaaaa kernel: [112149.847264] Swap cache stats: add
32771424, delete 31752742, find 10338170/12219018
May 21 16:03:19 aaaaaa kernel: [112149.847265] Free swap  = 0kB
May 21 16:03:19 aaaaaa kernel: [112149.847266] Total swap = 16498684kB
May 21 16:03:19 aaaaaa kernel: [112149.847267] 134215581 pages RAM
May 21 16:03:19 aaaaaa kernel: [112149.847268] 0 pages HighMem/MovableOnly
May 21 16:03:19 aaaaaa kernel: [112149.847269] 2191643 pages reserved
May 21 16:03:19 aaaaaa kernel: [112149.847270] 4096 pages cma reserved
May 21 16:03:19 aaaaaa kernel: [112149.847271] 0 pages hwpoisoned
......

Oh, my god!!! You know, I've set *Hugepage* on OS and disabled the *THP
(Transparent Huge Pages)*.

[root@aaaaaa ~]# *cat /proc/meminfo | grep Huge*

AnonHugePages:         0 kB

*HugePages_Total:   153641HugePages_Free:    73222HugePages_Rsvd:    73182*
HugePages_Surp:        0
Hugepagesize:       2048 kB
[root@aaaaaa ~]#
[root@aaaaaa ~]# *cat /sys/kernel/mm/transparent_hugepage/enabled*
always madvise *[never]*
[root@aaaaaa ~]#

About how to avoid out-of-memory? The most articles suggest to set
*"vm.lower_zone_protection
= 250"* on */etc/sysctl.conf *on x86_64 system or install *hugemem kernel
rpm package* on x86 system.

Could you help me troubleshooting the incomprehensible issue? Thanks
beforehand!

Best Regards
Quanwen Zhao

Follow-Ups:
- Re: Why never finished deploying oracle data guard physical standy by rman duplicate?
  - From: Mladen Gogala

Why never finished deploying oracle data guard physical standy by rman duplicate?

Other related posts: