Running RAC in a Virtual Environment

I was in a discussion at an Oracle Meetup this week, and the subject of RAC in a virtualized environment – specifically Oracle Virtual Machine (OVM) – came up.
Here are a couple of points which came out of the discussion.

pingtarget

There was a lack of awareness of a common problem which has a solution built into Oracle 12.1.0.2 Grid Infrastructure and later. In a virtualized environment, the network components are also virtualized, and a network failure on the host is not always propagated up to the guests. The guest O/S can therefore fail to detect the failure, and the virtual NIC remains up. Grid Infrastructure (GI) will not perform a VIP fail-over because, despite the network being unavailable, it cannot see the problem.

To resolve this, Oracle has added the option of a “pingtarget” for each public network defined in GI. This performs a keep-alive against an external device, usually something like the default gateway, in much the same way as the heartbeat on the cluster interconnect.

Before

srvctl config network

Network 1 exists

Subnet IPv4: 192.168.0.160/255.255.255.224/eth1, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

The default gateway makes a good ping target. For this IP and subnet, it’s 192.168.0.161:

srvctl modify network -k 1 -pingtarget 192.168.0.161

After

srvctl config network

Network 1 exists

Subnet IPv4: 192.168.0.160/255.255.255.224/eth1, static
Subnet IPv6:
Ping Targets: 192.168.0.161
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

All safe!
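
If you have more than one always-on device available, srvctl accepts a comma-separated list of ping targets and, as I understand it, only declares the network down when none of them respond. A quick sketch; the second address here is a hypothetical extra device on the same /27 subnet:

srvctl modify network -k 1 -pingtarget "192.168.0.161,192.168.0.190"
srvctl config network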

Server Pools

A second item we discussed was Server Pools in OVM. Each RAC guest should be on a different host, otherwise you have not eliminated the host as a Single Point Of Failure. A second, less obvious, SPOF is the Server Pool disk.

A Server Pool is a filesystem LUN (and, prior to release 3.4, an IP address) used to group a logical collection of servers with similar CPU models, within which we can create and migrate VM guests. For a RAC installation, each RAC node should be in a different server pool, as well as on different physical hardware.

[Image: OVM Manager view of two server pools: OVS-Pool-1 containing ovs01, ovs11 and ovs12, and OVS-Pool-2 containing ovs02]

In this image, the RAC nodes of a single cluster should be spread across the server pools. This configuration can safely support a 2-node cluster despite having 4 servers: one node created in “OVS-Pool-2” on server “ovs02”, and the second node in “OVS-Pool-1” on “ovs01”, “ovs11” or “ovs12”.

It is possible to live migrate guests between these 3 servers.
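
If you want to script the move rather than use the OVM Manager GUI, the OVM Manager CLI (reached over ssh on port 10000 of the manager) has a migrate command along these lines; check the CLI reference for your release. The VM name and manager host below are made up for illustration, and ovs11 is one of the servers from the image:

ssh -p 10000 admin@ovm-manager
OVM> migrate Vm name=racnode1 destServer=ovs11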


Oracle Audit Control part 3 – OS files

Just realized that I wrote a follow-up to Part 2 two years ago and never posted it! So here are the stripped-out key facts to round off auditing control a bit more:

Deleting O/S audit files is necessary housekeeping. In Oracle 12C (12.1) they are created by default, as AUDIT_SYS_OPERATIONS now defaults to TRUE, so you are going to be filling up your AUDIT_FILE_DEST whether you realized it or not.

[NOTE: In theory, this could become a performance problem as well as a space problem if you have many millions of files in the AUDIT_FILE_DEST. In Oracle 11G, when allocating an audit file, the database would check whether a file for that SPID already existed and add an incremental number. The act of calling vx_dirbread and vx_dirscan can get terribly slow if there are a lot of files to wade through. This mechanism is different in 12C, but I have not traced the kernel calls to see if the vx_ calls have gone away.]
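
A quick way to see whether you are heading into that territory is simply to count the files in the adump directory (path as used in the examples below):

ls /u01/app/oracle/admin/ORCL/adump | wc -l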

To clean up the O/S audit files, we use the same commands as in Part 2, but changed to use AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS.

You also need to be aware of the RAC_INSTANCE_NUMBER parameter. For the database-level audit trail it’s irrelevant, and for single instance it is also irrelevant, but with RAC it tells the cleanup which node to work on, as the nodes are all individual installs with their own audit files (assuming a Grid Infrastructure install per node).

Commands:

-- Show the BEFORE sizes
host ssh server01 du -m -s /u01/app/oracle/admin/ORCL/a*
host ssh server02 du -m -s /u01/app/oracle/admin/ORCL/a*
begin

  -- Loop around every instance in the RAC cluster and run the commands
  for i in (select inst_id from gv$instance)
  loop

    -- cleanup AUDIT_FILE_DEST (operating system audit files)
    -- set cleanup for 60 days
    DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(
      AUDIT_TRAIL_TYPE    => DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
      LAST_ARCHIVE_TIME   => systimestamp-60,
      RAC_INSTANCE_NUMBER => i.inst_id );

    -- And cleanup based upon that date
    DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL(
      AUDIT_TRAIL_TYPE        => DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
      USE_LAST_ARCH_TIMESTAMP => TRUE);

  end loop;

end;
/

-- And check the sizes AFTER
host ssh server01 du -m -s /u01/app/oracle/admin/ORCL/a*
host ssh server02 du -m -s /u01/app/oracle/admin/ORCL/a*


Output:

15:58:56 SYS @ ORCL1 > host ssh server01 du -m -s /u01/app/oracle/admin/ORCL/a*
776 /u01/app/oracle/admin/ORCL/adump
15:58:56 SYS @ ORCL1 > host ssh server02 du -m -s /u01/app/oracle/admin/ORCL/a*
694 /u01/app/oracle/admin/ORCL/adump
15:58:56 SYS @ ORCL1 >
15:58:56 SYS @ ORCL1 >
15:58:56 SYS @ ORCL1 > begin
15:58:56   2
15:58:56   3   for i in (select inst_id from gv$instance)
15:58:56   4   loop
15:58:56   5
15:58:56   6      -- cleanup AUDIT_FILE_DEST (operating system audit files)
15:58:56   7      -- set cleanup for 60 days
15:58:56   8      DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(
15:58:56   9      AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
15:58:56  10      LAST_ARCHIVE_TIME => systimestamp-50,
15:58:56  11      RAC_INSTANCE_NUMBER => i.inst_id );
15:58:56  12
15:58:56  13      -- And cleanup based upon that date
15:58:56  14      DBMS_AUDIT_MGMT.clean_audit_trail(
15:58:56  15        audit_trail_type        => DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
15:58:56  16        use_last_arch_timestamp => TRUE);
15:58:56  17
15:58:56  18   end loop;
15:58:56  19
15:58:56  20  end;
15:58:56  21  /
PL/SQL procedure successfully completed.

15:58:56 SYS @ ORCL1 >
15:58:56 SYS @ ORCL1 >
15:58:56 SYS @ ORCL1 > host ssh server01 du -m -s /u01/app/oracle/admin/ORCL/a*
354 /u01/app/oracle/admin/ORCL/adump
15:58:57 SYS @ ORCL1 > host ssh server02 du -m -s /u01/app/oracle/admin/ORCL/a*
364 /u01/app/oracle/admin/ORCL/adump
15:58:57 SYS @ ORCL1 >
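
Rather than running the cleanup by hand, DBMS_AUDIT_MGMT can also schedule the purge itself via CREATE_PURGE_JOB, as long as something keeps moving the last-archive timestamp forward (for example, a scheduler job running the block above). A sketch; the job name is my own choice:

begin
  -- purge anything older than the last archive timestamp, every 24 hours
  DBMS_AUDIT_MGMT.CREATE_PURGE_JOB(
    AUDIT_TRAIL_TYPE           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
    AUDIT_TRAIL_PURGE_INTERVAL => 24,
    AUDIT_TRAIL_PURGE_NAME     => 'OS_AUDIT_PURGE_JOB',
    USE_LAST_ARCH_TIMESTAMP    => TRUE);
end;
/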
 
And just one final bit. If you change the default

AUDIT_TRAIL=DB to
AUDIT_TRAIL=DB, EXTENDED

the audit writes the first 2,000 characters of the SQL command to DBA_AUDIT_TRAIL.SQL_TEXT, so you get the full text of the SQL command audited, rather than just the action. Very handy!
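
For reference, AUDIT_TRAIL is a static parameter, so the change has to go to the spfile and needs an instance restart (on every node in a RAC cluster):

ALTER SYSTEM SET audit_trail=DB,EXTENDED SCOPE=SPFILE SID='*';
-- then restart the instance(s) for it to take effect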

UKOUG RDBMS and RAC-CIA Special Interest Groups

On Thursday 21st April, there is a dual UKOUG Database and RAC, Cloud, Infrastructure and Availability special interest group.

For the first time, this event is being held in the fabulous Northern city of Manchester!

There are a dozen interesting, career-assisting, educational talks from end users, Oracle employees and a number of well-known Oracle ACEs at all levels, including Carl Dudley, Jonathan Lewis, Phil Brown and myself.

I will be talking about how to troubleshoot GoldenGate, showing optimal configurations to assist with problem determination, and a bit of staring at hex dumps for the brave.

There are only a few places left for this popular dual-stream event. Click Here for more details about the talks and speakers, and for registration.

See you there!

OEM and monitoring the -MGMTDB GIMR Database

When you create Grid Infrastructure in 12.1.0.2, you are presented with a new (annoyingly named) database: “-MGMTDB”.

This database is a standard 12C CDB/PDB storing transient performance and other information: the Grid Infrastructure Management Repository (GIMR). If it is lost, no biggie: just re-create it in your voting-disk diskgroup. (Aside: create a new MGMTDB_DG, move your voting disk there, re-create -MGMTDB, then move your voting disk back out to the proper multiple voting volumes.)
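
As a quick sanity check, the GIMR has its own srvctl verbs for seeing where it currently lives and whether it is running:

srvctl status mgmtdb
srvctl config mgmtdb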

However, Oracle Enterprise Manager Cloud Control 12.1.0.5.0 and earlier sees this Oracle database, PDB, listener and all, and decides to discover it. This is incorrect and should not happen: it is supposed to be “masked off” from OEM. Monitoring this database will only lead to false positives and problems where none really exist, and all of the targets should be ignored (as per the attached picture).

[Image: OEM discovery results showing the -MGMTDB database, PDB and listener targets]

In a future release of Oracle Enterprise Manager Cloud Control these targets will no longer be discovered, and they will automatically remain hidden from view within OEM, once the team have fixed the bug whose root cause, I was very reliably informed, they found today.

Extending an ACFS filesystem dynamically

To extend an ACFS cluster filesystem dynamically, we need to use the acfsutil command:

node01:/u01/grid>/sbin/acfsutil size +10G /u02
acfsutil size: ACFS-03008: The volume could not be resized.  The volume expansion limit has been reached.
acfsutil size: ACFS-03216: The ADVM compatibility attribute for the diskgroup was below the required
                           version (11.2.0.4.0) for unlimited volume expansions.

Oh dear: the ADVM compatibility attribute is below 11.2.0.4, so you can only extend a volume dynamically a few times (5) before the global bitmap becomes full. So, now it’s an offline change. 😦
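
You can check the attribute for yourself before deciding which route to take; a minimal query to run on the ASM instance, listing the compatibility attributes for every diskgroup:

select g.name diskgroup, a.name attribute, a.value
  from v$asm_diskgroup g, v$asm_attribute a
 where g.group_number = a.group_number
   and a.name like 'compatible.%';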

Check what is accessing /u02 and stop it:

node01:/opt/oracle>sudo -s
[root@node01 oracle]# lsof /u02

COMMAND   PID   USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
bash     5566 oracle  cwd    DIR 252,50177    12288   78 /u02/goldengate/bin11
su      29509   root  cwd    DIR 252,50177    12288   78 /u02/goldengate/bin11

erm. kill -9 5566 29509 :-)

DO THIS ON EVERY RAC NODE!

[root@node01 oracle]# umount -t acfs /u02
[root@node02 oracle]# umount -t acfs /u02
[root@node-n oracle]# umount -t acfs /u02

Once unmounted, we can “repair” the global bitmap:

[root@node02 oracle]# fsck -y -t acfs  /dev/asm/acfsdisk_u02-98
fsck from util-linux-ng 2.17.2
version                   = 11.2.0.4.0
*****************************
********** Pass: 1 **********
*****************************
Oracle ASM Cluster File System (ACFS) On-Disk Structure Version: 39.0
 ACFS file system created at: Thu Jan  2 17:08:02 2014
 checking primary file system
 Files checked in primary file system: 25%
 Files checked in primary file system: 100%

 fsck.acfs: ACFS-07728: The Global_BitMap file has reached the maximum number of extents (5).
 The file system can no longer be expanded. 

 Running fsck.acfs in fixer mode will attempt to consolidate the storage bitmap into 
 fewer extents which would allow for file system expansion

 Checking if any files are orphaned...
 0 orphans found
 Checker completed with no errors.

So let’s fix it. Output seriously trimmed, but with the important bit:

[root@node02 oracle]# /sbin/fsck.acfs -a -v /dev/asm/acfsdisk_u02-98
fsck from util-linux-ng 2.17.2
version                   = 11.2.0.4.0
 *****************************
 ********** Pass: 1 **********
 *****************************
 Oracle ASM Cluster File System (ACFS) On-Disk Structure Version: 39.0
 ACFS file system created at: Thu Jan  2 17:08:02 2014

 checking primary file system
 Files checked in primary file system: 25%
 Files checked in primary file system: 100%

 fsck.acfs: ACFS-07729: The Global_Bitmap file has been
 consolidated into 2 extents.
 This may allow for file system expansion via the 'acfsutil size' command.
  
 Checking if any files are orphaned...
 0 orphans found
 Checker completed with no errors.

So, we’re done and can re-mount ON EVERY NODE. Given it’s now 2 extents, and the max we can have is 5, we have 3 more dynamic extensions before we need to do this again.

mount -t acfs /dev/asm/acfsdisk_u02-98 /u02

And re-attempt to expand the filesystem

node01:/u01/grid>df -h /u02
Filesystem              Size  Used Avail Use% Mounted on
/dev/asm/acfsdisk_u02-98  325G   36G  290G  12% /u02

node01:/u01/grid>/sbin/acfsutil size +10G /u02
acfsutil size: new file system size: 359703511040 (343040MB)

node01:/u01/grid>df -h /u02
Filesystem              Size  Used Avail Use% Mounted on
/dev/asm/acfsdisk_u02-98  335G   36G  300G  11% /u02
node01:/u01/grid>

Yey! Bigger filesystem! Let’s minimise the number of times it needs to be extended in the future by doing it in big lumps. It might just save a planned outage.


Alternatively, upgrade Grid Infrastructure to at least 11.2.0.4, set the ADVM compatibility attribute to 11.2.0.4, and the restriction is gone for good:

ALTER DISKGROUP acfsdisk SET ATTRIBUTE 'compatible.asm' = '11.2.0.4', 'compatible.rdbms' = '11.2.0.4', 'compatible.advm' = '11.2.0.4';
(or ASMCMD [+] > setattr -G acfsdisk compatible.advm 11.2.0.4)
(or right-click on the disk group in asmca and click "edit attributes")

Grid Infrastructure Disk Space Problem – CHM DB file: crfclust.bdb

The Grid Infrastructure filesystem was reporting that it was a bit full today (release 11.2.0.4). This was tracked down to the “crfclust.bdb” file, which records information about cluster health for monitoring purposes. It was 26GB. It’s not supposed to get bigger than 1GB, so this is probably a bug, but let’s explicitly resolve the size issue right now and search Oracle Support later. Worst case, the bdb (Berkeley Database) files get regenerated when the CHM (ora.crf) resource is restarted; you only lose the OS statistics that CHM has gathered, and CHM will simply start collecting them again. Deleting the bdb files has no other impact.
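
If you need to find where the CHM repository lives on your own cluster, oclumon can report the path, and crsctl shows the state of the resource that manages it. I believe both commands are available in this release, but verify on your version:

oclumon manage -get reppath
crsctl stat res ora.crf -init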


df -h /u01

Filesystem                Size  Used Avail Use% Mounted on
/dev/sdc1                  48G   36G  9.0G  81% /u01

pwd
/u01/app/11g/grid/crf/db/node01

ls -lh
total 29G

-rw-r--r-- 1 root root 2.1M Jul 22 12:12 22-JUL-2014-12:12:03.txt
-rw-r--r-- 1 root root 1.3M Apr 23 14:28 23-APR-2014-14:28:04.txt
-rw-r--r-- 1 root root 1.2M Apr 23 14:33 23-APR-2014-14:33:34.txt
-rw-r--r-- 1 root root 1.3M Jul 23 12:53 23-JUL-2014-12:53:02.txt
-rw-r--r-- 1 root root 946K Apr 26 03:57 26-APR-2014-03:57:21.txt
-rw-r----- 1 root root 492M Aug 26 10:33 crfalert.bdb
-rw-r----- 1 root root  26G Aug 26 10:33 crfclust.bdb   <- 26G!
-rw-r----- 1 root root 8.0K Jul 23 12:52 crfconn.bdb
-rw-r----- 1 root root 521M Aug 26 10:33 crfcpu.bdb
-rw-r----- 1 root root 513M Aug 26 10:33 crfhosts.bdb
-rw-r----- 1 root root 645M Aug 26 10:33 crfloclts.bdb
-rw-r----- 1 root root 418M Aug 26 10:33 crfts.bdb
-rw-r----- 1 root root  24K Aug  1 16:07 __db.001
-rw-r----- 1 root root 392K Aug 26 10:33 __db.002
-rw-r----- 1 root root 2.6M Aug 26 10:33 __db.003
-rw-r----- 1 root root 2.1M Aug 26 10:34 __db.004
-rw-r----- 1 root root 1.2M Aug 26 10:33 __db.005
-rw-r----- 1 root root  56K Aug 26 10:34 __db.006
-rw-r----- 1 root root  16M Aug 26 10:17 log.0000008759
-rw-r----- 1 root root  16M Aug 26 10:33 log.0000008760
-rw-r----- 1 root root 8.0K Aug 26 10:33 repdhosts.bdb
-rw-r--r-- 1 root root 115M Jul 22 12:12 node01.ldb

Let’s see how big the repository is…

oclumon manage -get repsize
CHM Repository Size = 1073736016

Wow. That seems a bit oversized. Change the repository retention to the desired number of seconds, between 3600 (1 hour) and 259200 (3 days):

oclumon manage -repos resize 259200

node01 --> retention check successful
node02 --> retention check successful

New retention is 259200 and will use 4524595200 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.

If we now check the size, we get an error as the repository is bigger than the max allowed size.

oclumon manage -get repsize
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service

So we need to stop and start the ora.crf resource to get everything working again. It should be OK to do this on a running system with no impact, but I’d start with your sandpit to test it. Don’t take my word for it!

Check for process:

node01:/u01/app/11g/grid/bin>ps -ef |grep crf
root     26983     1  0 10:44 ?        00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01

Stop service:
node01:/u01/app/11g/grid/bin>crsctl stop res ora.crf -init

CRS-2673: Attempting to stop 'ora.crf' on 'node01'
CRS-2677: Stop of 'ora.crf' on 'node01' succeeded

Start Service:
node01:/u01/app/11g/grid/bin>crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'node01'
CRS-2676: Start of 'ora.crf' on 'node01' succeeded

Check for Process:
node01:/u01/app/11g/grid/bin>ps -ef  |grep crf
root     28000     1  5 10:49 ?        00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01

Check the size – as specified:
node01:/u01/app/11g/grid/bin>oclumon manage -get repsize

CHM Repository Size = 259200

Done, and the space is released and reclaimed.

node01:/u01/app/11g/grid/bin>df -h /u01

Filesystem                Size  Used Avail Use% Mounted on
/dev/sdc1                  48G  7.7G   38G  18% /u01

The space has been returned. Marvellous.
Now repeat the stop/start on each node.
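
A sketch of scripting that across the cluster, assuming passwordless ssh as root (an assumption) and using the grid home path from the examples above; adjust both to taste:

# bounce the CHM logger resource on each node in turn
for n in node01 node02; do
  ssh root@$n '/u01/app/11g/grid/bin/crsctl stop res ora.crf -init && /u01/app/11g/grid/bin/crsctl start res ora.crf -init'
done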


UPDATE: From Oracle Support: having very large bdb files (greater than 2GB) is likely due to a bug, since the default size limits the bdb to 1GB unless the CHM data retention time is increased. One such bug is 10165314.