Oracle SQL Monitor not monitoring my SQL

I needed to monitor a SQL statement in 11.2.0.3 (the limits mentioned below are the same in 12.1, 12.2, 18.4 and 19c) to determine what it was doing and why it was slow.

Usually I would use SQL Monitor for this [NOTE: you need to license the Oracle Tuning Pack to use SQL Monitor], but the SQL was not appearing in there despite running for over 5 seconds and being a parallel SQL (either of which should qualify it for monitoring). So I asked Twitter why, and thought I'd share the output here.

https://twitter.com/ChandlerDBA/status/1075692070952677376

It was nailed immediately by Jonathan Lewis, with added help from Ivica Arsov. (thank you!)

There is a hidden parameter, "_sqlmon_max_planlines", which prevents any SQL with an execution plan of more than 300 lines from being monitored (see below for the SQLMon hidden parameters, and change them at your own risk, preferably with the backing of an SR from Oracle Support). This execution plan had well over 300 lines. The solution is to raise the limit at either session or system level so that monitoring happens even when the plan exceeds 300 lines.

e.g.

alter system set "_sqlmon_max_planlines"=500 scope=memory sid='*';
or
alter session set "_sqlmon_max_planlines"=500;
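
To confirm you are actually hitting this limit, count the plan lines for the statement first. A quick check, assuming the cursor is still in the shared pool (substitute your own SQL_ID):

select sql_id, child_number, count(*) as plan_lines
  from v$sql_plan
 where sql_id = '&sql_id'
 group by sql_id, child_number;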

The negative side effect is that the monitoring will use more resources (primarily memory and CPU), which is why there are default limits on this feature. You might want to change it back when you're finished, to conserve resources.

Note that if you change the system parameter whilst the SQL is running, it will start to monitor the SQL at that point, so you will only get a partial picture of what is taking place, which is less valuable.
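
You can see whether the statement has been picked up by querying V$SQL_MONITOR directly. A minimal check, again substituting your own SQL_ID:

select sql_id, sql_exec_id, sql_exec_start, status, process_name
  from v$sql_monitor
 where sql_id = '&sql_id'
 order by sql_exec_start;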

select ksppinm, ksppstvl, ksppdesc
  from sys.x$ksppi a, sys.x$ksppsv b
 where a.indx=b.indx
  and lower(ksppinm) like lower('%sqlmon%')
order by ksppinm;

KSPPINM                   KSPPSTVL  KSPPDESC
------------------------- --------- --------------------------------------------------------------------------------
_sqlmon_binds_xml_format  default   format of column binds_xml in [G]V$SQL_MONITOR
_sqlmon_max_plan          640       Maximum number of plans entry that can be monitored. Defaults to 20 per CPU
_sqlmon_max_planlines     300       Number of plan lines beyond which a plan cannot be monitored
_sqlmon_recycle_time      60        Minimum time (in s) to wait before a plan entry can be recycled
_sqlmon_threshold         5         CPU/IO time threshold before a statement is monitored. 0 is disabled

You may also notice a few other parameters in there. The "_sqlmon_recycle_time" shows the minimum time for which a SQLMon plan is guaranteed to be retained. Any retention beyond that is a bonus and depends upon the amount of SQL needing to be monitored. I see monitoring plans disappearing after 2-3 minutes on some systems, so you need to be quick, and you should save the plans down to disk.

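Something along these lines will spool the text report out before it gets recycled. A minimal SQL*Plus sketch using DBMS_SQLTUNE.REPORT_SQL_MONITOR (the Tuning Pack licence still applies); the SQL_ID and file location are placeholders:

set long 1000000 longchunksize 1000000 linesize 1000 pagesize 0 trimspool on
spool /tmp/sqlmon_&&sql_id..txt
select dbms_sqltune.report_sql_monitor(
         sql_id       => '&&sql_id',
         type         => 'TEXT',
         report_level => 'ALL')
  from dual;
spool off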

The mad thing is that I was aware of this restriction before I posted my request for help on Twitter, but I'd completely forgotten about it. So here's the blog post to help me remember!


Grid Infrastructure Disk Space Problem – CHM DB file: crfclust.bdb

The Grid Infrastructure filesystem was reporting that it was a bit full today (release 11.2.0.4). This was tracked down to the "crfclust.bdb" file, which records information about cluster health for monitoring purposes. It was 26GB. It's not supposed to get bigger than 1GB, so this is probably a bug, but let's explicitly resolve the size issue right now and search Oracle Support later. Worst case, the bdb (Berkeley Database) files get regenerated when the CHM (ora.crf) resource is restarted. You only lose the OS statistics that CHM has gathered; deleting the bdb files has no other impact, and CHM will simply start collecting OS statistics again.
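
As an aside, you don't have to go hunting for the files: oclumon will tell you where the CHM repository lives (the path will vary with your install):

oclumon manage -get reppath

CHM Repository Path = /u01/app/11g/grid/crf/db/node01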

 

df -h /u01

Filesystem                Size  Used Avail Use% Mounted on
/dev/sdc1                  48G   36G  9.0G  81% /u01

pwd
/u01/app/11g/grid/crf/db/node01

ls -lh
total 29G

-rw-r--r-- 1 root root 2.1M Jul 22 12:12 22-JUL-2014-12:12:03.txt
-rw-r--r-- 1 root root 1.3M Apr 23 14:28 23-APR-2014-14:28:04.txt
-rw-r--r-- 1 root root 1.2M Apr 23 14:33 23-APR-2014-14:33:34.txt
-rw-r--r-- 1 root root 1.3M Jul 23 12:53 23-JUL-2014-12:53:02.txt
-rw-r--r-- 1 root root 946K Apr 26 03:57 26-APR-2014-03:57:21.txt
-rw-r----- 1 root root 492M Aug 26 10:33 crfalert.bdb
-rw-r----- 1 root root  26G Aug 26 10:33 crfclust.bdb   <- 26G!
-rw-r----- 1 root root 8.0K Jul 23 12:52 crfconn.bdb
-rw-r----- 1 root root 521M Aug 26 10:33 crfcpu.bdb
-rw-r----- 1 root root 513M Aug 26 10:33 crfhosts.bdb
-rw-r----- 1 root root 645M Aug 26 10:33 crfloclts.bdb
-rw-r----- 1 root root 418M Aug 26 10:33 crfts.bdb
-rw-r----- 1 root root  24K Aug  1 16:07 __db.001
-rw-r----- 1 root root 392K Aug 26 10:33 __db.002
-rw-r----- 1 root root 2.6M Aug 26 10:33 __db.003
-rw-r----- 1 root root 2.1M Aug 26 10:34 __db.004
-rw-r----- 1 root root 1.2M Aug 26 10:33 __db.005
-rw-r----- 1 root root  56K Aug 26 10:34 __db.006
-rw-r----- 1 root root  16M Aug 26 10:17 log.0000008759
-rw-r----- 1 root root  16M Aug 26 10:33 log.0000008760
-rw-r----- 1 root root 8.0K Aug 26 10:33 repdhosts.bdb
-rw-r--r-- 1 root root 115M Jul 22 12:12 node01.ldb

Let's see how big the repository is…

oclumon manage -get repsize
CHM Repository Size = 1073736016

Wow, that seems a bit oversized. Change the repository retention to the desired number of seconds, between 3600 (1 hour) and 259200 (3 days):

oclumon manage -repos resize 259200

node01 --> retention check successful
node02 --> retention check successful

New retention is 259200 and will use 4524595200 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.

If we now check the size, we get an error as the repository is bigger than the max allowed size.

oclumon manage -get repsize
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service

So we need to stop and start the ora.crf service to get everything working again. It should be OK to do this on a running system with no impact, but I'd test it in your sandpit first. Don't take my word for it!

Check for process:

node01:/u01/app/11g/grid/bin>ps -ef |grep crf
root     26983     1  0 10:44 ?        00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01

Stop service:
node01:/u01/app/11g/grid/bin>crsctl stop res ora.crf -init

CRS-2673: Attempting to stop 'ora.crf' on 'node01'
CRS-2677: Stop of 'ora.crf' on 'node01' succeeded

Start Service:
node01:/u01/app/11g/grid/bin>crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'node01'
CRS-2676: Start of 'ora.crf' on 'node01' succeeded

Check for Process:
node01:/u01/app/11g/grid/bin>ps -ef  |grep crf
root     28000     1  5 10:49 ?        00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01

Check the size – as specified:
node01:/u01/app/11g/grid/bin>oclumon manage -get repsize

CHM Repository Size = 259200

Done

And the space is released and reclaimed.

node01:/u01/app/11g/grid/bin>df -h /u01

Filesystem                Size  Used Avail Use% Mounted on
/dev/sdc1                  48G  7.7G   38G  18% /u01

The space has been returned. Marvellous.
Now repeat the stop/start on each node.
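
If you have a lot of nodes, a small loop saves some typing. A sketch only, assuming passwordless ssh as root and the same grid home on every node; test it on one node first:

for node in node01 node02; do
  ssh root@${node} "/u01/app/11g/grid/bin/crsctl stop res ora.crf -init && \
                    /u01/app/11g/grid/bin/crsctl start res ora.crf -init"
done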

 

UPDATE: From Oracle Support: having very large bdb files (greater than 2GB) is likely due to a bug, since the default size limits the bdb to 1GB unless the CHM data retention time is increased. One such bug is 10165314.