Oracle Statistics Gathering Timeout

January 2024 – Oracle 12C, 19C, 21C, 23C+

Gathering object statistics in Oracle is important. The optimizer needs metadata about each object, such as the number of rows and the number of distinct values in a column, to help it decide the optimal way to access your data. This takes effort, and it happens automatically in scheduled windows, both overnight and at the weekend. These windows have a limited duration, and if there is a lot of data to analyse to get the requisite metadata, the gather may run out of time.


Large tables, with billions of rows, tend to be partitioned. Oracle invented INCREMENTAL statistics gathering to minimise the time spent gathering statistics by ignoring partitions which have not changed.

If you have a large partitioned table, it is frequently a good idea to switch on INCREMENTAL statistics. However, the actual behaviour is not always obvious when you do this, and the following message formed part of an online help request: “the manual gather(which is incremental stats) on those partition table runs longer as it tries to gather on all the partitions which are not even modified. How should we tackle this scenario“.

To switch on INCREMENTAL stats gathering:

EXEC dbms_stats.set_table_prefs('<schema>','<table>','INCREMENTAL','TRUE')
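
You can verify that the preference has taken effect with DBMS_STATS.GET_PREFS (a quick sanity check, using the same placeholders as above):

select dbms_stats.get_prefs('INCREMENTAL','<schema>','<table>') from dual;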

After the first stats gather, the gather job should only gather stats for partitions which have changed, which minimises the gather time. Global statistics are calculated from the statistics associated with each partition (not subpartition), in conjunction with stored SYNOPSES which allow the global number of distinct values per column to be derived without re-reading unchanged partitions.

Check the stats gathering regularly

Stats gathering fails silently. You need to check it yourself, regularly! To do this, query DBA_AUTOTASK_JOB_HISTORY for rows where the JOB_STATUS is “STOPPED”. If you see the JOB_INFO message “Stop job called because associated window was closed”, you should investigate, as the stats gathering ran out of time. Usually the autotask will catch up at the weekend, when the default window is much longer than the 4-hour weekday window, but you need to understand whether old stats are acceptable to your database (some will be, some may not). Importantly, the weekend window may fail too, and then you’re not getting new stats at all.
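
A starting point for that check might look like this (a sketch; the client name shown is the one the stats autotask uses):

select job_start_time, job_status, job_info
  from dba_autotask_job_history
 where client_name = 'auto optimizer stats collection'
   and job_status  = 'STOPPED'
 order by job_start_time desc;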

Two strategies for speeding up the stats gather are to gather stats in parallel, or to gather stats concurrently. Both of these will increase the CPU required by the database to gather the stats, so this needs to be balanced against the workload during the stats gather windows to ensure you are not negatively impacting the application.

Check out the options:

exec DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT','AUTOMATIC')
(gathers several tables at the same time using multiple jobs)

exec DBMS_STATS.SET_TABLE_PREFS('<schema>', '<table>', 'DEGREE', n)
(where n is the degree of parallel required, or DBMS_STATS.AUTO_DEGREE)


WARNING: If you are doing ANY PARALLEL processing, you really should use Resource Manager to control it.

Unexpected Gotchas

Here are a few gotchas you should know about if using INCREMENTAL stats gathering:

Problem
With INCREMENTAL enabled, partitions which are not labelled as “stale” may still be re-gathered: the modification of a single row in a partition will cause the stats to be re-gathered for that partition. You cannot rely on the ALL_TAB_STATISTICS.STALE_STATS status, which will still show “NO” under these circumstances.

Solution
Tell the incremental stats gather to use the table’s “stale percent” (default 10%, modifiable with the table preference ‘STALE_PERCENT’) before it treats a partition as stale, rather than reacting to *any* change:

exec dbms_stats.set_table_prefs('<schema>','<table>','INCREMENTAL_STALENESS','USE_STALE_PERCENT')
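
To see which partitions the dictionary currently flags as stale (bearing in mind the caveat above), a sketch against DBA_TAB_STATISTICS; flushing the monitoring info first ensures recent DML is counted:

exec dbms_stats.flush_database_monitoring_info

select partition_name, num_rows, stale_stats, last_analyzed
  from dba_tab_statistics
 where owner = '<SCHEMA>' and table_name = '<TABLE>'
   and object_type = 'PARTITION'
 order by partition_name;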


Problem
If the stats on a partition are locked, and a single row in that partition is modified, INCREMENTAL is effectively (and silently) disabled: Oracle can’t trust the locked stats for a global aggregation, so you revert to normal “full table” stats gathering.

Solution
Tell the stats gather to use the (potentially stale) stats on a locked partition, regardless of the amount of DML, as if it were not locked (and combine this with the previous solution):

exec dbms_stats.set_table_prefs('<schema>','<table>','INCREMENTAL_STALENESS','USE_STALE_PERCENT,USE_LOCKED_STATS')
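
To find partitions whose statistics are locked (and could therefore silently disable INCREMENTAL without the preference above), a sketch:

select partition_name, stattype_locked, last_analyzed
  from dba_tab_statistics
 where owner = '<SCHEMA>' and table_name = '<TABLE>'
   and object_type = 'PARTITION'
   and stattype_locked is not null;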


Problem
If a SQL statement uses a previously unused predicate (a column in the WHERE clause) and METHOD_OPT is left at its default (FOR ALL COLUMNS SIZE AUTO), Oracle may create a new histogram automatically.

Solution
Check which predicates are candidates for a histogram (in SYS.COL_USAGE$) and explicitly control your histograms. This does mean that you don’t get new histograms automatically, which might not be your intention, but you will only get the histograms that you ask for (see the column-usage query sketch after the example below).

begin
  dbms_stats.set_table_prefs('<schema>','<table>','METHOD_OPT',
    'FOR ALL COLUMNS SIZE 1
     FOR COLUMNS SIZE 256  colA,colQ
     FOR COLUMNS SIZE 2000 colB,colZ');
end;
/
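
To see which columns the optimizer has recorded predicate usage for (a formatted view of the SYS.COL_USAGE$ data), DBMS_STATS.REPORT_COL_USAGE can help; a sketch:

set long 100000 pagesize 0
select dbms_stats.report_col_usage('<schema>','<table>') from dual;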


Problem
There are jobs gathering stats in addition to the auto task, and those jobs are passing in parameters which cause the issue (rather than relying on table or global “prefs” for consistency).

Solution
Consider ignoring any command-line parameters passed to stats gather jobs. That way, every stats gather will use only the preferences and will be consistent across each gather:

exec dbms_stats.set_global_prefs('PREFERENCE_OVERRIDES_PARAMETER','TRUE');
or
exec dbms_stats.set_table_prefs('<schema>','<table>','PREFERENCE_OVERRIDES_PARAMETER','TRUE');


To validate what is going on, you can always (carefully!) trace the stats gather, either at a table level or globally using the UNDOCUMENTED preference “trace”. Be aware that you may get a lot of trace file output and a potential performance hit on the stats gather job too.

exec dbms_stats.set_global_prefs('trace',65532);
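
Remember to switch the tracing off again (setting the preference back to 0, its default) once you have what you need:

exec dbms_stats.set_global_prefs('trace',0);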

Finally, when you switch on INCREMENTAL statistics, Oracle needs to store additional SYNOPSES data to allow it to calculate the number of distinct values in a column more easily. This can take a lot of space in SYSAUX, although this has been significantly improved from Oracle 12.2 onwards, with the synopsis tables occupying around 3-5% of their previous footprint. For more information about SYNOPSES, Oracle Support maintains a note with a query: 1953961.1, and for more information about INCREMENTAL stats in general, the Oracle Optimizer PM Nigel Bayliss has some excellent blog posts from 2020 here
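
If you want a rough idea of how much SYSAUX space the synopses are taking, a sketch against DBA_SEGMENTS (the synopsis data lives in the SYS.WRI$_OPTSTAT_SYNOPSIS% tables):

select segment_name, segment_type, round(sum(bytes)/1024/1024) mb
  from dba_segments
 where tablespace_name = 'SYSAUX'
   and segment_name like 'WRI$_OPTSTAT_SYNOPSIS%'
 group by segment_name, segment_type
 order by mb desc;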

Oracle 19C New Feature Availability

Since Oracle Open World 2018, Oracle have been trumpeting a few cool new features in the Oracle 19C database, the headline two for administrators being “Automatic Indexing” and “Real-Time Statistics“.

With the release of Oracle 19.2 on Exadata (on-premises – not yet on Cloud!) this week, we also got the documentation, which allows us to answer a very important question: on which platforms will we be able to use these two (and other) awesome new features? We may not like the answers – Oracle has decreed that we cannot have them on-premises for SE2, Enterprise Edition or on an ODA.

This is confined purely to Exadata on-prem and the cloud offerings. That’s it. There are no technical reasons why this should be the case – it’s just code – so the restrictions can only be marketing-based (like Hybrid Columnar Compression).

I’m disappointed, but it does mean that the over 80% of Oracle database clients still not living in the cloud should continue to need that aspect of my services!

Oracle Factoring

[Image: Oracle licensing factoring]

Oracle SQL Monitor not monitoring my SQL

I needed to monitor a SQL statement in 11.2.0.3 (the limits mentioned below are the same in 12.1, 12.2, 18.4 and 19C) to determine what it was doing and why it was slow.

Usually I would use SQL Monitor [NOTE: You need to license the Oracle Tuning Pack to use SQL Monitor] for this, but the SQL was not appearing in there despite running for over 5 seconds and being a parallel SQL (either of which should qualify it for inclusion in SQL Monitor). So I asked Twitter why, and thought I’d share the output here.

https://twitter.com/ChandlerDBA/status/1075692070952677376

It was nailed immediately by Jonathan Lewis, with added help from Ivica Arsov. (thank you!)

There is a hidden parameter “_sqlmon_max_planlines” which states that any SQL with a plan in excess of 300 lines should not be monitored (see below for SQLMon hidden parameters – and change them at your own risk, preferably with the backing of an SR from Oracle Support). This execution plan had well over 300 lines. The solution is to change either the session or the system to allow monitoring to happen when the plan is over 300 lines.

e.g.

alter system  set "_sqlmon_max_planlines"=500 scope=memory sid='*';
or
alter session set "_sqlmon_max_planlines"=500;

The negative side effect is that the monitoring will use more resources (primarily memory and CPU), which is why there are default limits on this feature. You might want to change it back when you’re finished, to conserve resources.
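
Once the limit has been raised, you can confirm the statement is now being captured (a quick check, assuming you know the SQL_ID):

select sql_id, status, sql_exec_id, sql_exec_start
  from v$sql_monitor
 where sql_id = '&sql_id'
 order by sql_exec_start desc;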

Note that if you change the system parameter whilst the SQL is running, it will start to monitor the SQL at that point, so you will only get a partial picture of what is taking place, which is less valuable.

select ksppinm, ksppstvl, ksppdesc
  from sys.x$ksppi a, sys.x$ksppsv b
 where a.indx=b.indx
  and lower(ksppinm) like lower('%sqlmon%')
order by ksppinm;

KSPPINM                   KSPPSTVL  KSPPDESC
------------------------- --------- --------------------------------------------------------------------------------
_sqlmon_binds_xml_format  default   format of column binds_xml in [G]V$SQL_MONITOR
_sqlmon_max_plan          640       Maximum number of plans entry that can be monitored. Defaults to 20 per CPU
_sqlmon_max_planlines     300       Number of plan lines beyond which a plan cannot be monitored
_sqlmon_recycle_time      60        Minimum time (in s) to wait before a plan entry can be recycled
_sqlmon_threshold         5         CPU/IO time threshold before a statement is monitored. 0 is disabled

You may also notice a few other parameters in there. The “_sqlmon_recycle_time” shows the minimum time that the SQLMon plan will be guaranteed to be retained. Any retention beyond that is a bonus and depends upon the amount of SQL needing to be monitored. I see monitoring plans disappearing after 2-3 minutes on some systems, so you need to be quick, and you should save the plans down to disk.
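
One way to save a report down is to spool the output of DBMS_SQLTUNE.REPORT_SQL_MONITOR to a file; a sketch (substitute your own SQL_ID, and remember this needs the Tuning Pack licence):

set long 10000000 longchunksize 10000000 pagesize 0 linesize 32767 trimspool on
spool sqlmon_report.html
select dbms_sqltune.report_sql_monitor(sql_id => '&sql_id', type => 'ACTIVE', report_level => 'ALL') from dual;
spool off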


The mad thing is that I was aware of this restriction before I posted my request for help on Twitter, but I’d completely forgotten about it. So here’s the blog post to help me remember!

Online Index Rebuild Problem in 12C/18C/19C

When building an index in Oracle Enterprise edition using the keyword “ONLINE”, if the index build fails for whatever reason a messy set of extents can get left behind.

In Oracle 12.1 [update: still there in 12.2, 18C and 19.6] there’s a job (located in DBA_SCHEDULER_JOBS) to help with this: “SYS”.”CLEANUP_ONLINE_IND_BUILD”, which ticks away, undocumented, in the background, cleaning up “rogue” extents once an hour, every hour (start time based upon job creation time). However, if I run a CREATE INDEX … ONLINE command on a table and the index creation happens to coincide with the invocation of the cleanup job, I get the following error in my alert log:

ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl1:
*************************************************************************
2017-05-12 05:28:37.191123 -04:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j000_12235.trc:
ORA-12012: error on auto execute of job "SYS"."CLEANUP_ONLINE_IND_BUILD"
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
ORA-06512: at "SYS.DBMS_PDB", line 76
ORA-06512: at line 4

Oh no! The job has failed to clean up the extents that I was currently using. Phew!

The CREATE INDEX … ONLINE command completed with success a few minutes later.

A few minutes after that, an update statement failed with:

ORA-08102 index key not found for obj# 107102 file 16 block 1171234

Yes, I have a corrupt index! My assumption is that the cleanup job had removed all of the extents which had been allocated before the cleanup job started. The index create statement did not notice, as the cleanup job’s attempt to remove the extent currently in use blocked (hence the job error), and extents created after that point were not affected. However, most of the index had been “cleaned up”. #sadface

Solution: re-create the index and make sure the index creation is complete before your hourly job kicks in. Better still, disable the undocumented hourly job… but make sure you get the full backing of an Oracle Support SR first!
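
If, with Oracle Support’s blessing, you do decide to disable the job, a sketch:

exec dbms_scheduler.disable('SYS.CLEANUP_ONLINE_IND_BUILD');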

Next Step: Raise SR with Oracle about this!

Version Check 12/11/2018:

select banner_full from v$version

BANNER_FULL
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Oracle Database 18c EE High Perf Release 18.0.0.0.0 - Production
Version 18.1.0.0.0


21:17:19 SYS @ ORCL > select owner,job_name,start_date,repeat_interval,state from dba_scheduler_jobs where job_name = 'CLEANUP_ONLINE_IND_BUILD';

OWNER JOB_NAME START_DATE REPEAT_INTERVAL STATE
------------------------------ ------------------------------ ----------------------------------- ------------------------------ ---------------
SYS CLEANUP_ONLINE_IND_BUILD 07-FEB-18 08.13.44.415343 PM +00:00 FREQ = HOURLY; INTERVAL = 1 SCHEDULED


MOS Solution Note 2280374.1 is not helpful:

Solution

Rebuild the index at a time other than the time frame when “SYS”.”CLEANUP_ONLINE_IND_BUILD” is running as auto job. #facepalm #whatifittakesover1hourtobuild