Exadata System Statistics
11/02/2021 1 Comment
Following on from last weeks Oracle Optimizer System Statistics post, I though it worthwhile adding a note about gathering system statistics in Exadata mode.
exec dbms_stats.gather_system_statistics('EXADATA');
So what is this, and why would you chose to do it? First of all, these are not workload system statistics (which I believe you should never gather without extraordinary proof of their effectiveness – all Oracle official documenation and MOS notes about workload stats should point to this blog post now, explaining why: https://blogs.oracle.com/optimizer/should-you-gather-system-statistics). Workload stats measure a number of I/O and CPU metrics. Exadata stats are a little bit of measurement (IO Transfer Speed – IOTFRSPEED) and the hard-setting of the Multi-block read count (MBRC). How are these set?
The IOTFRSPEED is the rate at which an Oracle database can read data in a single read request. It is measured and set (subject to bug 17501565.8 not getting in the way – Pre 12.1- and leaving it to default). The default is 4096, but after measuring and setting it will increase significantly. For the worked example below, lets say the speed was measures at 200,000 (about 200MB, which is a good number for an exadata)
The MBRC is set by copying in the db_file_multiblock_read_count in from the initialization parameters. By default this is not set explicitly and relies upon 2 hidden parameters. If your block size is 8k, it will probably be 128, and if your block size is 16k it will probably be 64. If your block size is any other size, I hope you have a good tested and proven reason for your edge-case configuration.
Each of these setting will change the balance the optimizer will use between single block reads (to read index blocks, and to read table rows by rowid), and performing full table scans. Higher MBRC and IOTFRSPEED’s mean table scans become more attractive to the optimizer. This means they will occur more frequently. What does an Exadata do very well? Offload table scans to storage cells.
DEFAULT, 8K Block Size MBRC=8 (db_file_multblock_read is not set, so use _db_file_optimizer_read_count) IOTFRSPEED=4096 Calculated Ratio 0.271, or 3.69 multiblocks for every single block.
DEFAULT, 16K Block Size MBRC=8 (db_file_multblock_read is not set, so use _db_file_optimizer_read_count) IOTFRSPEED=4096 Calculated Ratio 0.375, or 2.67 multiblocks for every single block. 16k block sizes mean indexes are slightly more likely to be used as we get less blocks per scan.
DEFAULT, 8K Block Size, db_file_multblock_read is set explicitly MBRC=128 (db_file_multblock_read is explicitly set, so use explicit 128) IOTFRSPEED=4096 Calculated Ratio 0.173, or 5.77 multiblocks for every single block. Tablescans are a little more attractive than default. You get 5.77 blocks for every multiblock read instead of 3.69.
EXADATA GATHERED, 8K Block Size, db_file_multblock_read NOT set explicitly MBRC=128 (db_file_multblock_read is not set, so set it to the value of _db_file_exec_read_count) IOTFRSPEED=200000 - this was measured and set by the 'exadata' gather. Calculated Ratio 0.012, or 84.32 multiblocks for every single block. For an index to be used it is going to have to be much more efficient than table scan. Pulling a few rows out of unique indexes will continue to work well but any kind of predicate which returns a set of data will be pushed towards a FTS (or IFFS), and therefore the exadata will offload to the storage cells.
So how does this affect optimizer costing? Looking at a 10053 trace to see how the cost is changed:
I/O COST WITH DEFAULTS: 270,835
[10053] SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for COST_CHECK[COST_CHECK]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
Table: COST_CHECK Alias: COST_CHECK
Card: Original: 1000000.000000 Rounded: 1000000 Computed: 1000000.000000 Non Adjusted:1000000.000000
Scan IO Cost (Disk) = 270835.000000
...
I/O COST EXADATA GATHERED AS ABOVE: 11,860
[10053] SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for COST_CHECK[COST_CHECK]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
Table: COST_CHECK Alias: COST_CHECK
Card: Original: 1000000.000000 Rounded: 1000000 Computed: 1000000.000000 Non Adjusted:1000000.000000
Scan IO Cost (Disk) = 11860.000000
...
So, the I/O cost with “Exadata” system stats us reduces from 270,835 to 11,860. That’s quite a reduction.
Please bear in mind that this is only the Optimizer Cost calculation of a table scan, and not the reality of accessing the data. This is used to decide the access path.
By gathering ‘Exadata’ stats, you need to ensure you have the capacity on your exadata to offload the workload increase on the storage cells. For a classic “overnight load, daytime query, dedicated Exadata”, this is probably a good thing. Where the storage cells are already being worked hard, this will make things worse.
If you followed the advice of the Exadata salesman and marketing people and already removed your indexes, you should already be employing a performance expert to put (some of) the indexes back on…
How did I calculate the ratio’s above? Check the Oracle Optimizer System Statistics blog post from last week….