The 10046 trace. Largely useless, isn’t it?

The other night I was sat in the pub with some like-minded individuals discussing the relative merits of the 10046 trace (we Rock! in the pub, dudes!) and somebody asked me how often I has actually used it in anger? A well-respected DBA / Architect maintained it was a pretty useless and difficult option to use, given the topology of modern applications (e.g. How do you find the correct session with all that connection pooling going on from multiple web servers.)

My answer surprised me – I thought back to one client where I spent 90% of my time performance tuning a large (TiB-ish) OLTP/Batch hybrid system and concluded that I had ran a 10046 against production about once a year. Once. So if the 10046 is the holy grail of plan information, why wasn’t I using it that much. And why did I never use a 10053 against Production there?

The answer for me is a little more complex than that given in the pub:

1. as stated above, it’s hard to catch the in-flight session unless the application is instrumented to inject the trace statement when needed (and how many applications are instrumented to help you discover problems? Screen ST03 in SAP is very helpful. Any others in major ERP’s? Thought not.)

2. In many places that I have worked, getting authorisation to make any a change to a 24×7 mission-critical system is highly bureaucratic, involving cast-iron justification for the change and it’s positive benefits, requirement that there will be no adverse effects because of the change, very senior sign-off, more red-tape, etc. This causes a significant amount of work simply to put a trace on, even if you can catch the SQL. This can end up being more work than actually fixing the problem.

3. An awful lot of SQL tuning is a fairly blunt affair, as the developer (who is frequently database-blind) has usually missed something obvious. It is frequently to do with incorrectly using or not using an index (or using a poor index), or lack of filtering data at the right point to minimise the I/O.

4. Most importantly, if you have AWR and ASH, it’s not really needed. For each plan created by the optimizer the database stores the bind variables along with it, so we can usually understand why the optimizer makes the decisions it makes. ASH contains the main event waits. Why bother trying to capture all of the detail in a trace when you really don’t need that much detail, and it’s all already there; ready to be extracted from the relevant tables (e.g. dba_hist_active_sess_history, dba_hist_sql_plan and dba_hist_sql_bind.)

I have never used a 10053 trace on a Production system. I have simply never needed to know the decisions taken by the optimizer in that much detail. Like most DBA’s and Oracle consultants, I don’t go from site-to-site on a weekly basis resolving edge-case problems that the incumbent DBA’s haven’t had the time, or possibly don’t have the skills, to resolve themselves. I usually don’t need that level of confirmation that I’m right about why the plan is wrong, and I don’t have the time to conclusively prove it over and over again – I just need to get the fix into place and move onto the next problem.

That said, perhaps I should get fully to the bottom of these problems to ensure that they never occur again – which is the fundamental problem with Adaptive Cursor Sharing.

So how big is that disk?

I’m doing a fair bit of SAN-based work at the moment, migrating a bunch of Oracle databases from EMC to HP (posts to follow regarding Orion testing and other related topics). One thing that annoys me is the way SAN Manufacturers have changed the meaning of technical terms over time to suit themselves. Firstly, they changed the meaning of RAID from “Redundant Array of Inexpensive Disk” to “Redundant Array of Independent Disk”. Apparently it’s hard to put a $1,000,000 price tag on “inexpensive” disk.

Also, the industry changed what a Megabyte is. This really annoys me as it can lead to unexpected shortfalls in the LUN allocations if your traditional Oracle Megabyte is different to the SAN Admin’s Megabyte.

In our industry, the terms “kilo”, “mega”, “giga“, “tera”, “peta”, and “exa” are commonly used prefixes for computing performance and capacity. SAN manufacturers use the terms defined in “powers of ten” measurement units:

  • A kilobyte (KB) is equal to 1,000 (103) bytes.
  • A megabyte (MB) is equal to 1,000,000 (106) bytes.
  • A gigabyte (GB) is equal to 1,000,000,000 (109) bytes.
  • A terabyte (TB) is equal to 1,000,000,000,000 (1012) bytes.
  • A petabyte (PB) is equal to 1,000,000,000,000,000 (1015) bytes
  • An exabyte (EB) is equal to 1,000,000,000,000,000,000 (1018) bytes

Most operating systems components and (importantly for us) Oracle use “powers of two” measurement units rather than “power of ten” units. They are defined as:

  • A kibibyte (KiB) is equal to 1,024 (210) bytes.
  • A mebibyte (MiB) is equal to 1,048,576 (220) bytes.
  • A gibibyte (GiB) is equal to 1,073,741,824 (230) bytes.
  • A tebibyte (TiB) is equal to 1,099,511,627,776 (240) bytes.
  • A pebibyte (PiB) is equal to 1,125,899,906,842,624 (250) bytes.
  • An exbibyte (EiB) is equal to 1,152,921,504,606,846,976 (260) bytes.

That means that 1 terabyte is about 9% smaller than a tebibyte. Thanks, SAN manufacturers, for making our lives just a tiny bit more difficult than it needed to be so you can market drives as being seemingly larger than they really are (in my head anyway.)