ORA-04021 timeout occurred while waiting to lock object during stats gather

I recently came across an interesting change in behaviour when gathering stats. Patch 32781163 (Deadlock on library cache lock on MV refresh and dbms_stats gathering the same object) was applied to an Oracle 19C system, and as well as curing the deadlock problem the client was experiencing it also changed one of the Global Stats Preferences.

Before patch application:

SELECT dbms_stats.get_param('WAIT_TIME_TO_UPDATE_STATS') as WAIT_TIME_TO_UPDATE_STATS FROM dual;

WAIT_TIME_TO_UPDATE_STATS
-------------------------
15

[apply patch]

SELECT dbms_stats.get_param('WAIT_TIME_TO_UPDATE_STATS') as WAIT_TIME_TO_UPDATE_STATS FROM dual;

WAIT_TIME_TO_UPDATE_STATS
-------------------------
0.0033

This value for this preference is specified in minutes (weird. why minutes?), so the patch reduces the default wait time for lock during a stats gather from 15 minutes to 0.0033 minutes (or 0.198 seconds)

On a heavily loaded system, where you’re gathering stats whilst processing is taking place, you’re going to hit the timeout and receive an ORA-04021 instead of the Deadlock. An improvement but not ideal. I would not normally blog about this sort of change but it’s a change to a default and…

If you have a heavily utilised system, chances are you’re going to hit this error.

To resolve the problem, I changed the timeout from 0.0033 minutes to 1 minute and everything started working again.

exec dbms_stats.set_global_prefs('WAIT_TIME_TO_UPDATE_STATS','1');

Your setting will depend upon how your system works, but changing the default from 15 to 0.0033 minutes is too low for the default in my opinion.

Hopefully Oracle will raise this to something more sensible in the future. I’m not a fan of changing defaults but there are cases when it should happen, like changing Global Stats Prefs for TABLE_CACHED_BLOCKS from 1 to 16 to make the index clustering factor more realistic. This would make the costing of table access from indexes a more sensible. I’ve been asking for this for years but they never change it.

Come on Oracle – you can make a change to the defaults, as you’ve just demonstrated in a PATCH! How about the one I’ve been asking for?

SELECT dbms_stats.get_param('TABLE_CACHED_BLOCKS') from dual;

DBMS_STATS.GET_PARAM('TABLE_CACHED_BLOCKS')
-------------------------------------------
1

exec dbms_stats.set_global_prefs('TABLE_CACHED_BLOCKS','16');

It has been indicated to me that the ORA-04021 error does not appear as readily in 23ai as it does in 19C + patch (all of my testing has been on 19C). I have not had the time to investigate yet, but this suggests a different code path in the optimizer, changing the behaviour of the wait. I’ll add some more here if I find out how the behaviour has changed between releases – did someone say it’s time to trace?

The timeout is not the entirety of the mechanism involved. The mechanism is a loop involving a retry/sleep/wait on lock – so wait time is only part of the equation and the overall wait time longer than the timeout specified. This is not documented, so an enhancement to the documentation explaining how the wait time factors into the overall mechanism would be useful if someone wants to comment!

Please bear in mind that this is just a timeout setting. It will not fix any underlying issues that you may have, unless the underlying issue is only that you have a very busy system. For example there are bugs which may cause excessive lock durations which may have an ORA-4021 as a symptom (e.g. 37668732 – ORA-44023 Errors in ADG can show up as 4021’s). Changing the timeout may simply give you longer to wait before it still breaks.

So, after more testing and speaking to Oracle (thank you for your insights, Nigel), the formula for the wait time would appear to be something like:

loop 100 times
attempt to get lock and wait for WAIT_TIME_TO_UPDATE_STATS
sleep**
increase sleep duration
end loop


**the sleep starts small and increases gradually through each retry, totalling 1,000 seconds by the final retry.

Therefore the total formula is something like:
(WAIT_TIME_TO_UPDATE_STATS * 60for seconds * 100) + 1000

The DEFAULT for WAIT_TIME_TO_UPDATE_STATS gives:
(0.0033 * 60 * 100) + 1000 = 1020 seconds, or 17 minutes

Increasing WAIT_TIME_TO_UPDATE_STATS to 1 minute gives
(1 * 60 * 100) + 1000 = 7000 seconds, or 1 hour 56 minutes.

You cannot reduce the overall WAIT_TIME_TO_UPDATE_STATS below about 17 minutes.

Worth bearing in mind that whilst the code is sleeping, you could release the lock and another process acquire it. One a very busy system with a lot of slow hard parsing (this is a library cache lock, not a table lock), you could have a long wait

I have requested that Oracle updates the documentation to reflect this formula.

[note: 19C and 23ai formulas seem to have some differences, but given it works completely differently to how I originally thought, I’m not stressing over them]

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.