When configuring a physical standby database for Oracle using Data Guard, you need to create Standby Redo logs to allow the redo to be applied in (near) real time to the Standby. Without standby redo logs, Oracle will wait for an entire Archive Log to be filled and copied across to the standby before it will apply changes, which could take quite a while.
Which leads me to the problem I encountered a while ago, and due to being forgetful, still encounter today when creating standby redo logs in Oracle 12C and 18C.
I have a Primary database with 3 Redo logs, and I have created a Physical Standby database with 3 Redo logs and then 4 Standby Redo logs created as follows:
alter database add standby logfile group 4 '/u01/.../redo_standby04.dbf' size 200M;
alter database add standby logfile group 5 '/u01/.../redo_standby05.dbf' size 200M;
alter database add standby logfile group 6 '/u01/.../redo_standby06.dbf' size 200M;
alter database add standby logfile group 7 '/u01/.../redo_standby07.dbf' size 200M;
And this is what they look like:
SQL> select group#,thread#,sequence#,bytes/1024/1024 MB,status from v$log order by group#;
GROUP# THREAD# SEQUENCE# MB STATUS
---------- ---------- ---------- ---------- ----------------
1 1 0 200 UNUSED
2 1 0 200 UNUSED
3 1 0 200 UNUSED
SQL> select group#,thread#,sequence#,bytes/1024/1024 MB,status from v$standby_log order by group#;
GROUP# THREAD# SEQUENCE# MB STATUS
---------- ---------- ---------- ---------- ----------
4 1 111 200 ACTIVE
5 0 0 200 UNASSIGNED
6 0 0 200 UNASSIGNED
7 0 0 200 UNASSIGNED
SQL> select * from v$logfile order by group#;
GROUP# STATUS TYPE MEMBER
---------- ------- ------- --------------------------------------------------------------------------------
1 ONLINE /u01/.../redo01.dbf
2 ONLINE /u01/.../redo02.dbf
3 ONLINE /u01/.../redo03.dbf
4 STANDBY /u01/.../redo_standby01.dbf
5 STANDBY /u01/.../redo_standby02.dbf
6 STANDBY /u01/.../redo_standby03.dbf
7 STANDBY /u01/.../redo_standby04.dbf
So what’s the problem?
Well, Dataguard works as expected for a while – applying change immediately – then it lags for ages and you get an OEM alert that it’s lagging. Then it works fine for a while again and then another lag.
The problem is that when I created the Standby Redo log files, I did not specify the Thread. But why do I need to specify the thread? This is not RAC. It does not have multiple threads for redo – one for each node. We have Thread 1 and that’s it.
When I used to do this in Oracle 11.2, the Standby Redo logs got Thread 1 by default. Always Thread 1.
From Oracle 12.1, and also in 12.2.0.1, 18C (12.2.0.2) and probably 19C (12.2.0.3), Oracle creates the Standby Redo logs with Thread 0 by default, although sometimes it creates them on Thread 1 (although this seems to be only in 12.1 but let me know if you know better). I don’t know why.
Standby Redo log MUST belong to the SAME Thread as the Online Redo logs for them to be used. Having (in this case) 1 x redo on Thread 1 and 3 x Redo on Thread 0, I only have 1 x Standby Redo usable for my online Redo.
The solution is stop the Apply, DROP the Standby Redo on Thread 0, and create new Standby Redo Logs explicity stating Thread 1, and restart the Apply.
dgmgrl> edit database orcl12dr set state='APPLY-OFF';
sql> alter database drop standby logfile group 4 '/u01/.../redo_standby04.dbf';
sql> alter database add standby logfile thread 1 group 4 '/u01/.../redo_standby04.dbf' size 200M;
dgmgrl> edit database orcl12dr set state='APPLY-ON';
And all is good with the world.
It is worth saying that MOS note 1595034.1 gets the corrective advice EXACTLY WRONG, with the incorrect snippet reproduced here:
"The standby redo logs can't be used, since they are assigned to thread 0.
Standby RedoLogs should not be assigned to a specific Thread.
Please use fallowing statements:
...
SQL> ALTER DATABASE DROP STANDBY LOGFILE GROUP <Group#>;
and create new SRLs with
SQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP <Group#> SIZE ;"
No MOS analyst! Standby Redo Logs MUST be assigned to a specific thread. Thread 0 Standby Logs are ignored!
And don’t forget, if you are using RAC you need to have Standby Redo Logs for EVERY thread accessible to EVERY instance, just like the Online Redo logs.
If you have single instance, ensure you specify Thread 1, and ensure you have 1 more Standby Redo log available than Online Redo log.
As a final note on Data Guard, from Oracle 12.1 you really should be using the Data Guard Broker to configure and manage Data Guard. It’s simple and it’s much better and safer than configuring Data Guard manually and has some nice validation and monitoring features too.
Addendum: For clarity, not specifying a Thread means that Oracle is supposed to auto-adjust the thread to match what thread is needed, using the Thread 0 as a pool so you presumably need less redo defined in RAC environment, but it just doesn’t work very well (or not at all) leading to the fast/slow/fast/slow lag symptoms. It’s also a pretty pointless saving as redo logs are small relative to the database size. If you have a LOT of nodes in the cluster meaning a lot of standby redo logs, and they aren’t all committing redo heavily (when they will all need all of their standby redo logs anyway) why do you have so many nodes? And validating the setup will report that you don’t have enough standby redo logs. Looks like a solution to something that isn’t a problem for anyone.
So this is an undocumented problem-introducing enhancement in Oracle 12 solving no practical real-world problem.
I’ll let you know if it is fixed in Oracle 19C.
Here’s some real-world output showing the apply lag with Thread 0 SRL’s, and how it disappears when Thread 1 SRL’s are explicitly created. Please note the units and date. I can’t provide the alert log output in this blog to prove my annotations.
SQL> select * from v$standby_event_histogram order by LAST_TIME_UPDATED;
NAME TIME UNIT COUNT LAST_TIME_UPDATED CON_ID
-------------------- ---------- ---------------- ---------- -------------------- ----------
apply lag 43 seconds 3 01/03/2019 17:22:38 0
apply lag 9 minutes 58 01/03/2019 17:30:54 0
apply lag 11 minutes 60 01/03/2019 17:32:55 0
apply lag 12 minutes 59 01/03/2019 17:33:54 0
apply lag 13 minutes 57 01/03/2019 17:34:55 0
apply lag 14 minutes 60 01/03/2019 17:35:55 0
apply lag 15 minutes 47 01/03/2019 17:36:42 0
apply lag 10 minutes 61 01/03/2019 17:36:44 0
-- First SRL Thread 1 created : 2019-01-03T17:39:43.890329+00:00
apply lag 27 seconds 9 01/03/2019 17:45:39 0
apply lag 30 seconds 5 01/03/2019 17:45:42 0
apply lag 31 seconds 7 01/03/2019 17:45:44 0
apply lag 32 seconds 6 01/03/2019 17:45:45 0
apply lag 33 seconds 10 01/03/2019 17:45:46 0
apply lag 34 seconds 8 01/03/2019 17:45:47 0
apply lag 35 seconds 9 01/03/2019 17:45:48 0
apply lag 36 seconds 7 01/03/2019 17:45:49 0
apply lag 37 seconds 5 01/03/2019 17:45:50 0
apply lag 38 seconds 8 01/03/2019 17:45:51 0
apply lag 39 seconds 7 01/03/2019 17:45:52 0
apply lag 40 seconds 3 01/03/2019 17:45:53 0
apply lag 41 seconds 3 01/03/2019 17:45:54 0
apply lag 42 seconds 4 01/03/2019 17:45:55 0
apply lag 44 seconds 4 01/03/2019 17:45:56 0
apply lag 45 seconds 7 01/03/2019 17:45:57 0
apply lag 46 seconds 5 01/03/2019 17:45:58 0
apply lag 47 seconds 7 01/03/2019 17:45:59 0
apply lag 48 seconds 4 01/03/2019 17:46:00 0
apply lag 49 seconds 8 01/03/2019 17:46:01 0
apply lag 50 seconds 5 01/03/2019 17:46:02 0
apply lag 51 seconds 6 01/03/2019 17:46:03 0
apply lag 52 seconds 8 01/03/2019 17:46:04 0
apply lag 53 seconds 7 01/03/2019 17:46:05 0
apply lag 54 seconds 6 01/03/2019 17:46:06 0
apply lag 55 seconds 10 01/03/2019 17:46:07 0
apply lag 56 seconds 11 01/03/2019 17:46:09 0
apply lag 57 seconds 8 01/03/2019 17:46:10 0
apply lag 58 seconds 6 01/03/2019 17:46:11 0
apply lag 59 seconds 7 01/03/2019 17:46:12 0
-- Final SRL Thread 1 created 2019-01-03T17:51:58.694308+00:00
apply lag 8 minutes 103 01/03/2019 17:52:55 0
apply lag 7 minutes 120 01/03/2019 17:52:56 0
apply lag 6 minutes 130 01/03/2019 17:52:58 0
apply lag 5 minutes 180 01/03/2019 17:53:00 0
apply lag 4 minutes 181 01/03/2019 17:53:03 0
apply lag 3 minutes 183 01/03/2019 17:53:06 0
apply lag 2 minutes 260 01/03/2019 17:53:11 0
-- All SRL Thread 0's deleted, then ALTER SYSTEM ARCHIVE LOG issued. 2019-01-03T18:19:33.757994+00:00
-- No more APPLY lag for 3 days (any lag reported below -of a few seconds- is due to other reasons)
apply lag 23 seconds 10 01/06/2019 18:14:59 0
apply lag 29 seconds 8 01/06/2019 18:15:03 0
apply lag 26 seconds 10 01/06/2019 18:15:09 0
apply lag 28 seconds 10 01/06/2019 18:15:11 0
apply lag 24 seconds 12 01/06/2019 18:15:12 0
apply lag 25 seconds 9 01/06/2019 18:15:14 0
apply lag 16 seconds 33 01/07/2019 11:02:37 0
apply lag 20 seconds 17 01/09/2019 11:01:37 0
apply lag 21 seconds 29 01/09/2019 11:01:40 0
apply lag 22 seconds 19 01/09/2019 11:01:42 0
apply lag 14 seconds 50 01/10/2019 00:08:19 0
apply lag 15 seconds 44 01/10/2019 00:08:25 0
apply lag 17 seconds 17 01/10/2019 00:08:27 0
apply lag 18 seconds 23 01/10/2019 00:08:28 0
apply lag 19 seconds 21 01/10/2019 00:08:30 0
apply lag 13 seconds 73 01/10/2019 00:30:51 0
apply lag 9 seconds 170 01/10/2019 00:33:11 0
apply lag 6 seconds 426 01/10/2019 01:34:29 0
apply lag 4 seconds 1304 01/10/2019 11:00:42 0
apply lag 7 seconds 294 01/10/2019 11:00:54 0
apply lag 8 seconds 222 01/10/2019 11:00:55 0
apply lag 10 seconds 117 01/10/2019 11:00:56 0
apply lag 11 seconds 109 01/10/2019 11:00:57 0
apply lag 12 seconds 90 01/10/2019 11:00:58 0
apply lag 5 seconds 726 01/10/2019 11:01:38 0
apply lag 3 seconds 2911 01/10/2019 11:12:46 0
apply lag 2 seconds 7866 01/10/2019 11:56:28 0
apply lag 1 seconds 58544 01/10/2019 11:59:15 0
apply lag 0 seconds 758505 01/10/2019 12:00:07 0