Data Guard Unexpected Lag

facepalmWhen configuring a physical standby database for Oracle using Data Guard, you need to create Standby Redo logs to allow the redo to be applied in (near) real time to the Standby. Without standby redo logs, Oracle will wait for an entire Archive Log to be filled and copied across to the standby before it will apply changes, which could take quite a while.

Which leads me to the problem I encountered a while ago, and due to being forgetful, still encounter today when creating standby redo logs in Oracle 12C and 18C.

I have a Primary database with 3 Redo logs, and I have created a Physical Standby database with 3 Redo logs and then 4 Standby Redo logs created as follows:

alter database add standby logfile group 4 '/u01/.../redo_standby04.dbf' size 200M;
alter database add standby logfile group 5 '/u01/.../redo_standby05.dbf' size 200M;
alter database add standby logfile group 6 '/u01/.../redo_standby06.dbf' size 200M;
alter database add standby logfile group 7 '/u01/.../redo_standby07.dbf' size 200M;

And this is what they look like:

SQL> select group#,thread#,sequence#,bytes/1024/1024 MB,status from v$log order by group#;
GROUP#     THREAD#    SEQUENCE#  MB         STATUS
---------- ---------- ---------- ---------- ----------------
1                   1          0        200 UNUSED
2                   1          0        200 UNUSED
3                   1          0        200 UNUSED

SQL> select group#,thread#,sequence#,bytes/1024/1024 MB,status from v$standby_log order by group#;
GROUP#     THREAD#    SEQUENCE#  MB         STATUS
---------- ---------- ---------- ---------- ----------
4                   1        111        200 ACTIVE
5                   0          0        200 UNASSIGNED
6                   0          0        200 UNASSIGNED
7                   0          0        200 UNASSIGNED

SQL> select * from v$logfile order by group#;
GROUP#     STATUS  TYPE    MEMBER 
---------- ------- ------- --------------------------------------------------------------------------------
         1         ONLINE  /u01/.../redo01.dbf
         2         ONLINE  /u01/.../redo02.dbf
         3         ONLINE  /u01/.../redo03.dbf
         4         STANDBY /u01/.../redo_standby01.dbf
         5         STANDBY /u01/.../redo_standby02.dbf
         6         STANDBY /u01/.../redo_standby03.dbf
         7         STANDBY /u01/.../redo_standby04.dbf

So what’s the problem?

Well, Dataguard works as expected for a while – applying change immediately – then it lags for ages and you get an OEM alert that it’s lagging. Then it works fine for a while again and then another lag.

The problem is that when I created the Standby Redo log files, I did not specify the Thread. But why do I need to specify the thread? This is not RAC. It does not have multiple threads for redo – one for each node. We have Thread 1 and that’s it.

When I used to do this in Oracle 11.2, the Standby Redo logs got Thread 1 by default. Always Thread 1.

From Oracle 12.1, and also in 12.2.0.1, 18C (12.2.0.2) and probably 19C (12.2.0.3), Oracle creates the Standby Redo logs with Thread 0 by default, although sometimes it creates them on Thread 1 (although this seems to be only in 12.1 but let me know if you know better). I don’t know why.

Standby Redo log MUST belong to the SAME Thread as the Online Redo logs for them to be used. Having (in this case) 1 x redo on Thread 1 and 3 x Redo on Thread 0, I only have 1 x Standby Redo usable for my online Redo.

The solution is stop the Apply, DROP the Standby Redo on Thread 0, and create new Standby Redo Logs explicity stating Thread 1, and restart the Apply.

dgmgrl> edit database orcl12dr set state='APPLY-OFF';

sql> alter database drop standby logfile          group 4 '/u01/.../redo_standby04.dbf';
sql> alter database add  standby logfile thread 1 group 4 '/u01/.../redo_standby04.dbf' size 200M;

dgmgrl> edit database orcl12dr set state='APPLY-ON';

And all is good with the world.

It is worth saying that MOS note 1595034.1 gets the corrective advice EXACTLY WRONG, with the incorrect snippet reproduced here:

"The standby redo logs can't be used, since they are assigned to thread 0. 
Standby RedoLogs should not be assigned to a specific Thread. 
Please use fallowing statements:
...
SQL> ALTER DATABASE DROP STANDBY LOGFILE GROUP <Group#>; 
and create new SRLs with
SQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP <Group#> SIZE ;"

No MOS analyst! Standby Redo Logs MUST be assigned to a specific thread. Thread 0 Standby Logs are ignored!

And don’t forget, if you are using RAC you need to have Standby Redo Logs for EVERY thread accessible to EVERY instance, just like the Online Redo logs.

If you have single instance, ensure you specify Thread 1, and ensure you have 1 more Standby Redo log available than Online Redo log.

As a final note on Data Guard, from Oracle 12.1 you really should be using the Data Guard Broker to configure and manage Data Guard. It’s simple and it’s much better and safer than configuring Data Guard manually and has some nice validation and monitoring features too.

Addendum: For clarity, not specifying a Thread means that Oracle is supposed to auto-adjust the thread to match what thread is needed, using the Thread 0 as a pool so you presumably need less redo defined in RAC environment, but it just doesn’t work very well (or not at all) leading to the fast/slow/fast/slow lag symptoms. It’s also a pretty pointless saving as redo logs are small relative to the database size. If you have a LOT of nodes in the cluster meaning a lot of standby redo logs, and they aren’t all committing redo heavily (when they will all need all of their standby redo logs anyway) why do you have so many nodes? And validating the setup will report that you don’t have enough standby redo logs. Looks like a solution to something that isn’t a problem for anyone.

So this is an undocumented problem-introducing enhancement in Oracle 12 solving no practical real-world problem.

I’ll let you know if it is fixed in Oracle 19C.

Here’s some real-world output showing the apply lag with Thread 0 SRL’s, and how it disappears when Thread 1 SRL’s are explicitly created. Please note the units and date. I can’t provide the alert log output in this blog to prove my annotations.

SQL> select * from v$standby_event_histogram order by LAST_TIME_UPDATED;

NAME                       TIME UNIT                  COUNT LAST_TIME_UPDATED        CON_ID
-------------------- ---------- ---------------- ---------- -------------------- ----------
apply lag                    43 seconds                   3 01/03/2019 17:22:38           0
apply lag                     9 minutes                  58 01/03/2019 17:30:54           0
apply lag                    11 minutes                  60 01/03/2019 17:32:55           0
apply lag                    12 minutes                  59 01/03/2019 17:33:54           0
apply lag                    13 minutes                  57 01/03/2019 17:34:55           0
apply lag                    14 minutes                  60 01/03/2019 17:35:55           0
apply lag                    15 minutes                  47 01/03/2019 17:36:42           0
apply lag                    10 minutes                  61 01/03/2019 17:36:44           0
-- First SRL Thread 1 created : 2019-01-03T17:39:43.890329+00:00
apply lag                    27 seconds                   9 01/03/2019 17:45:39           0
apply lag                    30 seconds                   5 01/03/2019 17:45:42           0
apply lag                    31 seconds                   7 01/03/2019 17:45:44           0
apply lag                    32 seconds                   6 01/03/2019 17:45:45           0
apply lag                    33 seconds                  10 01/03/2019 17:45:46           0
apply lag                    34 seconds                   8 01/03/2019 17:45:47           0
apply lag                    35 seconds                   9 01/03/2019 17:45:48           0
apply lag                    36 seconds                   7 01/03/2019 17:45:49           0
apply lag                    37 seconds                   5 01/03/2019 17:45:50           0
apply lag                    38 seconds                   8 01/03/2019 17:45:51           0
apply lag                    39 seconds                   7 01/03/2019 17:45:52           0
apply lag                    40 seconds                   3 01/03/2019 17:45:53           0
apply lag                    41 seconds                   3 01/03/2019 17:45:54           0
apply lag                    42 seconds                   4 01/03/2019 17:45:55           0
apply lag                    44 seconds                   4 01/03/2019 17:45:56           0
apply lag                    45 seconds                   7 01/03/2019 17:45:57           0
apply lag                    46 seconds                   5 01/03/2019 17:45:58           0
apply lag                    47 seconds                   7 01/03/2019 17:45:59           0
apply lag                    48 seconds                   4 01/03/2019 17:46:00           0
apply lag                    49 seconds                   8 01/03/2019 17:46:01           0
apply lag                    50 seconds                   5 01/03/2019 17:46:02           0
apply lag                    51 seconds                   6 01/03/2019 17:46:03           0
apply lag                    52 seconds                   8 01/03/2019 17:46:04           0
apply lag                    53 seconds                   7 01/03/2019 17:46:05           0
apply lag                    54 seconds                   6 01/03/2019 17:46:06           0
apply lag                    55 seconds                  10 01/03/2019 17:46:07           0
apply lag                    56 seconds                  11 01/03/2019 17:46:09           0
apply lag                    57 seconds                   8 01/03/2019 17:46:10           0
apply lag                    58 seconds                   6 01/03/2019 17:46:11           0
apply lag                    59 seconds                   7 01/03/2019 17:46:12           0
-- Final SRL Thread 1 created 2019-01-03T17:51:58.694308+00:00
apply lag                     8 minutes                 103 01/03/2019 17:52:55           0
apply lag                     7 minutes                 120 01/03/2019 17:52:56           0
apply lag                     6 minutes                 130 01/03/2019 17:52:58           0
apply lag                     5 minutes                 180 01/03/2019 17:53:00           0
apply lag                     4 minutes                 181 01/03/2019 17:53:03           0
apply lag                     3 minutes                 183 01/03/2019 17:53:06           0
apply lag                     2 minutes                 260 01/03/2019 17:53:11           0
-- All SRL Thread 0's deleted, then ALTER SYSTEM ARCHIVE LOG issued. 2019-01-03T18:19:33.757994+00:00
-- No more APPLY lag for 3 days (any lag reported below -of a few seconds- is due to other reasons)
apply lag                    23 seconds                  10 01/06/2019 18:14:59           0
apply lag                    29 seconds                   8 01/06/2019 18:15:03           0
apply lag                    26 seconds                  10 01/06/2019 18:15:09           0
apply lag                    28 seconds                  10 01/06/2019 18:15:11           0
apply lag                    24 seconds                  12 01/06/2019 18:15:12           0
apply lag                    25 seconds                   9 01/06/2019 18:15:14           0
apply lag                    16 seconds                  33 01/07/2019 11:02:37           0
apply lag                    20 seconds                  17 01/09/2019 11:01:37           0
apply lag                    21 seconds                  29 01/09/2019 11:01:40           0
apply lag                    22 seconds                  19 01/09/2019 11:01:42           0
apply lag                    14 seconds                  50 01/10/2019 00:08:19           0
apply lag                    15 seconds                  44 01/10/2019 00:08:25           0
apply lag                    17 seconds                  17 01/10/2019 00:08:27           0
apply lag                    18 seconds                  23 01/10/2019 00:08:28           0
apply lag                    19 seconds                  21 01/10/2019 00:08:30           0
apply lag                    13 seconds                  73 01/10/2019 00:30:51           0
apply lag                     9 seconds                 170 01/10/2019 00:33:11           0
apply lag                     6 seconds                 426 01/10/2019 01:34:29           0
apply lag                     4 seconds                1304 01/10/2019 11:00:42           0
apply lag                     7 seconds                 294 01/10/2019 11:00:54           0
apply lag                     8 seconds                 222 01/10/2019 11:00:55           0
apply lag                    10 seconds                 117 01/10/2019 11:00:56           0
apply lag                    11 seconds                 109 01/10/2019 11:00:57           0
apply lag                    12 seconds                  90 01/10/2019 11:00:58           0
apply lag                     5 seconds                 726 01/10/2019 11:01:38           0
apply lag                     3 seconds                2911 01/10/2019 11:12:46           0
apply lag                     2 seconds                7866 01/10/2019 11:56:28           0
apply lag                     1 seconds               58544 01/10/2019 11:59:15           0
apply lag                     0 seconds              758505 01/10/2019 12:00:07           0

Off to Poland…

This week I will be speaking at the Polish Oracle User Group annual conference in Sopot, Poland. It looks like a beautiful location for what is a really fun conference. As well as being surrounded by some of the most knowledgeable and excellent technology speakers in the world, there’s also plenty of fun and dancing taking place.

If I am extremely unlucky, I’ll end up swimming in the Baltic Sea. If you are unlucky, you’ll see me swimming in the Baltic Sea. Where I am from in the England there is an expression “It’s Baltic”, which means it is very cold outside. Very cold. So it is because it’s cold, OK. 🙂logo

I hope I’ll see you in Poland – fantastic place to visit with lovely people.

And don’t forget the parties and the dancing. Oh yeah!

And don’t believe all you hear about the vodka and axes, it’s much worse than that!

 

 

Which Oracle Release are you using?

Post Date: August 2018!

Recently an awesome Oracle Guru friend of mine questioned someone who was installing 11.2.0.4 with the word “seriously”, which is think shows that Oracle staff sometimes don’t live in the same technological world as the rest of business.

My response was:

11.2.0.4 is normal. In the real world:

– large corps mostly use old versions
– consultants look at current versions
– Oracle staff look at unreleased versions

I have known instances of Oracle staff blogging about how a feature works when, in the officially released versions, it didn’t. It only worked that way in a version which was released some months later. There was no reference to the release and the fact that there was a significant functional change between releases (but I suppose that’s a blog and not “official” documentation – the official documentation said nothing at all about how that particular feature worked. Nothing! So thank you mystery blogger.)

Anyway, the point of this post was I then did a small twitter poll to my most excellent and cosy band of followers to see what Oracle releases people were using. I asked 2 questions (because twitter is limited) and here’s the results:

oracle_version_highest

So more people have some form of 12 in the DB, but only 7% have 18 in Production. This at a time when most Oracle staff are thinking about Oracle 20 and 21, as Oracle 19 is done and just awaiting release. Think about that, Oracle… Whenever I am at a presentation by an Oracle PM, I think “wow – I might be able to use those new features in 2-5 years”.

oracle_version_lowest

So very few people have 12.x as their lowest version (which would include 18 as that’s really 12.2.0.2) and MORE have 9, 8 or 7 as their major headache! Yes – there are more on 9, 8 and 7 than are using 18 in Production. Lets say that once more. There are more on 9, 8 and 7 than are using 18 in Production

So why upgrade? Very few databases take advantage of all of the latest sexy features. I suspect that many of the applications still being produced could run on Oracle 7.3.4. – more so as the proliferation of ORM’s like Hibernate has left a generation of developers with little appreciation of the database and how to take advantage of it**. So why upgrade? These days? Security. Patches. Support. Without those 3 things, you are living on hope, hope that nothing goes wrong as you’ll struggle to find anyone to fix it – including Oracle. Hoping that nobody tries to hack your 8.1.7 database as it’s a Swiss Cheese of vulnerabilities, like all 7, 8, 8i, 9i, 10G DB’s. Not that we hear about systems being compromised every day on the news.

Anecdote #1: By coincidence I was talking to a client at about the same time and whilst they are a mostly 12.1 shop, they still had an old 8i database hanging around… as usual it was going to  be “retired soon” (which in my experience means sometime in the next 15-20 years) and wasn’t worth the time and effort to be upgraded or even do a business case to upgrade it!

**Anecdote #2: At a client a few years ago, an excellent Java Developer asked me to put an index on a flag column. I pointed out that with only 3 values that an index wouldn’t help, and as this was OLTP a bitmap index wasn’t appropriate due to concurrency issues. He said that with 3 values indexed, his query would be 3 times faster! We sat down and I explained some database fundamentals to him, at which point he said “don’t put an index on there – that would be a stupid idea”. A few weeks later he came back over and asked about SQL queries “I’m trying to aggregate this data – can the database help?”. I spent 30 minutes showing him in-line views and windowing analytic functions and we wrote the code he needed for his output. “Wow! You have just saved me 3 days of Java coding…” – he was going to pull everything into Java and process it there, so as well as 3 days of coding, we also saved the SAN, the network and a whole bunch of CPU by dealing with data at the database layer – which is always the most efficient place to deal with it!

Running RAC in a Virtual Environment

I was in a discussion at an Oracle Meetup this week, and the subject of RAC in a virtualized environment – specifically Oracle Virtual Machine (OVM) – came up.
Here’s a couple of points which were discussed.

pingtarget

There was a lack of awareness of a common problem, which has a solution built-in to Oracle 12.1.0.2 Grid Infrastructure and later. In a virtualized environment, the network components are also virtualized. Sometimes, network failures on the host may not be raised up to the guests. As a result, the guest O/S commands can fail to detect the network failures and the Virtual NIC remains up. Grid Infrastructure (GI) will not perform a VIP fail-over as it can’t see the failure despite the network being unavailable.

To resolve this, Oracle has added an option of a “pingtarget” for each public network defined in GI. This will perform a keep-alive to a external device, usually something like the default gateway. This is just like the heartbeat on the cluster interconnect.

Before

srvctl config network

Network 1 exists

Subnet IPv4: 192.168.0.160/255.255.255.224/eth1, static
Subnet IPv6:
Ping Targets:
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

The default gateway makes a good ping target. For this IP and subnet, it’s 192.168.0.161

srvctl modify network -k 1 -pingtarget 192.168.0.161

After

srvctl config network

Network 1 exists

Subnet IPv4: 192.168.0.160/255.255.255.224/eth1, static
Subnet IPv6:
Ping Targets: 192.168.0.161
Network is enabled
Network is individually enabled on nodes:
Network is individually disabled on nodes:

All safe!

Server Pools

A second item we discussed was the Server Pools in OVM. Each RAC guest should be on a different host, otherwise you have not eliminated that as a Single Point Of Failure. A second less obvious SPOF is the Server Pool disk.

A Server Pool is a filesystem LUN (and IP address prior to release 3.4) used to group a logical collection of servers with similar CPU models, within which we can create and migrate VM guests. For a RAC installation, each RAC node should be within a different server pool, as well as on different physical hardware.

ovs

In this image, RAC nodes within the same cluster should be created within each server pool. This configuration can safely support a 2 node cluster despite having 4 servers, with one node created in “OVS-Pool-2” on server “ovs02“. The second node should be in “OVS-Pool-1″ and can be on “ovs01“, “ovs11” or “ovs12“.

It is possible to live migrate guests between these 3 servers.

 

 

 

 

 

UKOUG Member Advocate: sorted!

I received an email from the UKOUG last week.

winners-picture

 

It turns out that I have won the vote to become the next Member Advocate for the UKOUG. I will (eventually) be joined on the board by Martin Widlake – who won the vote for the position of President Elect – a position whereby Martin will shadow the current UKOUG President before taking office on the Board. He has written about it here.

I would very much like to thank everyone who voted for me. I hope I can live up to your expectations. If you did not vote for me, that’s OK – I’ll represent you and your interests just the same as everybody else! Looks like I’m going to have to spend time getting up to speed with the Apps community, and re-acquainting myself with my old JDE friends.

Here’s to moving the User Group forward to meet the challenges of the changing face of our industry. See you all soon, and Cheers Everyone!

(sorry – would have posted this sooner but I was at the OUG Ireland User Group giving a couple of presentations and taking part in a Q&A panel about “18c and everything else!” – fantastic conference and lovely people there!)

 

 

UKOUG M.A. Voting Hots Up!

It’s getting hotter!

After many years volunteering at the UKOUG I am standing to be the Member Advocate at the UKOUG, representing the members on the board.

Do you think I will represent the membership well? Yes. VOTE FOR ME!

Oh, you must be the main UKOUG membership contact, or know who is, to vote.

giphy

One twist in the election is that Steve Davis has withdrawn from the vote due to business commitments, leaving a straightforward race between myself and Andy Nellis. That’s a shame as I think Steve would have made a good Member Advocate had he been voted in. Andy is unknown to me so I cannot comment on his suitability as a UKOUG volunteer or board member.

I’d really appreciate your vote – I know that many of you have voted for me already, but I want ALL of you to give me a strong mandate to improve the user group.

VOTE HERE

 

 

I want your vote!

Are you a member of the UKOUG? Read on…

ukoug_logo

I’m currently standing for the role of Member Advocate. But why?

I’ve been volunteering at the UKOUG for over 10 years, and speaking at the conferences and SIGs for the last 5 years or so.  I’ve been deputy chairing or chairing a SIG for all of this time and I’m currently the RAC SIG Chairperson. I’m also involved with our main conference, having helped on the organizing committees for several years, I organized the database stream for Tech16 and the entire content for Tech17. I enjoy giving back to the community and I think that by joining the board to serve as Member Advocate I will be able to further enhance the user group and the members experience. Your experience.

 

 

Feel free to tweet me, message me or comment if you have any questions!

Oh, and while you’re there, can you also vote for Martin Widlake to be President Elect. Nobody is running against him but the membership do need to confirm they are happy with his appointment. I’ve already voted for him!

ukoug-vote

twit1twit2twit3


OK, here’s a few testimonials from Twitter…I hope you will join the other UKOUG members in voting for me and Martin!twit4twit5

%d bloggers like this: