Which Oracle Release are you using?

Post Date: August 2018!

Recently an awesome Oracle Guru friend of mine questioned someone who was installing 11.2.0.4 with the word “seriously”, which is think shows that Oracle staff sometimes don’t live in the same technological world as the rest of business.

My response was:

11.2.0.4 is normal. In the real world:

– large corps mostly use old versions
– consultants look at current versions
– Oracle staff look at unreleased versions

I have known instances of Oracle staff blogging about how a feature works when, in the officially released versions, it didn’t. It only worked that way in a version which was released some months later. There was no reference to the release and the fact that there was a significant functional change between releases (but I suppose that’s a blog and not “official” documentation – the official documentation said nothing at all about how that particular feature worked. Nothing! So thank you mystery blogger.)

Anyway, the point of this post was I then did a small twitter poll to my most excellent and cosy band of followers to see what Oracle releases people were using. I asked 2 questions (because twitter is limited) and here’s the results:

oracle_version_highest

So more people have some form of 12 in the DB, but only 7% have 18 in Production. This at a time when most Oracle staff are thinking about Oracle 20 and 21, as Oracle 19 is done and just awaiting release. Think about that, Oracle… Whenever I am at a presentation by an Oracle PM, I think “wow – I might be able to use those new features in 2-5 years”.

oracle_version_lowest

So very few people have 12.x as their lowest version (which would include 18 as that’s really 12.2.0.2) and MORE have 9, 8 or 7 as their major headache! Yes – there are more on 9, 8 and 7 than are using 18 in Production. Lets say that once more. There are more on 9, 8 and 7 than are using 18 in Production

So why upgrade? Very few databases take advantage of all of the latest sexy features. I suspect that many of the applications still being produced could run on Oracle 7.3.4. – more so as the proliferation of ORM’s like Hibernate has left a generation of developers with little appreciation of the database and how to take advantage of it**. So why upgrade? These days? Security. Patches. Support. Without those 3 things, you are living on hope, hope that nothing goes wrong as you’ll struggle to find anyone to fix it – including Oracle. Hoping that nobody tries to hack your 8.1.7 database as it’s a Swiss Cheese of vulnerabilities, like all 7, 8, 8i, 9i, 10G DB’s. Not that we hear about systems being compromised every day on the news.

Anecdote #1: By coincidence I was talking to a client at about the same time and whilst they are a mostly 12.1 shop, they still had an old 8i database hanging around… as usual it was going to  be “retired soon” (which in my experience means sometime in the next 15-20 years) and wasn’t worth the time and effort to be upgraded or even do a business case to upgrade it!

**Anecdote #2: At a client a few years ago, an excellent Java Developer asked me to put an index on a flag column. I pointed out that with only 3 values that an index wouldn’t help, and as this was OLTP a bitmap index wasn’t appropriate due to concurrency issues. He said that with 3 values indexed, his query would be 3 times faster! We sat down and I explained some database fundamentals to him, at which point he said “don’t put an index on there – that would be a stupid idea”. A few weeks later he came back over and asked about SQL queries “I’m trying to aggregate this data – can the database help?”. I spent 30 minutes showing him in-line views and windowing analytic functions and we wrote the code he needed for his output. “Wow! You have just saved me 3 days of Java coding…” – he was going to pull everything into Java and process it there, so as well as 3 days of coding, we also saved the SAN, the network and a whole bunch of CPU by dealing with data at the database layer – which is always the most efficient place to deal with it!

Adding a DEFAULT column in 12C

I was at a talk recently, and there was an update by Jason Arneil about adding columns to tables with DEFAULT values in Oracle 12C. The NOT NULL restriction has been lifted and now Oracle cleverly intercepts the null value and replaces it with the DEFAULT meta-data without storing it in the table. To repeat the 11G experiment I ran recently:

 

SQL> alter table ncha.tab1 add (filler_default char(1000) default 'EXPAND' not null);
Table altered.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len 
      from user_tables where table_name = 'TAB1';
TABLE_NAME NUM_ROWS       BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000       1504          0        2017


In both releases we then issue:
SQL> alter table ncha.tab1 modify (filler_default null);
Table altered.


IN 11G
SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
      from user_tables where table_name = 'TAB1';

TABLE_NAME NUM_ROWS       BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000       3394          0        2017

BUT IN 12C
SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
      from user_tables where table_name = 'TAB1';
TABLE_NAME NUM_ROWS       BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000       1504          0        2017

So, as we can see, making the column NULLABLE in 12C didn’t cause it to go through and update every row in the way it must in 11G. It’s still a chained-row update accident waiting to happen, but its a more flexible accident 🙂

However, I think it’s worth pointing out that you only get “free data storage” when you add the column. When inserting a record, simply having a column with a DEFAULT value means that the DEFAULT gets physically stored with the record if it is not specified. The meta-data effect is ONLY for subsequently added columns with DEFAULT values.

SQL> create table ncha.tab1 (pk number, c2 timestamp, filler char(1000), filler2 char(1000) DEFAULT 'FILLER2' NOT NULL) pctfree 1;
Table created.

SQL> alter table ncha.tab1 add constraint tab1_pk primary key (pk);
Table altered.

Insert 10,000 rows into the table, but not into FILLER2 with the DEFAULT
SQL> insert into ncha.tab1 (pk, c2, filler) select rownum id, sysdate, 'A' from dual connect by level <= 10000;
commit;
Commit complete.

Gather some stats and have a look after loading the table. Check for chained rows at the same time.
SQL> exec dbms_stats.gather_table_stats('NCHA','TAB1',null,100);
PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
     from user_tables where table_name = 'TAB1';

TABLE_NAME   NUM_ROWS	  BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1		10000	    3394	  0	   2017

For a bit of fun, I thought I would see just how weird the stats might look if I played around with adding defaults

SQL> drop table ncha.tab1;
Table dropped.

SQL> create table ncha.tab1 (pk number) pctfree 1;
Table created.

SQL> alter table ncha.tab1 add constraint tab1_pk primary key (pk);
Table altered.

Insert 10,000 rows into the table

SQL> insert into ncha.tab1 (pk) select rownum id from dual connect by level <= 10000;
commit;
Commit complete.

Gather some stats and have a look after loading the table. Check for chained rows at the same time.
SQL> exec dbms_stats.gather_table_stats('NCHA','TAB1',null,100);

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
  2    from user_tables
  3   where table_name = 'TAB1';

TABLE_NAME   NUM_ROWS	  BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1		10000	      20	  0	      4

Now lets add a lot of defaults
SQL> alter table ncha.tab1 add (filler_1 char(2000) default 'F1' not null, filler_2 char(2000) default 'F2' null, filler_3 char(2000) default 'F3', filler_4 char(2000) default 'how big?' null );
Table altered.

Gather some stats and have a look after adding the column. Check for chained rows at the same time.
SQL> exec dbms_stats.gather_table_stats('NCHA','TAB1',null,100);

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
  2    from user_tables
  3   where table_name = 'TAB1';

TABLE_NAME   NUM_ROWS	  BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1		10000	      20	  0	   8008

10,000 rows with an AVG_ROW_LEN of 8008, all in 20 blocks. Magic!

Just to finish off, lets update each DEFAULT column so the table expands….

SQL> select filler_1, filler_2, filler_3, filler_4,count(*) from ncha.tab1 group by filler_1,filler_2,filler_3,filler_4;

FILLER_1   FILLER_2   FILLER_3	 FILLER_4     COUNT(*)
---------- ---------- ---------- ---------- ----------
F1	   F2	      F3	 how big?	 10000

So it's all there. The metadata is intercepting the nulls and converting them to the default on the fly, rather than storing them in the blocks.
So what happens if we actually UPDATE the table?

SQL> update ncha.tab1 set filler_1 = 'EXPAND', filler_2 = 'EXPAND', filler_3='EXPAND', filler_4='THIS BIG!';
10000 rows updated.

SQL> select filler_1, filler_2, filler_3, filler_4,count(*) from ncha.tab1 group by filler_1,filler_2,filler_3,filler_4;

FILLER_1   FILLER_2   FILLER_3	 FILLER_4     COUNT(*)
---------- ---------- ---------- ---------- ----------
EXPAND	   EXPAND     EXPAND	 THIS BIG!	 10000

Gather some stats and have a look after the update, checking for chained rows at the same time.
SQL> exec dbms_stats.gather_table_stats('NCHA','TAB1',null,100);

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
     from user_tables where table_name = 'TAB1';

TABLE_NAME   NUM_ROWS	  BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1		10000	   19277	  0	   8010

SQL> 
SQL> analyze table tab1 list chained rows into chained_rows;

Table analyzed.

SQL> select count(*) CHAINED_ROWS from chained_rows;

CHAINED_ROWS
------------
       10000

Yep. That’s bigger.

Grid Infrastructure Disk Space Problem – CHM DB file: crfclust.bdb

The Grid Infrastructure filesystem was reporting that it was a bit full today (release 11.2.0.4). This was tracked down to the “crfclust.bdb” file, which records information about the cluster health for monitoring purposes. It was 26GB. It’s not supposed to get bigger than 1GB so this is probably a bug, but let’s explicitly resolve the size issue right now and search Oracle support later. Worst case, bdb (Berkerley Database) files get regenerated when CHM (ora.crf) resource is restarted.  You only lose the (OS) statistics that CHM has gathered. Deleting bdb files does not have other impact.  CHM will start collecting the OS statistics again.

 

df –h /u01

Filesystem                Size  Used Avail Use% Mounted on
/dev/sdc1                  48G   36G  9.0G  81% /u01

pwd
/u01/app/11g/grid/crf/db/node01

ls -lh
total 29G

-rw-r–r– 1 root root 2.1M Jul 22 12:12 22-JUL-2014-12:12:03.txt
-rw-r–r– 1 root root 1.3M Apr 23 14:28 23-APR-2014-14:28:04.txt
-rw-r–r– 1 root root 1.2M Apr 23 14:33 23-APR-2014-14:33:34.txt
-rw-r–r– 1 root root 1.3M Jul 23 12:53 23-JUL-2014-12:53:02.txt
-rw-r–r– 1 root root 946K Apr 26 03:57 26-APR-2014-03:57:21.txt
-rw-r—– 1 root root 492M Aug 26 10:33 crfalert.bdb
-rw-r—– 1 root root  26G Aug 26 10:33 crfclust.bdb   <-26G!
-rw-r—– 1 root root 8.0K Jul 23 12:52 crfconn.bdb
-rw-r—– 1 root root 521M Aug 26 10:33 crfcpu.bdb
-rw-r—– 1 root root 513M Aug 26 10:33 crfhosts.bdb
-rw-r—– 1 root root 645M Aug 26 10:33 crfloclts.bdb
-rw-r—– 1 root root 418M Aug 26 10:33 crfts.bdb
-rw-r—– 1 root root  24K Aug  1 16:07 __db.001
-rw-r—– 1 root root 392K Aug 26 10:33 __db.002
-rw-r—– 1 root root 2.6M Aug 26 10:33 __db.003
-rw-r—– 1 root root 2.1M Aug 26 10:34 __db.004
-rw-r—– 1 root root 1.2M Aug 26 10:33 __db.005
-rw-r—– 1 root root  56K Aug 26 10:34 __db.006
-rw-r—– 1 root root  16M Aug 26 10:17 log.0000008759
-rw-r—– 1 root root  16M Aug 26 10:33 log.0000008760
-rw-r—– 1 root root 8.0K Aug 26 10:33 repdhosts.bdb
-rw-r–r– 1 root root 115M Jul 22 12:12 node01.ldb

Lets see how big the repository is…

oclumon manage -get repsize
CHM Repository Size = 1073736016

Wow.  Seems a bit oversized. Change the repository size to the desired number of seconds, between 3600 (1 hour) and 259200 (3 days)

oclumon manage -repos resize 259200

node01 –> retention check successful
node02 –> retention check successful

New retention is 259200 and will use 4524595200 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.

If we now check the size, we get an error as the repository is bigger than the max allowed size.

oclumon manage -get resize
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service

So we need to stop and start the ora.crf service to get everything working again. It should be OK to do this on a running system with no impact, but I’d start with your sandpit to test it. Don’t take my word for it!

Check for process:

node01:/u01/app/11g/grid/bin>ps -ef |grep crf
root     26983     1  0 10:44 ?        00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01

Stop service:
node01:/u01/app/11g/grid/bin>crsctl stop res ora.crf -init

CRS-2673: Attempting to stop ‘ora.crf’ on ‘node01’
CRS-2677: Stop of ‘ora.crf’ on ‘node01’ succeeded

Start Service:
node01:/u01/app/11g/grid/bin>crsctl start res ora.crf -init
CRS-2672: Attempting to start ‘ora.crf’ on ‘node01’
CRS-2676: Start of ‘ora.crf’ on ‘node01’ succeeded

Check for Process:
node01:/u01/app/11g/grid/bin>ps -ef  |grep crf
root     28000     1  5 10:49 ?        00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01

Check the size – as specified:
node01:/u01/app/11g/grid/bin>oclumon manage -get repsize

CHM Repository Size = 259200

Done

And the space is released and reclaimed.

node01:/u01/app/11g/grid/bin>df –h /u01

Filesystem                Size  Used Avail Use% Mounted on
/dev/sdc1                  48G  7.7G   38G  18% /u01

The space has been returned. Marvellous.
Now repeat the stop/start on each node.

 

UPDATE: From Oracle Support: Having very large bdb files (greater than 2GB) is likely due to a bug since the default size limits the bdb to 1GB unless the CHM data retention time is increased.  One such bug is 10165314.

RMAN Incarnations revisited (11G)

Time for an update to a older post. I have previously talked about the annoyance of connecting to RMAN with a duplicated database where the DBID has not been changed. RMAN happily breaks the catalog by assuming the “new” database is a new incarnation, and prevents the previous owner of the catalog from using the backups.

I wrote a blog post a while ago about hacking your way past this problem, but was recently informed by Martin Bach that there was actually an RMAN command to fix the Incarnation problem I had encountered, so I though I had better take a look and see if it worked!

Well, the first thing I noticed was that Oracle 11G does not break when connecting from a different database with the same DBID the way it did in Oracle 10G:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:26:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1  2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43

RMAN> list backup summary;

List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- ----------------- ------- ------- ---------- ---
331 B A A DISK 26/01/13 08:55:32 1 1 NO BACKUP1
375 B A A DISK 26/01/13 09:06:44 1 1 NO BACKUP2
400 B A A DISK 26/01/13 09:07:02 1 1 NO BACKUP3
587 B F A DISK 26/01/13 11:20:09 1 1 YES FULL BACKUP
609 B F A DISK 26/01/13 11:20:11 1 1 NO TAG20130126T112010

And on the alternate database:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:15:07 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL2 1229390655 PARENT 1 13/08/09 23:00:48
1  2 ORCL2 1229390655 CURRENT 754488 30/10/09 11:38:43

RMAN> list backup summary;

specification does not match any backup in the repository

RMAN>

Whilst the incarnations look a little incorrect (referring to ORCL2), the system does not break. So, no more need to hack around with incarnations if the system breaks accidentally. However, what if you register the other database…

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 05:44:06 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database
RMAN> register database;
starting full resync of recovery catalog
full resync complete
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of register command on default channel at 01/27/2013 05:44:12
RMAN-20002: target database already registered in recovery catalog

So, after a little effort it would appear I can’t easily break the incarnations in Oracle 11G. So let’s try. I recovered the ORCL1 database to create a new incarnation to see how ORCL2 would behave when connected:

on ORCL1:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:32:09 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
 connected to recovery catalog database


RMAN> list incarnation;
 List of Database Incarnations
 DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
 ------- ------- -------- ---------------- --- ---------- ----------
 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43
 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12 <- new incarnation

<BR>

And now ORCL2 behaves a little differently, recognising the ORCL1 incarnations correctly, and throwing an error:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:19:27 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
 connected to recovery catalog database

RMAN> list incarnation;
 List of Database Incarnations
 DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
 ------- ------- -------- ---------------- --- ---------- ----------
 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43
 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12

RMAN> list backup summary;
RMAN-00571: ===========================================================
 RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
 RMAN-00571: ===========================================================
 RMAN-03002: failure of list command at 01/27/2013 12:19:38
 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20004: target database name does not match name in recovery catalog

So, what if I change the name of ORCL2 back to ORCL1. Can I reproduce my error then?

[oracle@localhost dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:23:29 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to:
 Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown immediate;
 Database closed.
 Database dismounted.
 ORACLE instance shut down.

SQL> startup mount;
 ORACLE instance started.
Total System Global Area 456146944 bytes
 Fixed Size 1344840 bytes
 Variable Size 381684408 bytes
 Database Buffers 67108864 bytes
 Redo Buffers 6008832 bytes
 Database mounted.

 SQL> exit
 Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options

[oracle@localhost dbs]$ nid target=system/oracle dbname=orcl1 setname=yes
DBNEWID: Release 11.2.0.2.0 - Production on Sun Jan 27 12:24:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Connected to database ORCL2 (DBID=1229390655)
Connected to server version 11.2.0
Control Files in database:
 /home/oracle/app/oracle/oradata/orcl/control01.ctl
 /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl
Change database name of database ORCL2 to ORCL1? (Y/[N]) => Y
Proceeding with operation
 Changing database name from ORCL2 to ORCL1
 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - modified
 Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - modified
 Datafile /home/oracle/app/oracle/oradata/orcl/system01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/sysaux01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/undotbs01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/users01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/example01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1246426611663638.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1265209995679366.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/temp01.db - wrote new name
 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - wrote new name
 Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - wrote new name
 Instance shut down
Database name changed to ORCL1.
 Modify parameter file and generate a new password file before restarting.
 Succesfully changed database name.
 DBNEWID - Completed succesfully.
[note: I have already got the relevant init.ora and oratab setup]
[oracle@localhost dbs]$ . oraenv
 ORACLE_SID = [orcl2] ? orcl1
 The Oracle base has been set to /home/oracle/app/oracle

 [oracle@localhost dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:24:31 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount;
 ORACLE instance started.
Total System Global Area 456146944 bytes
 Fixed Size 1344840 bytes
 Variable Size 381684408 bytes
 Database Buffers 67108864 bytes
 Redo Buffers 6008832 bytes
 Database mounted.
SQL> alter database open;
Database altered.

SQL >exit

[oracle@localhost dbs]$ rman target system/oracle catalog rman/rman@orcl1

Recovery Manager: Release 11.2.0.2.0 – Production on Sun Jan 27 12:52:45 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;

database reset to incarnation 2
starting full resync of recovery catalog

List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1 2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43
1 921 ORCL1 1229390655 ORPHAN 10215936 27/01/13 12:27:12

So, the newly rename-to ORCL1 thinks we are at incarnation 2. However, log back into the original ORCL1 and it resets the incarnation back to 921. Still no corruption, still no problem!

So, I still can’t prove whether the ALTER DATABASE SET INCARNATION command will work as mentioned to me, or whether it’s just something that allows me to recover across a resetlogs command. Looks like I’ll have to reinstall Oracle 10G… tomorrow.

Exposing the Oracle Alert Log to SQL

I’ve been spending some time working in Apex recently, building a small app to draw together the monitoring of application and infrastructure components into a single easy-to-visualise tool. As part of that, I wanted to be able to read and report on the alert log. Traditionally, that would have meant creating an external table to point to the alert log and reading it that way, with lots of string manipulation and regular expressions to try to pull out useful bits of information. However, Oracle 11G had made that a lot easier. Step forward X$DBGALERTEXT. This is the decoded version of the XML version of the Alert log, and as such provides lots of lovely columns to filter by, rather than a single line of text to decode. Particularly useful (for me) is the MESSAGE_LEVEL. Is this line of text informational (16), or critical (1), or something in between? Of course, each “normal” line of text is still available in the MESSAGE_TEXT column.

SQL> desc x$dbgalertext;
 Name                           Type
 ------------------------------ --------------------------------------------------------
 ADDR                           RAW(4)
 INDX                           NUMBER
 INST_ID                        NUMBER
 ORIGINATING_TIMESTAMP          TIMESTAMP(3) WITH TIME ZONE
 NORMALIZED_TIMESTAMP           TIMESTAMP(3) WITH TIME ZONE
 ORGANIZATION_ID                VARCHAR2(64)
 COMPONENT_ID                   VARCHAR2(64)
 HOST_ID                        VARCHAR2(64)
 HOST_ADDRESS                   VARCHAR2(46)
 MESSAGE_TYPE                   NUMBER
 MESSAGE_LEVEL                  NUMBER
 MESSAGE_ID                     VARCHAR2(64)
 MESSAGE_GROUP                  VARCHAR2(64)
 CLIENT_ID                      VARCHAR2(64)
 MODULE_ID                      VARCHAR2(64)
 PROCESS_ID                     VARCHAR2(32)
 THREAD_ID                      VARCHAR2(64)
 USER_ID                        VARCHAR2(64)
 INSTANCE_ID                    VARCHAR2(64)
 DETAILED_LOCATION              VARCHAR2(160)
 PROBLEM_KEY                    VARCHAR2(64)
 UPSTREAM_COMP_ID               VARCHAR2(100)
 DOWNSTREAM_COMP_ID             VARCHAR2(100)
 EXECUTION_CONTEXT_ID           VARCHAR2(100)
 EXECUTION_CONTEXT_SEQUENCE     NUMBER
 ERROR_INSTANCE_ID              NUMBER
 ERROR_INSTANCE_SEQUENCE        NUMBER
 VERSION                        NUMBER
 MESSAGE_TEXT                   VARCHAR2(2048)
 MESSAGE_ARGUMENTS              VARCHAR2(128)
 SUPPLEMENTAL_ATTRIBUTES        VARCHAR2(128)
 SUPPLEMENTAL_DETAILS           VARCHAR2(128)
 PARTITION                      NUMBER
 RECORD_ID                      NUMBER

Very handy. Just add your own view, synonym and permissions to read the view, and you’re away…

create view v_$alert_log as select * from x$dbgalertext;
create public synonym v$alert_log for sys.v_$alert_log;
grant select on v$alert_log to whomever...

  1* select message_text from v$alert_log where ...;

MESSAGE_TEXT
-----------------------------------------------------------------
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 2
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.

Using parameter settings in server-side pfile /home/oracle/app/oracle/product/11.2.0/dbhome_2/dbs/initorcl.ora
System parameters with non-default values:
  processes                = 200
  sessions                 = 322
  sga_max_size             = 2G
  pre_page_sga             = TRUE
  nls_language             = "ENGLISH"
  nls_territory            = "UNITED KINGDOM"
  filesystemio_options     = "SetAll"
  sga_target               = 2G
  control_files            = "/u02/oradata/orcl/control01.ctl"
.
[snip]
.
  aq_tm_processes          = 1
  diagnostic_dest          = "/u20/apps/oracle"
PMON started with pid=2, OS id=2492
PSP0 started with pid=3, OS id=2494
VKTM started with pid=4, OS id=2512 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
GEN0 started with pid=5, OS id=2520
DIAG started with pid=6, OS id=2522
...etc...

Problems with RMAN and incarnations.

One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX  HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       PROD     1099918981       CURRENT 446075     04-APR-10

RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> exit

Recovery Manager complete.

[oracle]$ export ORACLE_SID=TEST
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: TEST (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

new incarnation of database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       TEST     1099918981       PARENT  446075     04-APR-10
1       1789    TEST     1099918981       CURRENT 3023938    27-FEB-11

RMAN> exit

Recovery Manager complete.

And lets see what happens when we go into RMAN for Production

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of list command at 02/27/2011 21:03:04
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

RMAN> list incarnation;
RMAN> exit

Recovery Manager complete.

So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.

EDIT! Before trying the catalog mod below you should look at the My Oracle Support document 412113.1, and check out the rman commands:

RMAN> list incarnation;
RMAN> reset database to incarnation <dbinc_key>;    
RMAN> resync catalog;
RMAN> list incarnation;

OK. Proceed at your own risk!

To remove the bad incarnation record from the recovery catalog:

[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB >  select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         NO          8                                              1 30-JUN-05 PARENT
1 1099918981       1789 TEST                3023938          27-FEB-11         YES         2                                         446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY      DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY
---------- ---------- --------------- ------------- ------------- --------------
1 1099918981                     744219186             2           1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         YES         8                                              1 30-JUN-05 PARENT

RMAN @ RMAN_DB > commit;

Commit complete.
And let's see if we can use RMAN again...
[oracle]$ rman target / catalog rman/rman@rman_db
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: PROD  (DBID=1099918981)
connected to recovery catalog database
RMAN> list backup summary;
starting full resync of recovery catalog
full resync complete
List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        TAG20110227T144639
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.

The last thing to do was to ensure that the recovery worked too.

NOTE: 11G Update to this blog entry

%d bloggers like this: