RMAN Incarnations revisited (11G)

Time for an update to a older post. I have previously talked about the annoyance of connecting to RMAN with a duplicated database where the DBID has not been changed. RMAN happily breaks the catalog by assuming the “new” database is a new incarnation, and prevents the previous owner of the catalog from using the backups.

I wrote a blog post a while ago about hacking your way past this problem, but was recently informed by Martin Bach that there was actually an RMAN command to fix the Incarnation problem I had encountered, so I though I had better take a look and see if it worked!

Well, the first thing I noticed was that Oracle 11G does not break when connecting from a different database with the same DBID the way it did in Oracle 10G:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:26:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1  2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43

RMAN> list backup summary;

List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- ----------------- ------- ------- ---------- ---
331 B A A DISK 26/01/13 08:55:32 1 1 NO BACKUP1
375 B A A DISK 26/01/13 09:06:44 1 1 NO BACKUP2
400 B A A DISK 26/01/13 09:07:02 1 1 NO BACKUP3
587 B F A DISK 26/01/13 11:20:09 1 1 YES FULL BACKUP
609 B F A DISK 26/01/13 11:20:11 1 1 NO TAG20130126T112010

And on the alternate database:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:15:07 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL2 1229390655 PARENT 1 13/08/09 23:00:48
1  2 ORCL2 1229390655 CURRENT 754488 30/10/09 11:38:43

RMAN> list backup summary;

specification does not match any backup in the repository

RMAN>

Whilst the incarnations look a little incorrect (referring to ORCL2), the system does not break. So, no more need to hack around with incarnations if the system breaks accidentally. However, what if you register the other database…

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 05:44:06 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database
RMAN> register database;
starting full resync of recovery catalog
full resync complete
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of register command on default channel at 01/27/2013 05:44:12
RMAN-20002: target database already registered in recovery catalog

So, after a little effort it would appear I can’t easily break the incarnations in Oracle 11G. So let’s try. I recovered the ORCL1 database to create a new incarnation to see how ORCL2 would behave when connected:

on ORCL1:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:32:09 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
 connected to recovery catalog database


RMAN> list incarnation;
 List of Database Incarnations
 DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
 ------- ------- -------- ---------------- --- ---------- ----------
 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43
 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12 <- new incarnation

<BR>

And now ORCL2 behaves a little differently, recognising the ORCL1 incarnations correctly, and throwing an error:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:19:27 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
 connected to recovery catalog database

RMAN> list incarnation;
 List of Database Incarnations
 DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
 ------- ------- -------- ---------------- --- ---------- ----------
 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43
 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12

RMAN> list backup summary;
RMAN-00571: ===========================================================
 RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
 RMAN-00571: ===========================================================
 RMAN-03002: failure of list command at 01/27/2013 12:19:38
 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20004: target database name does not match name in recovery catalog

So, what if I change the name of ORCL2 back to ORCL1. Can I reproduce my error then?

[oracle@localhost dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:23:29 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to:
 Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown immediate;
 Database closed.
 Database dismounted.
 ORACLE instance shut down.

SQL> startup mount;
 ORACLE instance started.
Total System Global Area 456146944 bytes
 Fixed Size 1344840 bytes
 Variable Size 381684408 bytes
 Database Buffers 67108864 bytes
 Redo Buffers 6008832 bytes
 Database mounted.

 SQL> exit
 Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options

[oracle@localhost dbs]$ nid target=system/oracle dbname=orcl1 setname=yes
DBNEWID: Release 11.2.0.2.0 - Production on Sun Jan 27 12:24:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Connected to database ORCL2 (DBID=1229390655)
Connected to server version 11.2.0
Control Files in database:
 /home/oracle/app/oracle/oradata/orcl/control01.ctl
 /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl
Change database name of database ORCL2 to ORCL1? (Y/[N]) => Y
Proceeding with operation
 Changing database name from ORCL2 to ORCL1
 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - modified
 Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - modified
 Datafile /home/oracle/app/oracle/oradata/orcl/system01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/sysaux01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/undotbs01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/users01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/example01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1246426611663638.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1265209995679366.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/temp01.db - wrote new name
 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - wrote new name
 Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - wrote new name
 Instance shut down
Database name changed to ORCL1.
 Modify parameter file and generate a new password file before restarting.
 Succesfully changed database name.
 DBNEWID - Completed succesfully.
[note: I have already got the relevant init.ora and oratab setup]
[oracle@localhost dbs]$ . oraenv
 ORACLE_SID = [orcl2] ? orcl1
 The Oracle base has been set to /home/oracle/app/oracle

 [oracle@localhost dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:24:31 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount;
 ORACLE instance started.
Total System Global Area 456146944 bytes
 Fixed Size 1344840 bytes
 Variable Size 381684408 bytes
 Database Buffers 67108864 bytes
 Redo Buffers 6008832 bytes
 Database mounted.
SQL> alter database open;
Database altered.

SQL >exit

[oracle@localhost dbs]$ rman target system/oracle catalog rman/rman@orcl1

Recovery Manager: Release 11.2.0.2.0 – Production on Sun Jan 27 12:52:45 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;

database reset to incarnation 2
starting full resync of recovery catalog

List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1 2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43
1 921 ORCL1 1229390655 ORPHAN 10215936 27/01/13 12:27:12

So, the newly rename-to ORCL1 thinks we are at incarnation 2. However, log back into the original ORCL1 and it resets the incarnation back to 921. Still no corruption, still no problem!

So, I still can’t prove whether the ALTER DATABASE SET INCARNATION command will work as mentioned to me, or whether it’s just something that allows me to recover across a resetlogs command. Looks like I’ll have to reinstall Oracle 10G… tomorrow.

Advertisement

Problems with RMAN and incarnations.

One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX  HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       PROD     1099918981       CURRENT 446075     04-APR-10

RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> exit

Recovery Manager complete.

[oracle]$ export ORACLE_SID=TEST
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: TEST (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

new incarnation of database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       TEST     1099918981       PARENT  446075     04-APR-10
1       1789    TEST     1099918981       CURRENT 3023938    27-FEB-11

RMAN> exit

Recovery Manager complete.

And lets see what happens when we go into RMAN for Production

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of list command at 02/27/2011 21:03:04
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

RMAN> list incarnation;
RMAN> exit

Recovery Manager complete.

So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.

EDIT! Before trying the catalog mod below you should look at the My Oracle Support document 412113.1, and check out the rman commands:

RMAN> list incarnation;
RMAN> reset database to incarnation <dbinc_key>;    
RMAN> resync catalog;
RMAN> list incarnation;

OK. Proceed at your own risk!

To remove the bad incarnation record from the recovery catalog:

[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB >  select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         NO          8                                              1 30-JUN-05 PARENT
1 1099918981       1789 TEST                3023938          27-FEB-11         YES         2                                         446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY      DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY
---------- ---------- --------------- ------------- ------------- --------------
1 1099918981                     744219186             2           1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         YES         8                                              1 30-JUN-05 PARENT

RMAN @ RMAN_DB > commit;

Commit complete.
And let's see if we can use RMAN again...
[oracle]$ rman target / catalog rman/rman@rman_db
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: PROD  (DBID=1099918981)
connected to recovery catalog database
RMAN> list backup summary;
starting full resync of recovery catalog
full resync complete
List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        TAG20110227T144639
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.

The last thing to do was to ensure that the recovery worked too.

NOTE: 11G Update to this blog entry