Problems with RMAN and incarnations.

One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX  HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       PROD     1099918981       CURRENT 446075     04-APR-10

RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> exit

Recovery Manager complete.

[oracle]$ export ORACLE_SID=TEST
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: TEST (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

new incarnation of database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       TEST     1099918981       PARENT  446075     04-APR-10
1       1789    TEST     1099918981       CURRENT 3023938    27-FEB-11

RMAN> exit

Recovery Manager complete.

And lets see what happens when we go into RMAN for Production

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of list command at 02/27/2011 21:03:04
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

RMAN> list incarnation;
RMAN> exit

Recovery Manager complete.

So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.

EDIT! Before trying the catalog mod below you should look at the My Oracle Support document 412113.1, and check out the rman commands:

RMAN> list incarnation;
RMAN> reset database to incarnation <dbinc_key>;    
RMAN> resync catalog;
RMAN> list incarnation;

OK. Proceed at your own risk!

To remove the bad incarnation record from the recovery catalog:

[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB >  select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         NO          8                                              1 30-JUN-05 PARENT
1 1099918981       1789 TEST                3023938          27-FEB-11         YES         2                                         446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY      DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY
---------- ---------- --------------- ------------- ------------- --------------
1 1099918981                     744219186             2           1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         YES         8                                              1 30-JUN-05 PARENT

RMAN @ RMAN_DB > commit;

Commit complete.
And let's see if we can use RMAN again...
[oracle]$ rman target / catalog rman/rman@rman_db
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: PROD  (DBID=1099918981)
connected to recovery catalog database
RMAN> list backup summary;
starting full resync of recovery catalog
full resync complete
List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        TAG20110227T144639
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.

The last thing to do was to ensure that the recovery worked too.

NOTE: 11G Update to this blog entry

5 Responses to Problems with RMAN and incarnations.

  1. Martin Widlake says:

    A belated but warm welcome to the world of blogging Neil. And a nice opening piece about unexpected misfortune to start with. I don’t think there are many of us out there who would have predicted this one, so thanks for sharing.

    So, did the test recovery work? Or did you find that, like many sites, your backup was just a way to keep the network busy for a while? 🙂

    Like

    • Martin,

      Thanks. Whilst the backups at that site are somewhat unusual, the restore of this system (to Pre-prod) worked very well.

      I would recommend that following this action, the database is unregistered and reregistered into the catalog (if that’s acceptable at your site) to clean up the RMAN catalog properly.

      And you are right – backups aren’t that important. Being able to restore them is.

      Neil

      Like

  2. veera says:

    Luv u Neil Chandler.
    Hats off to Neil Chandler’s DBA Blog.
    U saved my day.
    Yup, I did the same mistake. I was wondering what I wud do if tis blog is not there!!!
    keep up the great work!!!
    Thanks a lot Bro!

    There is a little diffreence in my issue though ! the database name of prod got changed to test and test remins as test.

    SQL> conn rman/rman@CATDB
    Connected.
    SQL> select * from db where db_id=1224325486;

    DB_KEY DB_ID HIGH_CONF_RECID LAST_KCCDIVTS CURR_DBINC_KEY
    ———- ———- ————— ————- ————–
    1 1224325486 11 766845572 262897

    SQL> select * from rc_database_incarnation where DBID=1224325486 order by resetlogs_time;

    DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOG CUR PARENT_DBINC_KEY
    ———- ———- ———- ——– —————– ——– — —————-
    1 1224325486 3183 UNKNOWN 1 19/12/06 NO
    1 1224325486 2 TEST 8814510 11/05/07 NO 3183
    1 1224325486 262897 TEST 8,8030E+12 10/11/11 YES 2

    SQL> update db set curr_dbinc_key = 2 where db_id=1224325486;

    1 row updated.

    SQL> delete from dbinc where dbinc_key =262897;

    1 row deleted.

    SQL> select * from rc_database_incarnation where DBID=1224325486 order by resetlogs_time;

    DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOG CUR PARENT_DBINC_KEY
    ———- ———- ———- ——– —————– ——– — —————-
    1 1224325486 3183 UNKNOWN 1 19/12/06 NO
    1 1224325486 2 TEST 8814510 11/05/07 YES 3183

    SQL> commit;

    Commit complete.

    SQL> select * from rc_database_incarnation where DBID=1224325486 order by resetlogs_time;

    DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOG CUR PARENT_DBINC_KEY
    ———- ———- ———- ——– —————– ——– — —————-
    1 1224325486 3183 UNKNOWN 1 19/12/06 NO
    1 1224325486 2 PROD 8814510 11/05/07 YES 3183

    Like

    • Glad I was able to help! I know exactly how you were feeling when you discovered the problem!

      Please remember to TEST YOUR RESTORE now (following a successful backup) to ensure that you catalog is working correctly after the change.

      I cannot recommend enough that you restore a backup to a test system to ensure it’s all fine.

      regards

      Neil.

      Like

  3. Pingback: RMAN Incarnations revisited (11G) « Neil Chandler's DBA Blog

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.