GoldenGate unsupported types

identity
When using GoldenGate there are few rules which need to be followed, and a very small number of unsupported datatypes. This is the excerpt from the GoldenGate 12.2 manual:

 

 

1.6.10 Non-Supported Oracle Data Types
Oracle GoldenGate does not support the following data types.

 ANYDATA fetch-based column support for data types with VARRAYS that do not include named collections and VARRAYS embedded within those data types
 ANYDATASET
 ANYTYPE
 BFILE
 MLSLABEL
 ORDDICOM
 REFs
 TIMEZONE_ABBR
 URITYPE
 UDT with containing an unsupported Oracle Data Type
 Oracle GoldenGate does not support replication of identity column data or Valid Time Temporal column data.

One of those may be more common that you think – the last bullet point is the Oracle 12C datatype IDENTITY. The following code snippet is not supported for replication by GoldenGate

create table with_identity
( COL_ID    number(38) generated always as identity minvalue 1 not null enable,
  text_info varchar2(10),
  constraint with_identity_pk  primary key (COL_ID) );

insert into with_identity (text_info) values ('Row 1');

commit;

desc with_identity
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 COL_ID                        NOT NULL NUMBER(38)
 TEXT_INFO                              VARCHAR2(10)

You can see that it is an identity column in user_tab_columns:

select TABLE_NAME,COLUMN_NAME,IDENTITY_COLUMN from user_tab_columns where table_name = 'WITH_IDENTITY';

TABLE_NAME                     COLUMN_NAME                    IDE
------------------------------ ------------------------------ ---
WITH_IDENTITY                  COL_ID                         YES
WITH_IDENTITY                  TEXT_INFO                      NO

So, how does GoldenGate alert you to the fact that it will not be replicating this table with an unsupported type? It doesn’t.

look in ggserr.log – nothing
look in the extract report file for wildcard matching the table – nothing

With a table containing an unsupported data type, the lack of replication is silent because GoldenGate is completely ignoring the object. Completely Silently.

So, how could I have picked this up before it became a problem?

Obviously I need to run the script in MOS article 1296168.1, which analyzes your schema for GoldenGate compatiblity. Unfortunately, Oracle have not updated this to account for IDENTITY columns, so there is no indication that it does not work!

The script to run is the GoldenGate Integrated Extract Healthcheck script, found in MOS article 1448324.1

Finally some vague and unclear indication that we may have a problem. Part way through the report we get this:

++ TABLES SUPPORT BY GOLDENGATE Integrated Capture ++
Lists tables that can not be supported by OGG (NONE)
Lists table that are supported via OGG FETCH (ID KEY)

Owner           OBJECT_NAME             SUPPOR  Container ID
C##GOLDENGATE 	AQ$_OGG$Q_TAB_E_HR_C 	NONE 	1
C##GOLDENGATE 	AQ$_OGG$Q_TAB_E_HR_D 	NONE 	1
C##GOLDENGATE 	OGG$Q_TAB_E_HR 	        NONE 	1
HR              WITH_IDENTITY         ID KEY     3
IX 	        AQ$_ORDERS_QUEUETABLE_G NONE 	3

Wow.
Always take care with GoldenGate – it’s not just a plug-in solution. You need to think about the implementaiton and work with the application to get the best outcomes.

Oracle Audit Part 4 – Unified Auditing Problem

Part 4 in a loose series of blog posts about auditing; the new Oracle 12 feature. Unified auditing.monkey_examine

The intention behind Unified Auditing, as the name suggests, is to pull together all of the audit records from the disparate Oracle auditing locations into a single unified location. It captures standard audit information (including sys records normally written to the O/S),  FGA audit, record from Database Vault, Label Security, rman, etc.

Sounds great, and it is. 12.1 has a significant problem, but it seems a lot better in 12.2 (where I have not yet had time to spot any problems 🙂 )

The first thing that you need to know is that it is on by default.
It is gathering audit information in your 12C database right now, possibly duplicating any traditional auditing you may have switched-on yourself. By default the database is in “Mixed mode”, meaning you can use the old style of audit and unified auditing too. You need to re-link the oracle binary with the database down to enable exclusively-Unified auditing

cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk uniaud_on ioracle

(on windows, you need to move the correct DLL into place.)

So what is enabled by default?

1  select * from audit_unified_enabled_policies;

USER_NAME       POLICY_NAME            ENABLED ENABLED_OPTION  ENTITY_NAME     ENTITY_ SUC FAI
--------------- ---------------------- ------- --------------- --------------- ------- --- ---
ALL USERS       ORA_SECURECONFIG       BY      BY USER         ALL USERS       USER    YES YES
ALL USERS       ORA_LOGON_FAILURES     BY      BY USER         ALL USERS       USER    NO  YES

What does that mean?

  1  select POLICY_NAME,AUDIT_OPTION,AUDIT_OPTION_TYPE,OBJECT_SCHEMA,OBJECT_NAME,OBJECT_TYPE
  2    from audit_unified_policies
  3   where policy_name in ('ORA_SECURECONFIG','ORA_LOGON_FAILURES')
  4*  order by 1,2,3,4,5;

POLICY_NAME          AUDIT_OPTION                             AUDIT_OPTION_TYPE  OBJECT_SCHEMA             OBJECT_NAME               OBJECT_TYPE
-------------------- ---------------------------------------- ------------------ ------------------------- ------------------------- --------------------
ORA_LOGON_FAILURES   LOGON                                    STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     ADMINISTER KEY MANAGEMENT                SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER ANY PROCEDURE                      SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER ANY SQL TRANSLATION PROFILE        SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER ANY TABLE                          SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER DATABASE                           SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER DATABASE LINK                      STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER PLUGGABLE DATABASE                 STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER PROFILE                            STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER ROLE                               STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER SYSTEM                             SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     ALTER USER                               STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     AUDIT SYSTEM                             SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     BECOME USER                              SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE ANY JOB                           SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE ANY LIBRARY                       SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE ANY PROCEDURE                     SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE ANY SQL TRANSLATION PROFILE       SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE ANY TABLE                         SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE DATABASE LINK                     STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE DIRECTORY                         STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE EXTERNAL JOB                      SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE PLUGGABLE DATABASE                STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE PROFILE                           STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE PUBLIC SYNONYM                    SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE ROLE                              STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE SQL TRANSLATION PROFILE           SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     CREATE USER                              SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP ANY PROCEDURE                       SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP ANY SQL TRANSLATION PROFILE         SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP ANY TABLE                           SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP DATABASE LINK                       STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP DIRECTORY                           STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP PLUGGABLE DATABASE                  STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP PROFILE                             STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP PUBLIC SYNONYM                      SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP ROLE                                STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     DROP USER                                SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     EXECUTE                                  OBJECT ACTION      REMOTE_SCHEDULER_AGENT    ADD_AGENT_CERTIFICATE     PROCEDURE
ORA_SECURECONFIG     EXECUTE                                  OBJECT ACTION      SYS                       DBMS_RLS                  PACKAGE
ORA_SECURECONFIG     EXEMPT ACCESS POLICY                     SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     EXEMPT REDACTION POLICY                  SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     GRANT ANY OBJECT PRIVILEGE               SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     GRANT ANY PRIVILEGE                      SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     GRANT ANY ROLE                           SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     LOGMINING                                SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     PURGE DBA_RECYCLEBIN                     SYSTEM PRIVILEGE   NONE                      NONE                      NONE
ORA_SECURECONFIG     SET ROLE                                 STANDARD ACTION    NONE                      NONE                      NONE
ORA_SECURECONFIG     TRANSLATE ANY SQL                        SYSTEM PRIVILEGE   NONE                      NONE                      NONE

49 rows selected.

So that’s quite a lot of auditing on by default.

If you want to disable unified auditing policies, you need to use:

noaudit policy 
e.g. noaudit policy ora_secureconfig

So what’s the problem with Unified Auditing in 12.1?

Performance. It’s dreadful, and simply unusable.  Hwo about this simple select asking how much audit in the last 30 minutes from the UNIFIED_AUDIT_TRAIL view?

select count(*) from unified_audit_trail where EVENT_TIMESTAMP > systimestamp - interval '30' minutes;

It effectively blocked itself in a RAC environment, with one parallel query slave blocked by another – really not sure what was going on there but it never completed (well, I killed it after half a day).

I have seen a similar query in a non-RAC environment (or against table v$unified_audit_trail – which you really shouldn’t use as it can give an incomplete picture) run for well over an hour with a remarkably small data set. Not useful if you are trying to scrape content into an external monitor regularly…

MOS Article 2212196.1 explains how to resolve this in 12.1, by transferring all of the data to a table.

In 12.2, this is resolved by all data being stored in a table call AUDSYS.AUD$UNIFIED.
The table uses interval-based partitioning and is partitioned with a monthly interval:

...
PARTITION BY RANGE ("EVENT_TIMESTAMP") INTERVAL (INTERVAL '1' MONTH)
 (PARTITION "AUD_UNIFIED_P0"  VALUES LESS THAN (TIMESTAMP' 2014-07-01 00:00:00')
...

Because it is a table, we can index it.It works like a table. It’s quick.

In my next audit article, I’ll look to explain how unified auditing works in a bit more depth, now that the performance no longer renders it unusable to queries.

Goldengate: Problems with character sets

One complication that you may face with replicating data using Goldengate (or other tools) is when your source character set is different to your destination character set. This is particularly true when the source character set is UTF-8 and the destination is not.

If the application does not sanitise (or you do not want to sanitise) inputs to restrict them to the lowest common denominator within your systems, you will need to ensure that you take action to ensure the source data is fed appropriately to the destination systems.

I recently experienced this at a client, but the special measures taken by me to allow Oracle UTF-8 data into a SQL Server database using a standard Windows-1252 character set also hit a Goldengate codepath bug, recreated here:

The source table, TAB1, has 3 columns for these purposes:
 

 ID     number
 NAME   varchar2(50)
 COL_TS timestamp

 
The source table is allowed to contain NULLS, but the destination table must not, so a null value test is specified in the COLMAP in the REPLICAT.

To cope with the character set conversion, the parameter REPLACEBADCHAR SPACE is specified in the replication. This states that is there are ANY characters in the trail file which the destination database cannot store, then that character should be converted to (in this instance) a space.

REPLICAT snippet:

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS,
            NAME =@IF(@COLTEST(NAME, NULL), ' ' ,NAME));

 
All processing progressed nicely, with the NULLs entered into TAB1.NAME being converted into a single space, until an unexpected character was pasted into the screen on the source system and the REPLICAT abended:

2015-02-25 22:32:00 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:00 WARNING OGG-01503 Aborting BATCHSQL transaction. Mapping error.
2015-02-25 22:32:01 WARNING OGG-01137 BATCHSQL suspended, continuing in normal mode.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:01 WARNING OGG-01431 Aborted grouped transaction on 'dbo.TAB1', Mapping error.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-01151 Error mapping from SCHEMA_OWNER.TAB1 to dbo.SUP_TAB1.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.

Source Context :
SourceModule : [er.errors]
SourceID : [er/errors.cpp]
SourceFunction : [take_rep_err_action]
SourceLine : [682]
ThreadBacktrace : [12] elements
: [Z:\gg12\gglog.dll(?CreateMessage@CMessageFactory@@QEAAPEAVCMessage@@PEAVCSourceContext@@IZZ+0x886) [0x000007FEF00809D6]]
: [Z:\gg12\gglog.dll(?_MSG_ERR_MAP_TO_TANDEM_FAILED@@YAPEAVCMessage@@PEAVCSourceContext@@AEBV?$CQualDBObjName@$00@ggapp@gglib@ggs@@1W4MessageDisposition@CMessageFactory@@@Z+0x81) [0x000007FEF0043631]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x733c) [0x000000013F6E96BC]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x2fe7a) [0x000000013F7121FA]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a575) [0x000000013F74C8F5]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xea23) [0x000000013F7F2323]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe000) [0x000000013F7F1900]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe8cd) [0x000000013F7F21CD]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a5f9) [0x000000013F74C979]]
: [Z:\gg12\replicat.exe(CommonLexerNewSSD+0xc0d2) [0x000000013F8862D2]]
: [C:\Windows\system32\kernel32.dll(BaseThreadInitThunk+0xd) [0x00000000773B652D]]
: [C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+0x21) [0x00000000774EC541]]

2015-02-25 22:32:01 ERROR OGG-01296 Error mapping from SCHEMA_OWNER.TAB1 to dbo.TAB1.

Looking in the Goldengate discard file (always a good place to start when you have a GG problem), you can see the problem character “ef bf bd”:

Oracle GoldenGate Delivery for SQL Server process started, group REP_SQL discard file opened: 2015-02-11 22:02:42.389000
Mapping error to target column: NAME
Mapping error to target column: NAME
Mapping error to target column: NAME
Current time: 2015-02-25 22:32:01
Discarded record from action ABEND on error 0

Aborting transaction on ./dirdat/NC beginning at seqno 2 rba 123
error at seqno 2 rba 123
Problem replicating SCHEMA_OWNER.TAB1 to dbo.TAB1
Mapping problem with insert record (source format)...
*
ID = 123
000000: 31 32 33  |123     |

NAME = NEIL \uFFFD CHA
000000: 4E 45 49 4C 20 ef bf bd 20 43 48 41 | NEIL ... CHA|

COL_TS = 2015-02-25:22:31:56.303000000
000000: 32 30 31 35 2d 30 32 2d 32 35 3a 32 32 3a 33 31 |2015-02-25:22:31|
000010: 3a 35 36 2e 33 30 33 30 30 30 30 30 30          |:56.303000000   |
*

Process Abending : 2015-02-25 22:32:01

So, why didn’t REPLACEBADCHAR catch this and turn the offending character into a space? There’s a clue in the ABEND report information

WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 

The column is referred-to as @IF(), not as NAME. A quick scan of MOS show that this appears to be BUG 19818362 “Column function execution was happening internally under NOCHARSETCONVERSION cases” – it’s going through the wrong codepath for REPLCEBADCHAR to work. And this bug fix was released 10 days before this problem was encountered. Result!

The short-term fix? Remove the data manipulation from the REPLICAT

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS);

 

Start and run the REPLICAT until past the problem, then revert the REPLICAT back to data manipulation until either Goldengate is patched and tested or we have to repeat this exercise due to another Unicode character problem occurs.

RMAN Incarnations revisited (11G)

Time for an update to a older post. I have previously talked about the annoyance of connecting to RMAN with a duplicated database where the DBID has not been changed. RMAN happily breaks the catalog by assuming the “new” database is a new incarnation, and prevents the previous owner of the catalog from using the backups.

I wrote a blog post a while ago about hacking your way past this problem, but was recently informed by Martin Bach that there was actually an RMAN command to fix the Incarnation problem I had encountered, so I though I had better take a look and see if it worked!

Well, the first thing I noticed was that Oracle 11G does not break when connecting from a different database with the same DBID the way it did in Oracle 10G:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:26:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1  2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43

RMAN> list backup summary;

List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- ----------------- ------- ------- ---------- ---
331 B A A DISK 26/01/13 08:55:32 1 1 NO BACKUP1
375 B A A DISK 26/01/13 09:06:44 1 1 NO BACKUP2
400 B A A DISK 26/01/13 09:07:02 1 1 NO BACKUP3
587 B F A DISK 26/01/13 11:20:09 1 1 YES FULL BACKUP
609 B F A DISK 26/01/13 11:20:11 1 1 NO TAG20130126T112010

And on the alternate database:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:15:07 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL2 1229390655 PARENT 1 13/08/09 23:00:48
1  2 ORCL2 1229390655 CURRENT 754488 30/10/09 11:38:43

RMAN> list backup summary;

specification does not match any backup in the repository

RMAN>

Whilst the incarnations look a little incorrect (referring to ORCL2), the system does not break. So, no more need to hack around with incarnations if the system breaks accidentally. However, what if you register the other database…

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 05:44:06 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database
RMAN> register database;
starting full resync of recovery catalog
full resync complete
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of register command on default channel at 01/27/2013 05:44:12
RMAN-20002: target database already registered in recovery catalog

So, after a little effort it would appear I can’t easily break the incarnations in Oracle 11G. So let’s try. I recovered the ORCL1 database to create a new incarnation to see how ORCL2 would behave when connected:

on ORCL1:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:32:09 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
 connected to recovery catalog database


RMAN> list incarnation;
 List of Database Incarnations
 DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
 ------- ------- -------- ---------------- --- ---------- ----------
 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43
 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12 <- new incarnation

<BR>

And now ORCL2 behaves a little differently, recognising the ORCL1 incarnations correctly, and throwing an error:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:19:27 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
 connected to recovery catalog database

RMAN> list incarnation;
 List of Database Incarnations
 DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
 ------- ------- -------- ---------------- --- ---------- ----------
 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43
 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12

RMAN> list backup summary;
RMAN-00571: ===========================================================
 RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
 RMAN-00571: ===========================================================
 RMAN-03002: failure of list command at 01/27/2013 12:19:38
 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20004: target database name does not match name in recovery catalog

So, what if I change the name of ORCL2 back to ORCL1. Can I reproduce my error then?

[oracle@localhost dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:23:29 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to:
 Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown immediate;
 Database closed.
 Database dismounted.
 ORACLE instance shut down.

SQL> startup mount;
 ORACLE instance started.
Total System Global Area 456146944 bytes
 Fixed Size 1344840 bytes
 Variable Size 381684408 bytes
 Database Buffers 67108864 bytes
 Redo Buffers 6008832 bytes
 Database mounted.

 SQL> exit
 Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options

[oracle@localhost dbs]$ nid target=system/oracle dbname=orcl1 setname=yes
DBNEWID: Release 11.2.0.2.0 - Production on Sun Jan 27 12:24:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Connected to database ORCL2 (DBID=1229390655)
Connected to server version 11.2.0
Control Files in database:
 /home/oracle/app/oracle/oradata/orcl/control01.ctl
 /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl
Change database name of database ORCL2 to ORCL1? (Y/[N]) => Y
Proceeding with operation
 Changing database name from ORCL2 to ORCL1
 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - modified
 Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - modified
 Datafile /home/oracle/app/oracle/oradata/orcl/system01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/sysaux01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/undotbs01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/users01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/example01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1246426611663638.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1265209995679366.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/temp01.db - wrote new name
 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - wrote new name
 Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - wrote new name
 Instance shut down
Database name changed to ORCL1.
 Modify parameter file and generate a new password file before restarting.
 Succesfully changed database name.
 DBNEWID - Completed succesfully.
[note: I have already got the relevant init.ora and oratab setup]
[oracle@localhost dbs]$ . oraenv
 ORACLE_SID = [orcl2] ? orcl1
 The Oracle base has been set to /home/oracle/app/oracle

 [oracle@localhost dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:24:31 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount;
 ORACLE instance started.
Total System Global Area 456146944 bytes
 Fixed Size 1344840 bytes
 Variable Size 381684408 bytes
 Database Buffers 67108864 bytes
 Redo Buffers 6008832 bytes
 Database mounted.
SQL> alter database open;
Database altered.

SQL >exit

[oracle@localhost dbs]$ rman target system/oracle catalog rman/rman@orcl1

Recovery Manager: Release 11.2.0.2.0 – Production on Sun Jan 27 12:52:45 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;

database reset to incarnation 2
starting full resync of recovery catalog

List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1 2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43
1 921 ORCL1 1229390655 ORPHAN 10215936 27/01/13 12:27:12

So, the newly rename-to ORCL1 thinks we are at incarnation 2. However, log back into the original ORCL1 and it resets the incarnation back to 921. Still no corruption, still no problem!

So, I still can’t prove whether the ALTER DATABASE SET INCARNATION command will work as mentioned to me, or whether it’s just something that allows me to recover across a resetlogs command. Looks like I’ll have to reinstall Oracle 10G… tomorrow.

Why ITIL CAB’s do not work as expected

ITIL (Information Technology Infrastructure Library) does work. It works very well and is becoming more and more widely adopted, but does not always work the way that management thinks it works.

The basic premise of ITIL is to put structure and process around the business of running IT. It is a collection of guiding principles which, if appropriately adopted into a company will help run the IT function in a reasonable process-driven fashion. It’s largely common sense, which helps.

However, after lots of experience with all of the different aspects of ITIL I have decided that it doesn’t really work the way management think it does. This is especially relevant to Change Management.

The premise behind Change Management is that technical staff are prevented from making non-Standard changes to the IT systems without prior authorisation from all relevant stakeholders.

Where this falls down is 2 fold.

– The definition of Standard change needs to be very carefully identified, documented and locked-down. It never is, leading to techies making personal decision about what constitutes standard change. This definition is always enthusiastically elastic, until something snaps when it will become frustratingly draconian (for a while).

– The stakeholders rarely understand the nature and true impact and risk of the changes taking place, so the CAB must accept the opinion of the techie (see above). The net result of a Change Advisory Board (CAB) is to authorise the technical staff to make change whilst putting the burden of responsibility for the change onto the management. Not the techie. If the changes goes wrong, then it is fault of the approving manager for not assessing the risk correctly, or for approving the change to take place at the wrong time or in the wrong way. However, the techie gets off scott free in all of this.

As long as the process is followed, we can get away with murder.

Problems with RMAN and incarnations.

One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX  HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       PROD     1099918981       CURRENT 446075     04-APR-10

RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> exit

Recovery Manager complete.

[oracle]$ export ORACLE_SID=TEST
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: TEST (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

new incarnation of database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       TEST     1099918981       PARENT  446075     04-APR-10
1       1789    TEST     1099918981       CURRENT 3023938    27-FEB-11

RMAN> exit

Recovery Manager complete.

And lets see what happens when we go into RMAN for Production

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of list command at 02/27/2011 21:03:04
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

RMAN> list incarnation;
RMAN> exit

Recovery Manager complete.

So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.

EDIT! Before trying the catalog mod below you should look at the My Oracle Support document 412113.1, and check out the rman commands:

RMAN> list incarnation;
RMAN> reset database to incarnation <dbinc_key>;    
RMAN> resync catalog;
RMAN> list incarnation;

OK. Proceed at your own risk!

To remove the bad incarnation record from the recovery catalog:

[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB >  select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         NO          8                                              1 30-JUN-05 PARENT
1 1099918981       1789 TEST                3023938          27-FEB-11         YES         2                                         446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY      DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY
---------- ---------- --------------- ------------- ------------- --------------
1 1099918981                     744219186             2           1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         YES         8                                              1 30-JUN-05 PARENT

RMAN @ RMAN_DB > commit;

Commit complete.
And let's see if we can use RMAN again...
[oracle]$ rman target / catalog rman/rman@rman_db
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: PROD  (DBID=1099918981)
connected to recovery catalog database
RMAN> list backup summary;
starting full resync of recovery catalog
full resync complete
List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        TAG20110227T144639
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.

The last thing to do was to ensure that the recovery worked too.

NOTE: 11G Update to this blog entry

%d bloggers like this: