Goldengate Install Error

Sometimes you waste much more time than you can believe because the instrumentation of the system isn’t great. If you don’t know what a system is doing, you can’t easily fix it. Time should be take to instrument your code, and ensure that any outputs from that instrumentation are readily understandable by others.

Here’s a mild frustration from this week that I encountered. I was installing Goldengate…

./runInstaller -silent -waitforcompletion -responseFile /u01/migrate/gg/software/fbo_ggs_Linux_x64_shiphome/Disk1/install_source.rsp
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 120 MB. Actual 21571 MB Passed
Checking swap space: must be greater than 150 MB. Actual 24575 MB Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2020-07-30_08-03-36PM. Please wait ...
[WARNING] [INS-08109] Unexpected error occurred while validating inputs at state 'installOptions'.
CAUSE: No additional information available.
ACTION: Contact Oracle Support Services or refer to the software manual.
SUMMARY:
- java.lang.NullPointerException

Oh my word! ACTION: Contact Oracle Support Services! java.lang.NullPointerException! Something is terribly wrong!

I’d better re-read this error message carefully… Unexpected error occurred while validating inputs at state ‘installOptions’. What? Validating inputs? Lets have a look at the inputs from the response file:

install_source.rsp

################################################################################
## ##
## Oracle GoldenGate installation option and details ##
## ##
################################################################################

#-------------------------------------------------------------------------------
# Specify the installation option.
# Specify ORA19c for installing Oracle GoldenGate for Oracle Database 19c or
# ORA18c for installing Oracle GoldenGate for Oracle Database 18c or
# ORA12c for installing Oracle GoldenGate for Oracle Database 12c or
# ORA11g for installing Oracle GoldenGate for Oracle Database 11g
#-------------------------------------------------------------------------------
INSTALL_OPTION=ORA11G

#-------------------------------------------------------------------------------
# Specify a location to install Oracle GoldenGate
#-------------------------------------------------------------------------------
SOFTWARE_LOCATION=/u01/gg

START_MANAGER=
MANAGER_PORT=
DATABASE_LOCATION=
INVENTORY_LOCATION=
UNIX_GROUP_NAME=

 

I only have 2 inputs; the INSTALL_OPTION and the SOFTWARE_LOCATION… and the INSTALL_OPTION is wrong! This field is case sensitive. ORA11G does not match ORA11g, so I get a dramatic null pointer exception and a failure. It would have been a much quicker remediation if the Oracle error had been curated a little better and the programmer a little more sympathetic to users of their code.

How about:
Installation parameter INSTALL_OPTION invalid,
rather than
Unexpected error occurred while validating inputs at state ‘installOptions’

Advertisement

GoldenGate unsupported types

identity
When using GoldenGate there are few rules which need to be followed, and a very small number of unsupported datatypes. This is the excerpt from the GoldenGate 12.2 manual:

 

 

1.6.10 Non-Supported Oracle Data Types
Oracle GoldenGate does not support the following data types.

 ANYDATA fetch-based column support for data types with VARRAYS that do not include named collections and VARRAYS embedded within those data types
 ANYDATASET
 ANYTYPE
 BFILE
 MLSLABEL
 ORDDICOM
 REFs
 TIMEZONE_ABBR
 URITYPE
 UDT with containing an unsupported Oracle Data Type
 Oracle GoldenGate does not support replication of identity column data or Valid Time Temporal column data.

One of those may be more common that you think – the last bullet point is the Oracle 12C datatype IDENTITY. The following code snippet is not supported for replication by GoldenGate

create table with_identity
( COL_ID    number(38) generated always as identity minvalue 1 not null enable,
  text_info varchar2(10),
  constraint with_identity_pk  primary key (COL_ID) );

insert into with_identity (text_info) values ('Row 1');

commit;

desc with_identity
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 COL_ID                        NOT NULL NUMBER(38)
 TEXT_INFO                              VARCHAR2(10)

You can see that it is an identity column in user_tab_columns:

select TABLE_NAME,COLUMN_NAME,IDENTITY_COLUMN from user_tab_columns where table_name = 'WITH_IDENTITY';

TABLE_NAME                     COLUMN_NAME                    IDE
------------------------------ ------------------------------ ---
WITH_IDENTITY                  COL_ID                         YES
WITH_IDENTITY                  TEXT_INFO                      NO

So, how does GoldenGate alert you to the fact that it will not be replicating this table with an unsupported type? It doesn’t.

look in ggserr.log – nothing
look in the extract report file for wildcard matching the table – nothing

With a table containing an unsupported data type, the lack of replication is silent because GoldenGate is completely ignoring the object. Completely Silently.

So, how could I have picked this up before it became a problem?

Obviously I need to run the script in MOS article 1296168.1, which analyzes your schema for GoldenGate compatiblity. Unfortunately, Oracle have not updated this to account for IDENTITY columns, so there is no indication that it does not work!

The script to run is the GoldenGate Integrated Extract Healthcheck script, found in MOS article 1448324.1

Finally some vague and unclear indication that we may have a problem. Part way through the report we get this:

++ TABLES SUPPORT BY GOLDENGATE Integrated Capture ++
Lists tables that can not be supported by OGG (NONE)
Lists table that are supported via OGG FETCH (ID KEY)

Owner           OBJECT_NAME             SUPPOR  Container ID
C##GOLDENGATE 	AQ$_OGG$Q_TAB_E_HR_C 	NONE 	1
C##GOLDENGATE 	AQ$_OGG$Q_TAB_E_HR_D 	NONE 	1
C##GOLDENGATE 	OGG$Q_TAB_E_HR 	        NONE 	1
HR              WITH_IDENTITY         ID KEY     3
IX 	        AQ$_ORDERS_QUEUETABLE_G NONE 	3

Wow.
Always take care with GoldenGate – it’s not just a plug-in solution. You need to think about the implementaiton and work with the application to get the best outcomes.

Getting Started with GoldenGate

gettingstartedwithgoldengateNot a blog post per-se, but a link to my article in the UKOUG quarterly magazine OracleScene. It is a great publication with some simply fantastic articles. Its completely free, and is available online as well as in print at Oracle offices in the UK and at UKOUG events. My article will take you through a simple step-by-step Oracle centric one way replication on VM Virtual Box. From nothing to fully working replication. Sexy! Hope you enjoy it.

UKOUG RDBMS and RAC-CIA Special Interest Groups

On Thursday 21st April, there is a dual UKOUG Database and RAC, Cloud, Infrastructure and Availability special interest group.

For the first time, this event is being held in the fabulous Northern city of Manchester!

There are a dozen interesting, career-assisting, educational talks from end users, Oracle employees and a number of well known Oracle ACE’s at all levels, including Carl Dudley, Jonathan Lewis, Phil Brown and myself.

I will be talking about how to troubleshoot Goldengate, showing optimal configurations to assist with problem determination and a bit of staring at Hex dumps for the brave.

There are only a few places left for this popular dual-stream event. Click Here for more details about the talks and speakers, and details for registration.

See you there!

Goldengate Data Manipulation – When Inserts & Updates differ

One very useful aspect of Golden Gate is to allow the manipulation of data between the source and the destinations.

One recent problem that I encountered was to alter the data differently for inserts than for updates. This was caused by the receiving system needing to have some default data in columns which may or may not be supplied by the insert or update statements. This is slightly more complex than first imagined:

  • If we have an INSERT and the column value IS NULL, or the column value IS NOT SUPPLIED by the insert statement, we should set the default for that column.
  • If we have an UPDATE and the column value IS NULL, then we should set the default for that column.
    We must NOT set the default if the UPDATE does not supply the column, otherwise we may incorrectly overwrite data in the target system.

The first problem is that, by default, you are not allowed to have more than one table mapping per table. To get around this, you need to use the “ALLOWDUPTARGETMAP” parameter. You can then add multiple mappings.

You need to be aware that each mapping will fire for each transaction action. If you have 2 active table mappings for the same table, you will end up with 2 inserts/updates/deletes. Get this mapping wrong and your data integrity will be destroyed, and you will get a lot of constraint errors. In this case we have 2 mappings, one for inserts and one for updates and deletes. I need to use the get/ignore commands to indicate which actions each mapping should use.

The following example was for a data pump, but it is valid to do this for all extracts and replicats.

 

-- ggsci: add extract p_neil, exttrailsource /u02/gg/bin12/dirdat/NE, BEGIN NOW
-- ggsci: add rmttrail ./dirdat/NP extract p_neil megabytes 100

EXTRACT p_neil
USERID owner_goldengate@DB_LOCAL PASSWORD password
passthru
TARGETDEFS ./dirdef/defgen.neil.def
RMTHOST remote.server.world mgrport 7809
RMTTRAIL ./dirdat/NP

-- So we can have multiple mappings for a single table. This is a dangerous parameter!
ALLOWDUPTARGETMAP

-- FOR INSERTS - REPLACE MISSING COLUMNS
getinserts
ignoreupdates
ignoredeletes

TABLE NCHA.NEIL, TARGET NCHA.NEIL
COLMAP (usedefaults, &
C2 = @IF (@COLTEST (C2 , NULL, MISSING) , '1900-01-01:00:00:00.000000' , C2 ), &
C3 = @IF (@COLTEST (C3 , NULL, MISSING) , '1900-01-01:00:00:00.000000' , C3 ), &
);

-- FOR UPDATES - IGNORE MISSING COLUMNS
-- We will do the deletes here too. If they are supplied as NULL they should be modified
-- You may need to do a separate section for deletes depending upon your rules.
ignoreinserts
getupdates
getdeletes

TABLE NCHA.NEIL, TARGET NCHA.NEIL
COLMAP (usedefaults, &
C2 = @IF (@COLTEST (C2 , NULL) , '1900-01-01:00:00:00.000000' , C2 ), &
C3 = @IF (@COLTEST (C3 , NULL) , '1900-01-01:00:00:00.000000' , C3 ), &
);

-- And back to normal for subsequent table mappings
getinserts
getupdates
getdeletes

Goldengate Log Rotation

Golden Gate 12 has some excellent commands to keep your log files in check, plus one glaring omission (scheduled for a future enhancement)

Each extract, datapump and replicat will be writing to report (.rpt) and discard (.dsc) files in the dirrpt directory (if you aren’t specifying a discard file, you should. They are very useful for troubleshooting)
If your system is up for a long time, these files are going to get large. Oracle has realised this and provides some lovely in-built log rotation commands. To keep my parameter (.prm) files nice and neat and consistent, I use include files, and this is a perfect case for a standard include.

report.prm:

-- Standard include commands for ALL extracts and replicats to ensure they are aligned
-- Write the days stats out to the file at the end of every day.
-- Roll the file over every week
-- Report just how much throughput we have every 15 minutes
-- History: 14.04.2015 N Chandler 
-- place the command include/dirprm/report.prm in your parameter files

STATOPTIONS REPORTDETAIL, RESETREPORTSTATS
REPORT AT 23:59
REPORTROLLOVER AT 00:01 ON MONDAY
REPORTCOUNT EVERY 15 MINUTES, RATE
-- or if you would rather by volume...
--REPORTCOUNT EVERY 10000000 RECORDS, RATE

I think it’s worth pointing out here what the 2 bracketed throughput numbers output by the REPORTCOUNT commands mean, as you’ll struggle to find it in the documentation

Rate  = number of records processed per second since startup divided by the total time since startup of the extract/replicat
Delta = number of records processed per second since last report divided by time since last report (in this case, 15 minutes)

 

 

There is 1 notable growing file which you cannot rotate using Goldengate commands: ggserr.log

This is a significant oversight by Oracle and will be rectified in a future release, but as of 12.1 you have to manually sort this out. You have 2 main options to do this:

1. Stop the manager, rename the file, restart the manager
2. Copy the file to a new file and then empty the in-place file by catting /dev/null into it. (I’m sure there’s a Windows equivalent of this, but I mainly work on Unix)
* DO NOT simply delete the file while the manager is running.
All future error output will drop into a “black hole” until the manager is restarted.   Option 2 tends to be preferably, so here’s part of a bash script I use to perform this action

#!/bin/bash
# rotate_ggserr_log.sh - copies the logfile to one side with a date suffix and blows away the current file
# but leaves it in place to we can continue to write to it with the manager.
# Neil Chandler 14.04.2015 created
#
today=`date +%Y%m%d`

-- Check to see if we have already rolled-over today
if [ -e /u99/gg/bin/ggserr.log.${today} ]
then
 echo "File /u99/gg/bin/ggserr.log.${today} exist already. Stopping."
else
 # copy the log file preserving attributes
 /bin/cp -pnv /u99/gg/bin/ggserr.log /u99/gg/bin/ggserr.log.${today}

 # See if there is a difference - did you copy it successfully?
 diff /u99/gg/bin/ggserr.log /u99/gg/bin/ggserr.log.${today}
 RC=$?

 # If there is no difference, wipe the ggserr.log file out
 # otherwise stop!
 if [ ${RC} -eq 0 ]
 then
  echo "clear the file /u99/gg/bin/ggserr.log"
  cat /dev/null > /u99/gg/bin/ggserr.log
  exit ${RC}
 else
  echo "Error - cannot clear file /u99/gg/bin/ggserr.log as it's not the same as the copied version. Stopping."
  exit ${RC}
 fi
fi

Goldengate: Problems with character sets

One complication that you may face with replicating data using Goldengate (or other tools) is when your source character set is different to your destination character set. This is particularly true when the source character set is UTF-8 and the destination is not.

If the application does not sanitise (or you do not want to sanitise) inputs to restrict them to the lowest common denominator within your systems, you will need to ensure that you take action to ensure the source data is fed appropriately to the destination systems.

I recently experienced this at a client, but the special measures taken by me to allow Oracle UTF-8 data into a SQL Server database using a standard Windows-1252 character set also hit a Goldengate codepath bug, recreated here:

The source table, TAB1, has 3 columns for these purposes:
 

 ID     number
 NAME   varchar2(50)
 COL_TS timestamp

 
The source table is allowed to contain NULLS, but the destination table must not, so a null value test is specified in the COLMAP in the REPLICAT.

To cope with the character set conversion, the parameter REPLACEBADCHAR SPACE is specified in the replication. This states that is there are ANY characters in the trail file which the destination database cannot store, then that character should be converted to (in this instance) a space.

REPLICAT snippet:

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS,
            NAME =@IF(@COLTEST(NAME, NULL), ' ' ,NAME));

 
All processing progressed nicely, with the NULLs entered into TAB1.NAME being converted into a single space, until an unexpected character was pasted into the screen on the source system and the REPLICAT abended:

2015-02-25 22:32:00 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:00 WARNING OGG-01503 Aborting BATCHSQL transaction. Mapping error.
2015-02-25 22:32:01 WARNING OGG-01137 BATCHSQL suspended, continuing in normal mode.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:01 WARNING OGG-01431 Aborted grouped transaction on 'dbo.TAB1', Mapping error.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-01151 Error mapping from SCHEMA_OWNER.TAB1 to dbo.SUP_TAB1.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.

Source Context :
SourceModule : [er.errors]
SourceID : [er/errors.cpp]
SourceFunction : [take_rep_err_action]
SourceLine : [682]
ThreadBacktrace : [12] elements
: [Z:\gg12\gglog.dll(?CreateMessage@CMessageFactory@@QEAAPEAVCMessage@@PEAVCSourceContext@@IZZ+0x886) [0x000007FEF00809D6]]
: [Z:\gg12\gglog.dll(?_MSG_ERR_MAP_TO_TANDEM_FAILED@@YAPEAVCMessage@@PEAVCSourceContext@@AEBV?$CQualDBObjName@$00@ggapp@gglib@ggs@@1W4MessageDisposition@CMessageFactory@@@Z+0x81) [0x000007FEF0043631]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x733c) [0x000000013F6E96BC]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x2fe7a) [0x000000013F7121FA]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a575) [0x000000013F74C8F5]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xea23) [0x000000013F7F2323]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe000) [0x000000013F7F1900]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe8cd) [0x000000013F7F21CD]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a5f9) [0x000000013F74C979]]
: [Z:\gg12\replicat.exe(CommonLexerNewSSD+0xc0d2) [0x000000013F8862D2]]
: [C:\Windows\system32\kernel32.dll(BaseThreadInitThunk+0xd) [0x00000000773B652D]]
: [C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+0x21) [0x00000000774EC541]]

2015-02-25 22:32:01 ERROR OGG-01296 Error mapping from SCHEMA_OWNER.TAB1 to dbo.TAB1.

Looking in the Goldengate discard file (always a good place to start when you have a GG problem), you can see the problem character “ef bf bd”:

Oracle GoldenGate Delivery for SQL Server process started, group REP_SQL discard file opened: 2015-02-11 22:02:42.389000
Mapping error to target column: NAME
Mapping error to target column: NAME
Mapping error to target column: NAME
Current time: 2015-02-25 22:32:01
Discarded record from action ABEND on error 0

Aborting transaction on ./dirdat/NC beginning at seqno 2 rba 123
error at seqno 2 rba 123
Problem replicating SCHEMA_OWNER.TAB1 to dbo.TAB1
Mapping problem with insert record (source format)...
*
ID = 123
000000: 31 32 33  |123     |

NAME = NEIL \uFFFD CHA
000000: 4E 45 49 4C 20 ef bf bd 20 43 48 41 | NEIL ... CHA|

COL_TS = 2015-02-25:22:31:56.303000000
000000: 32 30 31 35 2d 30 32 2d 32 35 3a 32 32 3a 33 31 |2015-02-25:22:31|
000010: 3a 35 36 2e 33 30 33 30 30 30 30 30 30          |:56.303000000   |
*

Process Abending : 2015-02-25 22:32:01

So, why didn’t REPLACEBADCHAR catch this and turn the offending character into a space? There’s a clue in the ABEND report information

WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 

The column is referred-to as @IF(), not as NAME. A quick scan of MOS show that this appears to be BUG 19818362 “Column function execution was happening internally under NOCHARSETCONVERSION cases” – it’s going through the wrong codepath for REPLCEBADCHAR to work. And this bug fix was released 10 days before this problem was encountered. Result!

The short-term fix? Remove the data manipulation from the REPLICAT

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS);

 

Start and run the REPLICAT until past the problem, then revert the REPLICAT back to data manipulation until either Goldengate is patched and tested or we have to repeat this exercise due to another Unicode character problem occurs.

UKOUG Tech 14

On Sunday I will be heading North from London to Liverpool for 4 days, to attend another UK Oracle User Group conference – #UKOUG_Tech14

I’m sure it will be as wonderful and informative a 4 days as you can get in the Oracle technical area. The hard part of attending is working out what and who to see.

I will be presenting there again – this time a talk on Goldengate late on the final day. I just need to get my slides a little more polished…

Hopefully I’ll see you there. Please say Hi! I am fairly social and mostly house trained. But no stalking, OK.

Goldengate OGG-01223 (Version) Problem

Just implementing Goldengate between a platform I don’t understand, a Tandem/HP non-stop, and Oracle 11G R2 RAC. So, I spend the day trying to get it working, have all of the configuration seemingly correct and when the Tandem guy an I try to for a connection to send data over, we keep getting the following warning (and no data):

WARNING OGG-01223  Oracle GoldenGate Collector for Oracle:  Could not find definition for <Tandem-table-thing>

So, after extensive tweaks, mods, changes, banging heads off desks and general disbelief, I spend a while Googling for (or, more accurately, duckduckgo-ing – I’m sick of being the commodity), and on MOS, and noticed a problem between Version 11.2 of GG and 11.1. It was in no way related to the problem we had, but it prompted me into a though; exactly which versions of GG are on each platform? There was a minor discrepancy of versions.

OK – download GG 11.1.1.0.0 for Linux to match the Tandem version. Install. Replay the Configuration. Run. Works. Simples.

Lesson: Check your versions, check your compatibility. Where possible, keep them absolutely perfectly aligned and you might not waste 1/2 day troubleshooting. Grrrr.