Goldengate Install Error

Sometimes you waste much more time than you can believe because the instrumentation of the system isn’t great. If you don’t know what a system is doing, you can’t easily fix it. Time should be take to instrument your code, and ensure that any outputs from that instrumentation are readily understandable by others.

Here’s a mild frustration from this week that I encountered. I was installing Goldengate…

./runInstaller -silent -waitforcompletion -responseFile /u01/migrate/gg/software/fbo_ggs_Linux_x64_shiphome/Disk1/install_source.rsp
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 120 MB. Actual 21571 MB Passed
Checking swap space: must be greater than 150 MB. Actual 24575 MB Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2020-07-30_08-03-36PM. Please wait ...
[WARNING] [INS-08109] Unexpected error occurred while validating inputs at state 'installOptions'.
CAUSE: No additional information available.
ACTION: Contact Oracle Support Services or refer to the software manual.
SUMMARY:
- java.lang.NullPointerException

Oh my word! ACTION: Contact Oracle Support Services! java.lang.NullPointerException! Something is terribly wrong!

I’d better re-read this error message carefully… Unexpected error occurred while validating inputs at state ‘installOptions’. What? Validating inputs? Lets have a look at the inputs from the response file:

install_source.rsp

################################################################################
## ##
## Oracle GoldenGate installation option and details ##
## ##
################################################################################

#-------------------------------------------------------------------------------
# Specify the installation option.
# Specify ORA19c for installing Oracle GoldenGate for Oracle Database 19c or
# ORA18c for installing Oracle GoldenGate for Oracle Database 18c or
# ORA12c for installing Oracle GoldenGate for Oracle Database 12c or
# ORA11g for installing Oracle GoldenGate for Oracle Database 11g
#-------------------------------------------------------------------------------
INSTALL_OPTION=ORA11G

#-------------------------------------------------------------------------------
# Specify a location to install Oracle GoldenGate
#-------------------------------------------------------------------------------
SOFTWARE_LOCATION=/u01/gg

START_MANAGER=
MANAGER_PORT=
DATABASE_LOCATION=
INVENTORY_LOCATION=
UNIX_GROUP_NAME=

 

I only have 2 inputs; the INSTALL_OPTION and the SOFTWARE_LOCATION… and the INSTALL_OPTION is wrong! This field is case sensitive. ORA11G does not match ORA11g, so I get a dramatic null pointer exception and a failure. It would have been a much quicker remediation if the Oracle error had been curated a little better and the programmer a little more sympathetic to users of their code.

How about:
Installation parameter INSTALL_OPTION invalid,
rather than
Unexpected error occurred while validating inputs at state ‘installOptions’

Online Index Rebuild Problem in 12C/18C/19C

When building an index in Oracle Enterprise edition using the keyword “ONLINE”, if the index build fails for whatever reason a messy set of extents can get left behind.

In Oracle 12.1 [update still there in 12.2 and 18C and 19.6] there’s a job (located in DBA_SCHEDULER_JOBS) to help with this: “SYS”.”CLEANUP_ONLINE_IND_BUILD” which ticks away, undocumented, in the background cleaning up “rogue” extents once an hour, every hour (start time based upon job creation time)However, if I run a CREATE INDEX … ONLINE command on a table and the index creation happens to coincide with the invocation of the cleanup job I get the following error in my alert log

ADR Home = /u01/app/oracle/diag/rdbms/orcl/orcl1:
*************************************************************************
2017-05-12 05:28:37.191123 -04:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j000_12235.trc:
ORA-12012: error on auto execute of job "SYS"."CLEANUP_ONLINE_IND_BUILD"
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
ORA-06512: at "SYS.DBMS_PDB", line 76
ORA-06512: at line 4

Oh no! The job has failed to clean up the extents that I was currently using. Phew!

The CREATE INDEX … ONLINE command completed with success a few minutes later.

A few minutes after that, an update statement failed with:

ORA-08102 index key not found for obj# 107102 file 16 block 1171234

Yes, I have a corrupt index! index_fadeMy assumption is that the cleanup job had removed all of the extents which had been allocated before the cleanup job started. The index create statement did not notice as the current extent removal attempt blocked (hence the job error) and future to-be created extents were not affected. However, most of the index had been “cleaned up”. #sadface

Solution: re-create the index and make sure the index creation is complete before your hourly job kicks in. Better still, disable the undocumented hourly job… make sure get the full backing of an Oracle Support SR first!

Next Step: Raise SR with Oracle about this!

Version Check 12/11/2018:

select banner_full from v$version

BANNER_FULL
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Oracle Database 18c EE High Perf Release 18.0.0.0.0 - Production
Version 18.1.0.0.0


21:17:19 SYS @ ORCL > select owner,job_name,start_date,repeat_interval,state from dba_scheduler_jobs where job_name = 'CLEANUP_ONLINE_IND_BUILD';

OWNER JOB_NAME START_DATE REPEAT_INTERVAL STATE
------------------------------ ------------------------------ ----------------------------------- ------------------------------ ---------------
SYS CLEANUP_ONLINE_IND_BUILD 07-FEB-18 08.13.44.415343 PM +00:00 FREQ = HOURLY; INTERVAL = 1 SCHEDULED

 

 

MOS Solution Note 2280374.1 is not helpful:

Solution

Rebuild the index at a time other than the time frame when “SYS”.”CLEANUP_ONLINE_IND_BUILD” is running as auto job. #facepalm #whatifittakesover1hourtobuild

 

Stats Collection Time Anomaly

Johnathan Lewis (@JLOracle) recently published a short post about Stats Collection Time, talking about the table dba_optstat_operation (and dba_optstat_operation_tasks ), which reminded me about (what I regard as) an anomaly in the output in the NOTES columns in Oracle 12C.

I won’t repeat why it’s useful to check these tables as Johnathans note and @MDWidlakes’s comment here should give you all you need to know.

The DBA_OPTSTAT_OPERATION.NOTES column contains the parameters passed into the DBMS_STATS command, so you know what was done. It also reports the DEFAULT used by the DBMS_STATS job. Well, it does if you call DBMS_STATS explicitly, but the standard overnight auto job just says “default”. Why doesn’t is expand on that the way the explicit call does? If the default was changed between runs, you may end up with very different results but with no indication why. Am I missing something?

The following 2 rows of data show the output from each run. Note that the DEFAULT for METHOD_OPT in this database has been changed from “FOR ALL COLUMNS SIZE AUTO” to “FOR ALL COLUMNS SIZE REPEAT”** but was not explicitly passed-in for either run.

DBMS_STATS.GATHER_SCHEMA_STATS – decodes the DEFAULTs

OPERATION : gather_schema_stats            
TARGET    : MYSCHEMA     
START_TIME: 15-SEP-16 07.04.47 
END_TIME  : 15-SEP-16 07.09.02 
STATUS    : COMPLETED                                   
JOB_NAME  : 
NOTES     : <params>
            <param name="block_sample" val="FALSE"/>
            <param name="cascade" val="NULL"/>
            <param name="concurrent" val="FALSE"/>
            <param name="degree" val="NULL"/>
            <param name="estimate_percent" val="DBMS_STATS.AUTO_SAMPLE_SIZE"/>
            <param name="force" val="FALSE"/>
            <param name="gather_fixed" val="FALSE"/>
            <param name="gather_temp" val="FALSE"/>
            <param name="granularity" val="AUTO"/>
            <param name="method_opt" val="FOR ALL COLUMNS SIZE REPEAT"/>
            <param name="no_invalidate" val="NULL"/>
            <param name="options" val="GATHER"/>
            <param name="ownname" val="MYSCHEMA"/>
            <param name="reporting_mode" val="FALSE"/>
            <param name="statid" val=""/>
            <param name="statown" val=""/>
            <param name="stattab" val=""/>
            <param name="stattype" val="DATA"/>
            </params>

Autotask Overnight Gather – doesn’t decode the DEFAULTs

OPERATION : gather_database_stats (auto)   
TARGET    : AUTO       
START_TIME: 15-SEP-16 22.01.20 
END_TIME  : 15-SEP-16 22.38.40 
STATUS    : COMPLETED            
JOB_NAME  : ORA$AT_OS_OPT_SY_1212  
NOTES     : <params>
            <param name="block_sample" val="FALSE"/>
            <param name="cascade" val="NULL"/>
            <param name="concurrent" val="FALSE"/>
            <param name="degree" val="DEFAULT_DEGREE_VALUE"/>
            <param name="estimate_percent" val="DEFAULT_ESTIMATE_PERCENT"/>
            <param name="granularity" val="DEFAULT_GRANULARITY"/>
            <param name="method_opt" val="DEFAULT_METHOD_OPT"/>
            <param name="no_invalidate" val="DBMS_STATS.AUTO_INVALIDATE"/>
            <param name="reporting_mode" val="FALSE"/>
            <param name="stattype" val="DATA"/>
            </params>

 

**You should control the histograms that you need to maintain your schema stats as you would like Oracle to see them. The Oracle default approach of lots of histograms can be costly to maintain and store, and any stats which use Adaptive Sampling (all prior to 12C, and Height-Balanced / Hybrid in 12C onwards) carry a risk, especially in OLTP systems.
FOR ALL COLUMNS SIZE REPEAT was useful until Oracle12, when a change in the algorithm makes it dangerous, and it should NOT be used.  You should be using GLOBAL_STATS_PREFS and TABLE_STATS_PREFS to control your stats.

 

Accessing a user when you don’t know the password

There are times that you may need to logon to a database user, probably a schema owner to do a release, but you don’t know the password. You may not be able to (easily) change the password as it could be embedded in application connect strings or worse.

If may not be possible simply to change your session using alter session set current_schema=<schema-to-be-changed>; to auto-prefix all of your selects with the schema, especiually if the release references “USER_” views, which is unaffected by the session setting.

You need to become the account.

So, what you need to do is record the current password encryption, change the password, logon and do your maintenance, logoff and change the password back!

And this is how you do it:
Create an account:

04:38:35 SYS @ ORCL01 > create user hackme identified by password1;

User created.

04:38:35 SYS @ ORCL01 > grant connect,resource to hackme;

Grant succeeded.

Grab the encryption.This is stored in SYS.USER$.SPARE4 plus SYS.USER$.PASSWORD:

04:38:35 SYS @ ORCL01 > select name,'alter user '||name||' identified by values '''||spare4||';'||password||''';' command from sys.user$ where name = 'HACKME'
04:38:35   2  /

NAME       COMMAND
---------- ------------------------------------------------------------------------------------------------------------------------
HACKME     alter user HACKME identified by values 'S:59F38E64D3914BB9396C5D4B968380676333EA7CB34F2471A85C4770A7BA;H:2D3693D1357CF012D9A11EFE3D792C0C;T:B2261F70475F3BD6173867C68427E346C53216E3EC305121DDAF4E13E72E6889DF1E314934F3C5F46E5F12B82D8AC144955C937413FD192904A2762D66B31A872429AB78E72AFC2BC4101E68DB5903A6;4345E749C3EBB34A';

Now we can change the password, logon with the new password, logoff back to a DBA and change it back using the previously captured command

04:38:35 SYS @ ORCL01 > alter user hackme identified by hacker;

User altered.

04:38:35 SYS @ ORCL01 > connect hackme/hacker;
Connected.

04:38:35 HACKME @ ORCL01 > show user
USER is "HACKME"

04:38:35 HACKME @ ORCL01 > connect sys/oracle as sysdba
Connected.

04:38:35 SYS @ ORCL01 > alter user HACKME identified by values 'S:59F38E64D3914BB9396C5D4B968380676333EA7CB34F2471A85C4770A7BA;H:2D3693D1357CF012D9A11EFE3D792C0C;T:B2261F70475F3BD6173867C68427E346C53216E3EC305121DDAF4E13E72E6889DF1E314934F3C5F46E5F12B82D8AC144955C937413FD192904A2762D66B31A872429AB78E72AFC2BC4101E68DB5903A6;4345E749C3EBB34A';
User altered.

04:38:57 SYS @ ORCL01 > conn hackme/password1
Connected.

Magic!

You can also use DBMS_METADATA to get the encryption;

04:39:08 SYS @ ORCL01 >  set long 10000

04:39:08 SYS @ ORCL01 >  select dbms_metadata.get_ddl('USER','HACKME') command from dual;

COMMAND
--------------------------------------------------------------------------------

CREATE USER "HACKME" IDENTIFIED BY VALUES 'S:F299C40420DD341AF9AC4AC89C59A2BB1DFCEF01DB5E3C2B5AD837100117;H:2D3693D1357CF012D9A11EFE3D792C0C;T:101F2A697CA5F77B089C4ECA8EE2DDB82E340D46FE60712445699C5715C3C71BA06532F52CFA987076B51254E5E5A565C44E9F7479018F924707F30874A0BF958D1B8935B7434CF993D3346FF53F28B4;4345E749C3EBB34A'
DEFAULT TABLESPACE "USERS"
TEMPORARY TABLESPACE "TEMP"

Please read the COMMENTS to learn about Proxy Accounts – an (admin) alternative from 10G onwards!

When did I update that row?

I had a requirement the other day to understand when some data had been changed, but there was no column on the table which showed this.

So how do I find out? Well I could go off mining redo and lots of other time consuming and exotic things, but you can use the Oracle Pseudocolumn ORA_ROWSCN. This gives the SCN assocaited with each row. Well, actually it usually doesn’t. It does not show when the individual row of data was changed but, by default, gives the last changed SCN for the block where the row of data lives.

If you want to know with accuracy the SCN for the row change, you need to create your table with the extension “rowdependencies”. This adds a hidden column to each row, taking 6 bytes and storing the SCN on a row-by-row basis with your data. NOTE: This is a CREATE TABLE option only. You can’t modify a table to add rowdependencies, and there are a few restrictions for tables where this is enabled. Check the documentation for your version.

So, we now have the SCN, whether for the BLOCK or the ROW. What good is that? Well, there’s a chance that Oracle will have remembered approximately when that SCN came out. I think you are guaranteed this for about 120 hours – nearly a week – but depending upon a number of factors including the flashback retention and undo retention times.

You can get a rough idea of the SCN time from V$LOG_HISTORY, a more accurate idea from SYS.SMON_SCN_TIME, or just use the SCN_TO_TIMESTAMP function to make your life easier! If you are within 120 hours and have rowdependencies enabled, it will be roughly accurate to the time of the commit, depending upon a couple of factors – please see comments.

Here’s a short worked example to show the sorting of SCN’s in both normal (block) and rowdependency-enabled tables. Note how the ORA_ROWSCN is the same for each row in the same block in the normal table.

 

00:29:34 NEIL @ ORCL01 > create table scn_block (col1 number, col2 date, c_scn number);
Table created.
00:29:34 NEIL @ ORCL01 > create table scn_row (col1 number, col2 date, c_scn number) rowdependencies;
Table created.
00:29:34 NEIL @ ORCL01 > insert into scn_block values (1,sysdate,userenv('commitscn') );
1 row created.
00:29:34 NEIL @ ORCL01 > commit;
Commit complete.
00:29:34 NEIL @ ORCL01 > host sleep 5
00:29:39 NEIL @ ORCL01 > insert into scn_row values (1,sysdate,userenv('commitscn') );
1 row created.
00:29:39 NEIL @ ORCL01 > commit;
Commit complete.
00:29:39 NEIL @ ORCL01 > host sleep 5
00:29:44 NEIL @ ORCL01 > insert into scn_block values (1,sysdate,userenv('commitscn') );
1 row created.
00:29:44 NEIL @ ORCL01 > commit;
Commit complete.
00:29:44 NEIL @ ORCL01 > host sleep 5
00:29:49 NEIL @ ORCL01 > insert into scn_row values (1,sysdate,userenv('commitscn') );
1 row created.
00:29:49 NEIL @ ORCL01 > commit;
Commit complete.
00:29:49 NEIL @ ORCL01 > column dt_1 format a30 truncate
00:29:49 NEIL @ ORCL01 > column dt_2 format a30 truncate
00:29:49 NEIL @ ORCL01 >
00:29:49 NEIL @ ORCL01 > select col1,col2,c_scn,ora_rowscn,scn_to_timestamp(c_scn) dt_1,scn_to_timestamp(ora_rowscn) dt_2 from scn_block;

COL1       COL2      C_SCN      ORA_ROWSCN DT_1                           DT_2
---------- --------- ---------- ---------- ------------------------------ ------------------------------
1          05-DEC-15 3670102    3670149    05-DEC-15 00.29.34.000000000   05-DEC-15 00.29.43.000000000
1          05-DEC-15 3670148    3670149    05-DEC-15 00.29.43.000000000   05-DEC-15 00.29.43.000000000

00:29:49 NEIL @ ORCL01 > select col1,col2,c_scn,ora_rowscn,scn_to_timestamp(c_scn) dt_1,scn_to_timestamp(ora_rowscn) dt_2 from scn_row;

COL1       COL2      C_SCN      ORA_ROWSCN DT_1                           DT_2
---------- --------- ---------- ---------- ------------------------------ ------------------------------
1          05-DEC-15 3670133    3670134    05-DEC-15 00.29.39.000000000   05-DEC-15 00.29.39.000000000
1          05-DEC-15 3670160    3670161    05-DEC-15 00.29.48.000000000   05-DEC-15 00.29.48.000000000

In an interesting convergance, whilst I was doing this, Martin Widlake was looking at the same thing in a slightly different way. How meta.

 

Auditing Read-Only Standbys

For a more up-to-date post about auditing and security, read about Unified Auditing. On a read-only standby, when in Pure mode, the audit trail is written to disk (as it cannot write to a read-only database!) in a binary format which can be shipped across to the Primary occasionally and loaded there using a standard command. Just put the audit in the correct directory and load it automagically.

Original Article
If your company has a passing interest in database security, you are probably running some sort of session auditing at the very least [audit session] (although this can also be useful for troubleshooting connectivity issues). There’s a reasonable chance you’re also running some level of object auditing, or even DML access auditing if your employer so dictates:

Check access/change of objects in the DB
  1  select audit_option, success, failure
  2  from dba_stmt_audit_opts
  3  union
  4  select privilege, success, failure
  5  from dba_priv_audit_opts
  6* order by 2,1
AUDIT_OPTION                                       SUCCESS                                  FAILURE
-------------------------------------------------- ---------------------------------------- -----------------
ALTER ANY PROCEDURE                                BY ACCESS                                BY ACCESS
ALTER ANY TABLE                                    BY ACCESS                                BY ACCESS
ALTER DATABASE                                     BY ACCESS                                BY ACCESS
ALTER PROFILE                                      BY ACCESS                                BY ACCESS
ALTER SYSTEM                                       BY ACCESS                                BY ACCESS
ALTER TABLE                                        BY ACCESS                                BY ACCESS
ALTER USER                                         BY ACCESS                                BY ACCESS
AUDIT SYSTEM                                       BY ACCESS                                BY ACCESS
CREATE ANY JOB                                     BY ACCESS                                BY ACCESS
CREATE ANY LIBRARY                                 BY ACCESS                                BY ACCESS
CREATE ANY PROCEDURE                               BY ACCESS                                BY ACCESS
CREATE ANY TABLE                                   BY ACCESS                                BY ACCESS
CREATE EXTERNAL JOB                                BY ACCESS                                BY ACCESS
CREATE LIBRARY                                     BY ACCESS                                BY ACCESS
CREATE PROCEDURE                                   BY ACCESS                                BY ACCESS
CREATE PUBLIC DATABASE LINK                        BY ACCESS                                BY ACCESS
CREATE SESSION                                     BY ACCESS                                BY ACCESS
CREATE TABLE                                       BY ACCESS                                BY ACCESS
CREATE USER                                        BY ACCESS                                BY ACCESS
DATABASE LINK                                      BY ACCESS                                BY ACCESS
DIRECTORY                                          BY ACCESS                                BY ACCESS
DROP ANY PROCEDURE                                 BY ACCESS                                BY ACCESS
DROP ANY TABLE                                     BY ACCESS                                BY ACCESS
DROP PROFILE                                       BY ACCESS                                BY ACCESS
DROP USER                                          BY ACCESS                                BY ACCESS
EXEMPT ACCESS POLICY                               BY ACCESS                                BY ACCESS
GRANT ANY OBJECT PRIVILEGE                         BY ACCESS                                BY ACCESS
GRANT ANY PRIVILEGE                                BY ACCESS                                BY ACCESS
GRANT ANY ROLE                                     BY ACCESS                                BY ACCESS
PROFILE                                            BY ACCESS                                BY ACCESS
PUBLIC DATABASE LINK                               BY ACCESS                                BY ACCESS
PUBLIC SYNONYM                                     BY ACCESS                                BY ACCESS
ROLE                                               BY ACCESS                                BY ACCESS
SYSTEM AUDIT                                       BY ACCESS                                BY ACCESS
SYSTEM GRANT                                       BY ACCESS                                BY ACCESS
USER                                               BY ACCESS                                BY ACCESS
CREATE JOB                                         BY SESSION                               BY SESSION

And you’re probably writing it into a database table [AUDIT_TRAIL=’DB’]

so how does that work if you open a Dataguard database read only? You are writing into sys.aud$ on the Primary, and that table is replicated to the Standby. So what happens?

From Oracle 11G, if you are running Dataguard and opening the standby up for read access, you may not notice the line in the ALERT log which reads:

AUDIT_TRAIL initialization parameter is changed to OS, as DB is NOT compatible for database opened with read-only access

So it starts writing down O/S audit trail files for all of your auditing options (well, the session connect and DML audit options – you can’t run DDL in a r/o DB). You might want to go and see just how many files it has written to [audit_file_dest], as you may be surprised at just how many are in there.

You may, one day, either run out of space or (more worryingly) have so many millions of files that it causes a performance problem when Oracle access the O/S directory. You might want to think about some sort of periodic clean-up job.

Goldengate: Problems with character sets

One complication that you may face with replicating data using Goldengate (or other tools) is when your source character set is different to your destination character set. This is particularly true when the source character set is UTF-8 and the destination is not.

If the application does not sanitise (or you do not want to sanitise) inputs to restrict them to the lowest common denominator within your systems, you will need to ensure that you take action to ensure the source data is fed appropriately to the destination systems.

I recently experienced this at a client, but the special measures taken by me to allow Oracle UTF-8 data into a SQL Server database using a standard Windows-1252 character set also hit a Goldengate codepath bug, recreated here:

The source table, TAB1, has 3 columns for these purposes:
 

 ID     number
 NAME   varchar2(50)
 COL_TS timestamp

 
The source table is allowed to contain NULLS, but the destination table must not, so a null value test is specified in the COLMAP in the REPLICAT.

To cope with the character set conversion, the parameter REPLACEBADCHAR SPACE is specified in the replication. This states that is there are ANY characters in the trail file which the destination database cannot store, then that character should be converted to (in this instance) a space.

REPLICAT snippet:

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS,
            NAME =@IF(@COLTEST(NAME, NULL), ' ' ,NAME));

 
All processing progressed nicely, with the NULLs entered into TAB1.NAME being converted into a single space, until an unexpected character was pasted into the screen on the source system and the REPLICAT abended:

2015-02-25 22:32:00 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:00 WARNING OGG-01503 Aborting BATCHSQL transaction. Mapping error.
2015-02-25 22:32:01 WARNING OGG-01137 BATCHSQL suspended, continuing in normal mode.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:01 WARNING OGG-01431 Aborted grouped transaction on 'dbo.TAB1', Mapping error.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-01151 Error mapping from SCHEMA_OWNER.TAB1 to dbo.SUP_TAB1.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.

Source Context :
SourceModule : [er.errors]
SourceID : [er/errors.cpp]
SourceFunction : [take_rep_err_action]
SourceLine : [682]
ThreadBacktrace : [12] elements
: [Z:\gg12\gglog.dll(?CreateMessage@CMessageFactory@@QEAAPEAVCMessage@@PEAVCSourceContext@@IZZ+0x886) [0x000007FEF00809D6]]
: [Z:\gg12\gglog.dll(?_MSG_ERR_MAP_TO_TANDEM_FAILED@@YAPEAVCMessage@@PEAVCSourceContext@@AEBV?$CQualDBObjName@$00@ggapp@gglib@ggs@@1W4MessageDisposition@CMessageFactory@@@Z+0x81) [0x000007FEF0043631]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x733c) [0x000000013F6E96BC]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x2fe7a) [0x000000013F7121FA]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a575) [0x000000013F74C8F5]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xea23) [0x000000013F7F2323]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe000) [0x000000013F7F1900]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe8cd) [0x000000013F7F21CD]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a5f9) [0x000000013F74C979]]
: [Z:\gg12\replicat.exe(CommonLexerNewSSD+0xc0d2) [0x000000013F8862D2]]
: [C:\Windows\system32\kernel32.dll(BaseThreadInitThunk+0xd) [0x00000000773B652D]]
: [C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+0x21) [0x00000000774EC541]]

2015-02-25 22:32:01 ERROR OGG-01296 Error mapping from SCHEMA_OWNER.TAB1 to dbo.TAB1.

Looking in the Goldengate discard file (always a good place to start when you have a GG problem), you can see the problem character “ef bf bd”:

Oracle GoldenGate Delivery for SQL Server process started, group REP_SQL discard file opened: 2015-02-11 22:02:42.389000
Mapping error to target column: NAME
Mapping error to target column: NAME
Mapping error to target column: NAME
Current time: 2015-02-25 22:32:01
Discarded record from action ABEND on error 0

Aborting transaction on ./dirdat/NC beginning at seqno 2 rba 123
error at seqno 2 rba 123
Problem replicating SCHEMA_OWNER.TAB1 to dbo.TAB1
Mapping problem with insert record (source format)...
*
ID = 123
000000: 31 32 33  |123     |

NAME = NEIL \uFFFD CHA
000000: 4E 45 49 4C 20 ef bf bd 20 43 48 41 | NEIL ... CHA|

COL_TS = 2015-02-25:22:31:56.303000000
000000: 32 30 31 35 2d 30 32 2d 32 35 3a 32 32 3a 33 31 |2015-02-25:22:31|
000010: 3a 35 36 2e 33 30 33 30 30 30 30 30 30          |:56.303000000   |
*

Process Abending : 2015-02-25 22:32:01

So, why didn’t REPLACEBADCHAR catch this and turn the offending character into a space? There’s a clue in the ABEND report information

WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 

The column is referred-to as @IF(), not as NAME. A quick scan of MOS show that this appears to be BUG 19818362 “Column function execution was happening internally under NOCHARSETCONVERSION cases” – it’s going through the wrong codepath for REPLCEBADCHAR to work. And this bug fix was released 10 days before this problem was encountered. Result!

The short-term fix? Remove the data manipulation from the REPLICAT

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS);

 

Start and run the REPLICAT until past the problem, then revert the REPLICAT back to data manipulation until either Goldengate is patched and tested or we have to repeat this exercise due to another Unicode character problem occurs.

Checking the alert log – the easy way

Do you check the alert log of your databases every day? In the morning when you get in? But what about the alerts which happen during the day? How do you spot them – especially if you don’t have Grid Control or Cloud Control configured. Even if you do have a full monitoring solution, this can be useful for a belt-and-braces approach.

Here’s a short bash shell script to use adrci to read through each ORACLE_HOME (for a DIAG location) and check every alert log contained therein, using adrci pattern matching functionality to search for problems. I usually schedule it within each host (using cron) to minimise the moving parts, and therefore minimise the opportunity for it to stop working. Any problems, and I get an email. I hope you find it useful. I usually keep it in /opt/oracle/bin, but you stick it in your script home of choice.

This should work for 11G and 12C database (tested to 12.1.0.2), unless I’ve made a cut/paste error 🙂

#!/bin/bash
#########################################################################################
# Description: Read each Oracle Home directory. Run adrci matching for problems
# Author : N Chandler.2014-03-28
#
# crontab : # Check Alert Log 30.03.2014
# 00,30 * * * * /opt/oracle/bin/adrci_alert.sh > /opt/oracle/bin/log/adrci.cron.log 2>&1
#
#########################################################################################
# Which HOME?
 export ORACLE_HOME=/opt/app/oracle/product/11g
 export DIAG_LOC=/opt/app/oracle/diag/rdbms
# Who gets the alert?
 export RECIPIENT='neil@chandler.uk.com'
# Other Variables
 export LD_LIBRARY_PATH=$ORACLE_HOME/lib
 export HOST=`hostname`
 export PATH=$ORACLE_HOME/bin:$PATH
 export NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'
 export SUBJECT="Oracle ALERTS on ${HOST} OK"
 export LOG=/tmp
 export ALERT=$LOG/error.txt

# Write the alert log message header for the email
 echo "${HOST} `date +%Y-%m-%d.%H:%M:%S-%Z`" > ${ALERT}
 echo "All alerts in ADRCI Alert log for the last 30 minutes" >> ${ALERT}
 echo "THIS ALERT WILL NOT BE REPEATED!!! TAKE ACTION NOW!!!" >> ${ALERT}
 echo "Follow-up on this email and check the alert log on ${HOST}" >> ${ALERT}

# find out the homes
 adrci_homes=( $(adrci exec="show homes" | grep -e rdbms -e asm))

# run through Each home found and examine the alert log
# Here we are looking for ORA- messges, Deadlock, anything which raises an incident or anything which is instance-level
# IN THE LAST 30 MINUTES (1/48), so we need to run this code every 30 minutes or we may miss something. 
 for adrci_home in ${adrci_homes[@]}
 do
   echo "Checking: ${adrci_home}" >> ${ALERT}
   echo $adrci_home' Alert Log' >> ${ALERT}
   adrci exec="set home ${adrci_home} ; show alert -p \\\"(message_text like '%ORA-%' or message_text like '%Deadlock%' or message_text like '%instance%' or message_text like '%incident%') and originating_timestamp>=systimestamp-(1/48) \\\"" -term >>${ALERT}
 done
# count the errors. This is a good place to exclude specific errors you wish to ignore with a -v match.
# note - your grep must be aligned with the pattern match above for this to work
num_errors=`grep -c -e 'TNS' -e 'ORA' -e 'Deadlock' -e 'instance' -e 'incident' ${ALERT} | grep -v 'ORA-28'`

# If there are any errors, lets email the alert information to someone
if [ $num_errors -gt 0 ]
then
  SUBJECT="ERROR in Oracle ALERT log on ${HOST}"
  mail -s "${SUBJECT}" ${RECIPIENT} < ${ALERT}
fi

SCAN listener logs – where are they?

These are a bit of a pain as their location isn’t intuitive. So to remind me where they are:

Login to the server(s) as the grid owner and check the scan listener status. This will show you the location of the listener log. cd to just below the diag directory and you’re off!:

server-name:/u01/grid>ps -ef | grep SCAN
 grid 8542 8282 0 10:20 pts/0 00:00:00 grep SCAN
 grid 9349 1 0 Mar07 ? 00:07:33 /u01/app/11g/grid/bin/tnslsnr LISTENER_SCAN1 -inherit

server-name:/u01/grid>lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 28-MAY-2014 10:20:12
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_SCAN1
 Version TNSLSNR for Linux: Version 11.2.0.4.0 - Production
 Start Date 07-MAR-2014 17:27:50
 Uptime 81 days 15 hr. 52 min. 21 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /u01/app/11g/grid/network/admin/listener.ora
 Listener Log File /u01/app/11g/grid/log/diag/tnslsnr/server-name/listener_scan1/alert/log.xml
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.6.148.141)(PORT=1521)))
 Services Summary...
 Service "FRONT_APP_DB_SVC.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 Service "ORCL.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 Service "ORCLXDB.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 The command completed successfully

server-name:/u01/app/11g/grid/log>cd /u01/app/11g/grid/log
server-name:/u01/app/11g/grid/log>adrci
ADRCI: Release 11.2.0.4.0 - Production on Wed May 28 10:22:47 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
ADR base = "/u01/app/11g/grid/log"
 adrci> show homes
 ADR Homes:
 diag/asmcmd/user_grid/server-name
 diag/tnslsnr/server-name/listener_scan3
 diag/tnslsnr/server-name/listener_scan2
 diag/tnslsnr/server-name/listener_scan1
 diag/asmtool/user_root/host_3797755080_80
adrci> show alert
Choose the alert log from the following homes to view:
1: diag/asmcmd/user_grid/server-name
 2: diag/tnslsnr/server-name/listener_scan3
 3: diag/tnslsnr/server-name/listener_scan2
 4: diag/tnslsnr/server-name/listener_scan1
 5: diag/asmtool/user_root/host_3797755080_80
 Q: to quit
Please select option: 4

and there we are…
remember to be on the correct host for each scan,
otherwise the alert (listener) log file will be out of date.

 

The trouble with Timezones & Grid Infrastructure

When installing Oracle Grid Infrastructure 11.2 (and all other releases), you need to make sure that you have all of the server setting correct and to standard before you do the install. One that bit me recently was the timezone setting. The Red Hat 6.4 server(s) in question has the correct file in /etc/localtime (copied from /usr/share/zoneinfo/whatever). If I type in date, I get the reply in the correct timezone (GMT/BST as I’m in London), so all seems correct.

However, the slack Unix Sysadmin (which might or might not have been me) had not put the correct setting in /etc/sysconfig/clock. Unfortunately, when you install Grid Infrastructure, the setting is read from /etc/sysconfig/clock and embedded into a Grid Inforastructure config file. $GRID_HOME/crs/install/s_crsconfig_hostname_env.txt


### This file can be used to modify the NLS_LANG environment variable, which determines the charset to be used for messages.

### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JAPAN.UTF8

### Do not modify this file except to change NLS_LANG, or under the direction of Oracle Support Services

TZ=Europe/London

NLS_LANG=AMERICAN_AMERICA.AL32UTF8

TNS_ADMIN=

ORACLE_BASE=


If you change this entry, and you should check with Oracle Support if this is OK for your site, and you will need to restart Grid Infrastructure. The one thing about this that I really don’t like is that Oracle is storing a runtime configuration file in a an install directory. Does it do that anywhere else?