Checking the alert log – the easy way

Do you check the alert log of your databases every day? In the morning when you get in? But what about the alerts which happen during the day? How do you spot them – especially if you don’t have Grid Control or Cloud Control configured. Even if you do have¬†a full monitoring solution, this can be useful for a belt-and-braces approach.

Here’s a short bash shell script to use adrci to read through each ORACLE_HOME (for a DIAG location) and check every alert log contained therein, using adrci pattern matching functionality to search for problems. I usually schedule it within each host (using cron) to minimise the moving parts, and therefore minimise the opportunity for it to stop working. Any problems, and I get an email. I hope you find it useful. I usually keep it in /opt/oracle/bin, but you stick it in your script home of choice.

This should work for 11G and 12C database (tested to 12.1.0.2), unless I’ve made a cut/paste error ūüôā

#!/bin/bash
#########################################################################################
# Description: Read each Oracle Home directory. Run adrci matching for problems
# Author : N Chandler.2014-03-28
#
# crontab : # Check Alert Log 30.03.2014
# 00,30 * * * * /opt/oracle/bin/adrci_alert.sh > /opt/oracle/bin/log/adrci.cron.log 2>&1
#
#########################################################################################
# Which HOME?
 export ORACLE_HOME=/opt/app/oracle/product/11g
 export DIAG_LOC=/opt/app/oracle/diag/rdbms
# Who gets the alert?
 export RECIPIENT='neil@chandler.uk.com'
# Other Variables
 export LD_LIBRARY_PATH=$ORACLE_HOME/lib
 export HOST=`hostname`
 export PATH=$ORACLE_HOME/bin:$PATH
 export NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'
 export SUBJECT="Oracle ALERTS on ${HOST} OK"
 export LOG=/tmp
 export ALERT=$LOG/error.txt

# Write the alert log message header for the email
 echo "${HOST} `date +%Y-%m-%d.%H:%M:%S-%Z`" > ${ALERT}
 echo "All alerts in ADRCI Alert log for the last 30 minutes" >> ${ALERT}
 echo "THIS ALERT WILL NOT BE REPEATED!!! TAKE ACTION NOW!!!" >> ${ALERT}
 echo "Follow-up on this email and check the alert log on ${HOST}" >> ${ALERT}

# find out the homes
 adrci_homes=( $(adrci exec="show homes" | grep -e rdbms -e asm))

# run through Each home found and examine the alert log
# Here we are looking for ORA- messges, Deadlock, anything which raises an incident or anything which is instance-level
# IN THE LAST 30 MINUTES (1/48), so we need to run this code every 30 minutes or we may miss something. 
 for adrci_home in ${adrci_homes[@]}
 do
   echo "Checking: ${adrci_home}" >> ${ALERT}
   echo $adrci_home' Alert Log' >> ${ALERT}
   adrci exec="set home ${adrci_home} ; show alert -p \\\"(message_text like '%ORA-%' or message_text like '%Deadlock%' or message_text like '%instance%' or message_text like '%incident%') and originating_timestamp>=systimestamp-(1/48) \\\"" -term >>${ALERT}
 done
# count the errors. This is a good place to exclude specific errors you wish to ignore with a -v match.
# note - your grep must be aligned with the pattern match above for this to work
num_errors=`grep -c -e 'TNS' -e 'ORA' -e 'Deadlock' -e 'instance' -e 'incident' ${ALERT} | grep -v 'ORA-28'`

# If there are any errors, lets email the alert information to someone
if [ $num_errors -gt 0 ]
then
  SUBJECT="ERROR in Oracle ALERT log on ${HOST}"
  mail -s "${SUBJECT}" ${RECIPIENT} < ${ALERT}
fi

Exposing the Oracle Alert Log to SQL

I’ve been spending some time working in Apex recently, building a small app to draw together the monitoring of application and infrastructure components into a single easy-to-visualise tool. As part of that, I wanted to be able to read and report on the alert log. Traditionally, that would have meant creating an external table to point to the alert log and reading it that way, with lots of string manipulation and regular expressions to try to pull out useful bits of information. However, Oracle 11G had made that a lot easier. Step forward X$DBGALERTEXT. This is the decoded version of the XML version of the Alert log, and as such provides lots of lovely columns to filter by, rather than a single line of text to decode. Particularly useful (for me) is the MESSAGE_LEVEL. Is this line of text informational (16), or critical (1), or something in between? Of course, each “normal” line of text is still available in the MESSAGE_TEXT column.

SQL> desc x$dbgalertext;
 Name                           Type
 ------------------------------ --------------------------------------------------------
 ADDR                           RAW(4)
 INDX                           NUMBER
 INST_ID                        NUMBER
 ORIGINATING_TIMESTAMP          TIMESTAMP(3) WITH TIME ZONE
 NORMALIZED_TIMESTAMP           TIMESTAMP(3) WITH TIME ZONE
 ORGANIZATION_ID                VARCHAR2(64)
 COMPONENT_ID                   VARCHAR2(64)
 HOST_ID                        VARCHAR2(64)
 HOST_ADDRESS                   VARCHAR2(46)
 MESSAGE_TYPE                   NUMBER
 MESSAGE_LEVEL                  NUMBER
 MESSAGE_ID                     VARCHAR2(64)
 MESSAGE_GROUP                  VARCHAR2(64)
 CLIENT_ID                      VARCHAR2(64)
 MODULE_ID                      VARCHAR2(64)
 PROCESS_ID                     VARCHAR2(32)
 THREAD_ID                      VARCHAR2(64)
 USER_ID                        VARCHAR2(64)
 INSTANCE_ID                    VARCHAR2(64)
 DETAILED_LOCATION              VARCHAR2(160)
 PROBLEM_KEY                    VARCHAR2(64)
 UPSTREAM_COMP_ID               VARCHAR2(100)
 DOWNSTREAM_COMP_ID             VARCHAR2(100)
 EXECUTION_CONTEXT_ID           VARCHAR2(100)
 EXECUTION_CONTEXT_SEQUENCE     NUMBER
 ERROR_INSTANCE_ID              NUMBER
 ERROR_INSTANCE_SEQUENCE        NUMBER
 VERSION                        NUMBER
 MESSAGE_TEXT                   VARCHAR2(2048)
 MESSAGE_ARGUMENTS              VARCHAR2(128)
 SUPPLEMENTAL_ATTRIBUTES        VARCHAR2(128)
 SUPPLEMENTAL_DETAILS           VARCHAR2(128)
 PARTITION                      NUMBER
 RECORD_ID                      NUMBER

Very handy. Just add your own view, synonym and permissions to read the view, and you’re away…

create view v_$alert_log as select * from x$dbgalertext;
create public synonym v$alert_log for sys.v_$alert_log;
grant select on v$alert_log to whomever...

  1* select message_text from v$alert_log where ...;

MESSAGE_TEXT
-----------------------------------------------------------------
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 2
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.

Using parameter settings in server-side pfile /home/oracle/app/oracle/product/11.2.0/dbhome_2/dbs/initorcl.ora
System parameters with non-default values:
  processes                = 200
  sessions                 = 322
  sga_max_size             = 2G
  pre_page_sga             = TRUE
  nls_language             = "ENGLISH"
  nls_territory            = "UNITED KINGDOM"
  filesystemio_options     = "SetAll"
  sga_target               = 2G
  control_files            = "/u02/oradata/orcl/control01.ctl"
.
[snip]
.
  aq_tm_processes          = 1
  diagnostic_dest          = "/u20/apps/oracle"
PMON started with pid=2, OS id=2492
PSP0 started with pid=3, OS id=2494
VKTM started with pid=4, OS id=2512 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
GEN0 started with pid=5, OS id=2520
DIAG started with pid=6, OS id=2522
...etc...
%d bloggers like this: