KILL! KILL! KILL! (of Unix processes)

The start of this isn’t my post – I got it from here: but I wanted to reblog/repost and enhance it because as far as I can tell, 99% of all known DBA’s only use kill -9 to remove unhappy processes.


Original Post:

Useless Use of Kill -9 form letter

No no no.  Don't use kill -9.

It doesn't give the process a chance to cleanly:

1) shut down socket connections

2) clean up temp files

3) inform its children that it is going away

4) reset its terminal characteristics

and so on and so on and so on.

Generally, send 15, and wait a second or two, and if that doesn't
work, send 2, and if that doesn't work, send 1.  If that doesn't,
REMOVE THE BINARY because the program is badly behaved!**

Don't use kill -9.  Don't bring out the combine harvester just to tidy
up the flower pot.

**don’t remove your Oracle or any other binaries please.


 

I hope you found that useful. I know I did. But what do the numbers mean? Well, they are increasingly violent ways to ask the program to stop itself. The command kill -9 isn’t asking the program to stop, it’s asking the O/S to stop running the program now, regardless of what it’s doing.

Run order of kills:

kill -15 : this is the equivalent of kill -sigterm and it the default. The program should terminate after it has finished what it is doing.

kill -2 : this is the equivalent of kill -sigint and is the same as pressing CTRL+C. This should mean “stop what you’re doing” — and it may or may not kill the program.

kill -1 : this is the equivalent of kill -sighup and tells the program that the user has disconnected. (e.g. SSH session or terminal window was closed). It usually results in a graceful shutdown of the program.

The executing program needs to be coded to recognise these kill signals, and all good software will spot them.

The other fun kill command is kill -sigstop. This can’t be blocked (like -9) as it’s an O/S level command too, but freezes the program execution like pressing CTRL+Z. You can continue the program execution later using kill -sigcont.

Advertisement

Killing Sessions

You have a session which is out of control. You decide to kill it. What options are available to you as a DBA?

Assuming you don’t have access to the O/S to be able to use O/S commands to kill the session (using kill -9 in Unix/Linux or orakill in windows to kill the thread within the process – “orakill sid thread”), you probably turn to:

alter system kill session 'sid, serial#, @inst_id' immediate;

(note the undocumented @inst_id, which can be used to identify specific RAC instances, rather than the instance you are connected to) Note that immediate does not change the processing, but just returns control to the command line immediately.

You can generate this command using

select 'alter system kill session ''' || sid || ', ' || serial# || ', @' || inst_id || ''';' from gv$session;

 

However, despite the name, this does not actually kill the session. You may get a pause issuing this command, followed by a “session marked for kill” message. The session then gets a status of “KILLED” in gv$session, but the session does not go away. As a further side effect, frequently the O/S process id is removed from gv$process, so you can’t now locate the process in the O/S to (ora)kill it.

What actually happens is a bit gets set against the session which effectively says “When you next check this bit, kill yourself”. But sometimes the session is busy, is rolling back (check for changing values in gv$transaction.used_urec), is waiting for a reply which never comes or is caught in a tight loop, and never gets around to checking the bit, and so it doesn’t die.

Does anyone know if pmon comes along, notices that you have a session marked for kill, kills it and initiates recovery for the session???

So, what’s a DBA to do? How do I actually kill the session? Step forward little-known command:

alter system disconnect session 'sid,serial#,@inst_id' POST_TRANSACTION | IMMEDIATE

And to generate it:
select 'alter system disconnect session ''' || sid || ', ' || serial# || ', @' || inst_id || ''';' from gv$session;

Again, note the undocumented @inst_id option for RAC.

What this command does is effectively kill the Operating system process. Right now (or after the next commit/rollback if you include the option POST_TRANSACTION.) It is the same as issuing an (ora)kill command.The IMMEDIATE here means “kill it now!”

Bang! and the session is gone.

Don’t kill the wrong session… you may well take your database down.

Auditing Read-Only Standbys

If your company has a passing interest in database security, you are probably running some sort of session auditing at the very least [audit session] (although this can also be useful for troubleshooting connectivity issues). There’s a reasonable chance you’re also running some level of object auditing, or even DML access auditing if your employer so dictates:

Check access/change of objects in the DB
  1  select audit_option, success, failure
  2  from dba_stmt_audit_opts
  3  union
  4  select privilege, success, failure
  5  from dba_priv_audit_opts
  6* order by 2,1
AUDIT_OPTION                                       SUCCESS                                  FAILURE
-------------------------------------------------- ---------------------------------------- -----------------
ALTER ANY PROCEDURE                                BY ACCESS                                BY ACCESS
ALTER ANY TABLE                                    BY ACCESS                                BY ACCESS
ALTER DATABASE                                     BY ACCESS                                BY ACCESS
ALTER PROFILE                                      BY ACCESS                                BY ACCESS
ALTER SYSTEM                                       BY ACCESS                                BY ACCESS
ALTER TABLE                                        BY ACCESS                                BY ACCESS
ALTER USER                                         BY ACCESS                                BY ACCESS
AUDIT SYSTEM                                       BY ACCESS                                BY ACCESS
CREATE ANY JOB                                     BY ACCESS                                BY ACCESS
CREATE ANY LIBRARY                                 BY ACCESS                                BY ACCESS
CREATE ANY PROCEDURE                               BY ACCESS                                BY ACCESS
CREATE ANY TABLE                                   BY ACCESS                                BY ACCESS
CREATE EXTERNAL JOB                                BY ACCESS                                BY ACCESS
CREATE LIBRARY                                     BY ACCESS                                BY ACCESS
CREATE PROCEDURE                                   BY ACCESS                                BY ACCESS
CREATE PUBLIC DATABASE LINK                        BY ACCESS                                BY ACCESS
CREATE SESSION                                     BY ACCESS                                BY ACCESS
CREATE TABLE                                       BY ACCESS                                BY ACCESS
CREATE USER                                        BY ACCESS                                BY ACCESS
DATABASE LINK                                      BY ACCESS                                BY ACCESS
DIRECTORY                                          BY ACCESS                                BY ACCESS
DROP ANY PROCEDURE                                 BY ACCESS                                BY ACCESS
DROP ANY TABLE                                     BY ACCESS                                BY ACCESS
DROP PROFILE                                       BY ACCESS                                BY ACCESS
DROP USER                                          BY ACCESS                                BY ACCESS
EXEMPT ACCESS POLICY                               BY ACCESS                                BY ACCESS
GRANT ANY OBJECT PRIVILEGE                         BY ACCESS                                BY ACCESS
GRANT ANY PRIVILEGE                                BY ACCESS                                BY ACCESS
GRANT ANY ROLE                                     BY ACCESS                                BY ACCESS
PROFILE                                            BY ACCESS                                BY ACCESS
PUBLIC DATABASE LINK                               BY ACCESS                                BY ACCESS
PUBLIC SYNONYM                                     BY ACCESS                                BY ACCESS
ROLE                                               BY ACCESS                                BY ACCESS
SYSTEM AUDIT                                       BY ACCESS                                BY ACCESS
SYSTEM GRANT                                       BY ACCESS                                BY ACCESS
USER                                               BY ACCESS                                BY ACCESS
CREATE JOB                                         BY SESSION                               BY SESSION

And you’re probably writing it into a database table [AUDIT_TRAIL=’DB’]

so how does that work if you open a Dataguard database read only? You are writing into sys.aud$ on the Primary, and that table is replicated to the Standby. So what happens?

From Oracle 11G, if you are running Dataguard and opening the standby up for read access, you may not notice the line in the ALERT log which reads:

AUDIT_TRAIL initialization parameter is changed to OS, as DB is NOT compatible for database opened with read-only access

So it starts writing down O/S audit trail files for all of your auditing options (well, the session connect and DML audit options – you can’t run DDL in a r/o DB). You might want to go and see just how many files it has written to [audit_file_dest], as you may be surprised at just how many are in there.

You may, one day, either run out of space or (more worryingly) have so many millions of files that it causes a performance problem when Oracle access the O/S directory. You might want to think about some sort of periodic clean-up job.

The trouble with Timezones & Grid Infrastructure

When installing Oracle Grid Infrastructure 11.2 (and all other releases), you need to make sure that you have all of the server setting correct and to standard before you do the install. One that bit me recently was the timezone setting. The Red Hat 6.4 server(s) in question has the correct file in /etc/localtime (copied from /usr/share/zoneinfo/whatever). If I type in date, I get the reply in the correct timezone (GMT/BST as I’m in London), so all seems correct.

However, the slack Unix Sysadmin (which might or might not have been me) had not put the correct setting in /etc/sysconfig/clock. Unfortunately, when you install Grid Infrastructure, the setting is read from /etc/sysconfig/clock and embedded into a Grid Inforastructure config file. $GRID_HOME/crs/install/s_crsconfig_hostname_env.txt


### This file can be used to modify the NLS_LANG environment variable, which determines the charset to be used for messages.

### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JAPAN.UTF8

### Do not modify this file except to change NLS_LANG, or under the direction of Oracle Support Services

TZ=Europe/London

NLS_LANG=AMERICAN_AMERICA.AL32UTF8

TNS_ADMIN=

ORACLE_BASE=


If you change this entry, and you should check with Oracle Support if this is OK for your site, and you will need to restart Grid Infrastructure. The one thing about this that I really don’t like is that Oracle is storing a runtime configuration file in a an install directory. Does it do that anywhere else?

Killing a session. Dead!

… or what ALTER SYSTEM KILL SESSION doesn’t do.

So, you have a runaway session and you want it to stop now, please. You enter “ALTER SYSTEM KILL SESSION (sid,serial#) IMMEDIATE” – supplying the parameters from GV$SESSION (adding @instance_id if it’s running in a different node) and you wait a minute…. and eventually it gets “ORA-00031: session marked for kill” [“KILLED”]. So why didn’t it die immediately? What’s going on? Isn’t KILL a strong enough command?

What actually happened when you submit “ALTER SYSTEM KILL SESSION” is that a flag within the session it set to for the session to kill itself. It’s not so much KILL, as please commit suicide when you notice that I’ve asked. The session needs to wait for a chance to notice the flag, die and clean-up the session.

While you’re waiting for this to happen, it’s not uncommon for DBA’s to hop onto the operating system and do a “kill -9” (or orakill on Windows) to wipe out the process/thread that was running. This then causes SMON to clean up the mess, rather than the user process (which may or may not be desirable – SMON can spawn parallel clean-up slaves which can speed things up).

So, what if you have no access to the O/S, or want neater, tidier (and safer?) results? Use the ALTER SYSTEM DISCONNECT SESSION command instead. Get the same SID and SERIAL# from (g)v$session and give it a go. You have 3 options;

  • ALTER SYSTEM DISCONNECT SESSION ‘sid,serial#’ POST_TRANSACTION
    ( waits for in-flight transactions to complete )
  • ALTER SYSTEM DISCONNECT SESSION ‘sid,serial#’
    (I have read it works like ALTER SYSTEM KILL… IMMEDIATE? Yet to confirm by experiment)
  • ALTER SYSTEM DISCONNECT SESSION ‘sid,serial#’ IMMEDIATE
     (just like kill -9; wipes the O/S process out.)

So, the next time you want that session real dead, real quick, you know what to do.

DROP DATABASE command

I have been DBA-ing for a while now, and I today I used a “new” command which I have never used in the previous 20+ years I have worked with Oracle: DROP DATABASE. It’s amazing what you miss sometimes!

So, what does it do? As the name implies, it drops the database. That is, it deletes all of your database files. This is significantly safer than using traditional Operating System methods (assuming you even have access and are not using ASM), and for added safety you have to be running the database in RESTRICT MOUNT mode.

What a lovely command. It goes to all the trouble of locating and removing those troublesome files that were put into the wrong place by another DBA that you would have missed if cleaning up manually.

It certainly made the pre-production clone script (cloning from the Physical Standby) that I have just written quite a bit easier to code.

Does anybody know if there is any other reason for running the database in RESTRICT MOUNT mode, or was it designed just for this command?

EDIT: I was just informed via Twitter (@chandlerdba) by @pfierens that this also takes out your SPFILE. If you are using an SPFILE (you should be!) and want to keep your db parameters for a db rebuild, I would recommend creating a pfile from the spfile first…

Has anyone ever done an RMAN “drop database including backups” ?   You REALLY need to be sure you don’t want it back to run that one!

Dennis Ritchie RIP

There can be few scientists who have contributed so much to the world as Dennis Ritchie. Completely anonymous to the world at large, and to far too much of the computing fraternity too, his involvement in the development of C – the first portable programming language, and Unix cannot be understated. Who uses Unix? Well, everybody. What is the O/S on embedded devices? What did Steve Jobs base his Mac O/S and Mobile Operating systems on? Upon what did Bill Gates base the operating system for the new IBM PC in 1980? What do corporations across the world use to power their servers? What is it all built on? Unless you’re running some proprietary mainframe software, the building blocks of what you are using were set, to some degree, by Dennis Ritchie.

Whilst I never met him myself I did work with his sister, Lynn, for several years at a software house in the North East of England, where she still resides. She told me some of the stories about him; how the invention of C and Unix were really aims in making computing more portable, more standard, easier to use. How all that Ritchie, Ken Thompson, Brian Kernighan and the others got for this was their salary [although they did make a few quid on top out of their books 🙂 ], and the ability to distribute Unix free, with C, to all and sundry as Bell labs, a subsidiary of AT&T, had no financial interest in computing as they were a regulated telephone monopoly at the time.

Thanks Dennis.