KILL! KILL! KILL! (of Unix processes)

The start of this isn’t my post – I got it from here: but I wanted to reblog/repost and enhance it because as far as I can tell, 99% of all known DBA’s only use kill -9 to remove unhappy processes.


Original Post:

Useless Use of Kill -9 form letter

No no no.  Don't use kill -9.

It doesn't give the process a chance to cleanly:

1) shut down socket connections

2) clean up temp files

3) inform its children that it is going away

4) reset its terminal characteristics

and so on and so on and so on.

Generally, send 15, and wait a second or two, and if that doesn't
work, send 2, and if that doesn't work, send 1.  If that doesn't,
REMOVE THE BINARY because the program is badly behaved!**

Don't use kill -9.  Don't bring out the combine harvester just to tidy
up the flower pot.

**don’t remove your Oracle or any other binaries please.


 

I hope you found that useful. I know I did. But what do the numbers mean? Well, they are increasingly violent ways to ask the program to stop itself. The command kill -9 isn’t asking the program to stop, it’s asking the O/S to stop running the program now, regardless of what it’s doing.

Run order of kills:

kill -15 : this is the equivalent of kill -sigterm and it the default. The program should terminate after it has finished what it is doing.

kill -2 : this is the equivalent of kill -sigint and is the same as pressing CTRL+C. This should mean “stop what you’re doing” — and it may or may not kill the program.

kill -1 : this is the equivalent of kill -sighup and tells the program that the user has disconnected. (e.g. SSH session or terminal window was closed). It usually results in a graceful shutdown of the program.

The executing program needs to be coded to recognise these kill signals, and all good software will spot them.

The other fun kill command is kill -sigstop. This can’t be blocked (like -9) as it’s an O/S level command too, but freezes the program execution like pressing CTRL+Z. You can continue the program execution later using kill -sigcont.

Advertisement

Killing Sessions

You have a session which is out of control. You decide to kill it. What options are available to you as a DBA?

Assuming you don’t have access to the O/S to be able to use O/S commands to kill the session (using kill -9 in Unix/Linux or orakill in windows to kill the thread within the process – “orakill sid thread”), you probably turn to:

alter system kill session 'sid, serial#, @inst_id' immediate;

(note the undocumented @inst_id, which can be used to identify specific RAC instances, rather than the instance you are connected to) Note that immediate does not change the processing, but just returns control to the command line immediately.

You can generate this command using

select 'alter system kill session ''' || sid || ', ' || serial# || ', @' || inst_id || ''';' from gv$session;

 

However, despite the name, this does not actually kill the session. You may get a pause issuing this command, followed by a “session marked for kill” message. The session then gets a status of “KILLED” in gv$session, but the session does not go away. As a further side effect, frequently the O/S process id is removed from gv$process, so you can’t now locate the process in the O/S to (ora)kill it.

What actually happens is a bit gets set against the session which effectively says “When you next check this bit, kill yourself”. But sometimes the session is busy, is rolling back (check for changing values in gv$transaction.used_urec), is waiting for a reply which never comes or is caught in a tight loop, and never gets around to checking the bit, and so it doesn’t die.

Does anyone know if pmon comes along, notices that you have a session marked for kill, kills it and initiates recovery for the session???

So, what’s a DBA to do? How do I actually kill the session? Step forward little-known command:

alter system disconnect session 'sid,serial#,@inst_id' POST_TRANSACTION | IMMEDIATE

And to generate it:
select 'alter system disconnect session ''' || sid || ', ' || serial# || ', @' || inst_id || ''';' from gv$session;

Again, note the undocumented @inst_id option for RAC.

What this command does is effectively kill the Operating system process. Right now (or after the next commit/rollback if you include the option POST_TRANSACTION.) It is the same as issuing an (ora)kill command.The IMMEDIATE here means “kill it now!”

Bang! and the session is gone.

Don’t kill the wrong session… you may well take your database down.

Developers Killing Sessions

When you end up spending a far great a percentage of your day than seems sensible killing off Java connections that Developers have carelessly left lying around, locking objects all over the place, you need a solution to get them to go away. The solution is to let them do it themselves!

I’m not advocating granting ALTER SYSTEM to Developers! That way madness lies, or certainly some unintended consequences. I’m all for Dev’s having a lot of freedom in the database, just not freedom with the database.

So, creating a stored procedure (in this example as sys, but as any user with an explicit ALTER SYSTEM privilege granted will do) to kill sessions without allowing too much latitude to do anything else seems appropriate. Here’s one I built earlier:

 

create or replace procedure sys.kill_session 
 ( p_sid IN number, p_serial IN number, p_instance IN number) as
 -- Neil Chandler. Grant the ability to kill session on a restricted basis. 21.07.2010
 l_username varchar2(30) := null;
 l_priv number := 1;
begin
 -- Who owns the session?
 select username into l_username
 from gv$session
 where sid = p_sid and serial#=p_serial and inst_id = p_instance;
 -- Check for DBA role
 select count(*) into l_priv
 from dba_role_privs
 where grantee = l_username and granted_role = 'DBA';
 -- If the user has the DBA priv, deny the kill request
 if l_priv > 0 or l_username is null
 then
 dbms_output.put_line 
  ('User request to kill session '||p_sid||','||p_serial||',@'||p_instance||
   ' denied. Session is for privileged user '||l_username||'.');
 else
 dbms_output.put_line 
   ('Killing user '||l_username||' - '||p_sid||','||p_serial||',@'||p_instance);
 execute immediate 'alter system disconnect session '''||
                    p_sid||','||p_serial||',@'||p_instance||
                   ''' immediate';
 end if;
end;
/

-- and let the proc be seen and used
create or replace public synonym kill_session for sys.kill_session; 
grant execute on kill_session to (whomever);

Then a nifty bit of sql to generate the kill commands for the Developers. 
Please include your favourite columns from gv$session:

select username,status,blocking_session,
 'exec kill_session ('|| sid||','||serial#||','||inst_id||')' Kill_Command 
 from gv$session
where username is not null
  and type <> 'BACKGROUND'
/

USERNAME  STATUS  BLOCKING_SESSION KILL_COMMAND
--------- ------- ---------------- ------------------------------------ 
SYS       ACTIVE                   exec kill_session (31,65,1)
SYSTEM    INACTIVE                 exec kill_session (58,207,1)
USER_1    INACTIVE                 exec kill_session (59,404,1)
USER_2    INACTIVE                 exec kill_session (72,138,1)
USER_2    INACTIVE                 exec kill_session (46,99,2)


May the odds be forever in your favour. Let the killing commence…

Killing a session. Dead!

… or what ALTER SYSTEM KILL SESSION doesn’t do.

So, you have a runaway session and you want it to stop now, please. You enter “ALTER SYSTEM KILL SESSION (sid,serial#) IMMEDIATE” – supplying the parameters from GV$SESSION (adding @instance_id if it’s running in a different node) and you wait a minute…. and eventually it gets “ORA-00031: session marked for kill” [“KILLED”]. So why didn’t it die immediately? What’s going on? Isn’t KILL a strong enough command?

What actually happened when you submit “ALTER SYSTEM KILL SESSION” is that a flag within the session it set to for the session to kill itself. It’s not so much KILL, as please commit suicide when you notice that I’ve asked. The session needs to wait for a chance to notice the flag, die and clean-up the session.

While you’re waiting for this to happen, it’s not uncommon for DBA’s to hop onto the operating system and do a “kill -9” (or orakill on Windows) to wipe out the process/thread that was running. This then causes SMON to clean up the mess, rather than the user process (which may or may not be desirable – SMON can spawn parallel clean-up slaves which can speed things up).

So, what if you have no access to the O/S, or want neater, tidier (and safer?) results? Use the ALTER SYSTEM DISCONNECT SESSION command instead. Get the same SID and SERIAL# from (g)v$session and give it a go. You have 3 options;

  • ALTER SYSTEM DISCONNECT SESSION ‘sid,serial#’ POST_TRANSACTION
    ( waits for in-flight transactions to complete )
  • ALTER SYSTEM DISCONNECT SESSION ‘sid,serial#’
    (I have read it works like ALTER SYSTEM KILL… IMMEDIATE? Yet to confirm by experiment)
  • ALTER SYSTEM DISCONNECT SESSION ‘sid,serial#’ IMMEDIATE
     (just like kill -9; wipes the O/S process out.)

So, the next time you want that session real dead, real quick, you know what to do.