Turning rows into columns


It’s odd how some things pass you by in new Oracle releases. Today I discovered “sys.odcinumberlist” and “sys.odcivarchar2list”.

I was wanting to generate a pre-defined list for a join to ensure each value was represented, and came across “sys.odcinumberlist”, which I assumed was new in Oracle 12. It turns out it’s not new at all! Well, there are 2 straightforward collection types, each called like a function: “sys.odcinumberlist” and “sys.odcivarchar2list”.

How are they used?

SELECT column_value AS number_list
FROM sys.odcinumberlist(1,2,5,7,9,5,2,-34,5,43,2,0)
ORDER BY number_list;

NUMBER_LIST
-----------
        -34
          0
          1
          2
          2
          2
          5
          5
          5
          7
          9
         43

12 rows selected.

Note how I have had to alias the “column_value” pseudocolumn to allow me to sort the list.

SELECT column_value AS char_list FROM sys.odcivarchar2list('A','V','DD');

CHAR_LIST
---------
A
V
DD

Nice, simple, with a limit of 32767 items in the list (both types are defined as VARRAY(32767)). So how was I using it? I was checking to see which profiles existed in a database (the list was somewhat longer than this, as was the SQL, but this is good for a demo) so I generated my full list of known profiles to join to the existing profiles in each database.

WITH my_profiles AS (
    SELECT
        column_value AS profile_name
    FROM
        sys.odcivarchar2list(
            'DEFAULT'
          , 'SOME_OTHER_PROFILE' --etc etc etc 
        )
)
SELECT
    mp.profile_name
  , nvl( profile , '<does-not-exist>') AS profile
  , resource_name
  , limit
FROM
    my_profiles  mp
    LEFT OUTER JOIN dba_profiles dp ON ( mp.profile_name = dp.profile )
ORDER BY
    profile
  , resource_name;

PROFILE_NAME         PROFILE              RESOURCE_NAME                  LIMIT
-------------------- -------------------- ------------------------------ -----------
SOME_OTHER_PROFILE   <does-not-exist>
DEFAULT              DEFAULT              COMPOSITE_LIMIT                UNLIMITED
DEFAULT              DEFAULT              CONNECT_TIME                   UNLIMITED
DEFAULT              DEFAULT              CPU_PER_CALL                   UNLIMITED
DEFAULT              DEFAULT              CPU_PER_SESSION                UNLIMITED
DEFAULT              DEFAULT              FAILED_LOGIN_ATTEMPTS          10
DEFAULT              DEFAULT              IDLE_TIME                      UNLIMITED
DEFAULT              DEFAULT              INACTIVE_ACCOUNT_TIME          UNLIMITED
DEFAULT              DEFAULT              LOGICAL_READS_PER_CALL         UNLIMITED
DEFAULT              DEFAULT              LOGICAL_READS_PER_SESSION      UNLIMITED
DEFAULT              DEFAULT              PASSWORD_GRACE_TIME            7
DEFAULT              DEFAULT              PASSWORD_LIFE_TIME             180
DEFAULT              DEFAULT              PASSWORD_LOCK_TIME             1
DEFAULT              DEFAULT              PASSWORD_REUSE_MAX             UNLIMITED
DEFAULT              DEFAULT              PASSWORD_REUSE_TIME            UNLIMITED
DEFAULT              DEFAULT              PASSWORD_ROLLOVER_TIME         -1
DEFAULT              DEFAULT              PASSWORD_VERIFY_FUNCTION       NULL
DEFAULT              DEFAULT              PRIVATE_SGA                    UNLIMITED
DEFAULT              DEFAULT              SESSIONS_PER_USER              UNLIMITED

There are a number of other ways to do this, including using the UNPIVOT clause, creating your own table type, etc., but this was the nice and simple option I was looking for.
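As a rough sketch of the “create your own table type” alternative mentioned above: the type name profile_name_list is made up for illustration, and the explicit TABLE() operator is used because older releases require it. The last query is an assumed variation that reports only the profiles which are missing.

```sql
-- Hypothetical schema-level collection type (the name is illustrative only)
CREATE OR REPLACE TYPE profile_name_list AS TABLE OF VARCHAR2(128);
/

-- Same idea as sys.odcivarchar2list, using the explicit TABLE() operator
SELECT column_value AS profile_name
FROM   TABLE(profile_name_list('DEFAULT', 'SOME_OTHER_PROFILE'));

-- Variation: report only the expected profiles that do not exist
SELECT mp.column_value AS missing_profile
FROM   TABLE(sys.odcivarchar2list('DEFAULT', 'SOME_OTHER_PROFILE')) mp
WHERE  NOT EXISTS (
           SELECT NULL
           FROM   dba_profiles dp
           WHERE  dp.profile = mp.column_value
       );
```

A user-defined type has no 32767 element cap, but it is one more object to deploy, which is why the built-in types are handy for ad hoc checks.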

Inserting data in SQL*Plus correctly

When inserting data into the database, it is occasionally forgotten (especially by English-speakers) that we need to take steps to ensure we are inserting data correctly and without unexpected character translation.

For example, in SQL*Plus we need to ensure we set the NLS_LANG environment variable to the correct setting for our database before we initiate SQL*Plus.

Here’s a quick example showing what can go wrong:

[oracle@ORA122 ~]$ echo $NLS_LANG

[oracle@ORA122 ~]$ sqlplus neil/neil
SQL*Plus: Release 12.2.0.1.0 Production on Thu Dec 22 13:56:23 2016
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Last Successful login time: Thu Dec 22 2016 13:46:58 +00:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

NEIL @ pdb1 > create table lang (col1 varchar2(50 CHAR));
Table created.

NEIL @ pdb1 > insert into lang values (q'{J'ai cassé l'insert}')
  2  /
1 row created.

NEIL @ pdb1 > commit;
Commit complete.

NEIL @ pdb1 > select * from lang;

COL1
--------------------------------------------------
J'ai cass? l'insert

As we can see, the accented “é” has been lost in translation somewhere. This is not good.

If we set NLS_LANG correctly and repeat the insert, we get a different result:

[oracle@ORA122 ~]$ export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
[oracle@ORA122 ~]$ sqlplus neil/neil
SQL*Plus: Release 12.2.0.1.0 Production on Thu Dec 22 13:56:23 2016
Copyright (c) 1982, 2016, Oracle.  All rights reserved.
Last Successful login time: Thu Dec 22 2016 13:46:58 +00:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

NEIL @ pdb1 > insert into lang values (q'{J'ai cassé l'insert}')
  2  /
1 row created.

NEIL @ pdb1 > commit;
Commit complete.

NEIL @ pdb1 > select * from lang;

COL1
--------------------------------------------------------------------------------
J'ai cass� l'insert
J'ai cassé l'insert

Note how the original insert is now returning even more troublesome nonsense than it was previously!
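One way to see what actually went into the table, rather than what the terminal chooses to display, is to dump the stored bytes. This is a sketch using the standard DUMP function against the lang table from the example; format 1016 asks for hexadecimal bytes plus the character set name.

```sql
-- Show the stored bytes for each row, in hex, with the character set name.
-- In AL32UTF8 a correctly stored é is the two-byte sequence c3,a9;
-- anything else in that position means the value was converted on the way in.
SELECT col1
     , DUMP(col1, 1016) AS stored_bytes
FROM   lang;
```

Comparing the two rows byte by byte separates a storage problem (wrong bytes in the table) from a display problem (right bytes, wrong terminal or NLS_LANG).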

So, how do we know what to set the NLS_LANG variable to in order to avoid this? The answer is partly in the database.

NEIL @ pdb1 > select * from v$nls_parameters 
              where parameter in ('NLS_LANGUAGE','NLS_TERRITORY','NLS_CHARACTERSET');

PARAMETER                                VALUE                    CON_ID
---------------------------------------- -------------------- ----------
NLS_LANGUAGE                             AMERICAN                      3
NLS_TERRITORY                            AMERICA                       3
NLS_CHARACTERSET                         AL32UTF8                      3

Which gives us:

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
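The same string can be fetched in one go. A minimal sketch, assuming you only want the database-side value: SYS_CONTEXT with the USERENV LANGUAGE parameter returns the session language and territory together with the database character set.

```sql
-- Returns language_territory.characterset, the same shape as NLS_LANG
SELECT SYS_CONTEXT('USERENV', 'LANGUAGE') AS db_nls_lang
FROM   dual;
```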

However, this just tells you about the database character set.
That may not be what we are after!

The NLS_LANG should reflect the setting of the operating system character set of the client. In Windows there is no UTF-8 client, so you need to select the correct client code page.

On Linux, check out the value of $LANG

[oracle@ORA122 ~]$ echo $LANG
en_US.UTF-8

Chances are it’s UTF-8, so it all matches up.

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

If it is Windows, well that’s more complex as you need to match the code page of the client correctly. Windows generally translates between the code page and Unicode via its APIs, but you may have different code pages based upon whether you are on the command line (sqlplus), which uses the OEM code page, or in a GUI (SQL Developer), which uses the ANSI code page.

A list of Windows code pages is held here.

On an English Windows client, the GUI (ANSI) code page is 1252.

If you are using SQL Developer, in Tools => Preferences, check the encoding.

[Screenshot: SQL Developer encoding preference]

You can find out what command-line code page you are running using “chcp”

Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

C:\Users\neil>chcp
Active code page: 850

NOTE: This is the DOS (sqlplus) codepage, not the GUI Codepage.

MS-DOS codepage   Oracle Client character set (3rd part of NLS_LANG)
---------------   --------------------------------------------------
437               US8PC437
737               EL8PC737
850               WE8PC850
852               EE8PC852
857               TR8PC857
858               WE8PC858
861               IS8PC861
862               IW8PC1507
865               N8PC865
866               RU8PC866

set NLS_LANG=american_america.WE8PC850

sqlplus ...

 

It is also worth bearing in mind that the default database character set in Oracle 12.2 is now AL32UTF8.

Don’t forget to ensure your terminal session won’t mess up any character translation too. Set it to the correct server character set.

[Screenshot: PuTTY character set setting (UTF-8)]

Happy Inserting!

Use Unicode character sets, and go read the manual on this. It’s not straightforward!
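As a quick sanity check after connecting, the database can report which character set the client declared. This is a sketch that assumes you have been granted access to V$SESSION_CONNECT_INFO; it shows what the client claimed, not whether the claim matches your terminal.

```sql
-- What character set did this session's client report at connect time?
SELECT DISTINCT client_charset
FROM   v$session_connect_info
WHERE  sid = TO_NUMBER(SYS_CONTEXT('USERENV', 'SID'));
```

If the value here is not what you expected from your NLS_LANG setting, the variable was probably not exported in the shell that launched the client.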

 

Accessing STATUS columns efficiently

A frequently recurring design problem with relational databases is locating the unprocessed rows in a large table, so we know which rows of data are still to be processed.

The problem with a STATUS column is that it generally has low cardinality; there are probably only a handful of distinct values [(C)omplete, (E)rror, (U)nprocessed or something like that]. Most records will be (C)omplete. This makes STATUS a poor candidate for standard B-Tree indexation. In a high throughput OLTP database, using bitmap indexes is probably not an option due to concurrency.

[Aside: When coding flag columns in Oracle, ALWAYS use a VARCHAR2(1 CHAR) {or CHAR(1 CHAR) if you prefer, but a CHAR is a VARCHAR2 under the covers and occupies the same number of bytes}. This is in preference to a NUMBER(1), which occupies more bytes for a “1” than a “0”, so when you update it, you run the risk of row migration, chained rows and a performance hit. Frequently, ORMs like Hibernate code for NUMBER by default. Override this!]
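The size difference in the aside can be checked directly. A small sketch using the standard VSIZE function, which reports the number of bytes in Oracle’s internal representation of a value:

```sql
-- NUMBER: zero is stored in a single byte, one needs two bytes.
-- A one-character flag is one byte in a single-byte or UTF-8 character set.
SELECT VSIZE(0)   AS bytes_for_zero   -- 1
     , VSIZE(1)   AS bytes_for_one    -- 2
     , VSIZE('N') AS bytes_for_n
     , VSIZE('Y') AS bytes_for_y
FROM   dual;
```

Flipping a NUMBER flag from 0 to 1 therefore makes the row grow, whereas flipping a character flag from 'N' to 'Y' does not.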

So what are my options? There’s a short list of possible table accesses for a low cardinality column.

1. Table scan. In an OLTP database where you only want a tiny fraction of the rows in the table, this would be a bad choice.
2. Index the accessed columns and accept the inevitable INDEX FULL SCAN or INDEX FAST FULL SCAN. This is not great and you probably need a Histogram on the column to convince the optimizer to use the index for your low frequency values. Otherwise you may be back to the table scan.
3. Make the “Complete” status “NULL”.
4. Use a function-based index which makes the “Complete” status appear to be NULL for a specific query.

So what’s with options 3 and 4, why are they good, and how do we use them?

Unlike some RDBMSs, Oracle does not store NULL values in its simple (non-composite) b-tree indexes. Therefore, if you choose Option (3) and make your “Complete” status be represented by a NULL, you will maintain an index on STATUS in which the only values that are stored are values you are interested in. This makes the index very sexy to the optimizer as it will generally be very tiny. However, we face one small problem. Convincing Developers that having a NULL as a valid status can be difficult. A NULL is a non-representative value. It is not supposed to represent anything. It means “I don’t know”. It doesn’t behave the same as normal values. This tends to freak out Developers and designers sometimes.
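To make Option 3 concrete, here is a minimal sketch. The work_queue table, its index and the bind variable are invented for illustration; the only idea taken from the text is that NULL stands for “Complete”.

```sql
-- Hypothetical table: STATUS is nullable, and NULL means "complete"
CREATE TABLE work_queue (
    id          NUMBER             NOT NULL
  , status      VARCHAR2(1 CHAR)             -- 'U' unprocessed, 'E' error, NULL complete
  , description VARCHAR2(100 CHAR) NOT NULL
);

-- Only rows with a non-NULL status get an entry in this index
CREATE INDEX work_queue_status ON work_queue (status);

-- Find outstanding work via the small index
SELECT id FROM work_queue WHERE status = 'U';

-- Marking a row complete removes its entry from the index
UPDATE work_queue SET status = NULL WHERE id = :processed_id;
```

The index shrinks and grows with the backlog rather than with the table, which is the whole attraction.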

That’s where Option 4 comes in. If we wrap the index definition in a CASE expression, to produce a function-based index, we have a highly specific tailored index on our table. If the SQL predicate matches the indexed expression exactly, we get a serious performance payoff.

But don’t take my word for it. Here’s a worked example from my laptop:

 
Here’s the table and its data distribution (16m rows, and a handful we care about):

NEIL @ ORCL01 > desc test_table
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ID                            NOT NULL NUMBER
 STATUS                        NOT NULL VARCHAR2(1 CHAR)
 DESCRIPTION                   NOT NULL VARCHAR2(100 CHAR)

NEIL @ ORCL01 > select status,count(*) from test_table group by status

S   COUNT(*)
- ----------
E         16
C   16777216
Y         32

 
Here are the indexes on the table, and their sizes. As you can see, the function-based index is absolutely tiny, making it as attractive to storage admins as it is to the optimizer.

- alter table test_table add constraint test_table_pk primary key (id);
- create index test_table_CASE on test_table (case status when 'Y' then status else null end);
- create index test_table_COVER_COMP on test_table (status, id) compress 1;
- create index test_table_STATUS on test_table (status) compress 1;



NEIL @ ORCL01 > select segment_name,segment_type,sum(bytes/1024) kb from user_extents 
where segment_name like 'TEST_TABLE%' 
group by segment_type,segment_name order by 2 desc,1;

SEGMENT_NAME               SEGMENT_TYPE               KB
-------------------------- ------------------ ----------
TEST_TABLE                 TABLE                  555008
TEST_TABLE_CASE            INDEX                      64
TEST_TABLE_COVER_COMP      INDEX                  658432
TEST_TABLE_PK              INDEX                  319488
TEST_TABLE_STATUS          INDEX                  413696

Some Index stats:
INDEX_NAME                DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR STATUS     NUM_ROWS SAMPLE_SIZE LAST_ANAL
------------------------- ------------- ----------------------- ----------------------- ----------------- -------- ---------- ----------- ---------
TEST_TABLE_CASE                       1                       1                       6                 6 VALID            32          32 21-FEB-16
TEST_TABLE_COVER_COMP          16748149                       1                       1            125447 VALID      16748149      234974 21-FEB-16
TEST_TABLE_PK                  17003239                       1                       1             91391 VALID      17003239      492287 21-FEB-16
TEST_TABLE_STATUS                     3                   13828                   32011             96034 VALID      16257590      363295 21-FEB-16

 
Where we have a choice of useful indexes, we get a FAST FULL SCAN with a hefty cost. A histogram could have given us an index RANGE SCAN, which can be very good.
With no Histogram:

select id from test_table where status = 'Y';

Plan hash value: 1140618830

----------------------------------------------------------------------------------------------
| Id  | Operation            | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                       |       |       | 18753 (100)|          |
|*  1 |  INDEX FAST FULL SCAN| TEST_TABLE_COVER_COMP |  5592K|    42M| 18753   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------

 
With a histogram in place on STATUS, you get a much better plan as the covering index avoids the need for the table look-up. You also get the risk that the optimizer may have bind variable peeking issues and other complications should we have lots of table joins.

select id from test_table where status = 'Y'

Plan hash value: 2912582684

------------------------------------------------------------------------------------------
| Id  | Operation        | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                       |       |       |     3 (100)|          |
|*  1 |  INDEX RANGE SCAN| TEST_TABLE_COVER_COMP |    32 |   256 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
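For reference, a histogram on STATUS could be gathered along these lines. This is a sketch, not the exact call behind the plans above: the METHOD_OPT string and the bucket count are my assumptions.

```sql
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname    => USER
      , tabname    => 'TEST_TABLE'
      , method_opt => 'FOR COLUMNS status SIZE 254'   -- assumed bucket count
      , cascade    => TRUE
    );
END;
/

-- Confirm that a histogram now exists on STATUS
SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name  = 'TEST_TABLE'
AND    column_name = 'STATUS';
```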

NOTE: Ditching the covering index and just using the index on STATUS is pretty efficient too when combined with a histogram:

select id from test_table where status = 'Y'

Plan hash value: 2416598805

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TEST_TABLE        |    32 |   256 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | TEST_TABLE_STATUS |    32 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------


 
And now with the function-based index; the CASE expression removes all of the statuses we are not interested in, for a tiny tidy index.

NOTE: The Predicate in the query must EXACTLY match the function-based index for it to be used.

select id from test_table where case status when 'Y' then status else null end = 'Y'

Plan hash value: 2073004851

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |       |       |     7 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TEST_TABLE      |    32 |   256 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | TEST_TABLE_CASE |    32 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
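Because the predicate has to match the indexed expression exactly, one option is to give the expression a name. The following is a sketch of that idea using a virtual column (available from Oracle 11g) as an alternative to TEST_TABLE_CASE rather than an addition to it. The column and index names are invented, and I have not reproduced the plans above with it.

```sql
-- Assumption: the function-based index on the same expression is dropped first,
-- since Oracle may reject a virtual column that duplicates an indexed expression.
DROP INDEX test_table_case;

-- Hypothetical virtual column wrapping the same CASE expression
ALTER TABLE test_table ADD (
    unprocessed_status GENERATED ALWAYS AS (
        CASE status WHEN 'Y' THEN status ELSE NULL END
    ) VIRTUAL
);

-- Index the virtual column; rows where it is NULL get no index entry
CREATE INDEX test_table_vc ON test_table (unprocessed_status);

-- Queries can now use the column name rather than repeating the expression
SELECT id FROM test_table WHERE unprocessed_status = 'Y';
```

The trade-off is a visible extra column on the table, which may surprise code that does SELECT * or inserts without a column list.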

Conclusion: For a highly skewed STATUS column you need a histogram, which is something you should mostly avoid in OLTP systems using BIND variables. Having a highly focussed function-based index allows for a tiny self-maintaining index which is guaranteed to only be used for queries that you want it to be used for.
 

NOTE: The original idea behind using NULLS to minimise index size came from the performance expert, Jonathan Lewis. I have implemented both NULL-as-complete design and case-based indexes at several clients, in varying forms, and always to great success.

Primary Keys and their supporting indexes

Sometimes things just happen which makes you realise that stuff you thought all experienced DBA’s knew about isn’t common knowledge. I suppose it’s a side effect of working with the same evolving product for over 25 years at dozens of clients, when the colleague you are talking to has a mere decade of experience at fewer companies (and therefore less diversity of challenge).

Today I noticed that a release was creating a table, then an index, and then adding a Primary Key based upon the index. Pretty normal, but the indexes were non-unique. Interesting. Is that valid? (yes) Is it sensible? (under some circumstances, also yes). Is it wrong but nobody noticed until it got to me? Probably.

However, there are more variables at play here than first meet the eye. The Primary Key is being enforced by a pre-existing non-unique index, rather than creating its own index “dynamically” (which would be unique under most circumstances). So we have a few questions to answer:

  1. Is the index being used optimal?
  2. How do I know it’s using a pre-created index and the constraint did not create its own index?
  3. What happens if I drop the PK constraint?
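As a starting point for question 2, the data dictionary does record which index each constraint is using. A minimal sketch against the NEIL table used in the worked examples; an index name that differs from the constraint name is a hint, not proof, that the index was pre-created.

```sql
-- Which index is each constraint using, and is that index unique?
SELECT uc.constraint_name
     , uc.constraint_type
     , uc.index_name
     , ui.uniqueness
FROM   user_constraints uc
       JOIN user_indexes ui
         ON  ui.index_name = uc.index_name
         AND ui.table_name = uc.table_name
WHERE  uc.table_name      = 'NEIL'
AND    uc.constraint_type IN ('P', 'U');
```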

 

Scenario 1: create table, create constraint

You get a nice neat fast index enforcing the constraint. If you drop the constraint, the index will also get removed. This may or may not be what you want. You may need to put a new index back on the table…

Scenario 2: create table, create unique index, create constraint

You get a nice neat fast index enforcing the constraint. If you drop the constraint, the index does not get removed. This may or may not be what you want. The index which was left behind might cause a problem if you miss the fact it’s still around…

Scenario 3: create table, create non-unique index (same as constraint columns), create constraint

The index is slightly less efficient at enforcing the constraint, as it has to use an index range scan, not a unique scan. However, if you drop the constraint you still have a non-unique index. This may be a very useful ability for bulk data loads with questionable data, although the use-cases are limited and tend to be Warehouse-centric.

If you have a deferrable constraint, you must have a non-unique index to enforce the constraint as there may temporarily be duplicate keys in the table part-way through the transaction.
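The deferrable case can be sketched as follows. The table and constraint names are invented for illustration; the point is only that duplicates are tolerated until commit, which is why the supporting index cannot be unique.

```sql
-- Hypothetical table to show a deferrable primary key
CREATE TABLE neil_deferrable (col_id NUMBER, col_txt VARCHAR2(10 CHAR));

ALTER TABLE neil_deferrable
    ADD CONSTRAINT neil_deferrable_pk PRIMARY KEY (col_id)
    DEFERRABLE INITIALLY DEFERRED;

-- The supporting index should be reported as NONUNIQUE
SELECT index_name, uniqueness
FROM   user_indexes
WHERE  table_name = 'NEIL_DEFERRABLE';

-- Duplicates are tolerated part-way through the transaction...
INSERT INTO neil_deferrable VALUES (1, 'A');
INSERT INTO neil_deferrable VALUES (1, 'B');
UPDATE neil_deferrable SET col_id = 2 WHERE col_txt = 'B';

-- ...as long as they have been resolved by commit time
COMMIT;
```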

Scenario 4: create table, create non-unique composite index, create constraint which uses leading columns of that index but not all of the columns

The index may be somewhat less efficient enforcing the constraint by having fewer rows per block and possibly more levels, but this inefficiency may help overall system efficiency by having to maintain fewer indexes: maintaining 2 indexes instead of 1 is a 100% overhead in index maintenance. The composite index cannot be a unique index.

 

Some worked examples showing the above scenarios (run on Oracle 12.1.0.2.0 in a PDB):

NEIL @ ORCL01 > -- Scenario 1
NEIL @ ORCL01 > drop table neil;
Table dropped.
NEIL @ ORCL01 > create table neil (col_id number, col_txt varchar2(10 char));
Table created.
NEIL @ ORCL01 > alter table neil add constraint neil_pk primary key (col_id) ;
Table altered.
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name = 'NEIL';
INDEX_NAME           UNIQUENESS
-------------------- --------------------
NEIL_PK              UNIQUE
NEIL @ ORCL01 > insert into neil values (1,'A');
1 row created.
NEIL @ ORCL01 > insert into neil values (1,'B');
insert into neil values (1,'B')
*
ERROR at line 1:
ORA-00001: unique constraint (NEIL.NEIL_PK) violated
NEIL @ ORCL01 > commit;
Commit complete.
NEIL @ ORCL01 > alter table neil drop constraint neil_pk;
Table altered.
NEIL @ ORCL01 > -- And the index is gone...
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name = 'NEIL';
no rows selected
NEIL @ ORCL01 > -- Scenario 2
NEIL @ ORCL01 > drop table neil;
Table dropped.
NEIL @ ORCL01 > create table neil (col_id number, col_txt varchar2(10 char));
Table created.
NEIL @ ORCL01 > create unique index neil_i on neil(col_id);
Index created.
NEIL @ ORCL01 > alter table neil add constraint neil_pk primary key (col_id) ;
Table altered.
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name = 'NEIL';
INDEX_NAME           UNIQUENESS
-------------------- --------------------
NEIL_I               UNIQUE
NEIL @ ORCL01 > insert into neil values (1,'A');
1 row created.
NEIL @ ORCL01 > insert into neil values (1,'B');
insert into neil values (1,'B')
*
ERROR at line 1:
ORA-00001: unique constraint (NEIL.NEIL_PK) violated
NEIL @ ORCL01 > commit;
Commit complete.
NEIL @ ORCL01 > alter table neil drop constraint neil_pk;
Table altered.
NEIL @ ORCL01 > -- And the index survives
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name = 'NEIL';
INDEX_NAME           UNIQUENESS
-------------------- --------------------
NEIL_I               UNIQUE
NEIL @ ORCL01 > -- Scenario 3
NEIL @ ORCL01 > drop table neil;
Table dropped.
NEIL @ ORCL01 > create table neil (col_id number, col_txt varchar2(10 char));
Table created.
NEIL @ ORCL01 > create index neil_i on neil(col_id);
Index created.
NEIL @ ORCL01 > alter table neil add constraint neil_pk primary key (col_id) using index neil_i;
Table altered.
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name = 'NEIL';
INDEX_NAME           UNIQUENESS
-------------------- --------------------
NEIL_I               NONUNIQUE
NEIL @ ORCL01 > insert into neil values (1,'A');
1 row created.
NEIL @ ORCL01 > insert into neil values (1,'B');
insert into neil values (1,'B')
*
ERROR at line 1:
ORA-00001: unique constraint (NEIL.NEIL_PK) violated
NEIL @ ORCL01 > commit;
Commit complete.
NEIL @ ORCL01 > alter table neil drop constraint neil_pk;
Table altered.
NEIL @ ORCL01 > -- And the index survives
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name = 'NEIL';
INDEX_NAME           UNIQUENESS
-------------------- --------------------
NEIL_I               NONUNIQUE
NEIL @ ORCL01 > -- Scenario 4
NEIL @ ORCL01 > drop table neil;
Table dropped.
NEIL @ ORCL01 > create table neil (col_id number, col_txt varchar2(10 char));
Table created.
NEIL @ ORCL01 > create index neil_i_composite on neil(col_id,col_txt);
Index created.
NEIL @ ORCL01 > alter table neil add constraint neil_pk primary key (col_id) using index neil_i_composite;
Table altered.
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name = 'NEIL';
INDEX_NAME           UNIQUENESS
-------------------- --------------------
NEIL_I_COMPOSITE     NONUNIQUE
NEIL @ ORCL01 > insert into neil values (1,'A');
1 row created.
NEIL @ ORCL01 > insert into neil values (1,'B');
insert into neil values (1,'B')
*
ERROR at line 1:
ORA-00001: unique constraint (NEIL.NEIL_PK) violated
NEIL @ ORCL01 > commit;
Commit complete.
NEIL @ ORCL01 > alter table neil drop constraint neil_pk;
Table altered.
NEIL @ ORCL01 > -- And the index survives
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name = 'NEIL';
INDEX_NAME           UNIQUENESS
-------------------- --------------------
NEIL_I_COMPOSITE     NONUNIQUE

So how do I tell if, when I drop my constraint, I will also drop the index? This does not appear to be exposed in any DBA_ view of which I am aware, so I have had to rely upon this post from Jonathan Lewis to help me decipher the information. You need to see if certain bits are set in the column sys.ind$.property. Here’s my worked example to show the information we seek:

NEIL @ ORCL01 > drop table neil_cons_create;
Table dropped.
NEIL @ ORCL01 > create table neil_cons_create (col_id number, col_txt varchar2(10 char));
Table created.
NEIL @ ORCL01 > alter table neil_cons_create add constraint neil_cons_create_pk primary key (col_id);
Table altered.
NEIL @ ORCL01 > drop table neil_pre_create;
Table dropped.
NEIL @ ORCL01 > create table neil_pre_create (col_id number, col_txt varchar2(10 char));
Table created.
NEIL @ ORCL01 > create unique index neil_pre_create_i on neil_pre_create(col_id);
Index created.
NEIL @ ORCL01 > alter table neil_pre_create add constraint neil_pre_create_pk primary key (col_id);
Table altered.
NEIL @ ORCL01 > select index_name,uniqueness from user_indexes where table_name like 'NEIL%';
INDEX_NAME           UNIQUENESS
-------------------- --------------------
NEIL_CONS_CREATE_PK  UNIQUE
NEIL_PRE_CREATE_I    UNIQUE

select do.owner,do.object_name,do.object_id,ind.property, 'Drop Index' action
  from sys.ind$ ind
       ,dba_objects do
where do.object_name like 'NEIL%'
  and do.object_id = ind.obj#
  and bitand(ind.property,4097) = 4097
union all
select do.owner,do.object_name,do.object_id,ind.property, 'Keep Index' action
  from sys.ind$ ind
      ,dba_objects do
where do.object_name like 'NEIL%'
  and do.object_id = ind.obj#
  and bitand(ind.property,4097) <> 4097
order by 1,2,3
/
OWNER        OBJECT_NAME         OBJECT_ID  PROPERTY   ACTION
------------ ------------------- ---------- ---------- --------------------
NEIL         NEIL_CONS_CREATE_PK      93814       4097 Drop Index
NEIL         NEIL_PRE_CREATE_I        93816          1 Keep Index
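If you would rather not depend on how the index came into being, the drop itself can state what should happen to the index. A sketch using the KEEP INDEX and DROP INDEX options of ALTER TABLE ... DROP CONSTRAINT against the two example tables; I have not run these against the tables above, so treat the outcome as an expectation rather than a result.

```sql
-- Keep the supporting index, even though the constraint created it
ALTER TABLE neil_cons_create DROP CONSTRAINT neil_cons_create_pk KEEP INDEX;

-- Drop the supporting index along with the constraint
ALTER TABLE neil_pre_create DROP CONSTRAINT neil_pre_create_pk DROP INDEX;

-- Check what is left behind
SELECT table_name, index_name, uniqueness
FROM   user_indexes
WHERE  table_name LIKE 'NEIL%';
```

Being explicit in release scripts removes the need to remember which scenario a given table falls into.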

When did I update that row?

I had a requirement the other day to understand when some data had been changed, but there was no column on the table which showed this.

So how do I find out? Well I could go off mining redo and lots of other time consuming and exotic things, but you can use the Oracle Pseudocolumn ORA_ROWSCN. This gives the SCN associated with each row. Well, actually it usually doesn’t. It does not show when the individual row of data was changed but, by default, gives the last changed SCN for the block where the row of data lives.

If you want to know with accuracy the SCN for the row change, you need to create your table with the extension “rowdependencies”. This adds a hidden column to each row, taking 6 bytes and storing the SCN on a row-by-row basis with your data. NOTE: This is a CREATE TABLE option only. You can’t modify a table to add rowdependencies, and there are a few restrictions for tables where this is enabled. Check the documentation for your version.

So, we now have the SCN, whether for the BLOCK or the ROW. What good is that? Well, there’s a chance that Oracle will have remembered approximately when that SCN came out. I think you are guaranteed this for about 120 hours (5 days), but it depends upon a number of factors including the flashback retention and undo retention times.

You can get a rough idea of the SCN time from V$LOG_HISTORY, a more accurate idea from SYS.SMON_SCN_TIME, or just use the SCN_TO_TIMESTAMP function to make your life easier! If you are within 120 hours and have rowdependencies enabled, it will be roughly accurate to the time of the commit, depending upon a couple of factors – please see comments.
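To see how far back the mapping reaches on a given database, something like the following can help. It is a sketch that assumes you have SELECT access to SYS.SMON_SCN_TIME; outside the retained range, SCN_TO_TIMESTAMP raises ORA-08181 and TIMESTAMP_TO_SCN raises ORA-08180.

```sql
-- How far back does the SCN-to-time mapping currently go?
SELECT MIN(time_dp) AS oldest_mapping
     , MAX(time_dp) AS newest_mapping
     , MIN(scn)     AS oldest_scn
     , MAX(scn)     AS newest_scn
FROM   sys.smon_scn_time;

-- Round trip: the SCN for an hour ago, converted back to a timestamp
SELECT TIMESTAMP_TO_SCN(SYSTIMESTAMP - INTERVAL '1' HOUR) AS scn_an_hour_ago
     , SCN_TO_TIMESTAMP(
           TIMESTAMP_TO_SCN(SYSTIMESTAMP - INTERVAL '1' HOUR)
       )                                                  AS mapped_time
FROM   dual;
```

The round trip rarely returns exactly the time you put in, which is a handy reminder that the mapping is approximate.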

Here’s a short worked example to show the storing of SCNs in both normal (block) and rowdependency-enabled tables. Note how the ORA_ROWSCN is the same for each row in the same block in the normal table.

 

00:29:34 NEIL @ ORCL01 > create table scn_block (col1 number, col2 date, c_scn number);
Table created.
00:29:34 NEIL @ ORCL01 > create table scn_row (col1 number, col2 date, c_scn number) rowdependencies;
Table created.
00:29:34 NEIL @ ORCL01 > insert into scn_block values (1,sysdate,userenv('commitscn') );
1 row created.
00:29:34 NEIL @ ORCL01 > commit;
Commit complete.
00:29:34 NEIL @ ORCL01 > host sleep 5
00:29:39 NEIL @ ORCL01 > insert into scn_row values (1,sysdate,userenv('commitscn') );
1 row created.
00:29:39 NEIL @ ORCL01 > commit;
Commit complete.
00:29:39 NEIL @ ORCL01 > host sleep 5
00:29:44 NEIL @ ORCL01 > insert into scn_block values (1,sysdate,userenv('commitscn') );
1 row created.
00:29:44 NEIL @ ORCL01 > commit;
Commit complete.
00:29:44 NEIL @ ORCL01 > host sleep 5
00:29:49 NEIL @ ORCL01 > insert into scn_row values (1,sysdate,userenv('commitscn') );
1 row created.
00:29:49 NEIL @ ORCL01 > commit;
Commit complete.
00:29:49 NEIL @ ORCL01 > column dt_1 format a30 truncate
00:29:49 NEIL @ ORCL01 > column dt_2 format a30 truncate
00:29:49 NEIL @ ORCL01 >
00:29:49 NEIL @ ORCL01 > select col1,col2,c_scn,ora_rowscn,scn_to_timestamp(c_scn) dt_1,scn_to_timestamp(ora_rowscn) dt_2 from scn_block;

COL1       COL2      C_SCN      ORA_ROWSCN DT_1                           DT_2
---------- --------- ---------- ---------- ------------------------------ ------------------------------
1          05-DEC-15 3670102    3670149    05-DEC-15 00.29.34.000000000   05-DEC-15 00.29.43.000000000
1          05-DEC-15 3670148    3670149    05-DEC-15 00.29.43.000000000   05-DEC-15 00.29.43.000000000

00:29:49 NEIL @ ORCL01 > select col1,col2,c_scn,ora_rowscn,scn_to_timestamp(c_scn) dt_1,scn_to_timestamp(ora_rowscn) dt_2 from scn_row;

COL1       COL2      C_SCN      ORA_ROWSCN DT_1                           DT_2
---------- --------- ---------- ---------- ------------------------------ ------------------------------
1          05-DEC-15 3670133    3670134    05-DEC-15 00.29.39.000000000   05-DEC-15 00.29.39.000000000
1          05-DEC-15 3670160    3670161    05-DEC-15 00.29.48.000000000   05-DEC-15 00.29.48.000000000
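Since rowdependencies can only be set at creation time, it is worth checking which tables already have it before relying on ORA_ROWSCN at row level. A minimal sketch using the DEPENDENCIES column of USER_TABLES against the two tables from the example:

```sql
-- DEPENDENCIES shows ENABLED for tables created with ROWDEPENDENCIES
SELECT table_name, dependencies
FROM   user_tables
WHERE  table_name IN ('SCN_BLOCK', 'SCN_ROW');
```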

In an interesting convergence, whilst I was doing this, Martin Widlake was looking at the same thing in a slightly different way. How meta.

 

Developers

Just a small Sunday night anecdote with a wider point. I, or maybe a colleague, recently received an update statement from a developer. Now, this developer is long of tooth and is well versed in the ways of Oracle data manipulation.

The aforementioned update statement contained an interesting hint: BYPASS_UJVC. You may not have heard of this hint. It’s not commonly used, although it’s been around since Oracle 8.1.5, mainly because it is both undocumented and unsupported by Oracle. In the right hands, it’s a very neat way around a problem of doing an update through a join where you would otherwise be restricted by the potential of having transient keys (i.e. multiple updates via the join giving random results). There are a bunch of other blogs around about how/why/not to use it so I won’t waffle on here.

However, the Dev was disappointed when we [the DBAs] told him to, erm, rewrite his code (probably as a merge – tends to let you get round the same problem), given he has been using this hint for as long as it’s been around (a long time!) but as it’s NOT supported we wouldn’t allow it. I’m not about to update millions of rows in a multi-billion row database with an unsupported hint unless I have a MAJOR problem.

The point of the story is, that evening, we met up for drinks with another Developer (Dev2) whom we have both known for a couple of decades. Dev says to Dev2 “DBAs – they are never happy, going around being all DBA-ish and No-you-cant today”, and (unprompted) Dev2 says “BYPASS_UJVC?”

I like Developers – I used to be one – but the role of Developer and DBA should be symbiotic and not adversarial, as it can become upon occasion. We should work together – all of the best systems I have delivered have had a great relationship between Dev and DBA – but don’t ask me to bend my database out of shape just so you can take a short cut. Talk about physicalities, data access, data life-cycle, volume and performance before a line of code has been written and we will all work much better together. If all parties feel they have some ownership of the system, there is a greater chance of success and joy.
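
For anyone wondering what that kind of rewrite looks like, here is a minimal sketch. The tables accounts and account_changes and their columns are invented for illustration; this is not the developer's actual code. The hinted update appears only as a comment, next to a MERGE that does the same job without the hint.

```sql
-- Hypothetical tables: accounts (the target) and account_changes (the source).
--
-- The hinted form, shown for comparison only (undocumented and unsupported):
--   UPDATE /*+ BYPASS_UJVC */
--          ( SELECT a.balance, c.new_balance
--            FROM   accounts a, account_changes c
--            WHERE  a.account_id = c.account_id )
--   SET    balance = new_balance;

-- The supported rewrite: update the target from the source with MERGE.
MERGE INTO accounts a
USING account_changes c
ON    ( a.account_id = c.account_id )
WHEN MATCHED THEN
  UPDATE SET a.balance = c.new_balance;
```

If account_changes holds more than one row for the same account_id, the MERGE fails with ORA-30926 (unable to get a stable set of rows in the source tables) rather than quietly picking a winner, which is the safety check the hint switches off. Alternatively, a unique or primary key constraint on account_changes.account_id makes accounts key-preserved in the join, and the plain join update should then work with no hint at all.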

When to use the NOLOCK hint in SQL Server

I frequently hear of, and see, developers and DBAs using the NOLOCK hint within SQL Server to bypass the locking mechanism and return their data sets as soon as possible. There are times when this is OK, such as when you are running an ad hoc query and are only interested in approximate results. It is somewhat less OK to write this hint into application code and reports, unless you don’t actually care whether the data returned is accurate.

The big problem with NOLOCK is that the effects of using it are not fully understood by many of the coders who are using it. The common perception is that you’re simply reading uncommitted data, and the odd roll-back isn’t too much to worry about. If that were the full extent of the problem, then the developer would be more or less right – we tend not to roll back too often so don’t worry about it. However, there are more insidious side effects which are not generally understood: effects caused by how the underlying database actually works.

To try and explain the true nature of issuing READ UNCOMMITTED selects, via NOLOCK, I have created an example so you can see the issue at work.

Here are 2 scripts. SCRIPT 1 creates a table, puts some static data into it, then starts inserting lots of data. Each row is padded for realism, to get a few rows per block (a “page”, in SQL Server terms). The Primary Key is a “UNIQUEIDENTIFIER”, so we should expect the keys to be spread out, and subsequent inserts to land in the same blocks as our initial inserts. This should generate some block splits (page splits) – something that happens a lot in SQL Server.
SCRIPT1:

IF OBJECT_ID('dbo.test_table') IS NOT NULL
DROP TABLE dbo.test_table;
-- create a table and pad it out so we only get a few rows per block
CREATE TABLE dbo.test_table
(
 pk UNIQUEIDENTIFIER DEFAULT ( NEWID() ) NOT NULL
,search_col VARCHAR(10)
,count_col  INT
,padding CHAR(100) DEFAULT ( 'pad' )
);
alter TABLE dbo.test_table add constraint test_table_pk primary key clustered (pk);
DECLARE @LOOP1 INT
SET @LOOP1=0
WHILE (@LOOP1 < 100)
BEGIN
 SET @LOOP1=@LOOP1+1
 INSERT INTO dbo.test_table ( search_col, count_col ) VALUES('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1),('THIS_ONE',1);
END;
select getdate(),sum(count_col) from dbo.test_table (NOLOCK) where search_col = 'THIS_ONE';
set nocount on
-- insert 100,000 rows, which should cause some lovely block splits as the PK will look to insert into the same block as the data we already have in there
-- we need to run the select in another window at the same time
DECLARE @LOOP INT
SET @LOOP=0
WHILE (@LOOP < 100000)
BEGIN
 SET @LOOP=@LOOP+1
 INSERT INTO dbo.test_table ( search_col, count_col ) VALUES ( CAST( RAND() * 1000000 AS CHAR) , 100000 )
END
select getdate(),sum(count_col) from dbo.test_table (NOLOCK) where search_col = 'THIS_ONE';

Output from SCRIPT1 – note that the 2 selects, before and after inserts, give the same output.

----------------------- -----------
2014-10-12 23:51:34.210 1000
---------------------- -----------
2014-10-12 23:51:53.490 1000

Whilst SCRIPT1 is running, run SCRIPT 2 in another window in the same database. It just repeats the same SELECT with (NOLOCK) over and over again. The WHERE clause doesn’t change, and the correct result should never change… but due to the block splits we see it change. A lot. During a split, rows are copied into the new block before the old block is cleaned up, so the READ UNCOMMITTED select issued via NOLOCK can see the same row in both places and count it twice.
SCRIPT2:

set nocount on
DECLARE @LOOP INT
SET @LOOP=0
WHILE (@LOOP < 10000)
begin
 SET @LOOP=@LOOP+1
 select getdate(),sum(count_col) from dbo.test_table (NOLOCK) where search_col = 'THIS_ONE';
end;

Output from SCRIPT2 (trimmed)

2014-10-12 23:51:35.473 1000
.
2014-10-12 23:51:35.530 1000
2014-10-12 23:51:35.530 1000
2014-10-12 23:51:35.533 1005
2014-10-12 23:51:35.533 1000
2014-10-12 23:51:35.537 1000
2014-10-12 23:51:35.537 1000
2014-10-12 23:51:35.540 1003
2014-10-12 23:51:35.540 1000
2014-10-12 23:51:35.543 1001
2014-10-12 23:51:35.543 1000
2014-10-12 23:51:35.547 1000
2014-10-12 23:51:35.550 1000
2014-10-12 23:51:35.550 1000
2014-10-12 23:51:35.553 1000
2014-10-12 23:51:35.557 1006
2014-10-12 23:51:35.557 1003
2014-10-12 23:51:35.560 1000
2014-10-12 23:51:35.560 1000
.
2014-10-12 23:51:53.383 1000
2014-10-12 23:51:53.400 1000
2014-10-12 23:51:53.417 1004
2014-10-12 23:51:53.433 1001
2014-10-12 23:51:53.450 1000
2014-10-12 23:51:53.467 1002
2014-10-12 23:51:53.483 1000
2014-10-12 23:51:53.507 1000
Query was cancelled by user.

 
So, using the NOLOCK hint can return incorrect results, even if the data you are selecting is unchanged, unchanging, and NOT subject to rollback.
Locking is there for a reason. ACID transactions exist for a reason.
If you care about your data, you should try to access it correctly and treat it well, otherwise you have to ask if the code you are writing really has value. If it doesn’t have value, why are you storing the data in an expensive relational database, when you could use a freeware database engine or just pipe it straight to /dev/null – that’s really quick.

One solution to this problem is to change the locking method of SQL Server, and start using Read Committed Snapshot Isolation** mode. This allows readers to access the data without blocking writers or being blocked by writers. It works similarly to Oracle’s Multi-Version Concurrency Control, and (sweeping generalisation alert!) allows SQL Server to scale better.

**NOLOCK still “works” the same in this mode – it needs to be removed from your code.
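
Switching a database over is a small change, sketched below. The database name MyAppDb is a placeholder, and WITH ROLLBACK IMMEDIATE is an assumption about how you would want to handle open transactions: it rolls them back so the ALTER can take effect, so pick a quiet period.

```sql
-- Turn on Read Committed Snapshot Isolation.
-- MyAppDb is a hypothetical database name.
ALTER DATABASE MyAppDb
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;  -- assumption: open transactions may be rolled back

-- Confirm the setting: 1 means RCSI is enabled for the database.
SELECT name, is_read_committed_snapshot_on
FROM   sys.databases
WHERE  name = 'MyAppDb';
```

Once this is on, ordinary READ COMMITTED selects read the last committed version of each row instead of waiting on locks, so the usual reason for reaching for NOLOCK goes away.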

SQL Plan Management – 12C dumb feature

In Oracle 11G, Oracle introduced SQL Plan Management (SPM). It is excellent (I love it to bits). It allows you to create Baselines against SQL which lock-down the SQL execution plan. No more plan flips. More consistency. Perfect**.

Whenever some Baselined SQL is run, Oracle still parses it and compares the resulting plan to the accepted (Evolved) baselines. If the newly parsed plan is not one of the accepted plans, it is added to DBA_SQL_PLAN_BASELINES but is NOT accepted. This means that you need to spend time manually evolving the baseline: running DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE and checking the new plan.

If you want it, and/or Oracle evaluates that it is a better plan for that particular set of bind variables, the plan is accepted and becomes a candidate to be used by future executions of your SQL. Complete control over your execution plans.
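
As a rough sketch of that manual step: list the plans sitting unaccepted, then evolve one statement and read the report before anything is accepted. The SQL handle below is a made-up placeholder; substitute one returned by the first query. The VARIABLE and PRINT commands assume SQL*Plus or a compatible client.

```sql
-- Plans that have been captured but not yet accepted.
SELECT sql_handle, plan_name, origin, enabled, accepted, created
FROM   dba_sql_plan_baselines
WHERE  accepted = 'NO'
ORDER BY created;

-- Verify the unaccepted plan(s) for one statement without accepting them.
SET LONG 1000000
VARIABLE evolve_report CLOB

BEGIN
  :evolve_report := DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE(
                      sql_handle => 'SQL_0123456789abcdef',  -- hypothetical handle
                      verify     => 'YES',   -- test-execute and compare the plans
                      commit     => 'NO' );  -- report only, accept nothing
END;
/

PRINT evolve_report
```

With commit => 'NO' the report says whether the new plan verified as better, but the baseline is left alone. Rerun with commit => 'YES' once you are happy with what you see.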

So, Oracle, what’s all this about in Oracle 12C, eh?

In Oracle 12C there’s a new SPM Evolve advisor task. “By default, SYS_AUTO_SPM_EVOLVE_TASK runs daily in the scheduled maintenance window” – So, it runs every night and by default it runs the equivalent of DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE for all new baselines created today and automatically accepts the new plans.

BY DEFAULT? NO! NO! NO!

That is precisely what I don’t want from baselines – Oracle making its own mind up about plans without any input from me. I’m using baselines to stop Oracle changing its mind. To explicitly limit the number of paths allowed by the Optimizer to ones I know about and with which I am comfortable. Don’t introduce functionality to do the opposite.

So, immediately following the installation of 12C, I would recommend running (you need to be SYS for this):

SELECT PARAMETER_NAME, PARAMETER_VALUE AS "VALUE"
FROM   DBA_ADVISOR_PARAMETERS
WHERE  TASK_NAME = 'SYS_AUTO_SPM_EVOLVE_TASK' AND
         PARAMETER_NAME in ('ACCEPT_PLANS','TIME_LIMIT')
ORDER BY 1;

PARAMETER_NAME            VALUE
------------------------- ----------
ACCEPT_PLANS              TRUE
TIME_LIMIT                3600

Then run:

BEGIN
  DBMS_SPM.SET_EVOLVE_TASK_PARAMETER('SYS_AUTO_SPM_EVOLVE_TASK',
    'ACCEPT_PLANS', 'false');
END;
/

OK, back where we were, with any baselines fixed in place and doing what I want them to do! Not change.

 

**Perfect? No. But Baselines are good and, as long as your DB structure does not change, they should keep working. If they don’t, raise an SR with Oracle as it’s probably a bug.

UPDATE 2015-11-25: This is still as true in 12.1.0.2.5 as it was in 12.0 Grrrr!!!

SQL Developer insight

When using SQL Developer, there is a feature called Completion Insight which helps you write SQL and PL/SQL in the SQL Worksheets. It will auto-pop helpful things, like object names when typing a SELECT statement, column names if it is aware of the object which you are accessing, and things like that.

The pop-up seems to appear inconsistently, popping up when I don’t want it and not when I do. I finally snapped and decided to work out (look up) exactly how this feature works. And it’s like this.

1. Make sure your SQL is correctly formed. If you have multiple bits of SQL in the Worksheet, and they are not terminated with a “;” or “/”, then it can get confused about where you are contextually within a statement and not pop-up.

2. The Auto-popup has a delay of either 0.3s (SQL) or 0.6s (PL/SQL). This can be changed in Tools -> Preferences -> Code Editor -> Completion Insight. I have now set a 1.0s delay so it’s not quite so eager to appear.

3. You want the pop-up to appear? Press CTRL + SPACE. If it still does not appear, then it cannot determine your SQL Context (see point 1) or there’s nothing to pop.

Hope this helps you. It’s certainly helped me.

Oracle Join Syntax for ANSI people

I was talking with a non-Oracle DBA the other day about Oracle SQL syntax for joins, as I had written some SQL which he didn’t understand, using the old form of Oracle notation to signify outer joins – the (+) operator. It got me thinking; I have been using Oracle for a very long time, since before ANSI Join syntax was allowed. I therefore tend to use the Oracle-specific format for joins as my brain is wired to simply understand the notation. However, I understand the standard ANSI syntax as I do use and administer other RDBMSs (mainly SQL Server, but I am also familiar with the *= syntax used by Sybase).

I was wondering, which syntax do you use? If you use the ANSI standard joins with Oracle, have you come across any drawbacks or problems? I seem to recall some anecdotes but can’t put my finger on any.

Examples of the Syntax:

--Inner join ANSI
select def.username,usr.username,usr.account_status
from dba_users_with_defpwd def join dba_users usr on (def.username = usr.username)
order by 2;

--Inner join Oracle
select def.username,usr.username,usr.account_status
from dba_users_with_defpwd def, dba_users usr
where def.username = usr.username
order by 2;

--Right outer join ANSI
select def.username,usr.username,usr.account_status
from dba_users_with_defpwd def right outer join dba_users usr on (def.username = usr.username)
order by 2;

--Right outer join Oracle
-- put the (+) where you want to say "match regardless" [on the "wrong" side]
select def.username,usr.username,usr.account_status
from dba_users_with_defpwd def, dba_users usr
where def.username(+) = usr.username
order by 2;