Fundamental Security Part Seven – Data Encryption At Rest

October 2023 – Oracle 19c+


What is Data Encryption At Rest and why should we use it?

We have looked at Data Encryption in Transit, encrypting your network traffic. Everyone should be doing this. But what about encrypting Data At Rest – the data stored permanently on your hard drives?


The simplest way to do this, from a database/DBA perspective, is to use an encrypted storage device. Most SAN/NAS devices come with encryption built-in, and encryption of the storage presented to the servers is abstracted away to the SAN. The CPU overhead is handled by the SAN and you don’t need to worry about it. However, if you are using Exadata or any Oracle Cloud service, this is not an option: you will need to implement Oracle Transparent Data Encryption, which is mandatory in Oracle Cloud. Some other cloud providers offer a storage-level encryption option (which is frequently an added cost and is usually not the default!)

Why Do This?

There are 3 reasons to implement Encryption At Rest:

1. In case the hardware gets stolen. This may not just be directly from your data centre, but following the decommissioning or break/replace of the hardware too.
2. To prevent anyone bypassing database controls and reading or amending the data in your database directly from the files.
3. Because the auditors insist upon it (see points 1 and 2).


Storage-Level Encryption [SLE] v Oracle Transparent Data Encryption [TDE]

Neither Storage-Level Encryption [SLE] nor TDE means your data is safe. The processes running against your data need to see it unencrypted. If the processes can do this, so can suitably privileged Operating System administrators.

No matter what security you have put within your databases to prevent unauthorised access to your data, if your data is unencrypted (or encrypted using SLE, where it is presented to the server “unencrypted”), you can run a “dd” (in Linux; other O/Ses have similar commands) and view the data directly. Don’t think for a moment that because it is in an Oracle internal format it is difficult to read.

> create table read_me (c1 varchar2(1000)) tablespace users;
Table created.

> insert into read_me values ('I CAN SEE YOU!');
1 row created.

> commit;
Commit complete.

> select segment_name, block_id, blocks from dba_extents where segment_name = 'READ_ME';

SEGMENT_NAME                     BLOCK_ID     BLOCKS
------------------------------ ---------- ----------
READ_ME                               136          8

$ dd if=./users01.dbf bs=8192 skip=136 count=8 | od -xa --endian=big

*
0157740    0000    0000    0000    0000    0000    2c01    010e    4920
        nul nul nul nul nul nul nul nul nul nul   , soh soh  so   I  sp
0157760    4341    4e20    5345    4520    594f    5521    0106    7004
          C   A   N  sp   S   E   E  sp   Y   O   U   ! soh ack   p eot
0160000    06a2    0000    8f00    c003    6f04    4200    0000    0204
        ack   " nul nul  si nul   @ etx   o eot   B nul nul nul stx eot
0160020    73e2    0000    0100    0000    4426    0100    6f04    4200
          s   b nul nul soh nul nul nul   D   & soh nul   o eot   B nul
0160040    0080    0000    0200    3200    8800    c003    0000    0000
        nul nul nul nul stx nul   2 nul  bs nul   @ etx nul nul nul nul
0160060    0000    0000    0000    0000    0000    0000    0000    0000
        nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*

Encrypting your data on the storage layer DOES NOT PREVENT this sort of access or potential attack.

Oracle Transparent Data Encryption [TDE]

First of all, Oracle Transparent Data Encryption is a COST OPTION, licensed as part of the Advanced Security Option. It is not part of the standard Oracle license.

TDE encrypts the actual Oracle data files. Running a “dd” against an encrypted tablespace reveals no usable information. However, you now need to handle all of the keys to be able to open your databases. Key management is the most important aspect of using TDE: ensuring the keys are available and are not compromised. If you lose the keys, your data is unrecoverable. Remember to back up the keys at the same time as (but not with!) the database, as you will need the relevant key should you need to recover the database.
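As a flavour of what is involved (a minimal sketch using a software keystore; the wallet path, password, and tablespace name are illustrative, and the keystore location is assumed to already be configured via WALLET_ROOT or sqlnet.ora):

> administer key management create keystore '/u01/app/oracle/wallet' identified by "MyWalletPwd";
> administer key management set keystore open identified by "MyWalletPwd";
> administer key management set key identified by "MyWalletPwd" with backup;

-- new tablespaces can then be created encrypted
> create tablespace secure_data
  datafile '+DATA' size 100m
  encryption using 'AES256'
  default storage (encrypt);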

Important note: Most companies who implement TDE do it at the tablespace level. This means the data on disk is encrypted and the CPU overhead is generally low. However, data in memory is NOT encrypted. In Linux, as memory is effectively treated as a file by the O/S, it is possible to read whatever is in your buffer cache using the “dd” method. If you are very skilled (hackers are!), this is also possible from the dom0 layer of the hypervisor, which is accessible by the staff of the cloud provider. Only data encrypted at a column level remains encrypted within the buffer cache. The overhead for accessing column-level encrypted data is high.
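For comparison, column-level encryption is declared per column (again just a sketch; the table and column names are illustrative):

> create table payment_cards
  ( card_id number
  , card_no varchar2(19) encrypt using 'AES256' );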

It is debatable whether data in the server memory is “at rest”. If you have extremely valuable data, such as credit card information, this should be encrypted by the application (for example by adhering to PCI DSS standards).

For a basic implementation of TDE, I would recommend looking at Tim’s article here (rather than repeating his excellent website) or one from an Oracle PM here.

When I have time, I’ll do a follow-on article about using an HSM or Oracle Key Vault (OKV) to manage your keys.


Now I would like to talk about Fundamental Security Part Eight – Unified Audit

SAN Migration and moving files with ASM

Here’s a quick post to help with LUN, SAN or File migration when using Oracle ASM.

Please note that this is just an example method, and is not a definitive guide.
You need to ensure your approach is appropriate for your environment.

Did you know that you can add and drop disks on a diskgroup in a single command?
This saves Oracle from performing 2 REBAL actions and speeds things up considerably, using far fewer resources.

sqlplus / as sysasm

> ALTER DISKGROUP data 
2   ADD  DISK 'ORCL:disk101','ORCL:disk102' 
3   DROP DISK 'disk042'     ,'disk043' 
4   REBALANCE POWER n;

Diskgroup altered.

You can monitor the REBAL operation:

select * from v$asm_operation;

GROUP_NUMBER OPERA PASS      STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE CON_ID
------------ ----- --------- ---- ----- ------ ----- -------- -------- ----------- ---------- ------
           6 REBAL REBALANCE  RUN     1      1    68    20397     3030           8                 0
           6 REBAL COMPAT    WAIT     1      1     0        0        0           0                 0

If you are just migrating from one SAN to another, without needing to move files between diskgroups, this is a really easy way to achieve it.

From Oracle 12.1 onwards, if you are moving datafiles from one diskgroup to a new diskgroup, that’s straightforward too:

export ORACLE_SID=ORCL
sqlplus / as sysdba

alter database move datafile '+DATA_DG/ORCL/DATAFILE/system.101.902468275' to '+NEW_DG';
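If you have a lot of datafiles, you can generate the move statements from the data dictionary (a sketch; adjust the diskgroup names to suit):

> select 'alter database move datafile '''||name||''' to ''+NEW_DG'';' as cmd
  from   v$datafile
  where  name like '+DATA_DG/%';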

Moving logfiles involves a drop and re-create, but it is still fully online.
Take care if using Data Guard that you do not try to drop logfiles before they are archived and applied to the standby:

alter database  add logfile member '+NEW_DG' to group 1;
alter database drop logfile member '+DATA_DG/ORCL/ONLINELOG/group_1.102.902049284';
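A logfile member can only be dropped once its group is no longer current and has been archived, so check first and, if necessary, move things along (a quick sketch):

> select group#, status, archived from v$log;

> alter system switch logfile;
> alter system checkpoint;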

Moving tempfiles can be tricky. The default temporary tablespace gets used very quickly after startup.

You need to add a tempfile and then get all processes to stop using the old tempfile before you can drop it [select * from gv$sort_usage – there’s a fuller check after the commands below]. Worst case, this may involve a SHUTDOWN, then a STARTUP RESTRICT to stop processes connecting. Alternatively, create a new default TEMP tablespace and convince processes to use that one.

alter tablespace temp add  tempfile '+NEW_DG';
alter tablespace temp drop tempfile '+DATA_DG/ORCL/TEMPFILE/temp.204.992482742';
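To see which sessions are still holding temp segments before you drop the old tempfile (an illustrative join; the column choice is mine):

> select s.inst_id, s.sid, s.serial#, s.username, u.tablespace, u.blocks
  from   gv$sort_usage u, gv$session s
  where  s.saddr   = u.session_addr
  and    s.inst_id = u.inst_id;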

Moving controlfiles requires a stop and start of the database; there are no online options here. I tend to do the following:

srvctl stop database -d ORCL

sqlplus / as sysdba
startup nomount
show parameter control_files


rman target /
restore controlfile to '+NEW_DG' from '+DATA_DG/ORCL/CONTROLFILE/current.291.939274434';

sqlplus / as sysdba
alter system set control_files='+NEW_DG/ORCL/CONTROLFILE/current.992.346677889'
                              ,'+FRA_DG/ORCL/CONTROLFILE/current.???.?????????'
                               scope=spfile sid='*';
shutdown

srvctl start database -d ORCL
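Once the database is back up, it is worth confirming it is using the relocated controlfiles:

> select name from v$controlfile;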

Moving your spfile within ASM is a 2-stage process, which will automatically update your database configuration in Grid Infrastructure:

sqlplus / as sysdba
create  pfile='/tmp/ORCL.ora' from spfile;
create spfile='+NEW_DG'       from pfile='/tmp/ORCL.ora';
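The running instance continues to use the old spfile until you bounce it; after a restart you can confirm the new location with:

> show parameter spfile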

Moving a password file in ASM is straightforward too.
NOTE: you are not allowed to move OMF file names directly; you must copy files via their alias.

as grid:
asmcmd
cp +DATA_DG/ORCL/orapworcl +NEW_DG/ORCL/orapworcl
exit

as oracle:
srvctl modify database -d ORCL -pwfile +NEW_DG/ORCL/orapworcl
srvctl config database -d ORCL | grep Password

If you are using Block Change Tracking, you may need to move that file too:

alter database disable block change tracking;
alter database enable block change tracking using file '+NEW_DG';
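You can then confirm that tracking is enabled and the file has moved:

> select status, filename from v$block_change_tracking;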

Finally, if you’re dropping diskgroups, don’t forget to modify the diskgroup dependencies in Grid Infra so your DB isn’t dependent upon groups you have now removed:

srvctl config database -d ORCL | grep Group
srvctl modify database -d ORCL -diskgroup 'NEW_DG,FRA_DG,other_DG'

I hope you found this quick guide useful.

SAN Migration: When modern SANs “Fail”

… or at least, when they don’t behave like you were expecting them to.

I recently performed a major migration of a dozen mission-critical Oracle and SQL Server systems from a pair of old EMC CX700s to two brand new shiny HP XP20000s. This blog post is intended to show, at a very high level, what steps were taken to ensure the new SAN could cope with the volume of traffic. It also shows how, despite the new SAN being much more modern and intelligent, you can still get caught out by unexpected behaviour.

The databases to be migrated varied from high throughput OLTP, low throughput – fast response time OLTP, a small Data Warehouse and Grid Control. The systems are 24 x 7 x 365, but a nightly maintenance window can be arranged on some days of the week between midnight and 3am. Not a large window, but invaluable.

What are we migrating?

For a successful SAN migration, we first need to understand what we are migrating. To do this we need to profile the I/O requirement of each Oracle database, and Oracle provides us with all of the tools we need to do this. If you have the Tuning and Diagnostics Packs licensed (which all Oracle sites should have – they should really be part of the base product), then you can pull the information you need directly out of the AWR table DBA_HIST_SYSTEM_EVENT. By default, AWR snapshots are taken every hour and retained for only a week. Hourly averages of system event wait times will give a good starting point, but I would not want to use them to determine peak volumes: if you capacity plan to your averages, you will run out of resources before you hit your peaks. A lot can happen in an hour when you are looking at millisecond response times.

You need to be much more granular than an hourly average, but you might not want to change AWR to gather all of its information as frequently as you need for the I/O events. You need to develop some code to gather the information from v$system_event directly, so I did [I will post this code up shortly]. You can be as granular as you want, but I balanced peaks against the volume of information and the impact of gathering it, and ran every 5 minutes. Mostly. I also took some much more frequent samples to ensure that I wasn’t wildly wrong with the 5-minute averages, and that I wasn’t experiencing very short, peaky loads. This method also has the advantage of not needing any additional pack licensing from Oracle. Sorry Larry.
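As a minimal sketch of the idea (the table, job name, and event filter here are illustrative, not the code referred to above): snapshot v$system_event on a schedule, then diff consecutive snapshots to get per-interval wait times.

-- empty snapshot table with the columns we want to track
> create table io_event_snap as
  select sysdate snap_time, event, total_waits, time_waited_micro
  from   v$system_event
  where  1=0;

-- sample the I/O-related events every 5 minutes
> begin
    dbms_scheduler.create_job(
      job_name        => 'IO_EVENT_SAMPLER',
      job_type        => 'PLSQL_BLOCK',
      job_action      => q'[insert into io_event_snap
                            select sysdate, event, total_waits, time_waited_micro
                            from   v$system_event
                            where  event like 'db file%' or event like 'log file%';
                            commit;]',
      repeat_interval => 'FREQ=MINUTELY;INTERVAL=5',
      enabled         => TRUE);
  end;
  /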

What are the resources available on the new SAN?

The second requirement for a SAN migration is to understand the capabilities of your new SAN. There were already some systems on the new SAN, so I was unable to test to destruction. However, some dedicated resources were available on the shared SAN, namely dedicated disks, cache and front-end ports. I needed to throw substantial workloads at the SAN to stress the resources and find the tipping point. There are many workload tools available, such as Hammerora and RAT. Some are expensive, such as RAT. However, for my purposes I was only looking to stress the SAN and get a comfortable feeling about the IOPS and throughput performance. This was achieved using Oracle’s free SAN stress tool, Orion [I will hopefully post more details about this shortly]. By taking all of the profiled I/O rates, adding them up, and comparing them against the new SAN and its much greater cache capacity (16GB dedicated to my systems, up from 4GB on the EMCs), we were able to see that the footprint of the databases to be migrated fitted comfortably within the capabilities of the new SAN. And so the migrations began.
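Orion ships with the database software and drives I/O against raw LUNs without needing a database. An illustrative invocation (the device names and parameters are mine): list the LUNs to test in a <testname>.lun file, then run the “simple” workload, which tests small random and large sequential I/O separately:

$ cat mytest.lun
/dev/mapper/lun01
/dev/mapper/lun02

$ orion -run simple -testname mytest -num_disks 2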

Thin Provisioning

To provide resilience for the databases, LUNs were presented from each SAN to each cluster node, and were mirrored at the Operating System level, either using native mirroring on the Unix platforms, or Veritas on the Windows platforms. This allows either SAN (or SAN location) to fail with no loss of service. The LUNs were either RAID-10 or RAID-5, and were carved out of the storage pools using Thin Provisioning.

Thin Provisioning (or using Meta-LUNs or similar techniques from your SAN provider) is a way to allow LUNs to be easily carved out of storage pools and extended at a later date. Thanks to its dynamic nature and minimisation of space wastage, it is wonderful for SAN administrators, and for people who have not capacity-planned their systems. However, it works on the basis of providing a bit of storage to your LUN from every disk in the array. This gives a maximum-concurrency, maximum-contention model: every LUN is on the same disks, competing for the same resources. Lots of IOPS, but everyone wants them. Thin Provisioning also adds a small overhead to the I/O processing. Not much. Not even a whole millisecond.

Migration Problem

The migrations progressed well, from the least critical to the most critical system. As each system was migrated, we kept monitoring and measuring the I/O response times. Migrating the Data Warehouse showed that the new SAN was performing as expected: processing times were lower. Not much, but in line with expectation and allocated resources. However, Grid Control was exhibiting some strange response times. Despite theoretically having more resources, some I/O response time metrics were worse on average, and much more variable than before. With the EMC SANs, we had experienced a good, consistent level of performance for each LUN. The XPs were proving to be more erratic.

I did a lot of investigation, moving resources around (e.g. dedicated front-end ports for the Grid Control LUNs) and measuring the effect. The level of I/O associated with Grid Control, with only a couple of hundred servers under management, was substantially lower than that of systems competing for shared resources (e.g. the Data Warehouse, which was vastly more demanding). It seemed that Grid Control’s LUNs were not able to survive in the SAN cache. The old EMC CX700s were relatively “dumb” compared to the new XP20000s, and had effectively dedicated an amount of cache to each LUN. The more intelligent XPs would not be so potentially wasteful of resources, and so deemed the (relatively) low level of activity from the Grid Control LUNs to be unimportant and aged the blocks out of cache.

We could live with this for Grid Control; blisteringly fast response time is not a critical factor in its success. The same could not be said for one of our mission-critical applications, which requires a very low end-to-end response time for each transaction. We carefully measure each step – client, calculation, render, network latency, database response time, etc. It needs to be fast and consistent. From the database, it really needs to hit cache or SSDs; spinning rust (ordinary disk) is simply too slow. However, the level of activity for this system, in terms of MB/s and IOPS, is relatively low compared to all of the other systems competing for SAN resources: about 3% of the whole. Add to this the overhead of Thin Provisioning and we have a problem. When a 2ms average turns into a nearly 4ms average with much higher peaks, as it did for db_file_sequential_read on Grid Control, that would be a major problem for this application.

Solution

Talking to HP, they would only guarantee the response times we demanded by using SSDs, at a cost that made my eyes spin and think of buying a supercar instead. A more practical (i.e. cheaper) solution had to be found, and it was (with the help and support of HP, just no guarantees 🙂 ). We bought some more conventional disk and configured it into a traditional RAID-10 storage pool. No Thin Provisioning. We then partitioned the 16GB SAN cache into a 4GB cache dedicated to this system, leaving 12GB behind for everything else. We were migrating from 4GB, and understood that we had plenty of wriggle room in this configuration.

And the result? After migration, the performance of the mission-critical low-throughput system improved substantially, with consistent log_file_parallel_write times of less than 2ms and db_file_sequential_read times of less than 1ms, better than we were experiencing on the old reliable EMCs. I mean, less than 1ms? That’s fast.

[Graph: I/O response times before and after the migration]

So how big is that disk?

I’m doing a fair bit of SAN-based work at the moment, migrating a bunch of Oracle databases from EMC to HP (posts to follow regarding Orion testing and other related topics). One thing that annoys me is the way SAN manufacturers have changed the meaning of technical terms over time to suit themselves. Firstly, they changed the meaning of RAID from “Redundant Array of Inexpensive Disks” to “Redundant Array of Independent Disks”. Apparently it’s hard to put a $1,000,000 price tag on “inexpensive” disk.

Also, the industry changed what a Megabyte is. This really annoys me as it can lead to unexpected shortfalls in the LUN allocations if your traditional Oracle Megabyte is different to the SAN Admin’s Megabyte.

In our industry, the terms “kilo”, “mega”, “giga”, “tera”, “peta”, and “exa” are commonly used prefixes for computing performance and capacity. SAN manufacturers use the terms as defined in “powers of ten” measurement units:

  • A kilobyte (KB) is equal to 1,000 (10³) bytes.
  • A megabyte (MB) is equal to 1,000,000 (10⁶) bytes.
  • A gigabyte (GB) is equal to 1,000,000,000 (10⁹) bytes.
  • A terabyte (TB) is equal to 1,000,000,000,000 (10¹²) bytes.
  • A petabyte (PB) is equal to 1,000,000,000,000,000 (10¹⁵) bytes.
  • An exabyte (EB) is equal to 1,000,000,000,000,000,000 (10¹⁸) bytes.

Most operating system components and (importantly for us) Oracle use “powers of two” measurement units rather than “powers of ten” units. They are defined as:

  • A kibibyte (KiB) is equal to 1,024 (2¹⁰) bytes.
  • A mebibyte (MiB) is equal to 1,048,576 (2²⁰) bytes.
  • A gibibyte (GiB) is equal to 1,073,741,824 (2³⁰) bytes.
  • A tebibyte (TiB) is equal to 1,099,511,627,776 (2⁴⁰) bytes.
  • A pebibyte (PiB) is equal to 1,125,899,906,842,624 (2⁵⁰) bytes.
  • An exbibyte (EiB) is equal to 1,152,921,504,606,846,976 (2⁶⁰) bytes.

That means that 1 terabyte is about 9% smaller than a tebibyte. Thanks, SAN manufacturers, for making our lives just a tiny bit more difficult than they needed to be, so you can market drives as being seemingly larger than they really are (in my head anyway.)
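To put a number on that: ask the SAN admin for a “1TB” LUN and, in the powers-of-two units that Oracle reports, you get roughly 931GiB rather than the 1,024GiB you might have expected:

> select 1e12/power(2,30) as gib_in_one_tb from dual;

GIB_IN_ONE_TB
-------------
   931.322575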