SAN Migration and moving files with ASM

Here’s a quick post to help with LUN, SAN or File migration when using Oracle ASM.

Please note that this is just an example method, and is not a definitive guide.
You need to ensure your approach is appropriate for your environment.

Did you know that you can add and drop disks on a diskgroup in a single command.
This saves Oracle from doing 2 REBAL actions and speeds things up considerably using a lot fewer resources.

sqlplus / as sysasm;

2   ADD  DISK 'ORCL:disk101','ORCL:disk102' 
3   DROP DISK 'disk042'     ,'disk043' 

Diskgroup altered.

You can monitor the REBAL operation:

select * from v$asm_operation;

------------ ----- --------- ---- ----- ------ ----- -------- -------- ----------- ---------- ------
           6 REBAL REBALANCE  RUN     1      1    68    20397     3030           8                 0
           6 REBAL COMPAT    WAIT     1      1     0        0        0           0                 0

If you are just migrating from one SAN to another, without needing to move files between diskgroups, this is a really easy way to achieve it.

From Oracle 12.1 onwards. if you are moving Datafiles from one Diskgroup to a new Diskgroup, that’s straightforward too:

sqlplus  / as sysdba

alter database move datafile '+DATA_DG/ORCL/DATAFILE/system.101.902468275' to '+NEW_DG';

Moving logfiles involves a drop and re-create, but it still fully online.
Take care if using DataGuard that you do not try to drop logfiles before they are archived and applied to the standby:

alter database  add logfile member '+NEW_DG' to group 1;
alter database drop logfile member '+DATA_DG/ORCL/ONLINELOG/group_1.102.902049284';

Moving tempfiles can be tricky. The default gets used really quickly after startup.

You need to add a tempfile and then get all processes to not be using the old tempfile before you can drop it [select * from gv$sort_usage]. Worst-case, this may involve a SHUTDOWN, then a STARTUP RESTRICT to stop processes connecting. Alternatively, create a new default TEMP tablespace and convince processes to use that one.

alter tablespace temp add  tempfile '+NEW_DG';
alter tablespace temp drop tempfile '+DATA_DG/ORCL/TEMPFILE/temp.204.992482742';

Moving controlfiles requires a stop and start of the database. There’s no on-line options here. I tend to do the following:

srvctl stop database -d ORCL

sqlplus / as sysdba
startup nomount
show parameter control_files

rman target /
restore controlfile to '+NEW_DG' from '+DATA_DG/ORCL/CONTROLFILE/current.291.939274434';

sqlplus / as sysdba
alter system set control_files='+NEW_DG/ORCL/CONTROLFILE/current.992.346677889'
                               scope=spfile sid='*';

srvctl start database -d ORCL

To move your spfile in ASM is a 2-stage process which will automatically update your database config in grid control:

sqlplus / as sysdba
create  pfile='/tmp/ORCL.ora' from spfile;
create spfile='+NEW_DG'       from pfile='/tmp/ORCL.ora';

Moving a password file in ASM is straightforward too:
NOTE: you are not allowed to move OMF files names and you must copy files via their alias.

as grid:
cp +DATA_DG/ORCL/orapworcl +NEW_DG/ORCL/orapworclexit
as oracle:
srvctl modify database -d ORCL -pwfile +NEW_DG/ORCL/orapworclsrvctl config database -d ORCL | grep Password

If you are using Block Change Tracking, you may need to move that file too:

alter database disable block change tracking;
alter database enable block change tracking using file '+NEW_DG';

Finally, if you’re dropping disk groups don’t forget to modify the disk group dependencies in Grid Infra so your DB isn’t dependent upon groups you have now removed:

srvctl config datbase -d ORCL | grep Group
srvctl modify database -d ORCL -diskgroup 'NEW_DG,FRA_DG,other_DG'

I hope you found this quick guide useful.

SAN Migration: When modern SANs “Fail”

… or at least, when they don’t behave like you were expecting them to.

I recently performed a major migration of a dozen missions critical Oracle and SQL Server systems from a pair of old EMC CX700’s to 2 brand new shiny HP XP 20000’s. This blog post is intended to show, at a very high level, what steps were taken to ensure the new SAN could cope with the volume of traffic. It also shows that, despite the new SAN being much more modern and intelligent, how you can still get caught out by unexpected behaviour.

The databases to be migrated varied from high throughput OLTP, low throughput – fast response time OLTP, a small Data Warehouse and Grid Control. The systems are 24 x 7 x 365, but a nightly maintenance window can be arranged on some days of the week between midnight and 3am. Not a large window, but invaluable.

What are we migrating?

For a successful SAN migration, we first need to understand what we are migrating. To do this we need to profile the I/O requirement of each Oracle database. Oracle provides us with all of the tools we need to be able to do this. If you have the Tuning and Diagnostics Pack licensed (which all Oracle sites should have – it should really be part of base product.) then you can pull the information you need directly out of the AWR table DBA_HIST_SYSTEM_EVENT. By default your AWR stats gather every hour and only last a week. Hourly averages of system event wait times will give a good starting point, but I would not want to use them to determine peak volumes. If you capacity plan to your averages, you will run out of resources before you hit your peaks. A lot can happen in an hour when you are looking at millisecond response times. You need to be much more granular than an hourly average, but you might not want change AWR to gather all of its information as frequently as you need for the I/O events. You need to develop some code to gather the information from v$system_event directly, so I did [I will post this code up shortly]. You can be as granular as you want, but I balanced peaks against volume of information and the impact of gathering it and was running every 5 minutes. Mostly. I also took some much more frequent samples to ensure that I wasn’t wildly wrong with the 5 minute averages, and that I wasn’t experiencing very short peaky loads. This method also has the advantage of not needing any additional pack licensing from Oracle. Sorry Larry.

What are the resources available on the new SAN?

The second requirement for a SAN migration is to understand the capabilities of your new SAN. There were already some systems on the new SAN so I was unable to test to destruction. However, some dedicated resources were available on the shared SAN, namely dedicated disks, cache and front-end ports. I needed to throw substantial workloads at the SAN to stress the resources and find the tipping point.  There are many workload tools available, such as Hammerora and RAT. Some are expensive, such as RAT. However, for my purposes I was only looking to stress the SAN and get a comfortable feeling about the IOPS and throughput performance. This was achieved using Oracle’s free SAN stress tool, Orion [I will hopefully post more details about this shortly]. By taking all of the profiled I/O rates, adding them up and comparing against the new SAN and it’s much greater cache capacity (16GB dedicated to my systems, up from 4GB on the EMC’s), we were able to see that the footprint of the databases to be migrated fitted comfortably into the capabilities of the new SAN. And so the migrations began.

Thin Provisioning

To provide resilience for the databases, LUNs were presented from each SAN to each cluster node, and were mirrored at the Operating System level, either using native mirroring on the Unix platforms, or Veritas on the Windows platforms. This allows either SAN (or SAN location) to fail with no loss of service. The LUNs were either RAID-10 or RAID-5, and were carved out of the storage pools using Thin Provisioning.

Thin Provisioning (or using Meta-Luns or similar techniques from your SAN provider) is a way to allow LUN’s to be easily carved out of storage pools and extended at a later date. They are wonderful for SAN administrators and people who have not capacity planned their systems due to their dynamic nature and minimization of space wastage. However, they work on the basis of providing a bit of storage to your LUN from every disk in the array. This gives a maximum concurrency, maximum contention model. Every LUN is on the same disk competing for the same resource. Lots of IOPS, but everyone wants them. The Thin Provisioning also adds a small overhead to the I/O processing. Not much. Not even a whole milli-second.

Migration Problem

The migrations progressed well, from least critical to most critical system. As each system was migrated, we kept monitoring and measuring the I/O response times. Migrating the Data Warehouse showed that the new SAN was performing as expected. Processing times were lower. Not much, but in line with expectation and allocated resources. However, Grid Control was exhibiting some strange response times. Despite theoretically having more resources, some I/O response time metrics were worse on average, and much more variable than before. With the EMC SAN’s, we  had experienced a good, consistent level of performance for each LUN. The XP’s were proving to be more erratic.

I did a lot of investigation, moving resources around (e.g. dedicated front-end ports for the Grid Control LUN’s.) and measuring the affect. The level of I/O associated with Grid Control, with only a couple of hundred servers under management, was substantially lower than that of systems competing for shared resources (e.g. the Data Warehouse, which was exponentially more demanding). It seemed that Grid Control’s LUNs were not able to survive in the SAN cache. The old EMC cx700’s were relatively “dumb” compared to the new XP20000’s, and had effectively dedicated an amount of  cache to each LUN. The more intelligent XP’s would not be so potentially wasteful of resources, and so deemed the (relatively) low level of activity from the Grid Controls LUN’s to be unimportant and aged the blocks out.

We could live with this for Grid Control. Blisteringly fast response time is not a critical factor in the success of Grid Control. The same could not be said for one of our mission critical applications. It requires a very low end-to-end response time for each transaction. We carefully measure each step – client, calculation, render, network latency, database response time, etc. It needs to be fast and consistent. From the database, it really needs to hit cache or SSD’s. Spinning rust (ordinary disk) is simply too slow. However, the level of activity for this system, in terms of MB/s and IOPS, is relatively low compared to all of the other systems competing for SAN resources. About 3% of the whole. Add to this the overhead of Thin Provisioning and we have a problem. When a 2ms average turns into a nearly 4ms average with much higher peaks, as it did for db_file_sequential_read on Grid Control, that would be a major problem for this application.


Talking to HP, they would only guarantee the response time we demand by using SSD’s and a cost that made my eyes spin and think of buying a Supercar instead. A more practical (i.e. cheaper) solution had to be found, and it was (with the help amd support of HP, just no guarantees 🙂 ). We bought some more conventional disk and configured it into a traditional Raid-10 storage pool. No Thin Provisioning. We then partitioned the 16GB SAN cache into a 4GB cache dedicated to this system, and left the 12GB cache behind for everything else. We were migrating from 4GB, and understood that we had plenty of wriggle room in this configuration.

And the result? After migration, the performance of the mission critical low throughput system improved substantially, with consistent log_file_parallel_write times of less than 2ms and db_file_sequential_read times of less than 1ms, better than we were experiencing on the old reliable EMC’s. I mean, less than 1ms? That’s fast.

I/O Response Times – Graph

So how big is that disk?

I’m doing a fair bit of SAN-based work at the moment, migrating a bunch of Oracle databases from EMC to HP (posts to follow regarding Orion testing and other related topics). One thing that annoys me is the way SAN Manufacturers have changed the meaning of technical terms over time to suit themselves. Firstly, they changed the meaning of RAID from “Redundant Array of Inexpensive Disk” to “Redundant Array of Independent Disk”. Apparently it’s hard to put a $1,000,000 price tag on “inexpensive” disk.

Also, the industry changed what a Megabyte is. This really annoys me as it can lead to unexpected shortfalls in the LUN allocations if your traditional Oracle Megabyte is different to the SAN Admin’s Megabyte.

In our industry, the terms “kilo”, “mega”, “giga“, “tera”, “peta”, and “exa” are commonly used prefixes for computing performance and capacity. SAN manufacturers use the terms defined in “powers of ten” measurement units:

  • A kilobyte (KB) is equal to 1,000 (103) bytes.
  • A megabyte (MB) is equal to 1,000,000 (106) bytes.
  • A gigabyte (GB) is equal to 1,000,000,000 (109) bytes.
  • A terabyte (TB) is equal to 1,000,000,000,000 (1012) bytes.
  • A petabyte (PB) is equal to 1,000,000,000,000,000 (1015) bytes
  • An exabyte (EB) is equal to 1,000,000,000,000,000,000 (1018) bytes

Most operating systems components and (importantly for us) Oracle use “powers of two” measurement units rather than “power of ten” units. They are defined as:

  • A kibibyte (KiB) is equal to 1,024 (210) bytes.
  • A mebibyte (MiB) is equal to 1,048,576 (220) bytes.
  • A gibibyte (GiB) is equal to 1,073,741,824 (230) bytes.
  • A tebibyte (TiB) is equal to 1,099,511,627,776 (240) bytes.
  • A pebibyte (PiB) is equal to 1,125,899,906,842,624 (250) bytes.
  • An exbibyte (EiB) is equal to 1,152,921,504,606,846,976 (260) bytes.

That means that 1 terabyte is about 9% smaller than a tebibyte. Thanks, SAN manufacturers, for making our lives just a tiny bit more difficult than it needed to be so you can market drives as being seemingly larger than they really are (in my head anyway.)

%d bloggers like this: