Which Hot Spare will be used for a failed drive? (EMC Clariion / VNX)


How does an EMC Clariion or VNX decide which Hot Spare will be used for any failed drive?

First of all, not the entire failed drive is rebuilt, only the LUNs that reside on it. Furthermore, all LUNs on the failed drive are rebuilt to the same Hot Spare, so a single failed drive is replaced by a single Hot Spare. If, for example, a 600GB drive fails with only 100GB worth of LUNs on it, in theory a 146GB drive could be invoked to rebuild the data. The location of the last LUN block on the failed drive determines how large the Hot Spare needs to be. If on a 600GB drive the last block of the last LUN sits at the 350GB mark, while the total disk space used by all LUNs on that drive is only 100GB, the 146GB and 300GB Hot Spares are not valid choices, since the last block address (350GB) lies beyond the 300GB mark. Valid Hot Spares would therefore be 400GB or larger.
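The selection rule above can be sketched in a few lines of Python. This is a minimal illustration of the logic as described, not the array's actual firmware algorithm; the function name and the GB-based capacities are my own assumptions.

```python
# Sketch of the Hot Spare selection rule described above (hypothetical helper).
# Assumption: a spare is valid only if its capacity covers the address of the
# last LUN block on the failed drive, regardless of how much space the LUNs use.

def pick_hot_spare(last_lun_block_gb, spare_sizes_gb):
    """Return the smallest spare whose capacity reaches the last LUN block
    address, or None if no spare is large enough."""
    valid = [s for s in sorted(spare_sizes_gb) if s >= last_lun_block_gb]
    return valid[0] if valid else None

# Example from the text: last LUN block at the 350GB mark on a 600GB drive,
# with 146, 300, 400 and 600GB Hot Spares available.
print(pick_hot_spare(350, [146, 300, 400, 600]))  # → 400
```

Note that only 100GB of data would actually be rebuilt to the 400GB spare; the capacity check is driven purely by the last block address.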


Facilitate the conversation: say what you mean and don’t make assumptions

We all work with words every day. Words can cause confusion if used incorrectly, but they can also make a conversation run smoothly... if used correctly.

I’d like to name a few of these possible confusions from my daily experience in the IT storage business.


  • Network versus fileserver

How many of you store your data "on the network"? The network connects clients to servers (or to other clients). It consists of network devices such as switches, routers, bridges and firewalls, plus the cables that connect all these devices together. I store my data on a file server; the network helps me get it there.


Setting the queue depth or Execution throttle parameters – Clariion / VNX

Each FC storage array port has a maximum queue depth of 2048. For performance reasons we’ll do the math with 1600. Suppose a large number of HBAs (initiators) are generating IOs; a specific port queue can then fill up to its maximum. The host’s HBA will notice this by receiving queue full (QFULL) messages and seeing very poor response times. How this is dealt with depends on the operating system. Older OSs could lose access to their drives, or even freeze or blue screen. Modern OSs throttle IOs down to a minimum to get rid of the condition. VMware ESX, for example, decreases its LUN queue depth all the way down to 1. Once the queue full messages disappear, ESX gradually increases the queue depth until it is back at the configured value; this can take up to around a minute.
During QFULL events the hosts may experience timeouts, even if the overall performance of the CLARiiON is fine. The response to a QFULL is HBA dependent, but it typically results in a suspension of activity of more than one second. Though rare, this can have serious consequences for throughput if it happens repeatedly.
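The sizing math implied above can be sketched as follows. This is a common rule-of-thumb calculation (divide the practical port queue target across all initiators and their LUNs), not an official EMC formula; the function name and the example counts are my own assumptions.

```python
# Hypothetical sizing helper: spread the practical port queue target (1600,
# per the text, below the hard limit of 2048) across all initiators and the
# LUNs each one addresses on that port.

def max_hba_queue_depth(port_queue_target, initiators, luns_per_initiator):
    """Largest per-LUN queue depth setting that keeps the array port's
    aggregate queue at or below the target."""
    return port_queue_target // (initiators * luns_per_initiator)

# Example (assumed numbers): 20 hosts, each with 4 LUNs on the port.
print(max_hba_queue_depth(1600, 20, 4))  # → 20
```

If every host ran at the default queue depth of 32 instead, the aggregate (20 × 4 × 32 = 2560) would exceed even the 2048 hard limit, which is exactly the QFULL scenario described above.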


Accelerating your storage array by using SSD technology

It has been around for quite a few years already. It started becoming available to the general public about 12 years ago, most commonly in digital cameras: FLASH storage! At first the devices couldn’t store more than a few MB and prices were high, but over time capacities went up, prices came down, and the first SSD drives (should we say “drives”?) were born. Still expensive, but very usable in the computer industry. Heavily used databases in particular could be accelerated with SSDs, because there is no rotational latency and average access latency is in the sub-millisecond range instead of multiple milliseconds! The main problem in recent years was durability, but by now SSD technology is just as reliable as the old rotating disks.
