How to replace a failed disk in 3par storage?

Replacing a failed hdd in a 3par storage system is usually simple, but we need to ensure that the right procedure and guidelines are followed.

Here we’ll discuss the procedure to replace a failed hdd in 3par storage systems. First of all, we need to find out whether there is a hdd failure.

Service Processor

First of all, you may receive an email notification from the IRS server about a failed hdd if the Service Processor is connected to the 3par. It can also send an alert saying, for example, that pd 30 has failed. As soon as we receive this type of alert, we need to validate that it is a genuine alert about a failed hdd and not a false alert.

You can log in to the Service Processor or the StoreServ Management Console (SSMC) and check the alert status. If the alert has resolved by itself, you may not need to do anything. You may see a message like the one below, for example:

 Magazine 3:7:0, Physical Disk 95 Failed (Replace Drive {0x46}, Vacated {0x45}, Errors on A Port {0x87}, Errors on B Port {0x8b}, Invalid Media {0x98}, Smart Threshold Exceeded {0x9a}, No Valid Ports {0xa1})

Note: Magazine 3:7:0 means – Cage 3, Magazine 7, Hdd 0
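When many alerts arrive by email, it can help to pull the cage, magazine, pd number and reason codes out of the message text automatically. The following is a minimal Python sketch that assumes the alert always has the same layout as the example above; the function name parse_alert and the returned keys are hypothetical and not part of any 3par tool.

```python
import re

# Example alert text copied from the message shown above.
ALERT = (
    "Magazine 3:7:0, Physical Disk 95 Failed (Replace Drive {0x46}, "
    "Vacated {0x45}, Errors on A Port {0x87}, Errors on B Port {0x8b}, "
    "Invalid Media {0x98}, Smart Threshold Exceeded {0x9a}, "
    "No Valid Ports {0xa1})"
)


def parse_alert(text):
    """Split an alert into position, pd id, state and reason codes.

    Assumes the layout 'Magazine C:M:D, Physical Disk N <State> (...)'.
    """
    head = re.search(
        r"Magazine (\d+):(\d+):(\d+), Physical Disk (\d+) (\w+)", text
    )
    if head is None:
        raise ValueError("unrecognised alert format")
    cage, magazine, disk, pd_id = (int(g) for g in head.groups()[:4])
    # Each reason looks like 'Some Text {0xNN}'.
    reasons = {
        name: int(code, 16)
        for name, code in re.findall(
            r"([A-Za-z][A-Za-z ]*?) \{(0x[0-9a-fA-F]+)\}", text
        )
    }
    return {
        "cage": cage,
        "magazine": magazine,
        "disk": disk,
        "pd_id": pd_id,
        "state": head.group(5),
        "reasons": reasons,
    }


info = parse_alert(ALERT)
print(info["cage"], info["magazine"], info["disk"], info["pd_id"])
```

For the example alert, the parsed position matches the note above: cage 3, magazine 7, hdd 0, for pd 95.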

A pd (hdd) can also be reported as degraded, in which case we need to verify why the pd shows as degraded.

Failed and degraded hdds can be checked by running the command below:

>showpd -failed -degraded

95 3:7:0 FC 10 failed 838656 0 1:0:2 0:0:2* 900
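The output row above is easier to read once each field has a name. The sketch below splits the row into named fields in Python. The field names are an assumption based on the usual showpd column order (Id, CagePos, Type, RPM, State, Total MB, Free MB, A port, B port, Cap GB), since the header line is not shown here; PdRow and parse_showpd_row are hypothetical names.

```python
from typing import NamedTuple


class PdRow(NamedTuple):
    # Field names are assumed; the header row is not shown in the article.
    pd_id: int
    cage_pos: str
    dev_type: str
    rpm_k: int
    state: str
    total_mb: int
    free_mb: int
    port_a: str
    port_b: str
    cap_gb: int


def parse_showpd_row(line):
    """Turn one whitespace-separated showpd data row into a PdRow."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    return PdRow(
        int(f[0]), f[1], f[2], int(f[3]), f[4],
        int(f[5]), int(f[6]), f[7], f[8], int(f[9]),
    )


row = parse_showpd_row("95 3:7:0 FC 10 failed 838656 0 1:0:2 0:0:2* 900")
print(row.pd_id, row.cage_pos, row.state)
```

The port values, including the trailing asterisk, are kept as plain text because the sketch does not try to interpret them.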

You can also run the servicemag status command with the cage and magazine numbers, for example > servicemag status 3 23, to find out the servicing status of that magazine.

Servicemag is the process used to service a drive magazine when a hdd fails. If a hdd fails due to bad blocks or bad chunklets, the PD is marked as failed by the Inform OS of the 3par storage.

Once a hdd is marked as failed, the Inform OS informs all the 3par controllers that the pd has failed and is no longer good for data storage, so no chunklets should be drawn from the failed hdd to write any more data.

Servicemag commands:

servicemag start [options] <cage_ID> <magazine>

servicemag start [options] -pdid <PD_ID_0>…<PD_ID_3>

servicemag resume |unmark [options] <cage_ID> <magazine>

servicemag status [options] [<cage_ID> <magazine>]

servicemag clearstatus <cage_ID> <magazine>
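To make the syntax above concrete, the following Python sketch builds servicemag command lines as plain strings. It does not connect to or run anything on a 3par system, and it leaves out [options]. Both function names are hypothetical, and the limit of one to four PD IDs is only my reading of the <PD_ID_0>…<PD_ID_3> notation.

```python
# Subcommands taken from the syntax lines above.
SUBCOMMANDS = ("start", "resume", "unmark", "status", "clearstatus")


def servicemag_by_magazine(subcommand, cage_id, magazine):
    """Build 'servicemag <subcommand> <cage_ID> <magazine>'."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    return f"servicemag {subcommand} {cage_id} {magazine}"


def servicemag_start_by_pdid(*pd_ids):
    """Build 'servicemag start -pdid <PD_ID_0> ...'.

    Assumption: between one and four PD IDs are allowed.
    """
    if not 1 <= len(pd_ids) <= 4:
        raise ValueError("expected between one and four PD IDs")
    return "servicemag start -pdid " + " ".join(str(p) for p in pd_ids)


print(servicemag_by_magazine("status", 3, 23))
print(servicemag_start_by_pdid(95))
```

Building the strings first makes it easy to review the exact command before typing or pasting it into the 3par CLI.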

Subcommands

Start – This command tells the 3par system manager to relocate all the data chunklets from this magazine / hdd to other disk locations so that the drive magazine can be removed to replace a faulty hdd.

Resume – It tells the 3par system manager that a hdd has been replaced and data can start moving to the replaced hdd.

Unmark – This command stops the servicemag process and resets the internal status of the pd. It can be run if servicemag resume is not working on a newly replaced hdd, or if servicemag start is not working while trying to fail a hdd manually or vacate the magazine.

Clearstatus – Clears the log shown by the servicemag status command for the given cage and magazine.
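The subcommand descriptions above imply an order: start before the magazine is pulled, resume after the hdd is swapped, with status checks in between. The Python sketch below lists that order as a checklist. The sequence is inferred from the descriptions rather than taken from an official procedure, and replacement_plan is a hypothetical helper that only prints text.

```python
def replacement_plan(cage_id, magazine):
    """Return ordered (kind, step) pairs for servicing one magazine.

    Assumed order: start, check status, physical swap, resume, check status.
    """
    target = f"{cage_id} {magazine}"
    return [
        ("cli", f"servicemag start {target}"),
        ("cli", f"servicemag status {target}"),
        ("manual", "replace the failed hdd in the magazine"),
        ("cli", f"servicemag resume {target}"),
        ("cli", f"servicemag status {target}"),
    ]


for kind, step in replacement_plan(3, 7):
    print(f"[{kind}] {step}")
```

Unmark and clearstatus are left out of the checklist because, per the descriptions above, they are only needed when start or resume is not working or when the status log should be cleared.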

Once the hdd failure is confirmed, the failed hdd can be replaced. We need to ensure that the replacement hdd is compatible with the 3par Inform OS. If an incompatible part is installed, the hdd will not work and will not be detected by the 3par.
