Solaris SVM : Disk replacement for Systems With Internal FCAL Drives Under SVM (V280R, V480, V490, V880, V890):
Beginning with the Solaris 9 Operating System (OS), Solaris Volume Manager (SVM) software uses a feature called Device-ID (or DevID), which identifies each disk not only by its c#t#d# name, but also by a unique ID derived from the disk’s World Wide Name (WWN) or serial number. The SVM software relies on the Solaris OS to supply it with each disk’s correct DevID.
To replace a disk, use the luxadm command to remove it and insert the new disk. This procedure causes an update of the Solaris OS device framework, so that the new disk’s DevID is inserted and the old disk’s DevID is removed.
PROCEDURE FOR REPLACING MIRRORED DISKS
The following set of commands should work in all cases. Follow the exact sequence to ensure a smooth operation.
To replace a disk that is controlled by SVM and is part of a mirror, perform the following steps:
1. Run "metadetach" to detach all the submirrors on the failing disk from their respective mirrors (see the following):
# metadetach -f <mirror> <submirror>
NOTE: The "-f" option is not required if the metadevice is in an "okay" state.
2. Run metaclear to remove the <submirror> configuration from the disk:
# metaclear <submirror>
Verify that there are no metadevices left on the disk by running the following:
# metastat -p | grep c#t#d#
3. If there are any replicas on this disk, note the number of replicas, and remove them using the following:
# metadb -i
(note the number of replicas on the disk)
# metadb -d c#t#d#s#
Verify that there are no existing replicas left on the disk by running the following:
# metadb | grep c#t#d#
4. If there are any open filesystems on this disk not under SVM control, or non-mirrored metadevices, unmount them.
5. Run "format" or "prtvtoc/fmthard" to save the disk partition table information.
# prtvtoc /dev/rdsk/c#t#d#s2 > file
6. Run the 'luxadm' command to remove the failed disk.
# luxadm remove_device -F /dev/rdsk/c#t#d#s2
At the prompt, physically remove the disk and continue. The picld daemon notifies the system that the disk has been removed.
7. Initiate devfsadm cleanup subroutines by entering the following command:
# /usr/sbin/devfsadm -C -c disk
By default, devfsadm attempts to load every driver in the system and attach these drivers to all possible device instances. It then creates device special files in the /devices directory and logical links in /dev.
With the "-c disk" option, devfsadm will only update disk device files. This saves time, and is important on systems that have tape devices attached. Rebuilding these tape devices could cause undesirable results on non-Sun hardware.
The -C option cleans up the /dev directory, and removes any lingering logical links to the device link names.
This should remove all the device paths for this particular disk. This can be verified with:
# ls -ld /dev/dsk/cxtxd*
This should return no devices.
8. It is now safe to physically replace the disk. Insert a new disk, and configure it. Create the necessary entries in the Solaris OS device tree with one of the following commands:
# devfsadm
or
# /usr/sbin/luxadm insert_device <enclosure_name,sx>
where sx is the slot number
or
# /usr/sbin/luxadm insert_device (if enclosure name is not known)
Note: In many cases, luxadm insert_device does not require the enclosure name and slot number. Use the following to find the slot number:
# luxadm display <enclosure_name>
To find the <enclosure_name> use:
# luxadm probe
Run "ls -ld /dev/dsk/c1t1d*" to verify that the new device paths have been created.
CAUTION: After inserting a new disk and running devfsadm (or luxadm), the old ssd instance number changes to a new ssd instance number. This change is expected, so ignore it.
For example, suppose the error occurs on the following disk, whose ssd instance is ssd3:
WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0 (ssd3):
Error for Command: read(10) Error Level: Retryable
Requested Block: 15392944 Error Block: 15392958
After inserting a new disk, the ssd instance changes to ssd10, as shown below. This is expected and is not a cause for concern.
picld[287]: [ID 727222 daemon.error] Device DISK0 inserted
qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
scsi: [ID 799468 kern.info] ssd10 at fp2: name w21000011c63f0c94,0, bus address ef
genunix: [ID 936769 kern.info] ssd10 is /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0
scsi: [ID 365881 kern.info] <SUN72G cyl 14087 alt 2 hd 24 sec 424>
genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0 (ssd10) online
9. Run "format" or "prtvtoc/fmthard" to put the desired partition table on the new disk.
# fmthard -s file /dev/rdsk/c#t#d#s2
['file' is the prtvtoc saved in step 5]
10. Use "metainit" and "metattach" to create and attach those submirrors to the mirrors to start the resync:
# metainit <submirror> 1 1 c#t#d#s#
# metattach <mirror> <submirror>
11. If necessary, re-create the same number of replicas that existed previously, using the -c option of the metadb(1M) command:
# metadb -a -c# c#t#d#s#
12. Be sure to correct the EEPROM entry for the boot-device (only if one of the root disks has been replaced).
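The detach and re-attach commands from steps 1, 2 and 10 can be derived from the output of "metastat -p". The following is only a sketch and not part of the original procedure. The function name plan_submirror_swap is made up for illustration, it assumes every affected submirror is a simple one-slice concat (a "d# 1 1 c#t#d#s#" line), and it only prints the commands so they can be reviewed before anything is run. On Solaris, nawk may be needed in place of awk.

```shell
#!/bin/sh
# plan_submirror_swap DISK   (hypothetical helper, prints commands only)
# Reads `metastat -p` output on stdin and prints the metadetach/metaclear
# and metainit/metattach commands for every one-slice submirror on DISK.
plan_submirror_swap() {
    disk=$1
    awk -v disk="$disk" '
        # Mirror line, e.g. "d0 -m d10 d20 1": remember each submirror parent.
        $2 == "-m" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parent[$i] = $1
            next
        }
        # One-slice concat, e.g. "d20 1 1 c1t1d0s0", located on the given disk.
        $2 == "1" && $3 == "1" && index($4, disk "s") == 1 {
            order[++n] = $1
            slice[$1] = $4
        }
        END {
            if (n == 0) exit
            print "# before removing the disk"
            for (i = 1; i <= n; i++) {
                sm = order[i]
                if (sm in parent) {
                    # -f is only needed when the submirror is not in the Okay state
                    print "metadetach -f " parent[sm] " " sm
                    print "metaclear " sm
                }
            }
            print "# after partitioning the new disk"
            for (i = 1; i <= n; i++) {
                sm = order[i]
                if (sm in parent) {
                    print "metainit " sm " 1 1 " slice[sm]
                    print "metattach " parent[sm] " " sm
                }
            }
        }
    '
}
```

A typical call would be "metastat -p | plan_submirror_swap c1t1d0", run before step 1 so that the submirror definitions are still present. Submirrors built from more than one slice are not handled by this sketch and need to be recorded by hand.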
PROCEDURE FOR REPLACING A DISK IN A RAID-5 VOLUME
Note: If a disk is used in BOTH a mirror and a RAID5, do not use the following procedure; instead, follow the instructions for the MIRRORED devices (above). This is because the RAID5 array, once healed, is treated as a single disk for mirroring purposes.
To replace an SVM-controlled disk that is part of a RAID5 metadevice, follow these steps:
1. If there are any open filesystems on this disk not under SVM control, or non-mirrored metadevices, unmount them.
2. If there are any replicas on this disk, remove them using:
# metadb -d c#t#d#s#
Verify there are no existing replicas left on the disk by running:
# metadb | grep c#t#d#
3. Run "format" or "prtvtoc/fmthard" to save the disk partition table information.
# prtvtoc /dev/rdsk/c#t#d#s2 > file
4. Run the 'luxadm' command to remove the failed disk.
# luxadm remove_device -F /dev/rdsk/c#t#d#s2
At the prompt, physically remove the disk and continue. The picld daemon notifies the system that the disk has been removed.
5. Initiate devfsadm cleanup subroutines by entering the following command:
# /usr/sbin/devfsadm -C -c disk
By default, devfsadm attempts to load every driver in the system and attach these drivers to all possible device instances. It then creates device special files in the /devices directory and logical links in /dev.
With the "-c disk" option, devfsadm will only update disk device files. This saves time and is important on systems that have tape devices attached. Rebuilding these tape devices could cause undesirable results on non-Sun hardware.
The -C option cleans up the /dev directory, and removes any lingering logical links to the device link names.
This should remove all the device paths for this particular disk. This can be verified with:
# ls -ld /dev/dsk/cxtxd*
This should return no devices.
6. It is now safe to physically replace the disk. Insert a new disk, and configure it. Create the necessary entries in the Solaris OS device tree, with one of the following commands:
# devfsadm
or
# /usr/sbin/luxadm insert_device <enclosure_name,sx>
where sx is the slot number
or
# /usr/sbin/luxadm insert_device (if enclosure name is not known)
Note: In many cases, luxadm insert_device does not require the enclosure name and slot number. Use the following to find the slot number:
# luxadm display <enclosure_name>
To find the <enclosure_name> you can use:
# luxadm probe
Run "ls -ld /dev/dsk/c1t1d*" to verify that the new device paths have been created.
CAUTION: After inserting a new disk and running devfsadm (or luxadm), the old ssd instance number changes to a new ssd instance number. This change is expected, so ignore it.
For example, suppose the error occurs on the following disk (ssd3):
WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cfa19920,0 (ssd3):
Error for Command: read(10) Error Level: Retryable
Requested Block: 15392944 Error Block: 15392958
After inserting a new disk, the ssd instance changes to ssd10, as shown below. This is expected and is not a cause for concern.
picld[287]: [ID 727222 daemon.error] Device DISK0 inserted
qlc: [ID 686697 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
scsi: [ID 799468 kern.info] ssd10 at fp2: name w21000011c63f0c94,0, bus address ef
genunix: [ID 936769 kern.info] ssd10 is /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0
scsi: [ID 365881 kern.info] <SUN72G cyl 14087 alt 2 hd 24 sec 424>
genunix: [ID 408114 kern.info] /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000011c63f0c94,0 (ssd10) online
7. Run 'format' or 'prtvtoc/fmthard' to put the desired partition table on the new disk:
# fmthard -s file /dev/rdsk/c#t#d#s2
['file' is the prtvtoc saved in step 3]
8. Run 'metadevadm' on the disk to update SVM with the new disk’s DevID:
# metadevadm -u c#t#d#
Note: Due to BugID 4808079, a disk can show up as “unavailable” in the metastat output after running Step 8. To resolve this, run “metastat -i”.
After running this command, the device should show a metastat status of “Okay”. The fix for this bug has been delivered and integrated in s9u4_08, s9u5_02 and s10_35.
9. If necessary, recreate any replicas on the new disk:
# metadb -a c#t#d#s#
10. Run metareplace to enable and resync the new disk:
# metareplace -e <raid5-md> c#t#d#s#
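Both procedures remove the state database replicas from the failing disk and re-create them on the new one, so the replica count per slice has to be recorded first. The following is a small sketch of that bookkeeping, not part of the original procedure. The function name replica_readd_cmds is made up for illustration, and it assumes that each replica line of the "metadb" listing ends with the slice path (for example /dev/dsk/c1t1d0s7). It only prints the "metadb -a -c" commands for later use. On Solaris, nawk may be needed in place of awk.

```shell
#!/bin/sh
# replica_readd_cmds DISK   (hypothetical helper, prints commands only)
# Reads `metadb` output on stdin, counts the replicas on each slice of DISK,
# and prints one "metadb -a -c N slice" command per slice.
replica_readd_cmds() {
    disk=$1
    awk -v disk="$disk" '
        {
            # The last field of a replica line is the slice path.
            dev = $NF
            sub(/^\/dev\/dsk\//, "", dev)
            if (index(dev, disk "s") == 1) {
                if (!(dev in count)) order[++n] = dev
                count[dev]++
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "metadb -a -c %d %s\n", count[order[i]], order[i]
        }
    '
}
```

A typical call would be "metadb | replica_readd_cmds c1t1d0 > /var/tmp/replicas.txt", run before the replicas are deleted with "metadb -d". The file name here is only an example.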







7 Comments on “Solaris SVM : Disk replacement for Systems With Internal FCAL Drives Under SVM (V280R, V480, V490, V880, V890):”
I could not think you are more right..
@Yoga, thanks but reality cant be changed….
Thanks Yogesh/Ram,
I have the concern below. Could you please advise me?
Hi All,
Can anyone help me extend a SAN filesystem on a critical Solaris box that is under SVM? Details:
1) No PowerPath is installed on it.
2) I don’t see C3 controller to which storage is connected
3) OS is solaris 9 and its sun fire 480R
4) I see persistent binding in sd.conf
5) storage team allocated LUN with ID#A06 (which is 2566 in decimal and I believe if I add this in sd.conf file it wont detect as its greater than 255)
Basic Info..
Here is the FS which needs to extend by 20gb
#df -h
/dev/md/dsk/d1 46G 37G 8.4G 82% /install2
###Disks in d1 ###
# metastat d1
c3t0d32
c3t0d33
c3t0d75
###No powerpath ##
# powermt display dev=all
powermt: not found
#
### inq output ###
#/emc_migration/bin_SUN/inq
/dev/rdsk/c3t0d32s2 :EMC :SYMMETRIX :5874 :77D20008 :4419840
/dev/rdsk/c3t0d33s2 :EMC :SYMMETRIX :5874 :77D21008 :4419840
—–
### format output #####
# echo | format
Searching for disks…done
0. c1t0d0
/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000005ag9c4862,0
1. c1t1d0
/pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000005ag9c48ac,0
2. c3t0d28
### sd.conf output##
I SEE PERSISTENT BINDING.
output from sd.conf
name="sd" parent="lpfc" target=0 lun=32 hba="lpfc0";
name="sd" parent="lpfc" target=0 lun=32 hba="lpfc1";
name="sd" parent="lpfc" target=0 lun=32 hba="lpfc2";
#### luxadm probe
Found Fibre Channel device(s):
Node WWN:20000005ag9c48ac Device Type:Disk device
Logical Path:/dev/rdsk/c1t1d0s2
Node WWN:20000005ag9c4862 Device Type:Disk device
Logical Path:/dev/rdsk/c1t0d0s2
##### cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
#### luxadm -e port
Found path to 1 HBA ports
/devices/pci@9,600000/SUNW,qlc@2/fp@0,0:devctl CONNECTED
#
Thanks,
Raj
Raju, I have replied to you at http://gurkulindia.com/unixforum/viewtopic.php?f=3&t=37#p67
Thanks. I found this very helpful replacing a drive in a V880 running solaris 8 with a fairly complicated raid 0+1 setup that had been grown onto some 3310 LUNS. SAVE A COPY of /etc/lvm/md.cf before you start if you are not sure how to re-create your submirror. I couldn’t replace the failed disk slice in my submirror, and had to metaclear the submirror, rebuild it with metainit, and reattach it.
Another hurdle you may hit – luxadm remove_device didn’t want to play nice. I eventually got it to go by physically removing and reinserting the failed disk after I broke the mirror.
@mark , thanks for sharing the information.