HPWorld 98 & ERP 98 Proceedings

"My Disk Just Crashed! How to Prepare for that Day"

Scott Heddinger

Hewlett Packard Co., 20 Perimeter Summit Blvd, Atlanta, GA 30319
Phone: 404-648-3940  Fax: 404-648-160  E-mail: seh1@atl.hp.com


My disk drive has failed; now what? A good question, and one that we get in the Hewlett Packard Response Center from time to time. Unix systems, whether from Hewlett Packard, IBM, Digital, Sun, or anyone else, all have disk drives, and since disk drives are mechanical pieces of hardware there is always a chance that one of them will fail. How we handle that event depends a great deal on how much information we have about the disk, and the system it is attached to, before the failure occurs. This paper will help you gather that information before the disk crashes, leaving you much better prepared if the situation ever arises.

There are many types of disks, disk drive configurations, and filesystem choices in use; this paper concentrates on disks managed with the Logical Volume Manager.

The Logical Volume Manager (LVM) and its commands will be the main topic covered. At 9.x, LVM was not available on the 700 workstations; it became available for the workstations at 10.x. The 800s were able to use LVM for disk management beginning with 9.x. Several other Unix commands will also be covered in this paper, some of which you will recognize and others you may not. ioscan, bdf, lssf, swapinfo and others will be used to gather information about the system, and when put together with LVM commands such as vgdisplay, lvdisplay, pvdisplay and lvlnboot they will give us all the information we need to be prepared for a disk crash. Other LVM commands, such as vgcfgrestore, vgcfgbackup, lvsync and vgsync, will be covered as well.

There are a couple of important files that we will be looking at as well: /etc/checklist for 9.x systems, or /etc/fstab for 10.x systems. The files /etc/lvmtab and /etc/lvmconf/<volume group name>.conf are two LVM files that will get our attention also.

We will start with some basic information and then move on to the volume groups from there.

First, let's find out the version of the operating system and what type of system it is; the version will help us later on to determine the device files for the disk drives.

Let's start with the basics. The lvlnboot -v command shows the boot, root, swap, and dump definitions for the root volume group:

Boot definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:

/dev/dsk/c0t9d0 (8/4.9.0) – Boot Disk
/dev/dsk/c0t5d0 (8/4.5.0)

Boot: lvol1 on: /dev/dsk/c0t9d0
Root: lvol3 on: /dev/dsk/c0t9d0
Swap: lvol2 on: /dev/dsk/c0t9d0
Dump: lvol2 on: /dev/dsk/c0t9d0, 0

Let's move on to putting together the names of the volume groups and the device files for the disks in each volume group. This information is maintained on the system in a file called /etc/lvmtab. lvmtab is a data file and requires the strings command to read the data that it contains. Running strings on the lvmtab file gives us what we need to start gathering information about the volume groups and the disks that are in them.

strings /etc/lvmtab

/dev/vg00
/dev/dsk/c0t9d0
/dev/dsk/c0t5d0
/dev/vg01
/dev/dsk/c0t10d0
/dev/vg03
/dev/dsk/c0t8d0
/dev/vg02
/dev/dsk/c0t11d0

Looking at the above output, we see that there are four volume groups on the system and five disks currently being used by LVM. vg00 has two disks in it, while vg01, vg02 and vg03 each have one. Once we know the names of the volume groups on the system we can gather information about them: the number of disks in each volume group, whether any logical volumes in the volume group are mirrored, and whether all the disks in the volume group are active. Whether all the disks in a volume group are active can be the key to determining which volume group has a bad disk in it.
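Since /etc/lvmtab exists only on HP-UX, the sketch below feeds the sample strings output shown above through a small awk script that regroups it into one line per volume group. On a live system you would pipe strings /etc/lvmtab straight into the awk instead of using the here-string; the output format (vg: disks) is my own choice, not anything produced by LVM itself.

```shell
# Sample output of `strings /etc/lvmtab`, taken from the listing above.
strings_output='/dev/vg00
/dev/dsk/c0t9d0
/dev/dsk/c0t5d0
/dev/vg01
/dev/dsk/c0t10d0
/dev/vg03
/dev/dsk/c0t8d0
/dev/vg02
/dev/dsk/c0t11d0'

# Group each run of /dev/dsk entries under the /dev/vg line above it.
result=$(printf '%s\n' "$strings_output" | awk '
  /^\/dev\/vg/  { if (vg != "") print vg ":" disks; vg = $0; disks = "" }
  /^\/dev\/dsk/ { disks = disks " " $0 }
  END           { if (vg != "") print vg ":" disks }')
printf '%s\n' "$result"
```

This prints one line per volume group, so you can see at a glance which disks belong where, e.g. `/dev/vg00: /dev/dsk/c0t9d0 /dev/dsk/c0t5d0`.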

We will start with the volume group itself, gather information about it, and then move on to the logical volumes within it. The vgdisplay command is used to gather information about each volume group: the number of drives in the volume group, the physical extent size, how much disk space is in the volume group and how much is free. For vgdisplay and all the subsequent LVM commands, the -v (verbose) switch will give more information. Most times the extra information is helpful, but there are times when it can be repetitive and unnecessary, since we will be getting the same information using different commands. Which method is best to use is entirely up to you.

vgdisplay -v /dev/vg00

--- Volume groups ---
VG Name                     /dev/vg00
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      10
Open LV                     10
Max PV                      16
Cur PV                      2
Act PV                      2
Max PE per PV               2000
VGDA                        4
PE Size (Mbytes)            4
Total PE                    1015
Alloc PE                    816
Free PE                     199
Total PVG                   0

 

--- Logical volumes ---
LV Name                     /dev/vg00/lvol1
LV Status                   available/syncd
LV Size (Mbytes)            96
Current LE                  24
Allocated PE                24
Used PV                     1
*
*
*
LV Name                     /dev/vg00/lvol10
LV Status                   available/syncd
LV Size (Mbytes)            20
Current LE                  5
Allocated PE                5
Used PV                     1


 

--- Physical volumes ---
PV Name                     /dev/dsk/c0t9d0
PV Status                   available
Total PE                    508
Free PE                     0

PV Name                     /dev/dsk/c0t5d0
PV Status                   available
Total PE                    508
Free PE                     199

The output from the vgdisplay command has given us the number of drives in the volume group, information about the logical volumes in it (I have trimmed the output down to save space), and how much space is left on each of the disks. As you can see, the disk /dev/dsk/c0t9d0 has 508 total PEs (physical extents); multiplying the PE size of 4 MB by 508, we can determine that this is a 2 GB disk, and since Free PE is 0 there is no disk space left to use. The second disk in the volume group is also a 2 GB drive and has 199 free PEs, or 4 * 199 = 796 MB free to use. Another important piece of information is contained in the lines Cur PV and Act PV. PV stands for physical volume, or disk, and in this volume group there are 2 disks; Cur stands for current and Act stands for active. In this volume group both disks are fine, since current and active are equal. In the event of a failed disk within a volume group, the active number would be one less than current. vgdisplay gives us a good bit of very useful information about the disks in the volume group (current disks, active disks, and some information about the logical volumes within it), but to get all the information we need about the logical volumes we will use the lvdisplay command.
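The arithmetic and the Cur PV / Act PV check described above are easy to script. This sketch hard-codes the numbers from the sample vgdisplay output; on a live system you would parse them out of vgdisplay -v instead.

```shell
# Turn the PE counts reported by vgdisplay into megabytes, and apply
# the Cur PV / Act PV health check from the text.  Values below are
# the ones shown in the sample output above.
pe_size=4       # PE Size (Mbytes)
total_pe=508    # Total PE for /dev/dsk/c0t9d0
free_pe=199     # Free PE  for /dev/dsk/c0t5d0

echo "disk size : $(( total_pe * pe_size )) MB"    # 2032 MB, i.e. a 2 GB disk
echo "free space: $(( free_pe  * pe_size )) MB"    # 796 MB left to use

cur_pv=2        # Cur PV from vgdisplay
act_pv=2        # Act PV from vgdisplay
if [ "$act_pv" -lt "$cur_pv" ]; then
    echo "WARNING: $(( cur_pv - act_pv )) disk(s) missing from this volume group"
else
    echo "all physical volumes active"
fi
```

When a disk has failed, Act PV drops below Cur PV and the warning branch fires; that is the quickest way to spot which volume group has the bad disk.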

lvdisplay is used to get information about the status of a logical volume: the name of the volume group it belongs to, the number of mirrored copies, its status (is it available, and is it current, or synced), its size in megabytes, whether striping is being used, the size of the physical extents (PE), and what its allocation policy is. Using the -v option shows which disks the logical volume actually resides on. Is the logical volume on only one disk, or does it spread across multiple disks? This is information we will need if the failed disk has this logical volume on it.

lvdisplay -v /dev/vg00/lvol3

--- Logical volumes ---
LV Name                     /dev/vg00/lvol3
VG Name                     /dev/vg00
LV Permission               read/write
LV Status                   available/syncd
Mirror copies               0
Consistency Recovery        MWC
Schedule                    parallel
LV Size (Mbytes)            800
Current LE                  200
Allocated PE                200
Stripes                     0
Stripe Size (Kbytes)        0
Bad block                   off
Allocation                  strict/contiguous
IO Timeout (Seconds)        default
 

--- Distribution of logical volume ---
PV Name             LE on PV  PE on PV
/dev/dsk/c0t9d0     200       200
 

--- Logical extents ---
LE     PV1                PE1    Status 1
0000   /dev/dsk/c0t9d0    0152   current
0001   /dev/dsk/c0t9d0    0153   current
0002   /dev/dsk/c0t9d0    0154   current
0003   /dev/dsk/c0t9d0    0155   current
*
*
*
0198   /dev/dsk/c0t9d0    0350   current
0199   /dev/dsk/c0t9d0    0351   current

 

Now that we know which disks are in each volume group and have detailed information about each logical volume, let's gather information about each individual disk using the pvdisplay command. Some of the information shown by pvdisplay also appears in the vgdisplay and lvdisplay output, but the information from pvdisplay is linked directly to the disk. In the case of a logical volume that spans multiple disks, we have the information from lvdisplay -v, but we have to look through the entire listing to determine which disk or disks the logical volume resides on. If the logical volume is ten GB in size (and remember, at HP-UX version 10.20 a logical volume can be 128 GB in size), it is possible to miss some information in the lvdisplay output. By running pvdisplay on each disk we minimize the amount of information we need to concern ourselves with. If a drive has failed, the pvdisplay output will show us which logical volume or volumes are affected.

pvdisplay -v /dev/dsk/c0t9d0

--- Physical volumes ---
PV Name                     /dev/dsk/c0t9d0
VG Name                     /dev/vg00
PV Status                   available
Allocatable                 yes
VGDA                        2
Cur LV                      6
PE Size (Mbytes)            4
Total PE                    507
Free PE                     0
Allocated PE                507
Stale PE                    0
IO Timeout (Seconds)        default
 

--- Distribution of physical volume ---
LV Name             LE of LV  PE for LV
/dev/vg00/lvol1     24        24
/dev/vg00/lvol2     128       128
/dev/vg00/lvol3     200       200
/dev/vg00/lvol4     125       125
/dev/vg00/lvol5     8         8
/dev/vg00/lvol6     22        22
--- Physical extents ---
PE     Status    LV                 LE
0000   current   /dev/vg00/lvol1    0000
0001   current   /dev/vg00/lvol1    0001
*
*
*
0505   current   /dev/vg00/lvol6    0020
0506   current   /dev/vg00/lvol6    0021

Knowing which logical volume is on which disk will go a long way toward making recovery from a failed disk easier. There is, however, a file that is absolutely essential to making the recovery as easy as possible: the file that contains the LVM structure for each disk in the volume group. It is located in /etc/lvmconf and takes the form <volume group name>.conf. For example, if the name of the volume group is vg03 then the configuration file is /etc/lvmconf/vg03.conf. This file is created each time a change is made to the volume group: a disk is added or removed, a logical volume is extended or reduced. At 9.x, if the changes to the volume group were done via SAM this file was automatically updated or created, because SAM called vgcfgbackup. If the changes were done from the command line at 9.x, vgcfgbackup was NOT run automatically, which meant the file was not created or updated. This caused problems in the event of a disk crash: the configuration file did not exist, and therefore we could not easily fix the problem. This is an important thing to keep in mind if you are doing your LVM work from the command line at 9.x; when you have finished your work on the volume group, run vgcfgbackup /dev/<volume group name>. A word of caution is appropriate here as well: be careful about running vgcfgbackup as a cron job, as you may end up with an lvmconf file that is not current or valid. At 10.x, whether you modify the volume group from the command line or from SAM, vgcfgbackup is run automatically. We can check for the existence of these configuration files by using the ll command.
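A simple loop can verify that every volume group has its configuration backup. This is a sketch: the volume group names come from the strings /etc/lvmtab output earlier, and since /etc/lvmconf exists only on HP-UX, the demonstration fakes the directory in a scratch location (with vg02 and vg03 deliberately missing their backups). On a real system you would point the loop at /etc/lvmconf itself.

```shell
# Simulated /etc/lvmconf with only two of the four backups present.
scratch=$(mktemp -d)
mkdir -p "$scratch/lvmconf"
touch "$scratch/lvmconf/vg00.conf" "$scratch/lvmconf/vg01.conf"

# Check each volume group (names from strings /etc/lvmtab) for a backup.
missing=""
for vg in vg00 vg01 vg02 vg03; do
    if [ ! -f "$scratch/lvmconf/${vg}.conf" ]; then
        missing="$missing $vg"
    fi
done

if [ -n "$missing" ]; then
    echo "run vgcfgbackup for:$missing"
else
    echo "all volume groups have configuration backups"
fi
rm -rf "$scratch"
```

With the two missing backups above, the script reports that vgcfgbackup still needs to be run for vg02 and vg03.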

ll /etc/lvmconf/*

-rw------- 1 root 158720 Apr 4 13:40 /etc/lvmconf/vg00.conf
-rw------- 1 root 83968 Apr 2 12:32 /etc/lvmconf/vg01.conf

We now have the information about the volume groups, the logical volumes and the disks on the system. Next we need to find out where the disks are located: are they internal or external, what card are they connected to, and so on. This is one of the few times you can use Unix commands to get a real picture of where things are on the system. To accomplish this we use the ioscan command. ioscan can be used two ways. The first, ioscan -fn, shows everything on the system: muxes, LAN cards, memory, CPUs, SCSI cards, etc. It is a lot of output, but it tells us not only about the disks but also about the controller cards they are attached to and where they are located. A shorter listing of just the disk drives can be obtained with ioscan -fnC disk, which shows only the disks on the system and nothing else.

ioscan -fn

Class       I  H/W Path    Driver      S/W State H/W Type  Description
===================================================
bc          0              root        CLAIMED   BUS_NEXUS 
bc          1  8           ccio        CLAIMED   BUS_NEXUS I/O Adapter
bc          2  8/0         bc          CLAIMED   BUS_NEXUS Bus Converter
tty         0  8/0/0       mux2        CLAIMED   INTERFACE MUX
                          /dev/diag/mux0     /dev/mux0        
                          /dev/diag/tty0p0   /dev/tty0p0      
                          /dev/diag/tty0p1   /dev/tty0p1      
                                               *
                          /dev/diag/tty0p9   /dev/tty0p9      
ext_bus     0  8/4         c720        CLAIMED   INTERFACE GSC add-on Fast/Wide SCSI Interface
target      0  8/4.3       tgt         CLAIMED   DEVICE    
tape        0  8/4.3.0     stape       CLAIMED   DEVICE    Quantum DLT4000
                          /dev/rmt/0m                 /dev/rmt/c0t3d0BESTn      
                          /dev/rmt/0mb                /dev/rmt/c0t3d0BESTnb     
                          /dev/rmt/0mn                /dev/rmt/c0t3d0DDS1C      
                          /dev/rmt/0mnb               /dev/rmt/c0t3d0DLT62500_64
                          /dev/rmt/c0t3d0BEST         /dev/rmt/c0t3d0DLT81633_64
                          /dev/rmt/c0t3d0BESTb        /dev/rmt/mrc              
target      1  8/4.4       tgt         CLAIMED   DEVICE    
tape        1  8/4.4.0     stape       CLAIMED   DEVICE    Quantum DLT4000
                          /dev/rmt/1m            /dev/rmt/c0t4d0BEST  
                          /dev/rmt/1mb           /dev/rmt/c0t4d0BESTb 
                          /dev/rmt/1mn           /dev/rmt/c0t4d0BESTn 
                          /dev/rmt/1mnb          /dev/rmt/c0t4d0BESTnb
target      2  8/4.5       tgt         CLAIMED   DEVICE    
disk        0  8/4.5.0     sdisk       CLAIMED   DEVICE    SEAGATE ST32550W
                          /dev/dsk/c0t5d0   /dev/rdsk/c0t5d0
target      3  8/4.6       tgt         CLAIMED   DEVICE    
autoch      0  8/4.6.0     schgr       CLAIMED   DEVICE    HP      C1194F
                          /dev/rac/c0t6d0  /dev/rac/tara1 
target      4  8/4.7       tgt         CLAIMED   DEVICE    
ctl         0  8/4.7.0     sctl        CLAIMED   DEVICE    Initiator
                          /dev/rscsi/c0t7d0
target      5  8/4.8       tgt         CLAIMED   DEVICE    
disk        1  8/4.8.0     sdisk       CLAIMED   DEVICE    SEAGATE ST32171W
                          /dev/dsk/c0t8d0   /dev/rdsk/c0t8d0
target      6  8/4.9       tgt         CLAIMED   DEVICE    
disk        2  8/4.9.0     sdisk       CLAIMED   DEVICE    SEAGATE ST32550W
                          /dev/dsk/c0t9d0   /dev/rdsk/c0t9d0

Here is the output from ioscan -fnC disk, showing only the disks on the system:

ioscan -fnC disk

target      5  8/4.8       tgt         CLAIMED   DEVICE    
disk        1  8/4.8.0     sdisk       CLAIMED   DEVICE    SEAGATE ST32171W
                          /dev/dsk/c0t8d0   /dev/rdsk/c0t8d0
target      6  8/4.9       tgt         CLAIMED   DEVICE    
disk        2  8/4.9.0     sdisk       CLAIMED   DEVICE    SEAGATE ST32550W
                          /dev/dsk/c0t9d0   /dev/rdsk/c0t9d0
target      7  8/4.10      tgt         CLAIMED   DEVICE    
disk        3  8/4.10.0    sdisk       CLAIMED   DEVICE    SEAGATE ST32171W
                          /dev/dsk/c0t10d0   /dev/rdsk/c0t10d0
target      8  8/4.11      tgt         CLAIMED   DEVICE    
disk        4  8/4.11.0    sdisk       CLAIMED   DEVICE    SEAGATE ST34572WC
                          /dev/dsk/c0t11d0   /dev/rdsk/c0t11d0
ba          0  8/16        bus_adapter CLAIMED   BUS_NEXUS Core I/O Adapter
ext_bus     2  8/16/0      CentIf      CLAIMED   INTERFACE Built-in Parallel Interface
                          /dev/c2t0d0_lp
pc          0  8/16/1      fdc         CLAIMED   INTERFACE Built-in Floppy Drive
floppy      0  8/16/1.1    pflop       CLAIMED   DEVICE    HP_PC_FDC_FLOPPY
                          /dev/floppy/c0t1d0   /dev/rfloppy/c0t1d0
target     10  8/16/5.2    tgt         CLAIMED   DEVICE    
disk        5  8/16/5.2.0  sdisk       CLAIMED   DEVICE    TOSHIBA CD-ROM XM-5401TA

With the information about the volume groups, the logical volumes, the physical disks themselves and where everything attaches now in our possession, it is time to gather more specific information about the disks, the filesystems and swap. The first command we will use is lssf, or list special files.

lssf, list special files, is useful for determining the SCSI addresses of the disks on the system. Here is where the version of the operating system comes into play. If the system is a 9.x 800 server, the device files for the disks will be of the form /dev/dsk/cNdNsN, where s2 is the entire disk. For 10.x systems, both 700s and 800s, the device files are of the form /dev/dsk/cNtNdN. This is information we can use to gather all the SCSI addresses at one time.

The command lssf /dev/dsk/c0t10d0 gives us the SCSI address of a single disk:

sdisk card instance 0 SCSI target 10 SCSI LUN 0 section 0 at address 8/4.10.0 /dev/dsk/c0t10d0

If we use a wildcard we can get the addresses of all the disks on the system at once.
lssf /dev/dsk/c*d* gives us the following output:

sdisk card instance 0 SCSI target 10 SCSI LUN 0 section 0 at address 8/4.10.0 /dev/dsk/c0t10d0
*
sdisk card instance 1 SCSI target 2 SCSI LUN 0 section 0 at address 8/16/5.2.0 /dev/dsk/c1t2d0

We know where they are, but what type of disks are they, who is the vendor, and how big are they? To answer these questions we use the command diskinfo -v on the raw device file:

diskinfo -v /dev/rdsk/c0t8d0

SCSI describe of /dev/rdsk/c0t8d0:
vendor: SEAGATE
product id: ST32171W
type: direct access
size: 2082636 Kbytes
bytes per sector: 512
rev level: HPC1
blocks per disk: 4165272
ISO version: 0
ECMA version: 0
ANSI version: 2
removable media: no
response format: 2
(Additional inquiry bytes: (32)45 (33)32 (34)36 (35)34……..

To get all the disks at once on a 10.x system, the command would be:

diskinfo -v /dev/rdsk/c*d*

 

What are the mount points for each of the logical volumes on the system, and what percentage of each logical volume is being used? To find out, we look at the output of bdf and the contents of /etc/fstab for 10.x systems, or /etc/checklist for 9.x systems.

bdf

Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3     800811  296554  424175   41% /
/dev/vg00/lvol1      95701   17471   68659   20% /stand
/dev/vg00/lvol7     654048  388645  199998   66% /var
/dev/vg00/lvol6     504547  252168  201924   56% /usr
/dev/vg01/lvol1     698133  598297   30022   95% /users/oraomni
/dev/vg00/lvol5      30597   12213   15324   44% /tmp
/dev/vg02/ked        16384    1109   14324    7% /testke
/dev/vg00/lvol4     498645  416376   32404   93% /opt
/dev/vg00/lvol8      39829     598   35248    2% /home
/dev/vg02/lvol7      99669      10   89692    0% /George
/dev/vg02/lvol1    2097152 1503558  556554   73% /depots/ignite_archives

cat /etc/fstab

 
/dev/vg00/lvol3 / hfs defaults 0 1
/dev/vg00/lvol1 /stand hfs defaults 0 1
/dev/vg00/lvol4 /opt hfs defaults 0 2
/dev/vg00/lvol5 /tmp hfs defaults 0 2
/dev/vg00/lvol6 /usr hfs defaults 0 2
/dev/vg00/lvol7 /var hfs defaults 0 2
/dev/vg00/lvol8 /home hfs defaults 0 2
/dev/vg01/lvol1 /users/oraomni hfs defaults 0 2
/dev/vg02/lvol1 /depots/ignite_archives vxfs rw,suid,delaylog 0 2
/dev/vg02/ked /testke vxfs rw,suid,delaylog,datainlog 0 2
/dev/vg02/lvol7 /George hfs defaults 0 2
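The bdf and /etc/fstab listings can be cross-checked against each other: after a disk crash, filesystems configured in fstab stop appearing in bdf. This sketch uses trimmed copies of the listings above as sample data, with the vg02 disk imagined to have failed so its filesystems no longer show up in bdf; on a live system you would pipe the real commands in instead of the heredocs.

```shell
# Device column from /etc/fstab (sample lines from the listing above).
fstab_devs=$(awk '$1 ~ /^\/dev\// { print $1 }' <<'EOF'
/dev/vg00/lvol3 / hfs defaults 0 1
/dev/vg00/lvol1 /stand hfs defaults 0 1
/dev/vg02/lvol1 /depots/ignite_archives vxfs rw,suid,delaylog 0 2
/dev/vg02/lvol7 /George hfs defaults 0 2
EOF
)

# Device column from bdf, skipping the header; the vg02 filesystems are
# gone, simulating a crashed vg02 disk.
bdf_devs=$(awk 'NR > 1 { print $1 }' <<'EOF'
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3     800811  296554  424175   41% /
/dev/vg00/lvol1      95701   17471   68659   20% /stand
EOF
)

# Anything in fstab that bdf does not show is a casualty.
unmounted=""
for dev in $fstab_devs; do
    printf '%s\n' "$bdf_devs" | grep -qx "$dev" || unmounted="$unmounted $dev"
done
echo "in fstab but not mounted:$unmounted"
```

Here the check flags /dev/vg02/lvol1 and /dev/vg02/lvol7, exactly the logical volumes a vg02 disk failure would take down.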

We will also need to find out which logical volumes are being used for swap. The swapinfo command with the -tam switch will give us that information:

swapinfo -tam

             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev         512       0     512    0%       0       -    1  /dev/vg00/lvol2
dev           4       0       4    0%       0       -    1  /dev/vg01/lvol5
reserve       -     143    -143
memory      483      64     419   13%
total       999     207     792   21%       -       0    -

Now that we know where each disk is located, we can face a disk crash with more confidence. We know from the information we have gathered what type of disk is located at which SCSI address. We know which volume group a disk belongs to. We can tell which logical volumes reside on which disk, the mount point of each logical volume, its filesystem type and its size. Armed with this information we are more than prepared to replace a failed disk drive. There is one little thing I forgot to mention: you did remember to do a backup before the disk crashed, didn't you?

 

I had originally intended to include the steps for replacing a drive in each of the four possible scenarios: a non-root disk without mirroring, a non-root disk with mirroring, a root disk without mirroring, and a root disk with mirroring. But as I started writing and realized how much information was going to be in the paper, and how much more in depth I could have gone, I decided not to include the steps. I apologize for the 'bait and switch', but the documents will be available at the lecture and can also be obtained by calling the Hewlett Packard Response Center.

The steps above are just suggestions on how to get information about your system; remember, it's Unix, and there is always more than one way to do something. Take the commands, experiment with them, and determine which option or switch gives you the amount of information that is right for your system. No matter how you decide to gather the data, gather it! The information is the key: with it you can follow the replacement steps without a problem. Having it on hand will make recovering from a crashed disk drive far less difficult and stressful than it needs to be, and it will give you a great deal of information about your system that can be useful in other ways as well.
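The commands covered in this paper can be wrapped into one script that captures everything to a file to be printed out and stored with the backup tapes. This is only a sketch: the output path is my assumption (on a real server something like /var/adm would be a better home than /tmp), and each command is guarded so that a missing command, on an older release or a non-HP-UX test box, is simply noted rather than breaking the run.

```shell
#!/bin/sh
# Sketch: gather the crash-preparation information from this paper into
# one file.  Output location is an assumption; pick one that survives.
OUT=/tmp/crash_prep.$(date +%Y%m%d)
: > "$OUT"

# Run a command and append its output; note it if the command is absent.
run() {
    echo "==== $* ====" >> "$OUT"
    if command -v "$1" > /dev/null 2>&1; then
        "$@" >> "$OUT" 2>&1 || true    # keep going even if a command fails
    else
        echo "($1 not available on this system)" >> "$OUT"
    fi
}

run uname -a
run lvlnboot -v
run strings /etc/lvmtab
run vgdisplay -v
run ioscan -fnC disk
run bdf
run swapinfo -tam
run cat /etc/fstab        # /etc/checklist on 9.x systems
echo "inventory written to $OUT"
```

Run it after every volume group change, or by hand before maintenance windows, and keep the printout somewhere that does not depend on the disk you are trying to recover.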

©Copyright 1998 Interex. All rights reserved.