• Learning Map
  • Unix Magazine
  • Unix Professional Network
  • Just-Unix-No-Noise FB Group

unixadminschool.com

  • Home
  • Announcements
    • Feed
    • MISC
  • Beginners zone
    • Beginners Lessons
    • Career Guidance
  • Experts Zone
    • Cloud Computing
    • Configuration Solutions
    • Hadoop & Bigdata
    • Migrations
    • Network Design
    • Scripting
    • Server Security
    • SUN CLUSTERS
    • SUN LDOMS
    • Tools & Applications
    • Veritas Cluster Services ( VCS ) Learning
  • Intermediate Zone
    • Linux Learning
      • Linux Booting
      • Linux Disk Management
      • Linux LVM
      • Linux Networking
      • Linux Performance
      • Linux Troubleshooting
      • Linux YUM/RPM
      • Performance Analysis
      • Redhat Linux Kernel
      • RHEL 6
        • RHEL LDAP
        • Rhel6 Storage
      • Web Servers
    • Solaris Admin
      • Blog for Unix Admin
        • Storage Administration – SAN
      • Oracle Hardware
      • Reference Docs
      • Solaris 10 Zones & LDOMs
      • Solaris 11
      • Solaris Access Control
      • Solaris Best Practices
      • Solaris Booting
      • Solaris Disk Management
      • Solaris DNS
      • Solaris How-to
      • Solaris Installation
      • Solaris Kernel
      • Solaris Networking
      • Solaris NFS
      • Solaris NIS
      • Solaris Packages & Patching
      • Solaris Performance
      • Solaris Tips
      • Solaris Troubleshooting
      • Solaris User Authentication
      • solaris X86
      • Solaris ZFS and Boot Environment
      • Storage Configurations
      • SUN Hardware
      • Troubleshooting Flow charts
    • Veritas Admin
      • Veritas Netbackup
      • VxVM Learning
      • VxVM Troubleshooting
  • QUIZ Center
  • Storage
    • EMC Administration
    • Netapp Administration

Subscribe

Investigate a SCSI disk error from Linux

Collect the following information when a platform reports SCSI disk errors:

The following files:
/var/log/messages*

The output from the following commands:
/sbin/fdisk -l
/bin/cat /proc/scsi/scsi
/bin/dmesg



Once you have collected the data, review the output for any similarities in the errors. If the error always reports the same target, then consider replacing the device itself.
If the errors change to different targets on the same bus, then further troubleshooting is required. In the following references, the id is the target number of the device.

Here are examples of the output you will see when the scsi disks have errors from Red Hat Linux.

Notice in this example, we can see that the channel is 0, id is 1 and lun is 0. Each line of the error refers to the same id. We can also see that the disk is getting errors on different sectors. In this case, id 1 is target 1. This would indicate that target 1 is getting the errors and is the suspect fru.

/var/log/messages

Dec 20 10:33:23 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:23 localhost kernel: I/O error: dev 08:03, sector 0
Dec 20 10:33:23 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:23 localhost kernel: I/O error: dev 08:03, sector 13631520
Dec 20 10:33:23 localhost kernel: EXT3-fs error (device sd(8,3)): ext3_get_inode_loc: unable to read inode block – inode=852064, block=1703940
Dec 20 10:33:23 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:23 localhost kernel: I/O error: dev 08:03, sector 0
Dec 20 10:33:23 localhost kernel: EXT3-fs error (device sd(8,3)) in ext3_reserve_inode_write: IO failure
Dec 20 10:33:24 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:24 localhost kernel: I/O error: dev 08:03, sector 0
Dec 20 10:33:24 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:24 localhost kernel: I/O error: dev 08:03, sector 13631520
Dec 20 10:33:24 localhost kernel: EXT3-fs error (device sd(8,3)): ext3_get_inode_loc: unable to read inode block – inode=852064, block=1703940
Dec 20 10:33:24 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:24 localhost kernel: I/O error: dev 08:03, sector 0
Dec 20 10:33:24 localhost kernel: EXT3-fs error (device sd(8,3)) in ext3_reserve_inode_write: IO failure
Dec 20 10:33:24 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 27010000
Dec 20 10:33:24 localhost kernel: I/O error: dev 08:03, sector 0

Notice in the following example We see that the errors also report the same channel 0, id 1, and lun 0 each time. We can also note scsi bus timeouts along with scsi check condition messages for the same device. In this case, target 1 is the suspect fru.

Another example of /var/log/messages

Sep 21 23:35:41 localhost kernel: klogd 1.4.1, log source = /proc/kmsg started.
Sep 21 23:35:41 localhost kernel: Inspecting /boot/System.map-2.4.18-17.7.x.4smp
Sep 21 23:35:41 localhost kernel: Loaded 17857 symbols from /boot/System.map-2.4.18-17.7.x.4smp.
Sep 21 23:35:41 localhost kernel: Symbols match kernel version 2.4.18.
Sep 21 23:35:41 localhost kernel: Loaded 256 symbols from 11 modules.
Sep 21 23:35:41 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0
return code = 27010000
Sep 21 23:35:41 localhost kernel: I/O error: dev 08:17, sector 66453508
Sep 21 23:35:41 localhost kernel: SCSI disk error : host 0 channel 0 id 1 lun 0
return code = 27010000
:
:
Sep 21 23:35:49 localhost kernel: scsi :”’ aborting command due to timeout : pid
43891492, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 4b ae 5b 00 00 02 00”’
Sep 21 23:35:49 localhost kernel: mptscsih: OldAbort scheduling ABORT SCSI IO
(sc=c2db7200)
Sep 21 23:35:49 localhost kernel: IOs outstanding = 5
Sep 21 23:35:49 localhost kernel: scsi : aborting command due to timeout : pid
43891493, scsi0, channel 0, id 1, lun 0 Write (10) 00 00 43 2e 5d 00 00 02 00
:
:
Sep 21 23:35:49 localhost kernel: mptscsih: ioc0: Issue of TaskMgmt Successful!
Sep 21 23:35:49 localhost kernel: SCSI host 0 abort (pid 43891492) timed out – resetting
Sep 21 23:35:49 localhost kernel: SCSI bus is being reset for host 0 channel 0.
Sep 21 23:35:50 localhost kernel: mptscsih: OldReset scheduling BUS_RESET (sc=c2db7200)
Sep 21 23:35:50 localhost kernel: IOs outstanding = 6
Sep 21 23:35:50 localhost kernel: SCSI host 0 abort (pid 43891493) timed out – resetting
:
:
Sep 21 23:35:51 localhost kernel: SCSI host 0 reset (pid 43891492) timed out again -
Sep 21 23:35:51 localhost kernel: probably an unrecoverable SCSI bus or device hang.
Sep 21 23:35:51 localhost kernel: SCSI host 0 reset (pid 43891493) timed out again -
Sep 21 23:35:51 localhost kernel: SCSI Error Report =-=-= (0:0:0)
Sep 21 23:35:51 localhost kernel: SCSI_Status=02h (CHECK CONDITION)
Sep 21 23:35:51 localhost kernel: Original_CDB[]: 28 00 02 B1 4E 62 00 00 04 00
Sep 21 23:35:51 localhost kernel: SenseData[12h]: 70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00 00 00
Sep 21 23:35:51 localhost kernel: SenseKey=6h (UNIT ATTENTION); FRU=02h
Sep 21 23:35:51 localhost kernel: ASC/ASCQ=29h/02h “SCSI BUS RESET OCCURRED”
Sep 21 23:35:51 localhost kernel: SCSI Error Report =-=-= (0:1:0)
Sep 21 23:35:51 localhost kernel: SCSI_Status=02h (CHECK CONDITION)
Sep 21 23:35:51 localhost kernel: Original_CDB[]: 2A 00 00 45 EE 5F 00 00 02 00
Sep 21 23:35:51 localhost kernel: SenseData[12h]: 70 00 06 00 00 00 00 0A 00 00 00 00 29 02 02 00 00 00
Sep 21 23:35:51 localhost kernel: SenseKey=6h (UNIT ATTENTION); FRU=02h
Sep 21 23:35:51 localhost kernel: ASC/ASCQ=29h/02h “SCSI BUS RESET OCCURRED”
Sep 21 23:35:51 localhost kernel: md3: no spare disk to reconstruct array! — continuing in degraded mode
Sep 21 23:35:51 localhost kernel: md: updating md2 RAID superblock on device
Sep 21 23:35:52 localhost kernel: md: (skipping faulty sdb5 )
Sep 21 23:35:52 localhost kernel: md: sda5 [events: 00000012]<6>(write) sda5′s sb offset: 4192832
Sep 21 23:35:52 localhost kernel: raid1: sda7: redirecting sector 30736424 to another mirror

In the following example, we see write errors on channel 0, id 0, lun 0 along with a medium error indicating an issue with the device itself. In this example, target 0 is the suspect part.

/var/log/messages

scsi0: ERROR on channel 0, id 0, lun 0, CDB: Write (10) 00 06 1d 3a 0d 00 00 08 00
Info fld=0x61d3a0d, Deferred sd08:02: sense key Medium Error
Additional sense indicates Write error
I/O error: dev 08:02, sector 102498376
SCSI Error: (0:0:0) Status=02h (CHECK CONDITION)
Key=3h (MEDIUM ERROR); FRU=0Ch
ASC/ASCQ=0Ch/02h “”
CDB: 2A 00 07 F9 3A 2D 00 00 08 00

scsi0: ERROR on channel 0, id 0, lun 0, CDB: Write (10) 00 07 f9 3a 2d 00 00 08 00
Info fld=0x8153a0d, Deferred sd08:02: sense key Medium Error
Additional sense indicates Write error – auto reallocation failed I/O error: dev 08:02, sector 133693544

Here is an example of the output you would see from cat /proc/scsi/scsi. This will give you a model number of the drive which could help with determining a part number and disk ID. Output will deviate depending on platform model and configuration:

Attached devices:
cat /proc/scsi/scsi

Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: SEAGATE Model: ST373307LC Rev: 0007
Type: Direct-Access ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: SEAGATE Model: ST373307LC Rev: 0007
Type: Direct-Access

Here’s an example of what you might see with dmesg. In the example shown, we can see that there is an issue with channel 0, id 1 and lun 0. This example needs more data to determine the failure. However, one may suspect that the disk needs to be replaced as it shows an i/o error 2 times on the same disk.

Sector 3228343
scsi0 (1:0): rejecting I/O to offline device
RAID1 conf printout:
— wd:1 rd:2
disk 0, wo:0, o:1, dev:sda1
disk 1, wo:1, o:0, dev:sdb1
RAID1 conf printout:
— wd:1 rd:2
disk 0, wo:0, o:1, dev:sda1
scsi0 (1:0): rejecting I/O to offline device
md: write_disk_sb failed for device sdb2
md: errors occurred during superblock update, repeating scsi0 (1:0): rejecting I/O to offline device

 

You might be interested to read below :


  • Redhat Enterprise Linux – Network Bonding – Quick Reference ( RHEL5 / RHEL6)

  • Hands on Lab – Replacing Failed Disks from ZFS Pools ( RaidZ2 / RaidZ3 ) – Part2

  • Solaris Network Performance Tuning : know about TCP window size

  • Volume manager Migration from LVM to VxVM in Linux – Part 1

  • Solaris Troubleshooting NIS: add a user for NIS maps when passwd file is not in the /etc directory

  • ILOM ( Integrated Lights out Manager) Command line Reference
  • Email
  • More
  • Print
  • Digg
Posted by Ramdev
Comment it
Tagged with: [ disk errors, linux, scsi, SCSI disk errors ]
You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a Comment

Join to our Professional Network (of 1400+ unixadmins ) to receive Unix Administration and Job Updates -

Pages1

Don't Miss Updates

 

Beginners Zone

 

Unixadmin Careers

Server Hardware

Beginners Lessons

Troubleshooting-Flowchart

 

Intermediate Zone

 

Solaris Booting

Solaris Volume Manager

Storage Configurations

Solaris Networking

Solaris X86

Solaris ZFS

Solaris NFS

Solaris NIS

Solaris Patching

Solaris Booting

Solaris Kernel

Veritas Volume Manager

Solaris NIS

Logical Volume Manager

Linux Networking

Linux Disk Management

Linux Troubleshooting

 

Experts Zone 

 

Solutions

Scripting and Automation

Server Security

Veritas Cluster Services

Sun Cluster Services

Cloud Computing

SUN LDOMS

Copyright © 2009 unixadminschool.com. All rights reserved.
loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.