The one thing you must understand for better System Administration – SCSI
The most popular and widely used term in any system administrator's life is SCSI (Small Computer System Interface). The name says "Small", but believe me, this is the one that makes you big as a SysAdmin. Let us explore it in detail today.
SCSI (Small Computer System Interface) is an interface used to connect a host system to internal or external storage devices. In the SCSI protocol (don't worry about the word "protocol"; I use it whenever the discussion is about the rules of communication) there are four components: the initiator, the target, the bus and the terminator.
We can think of the initiator and the target as two people talking over a telephone line, where the line is the bus. The initiator starts the conversation by placing the address (SCSI ID) of the target device on the bus. If the target is able to accept the request, it marks the bus as busy and controls the communication until the initiator's request completes or fails.
We will discuss the terminator at the end of this section, because it requires some understanding of SCSI IDs.
Whenever a problem occurs with either the initiator or the target during a SCSI transaction (read, write, seek, etc.), the target reports a CHECK CONDITION status over the bus, and the initiator then retrieves the sense data, which includes a sense key describing the error. The operating system logs such failures to the syslog (on Solaris, the /var/adm/messages file), and SCSI errors that appear in the system log also include the physical device path of the device that is reporting the problem.
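A minimal sketch of that selection step, assuming classic parallel SCSI: the "address" the initiator places on the bus is one data line (one bit) per SCSI ID rather than a binary number, which is why an 8-bit narrow bus holds 8 IDs and a 16-bit wide bus holds 16. During selection the initiator normally asserts both its own ID bit and the target's. The function name selection_mask is illustrative only.

```python
def selection_mask(initiator_id: int, target_id: int, bus_width: int = 8) -> int:
    """Data-bus value during SELECTION: one bit per SCSI ID (simplified model)."""
    for scsi_id in (initiator_id, target_id):
        if not 0 <= scsi_id < bus_width:
            raise ValueError(f"ID {scsi_id} does not fit on a {bus_width}-bit bus")
    if initiator_id == target_id:
        raise ValueError("initiator and target must have different IDs")
    # Each device owns exactly one data line, so the two IDs are OR-ed together as bits.
    return (1 << initiator_id) | (1 << target_id)


# Host adapter at ID 7 selecting the disk at target ID 0 on a narrow bus:
print(f"{selection_mask(7, 0):08b}")   # 10000001
```

Because each ID is a separate line, no two devices on one bus may share an ID: their bits would be indistinguishable.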
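After a CHECK CONDITION, the details of the error travel as sense data. The sketch below decodes the sense key and the additional sense code (ASC/ASCQ) from the two standard sense-data layouts, fixed format (response codes 0x70/0x71) and descriptor format (0x72/0x73), using the byte offsets from the SCSI Primary Commands (SPC) standard. It is an illustration only, not how the Solaris sd driver is implemented; decode_sense is a hypothetical helper.

```python
# Sense key names (4-bit field) from the SPC standard; 0xC (obsolete) and 0xF are not listed.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}


def decode_sense(data: bytes) -> tuple[str, int, int]:
    """Return (sense key name, ASC, ASCQ) from raw sense data."""
    if not data:
        raise ValueError("empty sense data")
    code = data[0] & 0x7F                  # response code lives in bits 6..0
    if code in (0x70, 0x71):               # fixed format
        if len(data) < 14:
            raise ValueError("fixed-format sense data is too short")
        key, asc, ascq = data[2] & 0x0F, data[12], data[13]
    elif code in (0x72, 0x73):             # descriptor format
        if len(data) < 4:
            raise ValueError("descriptor-format sense data is too short")
        key, asc, ascq = data[1] & 0x0F, data[2], data[3]
    else:
        raise ValueError(f"unsupported response code 0x{code:02X}")
    return SENSE_KEYS.get(key, f"OTHER (0x{key:X})"), asc, ascq


# Fixed-format example: MEDIUM ERROR with ASC 0x11 (unrecovered read error).
sense = bytes([0x70, 0, 0x03] + [0] * 4 + [10] + [0] * 4 + [0x11, 0x00] + [0] * 4)
print(decode_sense(sense))   # ('MEDIUM ERROR', 17, 0)
```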
How to Read the Syslog (/var/adm/messages) to Assess a SCSI Problem
Whenever a system administrator opens the system log to understand a SCSI problem, there are four objectives to keep in mind:
- Identify the life cycle of the problem: is it a one-time or a recurring issue?
- Identify the error level: is the problem just a warning or a fatal error?
- Identify the physical device: what is the SCSI device path of the device that reported the error?
- Read the actual error message: what is the actual problem?
Look at the SCSI error message below, taken from /var/adm/messages, and identify the information that is required to troubleshoot the issue.
Feb 19 10:30:22 gurkulindia1 unix: ID[SUNWssa.soc.link.6010] soc9: port 0: Fibre Channel is ONLINE
1. The error message appeared only once in the log, so it is a one-time event, and after the message was logged the error situation remained stable.
2. It is a FATAL error, because it took the device completely offline and made it unavailable. Messages with a status such as "Retryable", or other warnings, are normally not fatal by themselves, but a high frequency of such warnings should also be treated as a FATAL error.
3. The device that failed here is "soc9: port 0".
4. The actual problem is that the device went OFFLINE and became unavailable.
Understanding Device Paths in Solaris (the same rules apply to identifying SCSI device paths)
Solaris uses two types of device paths to access disks and tape drives for reading and writing data: physical device paths and logical device paths.
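The four objectives can be partly automated. The sketch below groups log lines by device and message, counts repeats to separate one-time from recurrent events, and guesses an error level from keywords. It is only a sketch: the line layout is inferred from the single sample message above, and the keyword rules and RECURRENCE_LIMIT are hypothetical choices, not Solaris conventions.

```python
import re
from collections import Counter

# Assumed line layout, modelled on the single sample message above:
#   "Mon DD HH:MM:SS host source: [ID[msgid] ]device: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<source>[^:\s]+):\s+(?:ID\[(?P<msgid>[^\]]+)\]\s+)?(?P<rest>.*)$"
)

# Hypothetical threshold: warnings repeated more often than this are escalated.
RECURRENCE_LIMIT = 10


def error_level(message: str) -> str:
    """Objective 2: rough keyword heuristic for the error level."""
    text = message.lower()
    if "fatal" in text or "offline" in text:
        return "FATAL"
    if "retryable" in text:
        return "RETRYABLE"
    if "warning" in text:
        return "WARNING"
    return "INFO"


def parse(line: str):
    """Objectives 3 and 4: split a log line into (device, message), or None."""
    m = LINE_RE.match(line.strip())
    if m is None or ": " not in m.group("rest"):
        return None
    # Device = text before the last ": " (breaks if the message itself contains ": ").
    device, message = m.group("rest").rsplit(": ", 1)
    return device, message


def assess(lines):
    """Map each (device, message) pair to (life cycle, error level)."""
    counts = Counter(p for p in map(parse, lines) if p is not None)
    report = {}
    for (device, message), n in counts.items():
        level = error_level(message)
        if level in ("WARNING", "RETRYABLE") and n > RECURRENCE_LIMIT:
            level = "FATAL"  # frequent warnings are treated as fatal
        life_cycle = "one-time" if n == 1 else f"recurrent ({n}x)"  # objective 1
        report[(device, message)] = (life_cycle, level)
    return report


sample = ("Feb 19 10:30:22 gurkulindia1 unix: ID[SUNWssa.soc.link.6010] "
          "soc9: port 0: Fibre Channel is ONLINE")
print(assess([sample]))
# {('soc9: port 0', 'Fibre Channel is ONLINE'): ('one-time', 'INFO')}
```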
1. Physical Device Paths
When a SPARC based system is initially powered on, the Power On Self Test (POST) runs. POST probes the devices and buses required to access the boot device, then saves this information for the operating system's automatic reconfiguration (boot -r). The OpenBoot PROM (OBP) builds the rest of the device paths for the other hardware attached to the system. Once POST has completed, the Solaris kernel boots and initializes device drivers based on the physical device paths that have been constructed.
The kernel talks directly to hardware devices using the long physical device paths constructed by POST and the OBP. The kernel accesses these physical device paths through a “shortcut” known as an instance number. An instance number is simply a pointer that references the long device path for each device. The mapping of instance numbers to the long physical device path is stored in the /etc/path_to_inst file.
The following is an example of a disk entry in the /etc/path_to_inst file:
"/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0" 0 "sd"
In the above entry, /sbus@1f,0/SUNW,fas@e,8800000/sd@0,0 is the physical device path. The 0 after the physical device path is the instance number for the device sd@0,0. The "sd" at the end of the line names the driver bound to this instance, which defines the type of device; sd is the SCSI disk driver, so this is a SCSI disk.
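The instance name that appears in many kernel messages, such as sd0, is simply the driver name followed by the instance number. A small sketch that maps instance names back to physical paths, assuming every non-comment line has the three quoted, space-separated fields shown above; parse_path_to_inst is a hypothetical helper, and on a live system you would feed it the contents of /etc/path_to_inst.

```python
import shlex


def parse_path_to_inst(text: str) -> dict:
    """Map instance names such as 'sd0' to their physical device paths."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                                   # skip blank and comment lines
        path, instance, driver = shlex.split(line)     # shlex strips the double quotes
        mapping[f"{driver}{int(instance)}"] = path
    return mapping


entry = '"/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0" 0 "sd"'
print(parse_path_to_inst(entry))
# {'sd0': '/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0'}
```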
2. Logical Device Paths
Solaris also creates a second type of device path, known as the logical device path. Logical device paths are created for devices that may require system administrator intervention for configuration. For disk drives there are two logical paths for each partition. The first points to the raw (character) device, sometimes described as the "unformatted" partition; it gives unbuffered access to the partition and is what utilities such as format, newfs and fsck normally use. The second points to the block device, sometimes described as the "formatted" partition; it is accessed through the kernel's buffering and is the path from which file systems are mounted.
All native Solaris logical device paths for raw disk partitions are stored in the /dev/rdsk directory. Below is an example of a logical device path for a raw disk partition:
c0t0d0s0 -> ../../devices/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0:a,raw
Notice that the logical device above is actually a link to the physical device path of the device. When the kernel builds its logical device paths, it uses the /devices directory as the root for all the devices.
The logical name for this disk is c0t0d0s0: the disk was found on the first controller (c0), at target ID 0 (t0), its logical unit number is 0 (d0), and the slice is 0 (s0).
A disk partition and a slice are synonymous. The entry above from the /dev/rdsk directory points to the physical device entry for the first raw disk partition, which is referenced by the letter a.
For native Solaris file system layouts, the logical paths to the block devices (the "formatted" partitions) are stored in the /dev/dsk directory. Below is an example of the logical path to a block disk partition:
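The cNtNdNsN naming can be decoded mechanically. A minimal sketch, assuming the plain four-part form discussed here; DiskName and parse_disk_name are hypothetical names, and IDE-style names (cNdNsN) or Fibre Channel names whose target is a WWN are deliberately not handled.

```python
import re
from typing import NamedTuple


class DiskName(NamedTuple):
    controller: int   # cN
    target: int       # tN, the SCSI target ID
    lun: int          # dN, the logical unit number
    slice: int        # sN


# Only the plain cNtNdNsN form is handled by this sketch.
NAME_RE = re.compile(r"^c(\d+)t(\d+)d(\d+)s(\d+)$")


def parse_disk_name(name: str) -> DiskName:
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a cNtNdNsN disk name: {name!r}")
    return DiskName(*(int(g) for g in m.groups()))


print(parse_disk_name("c0t0d0s0"))
# DiskName(controller=0, target=0, lun=0, slice=0)
```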
c0t0d0s0 -> ../../devices/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0:a
Notice that the only differences between the raw and block paths are the ",raw" suffix at the end of the physical device path and the directory in which each link is stored.
The "sd@0,0:a" entry references the first partition on disk drive sd@0,0; the letter after the colon indicates which slice of that device is referenced. The letters map to slice numbers in order, matching the sN part of the logical name: a is slice 0 (s0), b is slice 1, c is slice 2, d is slice 3, e is slice 4, f is slice 5, g is slice 6 and h is slice 7.
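Putting the slice letter and the ",raw" suffix together, the target of a /dev/dsk or /dev/rdsk link can be derived from the physical path. A sketch assuming the 8-slice SPARC VTOC layout (slices 0 to 7, letters a to h); x86 Solaris disks can carry more slices, which this does not cover. minor_name and devices_path are hypothetical helpers.

```python
def minor_name(slice_no: int, raw: bool = False) -> str:
    """Map slice 0..7 to its minor-node suffix: 'a'..'h', plus ',raw' for rdsk."""
    if not 0 <= slice_no <= 7:
        raise ValueError("slice number must be 0-7")
    letter = "abcdefgh"[slice_no]
    return f"{letter},raw" if raw else letter


def devices_path(physical_path: str, slice_no: int, raw: bool = False) -> str:
    """Target of the /dev/dsk (raw=False) or /dev/rdsk (raw=True) link."""
    return f"/devices{physical_path}:{minor_name(slice_no, raw)}"


phys = "/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0"
print(devices_path(phys, 0, raw=True))   # /devices/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0:a,raw
print(devices_path(phys, 0))             # /devices/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0:a
```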
There are three SCSI standards: SCSI-1, SCSI-2 and SCSI-3.
The table below describes the differences among the three standards:
| | SCSI-1 | SCSI-2 | SCSI-3 |
|---|---|---|---|
| Devices per cable | 8 | 8 | 16 |
| Cable | 50-pin (Centronics) | 50-pin (narrow) | 68-pin (wide) |
Note: SCSI is further classified into standard (narrow), Wide and Ultra SCSI, depending on bus width and data transfer rate.
Each controller on the SCSI bus (including the host adapter) has an address referred to as its "controller" ID or "target" ID. Up to eight controllers may be present on a SCSI-1 bus with IDs from 0 to 7; up to 16 controllers may be present on a 16-bit Wide SCSI-2, Ultra-SCSI (SCSI-III), or Ultra2SCSI bus with IDs from 0 to 15. The host adapter itself is usually assigned ID 7.
Controllers may be placed in any order on the bus, but they must have a unique controller ID. The controller ID is usually set on the SCSI peripheral device using jumpers, DIP switch, or thumb-wheel. Refer to the adapter documentation for specific instructions.
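The usual choice of ID 7 for the host adapter follows from bus arbitration: in parallel SCSI, ID 7 has the highest priority, followed by 6 down to 0 and then, on a wide bus, 15 down to 8. The sketch below checks that the IDs on one bus are in range and unique, then lists them in arbitration-priority order; check_ids and arbitration_order are hypothetical helper names.

```python
def arbitration_order(wide: bool) -> list[int]:
    """SCSI IDs from highest to lowest arbitration priority."""
    order = list(range(7, -1, -1))           # 7 (highest) down to 0
    if wide:
        order += list(range(15, 7, -1))      # then 15 down to 8 (lowest)
    return order


def check_ids(ids: list[int], wide: bool) -> list[int]:
    """Validate the IDs on one bus and return them in arbitration-priority order."""
    order = arbitration_order(wide)
    bad = [i for i in ids if i not in order]
    if bad:
        raise ValueError(f"IDs out of range for this bus: {bad}")
    dupes = sorted({i for i in ids if ids.count(i) > 1})
    if dupes:
        raise ValueError(f"duplicate SCSI IDs: {dupes}")
    return sorted(ids, key=order.index)


# Host adapter at 7 plus three disks on a wide bus:
print(check_ids([0, 7, 12, 3], wide=True))   # [7, 3, 0, 12]
```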
A peripheral device connected to a controller is identified by its “logical unit number” (LUN). Bridge controllers can support up to eight devices with unique LUNs 0 to 7. However, most SCSI devices have a single embedded controller with a LUN of 0. The SCO OpenServer SCSI device drivers have only been tested on SCSI devices with embedded controllers.
You can do an experiment (either physically or mentally) to illustrate why termination is required on a SCSI bus. Hold one end of a piece of rope or heavy string about six feet long and have someone else hold the other end. Stretch the string so it is reasonably taut but not tight, then snap one end down sharply. You will form a wave that travels down the string. When it reaches the far end, it will "reflect" off the end and travel back toward you, and then reflect again. It will go back and forth along the string, decreasing in amplitude each time until it eventually dies out.
Electrical signals travel across wires in much the same way as physical waves travel across a string. When they reach the end of the wire, they will reflect and travel back across the wire. The problem is that if this is allowed to happen, the reflected signals will interfere with the “real” data on the bus and cause signal loss and data corruption. To ensure that this does not happen, each end of the SCSI bus is terminated. Special components are used that make the bus appear electrically as if it is infinite in length. Any signals sent along the bus appear to go to all devices and then disappear, with no reflections.
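Why a terminator makes the bus "look infinite" can be put in numbers with the standard transmission-line reflection coefficient Γ = (Z_L - Z_0) / (Z_L + Z_0), where Z_0 is the cable's characteristic impedance and Z_L is whatever sits at the end of the cable. An infinitely long cable presents exactly Z_0, so a terminator matching Z_0 reflects nothing, while an open end reflects the whole signal. This is general physics rather than anything SCSI-specific, and the Z0 value below is purely illustrative.

```python
import math


def reflection_coefficient(z_load: float, z0: float) -> float:
    """Fraction of the signal voltage reflected where a line of impedance z0 meets z_load."""
    if math.isinf(z_load):          # open, unterminated end of the cable
        return 1.0
    return (z_load - z0) / (z_load + z0)


Z0 = 100.0   # illustrative cable impedance in ohms, not a value from any SCSI spec
print(reflection_coefficient(math.inf, Z0))   # 1.0: unterminated, full reflection
print(reflection_coefficient(Z0, Z0))         # 0.0: matched terminator, no reflection
print(reflection_coefficient(50.0, Z0))       # about -0.333: poor match, partial reflection
```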
There are several different kinds of termination used on SCSI buses. They differ in the electrical circuitry that is used to terminate the bus. Better forms of termination make for more reliable SCSI chains; the better the termination, the fewer problems (all else being equal) with the bus, though cost is generally higher as well. In general terms, slower buses are less particular about the kind of termination used, while faster ones have more demanding requirements. In addition, buses using differential signaling (either HVD or LVD) require special termination.
Here are the different types of SCSI termination:
- Passive Termination: This is the oldest, simplest and least reliable type of termination. It uses simple resistors to terminate the bus, similar to the way terminators are used on coaxial Ethernet networks. Passive termination is fine for short, low-speed single-ended SCSI-1 buses but is not suitable for any modern SCSI speeds; it is rarely used today.
- Active Termination: Adding voltage regulators to the resistors used in passive termination allows for more reliable and consistent termination of the bus. Active termination is the minimum required for any of the faster-speed single-ended SCSI buses.
- Forced Perfect Termination (FPT): This is a more advanced form of active termination, where diode clamps are added to the circuitry to force the termination to the correct voltage. This virtually eliminates any signal reflections or other problems and provides for the best form of termination of a single-ended SCSI bus.
- High Voltage Differential (HVD): Buses using high voltage differential signaling require the use of special HVD terminators.
- Low Voltage Differential (LVD): Newer buses using low voltage differential signaling also require their own special type of terminators. In addition, there are special LVD/SE terminators designed for use with multimode LVD devices that can function in either LVD or SE modes; when the bus is running single-ended these behave like active terminators.
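The rules in the list above reduce to a small lookup. This sketch returns the least demanding terminator type the text allows for one bus; minimum_terminator and its parameter names are hypothetical, and a better terminator (for example FPT instead of active on a single-ended bus) is always acceptable.

```python
def minimum_terminator(signaling: str, fast: bool = True) -> str:
    """Least demanding terminator type allowed by the rules above.

    signaling: "SE" (single-ended), "HVD" or "LVD".
    fast: False only for a short, low-speed single-ended SCSI-1 bus.
    """
    if signaling == "SE":
        return "active" if fast else "passive"
    if signaling == "HVD":
        return "HVD"
    if signaling == "LVD":
        return "LVD (or LVD/SE for multimode devices)"
    raise ValueError(f"unknown signaling type: {signaling!r}")


print(minimum_terminator("SE", fast=False))   # passive
print(minimum_terminator("SE"))               # active
```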
Failure to terminate a bus correctly can result in various error conditions, such as signal reflections, parity errors and strange FATAL error messages. A SCSI bus is terminated at both of its physical ends: the host adapter usually terminates its own end, and a terminator is attached after the last device on the far end of the cable.
That means the terminator goes on the far side of the last device as seen from the HBA: beyond device 0 in Example 1, or beyond device 15 in Example 2.
Good Examples of SCSI Termination:
Example 1 : terminator ID 0…6 hba (ID 7) <– Good Addressing
Example 2 : hba (ID 7) ID 8…15 terminator <– Good Addressing
Bad Example of SCSI Termination:
ID 0 ID 2 terminator ID 4 ID 6 hba (ID 7) <– Bad Addressing: you will be unable to see device ID 0 or ID 2, because the bus is terminated before the signal can reach those devices.
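The good and bad layouts can be modelled by walking the cable away from the host adapter: devices met before a terminator are reachable, and anything past it is cut off. A hypothetical sketch; check_bus, HBA and TERM are illustrative names, and it assumes the HBA provides termination for its own end whenever it sits at one end of the cable.

```python
HBA, TERM = "hba", "terminator"


def check_bus(layout: list) -> tuple[list[int], list[int], list[str]]:
    """Walk the cable from the HBA outwards and report what it can reach.

    layout lists the items in physical cable order; devices are their integer IDs.
    """
    h = layout.index(HBA)
    reachable, cut_off, problems = [], [], []
    for side in (layout[h + 1:], layout[:h][::-1]):   # both directions, nearest item first
        seen_terminator = False
        for item in side:
            if item == TERM:
                seen_terminator = True
            elif seen_terminator:
                cut_off.append(item)       # the signal never gets past a terminator
            else:
                reachable.append(item)
        if side and not seen_terminator:
            problems.append("a cable end is not terminated")
        elif side and side[-1] != TERM:
            problems.append("terminator is not at the end of the cable")
    return sorted(reachable), sorted(cut_off), problems


# The bad example above: ID 0, ID 2, terminator, ID 4, ID 6, hba (ID 7)
print(check_bus([0, 2, TERM, 4, 6, HBA]))
# ([4, 6], [0, 2], ['terminator is not at the end of the cable'])
```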
How to Set SCSI Initiator IDs in Solaris
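Using the OpenBoot PROM (OBP) on SPARC. Note that setenv changes the system-wide scsi-initiator-id default used by all SCSI controllers:
ok show-devs (choose the relevant device)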
ok cd /pci@1f,4000/SUNW,isptwo@3
ok setenv scsi-initiator-id 6
ok printenv scsi-initiator-id
scsi-initiator-id = 6
Using eeprom on x86:
# eeprom "scsi-initiator-id=7"
Using a driver.conf file:
For the specific bus's driver.conf file (e.g. glm.conf for a glm interface), include the single line
This setting becomes effective only once the kernel attaches the driver (for example, at the next boot).