Red Hat Linux: Collecting a System Diagnostic Report for a Support Call
Red Hat Enterprise Linux 4.5 and earlier
On a default installation the Sysreport package should already be installed. If it is not, install the package “sysreport-.rpm” with the following command:
# rpm -ivh sysreport-.rpm
or, if your system is registered with Red Hat Network (RHN), simply run:
# up2date -i sysreport
This will install the latest version of Sysreport on your system.
To collect the information needed to start troubleshooting, enter the following command:
# sysreport
Then follow the instructions on screen. When it finishes, it prints the name and location of the compressed archive containing the collected information. Please keep this file for further support.
Note that Sysreport needs some time to collect all the data; how long depends on the speed of the system and the number of installed packages.
If Sysreport seems to hang and does not return after a while, pass the “-norpm” parameter to the command. This skips checking the RPM database, which may be broken:
# sysreport -norpm
Red Hat Enterprise Linux 4.6 and later
The “sosreport” command is a tool that collects information about a Red Hat Enterprise Linux system. To run sosreport, the “sos” package must be installed. The package should be installed by default; if it is not, follow the steps below:
Installation on Red Hat Enterprise Linux 4.6 and later 4.x updates
If the system is registered with Red Hat Network (RHN), “sos” can be installed using the up2date command:
# up2date sos
Installation on Red Hat Enterprise Linux 5 and later
If the system is registered with RHN, use the yum command:
# yum install sos
If the system is not registered with RHN, the “sos” package can be downloaded from the RHN website or found on the installation CDs. The RPM command can be used to install the package on any version of Red Hat Enterprise Linux:
# rpm -Uvh sos-..rpm
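The choice of tool and install command above depends only on the release, so it can be summarized in a small helper. This is a hedged sketch: install_cmd_for_version is a hypothetical name, it assumes an RHN-registered host, and it takes the release as MAJOR.MINOR. On RHEL 4, /etc/redhat-release typically shows the update as “release 4 (Nahant Update 6)”, so such a host would be passed as 4.6.

```shell
#!/bin/bash
# Hypothetical helper (not part of sysreport or sos): print the install
# command this article recommends for a RHEL release given as MAJOR.MINOR,
# e.g. "4.5", "4.6" or "5.3". Assumes the host is registered with RHN.
install_cmd_for_version() {
    local version="$1" major minor
    major=${version%%.*}
    if [ "$version" = "$major" ]; then
        minor=0                      # a plain "5" is treated as 5.0
    else
        minor=${version#*.}
        minor=${minor%%.*}           # ignore anything after a second dot
    fi
    # Reject empty or non-numeric components
    case "$major" in ''|*[!0-9]*) echo "bad version: $version" >&2; return 1 ;; esac
    case "$minor" in ''|*[!0-9]*) echo "bad version: $version" >&2; return 1 ;; esac

    if [ "$major" -ge 5 ]; then
        echo "yum install sos"       # RHEL 5 and later
    elif [ "$major" -eq 4 ] && [ "$minor" -ge 6 ]; then
        echo "up2date sos"           # RHEL 4.6 and later 4.x updates
    else
        echo "up2date -i sysreport"  # RHEL 4.5 and earlier
    fi
}
```

For example, `install_cmd_for_version 4.5` prints `up2date -i sysreport`, while `install_cmd_for_version 5.3` prints `yum install sos`.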
To collect the system information needed to start troubleshooting, enter the following command and follow the instructions:
# sosreport
Sosreport runs for several minutes; depending on the system, it may take considerably longer. When it completes, sosreport generates a compressed bz2 archive under /tmp. Typically the archive is about 3 MB.
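Sosreport prints the archive path when it finishes, but if that output is lost, the most recent archive can be picked out of /tmp afterwards, for example before copying it off the server. A minimal sketch, assuming the archive name matches sosreport-*.tar.bz2 (other sos versions may use a different name or compression suffix); latest_sosreport is a hypothetical helper, not part of sos.

```shell
#!/bin/bash
# Hypothetical helper: print the newest sosreport-*.tar.bz2 in DIR
# (default /tmp), comparing modification times with the -nt test.
latest_sosreport() {
    local dir="${1:-/tmp}" f newest=""
    for f in "$dir"/sosreport-*.tar.bz2; do
        [ -e "$f" ] || continue      # glob did not match anything
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -z "$newest" ]; then
        echo "no sosreport archive found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```

Looping over the glob with `-nt` avoids parsing `ls` output, so file names containing spaces are handled correctly.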
Sosreport uses plugins that can be turned on and off. The following command lists them:
# sosreport -l
If Sosreport seems to hang and does not return after a while, pass the parameter “-k rpm.rpmva=off” to the command. This skips verification of all installed packages:
# sosreport -k rpm.rpmva=off
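Both tools offer a way to skip the RPM check (“-norpm” for Sysreport, “-k rpm.rpmva=off” for Sosreport), so a script that must work on old and new hosts can pick whichever tool is installed and print the matching command line. This is only a sketch: pick_tool and skip_rpm_cmd are hypothetical names, and it assumes both tools are on root's PATH.

```shell
#!/bin/bash
# Hypothetical helper: report which collector is installed,
# preferring sosreport since it replaces sysreport on newer releases.
pick_tool() {
    if command -v sosreport >/dev/null 2>&1; then
        echo sosreport               # RHEL 4.6 and later
    elif command -v sysreport >/dev/null 2>&1; then
        echo sysreport               # RHEL 4.5 and earlier
    else
        echo "neither sosreport nor sysreport is installed" >&2
        return 1
    fi
}

# Hypothetical helper: print the command line that skips the RPM check
# for the given tool, using the options described in this article.
skip_rpm_cmd() {
    case "$1" in
        sysreport) echo "sysreport -norpm" ;;
        sosreport) echo "sosreport -k rpm.rpmva=off" ;;
        *) echo "unsupported tool: $1" >&2; return 1 ;;
    esac
}

# Example (as root): tool=$(pick_tool) && skip_rpm_cmd "$tool"
```

The helper only prints the command, so it can be reviewed before running it on a production machine.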
Even though Sysreport and Sosreport collect most of the data needed for analysis, it is a good idea to also provide the contents of the “/var/log/” directory, so that all relevant data is available (such as older messages files, service-related log files, mcelog output, etc.).
You can archive this data with the following command:
# tar czvf logfiles.tar.gz /var/log
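When logs from several cluster nodes end up in the same support case, a fixed name like logfiles.tar.gz is easy to mix up. A minimal sketch that puts the host name and date into the archive name; archive_logs is a hypothetical helper, and reading everything under /var/log usually requires root.

```shell
#!/bin/bash
# Hypothetical helper: archive SRC (default /var/log) into
# OUTDIR/logfiles-<host>-<YYYYmmdd>.tar.gz (default OUTDIR is /tmp)
# and print the archive path.
archive_logs() {
    local src="${1:-/var/log}" outdir="${2:-/tmp}" out
    out="$outdir/logfiles-$(uname -n)-$(date +%Y%m%d).tar.gz"
    # -C changes into the parent directory first, so the archive holds
    # "log/..." instead of the absolute path "/var/log/..."
    tar czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$out"
}
```

Keep OUTDIR outside SRC; otherwise tar would try to include the archive it is writing.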
A word of caution from my own experience:
According to Red Hat, these tools are safe to run on a production system at any time, but I ran into a problem when I ran sosreport on a production machine with a failed power supply fan.
The actual scenario was:
One fine morning, a Linux server on HP hardware, part of a three-node VCS cluster hosting a critical application, raised a hardware alert. According to the iLO logs, the machine had a power supply fan failure. To raise a Red Hat support call we needed the sosreport output, so we started the command during production hours. This created extra CPU and disk activity on the machine, which in turn raised its temperature (because the cooling fan had already failed). The CPU overheating caused the server to stop responding to external connections, while it still answered ping on the network.
In this VCS setup, if one node crashes, the other nodes should automatically take over its applications and continue to operate. In this case, however, the system was only hung (not responding to external connections) but still answering ping, so VCS could not quickly decide to fail the applications over to the running nodes, and all customer connections failed. To recover, we had to forcefully halt the troubled node and manually fail over all services to a working node.
The whole process took 20 minutes, and afterwards we had to deal with many customer escalations asking “why was the diagnosis run during production hours?” After that, sysadmins were instructed to get the business team's permission before running any diagnostics on a production server during production hours.