
unixadminschool.com


Solaris IPMP – Diagnosis and Troubleshooting

Symptoms:

*  mpathd error messages in /var/adm/messages:
“No test address configured on interface <interface_name>; disabling probe-based failure detection on it”
“Test address <address> is not unique; disabling probe based failure detection on <interface_name>”
“The link has gone down on <interface_name>”
“Successfully failed over from NIC <interface_name1> to NIC <interface_name2>”
“NIC repair detected on <interface_name>”
“Successfully failed back to NIC <interface_name>”
“The link has come up on <interface_name>”

*  interfaces configured for IPMP missing an UP and/or RUNNING flag in the ifconfig -a output
*  interfaces configured for IPMP showing as FAILED in ifconfig -a output

Diagnosis and Troubleshooting

Please validate that each troubleshooting step below is true for your environment. Each step provides instructions, or a link to a document, for validating it and taking corrective action as necessary. The steps are ordered to isolate the issue and identify the proper resolution, so please do not skip a step.

STEP 1. Check and validate the IPMP configuration.

For Solaris 10, link-based: Check Configuration

For Solaris 8, 9 and 10: Check Configuration

Ensure the eeprom variable “local-mac-address?” is set to true so that every system interface uses a unique MAC address.
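The unique-MAC check can be scripted. The sketch below assumes the relevant OBP variable is “local-mac-address?” and that eeprom prints it in the usual name=value form; the function name unique_mac_enabled is made up for this example.

```shell
# Reads the single line printed by: eeprom "local-mac-address?"
# Returns 0 when the variable is true, 1 otherwise.
# unique_mac_enabled is a hypothetical helper name.
unique_mac_enabled() {
    read -r setting
    [ "$setting" = "local-mac-address?=true" ]
}

# Typical use on a SPARC host (left commented out so the sketch
# can be loaded on a machine without eeprom):
# eeprom "local-mac-address?" | unique_mac_enabled \
#     || echo "local-mac-address? is not true"
```

If the check fails, the variable can be changed with eeprom "local-mac-address?=true"; expect to need a reboot before the interfaces pick up their own MAC addresses.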

STEP 2. Check the status of the interfaces in the IPMP group.

The “ifconfig -a” output for the interfaces in the IPMP group MUST indicate “UP” *AND* “RUNNING”.
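A quick way to spot problem interfaces is to scan the flags field of each interface line. This is a minimal sketch that reads “ifconfig -a” output on standard input; check_ipmp_flags is a hypothetical name, and the function reports on every interface it sees, so filter the result down to the members of your IPMP group.

```shell
# Reads "ifconfig -a" output on stdin and reports interfaces that are
# missing UP or RUNNING, or that carry the FAILED flag.
check_ipmp_flags() {
    awk '
    /^[A-Za-z].*flags=/ {
        name = $1
        sub(/:$/, "", name)              # "e1000g0:" -> "e1000g0"
        flags = $0
        sub(/^[^<]*</, "", flags)        # drop everything up to "<"
        sub(/>.*$/, "", flags)           # drop ">" and the rest
        flags = "," flags ","            # so each flag is comma-delimited
        if (index(flags, ",UP,") == 0)      print name ": UP missing"
        if (index(flags, ",RUNNING,") == 0) print name ": RUNNING missing"
        if (index(flags, ",FAILED,") > 0)   print name ": FAILED"
    }'
}

# Usage on the affected host:
# ifconfig -a | check_ipmp_flags
```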

If “UP” is missing from the output:

# ifconfig <interface in group> up

If “RUNNING” is missing:

Check the physical link between the interface and the switch port for faulty or disconnected cabling and for a faulty or uninitialized switch port. Eliminate any misconfiguration affecting communication by ensuring that auto-negotiation is enabled on the Sun interface (the default setting) and on the switch side (consult the switch documentation):

(use ndd for older devices, like hme):

# ndd -get /dev/<interface> adv_autoneg_cap

(use kstat for most devices):

# kstat -p | grep e1000g:0 | grep auto

(use dladm for GLDv3 devices like nxge, e1000g, bge):

# dladm show-dev

The proper setting for “adv_autoneg_cap” is 1, meaning that the Sun interface is advertising its autonegotiation capability to the link partner (switch).

If “adv_autoneg_cap” is set to “0”, correct it with ndd for an immediate change.

Note: ce and hme devices require the instance to be selected before any other ndd command. Other devices identify the instance in the /dev/ argument; for example, to retrieve information on the first instance of bge: ndd -get /dev/bge0 adv_autoneg_cap.

Select the instance:

# ndd -set /dev/ce instance (device instance)

Enable the autonegotiation advertisement:

# ndd -set /dev/ce adv_autoneg_cap 1

To check:

# ndd -get /dev/ce adv_autoneg_cap

For example, for ce0:

# ndd -set /dev/ce instance 0
# ndd -get /dev/ce adv_autoneg_cap

1

If the setting shows “1” after running the ndd command, but the link is not restored:

- Ensure the switch port is set to autonegotiate.
- Disconnect and reconnect the cable from the interface to the switch to allow the link partners to renegotiate.
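The instance-handling rule for ndd (ce and hme select the instance first, other drivers take it in the device path) can be captured in a small helper. This is only a sketch: autoneg_query is a hypothetical name, and it prints the commands to run rather than executing them.

```shell
# Prints the ndd command(s) needed to read adv_autoneg_cap for a given
# driver name and instance number.
autoneg_query() {
    drv=$1
    inst=$2
    case "$drv" in
        ce|hme)
            # These drivers need the instance selected first.
            printf 'ndd -set /dev/%s instance %s\n' "$drv" "$inst"
            printf 'ndd -get /dev/%s adv_autoneg_cap\n' "$drv"
            ;;
        *)
            # Other drivers carry the instance in the /dev/ argument.
            printf 'ndd -get /dev/%s%s adv_autoneg_cap\n' "$drv" "$inst"
            ;;
    esac
}

# Usage:
# autoneg_query ce 0
# autoneg_query bge 0
```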

Use OBP “watch-net-all” to test Sun interfaces on SPARC hardware.
If you need further assistance to verify your network or switch connections, please consult your local network administrator.

STEP 3. Determine if the default router is properly answering ICMP probes.

On Solaris 8 or 9, or on Solaris 10 with probe-based failure detection (to determine, there must be an interface configured with “-failover”, which the ifconfig -a output shows as the NOFAILOVER flag):

# pkill -USR1 mpathd

# tail -20 /var/adm/messages

Mar 5 15:06:23 solarishost27 in.mpathd[6338]: [ID 942985 daemon.error] Missed sending total of 0 probes spread over 0 occurrences
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: [ID 373034 daemon.error]
Mar 5 15:06:23 solarishost27 Probe stats on (inet aggr6)
Mar 5 15:06:23 solarishost27 Number of probes sent 419987
Mar 5 15:06:23 solarishost27 Number of probe acks received 419987
Mar 5 15:06:23 solarishost27 Number of probes/acks lost 0  <<----------
Mar 5 15:06:23 solarishost27 Number of valid unacknowledged probes 0
Mar 5 15:06:23 solarishost27 Number of ambiguous probe acks received 0
Mar 5 15:06:23 solarishost27 Probe stats on (inet aggr1)
Mar 5 15:06:23 solarishost27 Number of probes sent 419923
Mar 5 15:06:23 solarishost27 Number of probe acks received 123490
Mar 5 15:06:23 solarishost27 Number of probes/acks lost 296324
Mar 5 15:06:23 solarishost27 Number of valid unacknowledged probes 0
Mar 5 15:06:23 solarishost27 Number of ambiguous probe acks received 0

The pkill command can be repeated for ongoing checks or when troubleshooting link failover/failback situations.
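To compare interfaces at a glance, the probe statistics can be reduced to one line per interface with a loss percentage. The sketch below assumes the message layout shown in the sample log above; probe_loss is a hypothetical name.

```shell
# Reads in.mpathd probe statistics (as written to /var/adm/messages
# after "pkill -USR1 mpathd") on stdin and prints one summary line
# per interface.
probe_loss() {
    awk '
    /Probe stats on/          { ifname = $NF; sub(/\)$/, "", ifname) }
    /Number of probes sent/   { sent = $NF }
    /Number of probes\/acks lost/ {
        lost = $NF
        pct = (sent > 0) ? 100 * lost / sent : 0
        printf "%s sent=%d lost=%d loss=%.1f%%\n", ifname, sent, lost, pct
    }'
}

# Usage:
# tail -20 /var/adm/messages | probe_loss
```

In the sample above, aggr6 shows no loss while aggr1 has lost roughly 70% of its probes, which points at the path between aggr1 and its probe target.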

If the configuration is link-based (i.e. no interface is marked as “-failover” in the “ifconfig -a” output), skip to step #6.
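The probe-based versus link-based decision can be made mechanically. This sketch assumes that a test address configured with “-failover” appears with the NOFAILOVER flag in “ifconfig -a” output; ipmp_detection_mode is a hypothetical name.

```shell
# Reads "ifconfig -a" output on stdin and prints "probe-based" when any
# interface carries the NOFAILOVER flag, otherwise "link-based".
ipmp_detection_mode() {
    if grep NOFAILOVER >/dev/null; then
        echo probe-based
    else
        echo link-based
    fi
}

# Usage:
# ifconfig -a | ipmp_detection_mode
```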

STEP 4. Are systems on the subnet able to respond to all-hosts multicast?

For Solaris, use netstat and check for the interfaces’ membership in 224.0.0.1 OR ALL-SYSTEMS.MCAST.NET:

solarishost# netstat -g | grep ALL-SYSTEMS.MCAST.NET
lo0 ALL-SYSTEMS.MCAST.NET 1
hme0 ALL-SYSTEMS.MCAST.NET 1

solarishost# netstat -gn | grep 224.0.0.1
lo0 224.0.0.1 1
hme0 224.0.0.1 1

If the netstat -gn output shows interfaces that cannot respond to ALL-SYSTEMS multicast, the configuration MUST be set up using “host routes”.
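The membership check can be automated by listing which interfaces of the group do not appear under 224.0.0.1. A minimal sketch, assuming the three-column layout shown above (interface, group, reference count); missing_allhosts is a hypothetical name.

```shell
# Reads "netstat -gn" output on stdin; the arguments are the interfaces
# in the IPMP group. Prints each interface that has NOT joined 224.0.0.1.
missing_allhosts() {
    members=$(awk '$2 == "224.0.0.1" { print $1 }')
    for ifname in "$@"; do
        printf '%s\n' "$members" | grep -x -F "$ifname" >/dev/null \
            || printf '%s\n' "$ifname"
    done
}

# Usage:
# netstat -gn | missing_allhosts hme0 hme1
```

An empty result means every listed interface is a member of the all-hosts group.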

STEP 5. Is Veritas “Multi-NIC” in use along with IPMP?

To determine:

# ps -ef|grep -i multi
# grep -i LLT /var/adm/messages
# grep -i GAB /var/adm/messages

Identify and clear any errors for LLT and/or GAB.

Consult Symantec for information and assistance with MultiNIC.
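One caveat with the case-insensitive greps above: “grep -i GAB” also matches unrelated words such as “gigabit”. The sketch below counts whole-word matches instead, assuming your grep supports the -w option; count_llt_gab is a hypothetical name.

```shell
# Prints the number of lines in the given log file that mention LLT or
# GAB as whole words (case-insensitive).
count_llt_gab() {
    for tag in LLT GAB; do
        printf '%s %s\n' "$tag" "$(grep -icw "$tag" "$1")"
    done
}

# Usage:
# count_llt_gab /var/adm/messages
```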

STEP 6. Gather troubleshooting and configuration data specified below and contact Sun Support.

At this point, if you have validated that each troubleshooting step above is true for your environment and the issue still exists, further troubleshooting is required:

I. Capture packets using the “snoop” command. Follow these steps:

a. snoop -d (first interface in the group) -o /tmp/<interface name or instance> -s 54 -q

b. snoop -d (second interface in the group) -o /tmp/<interface name or instance> -s 54 -q

c. Monitor for the error condition in messages:

tail -f /var/adm/messages

or otherwise reproduce the failure.

d. Then control-c the snoop commands and provide the output files /tmp/<interface name or instance> for each network interface in the IPMP group.

Note: explorer should be run with the “-w localzones” option to collect information on any configured local zones.
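For groups with more than two interfaces, the per-interface snoop command lines can be generated rather than typed. This is a sketch only: snoop_commands is a hypothetical name, and it prints the commands (using the same options as steps a and b) instead of running them.

```shell
# Prints one snoop command line per interface name given as argument,
# writing each capture to /tmp/<interface>.
snoop_commands() {
    for ifname in "$@"; do
        printf 'snoop -d %s -o /tmp/%s -s 54 -q\n' "$ifname" "$ifname"
    done
}

# Usage:
# snoop_commands hme0 hme1
```

Run each printed command in its own terminal so that every capture can be stopped with control-c once the failure has been reproduced.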

II. collect the following outputs to a file using these commands:

# dladm show-dev > show-dev.out
# dladm show-link > show-link.out
# dladm show-aggr -L > show-aggr.out

The following outputs are collected on machines up to Solaris 10 Update 4:
1. dladm_show-link.out
2. dladm_show-dev.out
3. dladm_show-aggr_-L.out

The following outputs are collected on machines from Solaris 10 Update 4 onwards:
1. dladm_show-link.out
2. dladm_show-dev.out
3. dladm_show-aggr_-L.out
4. dladm_show-linkprop.out
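The file names in these lists follow a simple pattern: the command line with spaces replaced by underscores, plus “.out”. A minimal sketch of that mapping, should you want to collect the outputs by hand under the same names; dladm_outfile is a hypothetical name.

```shell
# Turns a command line such as "dladm show-aggr -L" into the matching
# output file name, e.g. "dladm_show-aggr_-L.out".
dladm_outfile() {
    printf '%s.out\n' "$(printf '%s' "$1" | tr ' ' '_')"
}

# Possible collection loop on the affected host (commented out so the
# sketch can be loaded on a machine without dladm):
# for cmd in "dladm show-link" "dladm show-dev" "dladm show-aggr -L"; do
#     $cmd > "$(dladm_outfile "$cmd")"
# done
```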

Posted by Ramdev
Copyright © 2009 unixadminschool.com. All rights reserved.