Solaris Troubleshooting : Calculate TCP retransmission rate using netstat
This post explains how to measure the TCP retransmission rate on Solaris using netstat -s. A high retransmission rate usually indicates network congestion and can cause network-oriented applications (such as web servers) to respond slowly.
Procedure to Calculate TCP retransmission rate:
TCP expects acknowledgments from the destination system when it successfully receives segments from the sender. If it does not receive the acknowledgment within a certain time, it will retransmit the segment.
To implement this retransmission scheme, TCP starts a timer for each packet transmitted. If the acknowledgment is not received before the timer expires, TCP assumes that the packet is lost and retransmits it. The rate at which packets are retransmitted is called the "retransmission rate", and it is an indicator of network health.
To calculate the retransmission rate, use the output of the 'netstat -s -P tcp' command. We need the values of the counters tcpRetransBytes, tcpOutDataBytes, tcpRetransSegs and tcpOutDataSegs.
The retransmission rate is given in terms of either bytes or segments. To calculate the byte retransmission rate:
%retrans = ( tcpRetransBytes / tcpOutDataBytes ) * 100
To calculate the segment retransmission rate:
%retrans = ( tcpRetransSegs / tcpOutDataSegs ) * 100
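The two formulas above can be sketched in Python as follows. This is a minimal illustration, not a definitive tool: the SAMPLE text only imitates the "name = value" layout that 'netstat -s -P tcp' prints, its numbers are made up, and the helper names parse_counters and retrans_percent are hypothetical.

```python
import re

# Illustrative excerpt in the "name = value" layout of `netstat -s -P tcp`.
# The numbers are invented for this example, not taken from a real system.
SAMPLE = """
TCP     tcpOutDataSegs      = 250000     tcpOutDataBytes     = 180000000
        tcpRetransSegs      =   5000     tcpRetransBytes     =   2700000
"""


def parse_counters(text):
    """Return a dict mapping each counter name to its integer value."""
    pairs = re.findall(r"(\w+)\s*=\s*(-?\d+)", text)
    return {name: int(value) for name, value in pairs}


def retrans_percent(retrans, out_data):
    """%retrans = (retrans / out_data) * 100; None if nothing was sent."""
    if out_data == 0:
        return None
    return 100 * retrans / out_data


counters = parse_counters(SAMPLE)
seg_rate = retrans_percent(counters["tcpRetransSegs"], counters["tcpOutDataSegs"])
byte_rate = retrans_percent(counters["tcpRetransBytes"], counters["tcpOutDataBytes"])
print(f"segments: {seg_rate:.2f}%  bytes: {byte_rate:.2f}%")
```

On a live system the SAMPLE string would be replaced by the captured command output, for example via subprocess.run(["netstat", "-s", "-P", "tcp"], capture_output=True, text=True).stdout.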
As a rule of thumb, retransmission rates over 10% can indicate degraded network performance on a LAN. Across the internet, rates may vary between 10 and 20 percent depending on traffic conditions. In some environments, the retransmission timers may need to be lengthened to accommodate slower networks.
General Retransmission Rules:
moderate retransmissions < 10%
warning > 15% ( > 2/sec )
excessive retransmissions > 25%
action required > 40%
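As a rough sketch, the rules above can be turned into a small lookup. The function name classify_retrans_rate is hypothetical, and because the list does not label rates from 10% to 15%, the sketch reports that range as unclassified instead of guessing.

```python
def classify_retrans_rate(percent):
    """Map a retransmission percentage onto the general rules listed above."""
    if percent > 40:
        return "action required"
    if percent > 25:
        return "excessive retransmissions"
    if percent > 15:
        return "warning"
    if percent < 10:
        return "moderate retransmissions"
    # Assumption: the list gives no label for 10-15%, so none is invented here.
    return "unclassified"


for rate in (2.0, 12.0, 18.5, 30.0, 45.0):
    print(f"{rate:5.1f}% -> {classify_retrans_rate(rate)}")
```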
High rates (greater than 30%) may be relieved by adjusting the tcp_rexmit_interval* timers in Solaris. Some of the reasons for a high retransmission rate are:
- Congested network, packets dropped. This is the most common cause.
- Bad network hardware. Check the 'netstat -i' output for collisions or errors, and check the various network components involved.
- Missing or out of date TCP or IP patches.
- Incorrectly tuned TCP parameters (tcp_rexmit_interval_min, tcp_rexmit_interval_max, tcp_rexmit_interval_initial, tcp_ip_abort_interval).
- Clients accessing the server have slow or error-prone connections.
Notes: Be aware that these counters are 32 bits wide. On a system with a high network load they can overflow, which can produce calculated rates over 100%. In those cases, monitoring the system (with the command 'netstat -s') at regular intervals, or after the next downtime, is the recommended way to differentiate between a system with a high network load and a real retransmission problem.
The most accurate way to get the correct rate is to take the values twice and use the deltas of each counter.
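The delta approach can be sketched as below. It assumes the counters wrap at 2^32 (per the note above) and wrap at most once between the two readings. The helper names and the sample readings are hypothetical, with numbers chosen so that tcpOutDataBytes wraps between the snapshots.

```python
COUNTER_MODULUS = 2 ** 32  # counters are 32 bits wide, so they wrap at 2^32


def counter_delta(earlier, later):
    """Increase between two readings, tolerating a single wraparound."""
    return (later - earlier) % COUNTER_MODULUS


def interval_retrans_percent(first, second, retrans_key, out_key):
    """Retransmission rate over the interval between two snapshots."""
    sent = counter_delta(first[out_key], second[out_key])
    if sent == 0:
        return None  # nothing was sent in the interval
    resent = counter_delta(first[retrans_key], second[retrans_key])
    return 100 * resent / sent


# Made-up snapshots; tcpOutDataBytes passes 2^32 between the two readings.
first = {"tcpRetransBytes": 4_000_000, "tcpOutDataBytes": 4_294_000_000}
second = {"tcpRetransBytes": 4_030_000, "tcpOutDataBytes": 32_704}

rate = interval_retrans_percent(first, second, "tcpRetransBytes", "tcpOutDataBytes")
print(f"byte retransmission rate over the interval: {rate:.2f}%")
```

Dividing the raw second reading directly (4,030,000 / 32,704) would give a rate far above 100%, which is the overflow symptom described in the note. The modular difference recovers the 1,000,000 bytes actually sent during the interval.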