Solaris Troubleshooting Jumpstart – Common Problems
First Level Boot Process
1) When the “boot net - install” command is issued at the ok prompt, the JumpStart client broadcasts a RARP (Reverse Address Resolution Protocol) request to locate a JumpStart boot server.
2) A JumpStart server on the local subnet receives the RARP request through the “rarpd” daemon (in.rarpd).
3) The server maps the client’s Ethernet address to a host name using its /etc/ethers file, and maps that host name to an IP address using its /etc/hosts file. A name service, such as NIS or NIS+, can also be used for these mappings.
4) With the IP address known, the JumpStart server generates a RARP reply to the JumpStart client.
5) The RARP reply returns the IP address to the client, which uses it for the rest of the network boot.
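The two lookups in step 3 can be sketched in shell. This is only an illustration of the mapping, not how in.rarpd is implemented: resolve_client is a hypothetical helper, the files are passed in as arguments rather than read from /etc, and the Ethernet address is compared as a literal string. On Solaris, nawk or /usr/xpg4/bin/awk may be needed in place of awk for the -v option.

```shell
#!/bin/sh
# Sketch only: resolve_client is a hypothetical helper, not a Solaris command.
# Usage: resolve_client MAC ETHERS_FILE HOSTS_FILE
resolve_client() {
    mac=$1
    ethers=$2
    hosts=$3

    # Lookup 1: Ethernet address -> host name (ethers format: "MAC hostname")
    name=$(awk -v m="$mac" '$1 !~ /^#/ && $1 == m { print $2; exit }' "$ethers")
    [ -n "$name" ] || { echo "no ethers entry for $mac" >&2; return 1; }

    # Lookup 2: host name -> IP address (hosts format: "IP name [aliases...]")
    ip=$(awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$hosts")
    [ -n "$ip" ] || { echo "no hosts entry for $name" >&2; return 1; }

    # Print what the server now knows about the client
    echo "$name $ip"
}
```

For example, “resolve_client 8:0:20:f:7:45 /etc/ethers /etc/hosts” prints the host name and IP address when both entries exist, and returns a non-zero status with a message on stderr when either lookup fails.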
Second Level Boot Process
1) The JumpStart client downloads a minimal kernel (miniroot) from the JumpStart server into its own memory. The download is made through a TFTP request issued by the JumpStart client.
2) When the JumpStart server receives the TFTP request, it searches the “/tftpboot” directory for an entry matching the client’s IP address and architecture.
3) Once the JumpStart client has booted from the miniroot, it locates the “rules.ok” file and checks its entries for one that matches the JumpStart client.
4) When a match is found, the actions it specifies are executed. First the “begin” script (if any) is executed, then the specified profile is installed, and finally the “finish” script (if any) is executed.
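Two of the steps above lend themselves to a small shell sketch. The first function derives the name conventionally used for a client’s entry in /tftpboot: the client’s IP address written as eight uppercase hexadecimal digits, optionally followed by an architecture suffix such as .SUN4U. The second function mimics a first-match lookup in a rules.ok-style file, and handles only the simple “hostname NAME” and “any” rule forms. Both function names and all sample host and profile names are hypothetical, and the real rules syntax supports many more keywords than shown here.

```shell
#!/bin/sh
# Sketch only: ip_to_tftp_name and match_rule are hypothetical helpers.

# Usage: ip_to_tftp_name 192.168.1.5
# Prints the dotted-quad address as eight uppercase hex digits.
ip_to_tftp_name() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into four positional parameters
    IFS=$old_ifs
    [ $# -eq 4 ] || { echo "expected a dotted-quad IPv4 address" >&2; return 1; }
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Usage: match_rule HOSTNAME RULES_FILE
# Assumed (simplified) entry layout: keyword value begin profile finish
# A "-" in the begin or finish column means "no script".
match_rule() {
    awk -v h="$1" '
        /^[ \t]*#/ || NF < 5 { next }     # skip comments and incomplete lines
        ($1 == "hostname" && $2 == h) || $1 == "any" {
            print "begin=" $3, "profile=" $4, "finish=" $5
            found = 1
            exit                          # first matching rule wins
        }
        END { exit !found }               # non-zero status when nothing matched
    ' "$2"
}
```

With these definitions, “ls -l /tftpboot/$(ip_to_tftp_name 192.168.1.5)*” is one way to check whether the boot server has an entry for a client at that (example) address.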
Points about Boot Servers
Normally, the JumpStart server provides the boot program for booting clients. However, under one condition the Solaris OS network booting architecture requires you to set up a separate “boot server”. A boot server is a system with just enough information to boot a client over the network. A boot server must be set up when the install client is on a different subnet from the install server.
SPARC-based install clients require a boot server when they are on a different subnet, because the network booting architecture uses the Reverse Address Resolution Protocol (RARP).
When a client boots, it issues a RARP request to obtain its IP address. RARP, however, does not supply the netmask, which is required to send traffic across a router. If the install/boot server is on the other side of a router, the boot fails because the network traffic cannot be routed correctly without a netmask. The result is that you can install a client across a router, but you cannot boot a client across a router, so you have to set up a separate boot server on the same subnet as the client.
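Whether a separate boot server is needed comes down to whether the client and the install server share a subnet. A minimal sketch of that test, assuming you know both IP addresses and the netmask in use (the helper names ip_to_int and same_subnet, and the addresses in the usage note, are made up for illustration):

```shell
#!/bin/sh
# Sketch only: ip_to_int and same_subnet are hypothetical helpers.

# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Usage: same_subnet IP_A IP_B NETMASK
# Exit status 0 when both addresses have the same network part.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

For example, “same_subnet 192.168.1.10 192.168.2.10 255.255.255.0” fails, which would indicate that a client at the first address needs a boot server on its own subnet if the install server is at the second.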
Common problems when JumpStarting a client
boot net fails
network boot fails
Timeout waiting for ARP/RARP packet
Cannot net boot system
Troubleshooting JumpStart:
1. Ensure that sufficient time is allowed for RARP request and response.
Some servers, in particular the Sun Fire F15K and E25K servers, can take 60 seconds or more to bring up a link and send a RARP request. If no RARP response is received within 5 minutes, you can be reasonably certain that it will not work at all.
2. Verify that the boot server can see incoming RARP requests from your client
Follow the procedure “troubleshooting rarp/arp timeout requests” to verify that the RARP requests carry the expected Ethernet address. If they do, continue with the next checks.
3. Verify that in.rarpd is running on the boot server
Log in to the boot server (the boot server and the install server are often the same system) and check for the rarpd process as shown below:
# ps -ef | grep rarp
root 546 10 Sep 02 0:00 /usr/sbin/in.rarpd -a
If the process is not running, start it as shown below:
For Solaris 10, as root: # svcadm enable svc:/network/rarp:default
For Solaris 9 and earlier, as root: # /usr/sbin/in.rarpd -a
4. Confirm that there is a unique, valid entry for the client in the appropriate name service on the boot server.
If your environment commonly decommissions servers and reinstalls them with a different host name or IP address, duplicate entries can end up in the server’s “/etc/ethers” and “/etc/hosts” files. Run a simple check such as the one below:
# grep myclient /etc/ethers /etc/hosts
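The grep above finds entries by name only. As a rough complement, the following sketch lists any address or name that occurs more than once in an ethers-format or hosts-format file. find_duplicates is a hypothetical helper; it compares the first two columns as plain strings and ignores host aliases.

```shell
#!/bin/sh
# Sketch only: find_duplicates is a hypothetical helper.
# Usage: find_duplicates FILE   (ethers: "MAC name"; hosts: "IP name ...")
find_duplicates() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }        # skip comments and blank lines
        { addr[$1]++; name[$2]++ }           # count each address and name
        END {
            for (a in addr) if (addr[a] > 1) print "duplicate address: " a
            for (n in name) if (name[n] > 1) print "duplicate name: " n
        }
    ' "$1" | sort
}
```

Running “find_duplicates /etc/ethers” and “find_duplicates /etc/hosts” on the boot server prints nothing when every address and name is unique.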
Additional validity checks include:
If entries have to be changed, it is a good idea to restart the nscd and in.rarpd daemons afterwards.
Confirm that the client host name matches exactly between the ethers and hosts databases. For example, using the Fully Qualified Domain Name (FQDN) in the ethers database and the short name in the hosts database will not work.
Confirm that the /etc/nsswitch.conf file is set up to point to the correct ethers and hosts database locations and that the ordering is as you expect. For example, if the client information is located in /etc/ethers and /etc/hosts, confirm that /etc/nsswitch.conf has “files” first for the ethers and hosts entries.
If, for example, there is an entry in both NIS and files, NIS is listed first, and the NIS entry is invalid for that client, the arp/rarp lookup will return the wrong details.
Confirm that the alphabetic characters of the MAC address in the ethers database are all lower case. Upper case has been seen to be a problem with some versions of the OS.
Confirm that there are no leading zeros in the MAC address for the client in the ethers database. For example, change 08:00:20:0f:07:45 to 8:0:20:f:7:45.
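Two of the checks above are easy to script. The sketch below rewrites a MAC address in the form recommended here (lower case, no leading zeros) and tests whether “files” is the first source listed for a database in an nsswitch.conf-style file. Both helpers, normalize_mac and files_first, are hypothetical names, and the second one ignores action criteria such as [NOTFOUND=return].

```shell
#!/bin/sh
# Sketch only: normalize_mac and files_first are hypothetical helpers.

# Usage: normalize_mac 08:00:20:0F:07:45
# Lower-cases the address and drops the leading zero of each octet.
normalize_mac() {
    echo "$1" | tr 'A-F' 'a-f' | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            octet = $i
            sub(/^0/, "", octet)             # "08" -> "8", "00" -> "0"
            if (octet == "") octet = "0"     # a lone "0" stays "0"
            out = out (i > 1 ? ":" : "") octet
        }
        print out
    }'
}

# Usage: files_first DATABASE NSSWITCH_FILE
# Exit status 0 when "files" is the first source for that database.
files_first() {
    awk -v db="$1:" '
        $1 == db { first = $2; exit }
        END { exit (first != "files") }
    ' "$2"
}
```

For example, “files_first ethers /etc/nsswitch.conf || echo "ethers does not consult files first"” flags the ordering problem described above.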
5. Ensure that the netmask for the interface on the boot server that should service the RARP request is correct
The most common netmask is 255.255.255.0, the typical Class ‘C’ netmask, but other netmasks are still widely used. Note also that Solaris still adheres to the older style of netmask assumptions: if your network address is in the Class A address range, Solaris will by default assume that your netmask is 255.0.0.0, and if it is in the Class B range, Solaris will assume a netmask of 255.255.0.0. In modern environments, these assumptions are frequently incorrect.
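The classful defaults described above can be written out as a small sketch, which is handy for spotting an interface that has silently fallen back to them. classful_netmask is a hypothetical helper; it only looks at the first octet and does not treat special ranges such as 127.x differently.

```shell
#!/bin/sh
# Sketch only: classful_netmask is a hypothetical helper.
# Usage: classful_netmask 172.16.5.4
# Prints the netmask implied by the address class of the first octet.
classful_netmask() {
    first_octet=${1%%.*}
    if   [ "$first_octet" -lt 128 ]; then echo 255.0.0.0        # Class A
    elif [ "$first_octet" -lt 192 ]; then echo 255.255.0.0      # Class B
    elif [ "$first_octet" -lt 224 ]; then echo 255.255.255.0    # Class C
    else
        echo "no classful default for $1" >&2
        return 1
    fi
}
```

If the netmask reported for the boot server’s interface equals the classful default but your site subnets that network differently, the interface netmask is a likely culprit.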
If you are uncertain of the netmask for the network you are attached to, discuss it with your site network administrators.