The hostview utility
Introduction
The hostview utility gives you a visual overview of the state of an
E10000 and can be a useful tool in diagnosing machine conditions.
1. Invoking hostview
- On your local workstation, allow remote applications to display on your screen:
% xhost +
- Log in to the SSP for the E10000 you wish to monitor, using the username ssp.
SSPs have names such as tp01-e10k-ssp1 and ip02-e10k-ssp1. Once your login has
been accepted, you will be prompted for the name of one of the E10000's domains.
Type the name of any of the domains:
% rlogin -l ssp tp01-e10k-ssp1
Password:
Last login: Sun Nov 28 18:16:38 from tpmpt04.mprn.bt.
Sun Microsystems Inc. SunOS 5.5.1 Generic May 1996
Please enter SUNW_HOSTNAME: tp01-e10k-dm01
- The prompt for the ssp account is normally the SSP machine name, a colon, and
the name of the domain you typed. At this prompt, type the following command so
that hostview displays on your local workstation, replacing
tpmpt04.mprn.bt.com with the name of your workstation:
tp01-e10k-ssp1:tp01-e10k-dm01% setenv DISPLAY tpmpt04.mprn.bt.com:0
- Now type the following command to invoke hostview:
tp01-e10k-ssp1:tp01-e10k-dm01% hostview &
You should now see the main hostview window, which looks like this:
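The setenv command in the steps above uses csh syntax, as the % prompt of the ssp account suggests. If your session runs a Bourne-compatible shell instead, setenv is not available. The following is a minimal sketch, assuming a POSIX sh and display 0 of the workstation (as in the csh command); the function name set_hostview_display is hypothetical and is not part of the SSP software.

```shell
# set_hostview_display (hypothetical helper, not part of the SSP software):
# Bourne-shell equivalent of "setenv DISPLAY <workstation>:0". It points X11
# clients such as hostview at display 0 of the named workstation.
set_hostview_display() {
    if [ -z "$1" ]; then
        echo "usage: set_hostview_display workstation-name" >&2
        return 1
    fi
    DISPLAY="$1:0"
    export DISPLAY
}
```

For example, run set_hostview_display tpmpt04.mprn.bt.com (substituting your own workstation name) and then start hostview & as before.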
2. Using hostview
This section introduces you to some of the capabilities of hostview.
It covers facilities that are likely to be useful to operators and OAs, and is
therefore not a complete guide to all the facilities that the program
offers.
The main hostview window consists of three parts. The menu bar at the
top provides commands for monitoring and controlling the E10000. The power,
temperature, and fan buttons bring up status reports. The rest of the main
window provides a graphical view of the E10000 system boards and buses.
2.1. Graphical view of boards and buses
The graphical view of the machine shows the control boards (labelled
CB 0 and CB 1), the centreplane support boards (labelled
CSB 0 and CSB 1), the system buses (the six bars in the centre
of the display) and the system boards. The system boards are numbered from
SB 0 to SB 15. The machine cabinet need not contain all sixteen system
boards; in the example above, only boards 0, 1, 3, 4, 6, 7, 14 and 15 are
present. The coloured outlines of the system boards show how the boards are
grouped into domains; in the example above, the two boards with white outlines
form one domain, the two with brown outlines form a second domain, and so on.
Confusingly, system boards 14 and 15 have a grey outline that is
indistinguishable from the grey background against which they are displayed.
The two-digit numbers within the outlines of the system boards identify the
machine's processors (each system board can hold up to four processors). To
the right of each two-digit number is a black shape on a coloured background
that indicates the processor's state, as follows:
- A black diamond indicates that the processor is running the operating
system.
- A black circle indicates that the processor is running its POST (Power-On
Self-Test).
- A black triangle indicates that the processor is at the OK prompt.
- A black square indicates that the processor is between POST and the OK
prompt (downloading the OBP, or OpenBoot PROM).
- A black question mark indicates that the processor is in an unknown state.
The shape should always be a diamond under normal running.
- A green background indicates that the processor is running.
- A maroon background indicates that the processor is exiting.
- A yellow background indicates that the OS is being loaded.
- A blue background indicates that the processor is in an unknown state.
- Black, red and white backgrounds are also possible, but should not normally
be seen.
The background should always be green under normal running.
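The shape and colour legend above amounts to a simple lookup table. As an illustration only, here is a POSIX sh sketch that encodes it. The function names and the one-word shape and colour keywords are hypothetical (hostview itself offers no such interface), and because this guide does not give meanings for black, red and white backgrounds, those are reported only as abnormal.

```shell
# Hypothetical helpers (not part of hostview or the SSP software): decode the
# processor indicator legend. Shapes and colours are passed as one-word
# keywords, an assumption made for illustration.

# decode_cpu_shape SHAPE: print the processor state indicated by the black shape.
decode_cpu_shape() {
    case "$1" in
        diamond)  echo "running the operating system" ;;
        circle)   echo "running POST" ;;
        triangle) echo "at the OK prompt" ;;
        square)   echo "between POST and the OK prompt (downloading the OBP)" ;;
        question) echo "unknown state" ;;
        *)        echo "unrecognised shape: $1" >&2; return 1 ;;
    esac
}

# decode_cpu_colour COLOUR: print the processor state indicated by the
# background. Black, red and white are not explained in the guide, so they
# are reported only as abnormal.
decode_cpu_colour() {
    case "$1" in
        green)  echo "running" ;;
        maroon) echo "exiting" ;;
        yellow) echo "OS being loaded" ;;
        blue)   echo "unknown state" ;;
        black|red|white) echo "abnormal: should not normally be seen" ;;
        *)      echo "unrecognised colour: $1" >&2; return 1 ;;
    esac
}

# cpu_is_normal SHAPE COLOUR: succeed only for normal running
# (a black diamond on a green background).
cpu_is_normal() {
    [ "$1" = diamond ] && [ "$2" = green ]
}
```

For example, decode_cpu_colour yellow prints "OS being loaded", and cpu_is_normal diamond green succeeds while any other combination fails.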
One of the two control boards should contain the letters C and
J within the borders of its icon. This indicates that the clock
distribution signals and JTAG connection (a low-level connection between the
E10000 and the SSP) are coming from the control board in question.
2.2. Status reports
2.2.1. Power
Click on the power icon (the leftmost one) to monitor the status of the
E10000's power supplies. The resulting display should look like this:
The display shows the status of the power supplies to the control boards,
centreplane support boards and system boards. It also shows the status of the
machine's external power supplies. All the coloured icons should be green under
normal running.
To see more detail about the voltage supplied to an individual component, click
the left mouse button with the cursor over that component. This facility is
available for the control boards, centreplane support boards, and system boards
(the components labelled in bold on the display).
2.2.2. Temperature
Click on the temperature icon (second from the left) to monitor the E10000's
temperature. The resulting display should look like this:
The display shows the temperature of individual components of the E10000. All
the coloured icons should be green under normal running.
To see more detail about the temperature of an individual component, click the
left mouse button with the cursor over that component.
2.2.3. Fans
Click on the fan icon (third from the left) to monitor the E10000's fans. The
resulting display should look like this:
The display shows the status of the E10000's fans. All the coloured icons
should be green under normal running. They turn amber when the fans are
running at high speed. Note that the machine from which this example display
was taken has no fans in fan tray 4.
2.2.4. Failures
Certain serious machine conditions will cause the failure icon to turn red.
Clicking on the icon brings up a window that displays the condition(s) that
occurred. The following machine conditions are detected in this way:
- The operating system on a domain has failed and is rebooting.
- The SSP is not receiving expected status information from the E10000.
- A parity error or other fatal error has occurred, and the domain is
rebooting.
- The domain is being manually rebooted.
- The platform and domains have failed due to a power outage. Power has been
restored, and domains are rebooting.
2.3. Menu bar
Most of the choices on the hostview menu bar are unlikely to be useful
for day-to-day operator and OA activity (and some of them should be carefully
avoided). However, the File item contains two useful options.
The Quit submenu item terminates the hostview program.
The SSP Logs submenu item displays the contents of log files collected
on the SSP. There are two ways to use this item. By default, selecting it
displays the messages for all the domains within the E10000. Alternatively, to
restrict the messages to a single domain, first click the left mouse button
with the cursor over one of the system boards that make up the domain of
interest. The selected domain is then highlighted with a black outline in the
main hostview window, and selecting SSP Logs while it is highlighted
displays only the log messages that relate to that domain.
© Grenville Consulting Ltd.