ELabs Cluster power-up checklist
The following steps should be performed whenever one or all of the cluster machines are rebooted or powered up after maintenance, to confirm that everything is working as expected.
THIS CHECKLIST IS STILL UNDER DEVELOPMENT. Please add or fix what's needed.
Boot Sequence checks
- boot the data server, data2
- boot the database server, data1, and verify that both postgres and mysqld are running:
  - sudo /etc/init.d/postgres status
  - sudo /etc/init.d/mysqld status
- boot each individual cluster node: wwwXX, where XX=10 to 17 (does the order matter?)
- check RAID mounts; no complaints and all disks healthy?
- /nfs: use 'mount -a' (to be set with run level?)
- verify mounts & partitions --> throw success/failure message
- start HTTP servers: Tomcat, Apache (or verify they are running)
- scan ports: correct? (how?)
- confirm http (how?)
- confirm memory available (how?)
- security checks: ??
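
Several of the items above are still marked "(how?)". As one possible starting point, and not an agreed procedure, the sketch below shows how the mount, HTTP, and memory checks might be scripted with a success/failure message for each. The ports (80 for Apache, 8080 for Tomcat), the localhost URLs, the use of curl, and all function names are assumptions for illustration; adjust them to the cluster's real configuration.

```shell
#!/bin/sh
# Sketch only. Ports 80/8080, the localhost URLs, and all function names
# are assumptions for illustration, not the cluster's actual configuration.

# Succeed if mount point $1 is listed in the mounts table
# ($2, defaulting to /proc/mounts on Linux).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Print available memory in kB from a meminfo-style file
# ($1, defaulting to /proc/meminfo); falls back to MemFree on
# older kernels that lack a MemAvailable line.
mem_available_kb() {
    awk '/^MemAvailable:/ { a = $2 }
         /^MemFree:/      { f = $2 }
         END { print (a != "" ? a : f) }' "${1:-/proc/meminfo}"
}

# Run a command quietly and print PASS or FAIL with a label.
report() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Run on each node after boot; nothing is executed until this is called.
boot_checks() {
    report "/nfs mounted" is_mounted /nfs
    report "apache answers on port 80 (assumed)" \
        curl -fsS -o /dev/null --max-time 10 http://localhost:80/
    report "tomcat answers on port 8080 (assumed)" \
        curl -fsS -o /dev/null --max-time 10 http://localhost:8080/
    echo "INFO: memory available: $(mem_available_kb) kB"
}
```

Sourcing the file and calling boot_checks on a node prints one PASS or FAIL line per check. The RAID health and security items are left out because the page does not yet say what they should test.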
Sanity checks
- confirm login.mcs.anl.gov is available (and/or terra, harley, shakey individually)
- ping all cluster machines
- all hosts: login --> dataX, wwwXX via 'terra' (or run Eric's hey-you-guys uptime)
- all servers: URL offers --> wwwXX.i2u2.org/elab/cosmic/project.jsp
- confirm all forwards: 'quarknet.fnal.gov/e-lab' and 'www.i2u2.org/elab/cosmic'
- confirm all cross mounts
- database responding: users & data
- all servers: run JMeter tests
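
The ping, per-server URL, and forwarding checks above could be scripted along the following lines. This is a rough sketch under stated assumptions: the i2u2.org domain is taken from the URLs on this page, the http:// scheme and the bare data1/data2 host names are guesses, the ping options are those of Linux iputils, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. DOMAIN, the http:// scheme, and the function names are
# assumptions; data1/data2 may need fully qualified names on your network.
DOMAIN=${DOMAIN:-i2u2.org}

# Print the web node names www10 .. www17, one per line.
www_nodes() {
    n=10
    while [ "$n" -le 17 ]; do
        echo "www$n"
        n=$((n + 1))
    done
}

# Print every cluster host: the two data servers, then the web nodes.
cluster_hosts() {
    echo data1
    echo data2
    www_nodes
}

# Build the cosmic e-Lab project URL for one web node ($1).
project_url() {
    echo "http://$1.$DOMAIN/elab/cosmic/project.jsp"
}

# Nothing is contacted until this function is called.
sanity_checks() {
    # One ping per host; -W 2 is a 2-second timeout on Linux iputils.
    for h in $(cluster_hosts); do
        if ping -c 1 -W 2 "$h" >/dev/null 2>&1; then
            echo "PASS: ping $h"
        else
            echo "FAIL: ping $h"
        fi
    done
    # Fetch the project page from every web node.
    for h in $(www_nodes); do
        url=$(project_url "$h")
        if curl -fsS -o /dev/null --max-time 10 "$url" 2>/dev/null; then
            echo "PASS: $url"
        else
            echo "FAIL: $url"
        fi
    done
    # Follow each forward and show the final status code and URL.
    for u in http://quarknet.fnal.gov/e-lab http://www.i2u2.org/elab/cosmic; do
        curl -sIL -o /dev/null --max-time 10 \
            -w "$u -> %{http_code} %{url_effective}\n" "$u"
    done
}
```

Calling sanity_checks from a machine that can reach the cluster prints one line per host and URL. The login, cross-mount, database, and JMeter items still need to be done by hand or scripted separately.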
LIGO e-Lab checks
- verify Bluestone works for "User" login: http://www13.i2u2.org/tla_test
- verify Bluestone works for "Guest" login: http://www13.i2u2.org/tla_test
QuarkNet Fellows Library
- verify it is up and working correctly: http://www13.i2u2.org/cosmic/library