QuarkNet Servers

Bare-Metal Servers

The e-Labs website and e-Labs are served from three SuperMicro servers located at Notre Dame's Center for Research Computing (CRC). The servers were purchased through Fermilab in July 2014 and stored there until Q4 2015, when we moved IT operations from Argonne to Notre Dame and placed the servers into production.

Purchase Order with specs.

As labeled by the CRC, the physical ("bare metal") servers are

  • i2u2-vmhost01
  • i2u2-vmhost02
  • i2u2-store01
i2u2-vmhost01 and i2u2-vmhost02 are ostensibly identical general-purpose servers. i2u2-store01, with 28 TB, is the primary data storage server.

The CRC has installed Red Hat Enterprise Linux on these servers, from which their resources are apportioned into several virtual machines (VMs) described below. All interaction that e-Lab developers have with the servers is in terms of the VMs, so you typically don't need to know the physical machine names unless something goes wrong.

Something goes wrong: In December 2016, two of the drives on i2u2-vmhost02 failed, along with its power supply unit. CRC engineers warned us that the failure of one more drive would wipe out the VMs stored on that server. For safety, these VMs were moved to i2u2-vmhost01 (with reduced RAM) until i2u2-vmhost02 could be repaired. The servers were still under a 3-year parts warranty from SuperMicro.

The CRC handled the warranty submission to SuperMicro, which shipped replacement parts. The server was repaired by the end of January 2017, and the affected VMs were returned to normal service over the following week (i2u2-data, being critical for the website function, had to wait until the next weekend to be moved).

The engineers recommend against purchasing SuperMicro equipment in the future, since it tends to be less robust than "Tier 1" equipment.

VMs

All e-Lab IT functions are performed on virtual machines (VMs) created on the two i2u2-vmhost bare-metal servers listed above. The CRC is in charge of the virtualization software running on the underlying RHEL OS, so contact them if you need a new VM, a clone of an existing VM, or a similar change.

The VMs are

  (VM-name).crc.nd.edu   Public IP        Description
  i2u2-prod              129.74.246.110   Server for e-Labs site www.i2u2.org
  i2u2-dev               129.74.246.106   For development prior to deployment on i2u2-prod
  i2u2-db                N/A              Database server for user data to i2u2-prod / dev
  i2u2-data              N/A              Database server for physics data to i2u2-prod / dev
  i2u2-quarknet          129.74.246.125   Server for quarknet.i2u2.org
  i2u2-wiki              129.74.246.153   Server for wiki.i2u2.org (this one) and bugzilla.i2u2.org
  i2u2-ligo              N/A              Temporary server to help fix a problem with LIGO in 2016. Since deleted
  i2u2-jupyter           N/A              Jupyter Notebook server

More details on the VMs (private page)
Obtaining access to the VMs

Backups

The drives on all physical servers are kept in a RAID array as a first measure against data loss.

The VMs themselves are backed up nightly to tape.

Maintenance and Security

Cron jobs

Updates

The VMs need their packages updated regularly with apt-get update followed by apt-get upgrade in order to stay secure. After apt-get upgrade, a restart may be required.
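As a minimal sketch of checking whether an upgrade left a restart pending: this assumes the VMs run a Debian or Ubuntu release that creates the flag file /var/run/reboot-required when a reboot is needed, which is worth confirming on each VM. The function name reboot_pending is hypothetical.

```shell
# Succeed (exit status 0) if the reboot flag file exists.
# Assumption: the Debian/Ubuntu convention of /var/run/reboot-required.
# An optional first argument overrides the path to check.
reboot_pending() {
    [ -f "${1:-/var/run/reboot-required}" ]
}

# Typical use on a VM (left as comments so sourcing this file changes nothing):
#   sudo apt-get update && sudo apt-get upgrade
#   if reboot_pending; then echo "Restart needed; schedule it off-hours."; fi
```

If the flag file is absent on a given VM, that may only mean the convention is not in use there, so treat a negative result with some caution.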

Restarts

Naturally, you want to avoid restarting public VMs while users are logged in. On either i2u2-prod (www.i2u2.org) or i2u2-dev, you can log in as the administrator and select "Session Tracking" to see who's currently logged in. It's typical to have many users logged into the Cosmic Ray e-Lab on i2u2-prod, for example.

Once SSH'd into the VM itself, you can also check the Tomcat logs at /home/quarkcat/tomcatlogs/ to see who's doing what on the website. The terminal commands users, ps and w are also useful to see who else is logged into the VM directly and what they're doing (this should only be other developers or sysadmins).
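For a quick pre-restart check of direct logins, something like the following may help. It is only a sketch: count_logins is a hypothetical helper that counts distinct login names in the output of the users command, which prints one name per open session.

```shell
# Count distinct login names read from stdin (the output of `users`).
# `users` prints space-separated names, repeated once per session.
count_logins() {
    tr -s ' ' '\n' | sort -u | grep -c . || true
}

# Usage on the VM:
#   users | count_logins     # number of distinct accounts logged in
#   w                        # who they are and what they are running
```

A count above one means someone besides you has a session open on the VM, so check with them before restarting.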

Restarts to i2u2-prod, i2u2-data, and i2u2-db are best done at night (and preferably over the weekend) to avoid disrupting users.

SSL Certificates

We maintain Let's Encrypt SSL certificates for www.i2u2.org on i2u2-prod and for bugzilla.quarknet.org and wiki.quarknet.org on i2u2-wiki. On servers where we serve nothing under our own domains, the CRC maintains SSL certs for the crc.nd.edu domains (e.g. i2u2-dev.crc.nd.edu).
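One way to see when a served certificate expires is to ask the server for it with openssl. The sketch below assumes the openssl command-line tool is installed and that the host is reachable on port 443; cert_enddate is a hypothetical helper that strips the notAfter= prefix from the output of openssl x509.

```shell
# Extract the expiry date from `openssl x509 -noout -enddate` output on stdin.
# That command prints a line of the form: notAfter=<date>
cert_enddate() {
    sed -n 's/^notAfter=//p'
}

# Usage (needs network access, so left as a comment):
#   openssl s_client -connect www.i2u2.org:443 -servername www.i2u2.org \
#       </dev/null 2>/dev/null | openssl x509 -noout -enddate | cert_enddate
```

Let's Encrypt certificates are issued for 90 days, so a date more than about three months out would suggest the certificate came from somewhere else.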

ELabs SSL Certificates

Old Servers

Confused by references to servers you've never heard of, like www18, www13, or data4? These were the names of servers we used when the e-Labs site was served from Argonne. Learn More

Troubleshooting

Unknown MySQL server host 'i2u2-db.crc.nd.edu'

This can happen when an e-Lab or CIMA attempts to pull data from i2u2-db, and it's generally a DNS resolution error (that is, the calling VM can't turn "i2u2-db.crc.nd.edu" into an IP address). First do the obvious cursory checks:

  1. SSH into i2u2-db to make sure it's up and connected to the network
  2. Run $ ping -c 5 i2u2-db.crc.nd.edu from the calling VMs (likely i2u2-prod and/or i2u2-dev)
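Because ping fails both when the name cannot be resolved and when the host is unreachable, it can help to test name resolution on its own. This sketch assumes getent is available on the calling VM (it ships with glibc); the function name resolves is hypothetical.

```shell
# Succeed if the name resolves through the system resolver
# (DNS, /etc/hosts, or whatever nsswitch is configured to use).
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Usage on the calling VM:
#   if resolves i2u2-db.crc.nd.edu; then
#       echo "name resolves"
#   else
#       echo "lookup failed"
#   fi
```

If the lookup fails while i2u2-db itself is up, that points at resolution on the calling VM and not at the database server.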

Assuming the above checks are nominal, these solutions have worked:

  • Restart Apache. If you can't restart Apache (for example, because there are active e-Lab users), try one of the following instead:
  • Use the direct IP. In the relevant code, try replacing "i2u2-db.crc.nd.edu" with the IP address of i2u2-db as given on the VM info page. This is mostly useful with CIMA where you can make direct changes to the code without deployment.
  • Look at /etc/resolv.conf on the calling VM. On one occasion, this file ended up with incorrect contents, possibly after being overwritten by resolvconf. Ask the CRC engineers to confirm that the current file is correct; as of March 2017 the correct configuration is given here.
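When inspecting /etc/resolv.conf, the lines that matter most are the nameserver entries. A small sketch for listing them follows; list_nameservers is a hypothetical helper, and the values it prints still need to be compared against the configuration the CRC confirms as correct.

```shell
# Print the address from each "nameserver" line of a resolv.conf file.
# Defaults to /etc/resolv.conf; pass another path to inspect a copy.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Usage on the calling VM:
#   list_nameservers
```

An empty result means the file has no nameserver lines at all, which on its own could explain the failed lookups.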

If none of these work, you can try restarting MySQL using either of
$ sudo service mysql restart
$ sudo /etc/init.d/mysql restart
If that fails, reboot the calling VMs if possible.

-- Main.JoelG - 2016-12-07


Topic revision: 2019-08-02, JoelG