LIGO Data Overview
This describes what the LIGO data flow looks like from QuarkNet's end. This documentation is current as of December 2016, and it supersedes any documentation you find referring to data2.i2u2.org or data4.i2u2.org, or to www13/www18 (the Argonne servers, defunct as of Q1 2016).
If you're troubleshooting a problem, the information in Edit's notes (item 8) and Mihael's "How LIGO Works" summary are helpful.
Data is collected from the two LIGO observatory sites at Hanford and Livingston. The sensors that provide the data operate around-the-clock over spans of years (2003 - present, with a break during the upgrade shutdown from 2011-2013), and one of the main challenges of constructing the LIGO e-Lab is transferring and storing this continuously-flowing stream of data.
Between the two sites, nine seismic sensors continuously generate 189 separate streams of data that are delivered to the ELabs e-Lab. Most of these are then made available to plot within the e-Lab. A bird's-eye overview of the process:
- LIGO delivers .gwf frame files every night
- ImportData uses the new frames to update the streams
- DataServer.py serves streams to users via the e-Lab plotter
The first two are governed by cronjobs that typically require little attention. The third, DataServer.py, requires manual start and restart, but is otherwise low-maintenance. At present (Q1 2017), the e-Lab receives and stores far more data than is used in the e-Lab.
The e-Lab receives no gravitational wave data from LIGO
LIGO stores its sensor data in the form of frame files ending in the .gwf extension ("gravitational wave frame", I assume). Each frame file represents a "snapshot" of all data generated by a set of sensors over a relatively short period of time (one hour for minute-trend data, or either one minute or ten minutes for second-trend data). LIGO does a little pre-processing of this frame data before delivering it to i2u2-data.
Frame file directory structure
ELabs writes the frame files that LIGO sends us into a set of directories within
The first division of the frame files within
is into the directories
. From 2011 to 2013, the LIGO seismic sensors were turned off during general maintenance and upgrades to the experiment, so no data exists from that period. All of the files from before the shutdown (2003 to 2011) are in
- this directory is now static, and its data shouldn't ever change. Files from after the 2013 restart are in
, including newly-delivered data.
Within each of
, the frame files are further subdivided into
directories according to which type of time-sampling the data uses. The e-Lab uses "minute trend" data; the "second trend" data is higher-resolution and has been used for testing, but no finished products that use it have been rolled out. Nevertheless, we still receive, process and store it in case it's used in the future.
As of Q1 2017, only minute-trend data is plotted in the e-Lab, so if you're troubleshooting a problem with incoming frame data, you'll typically want to go straight to
to check on the incoming data. Note that
is owned by quarkcat and has owner-only permissions, so you'll need to
$ sudo su
before you can
Within each of
, the frame files are divided by their origin site:
for "LIGO Hanford Observatory" or
for "LIGO Livingston Observatory." In terms of data flow, there's no real difference between the two.
Within each of
is a set of subdirectories with names like
, indexed by the site (
for "Livingston" or
for "Hanford") and trend-type (
for "minute" and
for "second", for some reason), as well as a 4-digit number. The number is the first 4 digits of the timestamp of all the frame files within it, representing millions of seconds (1 million seconds is a little over 11 1/2 days). The frame files are bundled into these subdirectories to keep them organized. For example, we have the directory
containing all minute-trend frame files from the Livingston observatory for the 11.5-day period where all timestamps begin with 1147. New directories are created automatically whenever the fourth digit rolls over.
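The bundle number is just the GPS timestamp with its last six digits dropped. Here's a minimal Python sketch of that arithmetic. The function name bundle_number is made up for illustration and isn't part of the e-Lab code, and it assumes the ten-digit timestamps seen in the post-2013 data.

```python
SECONDS_PER_DAY = 86_400


def bundle_number(gps_seconds: int) -> int:
    """Return the GPS timestamp in whole millions of seconds.

    For a ten-digit timestamp this is the same as its first four digits.
    (bundle_number is a hypothetical helper, not an e-Lab function.)
    """
    return gps_seconds // 1_000_000


print(bundle_number(1147986000))       # 1147
print(1_000_000 / SECONDS_PER_DAY)     # days covered by one bundle directory
```

Any two frame files whose timestamps give the same result here should land in the same bundle directory.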
Frame file naming conventions
The general standard for naming frame files is described in this project note from LIGO (from 2001 - old, but still accurate as of 2017).
frame files themselves are contained within the bundle directories (e.g.,
). A typical example is
prefix again refers to the site (Livingston) and trend-type (Minute). Files in the other directories will have
, as appropriate.
The long string of digits that follows is a timestamp in GPS time format, which is the number of seconds since midnight (00:00:00 UTC) on 6 January 1980. If you want to know the regular date and time associated with a frame file but for some reason you can't work that out in your head (:/), the LIGO experiment provides a nifty converter.
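If you'd rather do the conversion offline, here is a rough Python sketch. GPS time doesn't include leap seconds, so the sketch subtracts the GPS-minus-UTC offset using a small hand-entered table that only goes back to 1999. The names gps_to_utc and LEAP_OFFSETS are made up for illustration, and the result can be off by a second for timestamps within a few seconds of a leap-second boundary.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)

# (first UTC date the offset applies, GPS-minus-UTC offset in seconds).
# Hand-entered and newest first; extend it if new leap seconds are announced.
LEAP_OFFSETS = [
    (datetime(2017, 1, 1, tzinfo=timezone.utc), 18),
    (datetime(2015, 7, 1, tzinfo=timezone.utc), 17),
    (datetime(2012, 7, 1, tzinfo=timezone.utc), 16),
    (datetime(2009, 1, 1, tzinfo=timezone.utc), 15),
    (datetime(2006, 1, 1, tzinfo=timezone.utc), 14),
    (datetime(1999, 1, 1, tzinfo=timezone.utc), 13),
]


def gps_to_utc(gps_seconds: int) -> datetime:
    """Convert a GPS timestamp to an approximate UTC datetime."""
    uncorrected = GPS_EPOCH + timedelta(seconds=gps_seconds)
    for starts, offset in LEAP_OFFSETS:
        if uncorrected >= starts:
            return uncorrected - timedelta(seconds=offset)
    raise ValueError("timestamp predates the leap-second table")


print(gps_to_utc(1147986000).isoformat())  # 2016-05-22T20:59:43+00:00
```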
The last bit represents the timespan covered by the file's data in seconds. Minute-trend data is sampled over a span of an hour (3600 seconds) before being packaged into the frame, while second-trend data is sampled over a span of ten minutes (600 seconds). You'll notice that the timestamps of sequential frame files are incremented by these values. So, the file
contains minute-trend data from all Livingston sensors taken between 1147986000 and 1147986000 + 3600 = 1147989600 seconds. The next frame would be named
and contain data in the range [1147989600, 1147993200), etc.
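The timestamp arithmetic above can be sketched in a few lines of Python. The generator name frame_intervals is made up for illustration. It only steps a start time forward by the frame duration and knows nothing about the actual files on disk.

```python
from itertools import islice
from typing import Iterator, Tuple


def frame_intervals(first_start: int, duration: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) GPS intervals for back-to-back frames."""
    start = first_start
    while True:
        yield start, start + duration
        start += duration


# Three consecutive minute-trend frames (3600 s each), starting from the example above.
for start, end in islice(frame_intervals(1147986000, 3600), 3):
    print(f"[{start}, {end})")
```

Use 600 instead of 3600 for second-trend frames.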
Even though the file and directory names look arcane, pretty much everything is determined by the combination of observatory site and trend-type. Going back to the example above,
- The trend-type items will always match and will be either (minute-trend - M - M - 3600) or (second-trend - T - T - 600)
- The site items will always match and will be either L or H
- The first GPS timestamp item will always match the first four digits of the second
It looks complex only because there's a lot of redundancy.
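The redundancy makes it easy to sanity-check a frame file name. The Python sketch below ASSUMES names of the form site-trend-GPSstart-duration.gwf (for example L-M-1147986000-3600.gwf). That pattern is my reading of the rules above, not something copied from the server, so compare it with a real file before relying on it. The helper check_frame_name and the bundle_number argument are also made up for illustration.

```python
import re

# ASSUMPTION: frame files are named <site>-<trend>-<GPS start>-<duration>.gwf
FRAME_NAME = re.compile(
    r"^(?P<site>[HL])-(?P<trend>[MT])-(?P<gps>\d+)-(?P<duration>\d+)\.gwf$"
)

# Durations taken from the matching rules above (M = minute-trend, T = second-trend).
EXPECTED_DURATION = {"M": 3600, "T": 600}


def check_frame_name(filename: str, bundle_number: int) -> list:
    """Return a list of inconsistencies; an empty list means the name is consistent."""
    match = FRAME_NAME.match(filename)
    if match is None:
        return [f"{filename!r} does not match the assumed naming pattern"]
    problems = []
    trend = match["trend"]
    duration = int(match["duration"])
    if duration != EXPECTED_DURATION[trend]:
        problems.append(f"trend {trend} should have duration {EXPECTED_DURATION[trend]}, got {duration}")
    if not match["gps"].startswith(str(bundle_number)):
        problems.append(f"timestamp {match['gps']} does not start with {bundle_number}")
    return problems


print(check_frame_name("L-M-1147986000-3600.gwf", 1147))  # []
```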
Each frame file contains information which must be appended to the streams of many different sensors. This is what the
Stream file naming conventions
The data from each individual seismic channel is stored on i2u2-data in the directory
as sets of files called "stream files." Stream filenames are constructed of a succession of labels indicating
The LIGO Channels page details each of these identifiers.
Each sensor's data stream appears as a set of three files within the
directory; for example,
file is the primary data file and will typically be on the order of GB in size. The much smaller
files are auxiliary files that help with the processing and plotting of the main file.
The filenames encode the exact seismic sensor and data channel of the stream, and they correspond closely to the stream names as identified in the e-Lab Analysis Tool. For the example given above,
- L1 indicates the Livingston site
- DQ indicates that this stream is directly from the PEM subsystem and does not have DMT frequency-processing applied to it
- LVEA_VERTEX indicates the vertex station of the observatory, at the Laser and Vacuum Equipment Area
- SEIS_..._X indicates the x-direction accelerometer of the seismic sensor (seismic as opposed to tilt or magnetometer)
- This example has no sampling identifier, because only DMT subsystem streams have frequency sampling.
- I still haven't figured out what CS indicates
The stream file directory
The full contents of the
directory are, in order of
- DataServer.py, the RESTful Python server that delivers requested streams to the e-Lab Analysis Tool. It should always be running, or else the e-Lab can't get data to plot.
- H0 files representing 269 data streams from the Hanford Observatory.
- H1 files representing 230 data streams from the Hanford Observatory.
- ImportData.errors, the error log for the ImportData script that creates the stream files out of the frame files.
- L0 files representing 75 data streams from the Livingston Observatory.
- L1 files representing 190 data streams from the Livingston Observatory.
- ligoimport.files, the log that records which frame files have been imported into their respective sets of stream files.
- nohup.out, the log file to which output from DataServer.py is redirected when it is started using the nohup ("no hangup") command.
- old_ligoimport.files, an old version of
Unlike frame files, which increase in number nightly, the number of stream files is fixed according to the number of seismic sensors at LIGO.
The e-Lab cronjobs on i2u2-data belong to user quarkcat, and you can see them with the command
$ crontab -l -u quarkcat
specifies the user, just as with
directs the output to the terminal) (if you're curious, user-owned cronjobs like this are stored in
, but you shouldn't edit them there. Use the
command). The LIGO-relevant part should look like
#Ligo data import and conversion
0 0 * * * rsync -a --verbose --password-file=/password/folder/.pwligo i2u2data@terra.ligo.caltech.edu::ligo/trend_after23April2013/second-trend/ /disks/i2u2/ligo/data/frames/trend_after23April2013/second-trend > /tmp/second.log 2>&1
0 0 * * * rsync -a --verbose --password-file=/password/folder/.pwligo i2u2data@terra.ligo.caltech.edu::ligo/trend_after23April2013/minute-trend/ /disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend > /tmp/minute.log 2>&1
50 0 * * * /usr/local/ligotools/i2u2tools/bin/ImportData /disks/i2u2/ligo/data/frames/trend_after23April2013 /usr/local/ligotools/ligotools /disks/i2u2/ligo/data/streams > /tmp/convert.log 2>&1
The first two are rsync commands to pull second-trend and minute-trend frame files, respectively, from the Caltech LIGO server terra.ligo.caltech.edu, acting as user i2u2data on that machine. This is done every day at midnight (Eastern time, I assume, since that's where i2u2-data is). The files are written to i2u2-data
in the appropriate subdirectory of
The third command runs the ImportData script every morning at 12:50am, which converts the frame files into stream files that the e-Lab can plot. Note that there are three arguments to ImportData. The first gives the source directory of the files to be converted, the second gives the location of the LIGOtools programs that do the conversion, and the third is the destination directory where the converted stream files are written.
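To make the pieces of that crontab entry explicit, here is a small Python sketch that splits the line into its five schedule fields and picks out the three arguments. The function name split_cron_line and the variable names are made up for illustration. It handles a simple entry like this one and isn't a general crontab parser.

```python
import shlex


def split_cron_line(line):
    """Split a crontab entry into its five schedule fields and the command."""
    minute, hour, day_of_month, month, day_of_week, command = line.split(None, 5)
    schedule = {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
    }
    return schedule, command


# The conversion job, copied from the crontab listing above.
CONVERT_JOB = (
    "50 0 * * * /usr/local/ligotools/i2u2tools/bin/ImportData "
    "/disks/i2u2/ligo/data/frames/trend_after23April2013 "
    "/usr/local/ligotools/ligotools /disks/i2u2/ligo/data/streams "
    "> /tmp/convert.log 2>&1"
)

schedule, command = split_cron_line(CONVERT_JOB)
# The first token is the program; the next three are its arguments, in order.
program, frame_source, ligotools_dir, stream_dest = shlex.split(command)[:4]

print(f"runs at {int(schedule['hour']):02d}:{int(schedule['minute']):02d}")
print("frames read from:  ", frame_source)
print("LIGOtools location:", ligotools_dir)
print("streams written to:", stream_dest)
```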
Note the location of the error logs for these processes:
- /tmp/second.log
- /tmp/minute.log
- /tmp/convert.log
The first two are useful if you think frame files aren't being delivered from Caltech and written to i2u2-data properly. The third is useful if you think the frames aren't being converted to streams properly.
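A quick first check is whether those logs were rewritten recently, since each cronjob overwrites its log every night. The Python sketch below flags any log that is missing or older than a threshold. The log paths come from the crontab redirections above, but the helper stale_logs and the 26-hour threshold are my own choices for illustration. A fresh log only tells you the job ran, not that it succeeded, so you still have to read it.

```python
import os
import time

# Paths taken from the "> /tmp/....log" redirections in the crontab entries.
LOGS = ["/tmp/second.log", "/tmp/minute.log", "/tmp/convert.log"]


def stale_logs(paths, max_age_hours=26.0, now=None):
    """Return (path, age_in_hours) for logs older than max_age_hours.

    A missing or unreadable log is reported with an age of None.
    (stale_logs and the 26-hour default are hypothetical, not e-Lab code.)
    """
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age_hours = (now - os.path.getmtime(path)) / 3600
        except OSError:
            stale.append((path, None))
            continue
        if age_hours > max_age_hours:
            stale.append((path, age_hours))
    return stale


if __name__ == "__main__":
    for path, age in stale_logs(LOGS):
        if age is None:
            print(f"{path}: missing or unreadable")
        else:
            print(f"{path}: last written {age:.1f} hours ago")
```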
-- Main.JoelG - 2016-05-25