LIGO Data Overview

This page describes what the LIGO data flow looks like from QuarkNet's end. This documentation is current as of December 2016, and it supersedes any documentation you find referring to data2.i2u2.org or data4.i2u2.org, or to www13/www18 (the Argonne servers, defunct as of Q1 2016).

If you're troubleshooting a problem, Edit's notes (item 8) and Mihael's "How LIGO Works" summary are helpful.

Data is collected from the two LIGO observatory sites at Hanford and Livingston. The sensors that provide the data operate around-the-clock over spans of years (2003 - present, with a break during the upgrade shutdown from 2011-2013), and one of the main challenges of constructing the LIGO e-Lab is transferring and storing this continuously-flowing stream of data.

Between the two sites, nine seismic sensors continuously generate 189 separate streams of data that are delivered to the ELabs e-Lab. Most of these are then made available to plot within the e-Lab. A bird's-eye overview of the process:

  1. LIGO delivers .gwf frame files every night
    • i2u2-data: /disks/i2u2/ligo/data/frames/
  2. ImportData uses the new frames to update the streams
    • i2u2-data: /disks/i2u2/ligo/data/streams/
  3. DataServer.py serves streams to users via the e-Lab plotter
    • i2u2-data: /disks/i2u2/ligo/data/streams/DataServer.py

The first two are governed by cronjobs that typically require little attention. The third, DataServer.py, requires manual start and restart, but is otherwise low-maintenance. At present (Q1 2017), the e-Lab receives and stores far more data than is used in the e-Lab.

The e-Lab receives no gravitational wave data from LIGO.

Frame Files

LIGO stores its sensor data in the form of frame files ending in the .gwf extension ("gravitational wave frame", I assume). Each .gwf frame file represents a "snapshot" of all data generated by a set of sensors over a relatively short period of time (one hour for minute-trend data, or either one minute or ten minutes for second-trend data). LIGO does a little pre-processing of this frame data before delivering it to i2u2-data nightly.

Frame file directory structure

ELabs writes the frame files that LIGO sends us into a set of directories within /disks/i2u2/ligo/data/frames/ on i2u2-data.

trends

The first division of the frame files within frames/ is into the directories trend/ and trend_after23April2013/. From 2011 to 2013, the LIGO seismic sensors were turned off during general maintenance and upgrades to the experiment, so no data exists from that period. All of the files from before the shutdown (2003 to 2011) are in trend/ - this directory is now static, and its data shouldn't ever change. Files from after the 2013 restart are in trend_after23April2013/, including newly-delivered data.

Within each of trend/ and trend_after23April2013/, the frame files are further subdivided into minute-trend/ and second-trend/ directories according to which type of time-sampling the data uses. The e-Lab uses "minute trend" data; the "second trend" data is higher-resolution and has been used for testing, but no finished products that use it have been rolled out. Nevertheless, we still receive, process and store it in case it's used in the future.

As of Q1 2017, only minute-trend data is plotted in the e-Lab, so if you're troubleshooting a problem with incoming frame data, you'll typically want to go straight to
    i2u2-data:/disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend/
to check on the incoming data. Note that trend_after23April2013/ is owned by quarkcat and has owner-only permissions, so you'll need to $ sudo su before you can cd into it.
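When checking on incoming data, the first question is usually "what's the newest frame we have?" Here is a minimal sketch of that check. The helper latest_frame is hypothetical (it is not part of the ELabs tools), it assumes the frame naming scheme described further down this page, and it needs to run with enough privileges to read the directory.

```python
import os
import re

# Frame names look like L-M-1147986000-3600.gwf:
# site, trend-type, GPS start time, span in seconds.
FRAME_RE = re.compile(r"^[HL]-[MT]-(\d+)-(\d+)\.gwf$")


def latest_frame(root):
    """Return (gps_start, path) of the newest frame under root, or None."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = FRAME_RE.match(name)
            if match is None:
                continue  # skip anything that is not a frame file
            candidate = (int(match.group(1)), os.path.join(dirpath, name))
            if newest is None or candidate > newest:
                newest = candidate
    return newest
```

Pointing it at the minute-trend/ directory above should return the GPS start time and full path of the most recently dated frame, or None if nothing has been delivered.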

sites

Within each of minute-trend/ and second-trend/, the frame files are divided by their origin site: LHO/ for "LIGO Hanford Observatory" or LLO/ for "LIGO Livingston Observatory." In terms of data flow, there's no real difference between the two.

Within each of LHO/ and LLO/ is a set of subdirectories with names like L-M-1147/, indexed by the site (L for "Livingston" or H for "Hanford") and trend-type (M for "minute" and T for "second", for some reason), as well as a 4-digit number. The number is the first 4 digits of the timestamp of all the frame files within it, representing millions of seconds (1 million seconds is a little over 11 1/2 days). The frame files are bundled into these subdirectories to keep them organized. For example, we have the directory
/disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend/LLO/L-M-1147/

containing all minute-trend frame files from the Livingston observatory for the 11.5-day period where all timestamps begin with 1147. New directories are created automatically whenever the fourth digit rolls over.
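The bundle directory name can be worked out from a frame's timestamp. The sketch below assumes the 10-digit GPS timestamps seen in the post-2013 data, where dropping the last six digits leaves the 4-digit bundle number. The function name bundle_dir is made up for illustration.

```python
SECONDS_PER_BUNDLE = 1_000_000  # one bundle directory spans a million seconds


def bundle_dir(site, trend, gps_start):
    """Build a bundle directory name such as 'L-M-1147'.

    site is 'H' or 'L'; trend is 'M' (minute) or 'T' (second);
    gps_start is the frame's GPS start time in seconds.
    """
    if site not in ("H", "L") or trend not in ("M", "T"):
        raise ValueError("unexpected site or trend-type code")
    # Integer division keeps only the leading digits of the timestamp.
    return "{}-{}-{}".format(site, trend, gps_start // SECONDS_PER_BUNDLE)


# Length of one bundle in days: a little over 11 1/2.
DAYS_PER_BUNDLE = SECONDS_PER_BUNDLE / 86400.0
```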

Frame file naming conventions

The general standard for naming frame files is described in this project note from LIGO (from 2001 - old, but still accurate as of 2017).

The .gwf frame files themselves are contained within the bundle directories (e.g., L-M-1147/). A typical example is

/disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend/LLO/L-M-1147/L-M-1147986000-3600.gwf

The L-M- prefix again refers to the site (Livingston) and trend-type (Minute). Files in the other directories will have L-T-, H-M- or H-T-, as appropriate.

The long string of digits that follows is a timestamp in GPS time format, which is the number of seconds since midnight (UTC) on 6 January 1980. If you want to know the regular date and time associated with a frame file but for some reason you can't work that out in your head (:/), the LIGO experiment provides a nifty converter for you.
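If the converter isn't handy, the arithmetic is simple enough to sketch. GPS time does not include leap seconds, so it runs ahead of UTC by a small offset; in this sketch the offset is a parameter you supply yourself, not something looked up. I believe it was 17 seconds in mid-2016, but verify that for the date you care about. The name gps_to_utc is just for illustration.

```python
from datetime import datetime, timedelta, timezone

# GPS time counts seconds from this instant.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def gps_to_utc(gps_seconds, leap_seconds=0):
    """Convert GPS seconds to a UTC datetime.

    leap_seconds is the GPS-minus-UTC offset for the date in question.
    It is a caller-supplied assumption, not taken from any table.
    """
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)
```

For the example frame, gps_to_utc(1147986000) lands on the evening of 22 May 2016; passing the leap-second offset shifts the result earlier by that many seconds.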

The last bit represents the timespan covered by the file's data in seconds. Minute-trend data is sampled over a span of an hour (3600 seconds) before being packaged into the frame, while second-trend data is sampled over a span of ten minutes (600 seconds). You'll notice that the timestamps of sequential frame files are incremented by these values. So, the file L-M-1147986000-3600.gwf contains minute-trend data from all Livingston sensors taken in the range [1147986000, 1147989600), since 1147986000 + 3600 = 1147989600. The next frame would be named L-M-1147989600-3600.gwf and contain data in the range [1147989600, 1147993200), etc.
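Because each frame should start exactly where the previous one ended, missing frames are easy to spot from the filenames alone. A minimal sketch, assuming you feed it names from a single site and trend-type; find_gaps is a hypothetical helper.

```python
import re

FRAME_RE = re.compile(r"^([HL])-([MT])-(\d+)-(\d+)\.gwf$")


def find_gaps(filenames):
    """Return (expected_start, actual_start) pairs where the sequence breaks.

    A pair is reported whenever a frame does not begin where the previous
    one ended, whether that is a gap or an overlap.
    """
    frames = []
    for name in filenames:
        match = FRAME_RE.match(name)
        if match:
            frames.append((int(match.group(3)), int(match.group(4))))
    frames.sort()
    breaks = []
    for (start, span), (next_start, _next_span) in zip(frames, frames[1:]):
        expected = start + span  # where the next frame should begin
        if next_start != expected:
            breaks.append((expected, next_start))
    return breaks
```

An empty list means the frames are contiguous; otherwise each pair tells you where data stops and where it picks up again.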

Even though the file and directory names look arcane, pretty much everything is determined by the combination of observatory site and trend-type. Going back to the example above,

/disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend/LLO/L-M-1147/L-M-1147986000-3600.gwf
  • The trend-type items will always match and will be either (minute-trend - M - M - 3600) or (second-trend - T - T - 600)
  • The site items will always match and will be either L or H
  • The first GPS timestamp item will always match the first four digits of the second
It looks complex only because there's a lot of redundancy.
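That redundancy can be turned into a sanity check on a path. The sketch below encodes the three rules just listed; the regular expression and the name check_frame_path are my own, and the 600-second span for second-trend follows the rule above (adjust it if you find frames with other spans).

```python
import re

# trend-type code -> (directory name, span in seconds), per the rules above
TREND = {"M": ("minute-trend", 3600), "T": ("second-trend", 600)}
SITE = {"H": "LHO", "L": "LLO"}

PATH_RE = re.compile(
    r"/(?P<trend_dir>[a-z]+-trend)/(?P<site_dir>LHO|LLO)"
    r"/(?P<bsite>[HL])-(?P<btrend>[MT])-(?P<bnum>\d+)"
    r"/(?P<site>[HL])-(?P<trend>[MT])-(?P<gps>\d+)-(?P<span>\d+)\.gwf$"
)


def check_frame_path(path):
    """Return a list of inconsistencies in a frame file path (empty if none)."""
    m = PATH_RE.search(path)
    if m is None:
        return ["path does not match the expected layout"]
    problems = []
    trend_dir, span = TREND[m["trend"]]
    if m["trend_dir"] != trend_dir or m["btrend"] != m["trend"]:
        problems.append("trend-type items disagree")
    if int(m["span"]) != span:
        problems.append("unexpected span for this trend-type")
    if m["site_dir"] != SITE[m["site"]] or m["bsite"] != m["site"]:
        problems.append("site items disagree")
    if not m["gps"].startswith(m["bnum"]):
        problems.append("bundle number does not prefix the GPS timestamp")
    return problems
```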

Each frame file contains information which must be appended to the streams of many different sensors. This is what the ImportData script does.

Stream Files

Stream file naming conventions

The data from each individual seismic channel is stored on i2u2-data in the directory /disks/i2u2/ligo/data/streams/ as sets of files called "stream files." Stream filenames are constructed of a succession of labels indicating

site - subsystem - station - sensor - sampling

The LIGO Channels page details each of these identifiers.

Each sensor's data stream appears as a set of three files within the streams/ directory; for example,

L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ.bin
L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ.index.bin
L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ.info

The regular .bin file is the primary data file and will typically be on the order of GB in size. The much smaller .index.bin and .info files are auxiliary files that help with the processing and plotting of the main file.
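Since every stream should come as a set of three files, one quick health check is to look for streams that are missing a member. A minimal sketch, working from a plain list of filenames (for example, the output of os.listdir on streams/); incomplete_streams is a hypothetical helper.

```python
# Longest suffix first, so ".index.bin" is not mistaken for ".bin".
SUFFIXES = (".index.bin", ".bin", ".info")


def incomplete_streams(filenames):
    """Map each stream name to the suffixes it lacks; complete streams are omitted."""
    found = {}
    for name in filenames:
        for suffix in SUFFIXES:
            if name.endswith(suffix):
                found.setdefault(name[: -len(suffix)], set()).add(suffix)
                break  # a file belongs to exactly one suffix
    return {
        stream: sorted(set(SUFFIXES) - present)
        for stream, present in found.items()
        if present != set(SUFFIXES)
    }
```

Files that aren't stream files (scripts, logs) match none of the suffixes and are simply ignored.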

The filenames encode the exact seismic sensor and data channel of the stream, and they correspond closely to the stream names as identified in the e-Lab Analysis Tool. For the example given above,

L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ

  • L1 indicates the Livingston site
  • DQ indicates that this stream is directly from the PEM subsystem and does not have DMT frequency-processing applied to it
  • LVEA_VERTEX indicates the vertex station of the observatory, at the Laser and Vacuum Equipment Area
  • SEIS_..._X indicates the x-direction accelerometer of the seismic sensor (seismic as opposed to tilt or magnetometer)
  • This example has no sampling identifier, because only DMT subsystem streams have frequency sampling.
  • I still haven't figured out what CS indicates
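To pull those labels apart programmatically, something like the following works for names shaped like the example. The splitting rule (colon after the site, hyphen after the subsystem, underscores between the rest) is inferred from this one example and is an assumption, as is the function name.

```python
def split_stream_name(name):
    """Split a stream name into its site, subsystem and remaining labels."""
    site, _, rest = name.partition(":")
    subsystem, _, labels = rest.partition("-")
    return {"site": site, "subsystem": subsystem, "labels": labels.split("_")}
```

For L1:PEM-CS_SEIS_LVEA_VERTEX_X_DQ this gives site L1, subsystem PEM, and the labels CS, SEIS, LVEA, VERTEX, X, DQ.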

The stream file directory

The full contents of the /disks/i2u2/ligo/data/streams/ directory are, in order of $ ls,
  • DataServer.py, the RESTful Python server that delivers requested streams to the e-Lab Analysis Tool. It should always be running, or else the e-Lab can't get data to plot.
  • 807 H0 files representing 269 data streams from the Hanford Observatory.
  • 690 H1 files representing 230 data streams from the Hanford Observatory.
  • ImportData.errors, the error log for the ImportData script that creates the stream files out of the frame files.
  • 225 L0 files representing 75 data streams from the Livingston Observatory.
  • 570 L1 files representing 190 data streams from the Livingston Observatory.
  • ligoimport.files, the log that records which frame files have been imported into their respective sets of stream files.
  • nohup.out, the log file to which output from DataServer.py is redirected when it is started using the nohup ("no hangup") command.
  • old_ligoimport.files, an old version of ligoimport.files.

Unlike frame files, which increase in number nightly, the number of stream files is fixed according to the number of seismic sensors at LIGO.
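The stream counts above are just the file counts divided by three. If you want to re-check them against the directory, here is a sketch that counts one marker file per stream, grouped by prefix. It takes a plain list of filenames, and streams_per_prefix is a made-up name.

```python
from collections import Counter


def streams_per_prefix(filenames):
    """Count streams per prefix (H0, H1, L0, L1), using each .info file as the marker."""
    return Counter(
        name.split(":", 1)[0]
        for name in filenames
        if ":" in name and name.endswith(".info")
    )
```

If the totals ever change from one day to the next, that would suggest a stream was added or lost.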

Cronjobs

The e-Lab cronjobs on i2u2-data belong to user quarkcat, and you can see them with the command
$ crontab -l -u quarkcat
(-u specifies the user, just as with sudo, and -l lists that user's crontab on the terminal). If you're curious, user-owned cronjobs like this are stored in /var/spool/cron/crontabs, but you shouldn't edit them there; use the crontab command instead. The LIGO-relevant part should look like
  #Ligo data import and conversion
   0 0 * * * rsync -a --verbose --password-file=/password/folder/.pwligo i2u2data@terra.ligo.caltech.edu::ligo/trend_after23April2013/second-trend/ /disks/i2u2/ligo/data/frames/trend_after23April2013/second-trend > /tmp/second.log 2>&1
   0 0 * * * rsync -a --verbose --password-file=/password/folder/.pwligo i2u2data@terra.ligo.caltech.edu::ligo/trend_after23April2013/minute-trend/ /disks/i2u2/ligo/data/frames/trend_after23April2013/minute-trend > /tmp/minute.log 2>&1
  50 0 * * * /usr/local/ligotools/i2u2tools/bin/ImportData /disks/i2u2/ligo/data/frames/trend_after23April2013 /usr/local/ligotools/ligotools /disks/i2u2/ligo/data/streams > /tmp/convert.log 2>&1

The first two are rsync commands to pull second-trend and minute-trend frame files, respectively, from the Caltech LIGO server terra.ligo.caltech.edu, acting as user i2u2data on that machine. This is done every day at midnight (Eastern time, I assume, since that's where i2u2-data is). The files are written to i2u2-data in the appropriate subdirectory of /disks/i2u2/ligo/data/frames/trend_after23April2013/.

The third command runs the ImportData script every morning at 12:50am, which converts the frame files into stream files that the e-Lab can plot. Note that there are three arguments to ImportData. The first gives the source directory of the files to be converted, the second gives the location of the LIGOtools programs that do the conversion, and the third is the destination directory where the converted stream files are written.
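If it helps to see how one of these crontab entries breaks down, here is a small sketch that splits a line into its five schedule fields and the command's arguments. It handles only simple entries like the ones above (one command, output redirected with ">"), and parse_cron_line is a hypothetical helper.

```python
import shlex


def parse_cron_line(line):
    """Split a crontab entry into its schedule fields and the command's argument list."""
    fields = line.split(None, 5)
    if len(fields) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    minute, hour, day_of_month, month, day_of_week, command = fields
    # Drop the shell redirection so only the program and its arguments remain.
    argv = shlex.split(command.split(">", 1)[0])
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
        "argv": argv,
    }
```

For the ImportData entry, minute is 50 and hour is 0 (so 12:50am), and argv holds the script followed by its three arguments: source, LIGOtools location and destination.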

Note the location of the logs for these processes (each one captures both regular output and errors):
  • /tmp/second.log
  • /tmp/minute.log
  • /tmp/convert.log
The first two are useful if you think frame files aren't being delivered from Caltech and written to i2u2-data properly. The third is useful if you think the frames aren't being converted to streams properly.
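Since these jobs run nightly, a log that hasn't been touched in well over a day is a hint that the cronjob itself didn't run. That rule of thumb is my own, not something the jobs enforce. A minimal sketch that reports how old each log is; log_ages is a hypothetical helper.

```python
import os
import time

LOGS = ("/tmp/second.log", "/tmp/minute.log", "/tmp/convert.log")


def log_ages(paths=LOGS, now=None):
    """Return {path: age in hours since last modification, or None if unreadable}."""
    now = time.time() if now is None else now
    ages = {}
    for path in paths:
        try:
            ages[path] = (now - os.path.getmtime(path)) / 3600.0
        except OSError:
            ages[path] = None  # missing or unreadable log
    return ages
```

A value of None means the log doesn't exist (or can't be read), which is itself worth investigating.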

-- Main.JoelG - 2016-05-25

Topic attachments
Attachment: ligo_i2u2_channelnames_121814.docx (14 K, 2016-06-09 13:43, Main.JoelG). Complete list of LIGO channels delivered to QuarkNet.
Topic revision: r13 - 2019-05-22, AdminUser