LIGO Data and Dataflow
By "dataflow" I generally mean the flow of data from the source (the LIGO sensors) to the Analysis Tool. So think of it as everything "upstream" of the Analysis Tool. In contrast, I will use "workflow" to describe the transformations of data performed within the Analysis Tool (or out on the Grid, when the Analysis Tool can do that). I don't know whether this terminology is standard, but it seems useful to have two different words to make the distinction here.
- Frame Files: LIGO data are recorded at periodic intervals into "frame files" (in the "IGWD frame format"). The data from a particular sensor are called a channel. A library of C routines from the Virgo collaboration, called FrameL, is available to read these files. A utility program called FrCopy makes it easy to extract selected channels from one or more input streams and write them to new frame files. A separate Virgo library, called Frv, allows one to work with vectors of frame objects, and a collection of ROOT container classes and functions allows one to manipulate data extracted from a frame file.
Between the LIGO S5 and S6 runs, over the summer of 2009, the frame file format was changed from "version 6" to "version 8". The newest version of the FrameL library is required to read frame files in the newer format.
- Trends: LIGO data can be classified as either "raw", meaning recorded at the full sampling rate of the sensor (which varies from sensor to sensor), or "trended". Trend data are statistical summaries over a given time interval. LIGO generates both second-trend and minute-trend data. For student use, especially for longer analyses, we would also like 10-minute and 1-hour trends, but Eric will have to write some code to produce these. To get the project started we are using only minute-trend data.
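To make "trending" concrete, here is a minimal sketch of how a trend record summarizes a block of raw samples. The statistics shown (mean, min, max, rms, n) are the ones LIGO trend channels carry; the function itself and its dict output are just illustrative.

```python
from math import sqrt

def trend(samples, width):
    """Summarize `samples` in consecutive blocks of `width` values.

    Each block yields the statistics that LIGO trend channels carry
    (in real frames these appear as separate channels, e.g.
    CHANNEL.mean, CHANNEL.min); here we just return dicts.
    """
    out = []
    for i in range(0, len(samples) - width + 1, width):
        block = samples[i:i + width]
        out.append({
            "mean": sum(block) / width,
            "min": min(block),
            "max": max(block),
            "rms": sqrt(sum(x * x for x in block) / width),
            "n": width,
        })
    return out

# 120 seconds of second-trend means -> two minute-trend records
second_means = [float(i % 4) for i in range(120)]
minute = trend(second_means, 60)
```

Note that a 10-minute trend built from minute trends must combine the per-block statistics (min of mins, max of maxes, rms from the mean of the squared rms values), not simply re-trend the means.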
- Data Monitoring Tool (DMT): The Data Monitoring Tool (DMT) is a system of "monitors" (background daemon processes) which run on Sun workstations in the LIGO control rooms; DMT is also the software framework used to create these monitors. The DMT seismic monitor takes live raw data as input, filters it into frequency bands, and displays the past 12 hours on the control-room projector. In addition, it writes the band-filtered data as minute-trends to frame files.
- BLRMS: We can thus make easy use of the existing DMT seismic monitors by taking their "BLRMS" (band-limited RMS) output frame files, so we don't have to write our own bandwidth-filtering code. These are minute-trends. We can also get the unfiltered PEM minute-trends from a separate source.
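For reference, "band-limited RMS" means the RMS of just those Fourier components inside a frequency band. A minimal sketch (a plain O(N^2) DFT, nothing like the DMT's actual filter implementation, and the band edges are toy values):

```python
from math import cos, sin, pi, sqrt

def blrms(samples, rate, f_lo, f_hi):
    """Band-limited RMS of `samples` (sampled at `rate` Hz): the RMS of
    just the Fourier components with f_lo <= f < f_hi, via Parseval's
    theorem. O(N^2) DFT -- fine for a sketch, not for production."""
    n = len(samples)
    power = 0.0
    for k in range(n // 2 + 1):
        f = k * rate / n
        if not (f_lo <= f < f_hi):
            continue
        re = sum(samples[j] * cos(2 * pi * k * j / n) for j in range(n))
        im = -sum(samples[j] * sin(2 * pi * k * j / n) for j in range(n))
        # one-sided spectrum: count +k and -k, except DC and Nyquist
        w = 1 if k in (0, n // 2) else 2
        power += w * (re * re + im * im) / (n * n)
    return sqrt(power)

rate = 256                                             # Hz, toy value
x = [sin(2 * pi * 8 * t / rate) for t in range(rate)]  # pure 8 Hz tone
in_band = blrms(x, rate, 4, 16)    # ~0.707, the full RMS of a unit sine
out_band = blrms(x, rate, 30, 60)  # ~0: the tone lies outside this band
```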
- New Minute-trends every hour: New minute-trend frame files are available about every hour, though they take some time to process. DMT frames come from the DMT monitor, while the unfiltered PEM frames come from the Frame Builder. The Frame Builder gives higher priority to recording raw data than to trends. But eventually new frame files show up from both sources.
- Transfer to Caltech: Frame files produced at Hanford and Livingston are transferred regularly to Caltech. The channels used by ELabs are included in frame files that also contain real GW data, so we need to extract just the ELabs channels from these into new frame files.
- ELabs RDS: A script to extract channels into an I2U2 "Reduced Data Set" (RDS) runs on terra.ligo.caltech.edu. It is essentially a wrapper which gets the directory structure right and then invokes FrCopy. At present it runs once an hour during the day and once per night overnight, just to test both methods; we will likely switch to running it every hour, around the clock.
- RDS Sync: A script to copy (currently via rsync) RDS frame files from data2 is now being put in place....
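Since frame files are write-once, the sync step can be a one-way mirror that skips anything already copied. A sketch of building such an rsync invocation (the paths are hypothetical; only the host name data2 comes from the notes above):

```python
def rsync_cmd(src_host, src_dir, dest_dir):
    """Mirror new RDS frames: -a preserves times and permissions,
    --ignore-existing skips frames already copied (frames never change
    once written, so there is no need to re-compare them)."""
    return ["rsync", "-a", "--ignore-existing",
            "%s:%s/" % (src_host, src_dir), dest_dir]

cmd = rsync_cmd("data2", "/rds/H", "/local/rds/H")  # paths hypothetical
# subprocess.check_call(cmd) would perform the copy
```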
- Second-trends: The ability to deal with second-trends has just been added to the Analysis Tool, but only in the "Test" release and only for "advanced" users. Raw data frames are "almost there" but not yet fully tested or debugged.
- Longer trends: We need to write a program to produce 10-minute and 1-hour trends. The "decimation" functionality of the FrCopy program is not trending and will not work for this. This is a higher priority than getting "raw" data. The program would be run by cron, separate from DMT, on a machine at Argonne.
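The distinction matters because decimation and trending give very different summaries. A toy sketch of why decimation cannot substitute for trending:

```python
def decimate(samples, n):
    """Decimation in the FrCopy sense: keep every nth sample."""
    return samples[::n]

def trend_mean(samples, n):
    """Trending: the average of each block of n samples."""
    return [sum(samples[i:i + n]) / n for i in range(0, len(samples), n)]

# A fast oscillation whose true average is zero:
x = [1.0 if i % 2 == 0 else -1.0 for i in range(60)]
dec = decimate(x, 10)     # [1.0]*6 -- aliased, a misleading summary
trd = trend_mean(x, 10)   # [0.0]*6 -- the true 10-sample averages
```

Decimation aliases any signal content above the new Nyquist frequency into the output, whereas a trend genuinely summarizes every sample in the interval.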
- LDR: LIGO uses LDR (the Lightweight Data Replicator) to serve GW data out for analysis. It might seem reasonable to use LDR to make data available for running analyses on Grid nodes, but Greg Mendell suggests otherwise: LDR carries enough overhead that simply serving the data via rsync or http from Argonne, with standardized URL paths, should be sufficient.
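"Standardized URL paths" just means a client can compute the URL of any frame file from its site, type, GPS start, and duration. One possible scheme, entirely hypothetical (the base URL and layout would be whatever the Argonne sync script actually produces; only the site-type-gps-duration filename convention is standard LIGO practice):

```python
def frame_url(base, site, frame_type, gps, dur):
    """Hypothetical scheme:
    <base>/<site>/<type>/<gps first 5 digits>/<site>-<type>-<gps>-<dur>.gwf
    so clients can construct URLs without a catalog lookup."""
    name = "%s-%s-%d-%d.gwf" % (site, frame_type, gps, dur)
    return "/".join([base, site, frame_type, str(gps)[:5], name])

url = frame_url("http://data.example.org/ligo",  # placeholder base URL
                "H", "RDS", 968400000, 3600)
```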