Fermilab TARGET 2016
For discussion of using ELabs Cosmic Ray e-Lab data for the 2016 Fermilab TARGET program, to be held from 5 July to 12 August.
Our primary interest is developing a learning project in which students analyze Cosmic e-Lab data using Python tools of their own design.
- The timeline makes a project in which students interact directly with the e-Lab code infeasible. The Cosmic e-Lab is not designed to let users access data outside of the histogram plots. Though we'd like to allow greater access to the data - in particular, Tom's mention of a HiSPARC/Sapphire-type API - such a feature cannot be implemented before the start of the TARGET program.
- On the other hand, we can easily create a package of data and make it available to the TARGET program for students to analyze. Raw data files are probably not appropriate for this, since interpreting them requires extensive calibration and knowledge of individual detector parameters. Threshold files accompanied by instructive documentation may be better, or the files that are generated during an analysis.
- It would probably be best if the project allowed the students to contribute to the QuarkNet project in a concrete way, in order to increase their sense of motivation and accomplishment.
- The potential HiSPARC/Sapphire-type API that QuarkNet (Tom) has in mind would use Python-based Jupyter notebooks; it's possible that the students could investigate some basic functions of this idea.
- QuarkNet (Edit and Mark) has completed a good chunk of code for a "Flux as a function of barometric pressure" analysis. The students might work on improving or completing the existing code in some way (though note that this code is not in Python).
- If the students are able to develop a novel analysis of the dataset we provide, QuarkNet might be able to integrate it into the e-Lab (though note that integration with the live www.i2u2.org site would not be possible before the end of the program in August).
- Most of the Perl tools should have a description of what they do in their headers, and they should also print usage information if invoked without any arguments.
- The data files produced by most tools are simple CSV and the meanings of the columns are usually described in a comment in the beginning of each file.
The raw files record events from the acquisition board: in particular, the GPS time and the DAQ CPU clock tick when the amplified electrical signal coming from the photomultipliers exceeds some pre-set value (the threshold), and the time when it falls back below that value. The required time resolution is on the order of nanoseconds, which the GPS time alone cannot provide. This is why the DAQ CPU clock is used to interpolate between GPS time ticks.
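That interpolation scheme can be sketched as follows. The function name and the clock rate below are hypothetical illustrations, not values taken from the DAQ firmware:

```python
# Sketch of the timing scheme described above: the GPS stamps whole
# seconds (1PPS), and the DAQ's free-running CPU clock counter is used
# to interpolate within a second. CLOCK_HZ is an assumed rate; the real
# DAQ clock frequency may differ.

CLOCK_HZ = 41_666_667  # assumed CPU clock rate in Hz

def event_time(gps_seconds, clock_at_event, clock_at_1pps, clock_hz=CLOCK_HZ):
    """Interpolate the event time between GPS 1PPS ticks.

    gps_seconds    -- GPS time of the last 1PPS pulse, in whole seconds
    clock_at_event -- CPU clock count latched when the event fired
    clock_at_1pps  -- CPU clock count latched at the last 1PPS pulse
    """
    ticks_since_pps = clock_at_event - clock_at_1pps
    return gps_seconds + ticks_since_pps / clock_hz
```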
There is usually a naming convention for these (and, as much as possible, various derived files down the workflow will share portions of the name):
This file was produced by detector 180 and it was uploaded on 11/01/2006. It was the fourth file uploaded that day from detector 180 (the index is 0-based). [I think this indexing is true of the threshold files, but the raw data files seem to have something different going on - Joel]
These files are stored on i2u2-data
in the directory
. Raw files are stored in compressed format (
) and many (though not all) are accompanied by a meta file:
Each raw data file consists of a readable preamble describing the detector and its parameters, followed by event data itself in a format like
0000000A 80 00 3C 00 00 00 00 00 00000000 143121.022 110113 A 07 A +0056
0000000B 00 00 00 23 00 00 00 00 00000000 143121.022 110113 A 07 A +0056
00000010 80 00 3E 00 00 00 00 00 00000000 143121.022 110113 A 07 A +0056
00000011 00 00 00 24 00 00 00 00 00000000 143121.022 110113 A 07 A +0056
This data format is explained in the e-Lab
. Some terms used in that page:
- CPLD - Complex Programmable Logic Device, a "fast logic" chip on the DAQ that provides a precise clock
- 1PPS - One Pulse Per Second, a standard clock provided by GPS devices
- TMC - Time Measurement Chips, a specific type of time-to-digital converter (TDC) on the DAQ
- uC - The DAQ's "slow logic" microcontroller that forms the interface between the CPLD (and thence the rest of the DAQ) and the external GPS unit and the PC
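Given the sample lines and the glossary above, a raw event line can be split into fields roughly as follows. The field names and meanings here are assumptions inferred from the sample; the e-Lab documentation page remains the authoritative reference:

```python
# A hedged sketch of splitting one raw event line into its 16
# whitespace-separated fields. Field interpretations are assumptions
# based on the sample lines above, not a verified specification.

def parse_raw_line(line):
    f = line.split()
    return {
        "trigger_count": int(f[0], 16),   # CPLD clock word (hex, assumed)
        "tmc_words": f[1:9],              # TMC edge data for the channels
        "one_pps_count": int(f[9], 16),   # CPLD count at last 1PPS (assumed)
        "gps_time": f[10],                # hhmmss.mmm
        "gps_date": f[11],                # date stamp (format assumed ddmmyy)
        "flags": (f[12], f[14]),          # status flags (meaning assumed)
        "satellites": int(f[13]),         # visible GPS satellites (assumed)
        "time_correction": f[15],         # e.g. '+0056'
    }

sample = ("0000000A 80 00 3C 00 00 00 00 00 00000000 "
          "143121.022 110113 A 07 A +0056")
```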
The thresholded files contain, roughly, a date, the time when the signal went over the threshold, the time when it fell back below it, and the number of nanoseconds it stayed above. That last value, the time over threshold, is a rough measure of the energy of the muon that hit the scintillator.
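A minimal sketch of pulling the time-over-threshold values out of such a file, assuming the column layout implied above; the column position is an assumption, so check the comment header of an actual .thresh file for the real layout:

```python
# Collect time-over-threshold values from thresholded-file lines,
# skipping blank lines and '#' comments. The column index is an
# assumption for illustration; real .thresh files document their
# columns in a comment at the top.

def read_tot(lines, tot_column=4):
    """Return the time-over-threshold values (ns) as floats."""
    tots = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        tots.append(float(fields[tot_column]))
    return tots
```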
Every detector has its own directory within
in which its threshold files are stored. For detector 6511, for example:
i2u2-data:/disks/i2u2/cosmic/data/6511$ ls -l
-rw-r--r-- 1 quarkcat 51 66796679 Jan 11 2012 6511.2012.0111.1
-rw-r--r-- 1 quarkcat 51 53588430 Jun 25 2015 6511.2012.0111.1.thresh
-rw-r--r-- 1 quarkcat 51 181714447 Jan 23 2012 6511.2012.0120.0
-rw-r--r-- 1 quarkcat 51 118050701 Dec 29 15:19 6511.2012.0120.0.thresh
-rw-r--r-- 1 quarkcat 51 1000 Mar 29 03:58 6511.geo
Note that the files come in pairs and that the smaller
files are created or modified long after the larger no-extension files (at least for this example). Plus the
geometry file (below) and potentially some
These files describe the physical properties of each detector. They are named as follows:
The detector ID also appears in the raw data file names (see above).
The structure of a geometry file is this (X+ means at least one X, but possibly more):
geometryFile = geometryEntry+
latitude = degrees '.' arc-minutes '.' arc-seconds
longitude = degrees '.' arc-minutes '.' arc-seconds
channeldata = x y z area cableLength
The latitude and longitude are assumed to be N and W, respectively, with negative angles for S/E.
There can be multiple entries. Each such entry starts with a validity date. The last entry represents (in theory) the current configuration.
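The angle fields can be decoded per the grammar above; this sketch assumes the arc-seconds part may carry a decimal fraction and uses the sign convention stated above (N/W positive, with the sign carried on the degrees part):

```python
# Decode a 'degrees.arc-minutes.arc-seconds' field from a geometry file
# into decimal degrees. Only the angle decoding is shown; the layout of
# the rest of the geometry entry is not fully specified here.

def decode_angle(field):
    """Turn 'deg.min.sec' into decimal degrees.

    A leading '-' on the degrees part flips the whole angle (S or E).
    maxsplit=2 tolerates a fractional arc-seconds part like '30.5'.
    """
    deg, minutes, seconds = field.split(".", 2)
    sign = -1.0 if deg.startswith("-") else 1.0
    return float(deg) + sign * (float(minutes) / 60 + float(seconds) / 3600)
```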
Stacked detectors have their scintillators placed directly on top of each other. This configuration is suitable for coincidence detection and muon lifetime measurements. Non-stacked configurations are better for flux measurements, since the total surface area is larger, but reliable noise management is difficult: it is generally hard to ensure that the sources of noise contribute constantly over time.
The channel configuration is the physical configuration of the 4 scintillator paddles. The coordinates are self-explanatory; I'm not sure of their units, but inches would be my best guess.
The cable lengths are used to correct timing delays caused by propagation of the signal from the paddles to the DAQ board.
There may or may not be direct access to the geometry files through the I2U2 site. However, the information should still be there in one form or another.
A sample analysis (flux)
The Swift programs referred to here are found in the repository under
. Each form of analysis available to the e-Lab has its own program; the one for Flux Analysis is
Perl scripts used by these programs are found in
The flux analysis produces the number of muon hits per unit area per unit time.
The Swift program starts from thresholdAll
, the already thresholded files (the i2u2 software automatically runs the thresholding tool when files are uploaded).
The steps are:
- Wire Delay
wireDelayData = WireDelayMultiple(thresholdAll, geoDir, geoFiles, detectors, firmwares)
The detectors and firmwares arguments are ignored. The call applies WireDelay to all input files. The relevant bit is at the beginning:
WireDelay @filename(thresholdData) @filename(wireDelayData) @filename(geoDir)
That translates into the following command:
WireDelay.pl <inputFile.threshold> wireDelayed.dat <geometryDirectory>
WireDelay looks at the detector data (see above) and corrects for cable signal propagation delays.
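The correction itself can be sketched in Python. The propagation speed below is a typical coax figure and an assumption on my part, not necessarily the constant WireDelay.pl uses; it also assumes the geometry file's cable length is in meters:

```python
# Sketch of a cable-delay correction: the signal takes a finite time to
# travel down the cable from the paddle to the DAQ, so measured edge
# times must be shifted back by that delay. The 2/3 c propagation speed
# is an assumed typical coax value, not taken from WireDelay.pl.

C_M_PER_NS = 0.299792458                 # speed of light, meters per nanosecond
SIGNAL_SPEED = 2.0 / 3.0 * C_M_PER_NS    # assumed speed in the cable

def wire_delay_ns(cable_length_m):
    """Time the signal spends in the cable, in nanoseconds."""
    return cable_length_m / SIGNAL_SPEED

def correct_edge_time(edge_time_ns, cable_length_m):
    """Shift an edge time back to when the muon actually hit the paddle."""
    return edge_time_ns - wire_delay_ns(cable_length_m)
```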
- Combine
combineOut = Combine(wireDelayData)
which translates to:
Combine.pl wireDelayed1.dat wireDelayed2.dat ... combined.dat
If the analysis started with multiple files, this concatenates them all into a single file. This step does (almost) nothing if the analysis is done on a single file.
- Single Channel
SingleChannel.pl combined.dat singleChannel.dat 1
This extracts the data for a single channel (channel 1 in this example). That seems to contradict the idea of maximizing the detection area by using all channels in a non-stacked configuration, and it is also somewhat inefficient to run all the previous steps on every channel only to discard most of the results.
- Sort
Sort.pl singleChannel.dat sorted.dat 2 3
This sorts the events in singleChannel.dat on the 2nd and 3rd columns. These are the default sort columns for a flux study and correspond to the day and the time of the beginning of each event. Sadly, I had to look at the JSP code to figure out which columns were being used.
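The same two-column sort is easy to reproduce in Python; this is a sketch that takes Sort.pl's 1-based column numbers and assumes whitespace-separated numeric columns:

```python
# Sort rows of whitespace-separated data on two 1-based columns,
# numerically, mirroring the Sort.pl invocation above. Python's sorted()
# is stable, so ties on the first column fall back to the second.

def sort_rows(rows, col_a=2, col_b=3):
    """Return rows sorted on columns col_a then col_b (1-based)."""
    def key(row):
        f = row.split()
        return (float(f[col_a - 1]), float(f[col_b - 1]))
    return sorted(rows, key=key)
```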
- Flux
Flux.pl sorted.dat flux.dat <binWidth> <geometryDirectory>
The main step: it calculates the number of hits in each <binWidth> period (default 600 s) and the statistical error on the counts.
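The binning and error calculation can be sketched as follows. The function and variable names are hypothetical, the sqrt(N) counting error is the standard Poisson estimate rather than a detail confirmed from Flux.pl, and the flux unit depends on whatever area unit the geometry file uses:

```python
import math

# Sketch of a flux calculation: count events in fixed-width time bins,
# take sqrt(N) as the statistical (Poisson) error on each count, and
# divide both by detector area and bin width to get a rate per unit
# area per unit time.

def flux_bins(event_times, area, bin_width=600.0):
    """Return (bin_start, flux, flux_error) triples from sorted event times."""
    if not event_times:
        return []
    t0 = event_times[0]
    counts = {}
    for t in event_times:
        b = int((t - t0) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    out = []
    norm = area * bin_width
    for b in sorted(counts):
        n = counts[b]
        out.append((t0 + b * bin_width, n / norm, math.sqrt(n) / norm))
    return out
```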
A Perl tool that sets up a gnuplot command file and runs gnuplot to produce the output plot. It has lots of options.
Jupyter (Python) notebooks
These notebooks are first attempts to handle the data files using Python: