CIMA

CIMA is the CMS Instrument for Masterclass Analysis, a special piece of PHP software used for CMS Masterclasses. It's not part of the CMS e-Lab, but it is web-accessible and included with the website code.

CIMA in Use

During a CMS Masterclass, participants use the iSpy event display to view a series of CMS particle collision events while using CIMA to record and graph their observations. The Masterclass proctors guide participants through identifying the set of leptons (e or μ) visible in the detector and then using this final state to infer the unseen primary state boson that produced it (Z, W, Higgs, etc.). After examining multiple events, participants plot the invariant mass of Z-type events into a histogram; if all goes as intended, they will see a peak at the Z mass. The experience illustrates how fundamental particles are observed and measured using accelerator data.

Masterclass Session and Locations

Masterclass sessions may be hosted from anywhere in the world, and they generally involve multiple groups from multiple locations joined via video link. Participants are typically high school students (or the local equivalent) and their teachers.

On the front page (index.php), participants first choose their Masterclass session from the "Choose your Masterclass" column. For example, CERN-10Mar2017 would refer to a CMS Masterclass hosted from CERN on March 10, 2017. Once selected, users will then see the "Choose your location" column. Although it probably no longer exists as you read this, CERN-10Mar2017 had groups of participants in Debrecen (Hungary), Lyon (France), Palaiseau (France), São Paulo SPRACE (Brazil) and Zagreb (Croatia). Users select the appropriate option for their location (Debrecen2017, LyonB2017, etc.) from this column.

Data Groups and Events

As of Q1 2017, CIMA and iSpy use 10,000 CMS detector events divided into 100 groups of 100 events each. After choosing their location, users will then be able to see which data groups are assigned to them. Selecting a group opens fillOut.php, which is where much of the work is done during the Masterclass.

During the session, participants use the iSpy event display (accessible through a link in the upper-right of the page) to work their way through the 100 events in the data group that they've selected. iSpy maintains the same group/event numbering system as CIMA so that users can select the same data event in both iSpy and the particle selection table of fillOut.php. iSpy shows a transparent model of the CMS detector along with a reconstruction of the tracks observed in the selected event.

The most salient features of the collision shown in the event display are the electron and muon tracks. Users identify which type of lepton appears in a given event from the path of its track and the part of the detector in which it appears, and they select it from the "final state" section of the CIMA particle selection panel. Next, users apply physical principles to infer which type of primary state boson decayed into the observed final state. For example, charge conservation dictates that charged bosons (W±) can decay only into odd numbers of electrons or muons (or their antiparticles), while neutral bosons (Z, Higgs, etc.) must decay into lepton/antilepton pairs. Users select their choice from the "primary state" section of the CIMA particle selection panel.
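The charge-conservation reasoning above can be sketched in a few lines of Python. This is only an illustration, not CIMA code: the function name is made up, the return values borrow the option labels of the "primary state" panel (W+, W-, W, NP, Higgs, Zoo), and treating any four-lepton event with zero net charge as a Higgs candidate is a simplifying assumption.

```python
def guess_primary_state(lepton_charges):
    """Suggest a "primary state" label from the visible lepton charges.

    lepton_charges: one entry per electron or muon track, each +1, -1,
    or None when the sign of the charge cannot be determined.
    Hypothetical helper for illustration; not part of CIMA.
    """
    n = len(lepton_charges)
    if n == 1:
        # A single charged lepton points to a charged boson.
        charge = lepton_charges[0]
        if charge is None:
            return "W"
        return "W+" if charge > 0 else "W-"
    if any(c is None for c in lepton_charges):
        return "Zoo"
    net = sum(lepton_charges)
    if n == 2 and net == 0:
        # One lepton/antilepton pair: neutral particle (Z, J/psi, Upsilon).
        return "NP"
    if n == 4 and net == 0:
        # Two lepton/antilepton pairs: treated here as a Higgs candidate.
        return "Higgs"
    # Anything else cannot be identified with confidence.
    return "Zoo"
```

For instance, a single negatively charged muon gives "W-", while an electron/positron pair gives "NP".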

Particle Selection

The options given for "primary state" are W+, W-, W, NP, Higgs, and Zoo.
  • W+ and W- should be self-explanatory
  • W is selected when a user identifies the primary boson as a W but cannot determine the sign of its charge
  • NP is selected to indicate a neutral particle, typically a Z. This option used to be labeled "Z," but we changed it to NP because the data includes J/Ψ and Υ events; these are chargeless particles whose decays look similar to Z decays at this level of observation
  • Higgs is selected if the user believes the event shows the decay of a Higgs boson
  • Zoo is selected when the user cannot confidently identify the source of the decay.

Higgs and Zoo are marked as "special" events on the panel, and selecting them will disable any selection of the final lepton state. I've never been entirely sure why (Joel).

If a Z-type neutral particle is selected, the user will be asked to determine the invariant mass of the decay from the information provided by the iSpy event display and then enter it into the input box in the particle selection panel.
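For reference, the invariant mass of a two-particle decay can be computed from each particle's transverse momentum (pt), pseudorapidity (eta) and azimuthal angle (phi) if the particles are treated as massless. The sketch below is illustrative Python, not CIMA code: the function name is made up for this example, and the input numbers are taken from the diphoton row of masterclass_1-2gamma.csv quoted under Importing Event Data.

```python
import math

def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two particles treated as massless."""
    m_squared = 2.0 * pt1 * pt2 * (
        math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)
    )
    return math.sqrt(m_squared)

# pt, eta, phi of the two photons in the masterclass_1-2gamma.csv example row.
m = invariant_mass(77.2006, 0.250438, 0.6055050000000001,
                   60.1382, 0.650821, -1.5390000000000001)
print(round(m, 2))
```

The result agrees with the M column of that row (about 122.80). The massless approximation is exact for photons and very good for electrons and muons at these energies.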

Once the user submits an event through the particle selection panel, the event and the user's selections appear in a table below, and the selection panel cycles to the next event. If the user selected NP as the primary state, the entered mass will appear in the table for that event. If the user selected Higgs, the invariant mass of the event is taken from the database and displayed without user input (even if it's not really a Higgs event! Users are allowed to make mistakes). No other event types display a mass in the table, though all events have an associated mass in the database.

Results and Histogram

After users are done analyzing individual events, the header of the event analysis page (fillOut.php) offers links to two tools for interpreting their work: a tabulated results page and a histogram plot.

The results table on results.php shows the results of all data groups that all locations analyzed during the Masterclass so that users can see what their co-participants in different parts of the world found using other data groups. The histogram on hist.php is an interactive feature that lets users manually enter the masses of Z-type primary states that they found in their data group; like the results table, the histogram captures input from all locations, which allows participants to help construct a better graph than their single location could by itself. The end result is a histogram with sufficient data to show a clear peak at the Z-mass.
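The benefit of pooling entries from all locations can be seen in a small sketch. This is not how hist.php is implemented; the bin width, the function name and the mass values are all made up for illustration.

```python
from collections import Counter

def bin_masses(masses, bin_width=2.0):
    """Count masses (GeV) in fixed-width bins keyed by each bin's lower edge."""
    counts = Counter()
    for m in masses:
        lower_edge = (m // bin_width) * bin_width
        counts[lower_edge] += 1
    return counts

# Hypothetical mass entries from two locations.
location_a = [90.3, 91.2, 89.7, 93.8, 10.2]
location_b = [91.0, 90.9, 86.5, 9.8]

# Adding Counters merges the per-location histograms bin by bin.
combined = bin_masses(location_a) + bin_masses(location_b)
peak_edge, peak_count = combined.most_common(1)[0]
print(peak_edge, peak_count)
```

With these made-up numbers, each location alone has only two entries in the 90-92 GeV bin, while the combined histogram has four there, so the peak stands out more clearly.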

The Source Code

In the repository, the code is found in the cima/ folder of the repo root. On the VMs, its live files are served from the directory /home/quarkcat/sw/www-php/cima/.

NB this last fact! Since CIMA is PHP and not JSP, it is not served from the same Tomcat directory as the rest of the site, which is /var/lib/tomcatX/webapps/elab/ (AKA quarkcat/sw/tomcat/webapps/elab/). This can be confusing since the CIMA home page URL www.i2u2.org/elab/cms/cima/index.php suggests that the directory tomcat/webapps/elab/cms/cima/ ought to exist, and indeed it does - but it's unused and contains no files. It's probably deletable. We should maybe consider that.

CIMA is not deployed

The deployment scripts deploy-from-svn et al. do not affect CIMA files. To place CIMA files into service on either i2u2-prod or i2u2-dev, you must manually copy files into the /home/quarkcat/sw/www-php/cima/ directory of the VM (the same is true of the other www-*/ directories, by the way).

Even though CIMA is not deployed from the repository, you should still commit changes you make to the source code to the repo for version control and group access.

Ideally, the fileset in the repository branch directory 4.0-ND-dev/cima/ would correspond exactly to the i2u2-dev directory i2u2-dev:/home/quarkcat/sw/www-php/cima/, and 4.0-ND-prod/cima/ to i2u2-prod:/home/quarkcat/sw/www-php/cima/, but over time differences have accumulated between the VM files and their respective repo files. This is the state as of Q1 2017, at least.

The Data

Database Tables

CIMA data is stored on i2u2-db in a MySQL database called Masterclass. The vast majority of tables in Masterclass are "Location" tables (see Location tables, below). In March of 2017, for example, there were 481 tables, 11 of which were NOT Location tables. They were:
mysql> SHOW TABLES FROM Masterclass WHERE `Tables_in_Masterclass` 
NOT IN (SELECT `name` FROM `Masterclass`.`Tables`);
+------------------------+
| Tables_in_Masterclass  |
+------------------------+
| EventTables            |
| Events                 |
| EventsExt              |
| Events_Backup27Feb2017 |
| Events_New             |
| Events_Old             |
| MclassEvents           |
| TableGroups            |
| Tables                 |
| groupConnect           |
| histograms             |
+------------------------+

The tables EventsExt, Events_Backup27Feb2017, Events_New, and Events_Old are backup or auxiliary tables created during the Feb2017 CIMA upgrade; they may be deleted in the future. That leaves seven tables for you to be familiar with individually.

Events

The most important table is Events, which is the master list of all 10,000 particle events used in CIMA. It has the form
+-------+------+---------+------------+-------------+
|  o_no | g_no | g_index |  ev_no     | mass        |
+-------+------+---------+------------+-------------+
|     1 |    1 |       1 |  490868544 |     75.6802 |
|     2 |    1 |       2 |  489963747 |     59.0754 |
|     3 |    1 |       3 |  329045512 |     70.5787 |
|     4 |    1 |       4 |  328573895 |     81.5894 |
|     5 |    1 |       5 |   75779415 |     90.3327 |
...
|   100 |    1 |     100 |  490570312 |     64.9327 |
|   101 |    2 |       1 |   39338918 |     10.2441 |
|   102 |    2 |       2 |  329158332 |     75.3631 |
|   103 |    2 |       3 |   70443694 |      93.785 |
|   104 |    2 |       4 |   77255513 |     78.8601 |
|   105 |    2 |       5 |  328781228 |     81.2225 |
...
|  9995 |  100 |      95 | 1764877904 |     86.5181 |
|  9996 |  100 |      96 |  200025102 |     9.83702 |
|  9997 |  100 |      97 | 1460456769 |      91.212 |
|  9998 |  100 |      98 |  254964165 |     83.4258 |
|  9999 |  100 |      99 | 1765859249 |     69.2061 |
| 10000 |  100 |     100 |   95312939 |     10.8871 |
+-------+------+---------+------------+-------------+

The primary key is o_no, which ranges from 1 to 10000. This index uniquely identifies every event used in CIMA. The data group that a given event has been assigned to is given by g_no, which ranges from 1 to 100. Each data group therefore has 100 events; the index of an event within its group is given by g_index, which ranges from 1 to 100. The ev_no identifier is something used by the CMS experiment, and it isn't used at all in CIMA as far as I can tell. The invariant mass associated with the event is given by mass.

The g_index column was added by Joel in Feb2017. It isn't yet used consistently within the code, which often relies on ad-hoc formulae to extract the group index from o_no and g_no instead. Switching the code over to g_index is part of the upgrades indicated by the keyword TASMANIA in the code's comments.
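The relationship between o_no, g_no and g_index in the Events table shown above can be written down directly. The following is a sketch in Python rather than the PHP that CIMA actually uses, and the function names are invented for the example.

```python
GROUP_SIZE = 100  # events per data group, per the Events table above

def split_o_no(o_no):
    """Return (g_no, g_index) for a 1-based overall event number."""
    g_no, offset = divmod(o_no - 1, GROUP_SIZE)
    return g_no + 1, offset + 1

def join_o_no(g_no, g_index):
    """Return the overall event number for a group number and group index."""
    return (g_no - 1) * GROUP_SIZE + g_index
```

For example, split_o_no(101) gives (2, 1) and split_o_no(9995) gives (100, 95), matching the rows in the table. Subtracting 1 before dividing is what keeps the last event of each group (o_no 100, 200, ...) in the right group.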

MclassEvents

The table MclassEvents is of the form
+-----+------------------------------+--------+
| id  | name                         | active |
+-----+------------------------------+--------+
|   8 | Test2                        |      0 |
|  11 | 31Jan2015                    |      0 |
|  12 | 10Feb2015                    |      0 |
|  14 | 01Jan2015(orientations)      |      0 |
|  15 | 04Mar2015                    |      0 |
|  16 | 09Feb2015                    |      0 |
|  17 | Fermilab-06Mar2015           |      0 |
|  18 | Fermilab-07Mar2015-14CT      |      0 |
...
| 160 | Mayaguez-25Feb2017           |      1 |
| 161 | Orientations2017             |      1 |
| 162 | CERN-04Mar2017               |      1 |
| 163 | CERN-08Mar2017               |      1 |
| 164 | CERN-10Mar2017               |      1 |
| 165 | CERN-14Mar2017               |      1 |
...

This contains the names of Masterclass events. In this context, "event" refers to a Masterclass session, not to an accelerator collision event. I (Joel) presume that the name of every Masterclass session, past and present, is stored here except for a handful near the beginning that seem to have been manually deleted. The id value is used as a cross-reference with other tables; it functions as a primary key. The active value is a boolean that determines whether or not the given session appears in the selection menu on the CIMA front page.

The name value of the MclassEvents table is referenced within the CIMA source code as $_SESSION["Masterclass"].
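As an illustration of how the active flag would drive the front-page menu, here is a sketch that uses an in-memory SQLite database as a stand-in for the MySQL Masterclass database. The column types and the function name are assumptions, and the actual query in index.php may differ.

```python
import sqlite3

# In-memory stand-in for the MySQL Masterclass database (assumed column types).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MclassEvents (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO MclassEvents VALUES (?, ?, ?)",
    [
        (17, "Fermilab-06Mar2015", 0),
        (160, "Mayaguez-25Feb2017", 1),
        (164, "CERN-10Mar2017", 1),
    ],
)

def active_sessions(connection):
    """Return the names of sessions that should appear in the menu."""
    rows = connection.execute(
        "SELECT name FROM MclassEvents WHERE active = 1 ORDER BY id"
    )
    return [name for (name,) in rows]

print(active_sessions(conn))
```

Setting active to 0 hides a session from the menu without deleting its row, so old sessions stay available for cross-reference.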

Tables

Each of the Masterclass sessions identified in MclassEvents includes participants from multiple locations. Each location is given its own table in the database with a name chosen by the administrator. The names of these Location tables are stored in the Tables table, which has the form
+-----+------------------------------+------+
| id  | name                         | hist |
+-----+------------------------------+------+
| 209 | 17July                       |  214 |
| 426 | 20Feb2017-test1              |  431 |
| 427 | 20Feb2017-test1a             |  432 |
| 428 | 20Feb2017-test1b             |  433 |
| 210 | 23July                       |  215 |
| 117 | Aachen                       |  122 |
| 236 | Aachen2016                   |  241 |
| 502 | Aachen2017                   |  507 |
...
| 433 | uprm-tchrs                   |  438 |
| 434 | uprm-tchrs2                  |  439 |
...
| 246 | ZagrebA2016                  |  251 |
| 498 | ZagrebA2017                  |  503 |
| 293 | ZagrebB2016                  |  298 |
| 499 | ZagrebB2017                  |  504 |
| 113 | Zagreb_2                     |  118 |
| 235 | Zilina2016                   |  240 |
|  68 | Zurich                       |   73 |
| 332 | Zurich2016                   |  337 |
| 477 | Zurich2017                   |  482 |
+-----+------------------------------+------+

For example, the Masterclass session hosted from Mayagüez, Puerto Rico in February of 2017 is identified in the MclassEvents table above with the name "Mayaguez-25Feb2017" and the id 160. This session included two groups of people at the University of Puerto Rico at Mayagüez who were assigned the tables uprm-tchrs and uprm-tchrs2 in the Masterclass database. The names of these tables are shown as they appear in the Tables table above.

The id value functions as a primary key for this table. About 38 id values are missing from the sequence, probably because their rows were manually deleted. The hist value likely refers to a table that records each Location's contributions to the Masterclass's histogram.

Location tables

The Location tables whose names are stored in Tables record how users at each location analyze the data they're assigned. For example, the Location table uprm-tchrs used during the Mayaguez-25Feb2017 Masterclass has the form
+------+-------------+
| o_no | checked     |
+------+-------------+
|    1 | mu;W-       |
|    2 | mu;W+       |
|    3 | e;W-        |
|    8 | mu;NP;90.33 |
|    7 | H           |
|    5 | H           |
|   10 | H           |
|    6 | H           |
|  301 | e;W+        |
|  101 | mu;NP;10.29 |
|  901 | mu;NP;93.05 |
|  201 | e;W         |
|  102 | e;W         |
|  902 | mu;W+       |
|  302 | mu;NP;36.08 |
|  903 | mu;W-       |
...

Every time a user presses the "Submit" button on the particle selection panel of fillOut.php, this table is updated to record which particles' checkboxes were selected and what mass was entered (if applicable). The o_no value is the unique event index given in the Events table, while the checked value is a string that encodes what information users submitted through the particle selection panel of fillOut.php.

This example is actually a bad one: at the time this table was created, the group indices for the newly-imported 10,000 data events were not being properly assigned. A typical Location table should contain o_no from within a single range of 100 events assigned to a given data group. That is, the o_no values recorded here should be within a range like (1-100), (1401-1500), (9801-9900), etc.
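A quick way to check whether a Location table stays within one data group is to map each o_no to its group number. The helper below is a hypothetical Python sketch that assumes groups of 100 consecutive o_no values; the sample list is the o_no column from the uprm-tchrs excerpt above.

```python
def groups_in_table(o_nos, group_size=100):
    """Return the sorted data-group numbers covered by a list of o_no values."""
    return sorted({(o_no - 1) // group_size + 1 for o_no in o_nos})

# o_no values from the uprm-tchrs excerpt shown above.
sample = [1, 2, 3, 8, 7, 5, 10, 6, 301, 101, 901, 201, 102, 902, 302, 903]
print(groups_in_table(sample))
```

The excerpt spans groups 1, 2, 3, 4 and 10, which confirms that it is not a typical table. A well-formed table, such as one holding o_no 1401 through 1500, would yield a single group number.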

The value checked was originally intended to be particle checkboxes only, but in Feb2017 Joel added user-submitted masses to the string as the quickest way to implement that feature. More properly, these tables should instead be created with a separate mass column to store this number. Doing so is part of the upgrades indicated by the keyword TASMANIA in the code's comments.
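Until a separate mass column exists, anything that reads a Location table has to unpack the checked string. The sketch below is based only on the sample values shown above ("mu;W-", "mu;NP;90.33", "H"). The function name and the returned dictionary layout are invented for illustration, and other encodings may exist in the real data.

```python
def parse_checked(checked):
    """Split a Location-table 'checked' string into its parts.

    Returns a dict with keys 'final', 'primary' and 'mass'; a missing
    part is reported as None.
    """
    parts = checked.split(";")
    if len(parts) == 1:
        # Special events such as "H" carry no final state or mass.
        return {"final": None, "primary": parts[0], "mass": None}
    final, primary = parts[0], parts[1]
    mass = float(parts[2]) if len(parts) > 2 else None
    return {"final": final, "primary": primary, "mass": mass}
```

For example, parse_checked("mu;NP;90.33") yields a final state of "mu", a primary state of "NP" and a mass of 90.33, while parse_checked("H") yields only a primary state.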

The name of a given Location table is referenced within the CIMA source code as $_SESSION["database"]. Data within the table is usually (but not always) accessed as the variables
  $events["id"] = o_no;
  $events["checked"] = checked;

Importing Event Data

CIMA's event data originates with Tom McCauley, the maintainer of iSpy, who selects events from publicly-available CMS data for use with the CMS Masterclasses.

For the Q1 2017 upgrade, Tom provided CSV files of data from these events for import into the CIMA database. Two examples:

masterclass_1-2gamma.csv:
Run,Event,pt1,eta1,phi1,pt2,eta2,phi2,M,Index
199319,641436592,77.2006,0.250438,0.6055050000000001,60.1382,0.650821,-1.5390000000000001,122.79790133899999,97
masterclass_60-4lepton.csv:
Event,Run,E1,px1,py1,pz1,pt1,eta1,phi1,Q1,E2,px2,py2,pz2,pt2,eta2,phi2,Q2,E3,px3,py3,pz3,pt3,eta3,phi3,Q3,E4,px4,py4,pz4,pt4,eta4,phi4,Q4,M,Index
137440354,195099,92.5961775474,8.5353921252,-22.575752798699998,-89.39532085489999,24.1354,-2.02028,-1.20933,-1,59.8124628499,-10.7217014151,41.810988378699996,-41.4053850271,43.1638,-0.8522719999999999,1.82182,1,21.4101492687,6.953341653760001,-20.2443480232,0.460330755882,21.4052,0.021503900000000003,-1.23995,1,11.013022368900002,-7.746923757160001,6.874656868580001,-3.74311724054,10.3574,-0.353958,2.41578,-1,127.047752639,96

The filename contains the data group (1-100) of the enclosed events immediately after the underscore, followed by the physical event type (2gamma, 4lepton, etc.). Each physical event type has a different number and structure of CSV columns, but only "Event," "M" or "Mt," and "Index" are relevant to CIMA.

"Event" is stored as the ev_no column of Masterclass.Events. It is the unique CMS identifier for the event; CIMA doesn't really use it, but we record it anyway.

"M" is stored as the mass column of Masterclass.Events. It is the invariant mass of the decay. Some physical event types have a transverse mass "Mt" instead, which is not the same thing. Nevertheless, we import "Mt" as mass into the Events table so that the results table has a value to display if the user makes a mistake in analyzing the event.

"Index" is stored as the g_index column of Masterclass.Events. It is the value from 1 to 100 that identifies the event within its data group.

For the 2017 upgrade, this made for about 400 CSV files of different column formatting. Joel wrote a Bash script to process these into a single CIMA-master.csv file. Once constructed and moved to i2u2-db:/var/lib/mysql-files/, this file can be easily imported into the Masterclass database with the command
mysql> LOAD DATA INFILE '/var/lib/mysql-files/CIMA-master.csv'
    -> INTO TABLE Events
    -> FIELDS TERMINATED BY ','
    -> LINES TERMINATED BY '\n'
    -> IGNORE 1 LINES;

This process ended up working well enough that we should stick to it for future dataset upgrades, if possible.
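For anyone repeating the import, the CSV-merging step could look roughly like the Python sketch below. This is not Joel's Bash script, which is not reproduced here. The function names are made up, and the sketch assumes that the master file's columns follow the Events table (o_no, g_no, g_index, ev_no, mass) and that o_no is derived from the group number and the Index column.

```python
import csv
import io
import re

FILENAME_RE = re.compile(r"^masterclass_(\d+)-(.+)\.csv$")
GROUP_SIZE = 100  # events per data group

def parse_filename(filename):
    """Return (data group number, physical event type) from a source filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    return int(match.group(1)), match.group(2)

def master_rows(filename, csv_text):
    """Yield (o_no, g_no, g_index, ev_no, mass) tuples for one source file."""
    g_no, _event_type = parse_filename(filename)
    for row in csv.DictReader(io.StringIO(csv_text)):
        g_index = int(row["Index"])
        # Prefer the invariant mass "M"; fall back to the transverse mass "Mt".
        mass = float(row["M"] if "M" in row else row["Mt"])
        o_no = (g_no - 1) * GROUP_SIZE + g_index  # assumed numbering scheme
        yield (o_no, g_no, g_index, int(row["Event"]), mass)

# The diphoton example row quoted earlier in this section.
sample = (
    "Run,Event,pt1,eta1,phi1,pt2,eta2,phi2,M,Index\n"
    "199319,641436592,77.2006,0.250438,0.6055050000000001,60.1382,0.650821,"
    "-1.5390000000000001,122.79790133899999,97\n"
)
rows = list(master_rows("masterclass_1-2gamma.csv", sample))
print(rows)
```

Reading columns by header name rather than by position is what lets one routine handle every physical event type, since only "Event," "M" or "Mt," and "Index" need to be present.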

Development History

CIMA was originally written by Stefan Schoppmann while a grad student at RWTH Aachen.

As of 2016, CIMA used data from 3000 CMS events divided into 30 groups. In February 2017, Joel made upgrades to
  • Implement a new set of 10000 CMS events divided into 100 groups
  • Improve the look of the particle selection panel (table.tpl) of fillOut.php
  • Allow the user to manually enter mass for neutral-boson primary states (NP, formerly Z)

To-Do

Joel keyworded code comments about the next round of upgrades as TASMANIA to make them greppable.

  • Completely re-write the CSS for the particle selection panel (/templates/table.tpl). Bootstrap doesn't seem to be good for this application. In particular, fix the incomplete vertical divider and allow the "NP" label to be changed to something longer without distorting the table.
  • Overall, the CSS just plain needs to be sorted out and whipped into shape. It's a bit of a mess.
  • Update the location tables in Tables to have `mass`, `group_index` columns.
  • Put in an easier way to clear a Results table (in fillOut.php)
  • Ken suggests an easier way to get rid of old MC groups, I think? Ask for clarification.
  • Fix the histogram to stop the vertical auto-rescaling
  • Fix the histogram so that mass values on the x-axis are centered on the dividers, not the bins
  • Consider an option to automatically fill the histogram.
    • Idea: students manually fill the dilepton and diphoton events, then auto-fill the rest?
    • Idea: students have a histogram of their own data that they fill manually. Then, that data can be automatically combined with the rest of their location group to form one histogram. Then, all locations are automatically combined into one MCEvent histogram the way it is now.
  • When the user selects both a final state and primary state, the table.tpl "Submit" button activates only if the primary state is selected last. It should work in either order.
  • Both "mu" and "electron" can be selected in the table in fillOut.php. This shouldn't be. Fix in js/fcns.js.
  • An updated screencast showing the mass-entry and histogram creation process would be good. The current one links to leptoquark.
  • Ken has an idea to make the interface more general. Particle selection has "Tracks" (electron, muon, photon, zoo) and "Number" (of tracks) (1,2,4,2+2 mixed) in one box, "Charge" (+,-,0,unknown) and "Mass" (entry box). This is adaptable to Masterclasses other than Z-mass. See attached scan.

-- Main.JoelG - 2017-01-10

Topic revision: r19 - 2018-09-17, JoelG