This tutorial gives a basic introduction to the fundamental concepts of data analysis in HEP experiments. In its current format it has been used for a one-week course at a HEP workshop and at a data analysis school. It should be most useful for undergraduate students who have basic knowledge of particle physics but no experience in data analysis. Prerequisites are some knowledge of C++ and of the ROOT data analysis framework (dedicated ROOT tutorials are also available).
This tutorial uses a small fraction (50 pb-1) of real CMS data taken in 2011, stored in plain ROOT tuples to facilitate simplified analyses (the CMS Collaboration Board has agreed that this particular dataset can be released for educational purposes).
The event information is kept to a minimum: the four-momentum components of the leading jets, electrons, muons, and photons; combined particle-based isolation for leptons and photons; and b-tag information.
As an example, we have chosen a TTbar analysis to explain the concepts of invariant mass, purity and efficiency of a selection, trigger efficiency, and event reconstruction. The goal is a simple measurement of the TTbar cross section and of the top quark mass.
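The core quantities named above can be sketched in a few lines of plain C++: the invariant mass of a two-object system, the efficiency and purity of a selection, and the counting-experiment cross section they feed into, sigma = (N_obs - N_bkg) / (eff * L). This is a minimal illustration, not code from the tutorial framework: the `FourMomentum` struct and all function names are hypothetical, and inside a ROOT analysis one would more typically use `TLorentzVector` and its `M()` method. The sketch also assumes that `eff` is the total efficiency, i.e. that the trigger efficiency and acceptance have already been folded into it.

```cpp
#include <cassert>
#include <cmath>

// Four-momentum in the (px, py, pz, E) convention, in GeV.
// Hypothetical struct; the tutorial's tuple branch names may differ.
struct FourMomentum {
    double px, py, pz, E;
};

// Invariant mass of a two-object system:
// m^2 = (E1 + E2)^2 - |p1 + p2|^2
double invariantMass(const FourMomentum& a, const FourMomentum& b) {
    const double E  = a.E + b.E;
    const double px = a.px + b.px;
    const double py = a.py + b.py;
    const double pz = a.pz + b.pz;
    const double m2 = E * E - (px * px + py * py + pz * pz);
    // Rounding can push m2 slightly below zero for (nearly) massless systems.
    return m2 > 0.0 ? std::sqrt(m2) : 0.0;
}

// Efficiency: fraction of all signal events that pass the selection.
double efficiency(double nSignalSelected, double nSignalTotal) {
    assert(nSignalTotal > 0.0);
    return nSignalSelected / nSignalTotal;
}

// Purity: fraction of the selected events that are signal.
double purity(double nSignalSelected, double nAllSelected) {
    assert(nAllSelected > 0.0);
    return nSignalSelected / nAllSelected;
}

// Counting-experiment cross section: sigma = (N_obs - N_bkg) / (eff * L).
// With L in pb^-1 the result is in pb.
double crossSection(double nObserved, double nBackground,
                    double eff, double lumi) {
    assert(eff > 0.0 && lumi > 0.0);
    return (nObserved - nBackground) / (eff * lumi);
}
```

For example, two back-to-back massless muons with E = 45 GeV each give an invariant mass of 90 GeV, close to the Z boson mass. With purely illustrative numbers (300 selected events, 100 expected background events, eff = 0.25, L = 50 pb-1), `crossSection` returns 200 / 12.5 = 16 pb.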
The starting point is an introduction to the analysis framework, including examples for producing histograms of basic quantities such as momentum distributions. The students then develop the analysis themselves, following guidelines and suggestions provided on exercise sheets. A sample solution is also available and can be handed to the students at the very end.
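A momentum distribution is built by computing one value per object and filling it into a histogram. In the framework this is done with ROOT histograms such as `TH1F`; the plain C++ sketch below only illustrates the underlying steps: the transverse momentum is computed from `px` and `py` and sorted into fixed-width bins, with bin 0 as underflow and bin nBins+1 as overflow (the same convention ROOT's `TH1` uses). `SimpleHistogram` and `transverseMomentum` are hypothetical names introduced for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Transverse momentum from the x and y momentum components (GeV).
double transverseMomentum(double px, double py) {
    return std::hypot(px, py);
}

// Minimal fixed-width 1D histogram with underflow and overflow bins.
// Hypothetical stand-in for ROOT's TH1F, showing only the binning logic.
class SimpleHistogram {
public:
    SimpleHistogram(std::size_t nBins, double low, double high)
        : counts_(nBins + 2, 0.0), low_(low), high_(high),
          width_((high - low) / nBins) {
        assert(nBins > 0 && high > low);
    }

    // Bin 0 is underflow, bins 1..nBins are regular, nBins+1 is overflow.
    void fill(double x, double weight = 1.0) {
        std::size_t bin;
        if (x < low_) {
            bin = 0;
        } else if (x >= high_) {
            bin = counts_.size() - 1;
        } else {
            // Clamp so rounding just below 'high' never lands in overflow.
            bin = std::min(counts_.size() - 2,
                           1 + static_cast<std::size_t>((x - low_) / width_));
        }
        counts_[bin] += weight;
    }

    double binContent(std::size_t bin) const { return counts_.at(bin); }

private:
    std::vector<double> counts_;
    double low_, high_, width_;
};
```

The optional `weight` argument matters once MC samples are filled, since each simulated event then carries a normalization weight rather than counting as 1.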
The full analysis, including the data and MC files, fits into a 30 MB tarball (see below) and runs on a standard computer within a few seconds. The only requirement for the computing environment is a ROOT installation.
Anybody is free to use, extend, and modify the provided material; in this case the original authors would appreciate an acknowledgement. We also welcome feedback and hope to improve this tutorial in the future based on your suggestions.
Detailed documentation of the tutorial, including descriptions of the data and Monte Carlo samples and of the analysis framework, is provided below as PDF files, along with a tarball containing the analysis framework, the data files, and a simple introductory example.
The documentation, together with this tarball, is intended as the starting point for the students. The full sample solution is available upon request; make sure the students do not see it before the very end.
The following samples are included in the current version of the tutorial. All MC samples are reweighted to approximately match the pile-up distribution in data and are normalized to an integrated luminosity of 50 pb-1, matching the data sample:
| Sample | Number of events | Comments |
| Data | ~470k | triggered on isolated muons with pT > 24 GeV |
| MC: TTbar | ~37k | generated with MadGraph; isoMuon24 trigger bit stored |
| MC: W+jets | ~110k | generated with MadGraph; triggered on simulated isolated muons with pT > 24 GeV |
| MC: Drell-Yan | ~78k | generated with MadGraph; triggered on simulated isolated muons with pT > 24 GeV |
| MC: WW | ~4.6k | generated with Pythia; triggered on simulated isolated muons with pT > 24 GeV |
| MC: WZ | ~3.4k | generated with Pythia; triggered on simulated isolated muons with pT > 24 GeV |
| MC: ZZ | ~2.4k | generated with Pythia; triggered on simulated isolated muons with pT > 24 GeV |
| MC: single top | ~5.7k | generated with Powheg; triggered on simulated isolated muons with pT > 24 GeV |
| MC: QCD | ~100 | generated with Pythia; triggered on simulated isolated muons with pT > 24 GeV |
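The MC samples listed above are already reweighted, so students do not need to recompute these weights. For orientation, the sketch below shows how such weights are commonly constructed: a luminosity weight w = sigma * L / N_gen scales a sample with cross section sigma and N_gen generated events to the data luminosity L, and a pile-up weight, taken here as the ratio of the normalized data and MC pile-up distributions in the event's bin, corrects the pile-up profile. The function names and this per-bin treatment are assumptions for illustration, not the procedure actually used to produce the tutorial files.

```cpp
#include <cassert>

// Luminosity weight that scales an MC sample to the data:
// w_lumi = sigma * L / N_gen, with sigma in pb and L in pb^-1.
double luminosityWeight(double crossSectionPb, double lumiInvPb,
                        double nGenerated) {
    assert(nGenerated > 0.0);
    return crossSectionPb * lumiInvPb / nGenerated;
}

// Pile-up weight for one bin of the pile-up distribution (e.g. number of
// interactions): ratio of the normalized data and MC fractions in that bin.
double pileUpWeight(double dataFraction, double mcFraction) {
    assert(mcFraction > 0.0);
    return dataFraction / mcFraction;
}

// Total per-event weight applied when filling MC histograms.
double eventWeight(double lumiWeight, double puWeight) {
    return lumiWeight * puWeight;
}
```

For instance, a hypothetical sample with sigma = 100 pb and 20000 generated events, scaled to 50 pb-1, gets w_lumi = 100 * 50 / 20000 = 0.25 per event; an event in a pile-up bin that is twice as populated in data as in MC then carries a total weight of 0.5.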
For solutions to the exercises, please contact Christian Sander & Alexander Schmidt directly.