Exploring the dataset
You can now start looking at the simulated dataset.
To this end, a GitLab project provides the code and data for a simplified statement of the research problem.
If you are not familiar with git or version control systems in general, please have a look at this tutorial: https://www.codecademy.com/learn/learn-git
We have already prepared a set of 20,000 collision events for you. They were produced in the LHE (Les Houches Event) file format and converted into a ROOT file for convenience.
You can find the dataset here:
To start the project, please open a new terminal and create a new directory for your summer student project. In case you are not familiar with the command line, have a look at this tutorial: https://www.codecademy.com/learn/learn-the-command-line
First, download the git project:
Then move into the directory for the exploration of the dataset:
We will use Python throughout the project to benefit from a number of useful packages. These are best set up in a clean environment, so we use a Python virtual environment (similar to a conda environment, if you have heard of conda).
A setup script is provided which takes care of that for you:
Now the virtual environment is activated. You will notice that the command-line prompt now starts with "(venv)". Whenever you return to the project and open a new terminal, run the setup script again before you start.
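If you are unsure whether the environment is active, you can also ask Python itself. The following is a minimal sketch, assuming the setup script creates a standard `venv`-style environment; the function name `in_virtual_environment` is only illustrative and is not part of the project.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter in use:", sys.executable)
```

If this reports that no virtual environment is active, run the setup script again in the current terminal.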
You can now start making first plots of distributions. A script is already provided that reads the simple dataset. You can execute it as follows:
As a result, you should see a few PNG image files in your directory. What can you learn from them?
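To check quickly which images were produced, you can list them from Python. This is a small sketch, assuming the plots are written to the current working directory; the helper name `list_png_files` is hypothetical.

```python
from pathlib import Path


def list_png_files(directory="."):
    """Return the names of all PNG files in `directory`, sorted alphabetically."""
    return sorted(path.name for path in Path(directory).glob("*.png"))


if __name__ == "__main__":
    for name in list_png_files():
        print(name)
```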
Your next step is more challenging: you can finally start working on your own.
Open the script in an editor and create the following set of plots:
the invariant mass distribution for the top quarks from the associated production
the DeltaR distance between the top quarks from the resonance decay
the DeltaR distance between the top quarks from the associated production
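The quantities in the list above can be computed from the four-momenta of the two top quarks. The sketch below is not the provided script: it assumes each top quark is available as a tuple `(E, px, py, pz)` in GeV, which may not match how the ROOT file stores the events, and the function names are illustrative. DeltaR is taken here as sqrt(Δη² + Δφ²) with the pseudorapidity η; some analyses use the rapidity instead.

```python
import math


def invariant_mass(p1, p2):
    """Invariant mass of a two-particle system; each p is (E, px, py, pz)."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = e * e - px * px - py * py - pz * pz
    # Clamp tiny negative values that can arise from floating-point rounding.
    return math.sqrt(max(m2, 0.0))


def eta_phi(p):
    """Pseudorapidity and azimuthal angle of a four-momentum (E, px, py, pz).

    Requires a non-zero transverse momentum.
    """
    _, px, py, pz = p
    pt = math.hypot(px, py)
    return math.asinh(pz / pt), math.atan2(py, px)


def delta_r(p1, p2):
    """Angular distance sqrt(d_eta^2 + d_phi^2) between two particles."""
    eta1, phi1 = eta_phi(p1)
    eta2, phi2 = eta_phi(p2)
    # Wrap the azimuthal difference into the interval [-pi, pi).
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


if __name__ == "__main__":
    # Two back-to-back top quarks in the transverse plane (illustrative values).
    m_top = 173.0
    energy = math.sqrt(100.0**2 + m_top**2)
    top1 = (energy, 100.0, 0.0, 0.0)
    top2 = (energy, -100.0, 0.0, 0.0)
    print("invariant mass:", invariant_mass(top1, top2))
    print("DeltaR:", delta_r(top1, top2))
```

For each event, you would call these functions on the relevant pair of top quarks and fill the results into a histogram, in the same way the provided script fills its existing plots.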