[MUSIC]
In this video, we'll talk about how to train a classifier whose signal efficiency has a flat, or uniform, dependence on the particle parameters.
Firstly, let's consider how to train a boosted decision tree classifier to provide flat performance on a set of features.
This example is based on the AdaBoost classifier.
This classifier uses the loss function shown on this slide. In this function, gamma is the true label of an event, and s is the score obtained for each event as the sum of the predictions of all trees in the series.
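Since the slide itself is not reproduced in this transcript, for reference, the standard AdaBoost exponential loss in this notation (with the label gamma conventionally taking values +1 for signal and -1 for background) can be written as:

```latex
% Standard AdaBoost exponential loss; gamma_i is the true label of event i
% (conventionally +1 for signal, -1 for background), and s_i is its score.
L_{\mathrm{ada}} = \sum_i \exp\left(-\gamma_i \, s_i\right)
```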
To provide a flat classifier efficiency on a set of features,
we modify the loss function by adding a new term that is responsible for flatness.
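Written out, the modified loss is the sum of the two parts, with a coefficient alpha (the symbol here is our notation, not necessarily the paper's) controlling the strength of the flatness term FL defined next:

```latex
% Modified loss: the AdaBoost exponential loss plus a flatness penalty.
L = L_{\mathrm{ada}} + \alpha \cdot \mathrm{FL}
```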
Let's consider how this term works.
Suppose that we would like to provide a flat signal efficiency as a function of the particle momentum.
For that, we divide the momentum values into bins.
Then for each bin, we integrate the differences between
the cumulative distribution of the classifier output in that bin and
the global cumulative distribution of the classifier output.
And finally, we calculate the weighted sum over all bins.
The weight of a bin is the fraction of signal particles in that bin.
During the classifier training, this term drives the output distribution in each bin toward the global distribution of the classifier output. In the case of ideal flatness, this term is close to zero.
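To make the construction concrete, here is a minimal NumPy sketch of this flatness term; the number of bins, the power applied to the difference, and all names are illustrative assumptions rather than the exact formulation from the paper:

```python
import numpy as np

def flatness_term(scores, momentum, is_signal, n_bins=10, power=2):
    """Sketch of the flatness term: a weighted sum over momentum bins of the
    integrated difference between the per-bin and global CDFs of the scores."""
    sig_scores = scores[is_signal]
    sig_momentum = momentum[is_signal]
    # Grid of thresholds over which the two CDFs are compared ("integrated").
    thresholds = np.linspace(sig_scores.min(), sig_scores.max(), 100)
    # Global cumulative distribution of the classifier output for signal.
    global_cdf = np.array([(sig_scores <= t).mean() for t in thresholds])

    # Equal-population momentum bins.
    bin_edges = np.percentile(sig_momentum, np.linspace(0, 100, n_bins + 1))
    total = 0.0
    for b in range(n_bins):
        in_bin = (sig_momentum >= bin_edges[b]) & (sig_momentum <= bin_edges[b + 1])
        if not in_bin.any():
            continue
        # Cumulative distribution of the classifier output in this bin.
        bin_cdf = np.array([(sig_scores[in_bin] <= t).mean() for t in thresholds])
        weight = in_bin.mean()  # fraction of signal particles in this bin
        # Integrate |F_bin - F_global|^power over the threshold grid.
        total += weight * np.trapz(np.abs(bin_cdf - global_cdf) ** power, thresholds)
    return total
```

When the output distribution is identical in every bin, each per-bin CDF matches the global one and the term vanishes, which is exactly the ideal-flatness case mentioned above.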
The modified loss function provides a trade-off between
classifier quality and classifier output flatness.
The better the flatness, the worse the quality of particle type identification, and the balance is determined by the requirements of the problem you are solving with such a modification.
This is just one of the possible modifications of the loss function described in the original paper referenced at the bottom of the slide. Now let's consider the results of these modifications.
The figure on the slide is taken from the original paper. It shows the dependence of the signal efficiency on a selected feature, the minimum distance from the corner in this case, for several classifiers at the same global efficiency of 50%.
This global efficiency is illustrated as a gray line.
The black curve in the figure corresponds to the conventional AdaBoost classifier without any modification of its loss function. The signal efficiency of this classifier depends strongly on the selected feature.
The second curve, labeled kNNAda, also represents a non-flat AdaBoost classifier, but with a different loss function.
All other curves correspond to different flat modifications of the AdaBoost classifier. The results for the modification described on the previous slide are shown as the first red curve. It demonstrates much better flatness than the conventional AdaBoost.
Now consider how this works in particle identification.
This figure shows the dependence of the pion efficiency on its transverse momentum at the LHCb experiment at CERN.
The figure compares two classifiers for three different
global efficiencies: 60%, 80% and 90%.
The blue curves correspond to the uniform boosting, which is an AdaBoost classifier modified as described previously.
And the green curves correspond to an unmodified gradient boosting classifier over decision trees.
The uniform boosting provides significantly better
flatness of the signal efficiency for all three global efficiencies.
And the trade-off between flatness and quality for the uniform boosting was tuned to provide the same quality as for the non-flat classifier.
In the paper provided at the bottom of the slide,
you can find plots for other particle types.
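If you want to experiment with this approach, the hep_ml Python package by one of the paper's authors implements such flatness losses. The snippet below is a sketch: the column names are placeholders, and the exact argument names may vary between package versions.

```python
import pandas as pd
from hep_ml.losses import BinFlatnessLossFunction
from hep_ml.gradientboosting import UGradientBoostingClassifier

# X is assumed to be a DataFrame whose columns include the training
# features and the variable we want flatness on ('momentum' here);
# y contains the labels (1 for signal, 0 for background).
loss = BinFlatnessLossFunction(uniform_features=['momentum'],
                               uniform_label=1,  # flatness for signal
                               n_bins=10)
clf = UGradientBoostingClassifier(loss=loss, n_estimators=100,
                                  train_features=['feature_1', 'feature_2'])
clf.fit(X, y)
probabilities = clf.predict_proba(X)[:, 1]
```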
We have considered how to modify the loss function of an AdaBoost classifier to provide flatness of its signal efficiency on a set of features. Now let's consider a method which allows us to train a neural network with a flat signal efficiency without any special modification of its loss function. This method is called decorrelation using an adversarial neural network.
The network in this approach consists of two parts.
The first one is a classifier that is trained to predict the particle type for an example. This classifier has its own loss function used for the training, for example binary or categorical cross-entropy. So this is just a usual neural network without any special modifications.
The second part of the network is an adversary network.
It takes the outputs of the classifier for a particle as inputs and predicts, for example, the value of the particle's momentum.
The momentum prediction is performed as a multiclass classification problem instead of a regression one. For that, all particle momentum values are divided into bins, and each bin represents a separate class.
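As a small illustration, such binning can be done with NumPy; the choice of ten equal-population bins and the variable names are assumptions for the example:

```python
import numpy as np

# Divide momentum values into 10 equal-population bins; each particle's
# class label for the adversary is the index of its bin.
edges = np.percentile(momentum, np.linspace(0, 100, 11)[1:-1])
momentum_bin = np.digitize(momentum, edges)  # integers 0..9
```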
The adversary network predicts a bin for each output of the classifier. The adversary network also uses its own loss function, and in this case it can be categorical cross-entropy.
To provide a flat output of the classifier, we should concurrently minimize the two loss functions shown on the slide during the neural network training.
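Since the slide is not reproduced in the transcript, here is how these two objectives are usually written in the adversarial decorrelation literature (this is the standard formulation, not a verbatim copy of the slide), with f the classifier, r the adversary, y the particle type, z the momentum bin, and CE the cross-entropy:

```latex
% Adversary: cross-entropy of predicting the momentum bin z from the
% classifier output f(x); minimized over the adversary parameters.
L_{\mathrm{adv}} = \mathrm{CE}\big(r(f(x)),\, z\big)

% Classifier: its own cross-entropy minus lambda times the adversary loss,
% so the classifier is penalized when momentum is recoverable from f(x).
L_{\mathrm{clf}} = \mathrm{CE}\big(f(x),\, y\big) - \lambda\, L_{\mathrm{adv}}
```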
In the case of particle identification, the first loss function represents the quality of the particle momentum reconstruction based on the classifier output for this particle. The second loss function represents the quality of the classifier, but it also penalizes the classifier if it is possible to reconstruct the particle momentum from its output. The lambda in the loss function is an adjustable parameter which defines the trade-off between the classifier's flatness and quality.
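Below is a minimal PyTorch sketch of this alternating training scheme; it is not the original paper's code, and the network sizes, the lambda value, and all names are assumptions. The momentum_bin labels can come from a binning like the earlier NumPy snippet:

```python
import torch
import torch.nn as nn

N_FEATURES, N_MOMENTUM_BINS, LAMBDA = 20, 10, 1.0  # illustrative values

# Classifier: predicts the signal probability from the particle features.
classifier = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(),
                           nn.Linear(64, 1), nn.Sigmoid())
# Adversary: predicts the momentum bin from the classifier output alone.
adversary = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                          nn.Linear(64, N_MOMENTUM_BINS))

bce, ce = nn.BCELoss(), nn.CrossEntropyLoss()
opt_clf = torch.optim.Adam(classifier.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)

def training_step(x, y, momentum_bin):
    """One alternating update. x: float features, y: float 0/1 labels,
    momentum_bin: long tensor of bin indices of each particle's momentum."""
    # 1) Update the adversary to predict the momentum bin from the
    #    (detached) classifier output.
    opt_adv.zero_grad()
    loss_adv = ce(adversary(classifier(x).detach()), momentum_bin)
    loss_adv.backward()
    opt_adv.step()
    # 2) Update the classifier: good classification, but penalized
    #    whenever the adversary can recover the momentum bin.
    opt_clf.zero_grad()
    out = classifier(x)
    loss_clf = bce(out.squeeze(1), y) - LAMBDA * ce(adversary(out), momentum_bin)
    loss_clf.backward()
    opt_clf.step()
    return loss_clf.item(), loss_adv.item()
```

Increasing LAMBDA pushes the classifier toward flatter, more decorrelated outputs at the cost of raw classification quality, which is exactly the trade-off described above.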
Consider how this approach works.
The figure on the slide is taken from the original paper, and it compares the output of the decorrelated adversarial network with that of the traditional neural network.
The black curve corresponds to the decorrelated network, and the red curve corresponds to the traditional neural network without the adversary part. The decorrelated network shows a much flatter dependence of its output on the jet invariant mass.
These two methods for training uniform classifiers were developed for high energy physics. However, they can also be used in other fields where a dependence of the classifier output on a set of features is undesirable.
That was the last example of flat or uniform classifiers, so let's summarize this week.
So, we considered an example of a particle decay and learned that the mass of the mother particle depends on the energies, momenta, and masses of the daughter particles.
Tracks of the daughter particles are needed to check that the particles
originate from the same point, and belong to the same decay.
Detectors in high energy physics, with their different systems, are needed to estimate a particle's trajectory, momentum, energy, and type, in order to recognize particle decays and reconstruct the parameters of mother particles.
We also considered several of the most common detector systems
in high energy physics.
These systems are the tracking system, the ring-imaging Cherenkov detector, the electromagnetic and hadron calorimeters, and the muon system.
All these systems recognize particles,
measure their energy and momentum and identify their types.
All charged particles leave responses in the tracking system, which are used to reconstruct their tracks and measure their momenta.
The ring-imaging Cherenkov detector identifies the particle type based on its track and momentum.
Electrons and photons are stopped by the electromagnetic calorimeter, which also measures their energies; other particles fly farther, to the hadron calorimeter. The hadron calorimeter stops protons, neutrons, and other particles containing quarks, and estimates their energies.
Muons pass all detector systems and are detected in the muon system.
We also saw different cases in high energy physics experiments where machine learning can be applied. Machine learning can be used for pattern recognition of particle tracks among detector hits and for rejecting ghost tracks.
It can also combine tracks into vertices to recognize particle decays and to estimate the properties of mother particles.
Machine learning can help to increase the precision of particle momentum estimation in tracking systems, or to improve ring image recognition in RICH subdetectors for better particle type identification.
Particle energy estimation and neutral particle identification
based on calorimeter responses are also examples of such cases.
We discussed in detail how different classifiers can be used for global particle identification based on the responses of different detector systems, and how to train them to provide flatness of the signal efficiency on a set of features like particle momentum, transverse momentum, or energy.
Finally, if you would like to get more information about machine learning in high energy physics, this slide will help you.
It provides a list of references to different articles and
talks about successful examples.
Thank you very much for your attention and see you next week.