GammaLearn - Deep Learning for CTAO event reconstruction

Workshop on Machine Learning for Analysis of High-Energy Cosmic Particles

Thomas Vuillaume for the gammalearn team

27-31 January 2025
thomas.vuillaume@lapp.in2p3.fr
This work is largely based on the PhD theses of Mikaël Jacquemont and Michaël Dell'aiera.
This presentation largely reproduces Michaël Dell'aiera's PhD defense presentation of October 14th, 2024.

Introduction

GammaLearn & Cherenkov Telescope Array Observatory (CTAO)


**Exploring the Universe at Very High Energies (VHE)**

  • CTAO
    • Next-generation ground-based observatory for gamma-ray astronomy
    • Increased sensitivity compared to current-generation instruments
    • First Large-Sized Telescope (LST-1) operational
  • GammaLearn project
    • Collaboration between LAPP (CNRS) and LISTIC (USMB)
    • Fosters innovative AI methods for CTAO
    • https://purl.org/gammalearn

The LST-1 under commissioning

[Science with the Cherenkov Telescope Array.](https://www.worldscientific.com/doi/abs/10.1142/10986) WORLD SCIENTIFIC, 2019

Gamma-ray astronomy


**Observation of the universe in the gamma-ray segment of the electromagnetic spectrum (> 0.1 MeV)**

Energy ranges
  • CTAO: 20 GeV to 300 TeV
  • LSTs dominate the sensitivity at the lowest energies
Scientific objectives
  • Understanding the origin and role of relativistic cosmic particles
  • Probing extreme environments (e.g. black holes, neutron stars)
  • Multi-messenger analysis (neutrinos, gravitational waves and cosmic rays)
  • Exploring frontiers in physics (e.g. dark matter)


Werner Hofmann. “Perspectives from CTA in relativistic astrophysics”. In: Fourteenth Marcel Grossmann Meeting - MG14. Ed. by Massimo Bianchi, Robert T. Jansen, and Remo Ruffini. Jan. 2018, pp. 223–242

Reconstruction workflow


Presentation outline


  • The reconstruction procedure
  • $\gamma$-PhysNet application to LST-1
  • Challenging transition from simulations to real observations
  • Information fusion
  • Domain adaptation
  • Transformers
  • Conclusion and perspectives

The reconstruction procedure

Discrimination patterns


Fig. Gamma- (left) and proton-induced (right) showers at the same energy

→ Their different Cherenkov light emission patterns make it possible to distinguish them in the acquired images


https://www.iap.kit.edu/corsika/71.php

Fig. Energy flux

Standard approaches


Moment-based
Deep learning
Template-based
Hybrid

Hillas+Random Forest (RF)

  • Morphological prior hypothesis: ellipsoidal integrated signal
  • Uses Hillas parameters (image moments)
  • Leverages multiple RFs
  • Pros
    • Fast and robust
  • Cons
    • Requires image cleaning
    • Limited performance at low energies
  • In production on LST-1 (baseline)

Image parameters computed on intensity and time maps

A. M. Hillas. [Cerenkov Light Images of EAS Produced by Primary Gamma Rays and by Nuclei.](https://ntrs.nasa.gov/api/citations/19850026666/downloads/19850026666.pdf) In: 19th International Cosmic Ray Conference (ICRC19), Vol. 3. Aug. 1985, p. 445.
H. Abe et al. [Observations of the Crab Nebula and Pulsar with the Large-sized Telescope Prototype of the Cherenkov Telescope Array.](https://arxiv.org/abs/2306.12960) In: Astrophys. J. 956.2 (2023), p. 80.
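To make the moment-based approach concrete, here is a minimal sketch of Hillas-style parameters computed from the intensity map of an already cleaned image: the intensity-weighted centroid, then the length, width and orientation ($\psi$) of the equivalent ellipse from the second moments. The flat lists of pixel coordinates and charges and the helper name `hillas_parameters` are illustrative assumptions; this is not the lstchain or ctapipe implementation, and timing parameters are left out.

```python
import math


def hillas_parameters(xs, ys, charges):
    """Moment-based (Hillas-style) parameters of a cleaned camera image.

    xs, ys: pixel coordinates; charges: pixel intensities (same length).
    Hypothetical helper, for illustration only.
    """
    size = sum(charges)  # total image intensity
    if size <= 0:
        raise ValueError("image has no signal")
    # First moments: intensity-weighted centroid
    cog_x = sum(q * x for q, x in zip(charges, xs)) / size
    cog_y = sum(q * y for q, y in zip(charges, ys)) / size
    # Second central moments (weighted covariance matrix)
    cxx = sum(q * (x - cog_x) ** 2 for q, x in zip(charges, xs)) / size
    cyy = sum(q * (y - cog_y) ** 2 for q, y in zip(charges, ys)) / size
    cxy = sum(q * (x - cog_x) * (y - cog_y) for q, x, y in zip(charges, xs, ys)) / size
    # Eigenvalues of the covariance matrix give the squared ellipse axes
    half_trace = 0.5 * (cxx + cyy)
    root = math.sqrt((0.5 * (cxx - cyy)) ** 2 + cxy ** 2)
    length = math.sqrt(half_trace + root)
    width = math.sqrt(max(half_trace - root, 0.0))  # guard against tiny negative rounding
    # Orientation of the major axis with respect to the x axis
    psi = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return {"size": size, "x": cog_x, "y": cog_y,
            "length": length, "width": width, "psi": psi}
```

For three equal-charge pixels on the diagonal, `hillas_parameters([0, 1, 2], [0, 1, 2], [1, 1, 1])` gives $\psi = \pi/4$ and a width of zero, as expected for a perfectly thin image.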

$\gamma$-PhysNet

  • CNN-based (inputs interpolated onto a regular grid)
  • Backbone with dual attention
  • Multi-task architecture
  • Pros
    • No prior hypothesis
    • Less preprocessing (e.g. no cleaning)
  • Cons
    • Tricky to optimize
    • Black-box nature
  • Expectations
    • Best performance
    • Fast inference


Mikaël Jacquemont. [Cherenkov Image Analysis with Deep Multi-Task Learning from Single-Telescope Data.](https://hal.archives-ouvertes.fr/hal-03043188) Theses. Université Savoie Mont Blanc, Nov. 2020.

ImPACT and Model++

  • Simulation of templates
  • Matching templates to real data using a per-pixel likelihood
  • Background rejection using Boosted Decision Trees on discriminant parameters
  • Pros
    • Best overall performance
  • Cons
    • Computationally expensive


R.D. Parsons and J.A. Hinton. [A Monte Carlo template based analysis for air-Cherenkov arrays.](https://arxiv.org/abs/1403.2993) In: Astroparticle Physics 56 (Apr. 2014), pp. 26–34.
Mathieu de Naurois and Loïc Rolland. [A high performance likelihood reconstruction of gamma-rays for imaging atmospheric Cherenkov telescopes.](https://arxiv.org/abs/0907.2610) In: Astroparticle Physics 32.5 (Dec. 2009), pp. 231–252.

FreePACT


  • Neural network to estimate the likelihood-to-evidence ratio
  • Training: a classifier learns to differentiate between samples drawn from the joint PDF and samples drawn from the product of the marginals
  • Pros
    • Faster than ImPACT
    • Yields similar or improved results

Fig. Illustration of the neural network architecture used with FreePACT (dense)

Georg Schwefer, Robert Parsons, and Jim Hinton. [A Hybrid Approach to Event Reconstruction for Atmospheric Cherenkov Telescopes Combining Machine Learning and Likelihood Fitting.](https://arxiv.org/abs/2406.17502) 2024
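The training scheme above is the classifier-based likelihood-ratio trick. A minimal sketch, under stated assumptions: positive pairs $(x, \theta)$ are image and parameters simulated together (joint PDF), negative pairs are built by shuffling the parameters across events (product of the marginals). An ideal classifier $d$ trained with binary cross-entropy on balanced pairs satisfies $d = r/(1+r)$, with $r(x, \theta) = p(x \mid \theta)/p(x)$, so $\log r$ is simply the classifier logit. The helper names are hypothetical and the classifier itself is omitted; FreePACT's actual network and inputs are described in the Schwefer et al. paper.

```python
import math
import random


def make_ratio_training_pairs(images, params, seed=None):
    """Build balanced training pairs for a likelihood-to-evidence ratio classifier.

    label 1: (x, theta) simulated together -> joint p(x, theta)
    label 0: theta shuffled across events  -> product of marginals p(x) p(theta)
    """
    rng = random.Random(seed)
    shuffled = list(params)
    rng.shuffle(shuffled)
    joint = [(x, theta, 1) for x, theta in zip(images, params)]
    marginal = [(x, theta, 0) for x, theta in zip(images, shuffled)]
    return joint + marginal


def log_ratio_from_classifier(d):
    """Convert an (ideal) classifier output d in (0, 1) into log r(x, theta).

    With balanced classes d = r / (1 + r), hence log r = log(d / (1 - d)),
    i.e. the classifier logit.
    """
    if not 0.0 < d < 1.0:
        raise ValueError("classifier output must lie strictly in (0, 1)")
    return math.log(d / (1.0 - d))
```

Since the evidence $p(x)$ does not depend on $\theta$, maximizing the estimated $\log r$ over $\theta$ for a given event is equivalent to maximizing the likelihood $p(x \mid \theta)$, which is what makes the ratio a drop-in replacement for a likelihood fit.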

$\gamma$-PhysNet application to LST-1

Results on Monte-Carlo data


  • Training on Monte-Carlo (MC) data
  • Testing on MC data
  • Computing performance metrics (instrument response functions, IRFs) for both chains (Hillas+RF and $\gamma$-PhysNet)


$\rightarrow$ $\gamma$-PhysNet improves the sensitivity at all energies, with the largest gains at low energies.
IRFs comparison
Comparison of IRFs for Hillas+RF and $\gamma$-PhysNet

Thomas Vuillaume et al. [Analysis of the Cherenkov Telescope Array first Large-Sized Telescope real data using convolutional neural networks](https://arxiv.org/abs/2108.04130.pdf) 2021

Generalization issue - impact of NSB


* Labelled real data are intrinsically unobtainable (the true nature, energy and direction of each particle are unknown)

* MC simulations are approximations of the reality

* The Night Sky Background (NSB) is a key difference between them

Left: an event imaged in dark conditions.
Right: The same event imaged with higher NSB.
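To make the effect of NSB concrete, here is a minimal sketch of how extra night-sky background can be added to a simulated image. It assumes, as a simplification, that the additional NSB contributes an independent Poisson-distributed number of photoelectrons per pixel with a hypothetical mean `extra_nsb_pe`, and that the mean is removed afterwards (pedestal subtraction). Real camera simulations also model pulse shapes, integration windows and electronics noise, all ignored here.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's algorithm (adequate for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def add_nsb(image, extra_nsb_pe, seed=None):
    """Return a copy of `image` (photoelectrons per pixel) with extra NSB noise.

    The added counts are pedestal-subtracted (their mean is removed), so the
    image keeps the same average level but becomes noisier.
    Illustrative assumption only, not a camera simulation.
    """
    rng = random.Random(seed)
    return [q + poisson_sample(extra_nsb_pe, rng) - extra_nsb_pe for q in image]
```

The pixel-wise variance grows with `extra_nsb_pe` while the mean level is unchanged, which is why a model trained on dark-sky simulations sees unfamiliar fluctuations on images taken under brighter skies.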

Impact of NSB on $\gamma$-PhysNet - results on MC data


Comparison of IRFs for $\gamma$-PhysNet at different NSB levels: the same model, trained in dark conditions, is tested at higher NSB levels. Source: M. Dell'aiera's PhD thesis.

Generalization issue - impact of NSB on real data


NSB intensity distribution - comparison of MC and real data
| Reconstruction algorithm | Significance (higher is better) | $\gamma$ excess | Background count |
|---|---|---|---|
| lstchain (Hillas+RF) | 12.0 σ | 379 | 308 |
| $\gamma$-PhysNet | 12.5 σ | 395 | 302 |
| $\gamma$-PhysNet + Background matching | 14.3 σ | 476 | 317 |

Detection capability of $\gamma$-PhysNet on Crab Nebula observations


Thomas Vuillaume et al. [Analysis of the Cherenkov Telescope Array first Large-Sized Telescope real data using convolutional neural networks](https://arxiv.org/abs/2108.04130.pdf) 2021

Results on Crab - Baseline



Research direction: don't change the data, change the model

$\rightarrow$ to produce a more general model
$\rightarrow$ to adapt to unknown differences between MC simulations and real data


1. Inject NSB information into the model

Information fusion


2. Make the model agnostic to changes

Domain adaptation


3. Improve generalization with pre-training

Transformers

Information fusion

Multi-modality: the $\gamma$-PhysNet-CBN architecture


Additional information (such as the NSB level) is passed to the network through conditional batch normalization (CBN)
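Conditional batch normalization can be sketched as follows: features are normalized with batch statistics, as in standard batch normalization, but the per-channel scale $\gamma$ and shift $\beta$ are predicted from the conditioning input (here a scalar NSB level per image) instead of being fixed learned constants. This minimal pure-Python sketch assumes a simple affine mapping from the NSB level to $\gamma$ and $\beta$; the weights `w_gamma`, `b_gamma`, `w_beta`, `b_beta` are hypothetical, and the conditioning network actually used in $\gamma$-PhysNet-CBN may differ.

```python
import math


def conditional_batch_norm(batch, nsb_levels, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Conditional batch normalization over a batch of feature vectors.

    batch:      list of N samples, each a list of C channel values
    nsb_levels: list of N conditioning scalars (e.g. the NSB level of each image)
    w_*, b_*:   per-channel parameters (length C) of a hypothetical affine map
                gamma_c(nsb) = w_gamma[c] * nsb + b_gamma[c], same form for beta
    """
    n, n_channels = len(batch), len(batch[0])
    out = [[0.0] * n_channels for _ in range(n)]
    for c in range(n_channels):
        values = [sample[c] for sample in batch]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n  # biased variance, as in BN
        for i, v in enumerate(values):
            normalized = (v - mean) / math.sqrt(var + eps)
            # Scale and shift depend on each sample's own conditioning value
            gamma = w_gamma[c] * nsb_levels[i] + b_gamma[c]
            beta = w_beta[c] * nsb_levels[i] + b_beta[c]
            out[i][c] = gamma * normalized + beta
    return out
```

With all `w_*` set to zero, `b_gamma` to one and `b_beta` to zero, this reduces to plain batch normalization; non-zero `w_*` let the same network modulate its features according to the observing conditions.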

Results with multi-modality on simulations


$\gamma$-PhysNet-CBN tested for different NSB levels. M. Dell'aiera's PhD thesis.

Results on Crab - $\gamma$-PhysNet-CBN



Domain adaptation

Unsupervised Domain Adaptation (UDA)


**[Domain adaptation](https://arxiv.org/abs/2009.00155): Set of algorithms and techniques to reduce domain discrepancies**

  • Domain $\mathcal{D} = (\mathcal{X}, P(x))$: a feature space $\mathcal{X}$ with a marginal distribution $P(x)$
  • Take into account unknown differences between
    • Source domain (labelled simulations)
    • Target domain (unlabelled real data)
  • Include unlabelled real data in the training
    • No target labels → Unsupervised
  • Selection and improvement of relevant SOTA methods: DANN, DeepJDOT, DeepCORAL

Fig. Domain confusion in the feature space

Yaroslav Ganin et al. [Domain-Adversarial Training of Neural Networks.](https://arxiv.org/abs/1505.07818) 2016.
Bharath Bhushan Damodaran et al. [DeepJDOT: Deep Joint Distribution Optimal Transport for Unsupervised Domain Adaptation.](https://arxiv.org/abs/1803.10081) 2018.
Baochen Sun and Kate Saenko. [Deep CORAL: Correlation Alignment for Deep Domain Adaptation.](https://arxiv.org/abs/1607.01719) 2016.
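Of the methods listed above, Deep CORAL has the simplest loss to write down: it penalizes the distance between the second-order statistics of source and target features, $\ell_{\mathrm{CORAL}} = \frac{1}{4d^2}\lVert C_S - C_T \rVert_F^2$, where $C_S$ and $C_T$ are the $d \times d$ covariance matrices of source and target features (Sun and Saenko, 2016). A pure-Python sketch on small feature matrices follows; in practice the term is computed on a batch of network features and added, with a weight, to the supervised loss on the labelled simulations.

```python
def covariance(features):
    """Unbiased covariance matrix of a list of n feature vectors of dimension d."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [
        [sum((row[a] - means[a]) * (row[b] - means[b]) for row in features) / (n - 1)
         for b in range(d)]
        for a in range(d)
    ]


def coral_loss(source_features, target_features):
    """Deep CORAL loss: squared Frobenius distance between covariances / (4 d^2)."""
    d = len(source_features[0])
    c_s = covariance(source_features)
    c_t = covariance(target_features)
    frob_sq = sum((c_s[a][b] - c_t[a][b]) ** 2 for a in range(d) for b in range(d))
    return frob_sq / (4 * d * d)
```

Because only covariances are compared, a pure shift between domains costs nothing (`coral_loss([[0.0], [2.0]], [[5.0], [7.0]])` is zero), whereas a change of spread does (`coral_loss([[0.0], [2.0]], [[0.0], [4.0]])` is 9.0).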

The $\gamma$-PhysNet-DANN architecture
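The core of DANN-type architectures is the gradient reversal layer (GRL) placed between the shared feature extractor and the domain classifier: it is the identity in the forward pass and multiplies the incoming gradient by $-\lambda$ in the backward pass. The domain classifier thus still learns to separate simulations from real data, while the feature extractor is pushed towards features that confuse it. The framework-free sketch below shows only this rule, together with the adaptation-weight schedule $\lambda_p = \frac{2}{1 + \exp(-10\,p)} - 1$ from Ganin et al., where $p$ is the training progress from 0 to 1. The class and function names are illustrative, and whether $\gamma$-PhysNet-DANN uses this exact schedule is an assumption.

```python
import math


class GradientReversal:
    """Gradient reversal layer (GRL), written without any deep learning framework."""

    def __init__(self, lambda_=1.0):
        self.lambda_ = lambda_

    def forward(self, features):
        # Identity: the domain classifier sees the features unchanged
        return list(features)

    def backward(self, grad_output):
        # Reverse and scale the gradient flowing back to the feature extractor
        return [-self.lambda_ * g for g in grad_output]


def dann_lambda(progress, gamma=10.0):
    """Adaptation weight schedule from Ganin et al.: 0 at the start, close to 1 at the end."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0
```

In PyTorch, the same rule is typically written as a custom `torch.autograd.Function` whose `backward` returns the negated, scaled gradient; the slowly increasing $\lambda$ keeps the noisy domain signal from dominating early training.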


Results on Crab - $\gamma$-PhysNet-(C)DANN



Transformers

Transformer models


Masked Auto-Encoder (MAE)
Fine-tuning

  • Images contain redundancy
  • Keep only 25% of the patches
  • In-painting task (reconstruct the masked patches)
  • Works directly on the hexagonal grid of pixels
  • Uses both simulations and real data
  • Improved generalization


Kaiming He et al. [Masked Autoencoders Are Scalable Vision Learners.](https://arxiv.org/abs/2111.06377) 2021.
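The masking step of the MAE pre-training can be sketched as a uniform random selection of the patches left visible to the encoder. This minimal sketch assumes the image has already been split into `n_patches` patches (for instance groups of neighbouring hexagonal pixels; the patching scheme is an assumption here) and keeps 25% of them, as on the slide.

```python
import random


def random_patch_mask(n_patches, keep_ratio=0.25, seed=None):
    """Split patch indices into visible (kept) and masked sets, MAE-style."""
    rng = random.Random(seed)
    n_keep = max(1, int(n_patches * keep_ratio))
    order = list(range(n_patches))
    rng.shuffle(order)  # uniform random permutation of the patch indices
    kept = sorted(order[:n_keep])
    masked = sorted(order[n_keep:])
    return kept, masked
```

The encoder only processes the kept patches, which keeps pre-training cheap, and the decoder reconstructs the masked ones; as in He et al., the reconstruction loss is computed on the masked patches only.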

Event reconstruction example 1


Fig. Left to right: Initial, masked, reconstructed

Event reconstruction example 2


Fig. Left to right: Initial, masked, reconstructed

Event reconstruction example 3


Fig. Left to right: Initial, masked, reconstructed

The $\gamma$-PhysNet-Prime


Results on Crab - $\gamma$-PhysNet-Prime



Conclusion and perspectives

Conclusions


  • Novel techniques (information fusion, unsupervised domain adaptation, Transformers) implemented to address the discrepancy between simulations and real data
    • Tested on simulations, in different settings (light pollution and label shift)
    • Tested on real data (Crab), both with and without moonlight
  • $\gamma$-PhysNet is much more affected by light pollution than Hillas+RF
  • Domain adaptation, information fusion and Transformers improve performance in degraded conditions without any data adaptation
  • Still, the best results are obtained using *data* adaptation
  • In our case, the NSB dominates over the other, unknown differences
  • Transformers are still under exploration and yield the best results on simulations

Perspectives


  • Dell'aiera 2024 submitted, Dell'aiera 2025 in prep.
  • Incorporate these findings into the LST-1 production analysis
  • Use gammalearn in real-time $\rightarrow$ ANR grant DIRECTA
  • Stereoscopic analysis $\rightarrow$ see Stereograph talk on Thursday

Acknowledgments


  • This project is supported by the facilities offered by the Univ. Savoie Mont Blanc - CNRS/IN2P3 MUST computing center
  • This project was granted access to the HPC resources of IDRIS under the allocation 2020-AD011011577 made by GENCI
  • This project is supported by the computing and data processing resources from the CNRS/IN2P3 Computing Center (Lyon - France)
  • We gratefully acknowledge the support of the NVIDIA Corporation with the donation of one NVIDIA P6000 GPU for this research.
  • We gratefully acknowledge financial support from:
    • the agencies and organizations listed here
    • the Fondation Université Savoie Mont-Blanc
    • the European Union’s Horizon 2020 research and innovation programme under grant agreement No 653477
    • the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824064
    • the European Union Horizon Programme call INFRAEOSC-03-2020 - Grant Agreement No 101017536
  • The author acknowledges the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-23-CE31-0021 (DIRECTA)

Contributors (with many thanks)

  • Hana Ali Messaoud (LAPP)
  • Luca Antiga (Orobix)
  • Alexandre Benoit (LISTIC)
  • Sami Caroff (LAPP)
  • Daniele Ciriello (Orobix)
  • Daniele Cortinovis (Orobix)
  • Michaël Dell'aiera (LAPP, LISTIC)
  • Tom François (LAPP)
  • Mikaël Jacquemont (LAPP, LISTIC)
  • Giovanni Lamanna (LAPP)
  • Patrick Lambert (LISTIC)
  • Gilles Maurin (LAPP)
  • Cyann Plard (LAPP)
  • Vincent Pollet (LAPP)
  • Filippo Quarenghi (Orobix)
  • Giorgia Silvestri (Orobix)
  • Justine Talpeart (LAPP)
  • Thomas Trivellato (LAPP)
  • Thomas Vuillaume, PI (LAPP)
  • Brondon Waffa-Pagou (LAPP)