Unsupervised Anomaly Detection in X-Ray Time Series with Transformers

Python 3.10+ · PyTorch · scikit-learn · License: MIT

Bachelor's Thesis (TFG) — Data Engineering · Universidad Carlos III de Madrid (UC3M) · 2025


Executive Summary

Problem. The XMM-Newton EPIC-pn X-ray telescope produces thousands of multi-band light curves per observation. Identifying astrophysically interesting transients (stellar flares, eclipses, X-ray bursts) in these archives is a labour-intensive expert task with no clean labels — a classic unsupervised anomaly-detection challenge on multivariate time series.

Solution. An end-to-end deep-learning pipeline built on reconstruction-based anomaly scoring. The centrepiece is a Transformer Autoencoder (TAE) trained with a novel Masked Denoising objective: 30 % of input timesteps are randomly zeroed, forcing the model to learn robust temporal representations rather than exploiting the identity shortcut. A complementary validity masking mechanism excludes instrumental background contamination from both training loss and anomaly scoring.
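
For intuition, here is a minimal PyTorch sketch of how a masked-denoising loss with validity masking might look; tensor shapes, names, and the exact weighting details are illustrative assumptions rather than the repository's implementation (the real one lives in src/tfg/models/transformer_AE.py).

import torch

def masked_denoising_loss(model, x, valid_mask, p=0.30):
    # x          : (batch, L, bands) clean input windows
    # valid_mask : (batch, L) bool, True where the telescope data are valid
    drop = torch.rand(x.shape[:2], device=x.device) < p   # choose ~30 % of timesteps
    x_noisy = x * (~drop).unsqueeze(-1).float()           # zero the chosen timesteps
    x_hat = model(x_noisy)                                # model must reconstruct the clean target
    err = (x_hat - x) ** 2
    w = valid_mask.unsqueeze(-1).float()                  # exclude instrumental artefacts
    return (err * w).sum() / w.sum().clamp_min(1.0)

The validity mask keeps background-contaminated timesteps out of the training loss and, downstream, out of the anomaly score.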

Key Results — blind test set, N = 56 signals, stratified 70/30 holdout:

Model                                   AUC-ROC · Scenario 1   AUC-ROC · Scenario 2   Notes
TAE + Masked Denoising (best run)       0.918                  0.915                  Final blind test
TAE + Masked Denoising (5-seed mean)    0.843 ± 0.019          0.856 ± 0.010          Bootstrap 95 % CI [0.820, 0.974]
LSTM-AE + Masked Denoising              0.407 ± 0.022          –                      5-seed mean; masking alone is insufficient
LSTM-AE (no masking)                    0.267                  –                      Identity-shortcut failure
Isolation Forest                        ~0.862                 –                      TAE advantage +5.6 pp (DeLong p = 0.192)
Anomaly Transformer                     < TAE                  < TAE                  Underperforms in all 3 scenarios (p < 0.05)

Scenario 1: all "Interesting" signals · Scenario 2: "Interesting" signals excluding background-type events

Headline findings:

  • Masked denoising is the single largest design improvement: +8.0 pp AUC-ROC in a 43-experiment ablation.
  • The TAE advantage over Isolation Forest is not statistically significant under broad detection (+5.6 pp, p = 0.192) but becomes significant for genuine astrophysical events (+14.3 pp, p = 0.004).
  • The Anomaly Transformer consistently underperforms, attributed to its Gaussian temporal prior mismatching Poisson-noise statistics of X-ray light curves.
  • Post-hoc attention analysis reveals that anomaly detection emerges as a by-product of reconstruction failure rather than learned anomaly spotting: attention entropy stays ≈ 4.0 bits across signals, indistinguishable between anomalous and normal ones.
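
The attention-entropy figure in the last point is the mean Shannon entropy of the attention rows, expressed in bits. A minimal sketch of that computation (the function name and tensor shape are illustrative assumptions):

import torch

def attention_entropy_bits(attn):
    # attn: (heads, L, L) attention weights; each row sums to 1
    p = attn.clamp_min(1e-12)                 # avoid log(0)
    ent = -(p * p.log2()).sum(dim=-1)         # entropy of each query's distribution
    return ent.mean().item()

# For L = 128 the maximum (fully uniform rows) is log2(128) = 7 bits.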

Architecture & Methodology

Raw XMM-Newton EPIC-pn light curves (.parquet)
        │
        ▼
┌─────────────────────────────────────────────┐
│  Sliding Window Extraction                   │
│  L=128, stride=64 · 5 energy bands (RATE1–5)│
└─────────────────────────────────────────────┘
        │
        ▼ optional
┌─────────────────────────────────────────────┐
│  Feature Engineering                         │
│  Hardness Ratios · Total Count Rate          │
└─────────────────────────────────────────────┘
        │
        ▼
┌─────────────────────────────────────────────┐
│  Masked Denoising (30 % timestep masking)    │
│  + Validity Masking (telescope artefacts)    │
│  Loss computed on clean targets only         │
└─────────────────────────────────────────────┘
        │
        ├──────────────────┬──────────────────┐
        ▼                  ▼                  ▼
  Transformer AE    Anomaly Transformer    LSTM AE
  (main model)      (attention-based)     (baseline)
  6L · 4H · d=128   3L · 4H · d=128      2-layer BiLSTM
        │                  │                  │
        └──────────────────┴──────────────────┘
                           │
                           ▼
        ┌─────────────────────────────────────┐
        │  Reconstruction Error Scoring        │
        │  + Association Discrepancy (AT only) │
        └─────────────────────────────────────┘
                           │
                           ▼
        ┌─────────────────────────────────────┐
        │  Hold-Out Evaluation (no leakage)    │
        │  Val N=128 → threshold + model sel.  │
        │  Test N=56 → final blind metrics     │
        │  DeLong tests · Bootstrap 95 % CIs  │
        └─────────────────────────────────────┘
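
For orientation, a minimal NumPy sketch of the sliding-window stage at the top of the pipeline (function and variable names are illustrative assumptions, not the package API in src/tfg/data/datasets.py):

import numpy as np

def extract_windows(signal, L=128, stride=64):
    # signal : (T, 5) count rates for bands RATE1-RATE5
    # returns: (num_windows, L, 5)
    starts = list(range(0, signal.shape[0] - L + 1, stride))
    if not starts:
        return np.empty((0, L, signal.shape[1]))
    return np.stack([signal[s:s + L] for s in starts])

The optional feature-engineering stage adds hardness ratios, typically computed as (H - S) / (H + S) for a hard band H and soft band S, plus the total count rate; the exact band pairings are defined in the dataset code.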

Models

Model                    Architecture                                   Key Innovation
TransformerAE (TAE)      6-layer Transformer encoder–decoder            Masked Denoising (30 % masking) + Validity Masking
AnomalyTransformer (AT)  Transformer + Gaussian prior attention         Association Discrepancy loss
LSTM Autoencoder         Bidirectional LSTM seq2seq                     Masked Denoising variant (ablation baseline)
Isolation Forest         Ensemble on per-window statistical features    Classical ML baseline
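
As a rough illustration of the classical baseline, per-window summary statistics can be fed to scikit-learn's IsolationForest along these lines; the specific statistics below are assumptions for illustration, and the actual feature set lives in scripts/train_isolation_forest.py:

import numpy as np
from sklearn.ensemble import IsolationForest

def window_features(windows):
    # windows: (N, L, bands) -> (N, 4 * bands) summary statistics per window
    return np.concatenate(
        [windows.mean(axis=1), windows.std(axis=1),
         windows.max(axis=1), windows.min(axis=1)], axis=1)

rng = np.random.default_rng(42)
train_windows = rng.poisson(5.0, size=(1000, 128, 5)).astype(float)   # placeholder data
test_windows = rng.poisson(5.0, size=(200, 128, 5)).astype(float)

iso = IsolationForest(n_estimators=200, random_state=42).fit(window_features(train_windows))
scores = -iso.score_samples(window_features(test_windows))            # higher = more anomalous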

Repository Structure

.
├── src/tfg/                          # Core installable package
│   ├── models/
│   │   ├── transformer_AE.py         # TransformerAE architecture + training utilities
│   │   └── anomaly_transformer.py    # AnomalyTransformer architecture + loss
│   ├── data/
│   │   └── datasets.py               # Dataset classes, sliding windows, feature engineering
│   └── inference/
│       └── anomaly_detection.py      # Scoring, thresholding, plotting
│
├── scripts/
│   ├── train_transformer_AE.py       # Train TransformerAE
│   ├── train_anomaly_transformer.py  # Train AnomalyTransformer
│   ├── train_lstm_ae.py              # Train LSTM Autoencoder baseline
│   ├── train_isolation_forest.py     # Train Isolation Forest baseline
│   ├── detect_anomalies.py           # Unified anomaly scoring (TAE / AT)
│   ├── eval_roc.py                   # ROC / PR evaluation at window & signal level
│   ├── generate_val_test_split.py    # Reproducible stratified holdout split
│   ├── compute_delong_tests.py       # Paired DeLong (1988) AUC-ROC comparison
│   ├── compute_confidence_intervals.py  # Bootstrap 95 % CIs
│   ├── compute_cost_table.py         # Cost-sensitive threshold analysis
│   ├── multires_scoring.py           # Multi-resolution window scoring
│   ├── extract_attention_analysis.py # Attention pattern visualisation
│   ├── extract_signal_embeddings.py  # Latent space extraction
│   ├── plot_qualitative_analysis.py  # Qualitative anomaly case studies
│   ├── plot_signal_graph.py          # Signal similarity graph
│   ├── regenerate_figures_local.py   # Reproduce all thesis figures (local)
│   └── regenerate_figures_server.py  # Reproduce all thesis figures (server)
│
├── notebooks/
│   ├── 01_data_exploration.ipynb     # Signal statistics, band distributions, correlations
│   └── 02_mask_inspection.ipynb      # Telescope mask format inspection
│
├── experiments/                      # Bash orchestration for full experiment matrices
│   ├── run_tfg_experiment.sh         # Main TFG ablation matrix (4 groups, 12 configs)
│   ├── run_tfg_experiment2.sh        # Extended experiment batch
│   ├── run_tfg_experiment3.sh        # Final experiment batch
│   ├── run_holdout_evaluation.sh     # End-to-end holdout val → test pipeline
│   ├── run_confidence_intervals.sh   # Bootstrap CI generation
│   ├── run_b3_groupC_masked.sh       # Group C masked denoising ablation
│   └── run_lstm_ae_masked.sh         # LSTM-AE masked denoising ablation
│
├── data/
│   ├── raw/            # XMM-Newton .parquet signals  [NOT committed — see Data section]
│   ├── masks/          # Telescope mask files          [NOT committed]
│   ├── processed/      # Intermediate features         [NOT committed]
│   ├── labels/
│   │   ├── signals_to_assess.xlsx    # Expert anomaly annotations (ground truth)
│   │   └── windows.xlsx              # Window-level label metadata
│   └── splits/
│       ├── data_split.json           # Reproducible train/val/test assignment (seed 42)
│       └── optimal_thresholds.json   # Val-tuned thresholds per model × scenario
│
├── requirements.txt
├── LICENSE
└── README.md

Quickstart

1. Clone & Install

git clone https://github.com/mikelballay/signals_anomaly_detection.git
cd signals_anomaly_detection
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

2. Provide Data

Raw XMM-Newton light curves are not included due to size. Place .parquet files as:

data/raw/full_signals/<obsid>.parquet   # multiband signals (MultiIndex columns: signal_id × band)
data/masks/<obsid>.parquet             # telescope validity masks (optional)

The ground-truth labels (data/labels/) and the train/val/test split (data/splits/) are committed and ready to use.

3. Train the Best Model (TAE + Masked Denoising)

python scripts/train_transformer_AE.py \
    --data_dir   data/raw/full_signals \
    --out_dir    results/runs/tae_masked \
    --signals    data/labels/signals_to_assess.xlsx \
    --windows    data/labels/windows.xlsx \
    --L 128 --stride 64 --epochs 50 --batch 64 \
    --lr 5e-4 --d_model 128 --nhead 4 --num_layers 3 \
    --loss poisson --extra_features all \
    --denoise_mode mask --denoise_p 0.30 \
    --mask_dir data/masks

4. Train Baselines

# LSTM Autoencoder + Masked Denoising
python scripts/train_lstm_ae.py \
    --data_dir data/raw/full_signals --out_dir results/baselines/lstm_ae \
    --signals data/labels/signals_to_assess.xlsx --windows data/labels/windows.xlsx \
    --L 128 --stride 64 --hidden 128 --denoise_p 0.30

# Isolation Forest
python scripts/train_isolation_forest.py \
    --data_dir data/raw/full_signals --out_dir results/baselines/isolation_forest \
    --signals data/labels/signals_to_assess.xlsx --windows data/labels/windows.xlsx \
    --L 128 --stride 64

5. Score & Evaluate

# Score windows
python scripts/detect_anomalies.py \
    --model_type tae \
    --data_dir data/raw/full_signals \
    --run_dir results/runs/tae_masked/<run_id> \
    --thr_mode percentile --thr_value 99.0

# ROC / PR curves against ground truth
python scripts/eval_roc.py \
    --signals data/labels/signals_to_assess.xlsx \
    --windows data/labels/windows.xlsx \
    --scores  results/runs/tae_masked/<run_id>/anomaly_scores.csv \
    --output-dir results/eval/tae_masked_simple \
    --label-mode simple

6. Full Hold-Out Evaluation + Statistical Tests

# End-to-end: val threshold tuning → blind test → DeLong + bootstrap CIs
bash experiments/run_holdout_evaluation.sh

# Standalone DeLong test (TAE vs IF, TAE vs AT)
python scripts/compute_delong_tests.py \
    --tae_scores results/runs/tae_masked/ \
    --if_scores  results/baselines/isolation_forest/anomaly_scores.csv \
    --at_scores  results/runs/at_baseline/ \
    --split test --split-file data/splits/data_split.json

Hyperparameter Guide

Argument           Values                 Notes
--loss             mse, huber, poisson    Poisson-weighted MSE best matches count-rate statistics
--extra_features   none, total, hr, all   Hardness ratios improve band-relative anomaly scoring
--denoise_mode     none, mask, gaussian   mask with p=0.30 is the validated best configuration
--denoise_p        0.0–1.0                Masking fraction; 0.30 confirmed by ablation
--mask_dir         path                   Telescope validity masks; enables validity masking in the loss
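
"Poisson-weighted" can be read as an inverse-variance weighting of the squared error, since for Poisson counts the variance scales with the rate. A plausible sketch of such a loss (an assumption, not the exact form in src/tfg/models/transformer_AE.py):

import torch

def poisson_weighted_mse(x_hat, x, eps=1.0):
    # Down-weight residuals where the expected Poisson variance (~ rate) is large
    return (((x_hat - x) ** 2) / (x.clamp_min(0.0) + eps)).mean()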

Data Format

Signals (data/raw/full_signals/<obsid>.parquet): pandas.DataFrame with MultiIndex columns (signal_id, band), band ∈ {RATE1, …, RATE5}. Rows = time steps, values = photon count rates.

Masks (data/masks/<obsid>.parquet): same naming, columns (signal_id, "OK"). Boolean: True = valid, False = artefact/background.
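
A minimal snippet for loading one observation and pulling out a single signal's five bands (the obsid and the way the first signal is selected are placeholders for illustration):

import pandas as pd

signals = pd.read_parquet("data/raw/full_signals/0123456789.parquet")    # placeholder obsid
masks = pd.read_parquet("data/masks/0123456789.parquet")                 # optional validity masks

signal_id = signals.columns.get_level_values(0)[0]                       # first signal in the file
light_curve = signals[signal_id][["RATE1", "RATE2", "RATE3", "RATE4", "RATE5"]]
valid = masks[signal_id]["OK"] if signal_id in masks.columns.get_level_values(0) else None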


Citation

@thesis{ballay2025anomaly,
  author  = {Mikel Ballay},
  title   = {Unsupervised Anomaly Detection in X-Ray Time Series with Transformers},
  school  = {Universidad Carlos III de Madrid},
  year    = {2025},
  type    = {Bachelor's Thesis (TFG), Data Engineering}
}

Full thesis PDF: [link to be added after defence]


Author

Mikel Ballay · mikel.ballay@gmail.com
Data Engineering · UC3M · 2025
