How LARUN Works
From raw telescope data to validated exoplanet candidates - a complete walkthrough of our detection and analysis pipeline.
The Complete Pipeline
1. Data Acquisition
Fetch light curves from NASA MAST archive
2. Preprocessing
Clean, normalize, and detrend data
3. BLS Analysis
Search for periodic transit signals
4. TinyML Detection
Neural network confirms signals
5. Vetting
Calculate false positive probability
6. Report
Generate analysis summary
Stage 1: Data Acquisition
┌─────────────────────────────────────────────────────────────┐
│ USER REQUEST │
│ "Analyze TIC 307210830" │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ TARGET RESOLVER │
│ TIC ID → Coordinates → Cross-match with catalogs │
└─────────────────────────────────────────────────────────────┘
│
┌───────────────┼───────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ TESS │ │ Kepler │ │ Gaia │
│ MAST │ │ MAST │ │ DR3 │
└──────────┘ └──────────┘ └──────────┘
│ │ │
└───────────────┼───────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LIGHT CURVE DATA │
│ Time (BJD) │ Flux │ Flux Error │ Quality Flags │
└─────────────────────────────────────────────────────────────┘
What Happens
- Parse user request to extract target identifier
- Resolve TIC/KIC ID to celestial coordinates
- Query NASA MAST archive for available observations
- Download FITS files with light curve data
- Fetch stellar parameters from Gaia DR3
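The first step above, extracting a target identifier from a free-text request, can be sketched as follows. This is a minimal illustration rather than LARUN's actual parser: the `Target` type, the `parse_target` function, and the accepted spellings (`TIC 123`, `tic123`, `KIC-123`) are all assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class Target(NamedTuple):
    """Hypothetical container for a resolved catalog identifier."""
    catalog: str     # "TIC" (TESS Input Catalog) or "KIC" (Kepler Input Catalog)
    identifier: int


# Accepts "TIC 307210830", "tic307210830", "KIC-11904151", and similar spellings.
_TARGET_RE = re.compile(r"\b(TIC|KIC)[\s\-_]*(\d+)\b", re.IGNORECASE)


def parse_target(request: str) -> Optional[Target]:
    """Return the first TIC/KIC identifier found in the request, or None."""
    match = _TARGET_RE.search(request)
    if match is None:
        return None
    return Target(match.group(1).upper(), int(match.group(2)))


target = parse_target("Analyze TIC 307210830")
```

Resolving the identifier to coordinates and downloading FITS files requires network access to MAST and is deliberately left out of this sketch.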
Data Sources
| Source | Data Type | Cadence |
|---|---|---|
| TESS FFI | Full-frame images | 30 min / 10 min |
| TESS 2-min | Target pixel files | 2 minutes |
| TESS 20-sec | Fast cadence | 20 seconds |
| Kepler | Long/short cadence | 30 min / 1 min |
Stage 2: Preprocessing
Steps
- Quality Filtering: Remove data points flagged for spacecraft anomalies, cosmic rays, or other issues
- Outlier Removal: Sigma-clipping to remove statistical outliers (typically 5σ)
- Gap Filling: Handle missing data from Earth occultations, momentum dumps
- Normalization: Scale flux to median = 1.0 for consistent analysis
- Detrending: Remove long-term stellar variability using polynomial or spline fits
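The first steps in this list (quality filtering, sigma-clipping, and median normalization) can be sketched in plain Python. The function names are hypothetical, a quality flag of 0 is assumed to mean "good", and the scatter is estimated with the median absolute deviation, which is an assumption of this example rather than something the pipeline description specifies. Gap handling and detrending are not included.

```python
from statistics import median


def sigma_clip(time, flux, sigma=5.0, max_iter=5):
    """Iteratively drop points more than `sigma` robust deviations from the median."""
    t, f = list(time), list(flux)
    for _ in range(max_iter):
        med = median(f)
        # MAD * 1.4826 approximates the standard deviation for Gaussian noise
        scale = 1.4826 * median(abs(x - med) for x in f)
        if scale == 0:
            break
        keep = [abs(x - med) <= sigma * scale for x in f]
        if all(keep):
            break
        t = [ti for ti, k in zip(t, keep) if k]
        f = [fi for fi, k in zip(f, keep) if k]
    return t, f


def preprocess(time, flux, quality, sigma=5.0):
    """Quality-filter, sigma-clip, and normalize a light curve to median 1.0."""
    # Keep only points whose quality flag is 0 (assumed to mean "no issue")
    good = [(t, f) for t, f, q in zip(time, flux, quality) if q == 0]
    t = [p[0] for p in good]
    f = [p[1] for p in good]
    t, f = sigma_clip(t, f, sigma=sigma)
    med = median(f)
    return t, [x / med for x in f]
```

Note that a symmetric clip like this one can also remove points inside very deep transits or eclipses; clipping only upward outliers is a common alternative when that is a concern.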
Detrending Methods
- Polynomial: Fast, good for simple trends
- Spline: Flexible, handles complex variability
- Gaussian Process: Best for active stars, computationally expensive
- CoFiAM: Cosine Filtering with Autocorrelation Minimization, originally developed for Kepler light curves
[Figure: example light curve before and after preprocessing]
Stage 3: BLS Analysis
Box Least Squares (BLS)
BLS is a standard, widely used algorithm for finding periodic transit signals. It searches for box-shaped dips in the light curve by phase-folding the data at thousands of trial periods and fitting a two-level (in-transit and out-of-transit) model at each one.
How It Works
- Define period search range (0.5 - 15 days typical)
- For each trial period, phase-fold the light curve
- Fit a box-shaped transit model
- Calculate Signal Detection Efficiency (SDE)
- Peaks in SDE indicate potential transit periods
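The steps above can be sketched as a brute-force, pure-Python search. This is a simplified illustration under stated assumptions, not LARUN's implementation: the function names, the phase-bin count, the maximum box width, and the linear period grid are all choices made for the example. The per-period statistic is the signal residue of the original BLS formulation with uniform weights, and the SDE is computed as the peak's distance from the mean of that statistic in units of its standard deviation.

```python
import math
import random
from statistics import mean, pstdev


def box_signal_residue(time, flux, period, n_bins=50, max_width_bins=8):
    """Best box-fit signal residue (SR) for one trial period."""
    n = len(flux)
    avg = mean(flux)
    sums = [0.0] * n_bins   # summed mean-subtracted flux per phase bin
    counts = [0] * n_bins   # number of points per phase bin
    for t, f in zip(time, flux):
        b = int((t % period) / period * n_bins) % n_bins
        sums[b] += f - avg
        counts[b] += 1
    best = 0.0
    for start in range(n_bins):
        s, r = 0.0, 0
        for width in range(max_width_bins):
            b = (start + width) % n_bins   # boxes may wrap around phase 1 -> 0
            s += sums[b]
            r += counts[b]
            if s < 0 and 0 < r < n:        # only dips count as transit candidates
                best = max(best, s * s / (r * (n - r)))
    return math.sqrt(best)


def bls_search(time, flux, periods):
    """Return (best_period, SDE) over a grid of trial periods."""
    sr = [box_signal_residue(time, flux, p) for p in periods]
    mu, sd = mean(sr), pstdev(sr)
    if sd == 0:
        return periods[0], 0.0
    best = max(range(len(sr)), key=sr.__getitem__)
    return periods[best], (sr[best] - mu) / sd


def synthetic_light_curve(period=3.0, depth=0.01, duration=0.1, seed=42):
    """27 days at 30-minute cadence with box-shaped transits and Gaussian noise."""
    rng = random.Random(seed)
    time = [i * 0.5 / 24 for i in range(27 * 48)]
    flux = [
        1.0 + rng.gauss(0.0, 0.001) - (depth if (t % period) < duration else 0.0)
        for t in time
    ]
    return time, flux


time, flux = synthetic_light_curve()
periods = [1.0 + 0.01 * i for i in range(501)]   # trial periods from 1.0 to 6.0 days
best_period, sde = bls_search(time, flux, periods)
```

A production search would normally use a grid that is uniform in frequency and a vectorized implementation; the linear grid and nested loops here are only meant to keep each step visible.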
Key Parameters
| Parameter | Description | Typical Value |
|---|---|---|
| period_min | Minimum search period | 0.5 days |
| period_max | Maximum search period | 15 days |
| duration_grid | Transit duration range | 0.01 - 0.15 (phase) |
| frequency_factor | Oversampling factor | 5-10 |
Stage 4: TinyML Detection
Neural Network Confirmation
After BLS identifies potential signals, our TinyML models provide independent confirmation. This two-stage approach reduces false positives.
Model Federation
Multiple specialized models vote on each candidate:
- EXOPLANET-001: Transit shape classifier
- VSTAR-001: Rules out eclipsing binaries
- FLARE-001: Identifies stellar flare contamination
Ensemble Voting
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ EXOPLANET-001 │ │ VSTAR-001 │ │ FLARE-001 │
│ Transit? │ │ Variable? │ │ Flare? │
│ P = 0.85 │ │ P = 0.12 │ │ P = 0.03 │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
│ │ │
└────────────┬────┴────────────────┘
│
▼
┌────────────────┐
│ WEIGHTED VOTE │
│ Transit: 0.78 │
│ EB: 0.15 │
│ Other: 0.07 │
└────────────────┘
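One simple way to combine per-model probabilities into class scores is a normalized weighted sum, sketched below. The mapping of each model to a class label and the equal weights are assumptions made for illustration; the actual LARUN weighting is not described here, so this sketch is not expected to reproduce the numbers in the diagram.

```python
def weighted_vote(votes):
    """Combine (label, probability, weight) votes into normalized class scores."""
    totals = {}
    for label, probability, weight in votes:
        totals[label] = totals.get(label, 0.0) + weight * probability
    norm = sum(totals.values())
    if norm <= 0:
        raise ValueError("at least one vote needs positive weight and probability")
    return {label: score / norm for label, score in totals.items()}


# Hypothetical label mapping and equal weights for the three models
votes = [
    ("transit", 0.85, 1.0),  # EXOPLANET-001
    ("eb", 0.12, 1.0),       # VSTAR-001
    ("other", 0.03, 1.0),    # FLARE-001
]
scores = weighted_vote(votes)
winner = max(scores, key=scores.get)
```

Raising the weight of one model shifts the normalized scores toward its class, which is how a more trusted classifier could be given more influence.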
Stage 5: Vetting & FPP
False Positive Probability
Not every transit-like signal is a planet. LARUN estimates the probability that the signal comes from an astrophysical false positive, such as an eclipsing binary, rather than a planet.
False Positive Scenarios
| Scenario | Description | Distinguishing Features |
|---|---|---|
| EB | Eclipsing binary | Deep eclipses, secondary eclipse, odd-even differences |
| BEB | Background eclipsing binary | Centroid shift during eclipse, color-dependent depth |
| HEB | Hierarchical eclipsing binary (eclipsing pair in a triple system) | RV variations, diluted eclipses |
| NEB | Nearby eclipsing binary | Contamination from nearby star |
Vetting Tests
- Secondary Eclipse: Check for a second, shallower dip in brightness near phase 0.5
- Odd-Even Test: Compare depths of alternating transits
- Centroid Analysis: Does the photocenter shift during transit?
- Depth vs. Aperture: Does depth change with aperture size?
- Ephemeris Match: Check against known eclipsing binary catalogs
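As an illustration of one of these tests, the odd-even comparison can be sketched as below. The function name and signature are hypothetical, the flux is assumed to be normalized to 1.0 out of transit, and the significance is a simple difference of mean in-transit depths divided by the combined standard error, which is one reasonable choice rather than LARUN's documented statistic.

```python
import math
from statistics import mean, stdev


def odd_even_significance(time, flux, period, t0, duration):
    """Significance (in sigma) of the depth difference between odd and even transits."""
    odd, even = [], []
    for t, f in zip(time, flux):
        n = round((t - t0) / period)            # index of the nearest transit
        if abs(t - (t0 + n * period)) <= duration / 2:
            (even if n % 2 == 0 else odd).append(f)
    if len(odd) < 2 or len(even) < 2:
        raise ValueError("need at least two in-transit points for each parity")
    depth_odd = 1.0 - mean(odd)                 # assumes out-of-transit flux of 1.0
    depth_even = 1.0 - mean(even)
    err = math.sqrt(stdev(odd) ** 2 / len(odd) + stdev(even) ** 2 / len(even))
    diff = abs(depth_odd - depth_even)
    if err == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / err
```

A large value suggests the "transits" alternate in depth, which is the signature of an eclipsing binary detected at half its true orbital period.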
Stage 6: Characterization & Reporting
Transit Model Fitting
For validated candidates, we fit a detailed transit model to extract planetary parameters.
Derived Parameters
| Parameter | Derivation | Accuracy |
|---|---|---|
| Period | BLS + transit timing | ~0.001% |
| T0 | Transit center fitting | ~1 minute |
| Rp/Rs | Transit depth | ~5% |
| a/Rs | Transit duration + stellar density | ~10% |
| i (inclination) | Impact parameter fitting | ~2° |
| Rp | Rp/Rs × stellar radius | ~10% |
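The last row of the table can be made concrete with a short worked example. To first order, ignoring limb darkening and grazing geometries, the transit depth equals (Rp/Rs)², so the radius ratio is the square root of the depth and the planet radius follows by multiplying by the stellar radius. The function name is hypothetical; the radius constants are the IAU 2015 nominal values.

```python
import math

R_SUN_KM = 695_700.0    # nominal solar radius
R_EARTH_KM = 6_378.1    # nominal equatorial Earth radius


def planet_radius_earth(depth, stellar_radius_sun):
    """Planet radius in Earth radii from fractional transit depth and stellar radius.

    Uses depth = (Rp/Rs)**2, a first-order approximation that ignores
    limb darkening and grazing transits.
    """
    if depth <= 0 or stellar_radius_sun <= 0:
        raise ValueError("depth and stellar radius must be positive")
    radius_ratio = math.sqrt(depth)
    return radius_ratio * stellar_radius_sun * R_SUN_KM / R_EARTH_KM


# A 1% deep transit of a Sun-sized star implies a roughly Jupiter-sized planet
rp = planet_radius_earth(depth=0.01, stellar_radius_sun=1.0)
```

Because Rp depends on the stellar radius as well as on Rp/Rs, its uncertainty is at least as large as that of the radius ratio, which is consistent with the table listing a looser accuracy for Rp.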
Report Contents
- Target summary and stellar parameters
- Light curve plots (raw, detrended, phase-folded)
- BLS periodogram with detected period
- Transit model fit and residuals
- False positive probability breakdown
- Planet parameter estimates
- Habitable zone assessment
- Recommendations for follow-up
Model Training & Deployment
Automated Training Pipeline
LARUN models are continuously improved through automated training on new data.
┌─────────────────────────────────────────────────────────────────┐
│ TRAINING ORCHESTRATOR │
│ Schedule │ Data Fetcher │ Trainer │ Validator │ Publisher │
└─────────────────────────────────────────────────────────────────┘
│ │ │ │ │
┌────┴────┐ ┌────┴────┐ ┌─────┴─────┐ ┌───┴───┐ ┌────┴────┐
│ Weekly │ │ NASA │ │ GPU/CPU │ │ Test │ │ GitHub │
│ CRON │ │ MAST │ │ Training │ │ Suite │ │ Release │
└─────────┘ └─────────┘ └───────────┘ └───────┘ └─────────┘
Quality Gates
- Accuracy: Must exceed 80% on test set
- Size: Must be under 100KB after quantization
- Inference: Must complete in under 50ms
- Regression: No performance drop on validation set
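The gates above can be expressed as a single check that returns the names of any failed gates. This is a sketch under stated assumptions: the `ModelMetrics` fields are hypothetical, 100KB is interpreted as 100 × 1024 bytes, and "no performance drop" is interpreted as the new validation score being at least the previous one.

```python
from dataclasses import dataclass
from typing import List

MIN_ACCURACY = 0.80            # must be exceeded on the test set
MAX_SIZE_BYTES = 100 * 1024    # assumption: 100KB means 100 * 1024 bytes
MAX_INFERENCE_MS = 50.0


@dataclass(frozen=True)
class ModelMetrics:
    """Hypothetical metrics record produced by a validation run."""
    accuracy: float                    # fraction in [0, 1] on the test set
    size_bytes: int                    # size after quantization
    inference_ms: float                # time for one inference
    validation_score: float            # score on the validation set
    previous_validation_score: float   # score of the currently deployed model


def failed_gates(m: ModelMetrics) -> List[str]:
    """Return the names of all quality gates the model fails (empty if it passes)."""
    failures = []
    if not m.accuracy > MIN_ACCURACY:
        failures.append("accuracy")
    if not m.size_bytes < MAX_SIZE_BYTES:
        failures.append("size")
    if not m.inference_ms < MAX_INFERENCE_MS:
        failures.append("inference")
    if m.validation_score < m.previous_validation_score:
        failures.append("regression")
    return failures
```

Returning every failed gate, rather than stopping at the first, makes a rejected training run easier to diagnose.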
Deployment Strategy
Models are deployed using a blue-green strategy for zero-downtime updates:
- New model deployed to "blue" slot
- Traffic gradually shifted from "green" to "blue"
- If errors detected, instant rollback to "green"
- On success, "blue" becomes primary
Ready to Discover Exoplanets?
Now that you understand how LARUN works, try it yourself!