The Complete Pipeline

1. Data Acquisition

Fetch light curves from the NASA MAST archive

2. Preprocessing

Clean, normalize, and detrend data

3. BLS Analysis

Search for periodic transit signals

4. TinyML Detection

Neural network confirms signals

5. Vetting

Calculate false positive probability

6. Report

Generate analysis summary

Stage 1: Data Acquisition

┌─────────────────────────────────────────────────────────────┐
│                       USER REQUEST                           │
│              "Analyze TIC 307210830"                         │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    TARGET RESOLVER                           │
│     TIC ID → Coordinates → Cross-match with catalogs        │
└─────────────────────────────────────────────────────────────┘
                            │
            ┌───────────────┼───────────────┐
            │               │               │
            ▼               ▼               ▼
     ┌──────────┐    ┌──────────┐    ┌──────────┐
     │   TESS   │    │  Kepler  │    │   Gaia   │
     │   MAST   │    │   MAST   │    │  DR3     │
     └──────────┘    └──────────┘    └──────────┘
            │               │               │
            └───────────────┼───────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    LIGHT CURVE DATA                          │
│        Time (BJD) │ Flux │ Flux Error │ Quality Flags       │
└─────────────────────────────────────────────────────────────┘
            

What Happens

  • Parse user request to extract target identifier
  • Resolve TIC/KIC ID to celestial coordinates
  • Query the NASA MAST archive for available observations
  • Download FITS files with light curve data
  • Fetch stellar parameters from Gaia DR3
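The first step above, extracting the target identifier from a free-text request, can be sketched as follows. This is a minimal illustration, not LARUN's actual parser: the `Target` type, the `parse_target` function, and the accepted spellings are all assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class Target(NamedTuple):
    """Hypothetical container for a resolved catalog identifier."""
    catalog: str  # "TIC" (TESS Input Catalog) or "KIC" (Kepler Input Catalog)
    number: int


# Accepts spellings such as "TIC 307210830", "tic307210830", or "KIC-11446443".
_TARGET_RE = re.compile(r"\b(TIC|KIC)[\s\-_]*(\d+)\b", re.IGNORECASE)


def parse_target(request: str) -> Optional[Target]:
    """Return the first TIC/KIC identifier found in the request, or None."""
    match = _TARGET_RE.search(request)
    if match is None:
        return None
    return Target(match.group(1).upper(), int(match.group(2)))


target = parse_target("Analyze TIC 307210830")
```

The resolved identifier would then be handed to the coordinate lookup and archive query steps, which need network access and are not shown here.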

Data Sources

Source      │ Data Type          │ Cadence
TESS FFI    │ Full-frame images  │ 30 min / 10 min
TESS 2-min  │ Target pixel files │ 2 minutes
TESS 20-sec │ Fast cadence       │ 20 seconds
Kepler      │ Long/short cadence │ 30 min / 1 min

Stage 2: Preprocessing

Steps

  1. Quality Filtering: Remove data points flagged for spacecraft anomalies, cosmic rays, or other issues
  2. Outlier Removal: Sigma-clipping to remove statistical outliers (typically 5σ)
  3. Gap Filling: Handle missing data caused by Earth occultations and momentum dumps
  4. Normalization: Scale flux to median = 1.0 for consistent analysis
  5. Detrending: Remove long-term stellar variability using polynomial or spline fits
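Steps 1, 2, and 4 above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not LARUN's implementation: the function name `preprocess` is hypothetical, a quality flag of 0 is assumed to mean "good", and the scatter is estimated from the median absolute deviation so that the outliers being removed do not inflate the clipping threshold.

```python
import math
from statistics import median


def preprocess(time, flux, quality, sigma=5.0):
    """Quality-filter, sigma-clip, and normalize a light curve (sketch)."""
    # Step 1: keep cadences with a clean quality flag and a finite flux value.
    kept = [(t, f) for t, f, q in zip(time, flux, quality)
            if q == 0 and math.isfinite(f)]

    # Step 2: sigma clipping around the median, using a robust scatter
    # estimate (1.4826 * MAD approximates one standard deviation for
    # Gaussian noise). Clipping is symmetric here; clipping only upward
    # is a common variant that protects deep transits.
    values = [f for _, f in kept]
    center = median(values)
    robust_sigma = 1.4826 * median(abs(f - center) for f in values)
    if robust_sigma > 0:
        kept = [(t, f) for t, f in kept
                if abs(f - center) <= sigma * robust_sigma]

    # Step 4: scale so that the median flux is 1.0.
    norm = median(f for _, f in kept)
    return [t for t, _ in kept], [f / norm for _, f in kept]


time = list(range(10))
flux = [100.0, 101.0, 99.0, 100.5, 99.5, 500.0, 100.0, float("nan"), 100.2, 99.8]
quality = [0, 0, 0, 0, 0, 0, 4, 0, 0, 0]
clean_time, clean_flux = preprocess(time, flux, quality)
```

In this toy input the flagged cadence, the NaN, and the 500.0 spike are dropped, leaving seven points.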

Detrending Methods

  • Polynomial: Fast, good for simple trends
  • Spline: Flexible, handles complex variability
  • Gaussian Process: Best for active stars, computationally expensive
  • CoFiAM: Cosine Filtering with Autocorrelation Minimization, originally developed for Kepler data
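As a minimal sketch of the simplest method in the list, the snippet below removes a first-order polynomial (a straight line) fitted by closed-form least squares. The function name `detrend_linear` is hypothetical, and a real pipeline would normally use a higher-order fit or a spline and mask in-transit points before fitting so that the transit itself is not absorbed into the trend.

```python
def detrend_linear(time, flux):
    """Divide out a least-squares straight-line trend (degree-1 polynomial)."""
    n = len(time)
    t_mean = sum(time) / n
    f_mean = sum(flux) / n
    # Closed-form least-squares slope for a straight line.
    s_tt = sum((t - t_mean) ** 2 for t in time)
    s_tf = sum((t - t_mean) * (f - f_mean) for t, f in zip(time, flux))
    slope = s_tf / s_tt
    trend = [f_mean + slope * (t - t_mean) for t in time]
    # Dividing (rather than subtracting) preserves relative transit depth.
    return [f / tr for f, tr in zip(flux, trend)]


time = [0.0, 1.0, 2.0, 3.0, 4.0]
flux = [1.0 + 0.01 * t for t in time]  # pure linear drift, no transit
flat = detrend_linear(time, flux)
```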

Before Preprocessing

Raw light curve with systematics, outliers, and stellar variability

After Preprocessing

Clean, normalized light curve ready for transit search

Stage 3: BLS Analysis

Box Least Squares (BLS)

BLS is the most widely used algorithm for finding periodic transit signals. It searches for box-shaped dips in the light curve by testing thousands of trial periods.

How It Works

  1. Define period search range (typically 0.5 to 15 days)
  2. For each trial period, phase-fold the light curve
  3. Fit a box-shaped transit model
  4. Calculate Signal Detection Efficiency (SDE)
  5. Peaks in SDE indicate potential transit periods
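The five steps above can be sketched in pure Python as follows. This is a simplified illustration rather than LARUN's search code: the function names, the phase-bin count, the duration fractions, and the uniform period grid are assumptions, and the demo searches a narrower range than the typical 0.5 to 15 days to keep the pure-Python loop fast. Production code would more likely rely on an optimized implementation such as `astropy.timeseries.BoxLeastSquares`.

```python
import math
from statistics import mean, pstdev


def box_signal(time, flux, period, n_bins=100,
               duration_fracs=(0.01, 0.02, 0.05, 0.10, 0.15)):
    """Best box-fit signal residue (SR) for one trial period."""
    n = len(time)
    flux_mean = sum(flux) / n
    # Step 2: phase-fold, accumulating mean-subtracted flux into phase bins.
    bin_sum = [0.0] * n_bins
    bin_count = [0] * n_bins
    for t, f in zip(time, flux):
        b = min(int(((t / period) % 1.0) * n_bins), n_bins - 1)
        bin_sum[b] += f - flux_mean
        bin_count[b] += 1
    # Step 3: slide boxes of several widths across the folded light curve.
    best = 0.0
    for frac in duration_fracs:
        width = max(1, round(frac * n_bins))
        for start in range(n_bins):
            bins = [(start + k) % n_bins for k in range(width)]
            s = sum(bin_sum[b] for b in bins) / n    # summed in-box deviation
            r = sum(bin_count[b] for b in bins) / n  # fraction of points in box
            if s < 0.0 and 0.0 < r < 1.0:            # consider dips only
                best = max(best, s * s / (r * (1.0 - r)))
    return math.sqrt(best)


def bls_search(time, flux, periods):
    """Return (best_period, sde) over the supplied trial periods."""
    spectrum = [box_signal(time, flux, p) for p in periods]
    peak = max(spectrum)
    # Step 4: SDE = (peak - mean) / standard deviation of the SR spectrum.
    sde = (peak - mean(spectrum)) / pstdev(spectrum)
    # Step 5: the highest peak marks the candidate period.
    return periods[spectrum.index(peak)], sde


# Synthetic, noise-free light curve: 1% deep transits every 3 days.
time = [0.02 * i for i in range(1500)]
flux = [0.99 if abs((t - 1.0 + 1.5) % 3.0 - 1.5) < 0.06 else 1.0 for t in time]
periods = [1.0 + 0.01 * i for i in range(401)]  # Step 1: 1.0 to 5.0 days
best_period, sde = bls_search(time, flux, periods)
```

Binning the folded light curve before sliding the box keeps the cost per trial period roughly independent of the number of data points, which matters when thousands of periods are tested.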

Key Parameters

Parameter        │ Description            │ Typical Value
period_min       │ Minimum search period  │ 0.5 days
period_max       │ Maximum search period  │ 15 days
duration_grid    │ Transit duration range │ 0.01 - 0.15 (phase)
frequency_factor │ Oversampling factor    │ 5-10
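To show how these parameters might fit together, here is a small sketch that bundles them into a configuration object and builds a trial-period grid that is uniform in frequency. The `BLSConfig` class, the `trial_periods` function, and in particular the spacing rule (frequency step = 1 / (frequency_factor × observation baseline)) are assumptions made for illustration; the source does not specify how LARUN interprets `frequency_factor`, and other libraries define similarly named parameters differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BLSConfig:
    """Hypothetical container for the search parameters in the table."""
    period_min: float = 0.5        # days
    period_max: float = 15.0       # days
    duration_min: float = 0.01     # fraction of orbital phase
    duration_max: float = 0.15     # fraction of orbital phase
    frequency_factor: float = 5.0  # oversampling factor (assumed meaning)


def trial_periods(config: BLSConfig, baseline_days: float) -> list:
    """Trial periods on a grid that is uniform in frequency (sketch)."""
    f_min = 1.0 / config.period_max
    f_max = 1.0 / config.period_min
    # Assumed rule: the natural frequency resolution 1/baseline,
    # oversampled by frequency_factor.
    df = 1.0 / (config.frequency_factor * baseline_days)
    n_steps = int(math.floor((f_max - f_min) / df))
    return [1.0 / (f_min + i * df) for i in range(n_steps + 1)]


# One TESS sector spans roughly 27 days.
grid = trial_periods(BLSConfig(), baseline_days=27.0)
```

A grid that is uniform in frequency places more trial periods at the short-period end, where a small change in period shifts the folded transits by a larger fraction of a cycle.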

Stage 4: TinyML Detection

Neural Network Confirmation

After BLS identifies potential signals, our TinyML models provide independent confirmation. This two-stage approach reduces false positives.

Model Federation

Multiple specialized models vote on each candidate:

  • EXOPLANET-001: Transit shape classifier
  • VSTAR-001: Rules out eclipsing binaries
  • FLARE-001: Identifies stellar flare contamination

Ensemble Voting

┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ EXOPLANET-001 │ │   VSTAR-001   │ │  FLARE-001    │
│    Transit?   │ │   Variable?   │ │    Flare?     │
│   P = 0.85    │ │   P = 0.12    │ │   P = 0.03    │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
        │                 │                 │
        └────────────┬────┴────────────────┘
                     │
                     ▼
            ┌────────────────┐
            │ WEIGHTED VOTE  │
            │  Transit: 0.78 │
            │  EB: 0.15      │
            │  Other: 0.07   │
            └────────────────┘
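One simple way to combine the three model outputs is a normalized weighted sum, sketched below. The `weighted_vote` function, the class labels, and the weights are hypothetical; the source does not state how LARUN weights its models, so this sketch is not expected to reproduce the exact numbers in the diagram.

```python
def weighted_vote(scores, weights):
    """Combine per-class model scores into normalized class probabilities."""
    weighted = {label: weights[label] * p for label, p in scores.items()}
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("at least one weighted score must be positive")
    return {label: value / total for label, value in weighted.items()}


# Model outputs from the diagram, keyed by the class each model tests for.
scores = {"transit": 0.85, "eb": 0.12, "other": 0.03}
# Hypothetical weights, for illustration only.
weights = {"transit": 1.0, "eb": 1.5, "other": 2.0}
vote = weighted_vote(scores, weights)
winner = max(vote, key=vote.get)
```

Normalizing after weighting keeps the result interpretable as probabilities even when the weights do not sum to one.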
              

Stage 5: Vetting & FPP

False Positive Probability

Not every transit-like signal is a planet. LARUN calculates the probability that the signal is caused by something else.

False Positive Scenarios

Scenario │ Description                                │ Distinguishing Features
EB       │ Eclipsing binary                           │ Deep eclipses, secondary eclipse, odd-even differences
BEB      │ Background eclipsing binary                │ Centroid shift during eclipse, color-dependent depth
HEB      │ Hierarchical eclipsing binary (triple)     │ RV variations, diluted eclipses
NEB      │ Nearby eclipsing binary                    │ Contamination from nearby star

Vetting Tests

  • Secondary Eclipse: Check for a second, shallower dip in brightness near phase 0.5
  • Odd-Even Test: Compare depths of alternating transits
  • Centroid Analysis: Does the photocenter shift during transit?
  • Depth vs. Aperture: Does depth change with aperture size?
  • Ephemeris Match: Check against known eclipsing binary catalogs
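As an example of how one of these tests can be quantified, the sketch below expresses the odd-even depth difference in units of its uncertainty. The function name `odd_even_sigma` and the 3σ threshold mentioned afterwards are assumptions for illustration, and the per-transit depths are made-up numbers.

```python
import math
from statistics import mean, stdev


def odd_even_sigma(odd_depths, even_depths):
    """Significance (in sigma) of the difference between odd and even depths."""
    diff = abs(mean(odd_depths) - mean(even_depths))
    # Standard error of the difference between the two sample means.
    err = math.sqrt(stdev(odd_depths) ** 2 / len(odd_depths)
                    + stdev(even_depths) ** 2 / len(even_depths))
    if err == 0.0:
        return 0.0 if diff == 0.0 else math.inf
    return diff / err


# Illustrative depths (fractional flux drop) for alternating transits.
planet_like = odd_even_sigma([0.0101, 0.0099, 0.0100], [0.0100, 0.0102, 0.0098])
binary_like = odd_even_sigma([0.020, 0.021, 0.019], [0.010, 0.011, 0.009])
```

A large value suggests that the "transits" are alternating primary and secondary eclipses of a binary folded at half its true period; a cut at around 3σ is one plausible choice of threshold.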

Stage 6: Characterization & Reporting

Transit Model Fitting

For validated candidates, we fit a detailed transit model to extract planetary parameters.

Derived Parameters

Parameter │ Derivation                         │ Accuracy
Period    │ BLS + transit timing               │ ~0.001%
T0        │ Transit center fitting             │ ~1 minute
Rp/Rs     │ Transit depth                      │ ~5%
a/Rs      │ Transit duration + stellar density │ ~10%
i         │ Impact parameter fitting           │ ~2°
Rp        │ Rp/Rs × stellar radius             │ ~10%
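The Rp/Rs and Rp rows can be illustrated with a short calculation. To first order the transit depth equals (Rp/Rs)², so the radius ratio is the square root of the depth. The sketch below ignores limb darkening and grazing geometries, which a full transit model fit accounts for; the function name is hypothetical and the radii are IAU nominal values.

```python
import math

R_SUN_KM = 695_700.0   # IAU nominal solar radius
R_EARTH_KM = 6_378.1   # IAU nominal equatorial Earth radius


def planet_radius_earth(depth, stellar_radius_rsun):
    """Planet radius in Earth radii from transit depth and stellar radius."""
    radius_ratio = math.sqrt(depth)  # Rp/Rs, since depth ≈ (Rp/Rs)²
    return radius_ratio * stellar_radius_rsun * R_SUN_KM / R_EARTH_KM


# A 1% deep transit of a Sun-sized star implies Rp/Rs = 0.1.
rp = planet_radius_earth(depth=0.01, stellar_radius_rsun=1.0)
```

For this input the result is close to 10.9 Earth radii, roughly the size of Jupiter. Because Rp is the product of Rp/Rs and the stellar radius, its uncertainty combines the uncertainties of both, which is why the table lists a looser accuracy for Rp than for Rp/Rs.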

Report Contents

  • Target summary and stellar parameters
  • Light curve plots (raw, detrended, phase-folded)
  • BLS periodogram with detected period
  • Transit model fit and residuals
  • False positive probability breakdown
  • Planet parameter estimates
  • Habitable zone assessment
  • Recommendations for follow-up

Model Training & Deployment

Automated Training Pipeline

LARUN models are continuously improved through automated training on new data.

┌─────────────────────────────────────────────────────────────────┐
│                    TRAINING ORCHESTRATOR                         │
│  Schedule │ Data Fetcher │ Trainer │ Validator │ Publisher      │
└─────────────────────────────────────────────────────────────────┘
        │           │            │           │           │
   ┌────┴────┐ ┌────┴────┐ ┌─────┴─────┐ ┌───┴───┐ ┌────┴────┐
   │ Weekly  │ │  NASA   │ │ GPU/CPU   │ │ Test  │ │ GitHub  │
   │ CRON    │ │  MAST   │ │ Training  │ │ Suite │ │ Release │
   └─────────┘ └─────────┘ └───────────┘ └───────┘ └─────────┘
              

Quality Gates

  • Accuracy: Must exceed 80% on test set
  • Size: Must be under 100 KB after quantization
  • Inference: Must complete in under 50 ms
  • Regression: No performance drop on validation set
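The four gates can be expressed as a single check that reports which ones failed. This is a sketch with hypothetical names (`ModelMetrics`, `failed_gates`); it also assumes that 100 KB means 100 × 1024 bytes and that "regression" means the validation score falling below that of the currently published model.

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    """Hypothetical summary of one candidate model."""
    accuracy: float                   # fraction correct on the test set
    size_bytes: int                   # size after quantization
    inference_ms: float               # time for one inference
    validation_score: float           # candidate model
    previous_validation_score: float  # currently published model


def failed_gates(m: ModelMetrics) -> list:
    """Return the names of the quality gates the model fails (empty = pass)."""
    failures = []
    if m.accuracy <= 0.80:
        failures.append("accuracy")
    if m.size_bytes >= 100 * 1024:  # assumes 1 KB = 1024 bytes
        failures.append("size")
    if m.inference_ms >= 50.0:
        failures.append("inference")
    if m.validation_score < m.previous_validation_score:
        failures.append("regression")
    return failures


good = ModelMetrics(0.91, 80_000, 12.0, 0.90, 0.89)
too_big = ModelMetrics(0.91, 150_000, 12.0, 0.90, 0.89)
```

Returning the list of failed gates, rather than a single boolean, makes it easier to report why a training run was not published.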

Deployment Strategy

Models are deployed using a blue-green strategy for zero-downtime updates:

  1. New model deployed to "blue" slot
  2. Traffic gradually shifted from "green" to "blue"
  3. If errors detected, instant rollback to "green"
  4. On success, "blue" becomes primary
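The four steps above can be sketched as a small control loop. Everything here is hypothetical: the `rollout` function, the traffic steps, the error threshold, and the `error_rate_at` callback that stands in for whatever monitoring the real deployment uses.

```python
def rollout(error_rate_at, steps=(10, 25, 50, 100), max_error_rate=0.01):
    """Return the final percentage of traffic served by the new "blue" model.

    error_rate_at(percent) is a caller-supplied function that reports the
    observed error rate while "blue" serves that share of traffic.
    """
    for blue_percent in steps:
        # Steps 2 and 3: shift more traffic, then check for errors.
        if error_rate_at(blue_percent) > max_error_rate:
            return 0  # instant rollback: all traffic returns to "green"
    return 100  # Step 4: "blue" becomes primary


healthy = rollout(lambda percent: 0.0)
rolled_back = rollout(lambda percent: 0.05 if percent >= 50 else 0.0)
```

Shifting traffic in increments limits how many requests are exposed to a faulty model before the rollback triggers.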

Ready to Discover Exoplanets?

Now that you understand how LARUN works, try it yourself!