How It Works

The Complete Pipeline

1. Data Acquisition: Fetch light curves from NASA MAST archive
2. Preprocessing: Clean, normalize, and detrend data
3. BLS Analysis: Search for periodic transit signals
4. TinyML Detection: Neural network confirms signals
5. Vetting: Calculate false positive probability
6. Report: Generate analysis summary

Stage 1: Data Acquisition

┌─────────────────────────────────────────────────────────────┐
│                       USER REQUEST                           │
│              "Analyze TIC 307210830"                         │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    TARGET RESOLVER                           │
│     TIC ID → Coordinates → Cross-match with catalogs        │
└─────────────────────────────────────────────────────────────┘
                            │
            ┌───────────────┼───────────────┐
            │               │               │
            ▼               ▼               ▼
     ┌──────────┐    ┌──────────┐    ┌──────────┐
     │   TESS   │    │  Kepler  │    │   Gaia   │
     │   MAST   │    │   MAST   │    │  DR3     │
     └──────────┘    └──────────┘    └──────────┘
            │               │               │
            └───────────────┼───────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    LIGHT CURVE DATA                          │
│        Time (BJD) │ Flux │ Flux Error │ Quality Flags       │
└─────────────────────────────────────────────────────────────┘

What Happens

Parse user request to extract target identifier
Resolve TIC/KIC ID to celestial coordinates
Query NASA MAST archive for available observations
Download FITS files with light curve data
Fetch stellar parameters from Gaia DR3
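
The first step above, extracting the target identifier from a request, can be sketched as follows. This is a minimal illustration rather than LARUN's actual parser: the function name `parse_target` and the set of accepted catalog prefixes are assumptions made for the example.

```python
import re

# Hypothetical helper: LARUN's real request parser is not shown in this document.
# Accepts "TIC 307210830", "tic-307210830", "KIC8462852", and similar spellings.
_TARGET_RE = re.compile(r"\b(TIC|KIC)[\s\-_]*(\d+)\b", re.IGNORECASE)


def parse_target(request: str) -> tuple[str, int]:
    """Extract a (catalog, numeric id) pair from a free-text request."""
    match = _TARGET_RE.search(request)
    if match is None:
        raise ValueError(f"no TIC/KIC identifier found in {request!r}")
    return match.group(1).upper(), int(match.group(2))


print(parse_target("Analyze TIC 307210830"))  # ('TIC', 307210830)
```

The resolved pair would then drive the coordinate lookup and the archive query described in the remaining steps.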

Data Sources

Source	Data Type	Cadence
TESS FFI	Full-frame images	30 min / 10 min / 200 sec
TESS 2-min	Target pixel files	2 minutes
TESS 20-sec	Fast cadence	20 seconds
Kepler	Long/short cadence	30 min / 1 min

Stage 2: Preprocessing

Steps

Quality Filtering: Remove data points flagged for spacecraft anomalies, cosmic rays, or other issues
Outlier Removal: Sigma-clipping to remove statistical outliers (typically 5σ)
Gap Filling: Handle missing data caused by Earth occultations and momentum dumps
Normalization: Scale flux to median = 1.0 for consistent analysis
Detrending: Remove long-term stellar variability using polynomial or spline fits
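
The outlier-removal and normalization steps above can be sketched in a few lines. This is an illustrative version only, assuming a robust (median and MAD based) clip. The function names and the iteration limit are not taken from LARUN.

```python
import statistics


def sigma_clip(time, flux, sigma=5.0, max_iter=5):
    """Iteratively drop points more than `sigma` robust deviations from the median."""
    points = list(zip(time, flux))
    for _ in range(max_iter):
        values = [f for _t, f in points]
        med = statistics.median(values)
        # Median absolute deviation, scaled to match a Gaussian standard deviation.
        scale = 1.4826 * statistics.median(abs(f - med) for f in values)
        if scale == 0:
            break
        kept = [(t, f) for t, f in points if abs(f - med) <= sigma * scale]
        if len(kept) == len(points):
            break  # converged: nothing left to clip
        points = kept
    return [t for t, _f in points], [f for _t, f in points]


def normalize(flux):
    """Scale flux so that its median equals 1.0."""
    med = statistics.median(flux)
    return [f / med for f in flux]
```

Clipping against the median rather than the mean keeps a single extreme point from inflating the threshold. Note that a symmetric clip like this one can also remove deep transit points, so a real pipeline may prefer to clip only upward outliers.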

Detrending Methods

Polynomial: Fast, good for simple trends
Spline: Flexible, handles complex variability
Gaussian Process: Best for active stars, computationally expensive
CoFiAM: Cosine Filtering with Autocorrelation Minimization, originally developed for Kepler data
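
As a rough illustration of the polynomial method, the sketch below divides out a least-squares straight line, which is the degree-1 case. The function name `detrend_linear` is hypothetical, and higher-order polynomials, splines, and Gaussian processes are out of scope for this example.

```python
import statistics


def detrend_linear(time, flux):
    """Divide out a least-squares straight line (a degree-1 polynomial trend)."""
    t_mean = statistics.fmean(time)
    f_mean = statistics.fmean(flux)
    sxx = sum((t - t_mean) ** 2 for t in time)
    sxy = sum((t - t_mean) * (f - f_mean) for t, f in zip(time, flux))
    slope = sxy / sxx
    # Evaluate the fitted line at every timestamp and divide it out.
    trend = [f_mean + slope * (t - t_mean) for t in time]
    return [f / tr for f, tr in zip(flux, trend)]
```

In practice the fit is usually done with in-transit points masked, so that the trend model does not absorb part of the transit depth.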

Before Preprocessing

Raw light curve with systematics, outliers, and stellar variability

After Preprocessing

Clean, normalized light curve ready for transit search

Stage 3: BLS Analysis

Box Least Squares (BLS)

BLS is the standard, widely used algorithm for finding periodic transit signals. It searches for box-shaped dips in the light curve by phase-folding the data at thousands of trial periods and fitting a box at each one.

Define period search range (0.5 - 15 days typical)
For each trial period, phase-fold the light curve
Fit a box-shaped transit model
Calculate Signal Detection Efficiency (SDE)
Peaks in SDE indicate potential transit periods
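
The steps above can be sketched as a small pure-Python search. This is a simplified illustration under stated assumptions, not LARUN's implementation: it uses uniform weights, a fixed number of phase bins, a short list of trial durations, and a synthetic light curve invented for the demonstration. The box-fit power follows the uniform-weight form of the signal residue from Kovács et al. (2002).

```python
import random
import statistics


def box_power(time, flux, period, durations=(0.02, 0.04, 0.08), n_bins=100):
    """Best box-fit power at one trial period (durations are in phase units)."""
    n = len(time)
    mean_flux = statistics.fmean(flux)
    bin_sum = [0.0] * n_bins  # summed flux residuals per phase bin
    bin_cnt = [0] * n_bins    # number of points per phase bin
    for t, f in zip(time, flux):
        b = min(int(((t / period) % 1.0) * n_bins), n_bins - 1)
        bin_sum[b] += f - mean_flux
        bin_cnt[b] += 1
    best = 0.0
    for duration in durations:
        width = max(1, round(duration * n_bins))
        for start in range(n_bins):
            idx = [(start + k) % n_bins for k in range(width)]
            s = sum(bin_sum[j] for j in idx)  # residual sum inside the box
            r = sum(bin_cnt[j] for j in idx)  # point count inside the box
            # Keep dips only (s < 0); power is s^2 / (r * (n - r)).
            if s < 0 and 0 < r < n:
                best = max(best, s * s / (r * (n - r)))
    return best


def bls_search(time, flux, periods):
    """Return (best period, SDE) over a list of trial periods."""
    powers = [box_power(time, flux, p) for p in periods]
    peak = max(powers)
    sde = (peak - statistics.fmean(powers)) / statistics.pstdev(powers)
    return periods[powers.index(peak)], sde


# Synthetic example: 30 days of data, 1% deep transits every 2.5 days.
rng = random.Random(0)
time = [i * 0.02 for i in range(1500)]
flux = [1.0 - (0.01 if abs((t % 2.5) - 1.0) < 0.05 else 0.0) + rng.gauss(0, 0.001)
        for t in time]
periods = [1.0 + 0.05 * k for k in range(81)]
best_period, sde = bls_search(time, flux, periods)
print(f"best period = {best_period:.2f} d, SDE = {sde:.1f}")
```

Production code would normally rely on an optimized library implementation and a much finer period grid. The point here is only to show how phase-folding, box fitting, and the SDE statistic fit together.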

Key Parameters

Parameter	Description	Typical Value
period_min	Minimum search period	0.5 days
period_max	Maximum search period	15 days
duration_grid	Transit duration range	0.01 - 0.15 (phase)
frequency_factor	Oversampling factor	5-10
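
One way to turn these parameters into a list of trial periods is to sample uniformly in frequency. The sketch below assumes that `frequency_factor` acts as an oversampling factor relative to the natural frequency resolution of 1 / baseline. That interpretation, the function name `period_grid`, and the argument names are assumptions made for illustration.

```python
def period_grid(period_min, period_max, baseline, oversample=5):
    """Trial periods, uniformly spaced in frequency, in ascending order.

    `baseline` is the time span of the observations in days; the frequency
    step is 1 / (oversample * baseline).
    """
    f_min = 1.0 / period_max
    f_max = 1.0 / period_min
    df = 1.0 / (oversample * baseline)
    n = int((f_max - f_min) / df) + 1
    # Step down from the highest frequency so the periods come out ascending.
    return [1.0 / (f_max - k * df) for k in range(n)]


grid = period_grid(0.5, 15.0, baseline=27.0, oversample=5)
print(len(grid), grid[0], grid[-1])
```

Uniform frequency spacing puts more trial periods at the short-period end, where a small change in period shifts the folded transits by a larger fraction of a phase.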

Stage 4: TinyML Detection

Neural Network Confirmation

After BLS identifies potential signals, our TinyML models provide independent confirmation. This two-stage approach reduces false positives.

Model Federation

Multiple specialized models vote on each candidate:

EXOPLANET-001: Transit shape classifier
VSTAR-001: Rules out eclipsing binaries
FLARE-001: Identifies stellar flare contamination

Ensemble Voting

┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ EXOPLANET-001 │ │   VSTAR-001   │ │  FLARE-001    │
│    Transit?   │ │   Variable?   │ │    Flare?     │
│   P = 0.85    │ │   P = 0.12    │ │   P = 0.03    │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
        │                 │                 │
        └────────────┬────┴────────────────┘
                     │
                     ▼
            ┌────────────────┐
            │ WEIGHTED VOTE  │
            │  Transit: 0.78 │
            │  EB: 0.15      │
            │  Other: 0.07   │
            └────────────────┘
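
A weighted vote of this kind can be sketched as a normalized, weighted combination of the per-model scores. The weights below are hypothetical, since this document does not state them, so the output is not meant to reproduce the numbers in the diagram.

```python
def weighted_vote(probabilities, weights):
    """Combine per-model scores into class scores that sum to 1."""
    weighted = {
        label: probabilities[label] * weights.get(label, 1.0)
        for label in probabilities
    }
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("all weighted scores are zero")
    return {label: value / total for label, value in weighted.items()}


# Per-model outputs taken from the diagram; the weights are made up.
model_outputs = {"transit": 0.85, "variable": 0.12, "flare": 0.03}
hypothetical_weights = {"transit": 1.0, "variable": 1.5, "flare": 1.0}
result = weighted_vote(model_outputs, hypothetical_weights)
print(max(result, key=result.get), result)
```

Raising the weight of a model makes the ensemble more sensitive to the scenario that model detects, which is one way to trade recall for fewer false positives.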

Stage 5: Vetting & FPP

False Positive Probability

Not every transit-like signal is a planet. LARUN calculates the probability that the signal is caused by something else.

False Positive Scenarios

Scenario	Description	Distinguishing Features
EB	Eclipsing binary	Deep eclipses, secondary eclipse, odd-even differences
BEB	Background eclipsing binary	Centroid shift during eclipse, color-dependent depth
HEB	Hierarchical eclipsing binary (triple system)	RV variations, diluted eclipses
NEB	Nearby eclipsing binary	Contamination from nearby star

Vetting Tests

Secondary Eclipse: Check for a second, shallower dip near phase 0.5
Odd-Even Test: Compare depths of alternating transits
Centroid Analysis: Does the photocenter shift during transit?
Depth vs. Aperture: Does depth change with aperture size?
Ephemeris Match: Check against known eclipsing binary catalogs
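
As an example of one of these checks, the odd-even test can be sketched as below. The function name, the choice to call cycle 0 "even", and the simple error estimate from out-of-transit scatter are all assumptions made for illustration.

```python
import math
import statistics


def odd_even_test(time, flux, period, t0, duration):
    """Return (odd_depth, even_depth, significance in sigma).

    `t0` is the center of the transit counted as cycle 0 (treated as even);
    `duration` is the full transit duration in the same units as `time`.
    """
    odd, even, out = [], [], []
    for t, f in zip(time, flux):
        cycle = round((t - t0) / period)
        dt = t - (t0 + cycle * period)  # offset from the nearest transit center
        if abs(dt) < duration / 2:
            (even if cycle % 2 == 0 else odd).append(f)
        else:
            out.append(f)
    baseline = statistics.median(out)
    odd_depth = baseline - statistics.fmean(odd)
    even_depth = baseline - statistics.fmean(even)
    # Error on the difference of two means, from the out-of-transit scatter.
    err = statistics.stdev(out) * math.sqrt(1 / len(odd) + 1 / len(even))
    return odd_depth, even_depth, abs(odd_depth - even_depth) / err
```

A large significance suggests an eclipsing binary detected at half its true period, where the primary and secondary eclipses alternate. Equal depths within the noise are consistent with a planet.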

Stage 6: Characterization & Reporting

Transit Model Fitting

For validated candidates, we fit a detailed transit model to extract planetary parameters.

Derived Parameters

Parameter	Derivation	Accuracy
Period	BLS + transit timing	~0.001%
T0	Transit center fitting	~1 minute
Rp/Rs	Transit depth	~5%
a/Rs	Transit duration + stellar density	~10%
i	Impact parameter fitting	~2°
Rp	Rp/Rs × stellar radius	~10%
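
The last row of the table is simple arithmetic and can be worked through directly. The sketch below uses the common approximation depth ≈ (Rp/Rs)², which ignores limb darkening and grazing geometries. The function name is hypothetical, and the constants are the IAU nominal solar and Earth equatorial radii.

```python
import math

R_SUN_KM = 695_700.0   # IAU nominal solar radius
R_EARTH_KM = 6_378.1   # IAU nominal Earth equatorial radius


def planet_radius_earth(depth, stellar_radius_sun):
    """Planet radius in Earth radii.

    `depth` is the fractional transit depth (0.01 means 1%);
    `stellar_radius_sun` is the stellar radius in solar radii.
    """
    rp_over_rs = math.sqrt(depth)  # depth ~ (Rp/Rs)^2
    return rp_over_rs * stellar_radius_sun * R_SUN_KM / R_EARTH_KM


# A 100 ppm transit of a Sun-like star corresponds to roughly one Earth radius.
print(round(planet_radius_earth(1e-4, 1.0), 2))  # 1.09
```

Because Rp scales linearly with the stellar radius, the uncertainty on the stellar radius carries straight through to the planet radius.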

Report Contents

Target summary and stellar parameters
Light curve plots (raw, detrended, phase-folded)
BLS periodogram with detected period
Transit model fit and residuals
False positive probability breakdown
Planet parameter estimates
Habitable zone assessment
Recommendations for follow-up

Model Training & Deployment

Automated Training Pipeline

LARUN models are continuously improved through automated training on new data.

┌─────────────────────────────────────────────────────────────────┐
│                    TRAINING ORCHESTRATOR                         │
│  Schedule │ Data Fetcher │ Trainer │ Validator │ Publisher      │
└─────────────────────────────────────────────────────────────────┘
        │           │            │           │           │
   ┌────┴────┐ ┌────┴────┐ ┌─────┴─────┐ ┌───┴───┐ ┌────┴────┐
   │ Weekly  │ │  NASA   │ │ GPU/CPU   │ │ Test  │ │ GitHub  │
   │ CRON    │ │  MAST   │ │ Training  │ │ Suite │ │ Release │
   └─────────┘ └─────────┘ └───────────┘ └───────┘ └─────────┘

Quality Gates

Accuracy: Must exceed 80% on test set
Size: Must be under 100KB after quantization
Inference: Must complete in under 50ms
Regression: No performance drop on validation set
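
A gate check like this could be expressed as a small function that lists which gates a candidate model fails. The metric names, the `ModelMetrics` class, and the choice of 1 KB = 1024 bytes are assumptions for this sketch. Only the thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ModelMetrics:
    """Hypothetical summary of one trained candidate model."""
    accuracy: float                   # test-set accuracy, 0..1
    size_bytes: int                   # size after quantization
    inference_ms: float               # latency for a single inference
    validation_score: float           # score of the new model
    previous_validation_score: float  # score of the currently published model


def failed_gates(m: ModelMetrics) -> list[str]:
    """Return the names of the quality gates that the model fails."""
    failures = []
    if m.accuracy <= 0.80:
        failures.append("accuracy")
    if m.size_bytes >= 100 * 1024:  # assumes 1 KB = 1024 bytes
        failures.append("size")
    if m.inference_ms >= 50.0:
        failures.append("inference")
    if m.validation_score < m.previous_validation_score:
        failures.append("regression")
    return failures


candidate = ModelMetrics(0.86, 72_000, 31.5, 0.84, 0.83)
print(failed_gates(candidate))  # []
```

Returning the list of failed gates, rather than a single boolean, makes it easier to report why a training run was not published.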

Deployment Strategy

Models are deployed using a blue-green strategy for zero-downtime updates:

New model deployed to "blue" slot
Traffic gradually shifted from "green" to "blue"
If errors detected, instant rollback to "green"
On success, "blue" becomes primary
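
The rollout logic above can be sketched as a loop over traffic shares with a health check at each step. The function name, the step sizes, and the `is_healthy` callback are hypothetical. A real deployment would act on a load balancer or model registry rather than return a dictionary.

```python
def blue_green_rollout(is_healthy, traffic_steps=(0.1, 0.5, 1.0)):
    """Shift traffic to the blue slot step by step, rolling back on failure.

    `is_healthy(share)` is called after each shift and should return False
    if errors are detected while `share` of the traffic is on blue.
    """
    blue_share = 0.0
    for share in traffic_steps:
        blue_share = share
        if not is_healthy(blue_share):
            # Instant rollback: all traffic returns to the green slot.
            return {"primary": "green", "blue_share": 0.0, "rolled_back": True}
    return {"primary": "blue", "blue_share": blue_share, "rolled_back": False}


print(blue_green_rollout(lambda share: True))
```

Keeping the green slot untouched until the final step is what makes the rollback immediate: no redeployment is needed, only a traffic switch.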
