Formula 1 Competitive Performance Analytics

Scenario

As part of a strategic initiative, Formula 1 seeks to deepen competitive analysis and improve storytelling for fans, broadcasters, and partners. By standardizing official timing, telemetry, and weather data into analytics-ready datasets, we aim to provide a 360-degree view of a Grand Prix weekend: car pace, tire behavior, pit-stop execution, qualifying performance, and strategy outcomes.

Objectives

Track sustainable race pace by tire compound and stint length across the grid
Quantify tire degradation and effective pit-lane time loss to explain strategy choices
Evaluate undercut/overcut outcomes and the advantage of pitting under Safety Car (SC) or Virtual Safety Car (VSC) conditions
Benchmark teammates like-for-like to separate driver execution from car pace
Highlight qualifying execution via sector-based “ideal lap” vs actual best lap
Deliver concise visuals and a one-page insights brief suitable for broadcast and digital

This initiative supports F1’s broader goals of enriching fan understanding, enabling data-driven storytelling, and enhancing the transparency and competitiveness of the sport.
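
To make the "ideal lap" objective concrete, here is a minimal pandas sketch. It assumes a table with one row per qualifying lap and sector times in seconds; the column names (driver, s1, s2, s3) and the sample values are hypothetical and chosen only for illustration. The ideal lap is the sum of a driver's best individual sectors, and the gap to the best actual lap is the time left on the table.

```python
import pandas as pd

# Hypothetical qualifying laps: sector times in seconds.
# Column names and values are illustrative, not from a real session.
quali = pd.DataFrame({
    "driver": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "s1": [28.4, 28.1, 28.3, 28.2, 28.0],
    "s2": [35.0, 35.2, 34.9, 35.1, 35.3],
    "s3": [24.6, 24.5, 24.7, 24.4, 24.4],
})


def ideal_lap_delta(laps: pd.DataFrame) -> pd.DataFrame:
    """Per driver: ideal lap (sum of best sectors), best actual lap, and the gap."""
    laps = laps.assign(lap_time=laps[["s1", "s2", "s3"]].sum(axis=1))
    out = laps.groupby("driver").agg(
        best_s1=("s1", "min"),
        best_s2=("s2", "min"),
        best_s3=("s3", "min"),
        best_lap=("lap_time", "min"),
    )
    out["ideal_lap"] = out[["best_s1", "best_s2", "best_s3"]].sum(axis=1)
    # Non-negative by construction: no single lap can beat the sum of best sectors.
    out["time_left"] = out["best_lap"] - out["ideal_lap"]
    return out[["ideal_lap", "best_lap", "time_left"]].round(3)


result = ideal_lap_delta(quali)
print(result)
```

In this toy data, driver AAA set each best sector on a different lap, so the best actual lap is slower than the ideal lap even though no single sector was poor.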

Data Structure

Architecture: Landing (raw) → Staging (clean, dedupe, units) → Warehouse (star schema) → BI (Power BI semantic model).
Facts & grain:
- F_Orders = 1 row per order–model–customer–order_date
- F_Deliveries = 1 row per delivered aircraft–delivery_date
- F_Backlog_Snapshot = model–customer–month_end
- F_MarketTraffic_Snapshot = region/country–year
Role-playing dates via order_date_key, delivery_date_key, snapshot_date_key.
Conformed dimensions: D_Date, D_Model (family, variant, seats, range_nm, MTOW, cabin_key), D_Customer (name, group, status, geo_key), D_Geography (country, region), D_Engine (maker, model, thrust), D_Cabin (cabin_type). Optional bridges: Model↔Engine, Customer↔Region for many-to-many.
Keys, SCD, quality: Integer surrogate keys + preserved business keys; SCD2 on Customer (and Model if needed); FK constraints; standard codes (IATA/ICAO, ISO-country); alias mapping tables; incremental loads by date_key, late-arriving handling, unit normalization.
Metrics & semantics: Measures for Orders, Deliveries, Backlog Units, YoY/MoM, CAGR, Fleet Growth; Date intelligence (DAX) from D_Date; yearly partitions; data dictionary + lineage; refresh cadence daily/weekly.

Business Questions Solved

Who sustains the fastest median race pace by tire compound on green-flag laps?
What are the per-stint degradation slopes (s/lap) by driver and compound?
Undercut vs overcut: when two cars are within ±2 s of each other, which strategy gains more time after 3 laps?
What is the effective pit-lane time loss, and which teams execute most consistently?
How much time is saved by pitting under SC/VSC vs green-flag stops?
In qualifying, how much time did each driver leave on the table relative to the ideal lap (sum of best sectors)?
Teammate like-for-like delta: what is the average gap over the first 10 laps of a stint on the same compound?
DRS impact: how does main-straight average speed compare with DRS on vs off on representative laps?
Track evolution: how do median sector times change from FP1 → FP2 → FP3 → Q, and what does that imply for run sequencing?
Does stop count (1/2/3) correlate with finishing position for this track archetype?
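
The first two questions above (median green-flag pace and per-stint degradation slope) can be sketched with pandas and NumPy. This is an illustration under stated assumptions, not the project's actual pipeline: the laps table is synthetic, and the column names (driver, stint, compound, tyre_age, lap_time_s, green_flag) are hypothetical. The slope comes from a plain linear fit of lap time against tire age, which ignores fuel burn-off and track evolution.

```python
import numpy as np
import pandas as pd

# Synthetic laps: two drivers on the same compound with a known linear
# degradation. All column names and numbers are hypothetical.
rows = []
for driver, base, deg in [("AAA", 90.0, 0.05), ("BBB", 90.4, 0.08)]:
    for age in range(1, 11):
        rows.append({
            "driver": driver,
            "stint": 1,
            "compound": "MEDIUM",
            "tyre_age": age,
            "lap_time_s": base + deg * age,
            "green_flag": age != 5,  # pretend lap 5 ran under SC/VSC
        })
laps = pd.DataFrame(rows)

# Keep only green-flag laps so neutralized laps do not distort pace.
green = laps[laps["green_flag"]]

# Median race pace per compound and driver, fastest first.
median_pace = (
    green.groupby(["compound", "driver"])["lap_time_s"].median().sort_values()
)

# Degradation slope in s/lap: first-degree polynomial fit per stint.
# np.polyfit(x, y, 1) returns [slope, intercept].
slopes = pd.Series(
    {
        key: np.polyfit(g["tyre_age"], g["lap_time_s"], 1)[0]
        for key, g in green.groupby(["driver", "stint", "compound"])
    },
    name="deg_s_per_lap",
)

print(median_pace)
print(slopes.round(3))
```

The median is used rather than the mean so that a single slow lap (traffic, a lock-up) has little effect on the pace estimate.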

Tools & Technologies Used

Python (3.10+), VS Code, Jupyter Notebooks
FastF1 → community library interfacing with timing/telemetry; on-disk cache
pandas, NumPy → data cleaning, joins, feature engineering
Matplotlib, Plotly → broadcast-ready charts and explanatory visuals
PyArrow/Parquet → fast, reproducible columnar storage
Tableau → interactive dashboard for the final visuals
Git/GitHub → versioning and collaboration
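
As a rough sketch of how these tools might fit together, the snippet below loads race laps through FastF1 with its on-disk cache enabled, tidies them with pandas, and writes a Parquet file. The helper names (tidy_laps, export_race_laps), the directory names, and the output columns are hypothetical. The FastF1 lap columns used here (LapTime, TrackStatus, PitInTime, PitOutTime, and so on) and the reading of TrackStatus "1" as a clear track reflect my understanding of the library and should be checked against the installed version. Writing Parquet requires PyArrow.

```python
from pathlib import Path

import pandas as pd

# Lap columns assumed to exist in FastF1's lap data.
KEEP = ["Driver", "LapNumber", "Stint", "Compound", "TyreLife"]


def tidy_laps(laps: pd.DataFrame) -> pd.DataFrame:
    """Reduce a laps table to analysis-ready columns (hypothetical helper)."""
    out = pd.DataFrame(laps[KEEP]).copy()
    # Lap times arrive as timedeltas; seconds are easier to aggregate and plot.
    out["lap_time_s"] = laps["LapTime"].dt.total_seconds()
    # Assumption: a track status of exactly "1" means the whole lap was green.
    out["green_flag"] = laps["TrackStatus"].astype(str) == "1"
    # In-laps and out-laps carry a pit-in or pit-out timestamp.
    out["pit_lap"] = laps["PitInTime"].notna() | laps["PitOutTime"].notna()
    return out.reset_index(drop=True)


def export_race_laps(year: int, event: str,
                     out_dir: str = "data", cache_dir: str = "f1_cache") -> Path:
    """Download (or read from cache) one race and store its laps as Parquet."""
    import fastf1  # imported lazily so tidy_laps works without FastF1 installed

    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    fastf1.Cache.enable_cache(cache_dir)  # later runs reuse the cached data

    session = fastf1.get_session(year, event, "R")
    session.load(telemetry=False, weather=False)  # laps only, for speed

    path = Path(out_dir) / f"laps_{year}_{event}.parquet"
    tidy_laps(session.laps).to_parquet(path, index=False)
    return path
```

A call such as export_race_laps(2023, "Monza") would need network access on the first run. Later runs should read from the cache directory, and downstream notebooks can read the Parquet file with pd.read_parquet.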

Business Impact & Insights

Enhanced broadcast storytelling: Clear visuals and metrics (degradation curves, pit-loss distributions, ideal-lap deltas) explain strategy calls in real time and deepen fan engagement.
Transparent competitive analysis: Standardized pace and strategy KPIs enable apples-to-apples comparisons across drivers, teams, and sessions.
Faster decision support: Pre-quantified break-even points for undercut/overcut and SC/VSC windows reduce reaction time for live analysis segments and digital products.
Driver & team benchmarking: Like-for-like comparisons separate execution from car pace, informing data-driven narratives and season-long story arcs.
Operational efficiency: Cached ingestion and Parquet workflows accelerate iteration, enabling consistent, race-over-race insights and scalable content for F1’s digital platforms.

View Code & Dashboard

Click for the Tableau dashboard