Scenario
As part of a strategic initiative, Formula 1 seeks to deepen competitive analysis and improve storytelling for fans, broadcasters, and partners. By standardizing official timing, telemetry, and weather data into analytics-ready datasets, the project aims to provide a 360-degree view of a Grand Prix weekend: car pace, tire behavior, pit-stop execution, qualifying performance, and strategy outcomes.
Objectives
- Track sustainable race pace by tire compound and stint length across the grid
- Quantify tire degradation and effective pit-lane time loss to explain strategy choices
- Evaluate undercut/overcut and the advantages of pitting under SC/VSC conditions
- Benchmark teammates like-for-like to separate driver execution from car pace
- Highlight qualifying execution via sector-based “ideal lap” vs actual best lap
- Deliver concise visuals and a one-page insights brief suitable for broadcast and digital
This initiative supports F1’s broader goals of enriching fan understanding, enabling data-driven storytelling, and enhancing the transparency and competitiveness of the sport.
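As a rough illustration of the tire-degradation objective, the sketch below fits a straight line to lap time against tire age within each stint and reports the slope in seconds per lap. The lap table, its column names (`driver`, `stint`, `compound`, `tire_age`, `lap_time_s`), and the values are hypothetical stand-ins, not the project's actual schema. A raw linear fit also ignores fuel burn-off and traffic, so it should be read as a first approximation only.

```python
import numpy as np
import pandas as pd

# Hypothetical green-flag laps; names and values are illustrative only.
laps = pd.DataFrame({
    "driver": ["AAA"] * 5 + ["BBB"] * 5,
    "stint": [1] * 10,
    "compound": ["MEDIUM"] * 5 + ["HARD"] * 5,
    "tire_age": [1, 2, 3, 4, 5] * 2,
    "lap_time_s": [90.0, 90.1, 90.2, 90.3, 90.4,
                   91.0, 91.05, 91.1, 91.15, 91.2],
})


def degradation_slope(stint_laps: pd.DataFrame) -> float:
    """Least-squares slope of lap time vs. tire age, in seconds per lap."""
    slope, _intercept = np.polyfit(
        stint_laps["tire_age"], stint_laps["lap_time_s"], deg=1
    )
    return float(slope)


# One row per driver / compound / stint with its fitted slope.
rows = []
for (driver, compound, stint), grp in laps.groupby(["driver", "compound", "stint"]):
    rows.append({
        "driver": driver,
        "compound": compound,
        "stint": stint,
        "deg_s_per_lap": degradation_slope(grp),
    })
slopes = pd.DataFrame(rows)
print(slopes)
```

A positive slope means lap times get slower as the tires age. Very short stints give unstable fits, so a minimum lap count per stint is a sensible filter before comparing drivers.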
Data Structure
- Architecture: Landing (raw) → Staging (clean, dedupe, units) → Warehouse (star schema) → BI (Power BI semantic model).
- Facts & grain: F_Orders = 1 row per order–model–customer–order_date; F_Deliveries = 1 row per delivered aircraft–delivery_date; F_Backlog_Snapshot = model–customer–month_end; F_MarketTraffic_Snapshot = region/country–year. Role-playing dates via order_date_key, delivery_date_key, snapshot_date_key.
- Conformed dimensions: D_Date, D_Model (family, variant, seats, range_nm, MTOW, cabin_key), D_Customer (name, group, status, geo_key), D_Geography (country, region), D_Engine (maker, model, thrust), D_Cabin (cabin_type). Optional bridges: Model↔Engine, Customer↔Region for many-to-many.
- Keys, SCD, quality: Integer surrogate keys + preserved business keys; SCD2 on Customer (and Model if needed); FK constraints; standard codes (IATA/ICAO, ISO-country); alias mapping tables; incremental loads by date_key, late-arriving handling, unit normalization.
- Metrics & semantics: Measures for Orders, Deliveries, Backlog Units, YoY/MoM, CAGR, Fleet Growth; Date intelligence (DAX) from D_Date; yearly partitions; data dictionary + lineage; refresh cadence daily/weekly.
Business Questions Solved
- Who sustains the fastest median race pace by tire compound on green-flag laps?
- What are the per-stint degradation slopes (s/lap) by driver and compound?
- Undercut vs overcut: within ±2s battles, which strategy gains more after 3 laps?
- What is the effective pit-lane time loss, and which teams execute most consistently?
- How much time is saved by pitting under SC/VSC vs green-flag stops?
- In qualifying, how much time did each driver leave on the table versus their ideal lap (sum of best sectors)?
- Teammate like-for-like delta: average gap in the first 10 laps of a stint on the same compound.
- DRS impact: main-straight average speed with DRS on vs off on representative laps.
- Track evolution: median sector changes FP1 → FP2 → FP3 → Q to inform run sequencing.
- Does stop count (1/2/3) correlate with finishing position for this track archetype?
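The qualifying question above reduces to simple arithmetic: the ideal lap is the sum of a driver's best individual sector times across the session, and the time left is the actual best lap minus that sum. The sketch below works through it on a small hypothetical table; the driver codes, column names (`s1`, `s2`, `s3`), and times are made up for illustration.

```python
import pandas as pd

# Hypothetical qualifying laps with sector times in seconds.
quali = pd.DataFrame({
    "driver": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "s1": [28.1, 28.0, 28.3, 28.2, 28.4],
    "s2": [35.5, 35.7, 35.4, 35.6, 35.5],
    "s3": [24.9, 24.8, 25.0, 24.7, 24.9],
})
quali["lap_time"] = quali[["s1", "s2", "s3"]].sum(axis=1)

# Best sector times and best complete lap per driver.
summary = quali.groupby("driver").agg(
    best_s1=("s1", "min"),
    best_s2=("s2", "min"),
    best_s3=("s3", "min"),
    best_lap=("lap_time", "min"),
)

# Ideal lap = sum of best sectors; time left = best lap minus ideal lap.
summary["ideal_lap"] = summary[["best_s1", "best_s2", "best_s3"]].sum(axis=1)
summary["time_left"] = summary["best_lap"] - summary["ideal_lap"]
print(summary[["best_lap", "ideal_lap", "time_left"]].round(3))
```

By construction the ideal lap can never be slower than the best actual lap, so `time_left` is always zero or positive. In real data, laps with deleted times or missing sectors would need filtering before this step.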
Tools & Technologies Used
- Python (3.10+), VS Code, Jupyter Notebooks
- FastF1 → community library interfacing with timing/telemetry; on-disk cache
- pandas, NumPy → data cleaning, joins, feature engineering
- Matplotlib, Plotly → broadcast-ready charts and explanatory visuals
- PyArrow/Parquet → fast, reproducible columnar storage
- Tableau → dashboards and data visualization
- Git/GitHub → versioning and collaboration
Business Impact & Insights
- Enhanced broadcast storytelling: Clear visuals and metrics (degradation curves, pit-loss distributions, ideal-lap deltas) explain strategy calls in real time and deepen fan engagement.
- Transparent competitive analysis: Standardized pace and strategy KPIs enable apples-to-apples comparisons across drivers, teams, and sessions.
- Faster decision support: Pre-quantified break-even points for undercut/overcut and SC/VSC windows reduce reaction time for live analysis segments and digital products.
- Driver & team benchmarking: Like-for-like comparisons separate execution from car pace, informing data-driven narratives and season-long story arcs.
- Operational efficiency: Cached ingestion and Parquet workflows accelerate iteration, enabling consistent, race-over-race insights and scalable content for F1’s digital platforms.


