MLB DFS Matchup Engine
MLB matchup analysis and DFS scoring pipeline powered by Savant, FanGraphs, and DraftKings data
An advanced MLB matchup analysis and DFS prediction pipeline that scrapes player statistics from three data sources, runs a 19-dimension scoring model, and outputs multi-sheet Excel workbooks with composite batter and pitcher ratings.
What It Does
The pipeline takes DraftKings lineups and probable pitchers as input, merges them against an S3 data lake of historical and current-season stats, then scores every batter-pitcher matchup across 19 dimensions: GB/FB tendencies, advanced splits (AVG, ISO, wOBA), pitch tracking, batted ball metrics, run values, and park factor adjustments. The output is an Excel workbook with 18+ sheets for daily analysis.
How It Works
- Data Ingestion: Automated scraping of MLB Savant matchup pages via a Dockerized Selenium container deployed to AWS Lambda with headless Firefox. FanGraphs splits and DraftKings CSVs are stored in an organized S3 data lake covering 3 seasons
- ETL Pipeline: A 1,362-line Python pipeline merges DraftKings lineups with Savant and FanGraphs data, normalizing across sources and handling missing data gracefully
- Scoring Model: 19 metrics per player, including GB/FB matchup rates, advanced batting splits, pitch-level tracking data, batted ball profiles, run values, and park factor adjustments, combined into composite scores per batter and pitcher
- Output: Multi-sheet Excel workbooks (18+ sheets) with matchup breakdowns, sortable rankings, and daily-ready analysis for DFS lineup construction
- Exploratory Analysis: Jupyter notebooks for model iteration, feature importance testing, and season-over-season trend validation
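The merge step in the ETL stage can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the player names, the column names (`salary`, `barrel_pct`, `woba_vs_rhp`), and the helper functions `normalize_name` and `merge_sources` are all hypothetical. It shows one way to match players across sources with differently formatted names and to fill stats missing from a source with `None` rather than failing.

```python
def normalize_name(name):
    """Lowercase, drop periods, and collapse whitespace so names match across sources."""
    return " ".join(name.lower().replace(".", "").split())


def merge_sources(lineups, sources):
    """Left-join lineup rows against each stats source, keyed by normalized name.

    Stats that no source provides for a player are set to None, so every
    output row has the same columns.
    """
    indexed = [
        {normalize_name(player): stats for player, stats in source.items()}
        for source in sources
    ]
    # Every stat column seen in any source.
    columns = sorted(
        {col for source in indexed for stats in source.values() for col in stats}
    )
    merged = []
    for row in lineups:
        key = normalize_name(row["player"])
        out = dict(row)
        for col in columns:
            out[col] = None  # default for missing data
        for source in indexed:
            out.update(source.get(key, {}))
        merged.append(out)
    return merged


# Hypothetical sample data (not real players or real stats).
lineups = [
    {"player": "J.D. Example", "salary": 5200},
    {"player": "Sam Sample", "salary": 3100},
]
savant = {"JD Example": {"barrel_pct": 12.5}}
fangraphs = {
    "J.D. Example": {"woba_vs_rhp": 0.360},
    "Sam Sample": {"woba_vs_rhp": 0.310},
}

merged = merge_sources(lineups, [savant, fangraphs])
for row in merged:
    print(row)
```

Keeping the lineup as the left side of the join means a player never drops out of the slate just because one source has no row for them; the gap surfaces as `None` and can be handled downstream by the scoring step.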
Key Insight
Most DFS tools give you a single projection number. This pipeline gives you the 19 reasons behind it — so when a projection looks wrong, you can see exactly which dimension is driving it and decide whether the model or your intuition is right. That transparency is the difference between blindly following a number and making informed lineup decisions.
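To make the idea of "seeing which dimension is driving it" concrete, here is a small sketch of a composite score that keeps its per-dimension breakdown. It is an assumption-laden illustration, not the pipeline's real model: it uses four hypothetical dimensions instead of 19, made-up weights, and assumes each metric has already been scaled to the range 0 to 1 with 0.5 as neutral. The function name `score_breakdown` is also hypothetical.

```python
# Hypothetical dimensions and weights; the real model scores 19 dimensions.
WEIGHTS = {
    "woba_split": 0.4,
    "iso_split": 0.3,
    "gb_fb_matchup": 0.2,
    "park_factor": 0.1,
}
NEUTRAL = 0.5  # assumed value of a metric that neither helps nor hurts


def score_breakdown(metrics, weights=WEIGHTS):
    """Return (composite, deltas) for metrics already scaled to 0..1.

    Dimensions with missing values (None or absent) are skipped and the
    remaining weights are renormalized. Each delta is how far that
    dimension pushes the composite above or below NEUTRAL.
    """
    present = {k: w for k, w in weights.items() if metrics.get(k) is not None}
    total_weight = sum(present.values())
    if total_weight == 0:
        return None, {}
    deltas = {
        k: w * (metrics[k] - NEUTRAL) / total_weight for k, w in present.items()
    }
    composite = NEUTRAL + sum(deltas.values())
    return composite, deltas


# Hypothetical batter: strong wOBA split, slightly bad GB/FB matchup,
# no park factor available.
batter = {
    "woba_split": 0.9,
    "iso_split": 0.5,
    "gb_fb_matchup": 0.4,
    "park_factor": None,
}

composite, deltas = score_breakdown(batter)
driver = max(deltas, key=lambda k: abs(deltas[k]))

print(f"composite = {composite:.3f}")
for name, delta in sorted(deltas.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {name:<14} {delta:+.3f}")
print(f"largest driver: {driver}")
```

Because the deltas sum to the gap between the composite and the neutral value, a projection that looks wrong can be traced to the one or two dimensions with the largest absolute delta.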