Data Science Judgment Lab

Train data judgment like an investigation.

This is a 25-case room for practicing the hard part of data science: deciding what messy evidence can support, what it cannot support, and how confident you should be when pressure is in the room.


Purpose

Judgment, not trivia

The cases surface the habits senior analysts need: metric humility, causal caution, operational risk, fairness burden, and uncertainty communication.

Method

Productive struggle

You work the evidence before the explanation appears. The struggle is the point: tempting wrong stories become visible enough to learn from.

Record

Calibration over score

Each case you complete and review adds to a local judgment record. It is a learning artifact, not a proctored credential.

Case library

Evidence docket


Case 01 (Unopened)

The Dashboard Spike

A launch-week chart jumps, a deck is due, and several teams have reasons to claim the movement.

Product analytics, 8 min

chart, audio, log

Case 02 (Unopened)

The Checkout Readout

A checkout readout lands just before planning closes, and different artifacts point toward different launch stories.

Experimentation, 10 min

chart, audio, table

Case 03 (Unopened)

The Churn Model Pitch

A polished retention model arrives with a renewal deadline, a crowded outreach queue, and a promise that the save team can act sooner.

ML evaluation, 12 min

model-output, table, memo

Case 04 (Unopened)

The Inspection Queue

A city inspection team has a new routing screen, a long backlog, and one week to decide how much authority the score should have.

Public policy analytics, 12 min

map, policy, table

Case 05 (Unopened)

The Spring Tutoring Brief

A district impact brief is headed to a funding vote after students who used a tutoring platform show stronger spring gains.

Education analytics, 11 min

press-release, chart, table

Case 06 (Unopened)

The Winter Shelter Forecast

A city housing office must set winter overflow capacity from a forecast that fits ordinary nights better than pressure weeks.

Public service forecasting, 12 min

chart, timeline, table

Case 07 (Unopened)

The Benefits Queue Score

A state benefits agency wants to use a verification score to cut backlog, but the burden may land unevenly on applicants with messier administrative records.

Government benefits analytics, 12 min

model-output, policy, table

Case 08 (Unopened)

The Claimant Chatbot

A benefits agency chatbot demos smoothly before a filing surge, and the launch packet must decide what evidence is enough for live claimant use.

Public sector AI evaluation, 12 min

transcript, rubric, memo

Case 09 (Unopened)

The Payment Hold Dial

The same benefits agency must choose a payment-hold threshold that catches fraud without turning suspicion into broad payment delay.

Government risk operations, 12 min

chart, simulator, table

Case 10 (Unopened)

The Clearance Rate Metric

A modernization dashboard gets a cleaner headline metric, and leadership wants to use it as proof that service is improving.

Public administration analytics, 11 min

table, memo, chart

Case 11 (Unopened)

The Survey Sample Mirage

A customer survey produces a clean majority for a support redesign, and tomorrow's slide asks how far that voice can carry.

Survey analytics, 12 min

table, chart, memo

Case 12 (Unopened)

The Bed-Ready Field

A familiar hospital operations field powers a clean improvement story while source systems leave conflicting traces.

Healthcare operations, 12 min

table, memo, timeline

Case 13 (Unopened)

The Missingness Report

A clinical risk report looks steady after an analyst trims the file, and the committee wants to shift attention toward treatment timing.

Clinical analytics, 12 min

heatmap, table, memo

Case 14 (Unopened)

The Privacy-Safe Export

A de-identified public health export clears a checklist, but linkage, consent scope, and lifecycle controls make the release less simple.

Data governance, 12 min

policy, memo, table

Case 15 (Unopened)

The Board Slide

A board packet turns an early operational shift into a dramatic story, and the chart frame is doing more work than it first appears.

Executive reporting, 9 min

chart, memo, table

Case 16 (Unopened)

The Geo Test Winner

A regional media test arrives just before a national buying window, with a confident lift estimate and a deadline to scale.

Retail media, 12 min

chart, table, timeline

Case 17 (Unopened)

The Parallel Trends Slide

A workforce pilot briefing shows a promising post-launch gap, and the budget office wants a statewide recommendation.

Labor policy, 12 min

chart, table, timeline

Case 18 (Unopened)

The Cutoff Policy Claim

A housing navigator pilot produces a sharp estimate near an eligibility line, and the agency wants to carry it into a budget request.

Benefits eligibility, 13 min

chart, table, timeline

Case 19 (Unopened)

The QuickStart Readout

A product experiment gets a fast no-go recommendation, but the exposure record and interval width leave more than one interpretation alive.

Product experimentation, 12 min

chart, table, timeline

Case 20 (Unopened)

The Short-Term Lift

A subscription checkout test lifts paid starts, but refunds, retention, and support burden make the growth claim less settled.

Subscription growth, 12 min

chart, table, timeline

Case 21 (Unopened)

The Discharge Score

A hospital readmission score looks unusually strong in validation, and the launch team wants it in the discharge workflow next month.

Hospital readmission, 13 min

chart, table, timeline

Case 22 (Unopened)

The Labeling Vendor Benchmark

A trust-and-safety model clears a vendor benchmark before peak season, and leadership wants to turn high scores into automatic action.

Trust and safety, 13 min

chart, table, memo

Case 23 (Unopened)

The Drift Alarm Nobody Owned

An ETA model alarm appears while headline service levels still look acceptable, and no team is eager to slow the workflow.

Logistics ETA, 13 min

chart, table, timeline

Case 24 (Unopened)

The Holiday Override

A holiday replenishment model reports strong backtest accuracy, and operations wants to let it override store planners next week.

Retail supply chain, 13 min

chart, table, timeline

Case 25 (Unopened)

The DealDesk Pilot

An enterprise assistant performs well on clean sales workflows, and revenue operations wants tool-enabled expansion before renewal season.

Enterprise AI, 13 min

chart, table, memo

Submitted: 0

Score: 0

Calibration: 0

High-conf misses: 0

Scores stay in this browser. The profile gets useful after several completed cases.
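The page does not say how Calibration or High-conf misses are computed, so the following is only a minimal sketch of one common way to summarize a judgment record. It assumes each submitted case stores a stated confidence between 0 and 1 and whether the call was correct. The `Judgment` class, the function names, the Brier score as the accuracy summary, and the 0.8 cutoff for "high confidence" are all illustrative assumptions, not the lab's own scoring.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One submitted case (hypothetical record shape)."""
    confidence: float  # stated probability that the call is right, 0.0 to 1.0
    correct: bool      # whether the call matched the reviewed answer


def _require_nonempty(judgments):
    if not judgments:
        raise ValueError("need at least one judgment")


def brier_score(judgments):
    """Mean squared gap between stated confidence and outcome (0 is best)."""
    _require_nonempty(judgments)
    total = sum((j.confidence - float(j.correct)) ** 2 for j in judgments)
    return total / len(judgments)


def calibration_gap(judgments):
    """Mean confidence minus hit rate; a positive value suggests overconfidence."""
    _require_nonempty(judgments)
    mean_confidence = sum(j.confidence for j in judgments) / len(judgments)
    hit_rate = sum(j.correct for j in judgments) / len(judgments)
    return mean_confidence - hit_rate


def high_confidence_misses(judgments, threshold=0.8):
    """Count wrong calls made at or above the (assumed) confidence threshold."""
    return sum(1 for j in judgments if j.confidence >= threshold and not j.correct)


# Made-up record of four cases, purely for illustration.
record = [
    Judgment(0.9, True),
    Judgment(0.9, False),
    Judgment(0.6, True),
    Judgment(0.7, False),
]

print(brier_score(record))
print(calibration_gap(record))
print(high_confidence_misses(record))
```

With only a handful of cases these summaries swing widely from one submission to the next, which is consistent with the note that the profile becomes useful only after several completed cases.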

Case sets

Judgment areas

Evidence and Metrics: Interpret metrics as constructed evidence, not objective facts.

0/2

Experiments and Causality: Separate causal evidence from tempting claims.

0/2

Models in the Real World: Evaluate models as operational decisions, not leaderboard scores.

0/4

Uncertainty and Decisions: Make defensible calls with incomplete or biased evidence.

0/2

Data Provenance and Measurement Integrity: Check whether fields, samples, visual frames, and releases are trustworthy enough to reason from.

0/5

Causal Designs Beyond the A/B Test: Judge causal claims under spillover, timing, power, threshold, and lifecycle complications.

0/5

Operational Models and AI Risk: Decide when model and AI performance claims survive contact with real workflows.

0/5