---
title: "Getting Machine Learning Models to Production"
date: "2026-01-20"
author: "Xephyr Team"
categories: ["Machine Learning", "MLOps"]
excerpt: "90% of ML models never reach production. Here's how to be in the 10% that do."
---

Getting Machine Learning Models to Production
There's a persistent gap between the machine learning that happens in notebooks and the machine learning that actually runs in production. The exact figures vary by study, but the pattern is consistent: most models never make it. They get stuck in review, stall on infrastructure, or simply never get prioritised once the exciting experiment phase is over.
Getting a model to production isn't primarily a technical problem — it's an organisational and operational one. The technical challenges are solvable with known tools. The rest requires deliberate process.
Why Models Stall Before Production
The reasons are almost always the same:
- No clear owner: The data scientist built the model; no one agreed to run it. Ownership gaps kill more ML projects than bad accuracy.
- Infrastructure not ready: The serving infrastructure, feature store, and monitoring don't exist yet. Each dependency becomes a blocker.
- No rollback plan: Nobody wants to deploy something they can't turn off. Without a rollback mechanism, risk-averse stakeholders stall deployment indefinitely.
- Unclear success criteria: If no one defined what "working" looks like, no one can sign off on deployment.
- Compliance and governance gaps: Who reviewed this model for bias? Does it meet data retention requirements? These questions surface late and cause delays.
The Production Readiness Checklist
Before any model ships, it should pass a production readiness review. Our checklist covers eight areas:
- Model card: Document what the model does, what data it was trained on, known limitations, and intended use cases.
- Performance benchmarks: Accuracy, latency, and throughput under expected load — not just on a test set.
- Serving infrastructure: Containerised endpoint, health check, auto-scaling policy.
- Feature pipeline: Are the features the model needs available in production? At the right latency?
- Monitoring setup: Prediction distribution monitoring, input data drift detection, business metric correlation.
- Rollback mechanism: Can you disable this model in under five minutes without a deployment? If not, it's not ready.
- Logging: Every prediction logged with enough context to debug a complaint six months from now.
- Stakeholder sign-off: Engineering, product, data, and (where relevant) legal have reviewed and approved.
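The rollback item above is often the simplest to satisfy: a runtime flag checked on every request, so flipping a config value disables the model with no redeploy. A minimal sketch, assuming a hypothetical `baseline_predict` fallback (a simple rule, or the previous champion) and a flag read from an environment variable — in practice this would come from a config service or feature-flag system:

```python
import os

def baseline_predict(features: dict) -> float:
    """Hypothetical fallback: a heuristic or the previous champion model."""
    return 0.0

def model_predict(features: dict) -> float:
    """Stand-in for the real model call."""
    return 1.0

def predict(features: dict) -> float:
    # Kill switch: setting MODEL_ENABLED=0 routes traffic to the
    # fallback immediately, without a deployment.
    if os.environ.get("MODEL_ENABLED", "1") != "1":
        return baseline_predict(features)
    return model_predict(features)
```

The fallback path should be exercised regularly (e.g. in staging) so the first time it runs isn't during an incident.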
This checklist adds two weeks to the first deployment. After the first few models, teams internalise it and the overhead drops to a day.
CI/CD for ML
Models need the same deployment rigour as software. That means:
- Automated retraining triggered by data drift or a schedule
- Automated evaluation against a held-out validation set
- Promotion gates: a model only advances if it beats the current champion on defined metrics
- Shadow deployment before full rollout: run the new model alongside the old one, compare outputs without affecting users
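A promotion gate can be expressed as a small, explicit function rather than a judgment call. A sketch, with illustrative metric names and an assumed absolute-improvement threshold:

```python
def passes_promotion_gate(challenger: dict, champion: dict,
                          min_improvement: float = 0.01) -> bool:
    """Promote the challenger only if it beats the current champion
    on every gated metric by at least `min_improvement` (absolute).
    Metric names and the threshold are illustrative, not prescriptive."""
    gated_metrics = ["auc", "precision_at_k"]
    return all(
        challenger[m] >= champion[m] + min_improvement
        for m in gated_metrics
    )
```

Keeping the gate in code means the promotion criteria are versioned, reviewable, and the same for every candidate model.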
Shadow deployment is the most underused technique in ML production. It lets you validate a new model against real traffic before anyone relies on it.
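The core of a shadow deployment is a serving wrapper in which the challenger sees real traffic but can never affect the response. A minimal sketch, assuming the champion and challenger are plain callables:

```python
import logging

logger = logging.getLogger("shadow")

def serve_with_shadow(features, champion_predict, challenger_predict):
    """Serve the champion's prediction; run the challenger on the same
    input and log both outputs for offline comparison. The challenger
    never influences what the user sees."""
    result = champion_predict(features)
    try:
        shadow_result = challenger_predict(features)
        logger.info("shadow_compare", extra={
            "champion_output": result,
            "challenger_output": shadow_result,
        })
    except Exception:
        # A broken challenger must never break serving.
        logger.exception("shadow model failed")
    return result
```

In a real system the challenger would usually run asynchronously so it adds no latency to the request path.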
Monitoring Is Not Optional
A model that isn't monitored degrades silently. The world changes, and models trained on old data produce increasingly wrong predictions. By the time a business metric drops, the root cause is months old.
Monitor at three levels:
- Infrastructure: Latency, error rate, throughput — standard SRE metrics.
- Data: Input feature distributions compared to training data. Sudden shifts indicate upstream data changes.
- Model: Prediction distribution, confidence calibration, and where available, ground truth comparison.
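The data-level check above can be implemented with nothing more than the standard library. One common approach — a sketch, not the only option — is the Population Stability Index (PSI), which compares the bucketed distribution of a feature in production against its training-time distribution. The thresholds in the docstring are widely used rules of thumb, not universal constants:

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a training-time sample
    ('expected') and a production sample ('actual') of one feature.
    Rule of thumb: < 0.1 stable, 0.1-0.25 drifting, > 0.25 alert."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            i = max(i, 0)  # clamp production values outside the training range
            counts[i] += 1
        # Replace empty buckets with a half-count to avoid log(0).
        return [(c or 0.5) / len(values) for c in counts]

    e = bucket_fractions(expected)
    a = bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Run a check like this per feature on a schedule and alert when the index crosses the drift threshold; sudden jumps usually point at an upstream data change rather than genuine behaviour shift.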
Set alert thresholds. Assign owners. Review model health monthly even when nothing looks wrong.
The Uncomfortable Truth
Most ML teams invest heavily in model development and lightly in deployment. The ROI is backwards. A mediocre model that runs reliably in production delivers more value than an excellent model that never ships. Invest in the infrastructure, the process, and the monitoring. The models will follow.