Model Deployment in Machine Learning: Getting to Production

Model deployment is where most ML projects die. I’ve seen data scientists build brilliant models that never leave their Jupyter notebooks. The model works perfectly in development; it just never makes it to production.

A model that sits in a notebook is worth exactly zero dollars. Deployment is what turns your ML work into business value, and it’s not that hard if you know what you’re doing.

In this tutorial, we’re deploying a customer churn model to production using FastAPI. You’ll build a REST API, handle real-time predictions, manage model versioning, and set up basic monitoring. This is what production ML actually looks like.

If you need the foundation, start with Tutorial 1: ML Fundamentals and work through the series. Make sure you’ve covered Tutorial 6: Hyperparameter Tuning because you need a trained model to deploy.

What We’re Building

We’re building a production-ready ML system with:

- FastAPI for the REST API
- Model serialization to save and load trained models
- Input validation to ensure predictions get valid data
- Prediction logging to track what the model predicts
- Health checks to monitor whether the service is running
- A Docker container to package everything for deployment

By the end, you’ll have a production API that handles real prediction requests.

Why FastAPI for ML Deployment

FastAPI is the best choice for ML APIs. It’s fast, built on modern Python with Starlette and Pydantic. Automatic documentation through OpenAPI and Swagger UI is generated without any extra work. Pydantic validates inputs at runtime with type checking. The syntax is clean and feels familiar if you know Flask. And it’s production-ready: it’s used by Netflix, Microsoft, and Uber.

Other options exist, like Flask, Django, and Tornado, but those are general-purpose web frameworks that carry extra weight for a pure prediction service. FastAPI is purpose-built for this.

Project Structure

Let’s set up a proper production structure:

# Create project directory
mkdir churn-api
cd churn-api

# Create subdirectories
mkdir -p models app tests

# Your structure should look like this:
churn-api/
├── models/
│   └── churn_model.pkl
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── schemas.py
│   └── predict.py
├── tests/
│   └── test_api.py
├── requirements.txt
├── Dockerfile
└── README.md

This structure separates concerns: models, application code, and tests.

Step 1: Save Your Trained Model

First, we need to serialize the model we trained in previous tutorials:

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load and prepare data (from Tutorial 2)
df = pd.read_csv('customer_churn_preprocessed.csv')
X = df.drop(['customer_id', 'churned'], axis=1)
y = df['churned']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train model (or load your tuned model from Tutorial 6)
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=20,
    min_samples_split=5,
    random_state=42
)
model.fit(X_train, y_train)

# Save model and feature names
model_data = {
    'model': model,
    'feature_names': list(X.columns),
    'model_version': '1.0.0',
    'training_date': '2025-11-10'
}

joblib.dump(model_data, 'models/churn_model.pkl')
print("Model saved to models/churn_model.pkl")
print(f"Model expects {len(X.columns)} features: {list(X.columns)}")

Why save feature names? Production data might arrive with columns in a different order; the saved names let us reorder inputs to match training. One caveat: if your preprocessing in Tutorial 2 one-hot encoded categorical columns, the saved names will be the encoded ones, and the API must apply the same encoding before predicting. The cleanest solution is to serialize a scikit-learn Pipeline that bundles preprocessing and model together.
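
As a quick sanity check, you can reload the artifact and confirm everything survived the round trip (assuming the script above has already written models/churn_model.pkl):

import joblib

# Reload the serialized bundle and verify its contents
model_data = joblib.load('models/churn_model.pkl')

assert model_data['model_version'] == '1.0.0'
assert len(model_data['feature_names']) > 0

print(f"Version: {model_data['model_version']}")
print(f"Features: {model_data['feature_names']}")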

Step 2: Define Request and Response Schemas

Pydantic schemas validate incoming requests and structure responses. Create app/schemas.py:

from pydantic import BaseModel, Field, field_validator
from typing import Optional

class CustomerFeatures(BaseModel):
    """Input schema for customer churn prediction"""
    
    # Customer demographics
    age: int = Field(..., ge=18, le=100, description="Customer age")
    tenure_months: int = Field(..., ge=0, description="Months as customer")
    
    # Usage metrics
    monthly_charges: float = Field(..., gt=0, description="Monthly charges in USD")
    total_charges: float = Field(..., ge=0, description="Total charges in USD")
    
    # Service features
    internet_service: str = Field(..., description="Internet service type")
    contract_type: str = Field(..., description="Contract type")
    payment_method: str = Field(..., description="Payment method")
    
    # Engagement metrics
    support_tickets: int = Field(..., ge=0, description="Number of support tickets")
    late_payments: int = Field(..., ge=0, description="Number of late payments")
    
    @field_validator('internet_service')
    @classmethod
    def validate_internet_service(cls, v):
        valid_values = ['DSL', 'Fiber optic', 'No']
        if v not in valid_values:
            raise ValueError(f'internet_service must be one of {valid_values}')
        return v
    
    @field_validator('contract_type')
    @classmethod
    def validate_contract(cls, v):
        valid_values = ['Month-to-month', 'One year', 'Two year']
        if v not in valid_values:
            raise ValueError(f'contract_type must be one of {valid_values}')
        return v
    
    # Pydantic v2 config: example shown in the OpenAPI docs
    model_config = {
        "json_schema_extra": {
            "example": {
                "age": 45,
                "tenure_months": 24,
                "monthly_charges": 75.50,
                "total_charges": 1812.00,
                "internet_service": "Fiber optic",
                "contract_type": "Month-to-month",
                "payment_method": "Electronic check",
                "support_tickets": 2,
                "late_payments": 0
            }
        }
    }

class PredictionResponse(BaseModel):
    """Output schema for churn prediction"""
    
    customer_id: Optional[str] = None
    churn_probability: float = Field(..., ge=0, le=1, description="Probability of churn")
    churn_prediction: bool = Field(..., description="Will customer churn?")
    risk_level: str = Field(..., description="Risk level: Low, Medium, High")
    model_version: str = Field(..., description="Model version used")
    
    model_config = {
        "json_schema_extra": {
            "example": {
                "customer_id": "CUST-12345",
                "churn_probability": 0.73,
                "churn_prediction": True,
                "risk_level": "High",
                "model_version": "1.0.0"
            }
        }
    }

class HealthResponse(BaseModel):
    """Health check response"""
    status: str
    model_loaded: bool
    model_version: str

These schemas provide automatic validation and documentation.
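
You can watch the validation work without starting the API. A small sketch (run from the project root so the app package is importable) that feeds Pydantic an out-of-range age:

from pydantic import ValidationError

from app.schemas import CustomerFeatures

try:
    CustomerFeatures(
        age=15,  # Violates the ge=18 constraint
        tenure_months=24,
        monthly_charges=75.50,
        total_charges=1812.00,
        internet_service="Fiber optic",
        contract_type="Month-to-month",
        payment_method="Electronic check",
        support_tickets=2,
        late_payments=0,
    )
except ValidationError as e:
    print(e)  # Reports exactly which field failed and why

FastAPI returns these same details to the client as a 422 response.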

Step 3: Build the Prediction Service

Create app/predict.py to handle model loading and predictions:

import joblib
import pandas as pd
import numpy as np
from pathlib import Path
from typing import Dict, List

class ChurnPredictor:
    """Handles model loading and predictions"""
    
    def __init__(self, model_path: str = "models/churn_model.pkl"):
        self.model_path = Path(model_path)
        self.model_data = None
        self.model = None
        self.feature_names = None
        self.model_version = None
        self.load_model()
    
    def load_model(self):
        """Load the trained model"""
        if not self.model_path.exists():
            raise FileNotFoundError(f"Model not found at {self.model_path}")
        
        self.model_data = joblib.load(self.model_path)
        self.model = self.model_data['model']
        self.feature_names = self.model_data['feature_names']
        self.model_version = self.model_data.get('model_version', 'unknown')
        
        print(f"Model loaded: version {self.model_version}")
        print(f"Expected features: {len(self.feature_names)}")
    
    def preprocess_input(self, features: Dict) -> pd.DataFrame:
        """Convert input features to model format"""
        # Create DataFrame with correct feature order
        df = pd.DataFrame([features])
        
        # Ensure we have all required features
        missing_features = set(self.feature_names) - set(df.columns)
        if missing_features:
            raise ValueError(f"Missing required features: {missing_features}")
        
        # Reorder columns to match training
        df = df[self.feature_names]
        
        return df
    
    def predict(self, features: Dict) -> Dict:
        """Make prediction for a single customer"""
        # Preprocess
        X = self.preprocess_input(features)
        
        # Predict probability
        churn_proba = self.model.predict_proba(X)[0][1]
        
        # Binary prediction (threshold 0.5)
        churn_pred = churn_proba >= 0.5
        
        # Risk level
        if churn_proba >= 0.7:
            risk_level = "High"
        elif churn_proba >= 0.4:
            risk_level = "Medium"
        else:
            risk_level = "Low"
        
        return {
            "churn_probability": float(churn_proba),
            "churn_prediction": bool(churn_pred),
            "risk_level": risk_level,
            "model_version": self.model_version
        }
    
    def predict_batch(self, features_list: List[Dict]) -> List[Dict]:
        """Make predictions for multiple customers"""
        return [self.predict(features) for features in features_list]

This separates prediction logic from the API layer.
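
Because ChurnPredictor has no FastAPI dependency, you can exercise it directly in a Python shell, which is handy for debugging. A minimal sketch with placeholder values (substitute a realistic feature dict keyed by your model’s training columns):

from app.predict import ChurnPredictor

predictor = ChurnPredictor()  # Loads models/churn_model.pkl

# Placeholder input: one entry per training column, all zeros.
# Replace with real (encoded) customer values for a meaningful result.
example = {name: 0 for name in predictor.feature_names}

print(predictor.predict(example))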

Step 4: Create the FastAPI Application

Create app/main.py with logging, error handling, and documentation:

from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import JSONResponse
import uvicorn
import logging
import time
from datetime import datetime

from app.schemas import CustomerFeatures, PredictionResponse, HealthResponse
from app.predict import ChurnPredictor

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Customer Churn Prediction API",
    description="REST API for predicting customer churn",
    version="1.0.0"
)

# Load model on startup
predictor = None

@app.on_event("startup")
async def startup_event():
    """Load model when API starts"""
    global predictor
    try:
        predictor = ChurnPredictor()
        logger.info("Model loaded successfully")
    except Exception as e:
        logger.error(f"Failed to load model: {e}")
        raise

@app.middleware("http")
async def log_requests(request: Request, call_next):
    """Log all requests"""
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    
    logger.info(
        f"{request.method} {request.url.path} "
        f"completed in {process_time:.3f}s with status {response.status_code}"
    )
    
    return response

@app.get("/", tags=["Root"])
async def root():
    """Root endpoint"""
    return {
        "message": "Customer Churn Prediction API",
        "version": "1.0.0",
        "docs": "/docs"
    }

@app.get("/health", response_model=HealthResponse, tags=["Health"])
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy" if predictor else "unhealthy",
        "model_loaded": predictor is not None,
        "model_version": predictor.model_version if predictor else "unknown"
    }

@app.post("/predict", response_model=PredictionResponse, tags=["Predictions"])
async def predict_churn(
    features: CustomerFeatures,
    customer_id: str | None = None
):
    """
    Predict customer churn probability
    
    Returns churn probability, binary prediction, and risk level
    """
    if not predictor:
        raise HTTPException(status_code=503, detail="Model not loaded")
    
    try:
        # Convert Pydantic model to dict (model_dump is the Pydantic v2 API)
        features_dict = features.model_dump()
        
        # Make prediction
        prediction = predictor.predict(features_dict)
        
        # Add customer_id if provided
        prediction['customer_id'] = customer_id
        
        # Log prediction
        logger.info(
            f"Prediction for customer {customer_id}: "
            f"churn_prob={prediction['churn_probability']:.3f}, "
            f"risk={prediction['risk_level']}"
        )
        
        return prediction
    
    except Exception as e:
        logger.error(f"Prediction error: {e}")
        raise HTTPException(status_code=400, detail=str(e))

@app.post("/predict/batch", tags=["Predictions"])
async def predict_churn_batch(features_list: list[CustomerFeatures]):
    """
    Predict churn for multiple customers
    
    Returns predictions for all customers in the batch
    """
    if not predictor:
        raise HTTPException(status_code=503, detail="Model not loaded")
    
    try:
        # Convert to list of dicts
        features_dicts = [f.model_dump() for f in features_list]
        
        # Make predictions
        predictions = predictor.predict_batch(features_dicts)
        
        logger.info(f"Batch prediction completed for {len(predictions)} customers")
        
        return {"predictions": predictions, "count": len(predictions)}
    
    except Exception as e:
        logger.error(f"Batch prediction error: {e}")
        raise HTTPException(status_code=400, detail=str(e))

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

This is a production-ready API with logging, error handling, and documentation.
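
One caveat: @app.on_event is deprecated in recent FastAPI releases in favor of a lifespan handler. The version pinned in Step 5 still supports it, so the code above works as-is, but if you prefer the newer pattern, a minimal sketch looks like this:

from contextlib import asynccontextmanager
from fastapi import FastAPI

from app.predict import ChurnPredictor

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs once before the first request is served
    app.state.predictor = ChurnPredictor()
    yield
    # Runs on shutdown (nothing to clean up here)

app = FastAPI(title="Customer Churn Prediction API", lifespan=lifespan)

Endpoints would then read the model from app.state.predictor instead of a module-level global.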

Step 5: Create Requirements File

Create requirements.txt with all necessary dependencies:

fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pandas==2.1.3
scikit-learn==1.3.2
joblib==1.3.2
numpy==1.26.2

Step 6: Test the API Locally

Install dependencies and run the API:

# Install dependencies
pip install -r requirements.txt

# Run the API
python -m uvicorn app.main:app --reload

# API will be available at:
# http://localhost:8000
# Documentation at: http://localhost:8000/docs

Test with curl:

# Health check
curl http://localhost:8000/health

# Make a prediction
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "age": 45,
    "tenure_months": 24,
    "monthly_charges": 75.50,
    "total_charges": 1812.00,
    "internet_service": "Fiber optic",
    "contract_type": "Month-to-month",
    "payment_method": "Electronic check",
    "support_tickets": 2,
    "late_payments": 0
  }'

The output will show the prediction:

{
  "customer_id": null,
  "churn_probability": 0.73,
  "churn_prediction": true,
  "risk_level": "High",
  "model_version": "1.0.0"
}

Your model is now serving predictions via HTTP.
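
If you prefer Python to curl, here is the same request with the requests library (pip install requests; assumes the API is running locally):

import requests

payload = {
    "age": 45,
    "tenure_months": 24,
    "monthly_charges": 75.50,
    "total_charges": 1812.00,
    "internet_service": "Fiber optic",
    "contract_type": "Month-to-month",
    "payment_method": "Electronic check",
    "support_tickets": 2,
    "late_payments": 0,
}

resp = requests.post(
    "http://localhost:8000/predict",
    params={"customer_id": "CUST-12345"},  # optional query parameter
    json=payload,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())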

Step 7: Add Prediction Logging

Track predictions for monitoring. Tutorial 8 will use this data. Add to app/predict.py:

import json
from datetime import datetime
from pathlib import Path

class ChurnPredictor:
    def __init__(self, model_path: str = "models/churn_model.pkl"):
        self.model_path = Path(model_path)
        self.log_path = Path("logs/predictions.jsonl")
        self.log_path.parent.mkdir(exist_ok=True)
        # ... existing code ...
    
    def log_prediction(self, features: Dict, prediction: Dict):
        """Log prediction for monitoring"""
        log_entry = {
            "timestamp": datetime.utcnow().isoformat(),
            "features": features,
            "prediction": prediction
        }
        
        with open(self.log_path, 'a') as f:
            f.write(json.dumps(log_entry) + '\n')
    
    def predict(self, features: Dict) -> Dict:
        """Make prediction for a single customer"""
        X = self.preprocess_input(features)
        churn_proba = self.model.predict_proba(X)[0][1]
        churn_pred = churn_proba >= 0.5
        
        if churn_proba >= 0.7:
            risk_level = "High"
        elif churn_proba >= 0.4:
            risk_level = "Medium"
        else:
            risk_level = "Low"
        
        prediction = {
            "churn_probability": float(churn_proba),
            "churn_prediction": bool(churn_pred),
            "risk_level": risk_level,
            "model_version": self.model_version
        }
        
        # Log the prediction
        self.log_prediction(features, prediction)
        
        return prediction

Every prediction is now logged with timestamp and inputs.
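
Because the log is JSON Lines, pulling it into pandas for a quick look is easy. A sketch, assuming the service has already handled some requests:

import pandas as pd

# Each line of the log file is one JSON object
logs = pd.read_json("logs/predictions.jsonl", lines=True)

# Flatten the nested prediction dicts into columns
preds = pd.json_normalize(list(logs["prediction"]))

print(preds["risk_level"].value_counts())
print(f"Mean churn probability: {preds['churn_probability'].mean():.3f}")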

Step 8: Dockerize the Application

Create Dockerfile to containerize everything:

FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app/ ./app/
COPY models/ ./models/

# Create logs directory
RUN mkdir -p logs

# Expose port
EXPOSE 8000

# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Build and run the container:

# Build Docker image
docker build -t churn-api:1.0.0 .

# Run container
docker run -p 8000:8000 churn-api:1.0.0

# Test
curl http://localhost:8000/health

Your API is now containerized and ready for deployment.
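
For CI or deploy scripts, it helps to wait until the container is actually serving before running anything against it. A hypothetical helper that polls the health endpoint (using requests):

import time

import requests

def wait_for_healthy(url="http://localhost:8000/health", timeout=30):
    """Poll the health endpoint until the service reports healthy."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            data = requests.get(url, timeout=2).json()
            if data.get("status") == "healthy":
                return data
        except requests.RequestException:
            pass  # Container still starting up
        time.sleep(1)
    raise TimeoutError(f"Service at {url} not healthy after {timeout}s")

print(wait_for_healthy())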

Step 9: Write API Tests

Create tests/test_api.py to verify everything works:

import pytest
from fastapi.testclient import TestClient
from app.main import app

# Run the client as a context manager so the startup event fires
# and the model is loaded before any test hits the API
@pytest.fixture(scope="module")
def client():
    with TestClient(app) as c:
        yield c

def test_root(client):
    """Test root endpoint"""
    response = client.get("/")
    assert response.status_code == 200
    assert "message" in response.json()

def test_health(client):
    """Test health check"""
    response = client.get("/health")
    assert response.status_code == 200
    data = response.json()
    assert data["status"] == "healthy"
    assert data["model_loaded"] is True

def test_predict(client):
    """Test single prediction"""
    payload = {
        "age": 45,
        "tenure_months": 24,
        "monthly_charges": 75.50,
        "total_charges": 1812.00,
        "internet_service": "Fiber optic",
        "contract_type": "Month-to-month",
        "payment_method": "Electronic check",
        "support_tickets": 2,
        "late_payments": 0
    }
    
    response = client.post("/predict", json=payload)
    assert response.status_code == 200
    data = response.json()
    assert "churn_probability" in data
    assert 0 <= data["churn_probability"] <= 1
    assert data["risk_level"] in ["Low", "Medium", "High"]

def test_predict_invalid_input(client):
    """Test prediction with invalid input"""
    payload = {
        "age": -5,  # Invalid age
        "tenure_months": 24
    }
    
    response = client.post("/predict", json=payload)
    assert response.status_code == 422  # Validation error

def test_batch_predict(client):
    """Test batch prediction"""
    payload = [
        {
            "age": 45,
            "tenure_months": 24,
            "monthly_charges": 75.50,
            "total_charges": 1812.00,
            "internet_service": "Fiber optic",
            "contract_type": "Month-to-month",
            "payment_method": "Electronic check",
            "support_tickets": 2,
            "late_payments": 0
        },
        {
            "age": 32,
            "tenure_months": 60,
            "monthly_charges": 55.20,
            "total_charges": 3312.00,
            "internet_service": "DSL",
            "contract_type": "Two year",
            "payment_method": "Bank transfer",
            "support_tickets": 0,
            "late_payments": 0
        }
    ]
    
    response = client.post("/predict/batch", json=payload)
    assert response.status_code == 200
    data = response.json()
    assert data["count"] == 2
    assert len(data["predictions"]) == 2

Run the tests with pytest. FastAPI’s TestClient is built on httpx, so install both first:

pip install pytest httpx
pytest tests/test_api.py -v

Deployment Options

You have several options for deploying your API:

- AWS (Elastic Beanstalk or ECS): upload your Docker container to ECR and deploy to ECS Fargate or Elastic Beanstalk, around $20-50 per month for small workloads.
- Google Cloud Run: push your Docker image to GCR and deploy to the serverless platform; you pay per request, which is very cheap for low traffic.
- Azure Container Instances: push to Azure Container Registry and deploy to ACI, with costs similar to AWS.

If you want full control, you can run your Docker container on your own server using DigitalOcean Droplets or Linode for $6-12 per month.

For production ML, I recommend AWS ECS because it handles scale extremely well.

The Bottom Line

Model deployment in machine learning is what turns your work into business value. A brilliant model that sits in a notebook is worthless. Deployment is where ML becomes real.

FastAPI makes deployment straightforward. Build an API, add validation, log predictions, containerize with Docker, and deploy to the cloud. The technical part isn’t hard. The hard part is doing it at all.

A simple deployed model beats a perfect model in a notebook, every time. Ship it.

