Testing Guide

Overview

This guide covers testing strategies and best practices for the ML Pipelines project. Testing is essential for maintaining code quality, catching bugs early, and ensuring reliable production deployments.

Testing Philosophy

Test Pyramid

           ┌─────────────────┐
           │   E2E Tests     │  ← Few, slow, expensive
           │   (Manual)      │
           └─────────────────┘
         ┌───────────────────────┐
         │  Integration Tests    │  ← Some, medium speed
         │  (DLT, Model, API)    │
         └───────────────────────┘
    ┌──────────────────────────────────┐
    │       Unit Tests                 │  ← Many, fast, cheap
    │  (Functions, Classes, Logic)     │
    └──────────────────────────────────┘

Testing Principles

  1. Write tests before code reaches production: all code merged to main should have tests

  2. Test behavior, not implementation: Focus on what code does, not how

  3. Keep tests fast: Unit tests should run in seconds

  4. Make tests deterministic: No flaky tests, no random failures

  5. Test edge cases: Handle nulls, empty strings, boundary values

  6. Mock external dependencies: Don't hit production APIs in tests

Current Test Status

Note: As of October 2025, this repository has minimal test coverage. Tests are being added incrementally as features are developed.

Existing Tests:

  • /tests/main_test.py: Basic smoke test (122 lines)

  • /tests/conftest.py: Pytest configuration (2002 lines)

Test Coverage Goal: Target 80% code coverage for critical paths.

Unit Testing

What to Unit Test

  • Model logic: Prediction functions, preprocessing, postprocessing

  • Utility functions: Data transformations, validation, formatting

  • Configuration parsing: YAML loading, parameter validation

  • Data schemas: Table schema definitions, type conversions

Unit Test Structure

Directory Structure:
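A layout like the following keeps unit and integration tests separate. Only main_test.py and conftest.py exist today; the other file names are illustrative:

```
tests/
├── conftest.py          # shared pytest fixtures and configuration
├── main_test.py         # existing smoke test
├── unit/
│   ├── test_models.py
│   ├── test_utils.py
│   └── test_config.py
└── integration/
    ├── test_dlt_pipeline.py
    └── test_model_registration.py
```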

Example Unit Test

Testing a Model Class:
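A minimal sketch of a model-class unit test. The SentimentModel class and its predict API are illustrative stand-ins, not the project's actual model:

```python
# Hypothetical model class standing in for a real project model.
class SentimentModel:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def predict(self, texts: list[str]) -> list[str]:
        # Toy logic: label by presence of the word "good".
        return ["positive" if "good" in t.lower() else "negative" for t in texts]


def test_predict_returns_label_per_input():
    # Arrange
    model = SentimentModel()
    texts = ["This is good", "This is bad"]
    # Act
    labels = model.predict(texts)
    # Assert
    assert labels == ["positive", "negative"]


def test_predict_handles_empty_input():
    model = SentimentModel()
    assert model.predict([]) == []
```

Note the AAA (Arrange, Act, Assert) structure and the explicit empty-input edge case.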

Testing Utility Functions:
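A sketch for utility-function tests; normalize_text is a hypothetical helper used here to show null and empty-string edge cases:

```python
# Hypothetical utility standing in for a real transformation helper.
def normalize_text(value):
    """Trim whitespace and lowercase; None becomes an empty string."""
    if value is None:
        return ""
    return value.strip().lower()


def test_normalize_text_strips_and_lowercases():
    assert normalize_text("  Hello World  ") == "hello world"


def test_normalize_text_handles_none_and_empty():
    assert normalize_text(None) == ""
    assert normalize_text("") == ""
```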

Running Unit Tests

Run all tests:
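From the repository root, assuming pytest is installed:

```shell
pytest tests/
```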

Run specific test file:
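Using the existing smoke-test file as the example:

```shell
pytest tests/main_test.py
```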

Run specific test:
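The test name after `::` is illustrative:

```shell
pytest tests/main_test.py::test_smoke
```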

Run with coverage:
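Requires the pytest-cov plugin; the package name passed to `--cov` is an assumption:

```shell
pytest --cov=ml_pipelines tests/
```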

Run with verbose output:
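The `-v` flag prints each test name as it runs:

```shell
pytest -v tests/
```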

Integration Testing

What to Integration Test

  • DLT pipelines: End-to-end pipeline execution

  • Model registration: MLflow registration and endpoint creation

  • API endpoints: Model serving endpoint responses

  • Database operations: Table creation, data insertion, queries

DLT Pipeline Testing

Testing Strategy:

  1. Create test data in sandbox environment

  2. Run DLT pipeline on test data

  3. Validate output tables have expected schema and data

Example DLT Pipeline Test:
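DLT table functions are difficult to execute outside a Databricks pipeline, so one approach is to keep the transformation logic in a plain function that the @dlt.table-decorated function wraps, and unit test that function directly. Everything below is a sketch: the function name and fields are illustrative, and plain dicts stand in for Spark rows so the logic runs without a Spark session:

```python
# Illustrative: in the pipeline, a @dlt.table function would apply
# clean_events() to a DataFrame; here plain dicts stand in for rows.
def clean_events(rows):
    """Drop rows with a null user_id and normalize event names."""
    return [
        {**row, "event": row["event"].strip().lower()}
        for row in rows
        if row.get("user_id") is not None
    ]


def test_clean_events_drops_null_user_ids():
    rows = [
        {"user_id": 1, "event": " Click "},
        {"user_id": None, "event": "view"},
    ]
    cleaned = clean_events(rows)
    assert len(cleaned) == 1
    assert cleaned[0]["event"] == "click"
```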

Model Registration Testing

Testing Strategy:

  1. Register model to sandbox catalog

  2. Validate model metadata

  3. Test model prediction via MLflow

  4. Test model serving endpoint

Example Model Registration Test:
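A sketch of a registration test with MLflow mocked out, so it runs without a tracking server. register_model is a hypothetical project helper wrapping the MLflow client; the model and catalog names are placeholders:

```python
from unittest.mock import MagicMock


# Hypothetical project helper that wraps the MLflow client.
def register_model(client, model_uri, name):
    version = client.create_model_version(name=name, source=model_uri)
    return version.version


def test_register_model_returns_new_version():
    # Arrange: a stand-in for mlflow.tracking.MlflowClient
    client = MagicMock()
    client.create_model_version.return_value.version = "3"
    # Act
    version = register_model(client, "runs:/abc123/model", "sandbox.models.demo")
    # Assert
    assert version == "3"
    client.create_model_version.assert_called_once_with(
        name="sandbox.models.demo", source="runs:/abc123/model"
    )
```

An end-to-end variant of this test would use a real MlflowClient against the sandbox catalog instead of the mock.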

Running Integration Tests

Run all integration tests:
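Assuming integration tests live under tests/integration/ as sketched earlier:

```shell
pytest tests/integration/
```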

Run specific integration test:
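The file name is illustrative:

```shell
pytest tests/integration/test_dlt_pipeline.py
```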

Note: Integration tests require:

  • Databricks authentication configured

  • Access to sandbox environment

  • Test data in sandbox catalog
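One common way to satisfy the authentication requirement is the environment variables recognized by Databricks tooling (the values below are placeholders):

```shell
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<personal-access-token>"
```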

Model Testing

Model Validation

What to Validate:

  • Model loads successfully from MLflow

  • Model signature is correct

  • Model predictions have expected format

  • Model handles edge cases (empty input, nulls, special characters)

  • Model performance meets baseline metrics

Example Model Validation:
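A sketch of format and edge-case validation. StubModel stands in for a model that would in practice be loaded via mlflow.pyfunc.load_model; the checks mirror the list above:

```python
# A stand-in model; in practice this would be loaded from MLflow.
class StubModel:
    def predict(self, inputs):
        # Toy deterministic logic producing a 0/1 class label per input.
        return [len(str(x)) % 2 for x in inputs]


def test_predictions_have_expected_format():
    model = StubModel()
    preds = model.predict(["a", "bb", ""])
    # One prediction per input, each a known class label
    assert len(preds) == 3
    assert all(p in (0, 1) for p in preds)


def test_model_handles_edge_case_inputs():
    model = StubModel()
    # Empty strings, whitespace, and unusual characters should not raise
    preds = model.predict(["", "   ", "héllo ☃"])
    assert len(preds) == 3
```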

Performance Testing

Latency Testing:
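A sketch of a latency harness using only the standard library. The latency budget and the stand-in predict function are illustrative; a real test would call the serving endpoint:

```python
import statistics
import time


def measure_latency(predict_fn, payload, n_requests=100):
    """Return rough p50/p95 latency in milliseconds over n_requests calls."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict_fn(payload)
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * len(latencies)) - 1],
    }


def test_latency_within_budget():
    # Stand-in for a real endpoint call; the 1000 ms budget is illustrative
    result = measure_latency(lambda x: sum(range(1000)), None, n_requests=50)
    assert result["p95_ms"] < 1000
```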

Test Data Management

Test Data Strategy

Principle: Use realistic but small datasets for testing.

Test Data Sources:

  1. Synthetic data: Generated programmatically

  2. Anonymized samples: Real data with PII removed

  3. Fixture data: Hand-crafted examples for edge cases

Creating Test Data

Fixture Approach:
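A sketch of a fixture that would live in conftest.py. The record fields are illustrative; the plain builder function lets the same data be reused outside pytest:

```python
import pytest


def build_sample_reviews():
    """Plain builder so the data can also be used outside pytest."""
    return [
        {"id": 1, "text": "Great product", "label": "positive"},
        {"id": 2, "text": "", "label": "negative"},  # edge case: empty text
    ]


@pytest.fixture
def sample_reviews():
    return build_sample_reviews()
```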

Using Fixtures in Tests:
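A sketch of fixture injection: pytest matches the argument name to a fixture of the same name. The fixture is repeated inline here so the example is self-contained; in practice it would live in conftest.py:

```python
import pytest


@pytest.fixture
def sample_reviews():
    # In practice defined once in conftest.py and shared across test files
    return [
        {"id": 1, "text": "Great product", "label": "positive"},
        {"id": 2, "text": "", "label": "negative"},
    ]


def test_labels_are_valid(sample_reviews):
    # pytest injects the fixture by matching the parameter name
    assert all(r["label"] in ("positive", "negative") for r in sample_reviews)
```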

Test Data in Databricks

Sandbox Test Tables:
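Illustrative SQL for seeding a small test table in the sandbox; the catalog, schema, and table names are placeholders:

```sql
CREATE TABLE IF NOT EXISTS sandbox.test_data.events_sample AS
SELECT * FROM prod.events.raw_events LIMIT 1000;
```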

Cleanup Test Data:
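Matching cleanup for the illustrative table above (names are placeholders):

```sql
DROP TABLE IF EXISTS sandbox.test_data.events_sample;
```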

CI/CD Test Automation

Tests in PR Validation

Current State: PR validation workflow validates bundle configuration but doesn't run tests.

Recommended Addition (add to .github/workflows/ml_pipelines_pr_validate.yml):
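A sketch of test steps that could be added to the workflow's job; the Python version and paths are assumptions:

```yaml
- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: "3.11"

- name: Install test dependencies
  run: pip install pytest pytest-cov

- name: Run unit tests
  run: pytest tests/ --cov --cov-report=term-missing
```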

Test Requirements for PR Merge

Recommended Policy:

  • All unit tests must pass

  • Code coverage must not decrease

  • New features must include tests

  • Bug fixes must include regression tests

Mocking Strategies

Mocking External Dependencies

Mock Databricks API:
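One approach is to pass the client into the code under test, so a mock can replace a real databricks.sdk.WorkspaceClient. The helper below is hypothetical:

```python
from unittest.mock import MagicMock


# Hypothetical helper that takes the client as a parameter; dependency
# injection makes it easy to substitute a mock in tests.
def list_job_names(client):
    return [job.settings.name for job in client.jobs.list()]


def test_list_job_names_with_mocked_client():
    client = MagicMock()
    job = MagicMock()
    job.settings.name = "nightly-training"
    client.jobs.list.return_value = [job]

    assert list_job_names(client) == ["nightly-training"]
```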

Mock Model Predictions:
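A sketch for testing downstream logic without loading an expensive model; the prediction labels are illustrative:

```python
from unittest.mock import MagicMock


def test_pipeline_with_mocked_model():
    # Stand-in for a model loaded from MLflow
    model = MagicMock()
    model.predict.return_value = ["positive", "negative"]

    # Hypothetical downstream logic under test
    predictions = model.predict(["good", "bad"])
    positives = sum(1 for p in predictions if p == "positive")

    assert positives == 1
    model.predict.assert_called_once()
```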

Mocking Databricks Utilities

Mock dbutils:
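dbutils only exists inside Databricks notebooks, so tests substitute a mock. The get_environment helper is hypothetical; the secrets and widgets calls mirror the real dbutils API:

```python
from unittest.mock import MagicMock

# dbutils is only available inside Databricks; a MagicMock stands in
dbutils = MagicMock()
dbutils.secrets.get.return_value = "fake-token"
dbutils.widgets.get.return_value = "sandbox"


def get_environment():
    # Hypothetical code under test that reads a notebook widget
    return dbutils.widgets.get("env")


def test_get_environment_uses_widget_value():
    assert get_environment() == "sandbox"
    assert dbutils.secrets.get("scope", "key") == "fake-token"
```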

Coverage Requirements

Target Coverage Levels

  • Critical paths: 90%+ coverage

    • Model prediction logic

    • Data validation functions

    • Configuration parsing

  • Standard code: 80%+ coverage

    • Utility functions

    • Data transformations

    • API handlers

  • Infrastructure code: 60%+ coverage

    • Deployment scripts

    • CI/CD utilities

Measuring Coverage

Generate coverage report:
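Requires pytest-cov; the package name is an assumption:

```shell
pytest --cov=ml_pipelines --cov-report=term-missing tests/
```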

View HTML report:
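The HTML report is written to htmlcov/ by default:

```shell
pytest --cov=ml_pipelines --cov-report=html tests/
open htmlcov/index.html   # macOS; use xdg-open on Linux
```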

Check specific file coverage:
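Using the coverage.py CLI directly; the path pattern is illustrative:

```shell
coverage report --include="ml_pipelines/models/*"
```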

Testing Best Practices

Do's

  • Write tests first (TDD when possible)

  • Test one thing per test

  • Use descriptive test names

  • Follow AAA pattern (Arrange, Act, Assert)

  • Use fixtures for setup/teardown

  • Mock external dependencies

  • Test edge cases and error conditions

  • Keep tests fast

  • Make tests deterministic

  • Clean up test data
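The AAA pattern from the list above, in a minimal test (the function under test is illustrative):

```python
def add_tax(amount, rate):
    """Hypothetical helper: apply a tax rate and round to cents."""
    return round(amount * (1 + rate), 2)


def test_add_tax_applies_rate():
    # Arrange
    amount, rate = 100.0, 0.07
    # Act
    total = add_tax(amount, rate)
    # Assert
    assert total == 107.0
```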

Don'ts

  • Don't test implementation details

  • Don't write flaky tests

  • Don't skip test cleanup

  • Don't hardcode credentials in tests

  • Don't test external APIs directly

  • Don't duplicate test code

  • Don't ignore failing tests

  • Don't test framework code

  • Don't write tests that depend on order
