ADR-001: Four-Tier Deployment Architecture

Status: Accepted

Date: 2025-09-15

Decision Makers: CTO

Technical Story: Initial platform architecture design

Context

The ML pipelines platform needed a deployment architecture that would:

  1. Enable rapid developer iteration without conflicts

  2. Provide a shared integration testing environment

  3. Support pre-production validation identical to production

  4. Ensure production stability and safety

Key constraints:

  • Multiple developers working simultaneously

  • Need for isolated experimentation

  • Regulatory requirements for production data protection

  • Cost sensitivity (can't duplicate entire environments per developer)

  • Unity Catalog supports catalog-level isolation

Decision

Implement a four-tier deployment architecture with environment-specific Unity Catalog catalogs:

  1. Sandbox: {username}_sandbox - Individual developer catalogs

  2. Dev: dev - Shared development environment

  3. Staging: staging - Pre-production validation

  4. Prod: prod - Production workloads
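
The naming scheme above can be captured in a small helper. This is a minimal sketch, not the platform's actual code: the function name `catalog_for` and the username normalisation rule are assumptions made for illustration.

```python
import re
from typing import Optional

# Shared tiers map one-to-one onto catalog names (from the list above).
SHARED_CATALOGS = {"dev": "dev", "staging": "staging", "prod": "prod"}


def catalog_for(environment: str, username: Optional[str] = None) -> str:
    """Return the Unity Catalog catalog name for a deployment tier."""
    env = environment.lower()
    if env == "sandbox":
        if not username:
            raise ValueError("sandbox catalogs require a username")
        # Assumption: usernames are lowercased and any character outside
        # a-z, 0-9 and "_" is replaced with "_" to form a safe identifier.
        safe = re.sub(r"[^a-z0-9_]", "_", username.lower())
        return f"{safe}_sandbox"
    if env in SHARED_CATALOGS:
        return SHARED_CATALOGS[env]
    raise ValueError(f"unknown environment: {environment!r}")
```

For example, `catalog_for("sandbox", "alice")` yields `alice_sandbox`, while `catalog_for("prod")` yields `prod`.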

Each environment has:

  • Dedicated catalog for data isolation

  • Environment-specific service principal for CI/CD

  • Workspace deployment (sandbox uses the ref-dev workspace; the other environments have dedicated workspaces)

Consequences

Positive

  • Zero developer conflicts: Each developer has an isolated sandbox catalog

  • Cost efficient: Sandbox reads shared data from dev (no duplication)

  • Clear promotion path: Sandbox → Dev → Staging → Prod

  • Production safety: Multiple validation stages before prod

  • Governance compliant: Clear data ownership and access controls

  • Scalable: Adding developers doesn't impact others

Negative

  • Catalog proliferation: One sandbox per developer

  • Added complexity: More environments to manage than a 2-tier or 3-tier setup

  • Permission management: Need to configure access for each environment

  • Cost: More environments mean more infrastructure (mitigated by serverless compute)

Neutral

  • Learning curve: Developers need to understand 4 environments

  • Documentation: Requires clear documentation (addressed)

  • Cleanup: Sandbox catalogs need periodic cleanup (addressed in runbooks)

Alternatives Considered

Option 1: Three-Tier (Dev → Staging → Prod)

Pros:

  • Simpler (one less tier)

  • Industry standard for many teams

Cons:

  • No developer isolation

  • Shared dev environment causes conflicts

  • Slower iteration cycles

Why rejected: Developer conflicts and the lack of isolation would significantly reduce development velocity. The cost of an additional tier (sandbox) is minimal with Unity Catalog's zero-copy data sharing.

Option 2: Two-Tier (Dev → Prod)

Pros:

  • Simplest architecture

  • Minimal infrastructure

Cons:

  • No pre-production validation

  • High risk of prod issues

  • No developer isolation

Why rejected: Insufficient safety for production deployments. Staging is critical for validating changes before prod.

Option 3: Per-Developer Full Environments

Pros:

  • Complete isolation

  • Each developer has own dev/staging/prod

Cons:

  • Extremely expensive (3 environments × N developers)

  • Data duplication issues

  • Complex to manage

Why rejected: Cost-prohibitive and unnecessarily complex. Sandbox + shared dev provides sufficient isolation at much lower cost.
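
To make the scaling difference concrete, the sketch below counts environments under each approach. It counts catalogs and environments only, not actual spend, and the function names are hypothetical.

```python
def four_tier_catalogs(developers: int) -> int:
    """Chosen design: one sandbox per developer plus shared dev, staging, prod."""
    return developers + 3


def per_developer_environments(developers: int) -> int:
    """Option 3: every developer gets their own dev, staging and prod."""
    return 3 * developers


# With 10 developers: 13 catalogs in the four-tier design,
# versus 30 full environments under Option 3.
print(four_tier_catalogs(10), per_developer_environments(10))
```

The sandbox catalogs in the four-tier count hold only each developer's own outputs, whereas every environment in Option 3 would need its own copy of the data.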

Implementation Notes

Sandbox Data Sharing:

  • Sandbox reads from dev.bronze.* and dev.silver.* (no data copy)

  • Sandbox writes to {username}_sandbox.gold.* (isolated)

  • Implemented via Unity Catalog permissions
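
As an illustration of how these permissions might be expressed, the sketch below generates Unity Catalog SQL statements for one developer. It is an assumption about the shape of the setup, not the project's actual provisioning script: the helper name `sandbox_grants` and the choice of `ALL PRIVILEGES` on the sandbox catalog are illustrative, and the statements would still need to be executed by a principal with sufficient rights.

```python
from typing import List


def sandbox_grants(username: str, principal: str) -> List[str]:
    """Build SQL giving a developer read access to dev and ownership-like
    access to their own sandbox catalog."""
    sandbox = f"{username}_sandbox"
    statements = [
        f"CREATE CATALOG IF NOT EXISTS {sandbox}",
        f"CREATE SCHEMA IF NOT EXISTS {sandbox}.gold",
        # Read-only access to the shared dev catalog (no data copy).
        f"GRANT USE CATALOG ON CATALOG dev TO `{principal}`",
    ]
    for schema in ("bronze", "silver"):
        statements.append(
            f"GRANT USE SCHEMA, SELECT ON SCHEMA dev.{schema} TO `{principal}`"
        )
    # Assumption: the developer gets full control of their own sandbox.
    statements.append(
        f"GRANT ALL PRIVILEGES ON CATALOG {sandbox} TO `{principal}`"
    )
    return statements


for stmt in sandbox_grants("alice", "alice@example.com"):
    print(stmt + ";")
```

Because no write privilege on `dev` is granted here, sandbox work cannot modify the shared bronze and silver tables.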

Environment Promotion:

  • Promotion follows the path Sandbox → Dev → Staging → Prod
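
The promotion path Sandbox → Dev → Staging → Prod can be encoded as an ordered list that a CI/CD check consults before deploying. This is a minimal sketch; the rule that promotion may not skip tiers is an assumption, and the function names are hypothetical.

```python
from typing import Optional

# Ordered promotion path from this ADR.
PROMOTION_PATH = ["sandbox", "dev", "staging", "prod"]


def next_tier(current: str) -> Optional[str]:
    """Return the tier that follows `current`, or None for the last tier."""
    index = PROMOTION_PATH.index(current.lower())  # ValueError if unknown
    if index + 1 < len(PROMOTION_PATH):
        return PROMOTION_PATH[index + 1]
    return None


def can_promote(source: str, target: str) -> bool:
    """Assumption: a change may only move to the immediately following tier."""
    return next_tier(source) == target.lower()
```

Under this rule `can_promote("dev", "staging")` is true, while `can_promote("sandbox", "prod")` is false because it would skip two validation stages.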

Service Principal Isolation:

  • ml-pipelines-dev: Full access to dev catalog

  • ml-pipelines-staging: Full access to staging catalog

  • ml-pipelines-prod: Full access to prod catalog
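
One way to enforce this isolation in deployment tooling is a guard that rejects any service principal targeting a catalog other than its own. The sketch below is illustrative only: the guard function is hypothetical, and sandbox is omitted because this ADR does not name a sandbox service principal.

```python
# Service principal per environment, as listed above.
SERVICE_PRINCIPALS = {
    "dev": "ml-pipelines-dev",
    "staging": "ml-pipelines-staging",
    "prod": "ml-pipelines-prod",
}


def assert_can_deploy(principal: str, catalog: str) -> None:
    """Raise PermissionError unless `principal` is the one assigned to `catalog`."""
    expected = SERVICE_PRINCIPALS.get(catalog)
    if expected is None:
        raise PermissionError(f"no service principal is registered for {catalog!r}")
    if principal != expected:
        raise PermissionError(
            f"{principal!r} may not deploy to {catalog!r}; expected {expected!r}"
        )
```

A guard like this complements, rather than replaces, the Unity Catalog grants themselves, which remain the actual enforcement point.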

Cost Optimization:

  • Sandbox/dev/staging pipelines PAUSED by default

  • Production pipelines UNPAUSED (continuous)

  • Serverless compute for all environments
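
The pause policy above reduces to a single rule keyed on the environment. A minimal sketch follows, assuming the tooling passes the result into a job or pipeline schedule setting; the function name is hypothetical.

```python
KNOWN_ENVIRONMENTS = {"sandbox", "dev", "staging", "prod"}


def pause_status(environment: str) -> str:
    """Return the default schedule state for an environment."""
    env = environment.lower()
    if env not in KNOWN_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    # Only production runs continuously; everything else starts paused
    # so that idle environments do not consume compute.
    return "UNPAUSED" if env == "prod" else "PAUSED"
```

Developers can still unpause a sandbox, dev, or staging pipeline manually when they need it to run.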

References

Last updated