ADR-001: Four-Tier Deployment Architecture

Status: Accepted

Date: 2025-09-15

Decision Makers: CTO

Technical Story: Initial platform architecture design

Context

The ML pipelines platform needed a deployment architecture that would:

Enable rapid developer iteration without conflicts
Provide a shared integration testing environment
Support pre-production validation in an environment that mirrors production
Ensure production stability and safety

Key constraints:

Multiple developers working simultaneously
Need for isolated experimentation
Regulatory requirements for production data protection
Cost sensitivity (can't duplicate entire environments per developer)
Unity Catalog supports catalog-level isolation

Decision

Implement a four-tier deployment architecture with environment-specific Unity Catalog catalogs:

Sandbox: {username}_sandbox - Individual developer catalogs
Dev: dev - Shared development environment
Staging: staging - Pre-production validation
Prod: prod - Production workloads
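The naming convention above can be sketched as a small lookup helper. This is only an illustration: the function name catalog_for and the SHARED_CATALOGS table are hypothetical, and the sketch assumes the username is already a valid catalog identifier.

```python
from typing import Optional

# Shared tiers map one-to-one onto a catalog of the same name.
SHARED_CATALOGS = {"dev": "dev", "staging": "staging", "prod": "prod"}


def catalog_for(environment: str, username: Optional[str] = None) -> str:
    """Return the Unity Catalog catalog name for a deployment tier."""
    if environment == "sandbox":
        # Sandbox catalogs are per developer, so a username is required.
        if not username:
            raise ValueError("sandbox requires a username")
        return f"{username}_sandbox"
    if environment not in SHARED_CATALOGS:
        raise ValueError(f"unknown environment: {environment!r}")
    return SHARED_CATALOGS[environment]
```

For example, catalog_for("sandbox", "alice") resolves to alice_sandbox, while catalog_for("staging") resolves to staging.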

Each environment has:

Dedicated catalog for data isolation
Environment-specific service principal for CI/CD
Workspace deployment: sandbox uses the ref-dev workspace; the other environments have dedicated workspaces

Consequences

Positive

Zero developer conflicts: Each developer has an isolated sandbox catalog
Cost efficient: Sandbox shares data from dev (no duplication)
Clear promotion path: Sandbox → Dev → Staging → Prod
Production safety: Multiple validation stages before prod
Governance compliant: Clear data ownership and access controls
Scalable: Adding developers doesn't impact others

Negative

Catalog proliferation: One sandbox per developer
Added complexity: More environments to manage than a two-tier or three-tier setup
Permission management: Need to configure access for each environment
Cost: More environments mean more infrastructure (mitigated by serverless compute)

Neutral

Learning curve: Developers need to understand 4 environments
Documentation: Requires clear documentation (addressed)
Cleanup: Sandbox catalogs need periodic cleanup (addressed in runbooks)

Alternatives Considered

Option 1: Three-Tier (Dev → Staging → Prod)

Pros:

Simpler (one less tier)
Industry standard for many teams

Cons:

No developer isolation
Shared dev environment causes conflicts
Slower iteration cycles

Why rejected: Developer conflicts and lack of isolation would significantly slow development. The cost of the additional sandbox tier is minimal because sandboxes read dev data in place through Unity Catalog permissions instead of copying it.

Option 2: Two-Tier (Dev → Prod)

Pros:

Simplest architecture
Minimal infrastructure

Cons:

No pre-production validation
High risk of prod issues
No developer isolation

Why rejected: Insufficient safety for production deployments. Staging is critical for validating changes before prod.

Option 3: Per-Developer Full Environments

Pros:

Complete isolation
Each developer has own dev/staging/prod

Cons:

Extremely expensive (3 environments × N developers)
Data duplication issues
Complex to manage

Why rejected: Cost prohibitive and unnecessary complexity. Sandbox + shared dev provides sufficient isolation at much lower cost.
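To make the cost argument concrete, the sketch below counts what each approach has to provision as the team grows. The function name and dictionary keys are hypothetical, and it assumes a sandbox is only a catalog in the shared ref-dev workspace, not a full environment.

```python
def environment_counts(developers: int) -> dict:
    """Compare provisioning needs for Option 3 and the chosen four-tier design."""
    if developers < 0:
        raise ValueError("developers must be non-negative")
    return {
        # Option 3: every developer gets their own dev, staging, and prod.
        "per_developer_full_environments": 3 * developers,
        # Chosen design: dev, staging, and prod are shared by everyone.
        "four_tier_shared_environments": 3,
        # Chosen design: one lightweight sandbox catalog per developer.
        "four_tier_sandbox_catalogs": developers,
    }
```

With 10 developers, Option 3 needs 30 full environments, while the four-tier design needs 3 shared environments plus 10 sandbox catalogs.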

Implementation Notes

Sandbox Data Sharing:

Sandbox reads from dev.bronze.* and dev.silver.* (no data copy)
Sandbox writes to {username}_sandbox.gold.* (isolated)
Implemented via Unity Catalog permissions
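One way to picture the permission setup is as a list of Unity Catalog GRANT statements generated per developer. The sketch below is an assumption about how this could look, not the statements actually in use: the helper name sandbox_grants is hypothetical, and granting ALL PRIVILEGES on the sandbox catalog is an illustrative choice.

```python
from typing import List

# Schemas in the dev catalog that sandboxes read without copying data.
READABLE_DEV_SCHEMAS = ("bronze", "silver")


def sandbox_grants(username: str, principal: str) -> List[str]:
    """Build GRANT statements giving a developer read access to dev
    bronze/silver and write access to their own sandbox catalog."""
    grantee = f"`{principal}`"
    statements = [f"GRANT USE CATALOG ON CATALOG dev TO {grantee}"]
    for schema in READABLE_DEV_SCHEMAS:
        statements.append(f"GRANT USE SCHEMA ON SCHEMA dev.{schema} TO {grantee}")
        statements.append(f"GRANT SELECT ON SCHEMA dev.{schema} TO {grantee}")
    # Assumption: the developer gets full control of their own sandbox catalog.
    statements.append(
        f"GRANT ALL PRIVILEGES ON CATALOG {username}_sandbox TO {grantee}"
    )
    return statements
```

Calling sandbox_grants("alice", "alice@example.com") yields six statements. None of them grants write access to the dev catalog, which keeps sandbox writes confined to alice_sandbox.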

Environment Promotion:

Developer → Sandbox (manual deployment)
    ↓ PR
  → Dev (auto on merge)
    ↓ Manual approval
  → Staging (manual trigger)
    ↓ Manual approval
  → Prod (manual trigger)
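The promotion flow above can also be expressed as data, which makes the "no skipping tiers" rule easy to check. This is a minimal sketch: the names PROMOTION_PATH, GATES, next_tier, and can_promote are hypothetical, and the gate descriptions simply restate the diagram.

```python
from typing import Optional

# Tiers in promotion order.
PROMOTION_PATH = ["sandbox", "dev", "staging", "prod"]

# How a change enters each tier, restating the diagram.
GATES = {
    "sandbox": "manual deployment by the developer",
    "dev": "automatic on PR merge",
    "staging": "manual approval, then manual trigger",
    "prod": "manual approval, then manual trigger",
}


def next_tier(current: str) -> Optional[str]:
    """Return the tier that follows `current`, or None for prod."""
    index = PROMOTION_PATH.index(current)  # raises ValueError if unknown
    if index + 1 < len(PROMOTION_PATH):
        return PROMOTION_PATH[index + 1]
    return None


def can_promote(source: str, target: str) -> bool:
    """A change may only move to the immediately following tier."""
    return next_tier(source) == target
```

Under this model, can_promote("dev", "staging") is allowed, while can_promote("sandbox", "prod") is not, because it would skip dev and staging.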

Service Principal Isolation:

ml-pipelines-dev: Full access to dev catalog
ml-pipelines-staging: Full access to staging catalog
ml-pipelines-prod: Full access to prod catalog
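The isolation rule can be sketched as a simple lookup: each service principal is tied to exactly one catalog. The helper name may_deploy is hypothetical, and the sketch assumes a principal has no access outside its own catalog, which the list above implies but does not state.

```python
# Service principal for each shared catalog, as listed above.
SERVICE_PRINCIPALS = {
    "dev": "ml-pipelines-dev",
    "staging": "ml-pipelines-staging",
    "prod": "ml-pipelines-prod",
}


def may_deploy(principal: str, catalog: str) -> bool:
    """Return True only when `principal` is the one assigned to `catalog`."""
    return SERVICE_PRINCIPALS.get(catalog) == principal
```

A CI/CD job running as ml-pipelines-dev would pass this check for dev and fail it for staging or prod.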

Cost Optimization:

Sandbox/dev/staging pipelines PAUSED by default
Production pipelines UNPAUSED (continuous)
Serverless compute for all environments
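These defaults can be sketched as a function that derives pipeline settings from the environment name. The function name and dictionary keys here are hypothetical and do not correspond to a specific Databricks configuration schema; how they map onto real job or pipeline settings is left open.

```python
KNOWN_ENVIRONMENTS = ("sandbox", "dev", "staging", "prod")


def pipeline_defaults(environment: str) -> dict:
    """Return default pipeline settings for a deployment tier."""
    if environment not in KNOWN_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    is_prod = environment == "prod"
    return {
        # Only production runs continuously; other tiers start paused.
        "pause_status": "UNPAUSED" if is_prod else "PAUSED",
        "continuous": is_prod,
        # Serverless compute is used in every environment.
        "serverless": True,
    }
```

Keeping non-production pipelines paused means they consume compute only when someone starts them explicitly.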

References

Databricks Asset Bundles Best Practices
Unity Catalog Isolation Patterns
Internal architecture discussions (Sept 2025)
