CI/CD Implementation Plan for ML Pipelines
Databricks Asset Bundles with Unity Catalog Data Isolation
Table of Contents
Executive Summary
Problem Statement
Proposed Solution
Benefits
Current State Analysis
Workspace Configuration (CONFIRMED)
Environment | Workspace ID | Current Host | Status
Unity Catalog Structure
Databricks Profiles
Service Principals (Environment-Specific)
Current Pipelines
Target Architecture
Deployment Flow
Data Isolation Strategy
Unity Catalog Hierarchy
MLOps Model Promotion Strategy
Shared Metastore Architecture
Model Training & Promotion Flow
Model Promotion Implementation
Benefits of This Approach
Job Distribution by Environment
Environment | Training Jobs | Promotion Jobs | Pipelines | Purpose
Permissions Model
Principal | Sandbox Catalogs | Dev Catalog | Staging Catalog | Prod Catalog
Implementation Phases
Phase 1: Infrastructure Setup (Terraform) (CRITICAL FIRST)
1.1 Fix Production Workspace Separation
1.2 Grant Sandbox Catalog Creation Permissions
1.3 Developer Groups & Service Principals (CONFIRMED)
1.4 Grant Service Principal Catalog Permissions
Phase 2: Databricks Bundle Configuration (databricks.yml)
2.1 Update Bundle Variables
2.2 Add Sandbox Target (NEW)
2.3 Update Dev Target (Shared Integration)
2.4 Update Staging Target
2.5 Update Production Target
2.6 Update Pipeline Resource References
Phase 3: GitHub Actions Workflows
3.1 Update Dev Deployment Workflow
3.2 Update Staging Deployment Workflow
3.3 Update Production Deployment Workflow
Phase 4: Code & Notebook Updates
4.1 Update Pipeline Notebooks
4.2 Update MLflow Model Registration
Detailed Changes Required
Summary Checklist
Terraform Changes (/Users/taylorlaing/Development/refresh-os/infra-core/stacks/ml-databricks/)
databricks.yml Changes
Pipeline YAML Changes (resources/**/*.yml)
GitHub Workflows (.github/workflows/*.yml)
Notebook Code Changes
Open Questions & Decisions Needed
🔴 Critical Decisions
Q1: Production Workspace Confirmation
Q2: Developer Group Existence
Q3: Sandbox Catalog Lifecycle
Q4: Existing Pipeline Migration
Non-Critical Questions
Q5: Staging Auto-Promotion
Q6: Sandbox Resource Limits
Q7: Dev Catalog Write Access
Testing & Validation
Phase 1: Local Sandbox Testing
Test 1.1: Developer Deploys Sandbox
Test 1.2: Parallel Sandbox Deployments
Test 1.3: Sandbox Data Isolation
Phase 2: CI/CD Integration Testing
Test 2.1: Dev Deployment (CI/CD)
Test 2.2: Staging Deployment
Test 2.3: Production Deployment
Phase 3: Permissions Testing
Test 3.1: Developer Permissions
Test 3.2: Service Principal Permissions
Rollback Strategy
Scenario 1: Sandbox Deployment Breaks
Scenario 2: CI/CD Deployment Fails
Scenario 3: Production Workspace Change Fails
Scenario 4: Permissions Misconfiguration
Implementation Timeline
Week 1: Infrastructure Foundation
Week 2: Bundle Configuration
Week 3: Testing & Migration
Week 4: Cleanup & Documentation
Next Steps
Appendix A: Variable Reference
databricks.yml Variables
Variable | Sandbox | Dev | Staging | Prod
Workspace Reference
Environment | Workspace ID | Host | Profile
Appendix B: Commands Reference
Bundle Management
Catalog Management
Pipeline Management
Last updated