Compliance & Governance
Platform Compliance: For comprehensive platform-wide compliance framework (SOC 2, GDPR, encryption standards, audit logging, etc.), see Infra-Core Compliance & Governance
Audience: Technical Executives, Compliance Officers, Legal
Last Updated: October 2025
ML Pipelines-Specific Compliance
This document covers compliance aspects specific to the ML Pipelines service, focusing on Databricks Unity Catalog governance. For platform-wide compliance framework (SOC 2, GDPR, ISO 27001, encryption standards, audit logging, etc.), refer to the platform compliance documentation.
ML-Specific Compliance Highlights:
Unity Catalog data governance and lineage
Databricks-native RBAC and access control
ML model lineage and versioning compliance
Service principal per-environment isolation
Delta Lake time travel for data recovery
For complete compliance framework, see Infra-Core Compliance.
Technical Details: For technical implementation of security controls, see Security Architecture. For Unity Catalog structure, see Unity Catalog Design.
Data Governance Framework
Unity Catalog Governance
For detailed Unity Catalog architecture, permissions model, and schema design, see Unity Catalog Architecture.
Key Governance Features:
Centralized metadata management and data lineage
Catalog-level environment isolation (prod, staging, dev, sandbox)
Fine-grained access control with audit logging
Compliance-ready data classification and tagging
The catalog structure provides clear data ownership, environment segregation, and blast radius containment for simplified compliance audits.
Compliance Frameworks
Current Compliance
The ML Pipelines platform currently maintains compliance with the following frameworks:
SOC 2 Type 2
Status: Compliant (as of October 2025)
Security Controls:
Access Control: Unity Catalog RBAC + service principals (evidence: permission logs)
Data Encryption: at-rest (S3) + in-transit (TLS) (evidence: Databricks configuration)
Audit Logging: system tables + CloudTrail (evidence: queryable audit trail)
Change Management: GitHub + Terraform + Asset Bundles (evidence: Git history + deployment logs)
Monitoring: Databricks system tables + CloudWatch (evidence: dashboard screenshots)
Audit Preparation:
Access Control Matrix: documents who can access which data
Change Log: Git history provides complete audit trail
Incident Response: Runbooks document procedures
Monitoring Evidence: System tables provide queryable logs
GDPR
Status: Compliant (as of October 2025)
Data Subject Rights:
Right to Access: Query Unity Catalog for user data locations
Right to Erasure: Documented data deletion procedures
Right to Portability: Export capabilities via Databricks SQL
Right to Rectification: Update procedures documented
Data Protection Measures:
Data Classification: Tagging schema (PII, confidential, public)
Data Minimization: Only necessary data processed
Purpose Limitation: Clear data usage policies
Storage Limitation: Data retention policies
GDPR Implementation Checklist:
CCPA
Status: Compliant (as of October 2025)
California Consumer Privacy Act Compliance:
Right to Know: Consumers can request disclosure of data collected
Right to Delete: Implemented data deletion procedures
Right to Opt-Out: Consent mechanisms in place
Non-Discrimination: Equal service regardless of whether privacy rights are exercised
CCPA Implementation:
Data inventory maintained in Unity Catalog
Consumer request handling procedures documented
Data retention and deletion policies enforced
Privacy notices provided to California residents
Roadmap: Planned Compliance (Early 2026)
The following compliance frameworks are planned for implementation in early 2026:
HIPAA Compliance (Planned Q1 2026)
Note: HIPAA compliance requires additional Databricks configuration and Business Associate Agreements (BAAs) with both Databricks and AWS.
Technical Safeguards:
Encryption at rest and in transit (implemented)
Access control and authentication (implemented)
Audit logs (implemented)
Data backup and recovery (implemented)
Administrative Safeguards:
Documented security policies
Workforce training requirements
Incident response procedures
Regular security assessments
Required Actions for HIPAA:
Execute BAA with Databricks
Execute BAA with AWS
Enable HIPAA-eligible Databricks workspace
Implement additional audit controls
Document PHI handling procedures
Conduct security risk assessment
ISO 27001 Compliance (Planned Q1 2026)
Information Security Management System (ISMS):
Formal information security policies
Risk assessment and treatment processes
Security awareness training program
Regular internal audits
Technical Requirements:
Access control policies (partially implemented via Unity Catalog)
Cryptographic controls (encryption at rest and in transit - implemented)
Operations security procedures
Supplier relationships management
Required Actions for ISO 27001:
Develop and document ISMS
Conduct comprehensive risk assessment
Produce a Statement of Applicability (SoA)
Establish internal audit program
Achieve third-party certification
Maintain continuous improvement process
Access Control & Permissions
Role-Based Access Control (RBAC)
For detailed access control implementation, permission matrices, and group configuration, see:
Security Architecture - Complete security model and permissions
Unity Catalog Architecture - Catalog-level permissions
Service Principals Guide - Service principal setup
Key Access Control Features:
Group-Based Permissions: All permissions via Okta groups synced to Databricks (SCIM)
Least Privilege: Service principals isolated per environment, users read-only by default
Sandbox Isolation: Developers can only modify own sandbox
Production Lockdown: No direct user write access to prod
Permissions are automatically audited via Unity Catalog system tables.
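The group-based model above can be sketched as a simple permission lookup: a user's effective access to a catalog is derived from Okta group membership, with production locked down regardless of group grants. The group names and permission map below are hypothetical illustrations, not the actual Okta/Databricks configuration.

```python
# Illustrative sketch of the group-based access model. Group names and
# permission levels are hypothetical, not the real SCIM-synced groups.
GROUP_PERMISSIONS = {
    "ml-engineers": {"dev": "WRITE", "staging": "READ", "prod": "READ"},
    "ml-platform-admins": {"dev": "WRITE", "staging": "WRITE", "prod": "READ"},
}

def effective_permission(user_groups, catalog):
    """Return the highest permission a user's groups grant on a catalog.

    Users are read-only by default; service principals (not modeled here)
    are the only identities with write access to prod.
    """
    ranking = {"NONE": 0, "READ": 1, "WRITE": 2}
    best = "NONE"
    for group in user_groups:
        level = GROUP_PERMISSIONS.get(group, {}).get(catalog, "NONE")
        if ranking[level] > ranking[best]:
            best = level
    # Production lockdown: no direct user write access to prod.
    if catalog == "prod" and best == "WRITE":
        best = "READ"
    return best

print(effective_permission(["ml-engineers"], "dev"))   # WRITE
print(effective_permission(["ml-engineers"], "prod"))  # READ
```

The explicit lockdown branch encodes the "Production Lockdown" rule even if a group were accidentally granted prod write.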
Audit Logging & Monitoring
Audit Trail Sources
1. Unity Catalog Audit Logs:
2. Databricks Workspace Audit Logs:
Job runs and pipeline updates
Cluster creation and termination
Notebook execution
SQL queries
3. AWS CloudTrail:
S3 access logs
IAM changes
Service principal authentication
4. GitHub Audit Log:
Code changes
Pull request approvals
Deployment actions
Compliance Reporting
Monthly Report Includes:
Access pattern analysis
Failed authentication attempts
Permission changes
Data export activities
Unusual access patterns
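The monthly report categories above can be aggregated from audit events with a small summarizer. The event field names and sample records below are illustrative; real events would be queried from Databricks system tables and CloudTrail.

```python
# Hypothetical sketch of the monthly compliance report aggregation.
# Event shape is illustrative, not the actual audit log schema.
from collections import Counter

def monthly_summary(events):
    """Summarize audit events into the report categories listed above."""
    summary = {
        "failed_auth": 0,
        "permission_changes": 0,
        "data_exports": 0,
        "access_by_user": Counter(),  # input for access pattern analysis
    }
    for e in events:
        summary["access_by_user"][e["user"]] += 1
        if e["action"] == "login" and not e["success"]:
            summary["failed_auth"] += 1
        elif e["action"] in ("grant", "revoke"):
            summary["permission_changes"] += 1
        elif e["action"] == "export":
            summary["data_exports"] += 1
    return summary

events = [
    {"user": "alice", "action": "login", "success": False},
    {"user": "alice", "action": "grant", "success": True},
    {"user": "bob", "action": "export", "success": True},
]
report = monthly_summary(events)
print(report["failed_auth"], report["permission_changes"], report["data_exports"])
# 1 1 1
```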
Automated Reports:
Data Protection & Privacy
Encryption
Data at Rest:
S3 buckets: AES-256 server-side encryption (SSE-S3)
Unity Catalog: Encrypted by default
Delta tables: Encryption inherited from storage
Data in Transit:
All API calls: TLS 1.2+
Databricks workspace: HTTPS only
Inter-service communication: Encrypted by default
Key Management:
AWS KMS for S3 encryption keys (where SSE-KMS is in use)
Databricks manages Unity Catalog encryption
Service principal credentials via GitHub OIDC (no secrets)
Data Classification
Tagging Strategy:
Classification Levels:
Public: No restrictions
Internal: Employees only
Confidential: Need-to-know basis
PII: Personally identifiable information
Restricted: Highest sensitivity (PHI, financial, etc.)
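The classification levels above imply handling rules that downstream checks can consult. The policy values below are illustrative examples, not the production tagging schema; an unknown tag deliberately falls back to the most restrictive treatment.

```python
# Illustrative mapping from classification tags to handling rules.
# Values are examples only, not the real data classification policy.
CLASSIFICATION_POLICY = {
    "public":       {"export_allowed": True,  "needs_masking": False},
    "internal":     {"export_allowed": True,  "needs_masking": False},
    "confidential": {"export_allowed": False, "needs_masking": False},
    "pii":          {"export_allowed": False, "needs_masking": True},
    "restricted":   {"export_allowed": False, "needs_masking": True},
}

def handling_rules(tag):
    """Look up handling rules for a table's classification tag."""
    try:
        return CLASSIFICATION_POLICY[tag.lower()]
    except KeyError:
        # Fail closed: unknown tags get the most restrictive treatment.
        return CLASSIFICATION_POLICY["restricted"]

print(handling_rules("PII")["export_allowed"])  # False
```

Failing closed on unknown tags means an untagged or mistagged table can never be more permissive than a restricted one.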
Data Retention & Deletion
Retention Policies (to be implemented):
Raw data (bronze): 7 years
Processed data (silver): 5 years
Aggregated data (gold): 3 years
Model artifacts: 2 years after deprecation
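Once formalized, the retention schedule above reduces to a date comparison per layer. This is a minimal sketch under that assumption; for model artifacts the reference date would be the deprecation date rather than creation.

```python
# Sketch of the proposed retention schedule (to be implemented).
# Layer names follow the medallion terms used in this document.
from datetime import date

RETENTION_YEARS = {"bronze": 7, "silver": 5, "gold": 3, "model_artifact": 2}

def is_expired(layer, reference_date, today):
    """True once a record has passed its layer's retention window.

    reference_date is the creation date, or the deprecation date for
    model artifacts.
    """
    years = RETENTION_YEARS[layer]
    try:
        cutoff = reference_date.replace(year=reference_date.year + years)
    except ValueError:
        # reference_date was Feb 29 and the cutoff year is not a leap year.
        cutoff = reference_date.replace(year=reference_date.year + years, day=28)
    return today >= cutoff

print(is_expired("gold", date(2020, 1, 15), date(2024, 1, 15)))    # True
print(is_expired("bronze", date(2020, 1, 15), date(2024, 1, 15)))  # False
```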
Deletion Procedures:
GDPR Right to Erasure:
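An erasure request can be sketched as the standard Delta Lake delete-then-vacuum sequence: DELETE removes the subject's rows from the current table version, and VACUUM purges older file versions so the rows cannot be read back via Delta time travel. The table name, user_id column, and retention horizon below are illustrative, and real code should validate or parameterize values rather than interpolate strings.

```python
# Hedged sketch of the erasure workflow using the standard Delta Lake
# DELETE + VACUUM pattern. Names and the 168-hour horizon are examples;
# production code must parameterize values, not build SQL by f-string.
def erasure_statements(user_id, tables, vacuum_hours=168):
    """Build the SQL statements to erase one data subject's rows."""
    statements = []
    for table in tables:
        # Remove the subject's rows from the current table version.
        statements.append(f"DELETE FROM {table} WHERE user_id = '{user_id}'")
        # Purge old file versions so deleted rows are not recoverable
        # through time travel after the retention horizon.
        statements.append(f"VACUUM {table} RETAIN {vacuum_hours} HOURS")
    return statements

stmts = erasure_statements("u-123", ["prod.silver.events"])
print(stmts[0])
```

Note the tension with the Delta Lake time travel recovery feature listed earlier: VACUUM is what makes an erasure final, so the retention horizon must balance recovery needs against erasure deadlines.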
Incident Response
Data Breach Response Plan
Phase 1: Detection & Assessment (0-1 hour)
Identify scope of breach
Determine data sensitivity
Assess number of affected records
Notify security team
Phase 2: Containment (1-4 hours)
Revoke compromised credentials
Isolate affected systems
Preserve evidence (logs, snapshots)
Document timeline
Phase 3: Notification (4-72 hours)
Notify affected users (if required)
Notify regulators (if required by GDPR/HIPAA)
Internal stakeholder communication
Prepare public statement (if needed)
Phase 4: Recovery & Prevention (72+ hours)
Restore from backups if needed
Implement additional controls
Conduct root cause analysis
Update security procedures
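The phase windows above can be turned into concrete deadlines from the detection timestamp; the 72-hour notification cap mirrors GDPR Article 33's regulator notification deadline. Hour offsets below simply restate the windows in this plan.

```python
# Compute response-phase deadlines from the breach detection time.
# Offsets mirror the phase windows in this plan (GDPR Article 33 caps
# regulator notification at 72 hours from awareness).
from datetime import datetime, timedelta

PHASE_DEADLINE_HOURS = {
    "assessment": 1,
    "containment": 4,
    "notification": 72,
}

def phase_deadlines(detected_at):
    """Map each phase to its absolute deadline."""
    return {
        phase: detected_at + timedelta(hours=h)
        for phase, h in PHASE_DEADLINE_HOURS.items()
    }

deadlines = phase_deadlines(datetime(2025, 10, 1, 9, 0))
print(deadlines["notification"])  # 2025-10-04 09:00:00
```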
Escalation Contacts
P0 - Data Breach: immediate response; contact Security team + Legal
P1 - Compliance Violation: respond within 4 hours; contact Compliance officer
P2 - Policy Violation: respond within 24 hours; contact Manager + DevOps
P3 - Minor Issue: respond within 72 hours; contact DevOps team
Compliance Checklists
Pre-Deployment Checklist
Quarterly Compliance Review
Annual Compliance Certification
Regulatory Considerations
Data Residency
Current Setup:
Primary Region: US East (N. Virginia) - us-east-1
Data Storage: S3 buckets in us-east-1
Databricks Workspace: AWS us-east-1
Disaster Recovery & Data Replication:
DR Region: US West (Oregon) - us-west-2
Replication Method: S3 Cross-Region Replication (CRR)
Replication Scope: Production data only (prod catalog)
Purpose: Business continuity, disaster recovery, data redundancy
RPO (Recovery Point Objective): 24 hours
RTO (Recovery Time Objective): 4 hours
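The 24-hour RPO can be monitored by comparing the newest replicated object's timestamp against the objective. The timestamps below are sample values, not live S3 replication metadata.

```python
# Illustrative RPO compliance check: is the DR copy no staler than the
# 24-hour Recovery Point Objective? Timestamps are sample values.
from datetime import datetime, timedelta

RPO = timedelta(hours=24)

def within_rpo(last_replicated_at, now):
    """True if replication lag is within the stated RPO."""
    return (now - last_replicated_at) <= RPO

now = datetime(2025, 10, 1, 12, 0)
print(within_rpo(datetime(2025, 10, 1, 0, 0), now))   # True (12h lag)
print(within_rpo(datetime(2025, 9, 29, 12, 0), now))  # False (48h lag)
```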
Security Benefits:
Geographic redundancy protects against regional disasters
Supports SOC 2 Type 2 business continuity requirements
Enhanced data durability (S3's 99.999999999% design durability, plus a redundant cross-region copy)
Independent availability zone failures do not impact data availability
Cross-Border Transfer Considerations:
If processing EU residents' personal data, assess GDPR Chapter V transfer requirements (Article 45 adequacy decisions or Article 46 safeguards)
Document the legal basis for transfers (Standard Contractual Clauses, etc.)
Implement additional safeguards if required
Industry-Specific Regulations
Financial Services (e.g., SOX, PCI-DSS):
Enhanced audit trails (implemented)
Separation of duties (implemented)
Change control (implemented)
Healthcare (HIPAA):
BAA required with Databricks and AWS
Additional technical safeguards
PHI-specific policies
Documentation & Training
Required Documentation
This Compliance Guide - Governance framework (completed)
Security Architecture - Security documentation (completed)
Access Control Matrix - Operations guide (completed)
Incident Response Plan - Runbooks (completed)
Data Retention Policy - To be formalized
Privacy Policy - To be formalized
Data Processing Agreement - To be formalized (if needed for GDPR)
Training Requirements
Onboarding Training:
Data governance principles
Unity Catalog usage
PII handling procedures
Incident reporting
Annual Refresher:
Updated policies and procedures
New compliance requirements
Lessons learned from incidents
Continuous Compliance
Automated Controls
1. Pre-commit Checks:
No hardcoded secrets (git-secrets)
Code quality standards (black, flake8)
Security scanning (if implemented)
2. Deployment Gates:
Terraform plan approval required
Pull request reviews mandatory
Automated testing (if implemented)
3. Runtime Monitoring:
Access pattern anomalies
Failed authentication alerts
Data export monitoring
Cost anomaly detection
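The failed-authentication alert above can be sketched as a threshold over a monitoring window. The threshold value and sample events are illustrative, not the production alerting configuration.

```python
# Minimal sketch of a failed-authentication alert: flag users whose
# failed login count in a window meets a threshold. Threshold and
# sample events are illustrative.
from collections import Counter

def failed_auth_alerts(events, threshold=3):
    """Return users whose failed login count reaches the threshold."""
    failures = Counter(
        e["user"] for e in events
        if e["action"] == "login" and not e["success"]
    )
    return sorted(u for u, n in failures.items() if n >= threshold)

events = (
    [{"user": "mallory", "action": "login", "success": False}] * 4
    + [{"user": "alice", "action": "login", "success": False}]
)
print(failed_auth_alerts(events))  # ['mallory']
```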
Compliance as Code
Infrastructure:
Policy Enforcement:
Terraform enforces catalog structure
Unity Catalog enforces permissions
GitHub branch protection enforces reviews
Asset Bundles enforce deployment standards
Risk Management
Identified Risks
Unauthorized data access: likelihood Low, impact High; mitigation: RBAC + audit logging + alerts
Data breach: likelihood Low, impact Critical; mitigation: encryption + access controls + monitoring
Compliance violation: likelihood Medium, impact High; mitigation: automated checks + regular audits
Accidental data deletion: likelihood Medium, impact Medium; mitigation: backups + soft deletes + permissions
Service principal compromise: likelihood Low, impact High; mitigation: OIDC (no secrets) + rotation + monitoring
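The register above can be scored with a standard likelihood-times-impact matrix to rank mitigation work. The numeric scales and priority cutoffs below are a common convention, not a mandated methodology.

```python
# Illustrative likelihood x impact risk scoring; scales and priority
# cutoffs are conventional examples, not a prescribed methodology.
LIKELIHOOD = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}

def risk_score(likelihood, impact):
    """Numeric score: higher means more urgent."""
    return LIKELIHOOD[likelihood] * IMPACT[impact]

def priority(score):
    """Bucket a score into a remediation priority."""
    if score >= 6:
        return "P1"
    if score >= 3:
        return "P2"
    return "P3"

# Two risks from the register above:
print(risk_score("Low", "Critical"), priority(risk_score("Low", "Critical")))  # 4 P2
print(risk_score("Medium", "High"), priority(risk_score("Medium", "High")))    # 6 P1
```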
Risk Mitigation Strategy
Preventive Controls:
Unity Catalog permissions
Service principal isolation
Encryption at rest and in transit
Code review requirements
Detective Controls:
Audit logging
Access pattern monitoring
Failed authentication tracking
Data lineage validation
Corrective Controls:
Incident response plan
Backup and recovery procedures
Rollback capabilities
Disaster recovery plan
Next Steps
Immediate Actions
Review this compliance framework (completed)
Complete data classification tagging (in progress)
Formalize data retention policy (pending)
Conduct compliance gap analysis (pending)
Implement automated compliance reporting (pending)
Short Term (1-3 months)
SOC 2 readiness assessment
GDPR compliance audit
Privacy impact assessment
Security penetration testing
Incident response drill
Long Term (3-12 months)
SOC 2 Type 2 certification (if required)
Additional compliance frameworks (ISO 27001, etc.)
Enhanced data governance automation
Advanced threat detection
Compliance training program
Contacts
Compliance Questions: Compliance Officer
Security Incidents: Security Team (24/7 hotline)
Data Privacy: Data Protection Officer
Technical Implementation: DevOps Team
This compliance framework is a living document and should be reviewed quarterly and updated as regulations evolve.