Compliance & Governance

Platform Compliance: For the comprehensive platform-wide compliance framework (SOC 2, GDPR, encryption standards, audit logging, etc.), see Infra-Core Compliance & Governance.

Audience: Technical Executives, Compliance Officers, Legal. Last Updated: October 2025.

ML Pipelines-Specific Compliance

This document covers compliance aspects specific to the ML Pipelines service, focusing on Databricks Unity Catalog governance. For platform-wide compliance framework (SOC 2, GDPR, ISO 27001, encryption standards, audit logging, etc.), refer to the platform compliance documentation.

ML-Specific Compliance Highlights:

  • Unity Catalog data governance and lineage

  • Databricks-native RBAC and access control

  • ML model lineage and versioning compliance

  • Service principal per-environment isolation

  • Delta Lake time travel for data recovery

For the complete compliance framework, see Infra-Core Compliance.

Technical Details: For technical implementation of security controls, see Security Architecture. For Unity Catalog structure, see Unity Catalog Design.


Data Governance Framework

Unity Catalog Governance

For detailed Unity Catalog architecture, permissions model, and schema design, see Unity Catalog Architecture.

Key Governance Features:

  • Centralized metadata management and data lineage

  • Catalog-level environment isolation (prod, staging, dev, sandbox)

  • Fine-grained access control with audit logging

  • Compliance-ready data classification and tagging

The catalog structure provides clear data ownership, environment segregation, and blast-radius containment, simplifying compliance audits.


Compliance Frameworks

Current Compliance

The ML Pipelines platform currently maintains compliance with the following frameworks:

SOC 2 Type II

Status: Compliant (as of October 2025)

Security Controls:

| Control | Implementation | Evidence |
| --- | --- | --- |
| Access Control | Unity Catalog RBAC + Service Principals | Permission logs |
| Data Encryption | At-rest (S3) + In-transit (TLS) | Databricks configuration |
| Audit Logging | System tables + CloudTrail | Queryable audit trail |
| Change Management | GitHub + Terraform + Asset Bundles | Git history + deployment logs |
| Monitoring | Databricks system tables + CloudWatch | Dashboard screenshots |

Audit Preparation:

  1. Access Control Matrix: Document who can access what data

  2. Change Log: Git history provides complete audit trail

  3. Incident Response: Runbooks document procedures

  4. Monitoring Evidence: System tables provide queryable logs
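To make the monitoring evidence concrete, the queryable audit trail can be pulled from the Databricks system tables. A minimal sketch follows; the `system.access.audit` table and its columns follow Databricks' published system-tables schema, and the lookback window is illustrative:

```python
def audit_query(days: int = 30) -> str:
    """Build a SQL query over the Unity Catalog audit system table.

    Returns recent audit events (actor, action, parameters) for
    SOC 2 evidence collection over the last `days` days.
    """
    return f"""
        SELECT event_time, user_identity.email AS actor,
               action_name, request_params
        FROM system.access.audit
        WHERE event_time >= current_timestamp() - INTERVAL {days} DAYS
        ORDER BY event_time DESC
    """.strip()
```

The resulting string can be run directly in a Databricks SQL warehouse or notebook during audit preparation.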

GDPR

Status: Compliant (as of October 2025)

Data Subject Rights:

  • Right to Access: Query Unity Catalog for user data locations

  • Right to Erasure: Documented data deletion procedures

  • Right to Portability: Export capabilities via Databricks SQL

  • Right to Rectification: Update procedures documented
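For the right to access, locating a subject's data starts with finding every column tagged as PII. A hedged sketch, assuming the classification tagging convention described later in this document and the Unity Catalog `information_schema` tag views:

```python
def pii_location_query(tag_value: str = "PII") -> str:
    """Locate columns tagged with the given classification across the
    metastore. The column_tags view name and its columns follow the
    Unity Catalog information schema; the tag value is our convention.
    """
    return f"""
        SELECT catalog_name, schema_name, table_name, column_name
        FROM system.information_schema.column_tags
        WHERE tag_value = '{tag_value}'
    """.strip()
```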

Data Protection Measures:

  1. Data Classification: Tagging schema (PII, confidential, public)

  2. Data Minimization: Only necessary data processed

  3. Purpose Limitation: Clear data usage policies

  4. Storage Limitation: Data retention policies

GDPR Implementation Checklist:

CCPA

Status: Compliant (as of October 2025)

California Consumer Privacy Act Compliance:

  • Right to Know: Consumers can request disclosure of data collected

  • Right to Delete: Implemented data deletion procedures

  • Right to Opt-Out: Consent mechanisms in place

  • Non-Discrimination: Equal service regardless of privacy rights exercise

CCPA Implementation:

  • Data inventory maintained in Unity Catalog

  • Consumer request handling procedures documented

  • Data retention and deletion policies enforced

  • Privacy notices provided to California residents


Roadmap: Planned Compliance (Early 2026)

The following compliance frameworks are planned for implementation in early 2026:

HIPAA Compliance (Planned Q1 2026)

Note: HIPAA compliance requires additional Databricks configuration and AWS BAA.

Technical Safeguards:

  • Encryption at rest and in transit (implemented)

  • Access control and authentication (implemented)

  • Audit logs (implemented)

  • Data backup and recovery (implemented)

Administrative Safeguards:

  • Documented security policies

  • Workforce training requirements

  • Incident response procedures

  • Regular security assessments

Required Actions for HIPAA:

  1. Execute BAA with Databricks

  2. Execute BAA with AWS

  3. Enable HIPAA-eligible Databricks workspace

  4. Implement additional audit controls

  5. Document PHI handling procedures

  6. Conduct security risk assessment

ISO 27001 Compliance (Planned Q1 2026)

Information Security Management System (ISMS):

  • Formal information security policies

  • Risk assessment and treatment processes

  • Security awareness training program

  • Regular internal audits

Technical Requirements:

  • Access control policies (partially implemented via Unity Catalog)

  • Cryptographic controls (encryption at rest and in transit - implemented)

  • Operations security procedures

  • Supplier relationships management

Required Actions for ISO 27001:

  1. Develop and document ISMS

  2. Conduct comprehensive risk assessment

  3. Implement Statement of Applicability (SoA)

  4. Establish internal audit program

  5. Achieve third-party certification

  6. Maintain continuous improvement process


Access Control & Permissions

Role-Based Access Control (RBAC)

For detailed access control implementation, permission matrices, and group configuration, see the Security Architecture and Unity Catalog Design documents.

Key Access Control Features:

  • Group-Based Permissions: All permissions via Okta groups synced to Databricks (SCIM)

  • Least Privilege: Service principals isolated per environment, users read-only by default

  • Sandbox Isolation: Developers can only modify own sandbox

  • Production Lockdown: No direct user write access to prod

Permissions are automatically audited via Unity Catalog system tables.
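An access review can enumerate those grants directly. A minimal sketch, using the standard `information_schema.table_privileges` view that Unity Catalog exposes per catalog (the catalog name is an example):

```python
def privilege_report_query(catalog: str = "prod") -> str:
    """List all table privileges granted in a catalog, grouped by
    grantee, as evidence for periodic access-control reviews."""
    return f"""
        SELECT grantee, table_schema, table_name, privilege_type
        FROM {catalog}.information_schema.table_privileges
        ORDER BY grantee, table_schema, table_name
    """.strip()
```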


Audit Logging & Monitoring

Audit Trail Sources

1. Unity Catalog Audit Logs:

  • Data access events (table reads and writes)

  • Permission grants and revocations

  • Catalog, schema, and table changes

2. Databricks Workspace Audit Logs:

  • Job runs and pipeline updates

  • Cluster creation and termination

  • Notebook execution

  • SQL queries

3. AWS CloudTrail:

  • S3 access logs

  • IAM changes

  • Service principal authentication

4. GitHub Audit Log:

  • Code changes

  • Pull request approvals

  • Deployment actions

Compliance Reporting

Monthly Report Includes:

  • Access pattern analysis

  • Failed authentication attempts

  • Permission changes

  • Data export activities

  • Unusual access patterns

Automated Reports:
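As one illustration of an automated report, failed authentication attempts can be counted per actor from the audit system table. This is a sketch only: it assumes failures surface in `system.access.audit` with a non-200 response status, which should be verified against the actual log schema.

```python
def failed_auth_query(days: int = 30) -> str:
    """Count failed authentication events per user over the monthly
    reporting window (assumes non-200 status marks a failure)."""
    return f"""
        SELECT user_identity.email AS actor, count(*) AS failures
        FROM system.access.audit
        WHERE event_time >= current_timestamp() - INTERVAL {days} DAYS
          AND response.status_code != 200
        GROUP BY actor
        ORDER BY failures DESC
    """.strip()
```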


Data Protection & Privacy

Encryption

Data at Rest:

  • S3 buckets: AES-256 server-side encryption (SSE-S3)

  • Unity Catalog: Encrypted by default

  • Delta tables: Encryption inherited from storage

Data in Transit:

  • All API calls: TLS 1.2+

  • Databricks workspace: HTTPS only

  • Inter-service communication: Encrypted by default

Key Management:

  • AWS KMS for S3 encryption keys

  • Databricks manages Unity Catalog encryption

  • Service principal credentials via GitHub OIDC (no secrets)

Data Classification

Tagging Strategy: Tables and columns are tagged in Unity Catalog according to the classification levels below.

Classification Levels:

  1. Public: No restrictions

  2. Internal: Employees only

  3. Confidential: Need-to-know basis

  4. PII: Personally identifiable information

  5. Restricted: Highest sensitivity (PHI, financial, etc.)
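Applying these levels can be sketched as a small helper that emits the tagging DDL. Unity Catalog supports `SET TAGS` on catalogs, schemas, tables, and columns; the `classification` tag key is an assumed convention for this document:

```python
def tag_statement(table: str, classification: str) -> str:
    """Build an ALTER TABLE statement applying a classification tag,
    rejecting values outside the five levels defined above."""
    allowed = {"public", "internal", "confidential", "pii", "restricted"}
    level = classification.lower()
    if level not in allowed:
        raise ValueError(f"unknown classification: {classification}")
    return f"ALTER TABLE {table} SET TAGS ('classification' = '{level}')"
```

Validating against the fixed set of levels keeps tagging consistent, which is what makes the PII-discovery queries above reliable.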

Data Retention & Deletion

Retention Policies (to be implemented):

  • Raw data (bronze): 7 years

  • Processed data (silver): 5 years

  • Aggregated data (gold): 3 years

  • Model artifacts: 2 years after deprecation

Deletion Procedures:

GDPR Right to Erasure:
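A hedged sketch of an erasure on a Delta table, with example table and column names: a `DELETE` removes the rows logically, and a subsequent `VACUUM` purges the underlying files after the retention window so the data is no longer recoverable via Delta time travel.

```python
def erasure_statements(table: str, user_id: str, retain_hours: int = 168) -> list[str]:
    """Build the DELETE + VACUUM pair for a GDPR erasure request.

    The user_id column and 7-day default retention are illustrative;
    shortening VACUUM retention below the workspace default requires
    explicit configuration.
    """
    return [
        f"DELETE FROM {table} WHERE user_id = '{user_id}'",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]
```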


Incident Response

Data Breach Response Plan

Phase 1: Detection & Assessment (0-1 hour)

  1. Identify scope of breach

  2. Determine data sensitivity

  3. Assess number of affected records

  4. Notify security team

Phase 2: Containment (1-4 hours)

  1. Revoke compromised credentials

  2. Isolate affected systems

  3. Preserve evidence (logs, snapshots)

  4. Document timeline

Phase 3: Notification (4-72 hours)

  1. Notify affected users (if required)

  2. Notify regulators (if required by GDPR/HIPAA)

  3. Internal stakeholder communication

  4. Prepare public statement (if needed)

Phase 4: Recovery & Prevention (72+ hours)

  1. Restore from backups if needed

  2. Implement additional controls

  3. Conduct root cause analysis

  4. Update security procedures

Escalation Contacts

| Severity | Response Time | Contact |
| --- | --- | --- |
| P0 - Data Breach | Immediate | Security team + Legal |
| P1 - Compliance Violation | 4 hours | Compliance officer |
| P2 - Policy Violation | 24 hours | Manager + DevOps |
| P3 - Minor Issue | 72 hours | DevOps team |


Compliance Checklists

Pre-Deployment Checklist

Quarterly Compliance Review

Annual Compliance Certification


Regulatory Considerations

Data Residency

Current Setup:

  • Primary Region: US East (N. Virginia) - us-east-1

  • Data Storage: S3 buckets in us-east-1

  • Databricks Workspace: AWS us-east-1

Disaster Recovery & Data Replication:

  • DR Region: US West (Oregon) - us-west-2

  • Replication Method: S3 Cross-Region Replication (CRR)

  • Replication Scope: Production data only (prod catalog)

  • Purpose: Business continuity, disaster recovery, data redundancy

  • RPO: 24 hours (Recovery Point Objective)

  • RTO: 4 hours (Recovery Time Objective)
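The 24-hour RPO above can be checked mechanically against replication timestamps (e.g. from S3 replication metrics). A minimal sketch; the timestamps are illustrative inputs:

```python
from datetime import datetime, timedelta

def within_rpo(last_replicated: datetime, now: datetime, rpo_hours: int = 24) -> bool:
    """Return True if the newest cross-region replica is recent enough
    to satisfy the stated Recovery Point Objective."""
    return (now - last_replicated) <= timedelta(hours=rpo_hours)
```

Wiring this into the monitoring stack turns the RPO from a policy statement into an alertable control.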

Security Benefits:

  • Geographic redundancy protects against regional disasters

  • Supports SOC 2 Type II business continuity requirements

  • Enhanced data durability (99.999999999% across regions)

  • Independent availability zone failures do not impact data availability

Cross-Border Transfer Considerations:

  • If processing EU citizen data, assess GDPR Chapter V cross-border transfer requirements (Articles 44–49)

  • Document legal basis for transfers (Standard Contractual Clauses, etc.)

  • Implement additional safeguards if required

Industry-Specific Regulations

Financial Services (e.g., SOX, PCI-DSS):

  • Enhanced audit trails (implemented)

  • Separation of duties (implemented)

  • Change control (implemented)

Healthcare (HIPAA):

  • BAA required with Databricks and AWS

  • Additional technical safeguards

  • PHI-specific policies


Documentation & Training

Required Documentation

  • This Compliance Guide - Governance framework (completed)

  • Security Architecture - Security documentation (completed)

  • Access Control Matrix - Operations guide (completed)

  • Incident Response Plan - Runbooks (completed)

  • Data Retention Policy - To be formalized

  • Privacy Policy - To be formalized

  • Data Processing Agreement - To be formalized (if needed for GDPR)

Training Requirements

Onboarding Training:

  • Data governance principles

  • Unity Catalog usage

  • PII handling procedures

  • Incident reporting

Annual Refresher:

  • Updated policies and procedures

  • New compliance requirements

  • Lessons learned from incidents


Continuous Compliance

Automated Controls

1. Pre-commit Checks:

  • No hardcoded secrets (git-secrets)

  • Code quality standards (black, flake8)

  • Security scanning (if implemented)

2. Deployment Gates:

  • Terraform plan approval required

  • Pull request reviews mandatory

  • Automated testing (if implemented)

3. Runtime Monitoring:

  • Access pattern anomalies

  • Failed authentication alerts

  • Data export monitoring

  • Cost anomaly detection
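The access-pattern and cost anomaly checks above can start very simply. A deliberately minimal stand-in using a z-score over historical daily counts (real detection would use the monitoring stack, not this function):

```python
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's count as anomalous if it deviates from the
    historical mean by more than `threshold` standard deviations."""
    if len(history) < 2:
        return False  # not enough history to judge
    sigma = stdev(history)
    if sigma == 0:
        return today != history[0]
    return abs(today - mean(history)) / sigma > threshold
```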

Compliance as Code

Infrastructure: Catalog, schema, and permission resources are defined in Terraform, so governance controls are version-controlled and change-reviewed like application code.

Policy Enforcement:

  • Terraform enforces catalog structure

  • Unity Catalog enforces permissions

  • GitHub branch protection enforces reviews

  • Asset Bundles enforce deployment standards
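One small example of policy-as-code in this spirit: validating catalog names against the environment convention used throughout this document (prod, staging, dev, sandbox). The underscore-prefix scheme for per-developer sandboxes is an assumption for illustration:

```python
def valid_catalog_name(name: str) -> bool:
    """Check a catalog name against the environment naming policy:
    the name (or its prefix before the first underscore) must be one
    of the four recognized environments."""
    allowed = {"prod", "staging", "dev", "sandbox"}
    return name.split("_")[0] in allowed
```

A check like this can run as a pre-plan step in CI so nonconforming catalogs never reach Terraform apply.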


Risk Management

Identified Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Unauthorized data access | Low | High | RBAC + Audit logging + Alerts |
| Data breach | Low | Critical | Encryption + Access controls + Monitoring |
| Compliance violation | Medium | High | Automated checks + Regular audits |
| Accidental data deletion | Medium | Medium | Backups + Soft deletes + Permissions |
| Service principal compromise | Low | High | OIDC (no secrets) + Rotation + Monitoring |

Risk Mitigation Strategy

Preventive Controls:

  • Unity Catalog permissions

  • Service principal isolation

  • Encryption at rest and in transit

  • Code review requirements

Detective Controls:

  • Audit logging

  • Access pattern monitoring

  • Failed authentication tracking

  • Data lineage validation

Corrective Controls:

  • Incident response plan

  • Backup and recovery procedures

  • Rollback capabilities

  • Disaster recovery plan


Next Steps

Immediate Actions

  1. Review this compliance framework (completed)

  2. Complete data classification tagging (in progress)

  3. Formalize data retention policy (pending)

  4. Conduct compliance gap analysis (pending)

  5. Implement automated compliance reporting (pending)

Short Term (1-3 months)

  1. SOC 2 readiness assessment

  2. GDPR compliance audit

  3. Privacy impact assessment

  4. Security penetration testing

  5. Incident response drill

Long Term (3-12 months)

  1. SOC 2 Type II certification (if required)

  2. Additional compliance frameworks (ISO 27001, etc.)

  3. Enhanced data governance automation

  4. Advanced threat detection

  5. Compliance training program


Contacts

  • Compliance Questions: Compliance Officer

  • Security Incidents: Security Team (24/7 hotline)

  • Data Privacy: Data Protection Officer

  • Technical Implementation: DevOps Team


This compliance framework is a living document and should be reviewed quarterly and updated as regulations evolve.
