Security Architecture

Overview

This document describes the security model, access controls, compliance considerations, and security best practices for the ML Pipelines platform.

See Also: For compliance-specific information, see Compliance & Governance. For service principal setup, see Service Principals Guide.

Security Model

Defense in Depth

The platform implements multiple layers of security:

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Network Security                                   │
│ - VPC isolation (10.100.0.0/16)                             │
│ - Private subnets for compute                               │
│ - NAT Gateway for outbound traffic                          │
│ - Security groups restricting access                        │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ Layer 2: Authentication & Authorization                     │
│ - GitHub OIDC for CI/CD (no long-lived secrets)             │
│ - Service Principal per environment                         │
│ - OAuth 2.0 for user access                                 │
│ - Unity Catalog permissions (catalog/schema/table)          │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ Layer 3: Data Governance                                    │
│ - Unity Catalog centralized governance                      │
│ - Row-level security (future)                               │
│ - Column-level masking (future)                             │
│ - Audit logging (all access tracked)                        │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ Layer 4: Data Encryption                                    │
│ - At rest: S3 encryption (AES-256)                          │
│ - In transit: TLS 1.2+                                      │
│ - Secrets: AWS SSM Parameter Store (encrypted)              │
└─────────────────────────────────────────────────────────────┘

Unity Catalog Permissions

For the detailed Unity Catalog permission model, schema design, and catalog structure, see the Unity Catalog Architecture guide.

Service Principal Permissions Summary

Service principal permissions are configured per environment. For complete details on service principal setup and permissions, see the Service Principals Guide.


Data Isolation Guarantees

Environment Isolation

Principle: Each environment has its own catalog, and no environment can write to another environment's catalog. The only cross-environment access is read-only: staging can read prod data for training.

From → To | Dev        | Staging    | Prod
----------|------------|------------|-----------
Dev       | Read/Write | None       | None
Staging   | None       | Read/Write | Read-Only
Prod      | None       | None       | Read/Write
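
The matrix above can be encoded directly, which is useful for tests that assert no proposed grant violates the isolation principle. This is a minimal sketch; the `ACCESS` mapping and the `allowed` helper are illustrative names, not part of the platform.

```python
# Cross-environment access matrix, transcribed from the table above.
# Keys are (from_env, to_env); values are the access level.
ACCESS = {
    ("dev", "dev"): "read/write",
    ("dev", "staging"): "none",
    ("dev", "prod"): "none",
    ("staging", "dev"): "none",
    ("staging", "staging"): "read/write",
    ("staging", "prod"): "read-only",
    ("prod", "dev"): "none",
    ("prod", "staging"): "none",
    ("prod", "prod"): "read/write",
}


def allowed(from_env: str, to_env: str, operation: str) -> bool:
    """Return True if `operation` ('read' or 'write') is permitted."""
    if operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {operation!r}")
    level = ACCESS[(from_env, to_env)]
    if level == "read/write":
        return True
    if level == "read-only":
        return operation == "read"
    return False
```

For example, `allowed("staging", "prod", "read")` is true, while any write across two different environments is rejected.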

Sandbox Isolation

Developer Sandboxes:

  • Each developer has an isolated catalog ({username}_sandbox)

  • Developers read shared data from the dev catalog

  • Developers write only to their own sandbox

  • No conflicts between developers, because no two developers share a writable catalog

Enforcement:
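
A minimal sketch of how these rules might be expressed as Unity Catalog statements, assuming the sandbox is owned by the developer and that read access to dev comes from the `Databricks - Dev` group listed in the Access Control Matrix. The `sandbox_grants` helper and the `example.com` domain are placeholders for illustration, not the platform's actual tooling.

```python
import re


def sandbox_grants(username: str, domain: str = "example.com") -> list[str]:
    """Build Unity Catalog SQL that gives one developer a private sandbox.

    `domain` is a placeholder; substitute the organization's real domain.
    """
    # Restrict usernames to characters that are safe in an unquoted identifier.
    if not re.fullmatch(r"[a-z][a-z0-9_]*", username):
        raise ValueError(f"unsupported username: {username!r}")

    catalog = f"{username}_sandbox"
    principal = f"`{username}@{domain}`"
    return [
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        # Ownership and full privileges apply to the developer's sandbox only.
        f"ALTER CATALOG {catalog} OWNER TO {principal}",
        f"GRANT ALL PRIVILEGES ON CATALOG {catalog} TO {principal}",
        # Shared dev data stays read-only and is granted to the group.
        "GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG dev "
        "TO `Databricks - Dev`",
    ]
```

Because no statement grants write privileges on `dev` or on another developer's sandbox, isolation follows from the absence of grants rather than from explicit denies.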


Audit Logging

What is Logged

Unity Catalog and Databricks log all access:

Logged Events:

  • Table access (SELECT, INSERT, UPDATE, DELETE)

  • Schema changes (CREATE, ALTER, DROP)

  • Permission changes (GRANT, REVOKE)

  • Model access (prediction requests)

  • Job executions

  • Notebook runs

  • API calls

Accessing Audit Logs

Databricks UI:

  1. Admin Console → Audit Logs

  2. Filter by:

    • User/Service Principal

    • Action (SELECT, CREATE, etc.)

    • Resource (catalog, table)

    • Time range

SQL Query (system tables):
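
A minimal sketch of such a query, assuming the Databricks audit system table `system.access.audit` is enabled for the account and exposes the columns used below. The `audit_query` helper is a hypothetical name; in a notebook the returned string could be passed to `spark.sql`.

```python
from datetime import date


def audit_query(principal_email: str, since: date, limit: int = 100) -> str:
    """Build a query over the audit system table for one principal."""
    # Reject quote characters rather than trying to escape them.
    if "'" in principal_email or "\\" in principal_email:
        raise ValueError("principal_email contains unsupported characters")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT event_time, user_identity.email, service_name, "
        "action_name, request_params\n"
        "FROM system.access.audit\n"
        f"WHERE event_date >= DATE '{since.isoformat()}'\n"
        f"  AND user_identity.email = '{principal_email}'\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {int(limit)}"
    )
```

Filtering on `event_date` as well as ordering by `event_time` keeps the scan bounded to the requested time range.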

Audit Log Retention

  • Standard: 90 days in Databricks

  • Extended: Export to S3 for 2 years (compliance)

Export Audit Logs:
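
One way the two-year retention on the S3 side could be enforced is an S3 lifecycle rule on the export location. This is a minimal sketch that assumes exported logs land under a single key prefix; the helper name, rule ID, and prefix are hypothetical.

```python
def audit_archive_lifecycle(prefix: str, retain_days: int = 730) -> dict:
    """Build an S3 lifecycle configuration that expires archived audit logs.

    730 days approximates the two-year retention stated above.
    """
    if retain_days <= 0:
        raise ValueError("retain_days must be positive")
    return {
        "Rules": [
            {
                "ID": "expire-audit-logs",
                "Status": "Enabled",
                # Only objects under this prefix are affected.
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": retain_days},
            }
        ]
    }
```

The resulting dictionary has the shape that boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument.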


Compliance Considerations

For detailed compliance information, see the Compliance & Governance Guide.

Current Compliance Status

The platform maintains compliance with:

  • SOC 2 Type 2: Access controls, audit logging, encryption, incident response (Compliant as of October 2025)

  • GDPR: Data subject rights, data protection measures (Compliant as of October 2025)

  • CCPA: Consumer privacy rights, data handling procedures (Compliant as of October 2025)

Planned Compliance (Early 2026)

  • HIPAA: Healthcare data protection (Q1 2026)

  • ISO 27001: Information security management (Q1 2026)

GDPR Implementation

Right to be Forgotten:
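
A minimal sketch of an erasure request, assuming the data inventory can list every table that holds personal data together with the column identifying the data subject. The helper name and the table and column names in the usage note are hypothetical.

```python
import re

_TABLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){2}")
_COLUMN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def erasure_statements(tables: dict[str, str], subject_id: int) -> list[str]:
    """Build DELETE statements removing one data subject from each table.

    `tables` maps a fully qualified table name (catalog.schema.table) to the
    column that holds the subject identifier in that table.
    """
    statements = []
    for table, column in sorted(tables.items()):
        if not _TABLE.fullmatch(table):
            raise ValueError(f"expected catalog.schema.table, got {table!r}")
        if not _COLUMN.fullmatch(column):
            raise ValueError(f"invalid column name: {column!r}")
        statements.append(
            f"DELETE FROM {table} WHERE {column} = {int(subject_id)}"
        )
    return statements
```

For example, `erasure_statements({"prod.crm.users": "user_id"}, 42)` yields a single `DELETE` for that table. On Delta tables a `DELETE` is logical: the underlying data files remain until `VACUUM` removes them after the retention window, so an erasure procedure needs to account for that step.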

Data Processing Agreement:

  • Document data processing activities

  • Maintain data inventory

  • Define retention policies

  • Implement deletion procedures

Data Residency

Primary Region: us-east-1 (US East - N. Virginia)

  • All primary data stored in us-east-1

  • Databricks control plane in us-east-1

Disaster Recovery & Security:

  • Cross-region replication: us-east-1 → us-west-2 (US West - Oregon)

  • Replication purpose: Disaster recovery and data redundancy

  • Replication scope: Critical production data (prod catalog)

  • RPO (Recovery Point Objective): 24 hours

  • RTO (Recovery Time Objective): 4 hours

S3 Cross-Region Replication Configuration:
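
A minimal sketch of a replication configuration for the us-east-1 → us-west-2 setup described above. The role ARN, bucket name, and rule ID are placeholders, and the actual rule set used by the platform may differ.

```python
def replication_config(role_arn: str, dest_bucket: str, prefix: str = "") -> dict:
    """Build an S3 replication configuration for a single destination bucket."""
    return {
        # IAM role that S3 assumes to copy objects to the destination.
        "Role": role_arn,
        "Rules": [
            {
                "ID": "prod-dr-replication",
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix matches every object in the source bucket.
                "Filter": {"Prefix": prefix},
                # Delete markers created in the source are not replicated.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }
```

The dictionary matches the shape boto3's `put_bucket_replication` expects for `ReplicationConfiguration`. S3 requires versioning to be enabled on both the source and destination buckets before replication can be configured.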

Compliance Note: Data replication supports SOC 2 Type 2 business continuity requirements and provides additional data protection for GDPR/CCPA compliance.


Secret Management

GitHub Secrets

Secrets Used:

  • GH_PAT: GitHub Personal Access Token (for submodules)

  • No other secrets (OIDC eliminates long-lived credentials)

Best Practices:

  • Rotate every 90 days

  • Minimum required scopes (repo, workflow)

  • Never log or display in CI/CD

  • Store encrypted in GitHub

Databricks Secrets

Not used for CI/CD (OIDC replaces secrets), but available for:

  • Runtime credentials (API keys for external services)

  • Database passwords

  • Third-party integrations

Usage (if needed):
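
A minimal sketch of reading a runtime secret, assuming it runs inside a Databricks notebook or job where the `dbutils` object is available. The wrapper function and the scope and key names in the usage note are hypothetical.

```python
def get_runtime_secret(dbutils, scope: str, key: str) -> str:
    """Fetch a secret at job runtime via the Databricks secrets utility.

    `dbutils` is the object Databricks injects into notebooks and jobs.
    """
    value = dbutils.secrets.get(scope=scope, key=key)
    if not value:
        # Fail early instead of passing an empty credential downstream.
        raise RuntimeError(f"secret {scope}/{key} is empty")
    return value
```

A call such as `get_runtime_secret(dbutils, "ml-pipelines", "external-api-key")` keeps the credential out of notebook source. Passing `dbutils` in as an argument also lets the function be tested with a stub outside Databricks.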

AWS SSM Parameter Store

Databricks Account Credentials:
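
A minimal sketch of reading an encrypted parameter, assuming the credentials are stored as SecureString parameters. The helper name is hypothetical and the parameter path is whatever the infrastructure team has defined.

```python
def read_secure_parameter(ssm_client, name: str) -> str:
    """Read and decrypt a SecureString parameter from SSM Parameter Store."""
    # WithDecryption=True asks SSM to decrypt the value with its KMS key.
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

In practice `ssm_client` would be created with `boto3.client("ssm", region_name="us-east-1")`, and the caller's IAM role needs permission both to read the parameter and to use the KMS key that encrypts it.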

Access Control:

  • IAM policy restricts access to infrastructure team

  • Encrypted with KMS

  • Audited via CloudTrail


Access Control Matrix

Note: Permissions are granted via groups (Okta → SCIM → Databricks), not individual users.

Production Environment

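Principal Type    | Principal Name             | Catalogs       | Permissions    | Use Case
------------------|----------------------------|----------------|----------------|-----------------------------------
Service Principal | ml-pipelines-prod          | prod           | ALL PRIVILEGES | CI/CD deployments
Service Principal | ml-pipelines-prod          | staging.models | SELECT         | Model promotion
Group             | Databricks - Prod          | prod           | SELECT         | Debugging, monitoring (DevOps/SRE)
Group             | Databricks - Account Admin | prod           | ALL PRIVILEGES | Administration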

Staging Environment

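Principal Type    | Principal Name             | Catalogs | Permissions     | Use Case
------------------|----------------------------|----------|-----------------|-----------------------
Service Principal | ml-pipelines-staging       | staging  | ALL PRIVILEGES  | CI/CD deployments
Service Principal | ml-pipelines-staging       | prod     | SELECT          | Training on prod data
Group             | Databricks - Staging       | staging  | CAN_RUN, SELECT | Run jobs, view results
Group             | Databricks - Account Admin | staging  | ALL PRIVILEGES  | Administration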

Dev Environment

Principal Type    | Principal Name             | Catalogs           | Permissions    | Use Case
------------------|----------------------------|--------------------|----------------|----------------------------
Service Principal | ml-pipelines-dev           | dev                | ALL PRIVILEGES | CI/CD deployments
Group             | Databricks - Dev           | dev                | SELECT         | Development, testing
Group             | Databricks - Dev           | METASTORE          | CREATE CATALOG | Sandbox creation
Individual User   | {username}@refr-esh.com    | {username}_sandbox | ALL PRIVILEGES | Personal sandbox ownership
Group             | Databricks - Account Admin | dev                | ALL PRIVILEGES | Administration


Network Security

VPC Configuration

CIDR Block: 10.100.0.0/16

Subnets:

  • Private subnets (2 AZs): 10.100.1.0/24, 10.100.2.0/24

  • Public subnet (NAT): 10.100.10.0/24
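
The address plan above can be checked mechanically. This is a minimal sketch using Python's standard `ipaddress` module; the subnet labels are illustrative names, and only the CIDR blocks come from this document.

```python
import ipaddress

VPC = ipaddress.ip_network("10.100.0.0/16")
SUBNETS = {
    "private-a": ipaddress.ip_network("10.100.1.0/24"),
    "private-b": ipaddress.ip_network("10.100.2.0/24"),
    "public-nat": ipaddress.ip_network("10.100.10.0/24"),
}


def validate_layout(vpc, subnets) -> None:
    """Raise ValueError if a subnet leaves the VPC or overlaps another."""
    nets = list(subnets.items())
    for name, net in nets:
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} ({net}) is outside {vpc}")
    # Compare each pair of subnets once.
    for i, (name_a, net_a) in enumerate(nets):
        for name_b, net_b in nets[i + 1:]:
            if net_a.overlaps(net_b):
                raise ValueError(f"{name_a} overlaps {name_b}")


validate_layout(VPC, SUBNETS)
```

Each /24 subnet holds 256 addresses, so the three subnets use only a small part of the /16 range and leave room for additional subnets.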

Routing:

Security Groups:

Network Isolation

Databricks Compute:

  • Runs in private subnets

  • No public IP addresses

  • Access via Databricks control plane only

S3 Access:

  • Via VPC endpoint (no internet traversal)

  • IAM roles control access

  • External locations in Unity Catalog


Threat Model

Threats and Mitigations

Threat                   | Mitigation                            | Status
-------------------------|---------------------------------------|------------
Credential theft         | GitHub OIDC (no long-lived secrets)   | Implemented
Unauthorized data access | Unity Catalog permissions, audit logs | Implemented
Data exfiltration        | Network controls, audit logging       | Implemented
Malicious code injection | Code review, PR validation            | Implemented
Denial of service        | Rate limiting, resource quotas        | Implemented
Insider threat           | Least privilege, audit logs           | Implemented
Supply chain attack      | Dependency scanning (future)          | Planned

Incident Response

Security Incident Workflow:

  1. Detect: Audit logs, monitoring alerts

  2. Contain: Revoke credentials, block access

  3. Investigate: Review audit logs, identify scope

  4. Remediate: Fix vulnerability, rotate secrets

  5. Document: Incident report, lessons learned

  6. Review: Update security controls

Runbook: See Secret Rotation Runbook for credential compromise procedures.


Security Best Practices

For Developers

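  1. Never commit secrets:

    • Keep credentials out of source control

    • Store them in GitHub Secrets, Databricks secrets, or AWS SSM Parameter Store instead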

  2. Use least privilege:

    • Request minimum required permissions

    • Use sandbox for experiments

    • Never share service principal credentials

  3. Protect personal tokens:

    • Rotate every 90 days

    • Minimum scopes

    • Never share with others

  4. Review audit logs:

    • Check own access history monthly

    • Report suspicious activity
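
For the first practice in the list above (never commit secrets), a lightweight check can run before each commit. This is a minimal sketch with a few illustrative patterns; it is not a substitute for a dedicated secret scanner, and the helper name is hypothetical.

```python
import re

# Illustrative patterns only; dedicated scanners cover far more formats.
SECRET_PATTERNS = {
    "GitHub personal access token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A pre-commit hook could run `find_secrets` over the staged diff and abort the commit when the returned list is non-empty.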

For Administrators

  1. Quarterly access reviews:

    • Audit all permissions

    • Remove unused accounts

    • Validate service principal access

  2. Monitor audit logs:

    • Set up alerts for anomalies

    • Review high-privilege actions

    • Investigate failed access attempts

  3. Keep dependencies updated:

    • Review Databricks security bulletins

    • Apply security patches promptly

    • Update libraries with known vulnerabilities

  4. Document everything:

    • Permission grants

    • Incident responses

    • Security changes

