Naming Conventions

Overview

This document provides comprehensive naming standards for all assets in the ML Pipelines project. Consistent naming improves discoverability and maintainability and keeps the project aligned with organizational standards.

General Principles

  1. Be Descriptive: Names should clearly indicate purpose

  2. Be Consistent: Follow the same pattern across similar resources

  3. Be Concise: Avoid unnecessary verbosity

  4. Use Standard Formats: Follow industry conventions (snake_case, PascalCase, kebab-case)

  5. Include Context: Add environment/scope when needed


Unity Catalog Naming

Catalogs

Format: {environment} or {username}_{environment}

Environments:

  • dev - Shared development catalog

  • staging - Pre-production catalog

  • prod - Production catalog

  • {username}_sandbox - Personal developer catalog (e.g., taylorlaing_sandbox)

Examples:
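Using the environments listed above, valid catalog names look like:

```
dev
staging
prod
taylorlaing_sandbox
```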

Rules:

  • Environment catalogs: lowercase, no underscores

  • Sandbox catalogs: {username}_sandbox format

  • No special characters except underscore


Schemas

Format: {layer} or {domain}

Medallion Architecture Layers:

  • bronze - Raw data, minimal transformation

  • silver - Cleaned and validated data

  • gold - Business-ready aggregates and features

Domain Schemas:

  • models - ML model artifacts and metadata

  • monitoring - Monitoring and observability tables

  • testing - Test data and validation results

Examples:
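Illustrative fully qualified schema names (catalog.schema), combining the catalogs and schemas above:

```
dev.bronze
prod.gold
prod.models
taylorlaing_sandbox.silver
```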

Rules:

  • Lowercase only

  • No underscores in layer names

  • Domain names can use underscores if needed


Tables

Format: {domain}_{entity} or {descriptive_name}

Pattern: Use snake_case for multi-word names

Examples:

Bronze Layer (source system tables):

Silver Layer (processed data):

Gold Layer (business aggregates):
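Hypothetical table names for each layer, consistent with the rules below (the specific entity names are illustrative):

```
bronze.raw_messages          -- raw data from the source system
silver.messages              -- cleaned, validated message records
gold.sentiment_analysis      -- business-ready aggregates/features
```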

Rules:

  • snake_case format

  • Plural for collection tables (e.g., messages)

  • Singular for dimension tables (e.g., user_profile)

  • Include layer context if ambiguous

  • Avoid abbreviations unless standard (e.g., id is acceptable; msg is not)


Models (MLflow)

Format: {catalog}.models.{model_name}

Pattern: Descriptive name indicating model purpose

Examples:
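For illustration, registered model names following this format (model names taken from elsewhere in this document):

```
prod.models.sentiment_analysis
dev.models.roberta_base_go_emotions
```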

Model Aliases:

  • champion - Current production model

  • challenger - Model being tested

  • archive - Previous production model

Rules:

  • snake_case format

  • Descriptive of model's function

  • Internal models: descriptive name (e.g., sentiment_analysis)

  • External models: {source}_{model_name} (e.g., roberta_base_go_emotions)


Volumes

Format: {catalog}.{schema}.{volume_name}

Pattern: Descriptive name for data storage location

Examples:
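Illustrative volume names (raw_messages appears in the rules below; inference_logs is a hypothetical example):

```
prod.bronze.raw_messages
dev.monitoring.inference_logs
```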

Rules:

  • snake_case format

  • Prefix with layer if helpful (e.g., raw_messages)

  • Match corresponding table name when possible


Pipeline Naming

DLT Pipeline Configurations

File Format: {pipeline_name}.pipeline.yml

Pipeline Name Format: {descriptive_name} (in YAML) → {descriptive_name}-{environment} (deployed)

Examples:

File names:

Deployed pipeline names:
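As a sketch, a snake_case YAML file name maps to a kebab-case deployed name with the environment suffix appended:

```
sentiment_analysis.pipeline.yml   →   sentiment-analysis-dev
sentiment_analysis.pipeline.yml   →   sentiment-analysis-prod
```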

Rules:

  • File names: snake_case

  • Deployed names: kebab-case with environment suffix

  • Environment suffix added automatically by databricks.yml variables

  • Descriptive of pipeline's primary function


Pipeline Scripts

Format: {pipeline_name}.py

Examples:
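A hypothetical script path following the location rule below (the silver schema placement is illustrative):

```
/resources/pipelines/silver/sentiment_analysis/sentiment_analysis.py
```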

Rules:

  • Match corresponding pipeline YAML file name

  • snake_case format

  • Located in /resources/pipelines/{schema}/{pipeline_name}/


Job Naming

Job Configurations

File Format: register_{model_name}.job.yml or {job_type}_{purpose}.job.yml

Job Name Format: {resource_prefix}_{job_name} (deployed)

Examples:

File names:

Deployed job names:
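Illustrative file-to-deployed-name mappings, assuming a hypothetical resource_prefix of ml_pipelines in databricks.yml:

```
register_sentiment_analysis.job.yml    →   ml_pipelines_register_sentiment_analysis
inference_sentiment_analysis.job.yml   →   ml_pipelines_inference_sentiment_analysis
```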

Rules:

  • File names: snake_case

  • Model registration: register_{model_name}.job.yml

  • Inference: inference_{model_name}.job.yml

  • Deployed names: {resource_prefix}_{job_name} (prefix from databricks.yml)


Job Scripts

Format: register_{model_name}.py or {job_type}_{purpose}.py

Examples:
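Script names matching the job YAML files above:

```
register_sentiment_analysis.py
inference_sentiment_analysis.py
```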

Rules:

  • Match corresponding job YAML file name

  • snake_case format

  • Located in /resources/jobs/{category}/{job_name}/


Code Naming

Python Modules and Packages

Package Names: lowercase, no underscores

Examples:
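Hypothetical package names (lowercase, no underscores):

```
mlpipelines
featurestore
```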

Module Names: snake_case

Examples:
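Illustrative module names in snake_case:

```
sentiment_analysis.py
data_validation.py
```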


Python Variables and Functions

Variables: snake_case

Examples:
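A minimal sketch of snake_case variable names (the values are hypothetical):

```python
# snake_case variable names; values are for illustration only
message_count = 42
sentiment_score = 0.87
model_version = "3"
```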

Functions: snake_case, verb-based

Examples:
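Verb-based function names in snake_case; the implementations below are trivial stubs for illustration:

```python
def calculate_sentiment_score(text: str) -> float:
    """Return a sentiment score in [0, 1]; stub implementation."""
    return 0.5 if text else 0.0

def load_model_artifacts(path: str) -> dict:
    """Stub: pretend to load model artifacts from the given path."""
    return {"path": path}
```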

Boolean Variables: Prefix with is_, has_, should_

Examples:
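Boolean names with the prefixes above (hypothetical flags):

```python
# is_/has_/should_ prefixes make boolean intent explicit
is_valid = True
has_errors = False
should_retry = not is_valid or has_errors
```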


Python Classes

Format: PascalCase

Examples:
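PascalCase class names as noun phrases; these are empty stubs for illustration (MessageBatchProcessor is a hypothetical name):

```python
class SentimentAnalysisModel:
    """Wraps a sentiment model; the name is a descriptive noun phrase."""

class MessageBatchProcessor:
    """Processes batches of messages."""
```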

Rules:

  • Use nouns or noun phrases

  • Descriptive of class purpose

  • Avoid generic names like Manager, Handler unless specific


Python Constants

Format: UPPER_SNAKE_CASE

Examples:
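UPPER_SNAKE_CASE constants (the values here are hypothetical; MAX_MESSAGE_LENGTH also appears in the quick reference table below):

```python
# Module-level constants in UPPER_SNAKE_CASE
MAX_MESSAGE_LENGTH = 512
DEFAULT_BATCH_SIZE = 100
MODEL_ALIAS_CHAMPION = "champion"
```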


Configuration Naming

Environment Variables

Format: UPPER_SNAKE_CASE

Examples (in databricks.yml):
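Hypothetical variable names (the actual names depend on the project's databricks.yml):

```
MODEL_CATALOG=prod
MAX_BATCH_SIZE=100
```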

Usage in code:
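A minimal sketch of reading an environment variable in code with a safe default; MODEL_CATALOG is a hypothetical variable name:

```python
import os

# Fall back to the dev catalog when the variable is not set
model_catalog = os.environ.get("MODEL_CATALOG", "dev")
```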


Spark Configuration

Format: Dot notation, lowercase

Examples:
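Typical Spark configuration keys in this style:

```
spark.sql.shuffle.partitions
spark.databricks.delta.optimizeWrite.enabled
```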


File and Directory Naming

Python Files

Format: snake_case.py

Examples:
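File names drawn from the pipeline and job sections above:

```
sentiment_analysis.py
register_sentiment_analysis.py
```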


YAML Files

Format: kebab-case.yml for general configuration files; pipeline and job configs are the exception, using snake_case base names with .pipeline.yml and .job.yml suffixes (see Pipeline Naming and Job Naming above)

Examples:
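Illustrative YAML file names (note that pipeline and job configs use snake_case base names, as described above):

```
databricks.yml
sentiment_analysis.pipeline.yml
register_sentiment_analysis.job.yml
```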


Notebooks

Format: snake_case.ipynb or snake_case.py

Examples:
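Illustrative notebook names (the first is hypothetical; the second follows the dated-temporary-notebook rule below):

```
explore_sentiment_data.ipynb
debug_2025_01_15.ipynb
```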

Rules:

  • snake_case format

  • Descriptive of notebook purpose

  • Include date for temporary notebooks (e.g., debug_2025_01_15.ipynb)


Directories

Format: snake_case (lowercase)

Examples:
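A hypothetical directory layout consistent with the script-location rules above (the model_registration category name is illustrative):

```
resources/
  pipelines/
    silver/
      sentiment_analysis/
  jobs/
    model_registration/
      register_sentiment_analysis/
```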

Rules:

  • Lowercase only

  • No special characters except underscore

  • Descriptive of contents

  • Follow project structure conventions


Git Branch Naming

Format: ref-{ticket-number}-{brief-description}

Branch names are generated by Linear based on ticket ID and title.

Examples:
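Illustrative branch names (ticket numbers are hypothetical):

```
ref-123-add-emotion-detection
ref-456-fix-inference-job
```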

Rules:

  • Always starts with ref- prefix

  • Ticket number from Linear

  • kebab-case for description

  • Brief but descriptive (2-5 words)

  • Lowercase only


Pull Request Naming

Format: ref-{ticket} | {Title}

PR titles use Linear ticket reference followed by descriptive title.

Examples:
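Illustrative PR titles (ticket numbers are hypothetical):

```
ref-123 | Add Emotion Detection Pipeline
ref-456 | Fix Inference Job Scheduling
```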

Rules:

  • Always starts with ref-{ticket} followed by |

  • Title case for PR title

  • Descriptive but concise


Workspace Paths

Bundle Deployment Paths

Format: /Workspace/{location}/.bundle/{bundle_name}/{target}

Examples:
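Hypothetical deployment paths, assuming a bundle named ml_pipelines:

```
/Workspace/Users/{username}/.bundle/ml_pipelines/sandbox
/Workspace/Shared/.bundle/ml_pipelines/prod
```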

Rules:

  • Sandbox: Under user workspace directory

  • Shared environments: Under /Shared/

  • Bundle name from databricks.yml

  • Target name from deployment


S3 Bucket Paths

Format: s3://{bucket}/{catalog}/volumes/{schema}/{volume}/

Examples:

Rules:

  • Bucket name includes environment

  • Catalog-specific paths for isolation

  • Follow Unity Catalog volume structure


Quick Reference Table

Asset Type       | Format                              | Example
-----------------|-------------------------------------|------------------------------------
Catalog          | {environment}                       | dev, prod
Sandbox Catalog  | {username}_sandbox                  | taylorlaing_sandbox
Schema           | {layer}                             | bronze, silver, gold, models
Table            | {domain}_{entity}                   | sentiment_analysis
Model            | {model_name}                        | sentiment_analysis
Pipeline File    | {name}.pipeline.yml                 | sentiment_analysis.pipeline.yml
Job File         | register_{model}.job.yml            | register_sentiment_analysis.job.yml
Python File      | {name}.py                           | sentiment_analysis.py
Class            | PascalCase                          | SentimentAnalysisModel
Function         | snake_case                          | calculate_sentiment_score
Variable         | snake_case                          | sentiment_score
Constant         | UPPER_SNAKE_CASE                    | MAX_MESSAGE_LENGTH
Branch           | ref-{ticket-number}-{description}   | ref-123-add-emotion-detection


Validation Checklist

Before merging code, verify naming follows conventions:

  • Catalogs, schemas, and tables are lowercase ({environment}, {layer}, snake_case)

  • Pipeline and job YAML files use snake_case and match their script names

  • Functions and variables use snake_case; classes use PascalCase; constants use UPPER_SNAKE_CASE

  • Boolean variables are prefixed with is_, has_, or should_

  • Branch name follows ref-{ticket-number}-{brief-description}

  • PR title follows ref-{ticket} | {Title}


Last updated