Naming Conventions

Overview

This document provides comprehensive naming standards for all assets in the ML Pipelines project. Consistent naming improves discoverability and maintainability and keeps the project aligned with organizational standards.

General Principles

  1. Be Descriptive: Names should clearly indicate purpose

  2. Be Consistent: Follow the same pattern across similar resources

  3. Be Concise: Avoid unnecessary verbosity

  4. Use Standard Formats: Follow industry conventions (snake_case, PascalCase, kebab-case)

  5. Include Context: Add environment/scope when needed


Unity Catalog Naming

Catalogs

Format: {environment} or {username}_{environment}

Environments:

  • dev - Shared development catalog

  • staging - Pre-production catalog

  • prod - Production catalog

  • {username}_sandbox - Personal developer catalog (e.g., taylorlaing_sandbox)

Examples:
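Using the environments listed above, valid catalog names look like:

```
dev
staging
prod
taylorlaing_sandbox
```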

Rules:

  • Environment catalogs: lowercase, no underscores

  • Sandbox catalogs: {username}_sandbox format

  • No special characters except underscore


Schemas

Format: {layer} or {domain}

Medallion Architecture Layers:

  • bronze - Raw data, minimal transformation

  • silver - Cleaned and validated data

  • gold - Business-ready aggregates and features

Domain Schemas:

  • models - ML model artifacts and metadata

  • monitoring - Monitoring and observability tables

  • testing - Test data and validation results

Examples:
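Illustrative fully qualified schema names (catalog.schema), combining the catalogs and schemas above:

```
dev.bronze
prod.gold
prod.models
taylorlaing_sandbox.silver
```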

Rules:

  • Lowercase only

  • No underscores in layer names

  • Domain names can use underscores if needed


Tables

Format: {domain}_{entity} or {descriptive_name}

Pattern: Use snake_case for multi-word names

Examples:

Bronze Layer (source system tables):

Silver Layer (processed data):

Gold Layer (business aggregates):
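Hypothetical table names for each layer, consistent with the rules below (the specific entity names are illustrative):

```
bronze.raw_messages          -- raw data from the source system
silver.messages              -- cleaned, validated message records
gold.sentiment_analysis      -- business-ready aggregates/features
```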

Rules:

  • snake_case format

  • Plural for collection tables (e.g., messages)

  • Singular for dimension tables (e.g., user_profile)

  • Include layer context if ambiguous

  • Avoid abbreviations unless standard (e.g., id is acceptable; msg is not)


Models (MLflow)

Format: {catalog}.models.{model_name}

Pattern: Descriptive name indicating model purpose

Examples:
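For illustration, registered model names following this format (model names taken from elsewhere in this document):

```
prod.models.sentiment_analysis
dev.models.roberta_base_go_emotions
```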

Model Aliases:

  • champion - Current production model

  • challenger - Model being tested

  • archive - Previous production model

Rules:

  • snake_case format

  • Descriptive of model's function

  • Internal models: descriptive name (e.g., sentiment_analysis)

  • External models: {source}_{model_name} (e.g., roberta_base_go_emotions)


Volumes

Format: {catalog}.{schema}.{volume_name}

Pattern: Descriptive name for data storage location

Examples:
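Illustrative volume names (raw_messages appears in the rules below; inference_logs is a hypothetical example):

```
prod.bronze.raw_messages
dev.monitoring.inference_logs
```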

Rules:

  • snake_case format

  • Prefix with layer if helpful (e.g., raw_messages)

  • Match corresponding table name when possible


Pipeline Naming

DLT Pipeline Configurations

File Format: {pipeline_name}.pipeline.yml

Pipeline Name Format: {descriptive_name} (in YAML) → {descriptive_name}-{environment} (deployed)

Examples:

File names:

Deployed pipeline names:
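As a sketch, a snake_case YAML file name maps to a kebab-case deployed name with the environment suffix appended:

```
sentiment_analysis.pipeline.yml   →   sentiment-analysis-dev
sentiment_analysis.pipeline.yml   →   sentiment-analysis-prod
```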

Rules:

  • File names: snake_case

  • Deployed names: kebab-case with environment suffix

  • Environment suffix added automatically by databricks.yml variables

  • Descriptive of pipeline's primary function


Pipeline Scripts

Format: {pipeline_name}.py

Examples:
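A hypothetical script path following the location rule below (the silver schema placement is illustrative):

```
/resources/pipelines/silver/sentiment_analysis/sentiment_analysis.py
```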

Rules:

  • Match corresponding pipeline YAML file name

  • snake_case format

  • Located in /resources/pipelines/{schema}/{pipeline_name}/


Job Naming

Job Configurations

File Format: register_{model_name}.job.yml or {job_type}_{purpose}.job.yml

Job Name Format: {resource_prefix}_{job_name} (deployed)

Examples:

File names:

Deployed job names:
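Illustrative file-to-deployed-name mappings, assuming a hypothetical resource_prefix of ml_pipelines in databricks.yml:

```
register_sentiment_analysis.job.yml    →   ml_pipelines_register_sentiment_analysis
inference_sentiment_analysis.job.yml   →   ml_pipelines_inference_sentiment_analysis
```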

Rules:

  • File names: snake_case

  • Model registration: register_{model_name}.job.yml

  • Inference: inference_{model_name}.job.yml

  • Deployed names: {resource_prefix}_{job_name} (prefix from databricks.yml)


Job Scripts

Format: register_{model_name}.py or {job_type}_{purpose}.py

Examples:
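Script names matching the job YAML files above:

```
register_sentiment_analysis.py
inference_sentiment_analysis.py
```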

Rules:

  • Match corresponding job YAML file name

  • snake_case format

  • Located in /resources/jobs/{category}/{job_name}/


Code Naming

Python Modules and Packages

Package Names: lowercase, no underscores

Examples:
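Hypothetical package names (lowercase, no underscores):

```
mlpipelines
featurestore
```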

Module Names: snake_case

Examples:
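Illustrative module names in snake_case:

```
sentiment_analysis.py
data_validation.py
```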


Python Variables and Functions

Variables: snake_case

Examples:
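A minimal sketch of snake_case variable names (the values are hypothetical):

```python
# snake_case variable names; values are for illustration only
message_count = 42
sentiment_score = 0.87
model_version = "3"
```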

Functions: snake_case, verb-based

Examples:
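Verb-based function names in snake_case; the implementations below are trivial stubs for illustration:

```python
def calculate_sentiment_score(text: str) -> float:
    """Return a sentiment score in [0, 1]; stub implementation."""
    return 0.5 if text else 0.0

def load_model_artifacts(path: str) -> dict:
    """Stub: pretend to load model artifacts from the given path."""
    return {"path": path}
```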

Boolean Variables: Prefix with is_, has_, should_

Examples:
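Boolean names with the prefixes above (hypothetical flags):

```python
# is_/has_/should_ prefixes make boolean intent explicit
is_valid = True
has_errors = False
should_retry = not is_valid or has_errors
```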


Python Classes

Format: PascalCase

Examples:
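PascalCase class names as noun phrases; these are empty stubs for illustration (MessageBatchProcessor is a hypothetical name):

```python
class SentimentAnalysisModel:
    """Wraps a sentiment model; the name is a descriptive noun phrase."""

class MessageBatchProcessor:
    """Processes batches of messages."""
```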

Rules:

  • Use nouns or noun phrases

  • Descriptive of class purpose

  • Avoid generic names like Manager, Handler unless specific


Python Constants

Format: UPPER_SNAKE_CASE

Examples:
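UPPER_SNAKE_CASE constants (the values here are hypothetical; MAX_MESSAGE_LENGTH also appears in the quick reference table below):

```python
# Module-level constants in UPPER_SNAKE_CASE
MAX_MESSAGE_LENGTH = 512
DEFAULT_BATCH_SIZE = 100
MODEL_ALIAS_CHAMPION = "champion"
```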


Configuration Naming

Environment Variables

Format: UPPER_SNAKE_CASE

Examples (in databricks.yml):
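Hypothetical variable names (the actual names depend on the project's databricks.yml):

```
MODEL_CATALOG=prod
MAX_BATCH_SIZE=100
```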

Usage in code:
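A minimal sketch of reading an environment variable in code with a safe default; MODEL_CATALOG is a hypothetical variable name:

```python
import os

# Fall back to the dev catalog when the variable is not set
model_catalog = os.environ.get("MODEL_CATALOG", "dev")
```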


Spark Configuration

Format: Dot notation, lowercase

Examples:
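Typical Spark configuration keys in this style:

```
spark.sql.shuffle.partitions
spark.databricks.delta.optimizeWrite.enabled
```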


File and Directory Naming

Python Files

Format: snake_case.py

Examples:
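File names drawn from the pipeline and job sections above:

```
sentiment_analysis.py
register_sentiment_analysis.py
```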


YAML Files

Format: kebab-case.yml for general configuration files; pipeline and job configs are the exception, using snake_case base names with .pipeline.yml and .job.yml suffixes (see Pipeline Naming and Job Naming above)

Examples:
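Illustrative YAML file names (note that pipeline and job configs use snake_case base names, as described above):

```
databricks.yml
sentiment_analysis.pipeline.yml
register_sentiment_analysis.job.yml
```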


Notebooks

Format: snake_case.ipynb or snake_case.py

Examples:
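Illustrative notebook names (the first is hypothetical; the second follows the dated-temporary-notebook rule below):

```
explore_sentiment_data.ipynb
debug_2025_01_15.ipynb
```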

Rules:

  • snake_case format

  • Descriptive of notebook purpose

  • Include date for temporary notebooks (e.g., debug_2025_01_15.ipynb)


Directories

Format: snake_case (lowercase)

Examples:
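A hypothetical directory layout consistent with the script-location rules above (the model_registration category name is illustrative):

```
resources/
  pipelines/
    silver/
      sentiment_analysis/
  jobs/
    model_registration/
      register_sentiment_analysis/
```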

Rules:

  • Lowercase only

  • No special characters except underscore

  • Descriptive of contents

  • Follow project structure conventions


Git Branch Naming

Format: ref-{ticket-number}-{brief-description}

Branch names are generated by Linear based on ticket ID and title.

Examples:
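Illustrative branch names (ticket numbers are hypothetical):

```
ref-123-add-emotion-detection
ref-456-fix-inference-job
```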

Rules:

  • Always starts with ref- prefix

  • Ticket number from Linear

  • kebab-case for description

  • Brief but descriptive (2-5 words)

  • Lowercase only


Pull Request Naming

Format: ref-{ticket} | {Title}

PR titles use Linear ticket reference followed by descriptive title.

Examples:
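Illustrative PR titles (ticket numbers are hypothetical):

```
ref-123 | Add Emotion Detection Pipeline
ref-456 | Fix Inference Job Scheduling
```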

Rules:

  • Always starts with ref-{ticket} followed by |

  • Title case for PR title

  • Descriptive but concise


Workspace Paths

Bundle Deployment Paths

Format: /Workspace/{location}/.bundle/{bundle_name}/{target}

Examples:
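Hypothetical deployment paths, assuming a bundle named ml_pipelines:

```
/Workspace/Users/{username}/.bundle/ml_pipelines/sandbox
/Workspace/Shared/.bundle/ml_pipelines/prod
```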

Rules:

  • Sandbox: Under user workspace directory

  • Shared environments: Under /Shared/

  • Bundle name from databricks.yml

  • Target name from deployment


S3 Bucket Paths

Format: s3://{bucket}/{catalog}/volumes/{schema}/{volume}/

Examples:

Rules:

  • Bucket name includes environment

  • Catalog-specific paths for isolation

  • Follow Unity Catalog volume structure


Quick Reference Table

Asset Type       | Format                              | Example
-----------------|-------------------------------------|------------------------------------
Catalog          | {environment}                       | dev, prod
Sandbox Catalog  | {username}_sandbox                  | taylorlaing_sandbox
Schema           | {layer}                             | bronze, silver, gold, models
Table            | {domain}_{entity}                   | sentiment_analysis
Model            | {model_name}                        | sentiment_analysis
Pipeline File    | {name}.pipeline.yml                 | sentiment_analysis.pipeline.yml
Job File         | register_{model}.job.yml            | register_sentiment_analysis.job.yml
Python File      | {name}.py                           | sentiment_analysis.py
Class            | PascalCase                          | SentimentAnalysisModel
Function         | snake_case                          | calculate_sentiment_score
Variable         | snake_case                          | sentiment_score
Constant         | UPPER_SNAKE_CASE                    | MAX_MESSAGE_LENGTH
Branch           | ref-{ticket-number}-{description}   | ref-123-add-emotion-detection


Validation Checklist

Before merging code, verify naming follows conventions:

  • Catalogs, schemas, and tables are lowercase ({environment}, {layer}, snake_case)

  • Pipeline and job YAML files use snake_case and match their script names

  • Functions and variables use snake_case; classes use PascalCase; constants use UPPER_SNAKE_CASE

  • Boolean variables are prefixed with is_, has_, or should_

  • Branch name follows ref-{ticket-number}-{brief-description}

  • PR title follows ref-{ticket} | {Title}


Last updated