CLI Commands Reference

Overview

This document provides a complete reference for all command-line operations in the ML Pipelines repository, including Makefile targets and Databricks CLI commands.

Makefile Commands

Location: Makefile at the repository root (ml-pipelines/Makefile)

help

Description: Show all available Makefile targets

Output:
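The exact listing depends on the targets defined in the Makefile; a typical invocation looks like:

```shell
# Print all documented Makefile targets with their descriptions
make help
```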


validate

Description: Validate bundle configuration for sandbox environment

What it does:

  1. Runs databricks bundle validate -t sandbox

  2. Checks YAML syntax

  3. Verifies variable references

  4. Validates resource configurations

When to use: Before deploying to catch configuration errors early

Example output:
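An illustrative run (recent Databricks CLI versions end a successful validation with a workspace summary and an OK message):

```shell
# Validate the bundle configuration for the sandbox target
make validate
```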


deploy

Description: Deploy to your personal sandbox catalog

What it does:

  1. Validates bundle configuration

  2. Gets your username (e.g., taylor)

  3. Creates catalog {username}_sandbox if needed

  4. Creates S3 volume directories

  5. Builds Python wheel with uv build --wheel

  6. Deploys all resources to dev workspace

  7. Resources named as [dev {username}] resource_name_sandbox

Variables:

  • SHORT_NAME: Auto-detected from current user

  • CATALOG_NAME: ${SHORT_NAME}_sandbox

  • S3_BUCKET: ref-ml-core-dev-workspace-bucket

Example:
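Following the steps above, a single command handles validation, catalog setup, wheel build, and deployment:

```shell
# Build the wheel and deploy everything to {username}_sandbox
make deploy
```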


copy-data

Description: Copy sample data from dev volumes to sandbox for testing

What it does:

  1. Runs scripts/copy-dev-data-to-sandbox.sh from the repository root

  2. Copies data files from dev volumes to {username}_sandbox volumes

  3. Useful for testing pipelines with real data

When to use: After initial sandbox deployment to populate test data


deploy-dev

Description: Deploy to dev environment (CI/CD only)

What it does:

  1. Validates bundle for dev target

  2. Deploys to dev workspace

  3. Uses service principal authentication

Authentication: Requires DATABRICKS_AUTH_TYPE=github-oidc (set by GitHub Actions)

When to use:

  • Automatically in GitHub Actions on PR merge

  • Manual testing with service principal credentials (rare)


deploy-staging

Description: Deploy to staging environment (CI/CD only)

Similar to: deploy-dev but for staging environment


deploy-prod

Description: Deploy to production environment (CI/CD only)

Similar to: deploy-dev but for production environment

Warning: Use with caution; this target deploys directly to production


test

Description: Run Python tests using pytest

What it does:

  1. Runs uv run pytest

  2. Executes all tests in tests/ directory

  3. Shows test results and coverage

Example:
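Either form below runs the same pytest suite; the direct invocation adds verbose output:

```shell
# Run the test suite through the Makefile
make test

# Equivalent direct invocation with verbose per-test output
uv run pytest -v tests/
```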


clean

Description: Remove build artifacts and cache

What it does:

  1. Removes dist/ directory (wheel files)

  2. Removes build/ directory

  3. Removes *.egg-info directories

  4. Removes __pycache__ directories

  5. Deletes .pyc files

When to use:

  • Before rebuilding from scratch

  • When debugging build issues

  • To reclaim disk space

Example:
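A common pattern is to clean before a fresh rebuild:

```shell
# Remove dist/, build/, *.egg-info, __pycache__, and .pyc files
make clean
```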


validate-dev, validate-staging, validate-prod

Description: Validate bundle for specific environments

What it does: Validates bundle configuration for the specified target

When to use:

  • Testing environment-specific configurations

  • Debugging CI/CD validation failures

Databricks Bundle Commands

bundle validate

Description: Validate bundle configuration

Options:

  • -t, --target <target>: Target environment (sandbox, dev, staging, prod)

  • --var <key=value>: Override variable value

Examples:
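Illustrative invocations using the options above (the variable name is hypothetical; use one defined in your bundle):

```shell
# Validate for a specific target
databricks bundle validate -t sandbox

# Override a bundle variable during validation (hypothetical variable name)
databricks bundle validate -t dev --var "catalog_name=my_catalog"
```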

When to use: Before deployment to catch errors


bundle deploy

Description: Deploy bundle to Databricks workspace

Options:

  • -t, --target <target>: Target environment

  • --var <key=value>: Override variable value

  • --force: Force deployment, bypassing safety checks such as the Git branch check (use cautiously)

Examples:
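Illustrative invocations (the variable name is hypothetical; use one defined in your bundle):

```shell
# Deploy to the sandbox target
databricks bundle deploy -t sandbox

# Deploy to dev with a variable override (hypothetical variable name)
databricks bundle deploy -t dev --var "catalog_name=my_catalog"
```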

What happens:

  1. Validates configuration

  2. Builds artifacts (Python wheels)

  3. Uploads to workspace

  4. Creates/updates resources (pipelines, jobs, volumes)


bundle destroy

Description: Delete all resources created by bundle

Options:

  • -t, --target <target>: Target environment

  • --auto-approve: Skip confirmation prompt

Examples:
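Illustrative invocations:

```shell
# Tear down all sandbox resources (asks for confirmation first)
databricks bundle destroy -t sandbox

# Skip the confirmation prompt (dangerous)
databricks bundle destroy -t sandbox --auto-approve
```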

Warning: This deletes all pipelines, jobs, and volumes. Use with caution!


bundle run

Description: Run a job from the bundle

Options:

  • -t, --target <target>: Target environment

Examples:
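An illustrative invocation (the resource key is hypothetical; use a job key defined in your bundle):

```shell
# Run a job from the bundle by its resource key (hypothetical key)
databricks bundle run -t sandbox my_training_job
```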

Databricks Catalog Commands

catalogs list

Description: List all catalogs you have access to

Options:

  • --profile <profile>: Databricks profile to use

  • --output <format>: Output format (json, table)

Examples:
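Illustrative invocations (the profile name is hypothetical; use one from your ~/.databrickscfg):

```shell
# List catalogs as JSON
databricks catalogs list --output json

# List catalogs using a specific CLI profile (hypothetical profile name)
databricks catalogs list --profile my-profile
```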


catalogs get

Description: Get details of a specific catalog

Examples:


catalogs create

Description: Create a new catalog

Options:

  • --storage-root <s3_path>: S3 location for catalog

  • --comment <comment>: Catalog description

Examples:


catalogs delete

Description: Delete a catalog

Options:

  • --force: Delete even if not empty

Examples:

Warning: Force delete removes all schemas and tables!

Databricks Pipeline Commands

pipelines list-pipelines

Description: List all DLT pipelines

Options:

  • --profile <profile>: Databricks profile

  • --output <format>: Output format (json, table)

Examples:
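An illustrative invocation:

```shell
# List all DLT pipelines as JSON
databricks pipelines list-pipelines --output json
```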


pipelines get

Description: Get pipeline details

Examples:


pipelines start-update

Description: Trigger a pipeline update (run)

Options:

  • --full-refresh: Full refresh of all tables

Examples:
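Illustrative invocations (replace <pipeline-id> with a real pipeline ID from pipelines list-pipelines):

```shell
# Trigger an incremental update (replace <pipeline-id> with a real ID)
databricks pipelines start-update <pipeline-id>

# Reprocess all tables from scratch
databricks pipelines start-update <pipeline-id> --full-refresh
```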


pipelines stop

Description: Stop a running pipeline

Examples:


pipelines reset

Description: Reset pipeline state (clear checkpoints)

Examples:

Warning: This clears all streaming state. Next run will reprocess all data.


pipelines list-updates

Description: List recent pipeline updates

Options:

  • --limit <n>: Number of updates to show

Examples:


pipelines get-update

Description: Get details of a specific pipeline update

Examples:

Databricks Job Commands

jobs list

Description: List all jobs

Examples:


jobs get

Description: Get job details

Examples:


jobs run-now

Description: Trigger a job run

Options:

  • --notebook-params <json>: Parameters for notebook task

  • --python-params <args>: Parameters for Python task

Examples:
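Illustrative invocations following the options listed above (the job ID and parameter names are hypothetical):

```shell
# Trigger a run (replace <job-id> with a real job ID)
databricks jobs run-now <job-id>

# Pass parameters to a notebook task (hypothetical parameter name)
databricks jobs run-now <job-id> --notebook-params '{"run_date": "2024-01-01"}'
```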


jobs list-runs

Description: List recent job runs

Options:

  • --job-id <job_id>: Filter by job

  • --limit <n>: Number of runs to show

  • --active-only: Show only running jobs

Examples:
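Illustrative invocations (replace <job-id> with a real job ID):

```shell
# Show the five most recent runs of one job
databricks jobs list-runs --job-id <job-id> --limit 5

# Show only runs that are currently executing
databricks jobs list-runs --active-only
```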


jobs get-run

Description: Get details of a specific job run

Examples:


jobs cancel-run

Description: Cancel a running job

Examples:

Databricks Model Commands

model-serving list

Description: List model serving endpoints

Examples:


model-serving get

Description: Get model serving endpoint details

Examples:


model-serving create

Description: Create a model serving endpoint

Options:

  • --name <name>: Endpoint name

  • --config @<file>: Configuration file (JSON)

Examples:


model-serving update

Description: Update model serving endpoint

Examples:


model-serving delete

Description: Delete model serving endpoint

Examples:

Databricks SQL Commands

sql execute

Description: Execute SQL query

Options:

  • --profile <profile>: Databricks profile

  • --warehouse-id <id>: SQL warehouse ID

Examples:


sql queries list

Description: List saved SQL queries

Examples:

Databricks File System Commands

fs ls

Description: List files in workspace or DBFS

Options:

  • --absolute: Show absolute paths

  • --long: Show detailed info (size, modified time)

Examples:
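An illustrative invocation (the volume path is hypothetical):

```shell
# List a Unity Catalog volume path with sizes and timestamps (hypothetical path)
databricks fs ls --long dbfs:/Volumes/my_catalog/my_schema/my_volume/
```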


fs cp

Description: Copy files

Options:

  • --recursive: Copy directories

  • --overwrite: Overwrite existing files

Examples:
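An illustrative invocation (both paths are hypothetical):

```shell
# Copy a local directory into a volume, overwriting existing files (hypothetical paths)
databricks fs cp --recursive --overwrite ./data dbfs:/Volumes/my_catalog/my_schema/raw_data/
```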


fs rm

Description: Remove files

Options:

  • --recursive: Remove directories

Examples:

AWS S3 Commands

s3 ls

Description: List S3 bucket contents

Options:

  • --profile <profile>: AWS profile to use

  • --recursive: List recursively

Examples:
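An illustrative invocation against the workspace bucket named earlier in this document (the AWS profile name is hypothetical):

```shell
# List the workspace bucket recursively (hypothetical AWS profile)
aws s3 ls s3://ref-ml-core-dev-workspace-bucket/ --recursive --profile my-profile
```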


s3 cp

Description: Copy files to/from S3

Options:

  • --recursive: Copy directories

  • --profile <profile>: AWS profile

Examples:
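Illustrative invocations (the key prefix and AWS profile name are hypothetical):

```shell
# Upload a local directory to S3 (hypothetical prefix and profile)
aws s3 cp ./data s3://ref-ml-core-dev-workspace-bucket/data/ --recursive --profile my-profile

# Download from S3 to a local directory
aws s3 cp s3://ref-ml-core-dev-workspace-bucket/data/ ./data/ --recursive --profile my-profile
```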


s3api put-object

Description: Create S3 directory placeholder

Options:

  • --profile <profile>: AWS profile

  • --no-cli-pager: Disable pager

Examples:
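S3 has no real directories, so a zero-byte object whose key ends in "/" serves as a placeholder; an illustrative invocation (the key and profile are hypothetical):

```shell
# Create an empty object whose key ends in "/" so it acts as a directory marker
aws s3api put-object \
  --bucket ref-ml-core-dev-workspace-bucket \
  --key volumes/my_volume/ \
  --profile my-profile \
  --no-cli-pager
```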


s3api get-bucket-policy

Description: Get S3 bucket policy

Examples:

Common Workflows

Deploy Changes to Sandbox
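A sketch of the typical iteration loop, built from the Makefile targets documented above:

```shell
# Typical sandbox iteration loop
make validate    # catch configuration errors early
make deploy      # build the wheel and deploy to {username}_sandbox
make copy-data   # populate test data (usually only after the first deploy)
```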


Create and Test New Pipeline


Register and Test Model


Debug Pipeline Failure


Check Data Quality


Cleanup Sandbox

Tips and Tricks

Using JSON Output with jq
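Piping --output json through jq makes it easy to pull single fields; a sketch (requires jq, and the exact JSON shape depends on your CLI version, so adjust the filter accordingly):

```shell
# Extract pipeline names from JSON output (adjust the jq filter to the
# JSON shape your CLI version emits)
databricks pipelines list-pipelines --output json | jq -r '.[].name'
```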

Using Shell Loops
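Shell loops help when the same command must run against several targets; a sketch using the validate-* targets above:

```shell
# Validate every deployment target in one loop
for target in dev staging prod; do
  make "validate-$target"
done
```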

Environment Variables
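The Databricks CLI reads its configuration from environment variables as well as ~/.databrickscfg; a sketch (the profile name and host URL are hypothetical):

```shell
# Select a CLI profile and workspace for the current shell session
# (hypothetical profile name and host URL)
export DATABRICKS_CONFIG_PROFILE=my-profile
export DATABRICKS_HOST=https://example-workspace.cloud.databricks.com
echo "Using profile: $DATABRICKS_CONFIG_PROFILE"
```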

External Resources

Last updated