Code Standards and PR Process

Overview

This guide defines coding conventions, standards, and the pull request process for the ML Pipelines project. Consistent code standards improve readability, maintainability, and collaboration.

Python Style Guide

PEP 8 Compliance

All Python code must follow PEP 8 style guidelines.

Key Principles:

  • Indentation: 4 spaces (no tabs)

  • Line length: 100 characters maximum (up to 120 where wrapping would hurt readability)

  • Imports: Grouped and sorted (standard library, third-party, local)

  • Naming: See naming conventions below

  • Whitespace: Consistent spacing around operators and after commas

Enforcement:

# Format code with black
uv run black src/ tests/

# Check code style
uv run flake8 src/ tests/

# Sort imports
uv run isort src/ tests/

Naming Conventions

Variables and Functions

Classes

Constants

Private Methods/Variables
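As a minimal sketch of the four categories above, assuming the standard PEP 8 conventions (snake_case for variables and functions, CapWords for classes, UPPER_SNAKE_CASE for constants, a leading underscore for private members). Every name in it is hypothetical and chosen only for illustration.

```python
# Constant: UPPER_SNAKE_CASE, defined at module level.
DEFAULT_THRESHOLD = 0.5


class SentimentClassifier:  # Class: CapWords
    def __init__(self, threshold: float = DEFAULT_THRESHOLD) -> None:
        # Private attribute: a single leading underscore.
        self._threshold = threshold

    def predict_label(self, score: float) -> str:  # Public method: snake_case
        return "positive" if self._is_positive(score) else "negative"

    def _is_positive(self, score: float) -> bool:  # Private method
        return score >= self._threshold


def count_tokens(text: str) -> int:  # Function: snake_case
    token_count = len(text.split())  # Variable: snake_case
    return token_count
```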

Docstrings

Use Google-style docstrings for all public functions and classes.

Function Docstrings:
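A minimal sketch of a Google-style function docstring, with the usual Args, Returns, and Raises sections. The function itself (normalize_scores) is hypothetical and exists only to give the docstring something to describe.

```python
def normalize_scores(scores: list[float], scale: float = 1.0) -> list[float]:
    """Rescales scores so that the largest value equals ``scale``.

    Args:
        scores: Raw scores to rescale. Must contain a non-zero maximum.
        scale: Value that the largest score is mapped to.

    Returns:
        A new list with the rescaled scores in their original order.

    Raises:
        ValueError: If ``scores`` is empty or its maximum is zero.
    """
    if not scores or max(scores) == 0:
        raise ValueError("scores must be non-empty with a non-zero maximum")
    peak = max(scores)
    return [score / peak * scale for score in scores]
```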

Class Docstrings:
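A comparable sketch for a class, assuming the Google convention of a summary line followed by an Attributes section. RunningMean is a hypothetical class used only for illustration.

```python
class RunningMean:
    """Tracks the mean of a stream of values.

    Attributes:
        count: Number of values recorded so far.
        total: Sum of the values recorded so far.
    """

    def __init__(self) -> None:
        """Initializes an empty tracker."""
        self.count = 0
        self.total = 0.0

    def add(self, value: float) -> None:
        """Records one value.

        Args:
            value: The value to include in the mean.
        """
        self.count += 1
        self.total += value

    def mean(self) -> float:
        """Returns the mean of the recorded values, or 0.0 if there are none."""
        return self.total / self.count if self.count else 0.0
```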

Type Hints

Use type hints for function signatures and class attributes.
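One way this could look, as a sketch: a dataclass with annotated attributes and a function with an annotated signature. TrainingConfig and find_tag are hypothetical names, and the built-in generic syntax (dict[str, str]) assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrainingConfig:
    # Class attributes carry their types directly.
    model_name: str
    learning_rate: float = 0.001
    tags: dict[str, str] = field(default_factory=dict)


def find_tag(config: TrainingConfig, key: str) -> Optional[str]:
    """Returns the tag value for ``key``, or None if the tag is not set."""
    return config.tags.get(key)
```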

Import Organization

Import Order:

  1. Standard library imports

  2. Third-party imports

  3. Local application imports

Example:
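A sketch of the three groups, separated by blank lines. The third-party and local lines are hypothetical and left as comments so the snippet runs without those packages installed; in real code they would be ordinary import statements.

```python
# 1. Standard library imports
import logging
from pathlib import Path

# 2. Third-party imports (hypothetical dependency)
# import pandas as pd

# 3. Local application imports (hypothetical module path)
# from src.models.sentiment_analysis import SentimentModel

logger = logging.getLogger(__name__)
DATA_DIR = Path("data")
```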

Use isort to automatically organize imports:

uv run isort src/ tests/

File Organization

Directory Structure

File Naming

  • Python files: snake_case.py

    • sentiment_analysis.py

    • text_processing.py

    • model_registration.py

  • YAML files: kebab-case.yml

    • register-sentiment-analysis.job.yml

    • bronze-ingestion.pipeline.yml

  • Notebooks: snake_case.ipynb or snake_case.py (for Databricks)

    • data_exploration.ipynb

    • model_training.py

  • Test files: test_[module_name].py

    • test_sentiment_model.py

    • test_data_validation.py
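The naming rules above could be checked automatically. The following is a rough sketch, not part of the project tooling: the regular expressions are an assumed reading of "snake_case" and "kebab-case" (including the optional .job or .pipeline segment seen in the YAML examples), and follows_naming_convention is a hypothetical helper.

```python
import re

# snake_case names ending in .py or .ipynb (covers test_*.py as well).
SNAKE_CASE_FILE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*\.(py|ipynb)$")
# kebab-case names with an optional middle segment such as ".job" or ".pipeline".
KEBAB_CASE_YML = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*(\.[a-z]+)?\.yml$")


def follows_naming_convention(filename: str) -> bool:
    """Returns True if the file name matches the convention for its type."""
    if filename.endswith(".yml"):
        return KEBAB_CASE_YML.match(filename) is not None
    return SNAKE_CASE_FILE.match(filename) is not None
```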

Code Documentation

Inline Comments

Use comments to explain why, not what.

Good Comments:
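An illustrative sketch of a comment that records a reason the code cannot express on its own. The limit and the scoring endpoint it refers to are hypothetical.

```python
MAX_BATCH_ROWS = 500


def batch_size_for(row_count: int) -> int:
    # Cap the batch size because the (hypothetical) scoring endpoint rejects
    # larger requests. Without this note the cap would look arbitrary.
    return min(row_count, MAX_BATCH_ROWS)
```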

Bad Comments (obvious from code):
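By contrast, the comments in this hypothetical function only restate what each line already says and could be deleted without losing information.

```python
def total_length(texts: list[str]) -> int:
    # Set total to zero
    total = 0
    # Loop over the texts
    for text in texts:
        # Add the length of the text to total
        total += len(text)
    # Return total
    return total
```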

TODOs and FIXMEs

Use standard markers for future work:
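A small sketch, assuming the common TODO and FIXME markers written as ordinary comments. The function and the work items are hypothetical.

```python
def parse_score(raw: str) -> float:
    # TODO: Support percentage strings such as "85%".
    # FIXME: Values outside the range 0 to 1 are accepted without complaint.
    return float(raw.strip())
```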

Search for TODOs:
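One possible approach, sketched in Python rather than as a shell command: scan source text for comment markers and report the line numbers. find_markers and MARKER_PATTERN are hypothetical names, and the pattern assumes markers appear after a "#".

```python
import re

# Matches "# TODO" or "# FIXME", with optional whitespace after the "#".
MARKER_PATTERN = re.compile(r"#\s*(TODO|FIXME)\b")


def find_markers(source: str) -> list[tuple[int, str]]:
    """Returns (line number, stripped line) for each line containing a marker."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if MARKER_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits
```

To scan a file, pass its contents, for example find_markers(Path("src/module.py").read_text()) with a path of your choosing.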

PR Process

Creating a Pull Request

  1. Create feature branch:

  2. Make changes and commit:

  3. Push to GitHub:

  4. Open pull request:

    • Navigate to GitHub repository

    • Click "Compare & pull request"

    • Fill out PR template (see below)

PR Title Format

The PR title always matches the ticket ID assigned by Linear, which prefixes ticket IDs with ref-. The branch name that Linear suggests also includes the ticket title converted to kebab-case.

  • Example: For a ticket titled "Upgrade matplotlib package for security", the suggested branch name would look like ref-1234-upgrade-matplotlib-package-for-security
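The kebab-case conversion can be approximated as follows. This is a sketch only: Linear's exact slug rules are not described here, so the regular expression is an assumption, and suggest_branch_name is a hypothetical helper.

```python
import re


def suggest_branch_name(ticket_number: int, title: str, prefix: str = "ref") -> str:
    """Builds a branch name of the form <prefix>-<number>-<kebab-case-title>."""
    # Lowercase the title and collapse every run of other characters into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{prefix}-{ticket_number}-{slug}"


print(suggest_branch_name(1234, "Upgrade matplotlib package for security"))
# ref-1234-upgrade-matplotlib-package-for-security
```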

PR Description Template

PR Review Process

Automatic Checks (CI/CD):

  • Bundle validation (via GitHub Actions)

  • Code style checks (if configured)

  • Tests (if configured)

Manual Review:

  • Code quality and readability

  • Logic correctness

  • Test coverage

  • Documentation completeness

  • Performance implications

  • Security considerations

Review Checklist for Reviewers:

PR Approval Requirements

Dev Deployment:

  • At least one approval from a team member

  • All automated checks pass

  • No unresolved comments

Staging/Prod Deployment:

  • Lead developer approval required

  • All tests pass

  • Documentation updated

Git Workflow

Branch Strategy

Commit Messages

Format:

Good Commit Messages:

Bad Commit Messages:

Keeping Branch Updated

Code Quality Tools

Linting and Formatting:

Pre-commit Hooks (recommended setup):

Create .pre-commit-config.yaml:

Install pre-commit:

VS Code Settings

Recommended settings (.vscode/settings.json):

Best Practices

Do's

  • Write self-documenting code: Use clear variable names and structure

  • Add docstrings: Document all public functions and classes

  • Use type hints: Make function signatures explicit

  • Write tests: Test new features and bug fixes

  • Keep functions small: Each function should do one thing well

  • Handle errors gracefully: Use try/except with specific exceptions

  • Log important events: Use logging instead of print statements

  • Review your own code: Before requesting review, review your changes

  • Update documentation: Keep docs in sync with code changes
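Two of these practices, catching a specific exception and logging instead of printing, can be sketched together. load_settings is a hypothetical function, and the fallback to an empty dict is an illustrative choice rather than a project rule.

```python
import json
import logging

logger = logging.getLogger(__name__)


def load_settings(raw: str) -> dict:
    """Parses a JSON object, returning an empty dict if the text is malformed."""
    try:
        settings = json.loads(raw)
    except json.JSONDecodeError as error:
        # Catch only the failure we expect; anything else still propagates.
        logger.warning("Ignoring malformed settings: %s", error)
        return {}
    if not isinstance(settings, dict):
        raise ValueError("settings must be a JSON object")
    logger.info("Loaded %d settings", len(settings))
    return settings
```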

Don'ts

  • Don't commit commented-out code: Delete it (git history preserves it)

  • Don't use magic numbers: Define constants with meaningful names

  • Don't ignore errors: Handle or propagate exceptions appropriately

  • Don't hardcode values: Use configuration or environment variables

  • Don't repeat code: Extract common logic into functions

  • Don't commit secrets: Never commit API keys, passwords, tokens

  • Don't skip tests: Untested code will break in production

  • Don't merge without approval: Wait for code review

  • Don't force push to main: Never use git push --force on main branch
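As a sketch of avoiding magic numbers and hardcoded values: the threshold gets a named constant, and an environment variable can override it per deployment. The variable name CONFIDENCE_THRESHOLD and both functions are hypothetical.

```python
import os

# Named constant instead of a bare 0.8 buried in the logic.
DEFAULT_CONFIDENCE_THRESHOLD = 0.8


def confidence_threshold() -> float:
    """Returns the threshold, allowing an override from the environment."""
    return float(os.environ.get("CONFIDENCE_THRESHOLD", DEFAULT_CONFIDENCE_THRESHOLD))


def is_confident(score: float) -> bool:
    return score >= confidence_threshold()
```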

Security Considerations

Secrets Management

Never commit:

  • API keys

  • Passwords

  • Tokens

  • Private keys

  • Database credentials

Use instead:

  • Databricks secrets

  • Environment variables

  • GitHub secrets (for CI/CD)
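A minimal sketch of the environment-variable option: read the secret at runtime and fail loudly if it is missing. require_secret and the variable name in the usage note are hypothetical.

```python
import os


def require_secret(name: str) -> str:
    """Returns the secret stored in the environment variable ``name``."""
    value = os.environ.get(name)
    if not value:
        # Fail early with a clear message instead of continuing without credentials.
        raise RuntimeError(f"Missing required secret: set the {name} environment variable")
    return value
```

A call such as require_secret("EXAMPLE_API_TOKEN") keeps the value out of the repository. In a Databricks notebook or job, dbutils.secrets.get(scope=..., key=...) plays the same role, with scope and key names specific to your workspace.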

Check for secrets before commit:

Input Validation

Always validate user input:
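What validation looks like depends on the input. As one illustrative sketch, assuming a request that carries free text and a label: check type, emptiness, length, and membership before using the values. The allowed labels, the length limit, and validate_request are all hypothetical.

```python
ALLOWED_LABELS = {"positive", "negative", "neutral"}
MAX_TEXT_LENGTH = 10_000


def validate_request(text: object, label: object) -> tuple[str, str]:
    """Returns the cleaned (text, label) pair or raises ValueError."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text must be at most {MAX_TEXT_LENGTH} characters")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"label must be one of {sorted(ALLOWED_LABELS)}")
    return text.strip(), label
```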
