Learnings from Deploying a Custom Python Model for ai_query

This document captures the key findings and best practices discovered while debugging and deploying the emoji_analysis model for use with the Databricks ai_query function. The process revealed several critical subtleties in how ai_query interacts with MLflow model serving endpoints.

Key Takeaway

The ai_query function's automatic schema inference is extremely strict. To ensure compatibility, the MLflow model must return a JSON response that is not only well-formed but also has a 100% stable and predictable schema. Any ambiguity in the output structure or data types will cause the AI_FUNCTION_MODEL_SCHEMA_PARSE_ERROR.


What Didn't Work and Why

  1. Returning a Dictionary from predict:

    • Attempt: The model's predict method returned a dictionary like {"predictions": [...]}.

    • Problem: This caused a "double-wrapping" issue. The MLflow serving layer adds its own {"predictions": [...]} wrapper, resulting in a final response like {"predictions": {"predictions": [...]}}, which ai_query cannot parse.

  2. Inconsistent Data Types (The Serialization Problem):

    • Attempt: The model returned native Python numeric types (float, int, Decimal) and, in some code paths, nested JSON objects serialized as strings.

    • Problem: ai_query's parser failed on the numeric types (REMOTE_FUNCTION_HTTP_RESULT_PARSE_ERROR) and was confused by fields that were sometimes strings and sometimes objects. The parser requires absolute consistency.

  3. Dynamic Schema Structures:

    • Attempt: Parts of the model's output had a dynamic structure. For example, a features dictionary where the keys were the names of detected features, or a sentiment dictionary with a key like "positive" or "negative".

    • Problem: MLflow's signature inference engine correctly identified these structures as having an unpredictable schema, marking them as type: object, format: any. The ai_query schema parser cannot handle the any type and fails immediately.

  4. Relying on "Lucky" Signature Inference:

    • Attempt: Using a simple, arbitrary input example to infer the model's output signature.

    • Problem: If the sample input didn't trigger all possible output structures (e.g., it produced an empty features list), the inferred signature would be incomplete. Specifically, the schema for an empty list is inferred as Any, which guarantees a downstream ai_query failure.

  5. Inconsistent Schema in Edge Cases:

    • Attempt: The model had different logic for handling messages with zero emojis versus those with one or more.

    • Problem: An early return statement in the zero-emoji case had a bug where it stringified a nested JSON field, while the main code path did not. This created an inconsistent schema depending on the input data, causing parsing failures.


What Works: Best Practices for ai_query Compatibility

  1. The predict Method MUST Return a List:

    • The Python function should return a list of result objects (e.g., [result_dict]). MLflow will correctly wrap this in the final {"predictions": [...]} JSON response.
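A minimal sketch of the return-shape rule, with the serving layer's single wrapping step simulated (field names are illustrative, not the actual model's):

```python
def predict_wrong(model_input):
    # Returning a dict causes double-wrapping once the server adds its own.
    return {"predictions": [{"emoji_count": "2"} for _ in model_input]}

def predict_right(model_input):
    # Return a LIST of per-row result objects instead.
    return [{"emoji_count": "2"} for _ in model_input]

def simulate_serving_layer(predict_fn, rows):
    # Conceptual model of what the MLflow serving layer does: it adds
    # exactly one {"predictions": [...]} wrapper around the return value.
    return {"predictions": predict_fn(rows)}

wrong = simulate_serving_layer(predict_wrong, ["hi 👋"])
right = simulate_serving_layer(predict_right, ["hi 👋"])
# wrong -> {"predictions": {"predictions": [...]}}  (ai_query cannot parse)
# right -> {"predictions": [{"emoji_count": "2"}]}
```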

  2. Ensure a 100% Stable Output Schema:

    • No Dynamic Keys: Convert any dictionaries with dynamic keys into arrays of objects with fixed keys.

      • Incorrect: {"positive": 0.9}

      • Correct: {"sentiment": "positive", "score": 0.9}

    • No Generic Objects: Ensure every object and nested object in the output has a consistent, predictable set of keys.
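A small helper sketch (function and key names hypothetical) of the dynamic-keys-to-fixed-keys conversion:

```python
def fixed_keys(features: dict) -> list:
    """Convert a dict with dynamic keys (the detected feature names) into
    an array of objects that all share a fixed, predictable set of keys."""
    return [
        {"feature_name": name, **attrs}
        for name, attrs in sorted(features.items())
    ]

dynamic = {"gratitude": {"score": "0.8"}, "urgency": {"score": "0.1"}}
stable = fixed_keys(dynamic)
# -> [{"feature_name": "gratitude", "score": "0.8"},
#     {"feature_name": "urgency", "score": "0.1"}]
```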

  3. Stringify All Primitive Data Types:

    • The most robust way to avoid data type parsing errors is to recursively traverse the entire output object and convert all primitive values (int, float, Decimal, bool, datetime) to their string representations.

    • This creates a completely unambiguous JSON payload that ai_query can reliably parse. The final casting to correct SQL types (CAST(... AS INT)) should be done in the DLT pipeline itself.
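One way to implement the recursive stringification (a sketch, not the model's actual code):

```python
from datetime import datetime
from decimal import Decimal

def stringify(value):
    """Recursively convert every primitive leaf to a string while leaving
    the dict/list structure intact, so the JSON payload is unambiguous."""
    if isinstance(value, dict):
        return {k: stringify(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify(v) for v in value]
    if isinstance(value, bool):  # bool must be checked before int
        return str(value).lower()
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, datetime):
        return value.isoformat()
    return value  # already a string (or None)

payload = stringify({"emoji_count": 2, "score": Decimal("0.9"), "flags": [True]})
# -> {"emoji_count": "2", "score": "0.9", "flags": ["true"]}
```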

  4. Guarantee a "Golden" Example for Signature Inference:

    • When logging the model, do not rely on a random sample input. Instead, programmatically construct a "golden" output example that is guaranteed to contain every possible field and nested structure, including non-empty lists for all array fields.

    • Use this golden example with mlflow.infer_signature to ensure the registered model has a complete and accurate schema.
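A sketch of such a golden example plus a completeness check (field names are illustrative, not the real 59-field schema); the validated example is what would be passed to mlflow.models.infer_signature at logging time:

```python
# Hypothetical golden output: every field present, every list non-empty,
# every primitive already stringified.
golden_output = [{
    "emoji_count": "2",
    "emojis": [{"emoji": "👍", "position": "0", "category": "gesture"}],
    "features": [{
        "feature_name": "gratitude",
        "positive_matches": [{"token": "thanks", "weight": "0.8"}],
        "negative_matches": [{"token": "late", "weight": "0.2"}],
    }],
    "sentiment_analysis": [{"sentiment": "positive", "score": "0.9"}],
}]

def assert_fully_populated(value, path="$"):
    """Fail fast if any list is empty or any leaf is not a string —
    either would make signature inference produce an Any-typed field."""
    if isinstance(value, dict):
        for k, v in value.items():
            assert_fully_populated(v, f"{path}.{k}")
    elif isinstance(value, list):
        assert value, f"empty list at {path} would be inferred as Any"
        for i, v in enumerate(value):
            assert_fully_populated(v, f"{path}[{i}]")
    else:
        assert isinstance(value, str), f"non-string leaf at {path}"

assert_fully_populated(golden_output)

# At logging time (requires mlflow; shown for context only):
# signature = mlflow.models.infer_signature(sample_input, golden_output)
```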

  5. Use a Two-Step DLT Pipeline:

    • Step 1 (Intermediate Table): Create a streaming live table that calls ai_query and lands the raw, string-heavy results. This isolates the model call.

    • Step 2 (Final Silver Table): Create the final live table by reading from the intermediate table. In this step, perform all CAST operations to convert the string fields to their proper SQL data types (DECIMAL, INT, TIMESTAMP, etc.) and use to_json() on any struct fields that need to be stored as strings.


The Successful Solution: Why It Worked

The Golden Output Construction Method

The breakthrough came from constructing a 100% complete, programmatically generated "golden" output example for MLflow signature inference. This approach eliminated all ambiguity in the schema.

What Made It Work:

  1. Complete Field Coverage: Every single field from the target silver schema (59 fields total) was explicitly included in the canonical output, including all optional fields.

  2. Populated Nested Structures: Instead of relying on empty lists or dynamic objects, every nested structure was populated with complete sample objects:

    • emojis array: Contains complete emoji match objects with all required fields

    • features array: Contains complete feature objects with populated positive_matches and negative_matches arrays

    • sentiment_analysis: Contains complete sentiment result objects

  3. Stable Schema Design: All dynamic dictionary structures were converted to arrays of objects with fixed keys:

    • features changed from {"gratitude": {...}} to [{"feature_name": "gratitude", ...}]

    • sentiment_result changed from {"positive": 0.9} to {"sentiment": "positive", "score": "0.9"}

  4. Stringified Primitives: All numeric types, booleans, and timestamps were converted to strings, ensuring JSON serialization compatibility.

The Code That Made It Work:
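A hedged reconstruction of the shape of that code — the analysis logic and field names below are stand-ins, not the actual emoji_analysis model:

```python
def analyze(message: str) -> dict:
    # Stand-in for the real emoji analysis; returns native Python types
    # and dynamic-keyed dicts, i.e. the "raw" pre-canonical form.
    return {
        "emoji_count": 2,
        "features": {"gratitude": {"score": 0.8}},
        "sentiment_result": {"positive": 0.9},
    }

def to_canonical(raw: dict) -> dict:
    # Apply both fixes: fixed-key arrays and stringified primitives.
    sentiment, score = next(iter(raw["sentiment_result"].items()))
    return {
        "emoji_count": str(raw["emoji_count"]),
        "features": [
            {"feature_name": name, "score": str(attrs["score"])}
            for name, attrs in sorted(raw["features"].items())
        ],
        "sentiment_result": {"sentiment": sentiment, "score": str(score)},
    }

def predict(model_input):
    # One result object per input row, returned as a LIST so the serving
    # layer's {"predictions": [...]} wrapper is applied exactly once.
    return [to_canonical(analyze(msg)) for msg in model_input]

predict(["thanks 🙏👍"])
```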

The Two-Step DLT Pipeline

The second critical component was the two-step DLT pipeline design:

Step 1: Intermediate Table (emoji_features_intermediate)
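A hedged sketch of what Step 1 might look like in DLT SQL — the endpoint name, source table, and column names are assumptions, not the actual pipeline code:

```sql
-- Land the raw, string-heavy ai_query result without any casting.
CREATE OR REFRESH STREAMING LIVE TABLE emoji_features_intermediate
AS SELECT
  message_id,
  message_text,
  ai_query(
    'emoji-analysis-endpoint',   -- serving endpoint name (assumed)
    message_text
  ) AS ai_result
FROM STREAM(LIVE.bronze_messages);
```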

Step 2: Final Silver Table (emoji_features)
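A corresponding sketch of Step 2 (column names and types are illustrative): all CAST operations happen here, not in the model:

```sql
CREATE OR REFRESH STREAMING LIVE TABLE emoji_features
AS SELECT
  message_id,
  CAST(ai_result.emoji_count AS INT)              AS emoji_count,
  CAST(ai_result.sentiment_score AS DECIMAL(5,4)) AS sentiment_score,
  CAST(ai_result.processed_at AS TIMESTAMP)       AS processed_at,
  to_json(ai_result.emojis)                       AS emojis_json
FROM STREAM(LIVE.emoji_features_intermediate);
```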

Why This Solution Succeeded

  1. Eliminated Schema Ambiguity: The golden output ensured every field had a concrete, unambiguous type in the MLflow signature.

  2. Robust Data Type Handling: Stringifying all primitives in the model output, then casting to proper types in the DLT pipeline, eliminated data type parsing errors.

  3. Clean Separation of Concerns: The intermediate table isolates the model call, while the final table handles all data type conversions and schema compliance.

  4. Future-Proof Design: Any changes to the model's output structure only require updating the golden example, not the entire pipeline.

This approach transformed a complex, error-prone Python UDF pipeline into a clean, maintainable SQL-based solution that leverages Databricks' native ai_query function for optimal performance and reliability.


Production DLT Streaming Pipeline Learnings

After successfully deploying the model for ai_query compatibility, we discovered additional critical insights when implementing the production DLT streaming pipeline.

Key Discovery: Schema Conflicts vs Performance Issues

Initial Misdiagnosis: When DLT streaming tables with ai_query showed zero rows scanned and written, we suspected a fundamental incompatibility between streaming and AI functions.

Actual Root Cause: The streaming pipeline was actually working (making AI calls, processing data) but failing on schema conflicts when trying to write results to existing tables.

Evidence from the Error Logs:

The failures surfaced as schema-mismatch errors at write time against the existing target tables, not as errors from the ai_query calls themselves. This revealed that:

  1. AI calls were succeeding - The pipeline was processing ai_query calls

  2. Data was being read - Bronze streaming was working

  3. Results were being generated - The AI model was returning data

  4. Schema conflicts prevented writes - Existing tables had incompatible schemas

Production Streaming Pipeline Best Practices

1. Use CREATE OR REPLACE for Schema Cleanup
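A hypothetical example of the cleanup step (catalog, schema, table, and column names assumed; this applies where the conflicting table is not owned by a running pipeline):

```sql
-- Replace the conflicting table so the next pipeline update can write
-- with the new schema instead of failing on a schema mismatch.
CREATE OR REPLACE TABLE main.default.emoji_features_raw (
  message_id STRING,
  ai_result  STRING
);
```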

2. Two-Stage Streaming Architecture

  • Stage 1: Fast AI processing → emoji_features_raw (immediate visibility)

  • Stage 2: Field extraction → emoji_features (final clean table)

This separation provides:

  • Immediate progress visibility - See AI results in Stage 1 table

  • Error isolation - Distinguish AI issues from field extraction issues

  • Performance optimization - AI calls separated from complex CAST operations

3. Optimized Pipeline Configuration
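A sketch of the kind of pipeline-settings JSON involved (pipeline name and values illustrative; pipelines.trigger.interval is a documented DLT configuration key, while batch size is bounded via the source's maxBytesPerTrigger read option):

```json
{
  "name": "emoji-features-pipeline",
  "continuous": true,
  "configuration": {
    "pipelines.trigger.interval": "30 seconds"
  }
}
```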

4. File-Based Dependency Management

DLT processes source files in alphabetical order, so numeric prefixes on file names (e.g., 01_bronze, 02_emoji_features_raw, 03_emoji_features) guarantee the intended execution order.

What Actually Works in Production

  1. ai_query + Streaming is Compatible - The combination works when schema conflicts are resolved

  2. Performance is Good - 30-50 AI requests/second with low CPU usage on model endpoint

  3. Schema Management is Critical - Use CREATE OR REPLACE to avoid conflicts

  4. Two-stage architecture provides visibility - Essential for debugging and monitoring

  5. Pipeline-level optimizations matter - Proper batching and concurrency settings are crucial

Common Troubleshooting

| Symptom | Likely Cause | Solution |
| --- | --- | --- |
| Zero rows scanned | Schema conflict | Use CREATE OR REPLACE |
| AI calls but no writes | Field extraction bottleneck | Use two-stage architecture |
| Pipeline hangs on "Running" | Streaming config too aggressive | Increase trigger intervals |
| Memory errors | Batch size too large | Reduce maxBytesPerTrigger |

This production deployment experience shows that ai_query streaming pipelines are viable and performant when properly configured, contrary to initial research suggesting incompatibility.
