Learnings from Deploying a Custom Python Model for ai_query
This document captures the key findings and best practices discovered while debugging and deploying the emoji_analysis model for use with the Databricks ai_query function. The process revealed several critical subtleties in how ai_query interacts with MLflow model serving endpoints.
Key Takeaway
The ai_query function's automatic schema inference is extremely strict. To ensure compatibility, the MLflow model must return a JSON response that is not only well-formed but also has a 100% stable and predictable schema. Any ambiguity in the output structure or data types will cause the AI_FUNCTION_MODEL_SCHEMA_PARSE_ERROR.
What Didn't Work and Why
Returning a Dictionary from `predict`:
- Attempt: The model's `predict` method returned a dictionary like `{"predictions": [...]}`.
- Problem: This caused a "double-wrapping" issue. The MLflow serving layer adds its own `{"predictions": [...]}` wrapper, resulting in a final response like `{"predictions": {"predictions": [...]}}`, which `ai_query` cannot parse.
Inconsistent Data Types (The Serialization Problem):
- Attempt: The model returned native Python numeric types (`float`, `int`, `Decimal`) and nested JSON objects as strings.
- Problem: `ai_query`'s parser failed on the numeric types (`REMOTE_FUNCTION_HTTP_RESULT_PARSE_ERROR`) and was confused by fields that were sometimes strings and sometimes objects. The parser requires absolute consistency.
Dynamic Schema Structures:
- Attempt: Parts of the model's output had a dynamic structure. For example, a `features` dictionary where the keys were the names of detected features, or a `sentiment` dictionary with a key like `"positive"` or `"negative"`.
- Problem: MLflow's signature inference engine correctly identified these structures as having an unpredictable schema, marking them as `type: object, format: any`. The `ai_query` schema parser cannot handle the `any` type and fails immediately.
Relying on "Lucky" Signature Inference:
- Attempt: Using a simple, arbitrary input example to infer the model's output signature.
- Problem: If the sample input didn't trigger all possible output structures (e.g., it produced an empty `features` list), the inferred signature would be incomplete. Specifically, the schema for an empty list is inferred as `Any`, which guarantees a downstream `ai_query` failure.
Inconsistent Schema in Edge Cases:
- Attempt: The model had different logic for handling messages with zero emojis versus those with one or more.
- Problem: An early `return` statement in the zero-emoji case had a bug where it stringified a nested JSON field, while the main code path did not. This created an inconsistent schema depending on the input data, causing parsing failures.
What Works: Best Practices for ai_query Compatibility
The `predict` Method MUST Return a List:
- The Python function should return a `list` of result objects (e.g., `[result_dict]`). MLflow will correctly wrap this in the final `{"predictions": [...]}` JSON response.
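As a minimal sketch of this shape (class and field names are hypothetical placeholders; the real model would subclass `mlflow.pyfunc.PythonModel`):

```python
# Sketch of a pyfunc-style model whose predict returns a *list*, so the
# MLflow serving layer can wrap it once as {"predictions": [...]}.
# Field names (message, emoji_count) are illustrative placeholders.

class EmojiAnalysisModel:
    def predict(self, context, model_input):
        results = []
        for message in model_input:      # one result object per input row
            result_dict = {
                "message": str(message),
                "emoji_count": "0",      # primitives kept as strings
            }
            results.append(result_dict)
        return results                   # a list, NOT {"predictions": [...]}


model = EmojiAnalysisModel()
print(model.predict(None, ["hello"]))
# [{'message': 'hello', 'emoji_count': '0'}]
```

Returning the bare list leaves exactly one layer of wrapping to the serving layer, avoiding the double-wrapped response described above.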
Ensure a 100% Stable Output Schema:
- No Dynamic Keys: Convert any dictionaries with dynamic keys into arrays of objects with fixed keys.
  - Incorrect: `{"positive": 0.9}`
  - Correct: `{"sentiment": "positive", "score": 0.9}`
- No Generic Objects: Ensure every object and nested object in the output has a consistent, predictable set of keys.
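A small helper can mechanize the dynamic-key conversion; the function name and the `key_field` default below are hypothetical:

```python
def dynamic_dict_to_array(d, key_field="feature_name"):
    """Convert a dict with data-dependent keys into a list of fixed-key
    objects, so the inferred schema stays stable across inputs."""
    return [
        {key_field: k, **v} if isinstance(v, dict) else {key_field: k, "value": v}
        for k, v in sorted(d.items())
    ]


features = {"gratitude": {"score": "0.8"}, "urgency": {"score": "0.2"}}
print(dynamic_dict_to_array(features))
# [{'feature_name': 'gratitude', 'score': '0.8'},
#  {'feature_name': 'urgency', 'score': '0.2'}]
```

After this transform, every element has the same key set regardless of which features were detected, so signature inference no longer falls back to `type: object, format: any`.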
Stringify All Primitive Data Types:
- The most robust way to avoid data type parsing errors is to recursively traverse the entire output object and convert all primitive values (`int`, `float`, `Decimal`, `bool`, `datetime`) to their string representations.
- This creates a completely unambiguous JSON payload that `ai_query` can reliably parse. The final casting to correct SQL types (`CAST(... AS INT)`) should be done in the DLT pipeline itself.
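A recursive traversal along these lines (function name hypothetical) handles the stringification:

```python
from datetime import datetime, date
from decimal import Decimal


def stringify_primitives(value):
    """Recursively convert every primitive leaf (int, float, Decimal,
    bool, datetime) to a string so the JSON payload is unambiguous."""
    if isinstance(value, dict):
        return {k: stringify_primitives(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_primitives(v) for v in value]
    if isinstance(value, bool):   # must come before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    return value                  # strings and None pass through unchanged


print(stringify_primitives({"count": 3, "score": 0.9, "flags": [True, None]}))
# {'count': '3', 'score': '0.9', 'flags': ['true', None]}
```

Note the `bool` check before the numeric check: since `bool` subclasses `int` in Python, reversing the order would render `True` as `"1"` instead of `"true"`.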
Guarantee a "Golden" Example for Signature Inference:
- When logging the model, do not rely on a random sample input. Instead, programmatically construct a "golden" output example that is guaranteed to contain every possible field and nested structure, including non-empty lists for all array fields.
- Use this golden example with `mlflow.models.infer_signature` to ensure the registered model has a complete and accurate schema.
Use a Two-Step DLT Pipeline:
- Step 1 (Intermediate Table): Create a streaming live table that calls `ai_query` and lands the raw, string-heavy results. This isolates the model call.
- Step 2 (Final Silver Table): Create the final live table by reading from the intermediate table. In this step, perform all `CAST` operations to convert the string fields to their proper SQL data types (`DECIMAL`, `INT`, `TIMESTAMP`, etc.) and use `to_json()` on any struct fields that need to be stored as strings.
The Successful Solution: Why It Worked
The Golden Output Construction Method
The breakthrough came from constructing a 100% complete, programmatically generated "golden" output example for MLflow signature inference. This approach eliminated all ambiguity in the schema.
What Made It Work:
Complete Field Coverage: Every single field from the target silver schema (59 fields total) was explicitly included in the canonical output, including all optional fields.
Populated Nested Structures: Instead of relying on empty lists or dynamic objects, every nested structure was populated with complete sample objects:
- `emojis` array: Contains complete emoji match objects with all required fields
- `features` array: Contains complete feature objects with populated `positive_matches` and `negative_matches` arrays
- `sentiment_analysis`: Contains complete sentiment result objects
Stable Schema Design: All dynamic dictionary structures were converted to arrays of objects with fixed keys:
- `features` changed from `{"gratitude": {...}}` to `[{"feature_name": "gratitude", ...}]`
- `sentiment_result` changed from `{"positive": 0.9}` to `{"sentiment": "positive", "score": "0.9"}`
Stringified Primitives: All numeric types, booleans, and timestamps were converted to strings, ensuring JSON serialization compatibility.
The Code That Made It Work:
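A condensed sketch of the golden-output construction (field names are illustrative placeholders, not the full 59-field silver schema; the `infer_signature` call at log time is shown as a comment):

```python
# Build a "golden" output: every array field is non-empty, every nested
# object carries its full fixed key set, and all primitives are already
# stringified. Field names below are illustrative placeholders.

def build_golden_output():
    return {
        "message_id": "golden-001",
        "emoji_count": "1",                    # stringified int
        "emojis": [{                           # non-empty array, complete object
            "emoji": "X",
            "position": "0",
        }],
        "features": [{                         # fixed keys instead of dynamic ones
            "feature_name": "gratitude",
            "positive_matches": ["thanks"],
            "negative_matches": ["nope"],
        }],
        "sentiment_analysis": [{
            "sentiment": "positive",
            "score": "0.9",
        }],
        "processed_at": "2024-01-01T00:00:00", # stringified timestamp
    }


golden = build_golden_output()
# At logging time (requires mlflow):
# signature = mlflow.models.infer_signature(sample_input, [golden])
# mlflow.pyfunc.log_model(..., signature=signature)
assert all(len(v) > 0 for v in golden.values() if isinstance(v, list))
```

Because every list is populated and every leaf is a string, the inferred signature contains no `Any` types for `ai_query` to choke on.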
The Two-Step DLT Pipeline
The second critical component was the two-step DLT pipeline design:
Step 1: Intermediate Table (emoji_features_intermediate)
Step 2: Final Silver Table (emoji_features)
Why This Solution Succeeded
Eliminated Schema Ambiguity: The golden output ensured every field had a concrete, unambiguous type in the MLflow signature.
Robust Data Type Handling: Stringifying all primitives in the model output, then casting to proper types in the DLT pipeline, eliminated data type parsing errors.
Clean Separation of Concerns: The intermediate table isolates the model call, while the final table handles all data type conversions and schema compliance.
Future-Proof Design: Any changes to the model's output structure only require updating the golden example, not the entire pipeline.
This approach transformed a complex, error-prone Python UDF pipeline into a clean, maintainable SQL-based solution that leverages Databricks' native ai_query function for optimal performance and reliability.
Production DLT Streaming Pipeline Learnings
After successfully deploying the model for ai_query compatibility, we discovered additional critical insights when implementing the production DLT streaming pipeline.
Key Discovery: Schema Conflicts vs Performance Issues
Initial Misdiagnosis: When DLT streaming tables with ai_query showed zero rows scanned and written, we initially suspected fundamental incompatibility between streaming and AI functions.
Actual Root Cause: The streaming pipeline was actually working (making AI calls, processing data) but failing on schema conflicts when trying to write results to existing tables.
Evidence from Error Logs:
The schema-mismatch error raised on write revealed that:
AI calls were succeeding - The pipeline was processing `ai_query` calls
Data was being read - Bronze streaming was working
Results were being generated - The AI model was returning data
Schema conflicts prevented writes - Existing tables had incompatible schemas
Production Streaming Pipeline Best Practices
1. Use CREATE OR REPLACE for Schema Cleanup
2. Two-Stage Streaming Architecture
- Stage 1: Fast AI processing → `emoji_features_raw` (immediate visibility)
- Stage 2: Field extraction → `emoji_features` (final clean table)
This separation provides:
Immediate progress visibility - See AI results in Stage 1 table
Error isolation - Distinguish AI issues from field extraction issues
Performance optimization - AI calls separated from complex CAST operations
3. Optimized Pipeline Configuration
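The exact settings depend on the source and workload; as an illustrative sketch, the knobs we tuned were the trigger interval and the per-trigger batch cap (keys and values below are placeholders, not recommendations):

```python
# Illustrative pipeline-level settings of the kind we tuned. Treat the
# keys and values as placeholders to adapt to your own source/workload.
pipeline_conf = {
    "pipelines.trigger.interval": "30 seconds",   # pace the micro-batches
    "cloudFiles.maxBytesPerTrigger": "256m",      # cap batch size to avoid memory errors
}
print(pipeline_conf)
```

Raising the trigger interval and lowering the batch cap trades a little latency for stability, which matters when each micro-batch fans out into many `ai_query` calls.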
4. File-Based Dependency Management
DLT processes files alphabetically, ensuring proper execution order.
What Actually Works in Production
ai_query + Streaming is Compatible - The combination works when schema conflicts are resolved
Performance is Good - 30-50 AI requests/second with low CPU usage on model endpoint
Schema Management is Critical - Use `CREATE OR REPLACE` to avoid conflicts
Two-stage architecture provides visibility - Essential for debugging and monitoring
Pipeline-level optimizations matter - Proper batching and concurrency settings are crucial
Common Troubleshooting

| Symptom | Cause | Solution |
| --- | --- | --- |
| Zero rows scanned | Schema conflict | Use `CREATE OR REPLACE` |
| AI calls but no writes | Field extraction bottleneck | Use two-stage architecture |
| Pipeline hangs on "Running" | Streaming config too aggressive | Increase trigger intervals |
| Memory errors | Batch size too large | Reduce `maxBytesPerTrigger` |
This production deployment experience shows that ai_query streaming pipelines are viable and performant when properly configured, contrary to initial research suggesting incompatibility.