Data Flow Architecture
Overview
Pipeline Orchestration
Pipeline Execution Flow
Stage 1 (Parallel): Data Ingestion
├── bronze_data_ingestion (DLT) → Bronze layer tables
└── neon_db_replication (Job) → Bronze reference tables
↓
Stage 2 (Parallel): Feature Extraction
├── emoji_analysis (DLT) → Silver layer features
├── feature_analysis (DLT) → Silver layer features
├── sentiment_analysis (DLT) → Silver layer features
└── linguistic_analysis (DLT) → Silver layer features
↓
Stage 3: Aggregation
└── psychosocial_analysis (DLT) → Gold layer aggregations
↓
Stage 4: Reporting
└── risk_analysis_report (DLT) → Report layer outputs
Medallion Architecture
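The execution flow above maps each stage onto a medallion layer, and the stage ordering can be derived from pipeline dependencies alone. The following is a minimal sketch of that idea, not the project's actual orchestration code. The pipeline names come from the diagram. The dependency edges are an assumption read off the arrows (every pipeline in a stage is taken to depend on every pipeline in the previous stage), and `execution_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Pipeline names taken from the execution flow diagram.
INGESTION = ["bronze_data_ingestion", "neon_db_replication"]
FEATURES = [
    "emoji_analysis",
    "feature_analysis",
    "sentiment_analysis",
    "linguistic_analysis",
]

# ASSUMPTION: each pipeline depends on all pipelines of the previous stage.
# The real dependencies may be narrower than this.
DEPENDENCIES = {
    **{name: set() for name in INGESTION},
    **{name: set(INGESTION) for name in FEATURES},
    "psychosocial_analysis": set(FEATURES),
    "risk_analysis_report": {"psychosocial_analysis"},
}


def execution_waves(dependencies):
    """Group pipelines into waves; pipelines in one wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(f"Stage {number}: {', '.join(wave)}")
```

Under these assumed edges the helper yields four waves that match Stages 1 through 4 in the diagram, with the two ingestion pipelines and the four feature pipelines each grouped into a single parallel wave.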
Bronze Layer: Raw Ingestion
Purpose
Data Sources
Bronze Tables
Bronze Layer Characteristics
Data Quality
Silver Layer: Cleaned, Validated & Enriched with Model Predictions
Purpose
Transformation Pipeline
Silver Tables
Data Quality Expectations
Deduplication Strategy
Gold Layer: Business-Ready Aggregations
Purpose
Gold Tables
ML Feature Engineering
Note on AI Query Usage
Data Lineage
Pipeline Dependencies
Table Dependencies
Tracking Lineage
Schema Evolution
Handling Schema Changes
Schema Migration Strategy
Data Quality Gates
Quality Framework
Expectation Patterns
Quarantine Tables
Performance Considerations
Partitioning Strategy
Streaming Optimizations
Query Performance
Data Retention
Retention Policies
Implementing Retention
Monitoring Data Flow
Key Metrics
Alerting
Related Documentation