180 models ranked for data engineering. Scored with bonuses for JSON mode (structured schemas), reasoning (query optimization), function calling (pipeline orchestration), large context windows, and large output limits.
| # | Model | Provider | Score |
|---|---|---|---|
| 1 | Claude Opus 4.7 | Anthropic | 95 |
| 2 | GPT-5.5 | OpenAI | 93 |
| 3 | Gemini 3.1 Pro Preview Custom Tools | Google | 92 |
| 4 | Gemini 3.1 Pro Preview | Google | 92 |
| 5 | GPT-5.4 Pro | OpenAI | 92 |
| 6 | GPT-5.4 | OpenAI | 92 |
| 7 | GPT-5.5 Pro | OpenAI | 91 |
| 8 | GPT-5.2 Pro | OpenAI | 91 |
| 9 | Claude Opus 4.6 (Fast) | Anthropic | 90 |
| 10 | Claude Opus 4.6 | Anthropic | 90 |
| 11 | GPT-5.2-Codex | OpenAI | 90 |
| 12 | GPT-5.2 | OpenAI | 90 |
| 13 | GPT-5.3-Codex | OpenAI | 89 |
| 14 | GPT-5 Pro | OpenAI | 89 |
| 15 | Gemini 3 Flash Preview | Google | 88 |
| 16 | GPT-5.1-Codex-Max | OpenAI | 88 |
| 17 | GPT-5 Codex | OpenAI | 88 |
| 18 | GPT-5 | OpenAI | 88 |
| 19 | GPT-5.1 | OpenAI | 87 |
| 20 | GPT-5.1-Codex | OpenAI | 87 |
| 21 | GPT-5.1-Codex-Mini | OpenAI | 87 |
| 22 | DeepSeek V4 Pro | DeepSeek | 87 |
| 23 | o3 Deep Research | OpenAI | 87 |
| 24 | o3 Pro | OpenAI | 87 |
| 25 | o3 | OpenAI | 87 |
| 26 | Grok 4.20 | xAI | 89 |
| 27 | Claude Sonnet 4.6 | Anthropic | 85 |
| 28 | Claude Opus 4.5 | Anthropic | 85 |
| 29 | Grok 4 | xAI | 88 |
| 30 | Gemini 2.5 Pro | Google | 84 |
Generate complex SQL queries, dbt models, and data transformations. JSON mode ensures structured output for automated pipeline integration.
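The JSON-mode workflow above can be sketched in plain Python: the model returns a structured transformation spec, and the pipeline renders it into SQL. The spec fields (`source`, `columns`, `filters`) are an illustrative assumption, not a fixed schema.

```python
import json

# Hypothetical JSON-mode output from a model asked to describe a
# transformation; the field names are assumptions for illustration.
model_output = json.dumps({
    "source": "raw.orders",
    "columns": ["order_id", "customer_id", "amount"],
    "filters": ["amount > 0", "status = 'complete'"],
})

def render_sql(spec_json: str) -> str:
    """Render a SELECT statement from a structured transformation spec."""
    spec = json.loads(spec_json)  # JSON mode guarantees this parses
    cols = ",\n    ".join(spec["columns"])
    where = "\n  AND ".join(spec["filters"])
    return f"SELECT\n    {cols}\nFROM {spec['source']}\nWHERE {where}"

print(render_sql(model_output))
```

Because the model's output is machine-parseable, the rendered SQL can flow straight into a dbt model or a scheduled job without regex scraping.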
Design data warehouse schemas, create migration scripts, and manage evolving data models. Reasoning models optimize for query performance and normalization.
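A minimal sketch of the migration-script idea: diff a current schema against a target schema and emit `ALTER TABLE` statements. Column types and the diff rules here are simplified assumptions; real migrations also need type changes and backfills.

```python
def diff_schema(current: dict, target: dict, table: str) -> list[str]:
    """Emit ALTER TABLE statements migrating `current` toward `target`.

    Schemas are column-name -> SQL-type dicts; only additions and
    drops are handled in this sketch.
    """
    stmts = []
    for col, typ in target.items():
        if col not in current:
            stmts.append(f"ALTER TABLE {table} ADD COLUMN {col} {typ};")
    for col in current:
        if col not in target:
            stmts.append(f"ALTER TABLE {table} DROP COLUMN {col};")
    return stmts

print(diff_schema({"id": "INT"}, {"id": "INT", "email": "TEXT"}, "users"))
```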
Generate Airflow DAGs, Prefect flows, and Dagster assets. Function calling enables integration with orchestration APIs and metadata catalogs.
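Function calling against an orchestration API might look like the sketch below: a tool definition the model can invoke, plus a dispatcher that routes the call to a (stubbed) client. The tool name and parameters are assumptions for illustration; real schemas follow each provider's function-calling format.

```python
# Hypothetical tool definition exposed to the model.
TRIGGER_DAG_TOOL = {
    "name": "trigger_dag",
    "description": "Trigger a pipeline run by DAG id.",
    "parameters": {
        "type": "object",
        "properties": {
            "dag_id": {"type": "string"},
            "conf": {"type": "object"},
        },
        "required": ["dag_id"],
    },
}

def dispatch(call: dict) -> str:
    """Route a model-emitted tool call to a stubbed orchestrator client."""
    if call["name"] == "trigger_dag":
        args = call["arguments"]
        return f"queued run of {args['dag_id']}"
    raise ValueError(f"unknown tool: {call['name']}")

print(dispatch({"name": "trigger_dag",
                "arguments": {"dag_id": "daily_sales"}}))
```

In a real deployment the stub would call the Airflow REST API or a Prefect/Dagster client, but the shape of the loop — schema, model call, dispatch — stays the same.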
Create data quality checks, Great Expectations suites, and validation rules. Large context windows handle full schema documentation for comprehensive testing.
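As a sketch of generated validation rules: structured per-column rules rendered into a dbt `schema.yml` snippet. The rule dict is an assumed intermediate format; the generic tests themselves (`not_null`, `unique`) are standard dbt tests.

```python
# Assumed intermediate format: column -> list of dbt generic tests.
rules = {
    "order_id": ["not_null", "unique"],
    "status": ["not_null"],
}

def to_dbt_schema(model: str, rules: dict) -> str:
    """Render column rules as a dbt schema.yml snippet."""
    lines = ["version: 2", "models:", f"  - name: {model}", "    columns:"]
    for col, tests in rules.items():
        lines.append(f"      - name: {col}")
        lines.append("        tests:")
        lines += [f"          - {t}" for t in tests]
    return "\n".join(lines)

print(to_dbt_schema("orders", rules))
```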
Models generate ETL/ELT code for Apache Spark, dbt, Airflow, and Prefect. Reasoning handles complex transformation logic and data quality rules. Function calling integrates with data catalog APIs. Large context windows process entire pipeline DAGs for optimization suggestions.
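A pipeline DAG small enough to fit in context can be modeled as task-to-upstream-dependency mappings; the sketch below computes a valid run order with the standard library, the kind of structure a model reasons over when suggesting reordering or parallelization. The task names are illustrative.

```python
from graphlib import TopologicalSorter

# Illustrative DAG: task -> set of upstream dependencies.
dag = {
    "load_orders": set(),
    "load_customers": set(),
    "join_sales": {"load_orders", "load_customers"},
    "daily_report": {"join_sales"},
}

# static_order() yields tasks so every dependency runs first.
order = list(TopologicalSorter(dag).static_order())
print(order)
```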
Top models generate PySpark and Spark SQL code, optimize join strategies, suggest partitioning schemes, and debug serialization errors. Reasoning is critical for understanding distributed computing patterns and avoiding common pitfalls like data skew.
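Key salting, one common skew mitigation, can be shown without a Spark cluster: a hot join key is split across N salt buckets so the work spreads over partitions. In Spark you would salt the skewed side and explode the other; this plain-Python sketch shows only the redistribution idea.

```python
import random

random.seed(0)  # seeded for reproducibility in this sketch
N_SALTS = 4

def salted_key(key: str) -> str:
    """Append a random salt suffix so one hot key maps to several buckets."""
    return f"{key}#{random.randrange(N_SALTS)}"

rows = ["hot_customer"] * 1000      # one key dominates the join
buckets: dict[str, int] = {}
for k in map(salted_key, rows):
    buckets[k] = buckets.get(k, 0) + 1

# Work now spreads across up to N_SALTS buckets instead of one partition.
print(len(buckets))
```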
JSON mode outputs structured data quality rules compatible with Great Expectations and dbt tests. Reasoning identifies edge cases and data anomalies. Function calling enables programmatic data profiling. Large context handles complex schemas with hundreds of columns.
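Programmatic profiling with structured rules can be sketched as follows: rules arrive as JSON-shaped dicts (the `column`/`check` shape is an assumption) and are applied to rows, collecting violations for a report or a test suite.

```python
# Assumed rule shape emitted by a JSON-mode model.
rules = [
    {"column": "amount", "check": "not_null"},
    {"column": "amount", "check": "non_negative"},
]

def profile(rows: list[dict], rules: list[dict]) -> list[str]:
    """Apply quality rules to rows and return human-readable violations."""
    failures = []
    for i, row in enumerate(rows):
        for r in rules:
            v = row.get(r["column"])
            if r["check"] == "not_null" and v is None:
                failures.append(f"row {i}: {r['column']} is null")
            elif r["check"] == "non_negative" and v is not None and v < 0:
                failures.append(f"row {i}: {r['column']} < 0")
    return failures

print(profile([{"amount": 10}, {"amount": None}, {"amount": -3}], rules))
```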
Models design dimensional models (star/snowflake schemas), implement slowly changing dimensions, and generate dbt models with proper materialization strategies. They understand trade-offs between Snowflake, Databricks, and BigQuery architectures.
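The Type 2 slowly-changing-dimension logic mentioned above reduces to: when a tracked attribute changes, close the current row and insert a new version. A minimal in-memory sketch, assuming the common `valid_from`/`valid_to`/`is_current` column convention:

```python
from datetime import date

def scd2_apply(dim: list[dict], key: str, incoming: dict, today: date) -> None:
    """Apply one incoming record to an SCD Type 2 dimension in place."""
    for row in dim:
        if row[key] == incoming[key] and row["is_current"]:
            if row["attrs"] == incoming["attrs"]:
                return  # no change, keep the current version
            row["is_current"] = False   # close out the old version
            row["valid_to"] = today
            break
    dim.append({key: incoming[key], "attrs": incoming["attrs"],
                "valid_from": today, "valid_to": None, "is_current": True})

dim: list[dict] = []
scd2_apply(dim, "customer_id",
           {"customer_id": 1, "attrs": {"tier": "gold"}}, date(2024, 1, 1))
scd2_apply(dim, "customer_id",
           {"customer_id": 1, "attrs": {"tier": "platinum"}}, date(2024, 6, 1))
print(len(dim), dim[0]["is_current"], dim[1]["is_current"])
```

In a warehouse this same logic is typically expressed as a `MERGE` statement or a dbt snapshot rather than row-by-row Python.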