Two organizations build Arabic-first models: HUMAIN (ALLaM family) and TII (Falcon Arabic family). This page ranks both Arabic-first families on their own axis, then lists the top multilingual frontier models (Claude, GPT, Gemini, Qwen, Mistral) whose multilingual training mix gives them strong Arabic support.
Models purpose-built for Arabic from two Gulf institutions: Saudi Arabia's HUMAIN (ALLaM) and Abu Dhabi's TII (Falcon Arabic). Ranked by access channel, documented Arabic benchmark results, and capacity.
HUMAIN's largest ALLaM model and the consumer flagship. The highest-capacity variant in the family, served exclusively through HUMAIN Chat.
The mid-size 13B tier, available as a managed IBM watsonx foundation model at roughly $1.80 per million tokens. The cleanest commercial path for enterprise Arabic workloads.
Second-generation 7B Arabic-first model deployed as a managed endpoint in Microsoft Azure AI Foundry. The right fit if your stack is already on Azure.
The only open-weight ALLaM variant, trained from scratch on 4T English + 1.2T Arabic/English tokens per the HF model card. AraLingBench 74.0. Ideal for research and local fine-tuning.
TII's largest Arabic model. Hybrid Mamba-Transformer (H1) architecture with 256K context. Scores roughly 75% on the Open Arabic LLM Leaderboard (OALL), outperforming Llama 3.3 70B on Arabic tasks despite being smaller.
The 7B tier of the Falcon-H1-Arabic line. OALL score of 71.7%, surpassing Fanar-9B and Qwen3-8B on Arabic benchmarks. 256K context window and hybrid architecture make it efficient for production Arabic workloads.
The smallest Falcon-H1-Arabic variant at 3B parameters. OALL score of roughly 62%, outperforming Gemma-4B and Phi-4-mini by roughly 10 points on Arabic tasks. 128K context. Ideal for edge deployment and high-throughput Arabic agents.
The original Falcon Arabic model, built on the Falcon 3 transformer architecture. Extended tokenizer with 32,000 Arabic-specific tokens and trained with DPO alignment. Supports MSA, Gulf, Egyptian, and Levantine dialects.
Top frontier models whose multilingual training mix supports Arabic. Ranked by composite score. Capped at 2 models per provider family.
Two Gulf institutions, one Saudi and one Emirati, lead purpose-built Arabic language models.
HUMAIN is Saudi Arabia’s national AI champion, a Public Investment Fund company launched on 12 May 2025. The ALLaM program is grounded in peer-reviewed research (arXiv:2407.15390, ICLR 2025) and ships four publicly documented variants. The open 7B Instruct preview was trained from scratch in two stages (4T English tokens then 1.2T mixed Arabic/English tokens per the Hugging Face model card), with the full ALLaM program consuming roughly 5M A100 GPU-hours. The 7B preview scores 74.0 on AraLingBench.
The Technology Innovation Institute (TII), based in Abu Dhabi, builds the Falcon model family. The Falcon-H1-Arabic line uses a hybrid Mamba-Transformer architecture that combines state-space models with traditional attention for efficient long-context Arabic processing (up to 256K tokens). The 34B variant scores roughly 75% on the Open Arabic LLM Leaderboard (OALL), outperforming Llama 3.3 70B on Arabic tasks. TII also ships the original Falcon Arabic 7B with an extended 32,000-token Arabic vocabulary and dialect support (MSA, Gulf, Egyptian, Levantine). All Falcon Arabic models are open-weight on Hugging Face.
Two model families are purpose-built for Arabic: HUMAIN's ALLaM (four variants from 7B to 34B) and TII's Falcon Arabic (including the hybrid Mamba-Transformer Falcon-H1-Arabic line). For Arabic-first tasks, the Falcon-H1-Arabic 34B currently leads on the Open Arabic LLM Leaderboard (OALL) with roughly 75%, while ALLaM 34B is the consumer flagship via HUMAIN Chat. For general multilingual workloads where Arabic is one of several languages, Claude, GPT, Gemini, and Qwen all ship strong Arabic support from their multilingual training mix.
ALLaM is built by HUMAIN (Saudi Arabia's national AI champion, PIF company) and ships via HUMAIN Chat, IBM watsonx, and Azure AI Foundry, with one open-weight 7B preview on Hugging Face. Falcon Arabic is built by TII (Technology Innovation Institute, Abu Dhabi) and all variants are open-weight on Hugging Face. The newest Falcon-H1-Arabic line uses a hybrid Mamba-Transformer architecture that supports 256K context windows and linear-time scaling, while ALLaM uses a standard transformer architecture. Both families cover the 7B-34B range.
Because the composite leaderboard score is driven by English-language frontier benchmarks (MMLU, GPQA, SWE-bench, HumanEval), it systematically underrates Arabic-first models that were optimized for Arabic tasks instead of English reasoning. Mixing them in a single ranked list would either bury ALLaM and Falcon below general frontier models, or would require an artificial bonus that obscures the underlying data. Splitting into two tiers lets us rank each group on the axis that fits it: Arabic-first models by Arabic benchmark signal and channel availability, multilingual frontier models by composite score.
ALLaM 7B Instruct (preview) is on Hugging Face under ALLaM-AI/ALLaM-7B-Instruct-preview. ALLaM 1 13B Instruct is on IBM watsonx.ai at roughly $1.80 per million tokens. ALLaM 2 7B Instruct is on Azure AI Foundry. ALLaM 34B is available only through HUMAIN Chat. All Falcon Arabic models are open-weight on Hugging Face under tiiuae/: Falcon-H1-Arabic 34B, 7B, and 3B Instruct, plus the original Falcon-Arabic-7B-Instruct. TII also provides an interactive demo at chat.falconllm.tii.ae.
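The access channels above can be sketched as a small routing table. This is a hypothetical helper, not an official API: the model keys are illustrative names, and only the ALLaM preview repo ID is documented on this page (the exact tiiuae/ repo IDs are left elided).

```python
# Hypothetical routing table: each publicly documented ALLaM / Falcon Arabic
# variant mapped to its access channel, per the listing above.
# Keys are illustrative; only the ALLaM preview repo ID is confirmed here.
CHANNELS = {
    "allam-7b-instruct-preview": ("hugging_face", "ALLaM-AI/ALLaM-7B-Instruct-preview"),
    "allam-1-13b-instruct":      ("ibm_watsonx", None),      # ~$1.80 per 1M tokens
    "allam-2-7b-instruct":       ("azure_ai_foundry", None),
    "allam-34b":                 ("humain_chat", None),      # consumer app only
    "falcon-h1-arabic-34b":      ("hugging_face", "tiiuae/…"),  # repo ID elided
    "falcon-h1-arabic-7b":       ("hugging_face", "tiiuae/…"),
    "falcon-h1-arabic-3b":       ("hugging_face", "tiiuae/…"),
    "falcon-arabic-7b-instruct": ("hugging_face", "tiiuae/…"),
}

def open_weight(model: str) -> bool:
    """True if the variant ships open weights on Hugging Face."""
    return CHANNELS[model][0] == "hugging_face"

# All Falcon variants plus the ALLaM 7B preview are open-weight.
open_models = sorted(m for m in CHANNELS if open_weight(m))
print(open_models)
```

One practical takeaway the table encodes: of the ALLaM family, only the 7B preview can be self-hosted; the rest are managed endpoints or a consumer app.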
Yes. Claude, GPT, Gemini, and Qwen all handle Modern Standard Arabic fluently for conversation, summarization, and translation because their multilingual training mix includes a large volume of Arabic web text. The gap with Arabic-first models shows up on Arabic-specific benchmarks like AraLingBench and the Open Arabic LLM Leaderboard, where Arabic-first models are competitive despite much smaller parameter counts. The right choice depends on whether you need Arabic-first accuracy on Arabic-heavy workloads, or general-purpose reasoning that happens to include Arabic.
Falcon-H1 combines two architectures running in parallel within each processing block: state-space models (Mamba) for linear-time scalability and traditional transformer attention for precise long-range modeling. The outputs are concatenated and fused. Because the state-space path carries a fixed-size recurrent state, its memory cost stays constant regardless of sequence length, which is what lets Falcon-H1-Arabic reach 256K context windows versus roughly 32K for standard transformer models of similar size. TII reports the 7B hybrid variant scores 71.7% on OALL, surpassing pure-transformer models like Qwen3-8B on Arabic benchmarks.
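The parallel-paths pattern described above can be sketched in a few lines of NumPy. This is a toy, untrained illustration of the general hybrid idea (a diagonal SSM recurrence beside causal attention, concatenated and fused), not Falcon-H1's actual layer code; the decay constant, widths, and fusion projection are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 16, 8  # toy sequence length and model width

def ssm_path(x, a=0.9):
    # Minimal diagonal state-space recurrence: h_t = a*h_{t-1} + x_t.
    # The state h is a fixed-size vector, so memory is O(d) at any length.
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t, xt in enumerate(x):
        h = a * h + xt
        out[t] = h
    return out

def attention_path(x):
    # Single-head causal self-attention (quadratic in T); toy version
    # with the learned q/k/v projections omitted.
    scores = x @ x.T / np.sqrt(x.shape[1])
    mask = np.tril(np.ones((len(x), len(x)), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

def hybrid_block(x, w_fuse):
    # Run both paths in parallel, concatenate, and fuse back to width d —
    # the general pattern the H1 line uses inside each block.
    mixed = np.concatenate([ssm_path(x), attention_path(x)], axis=1)
    return mixed @ w_fuse

x = rng.standard_normal((T, d))
w_fuse = rng.standard_normal((2 * d, d)) / np.sqrt(2 * d)
y = hybrid_block(x, w_fuse)
print(y.shape)  # → (16, 8)
```

Both paths are causal, so perturbing the last token leaves every earlier output untouched; the SSM path is the one whose per-token state does not grow with context length.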