HUMAIN's Arabic-first language model family
A primary-source deep dive into the four publicly documented ALLaM variants, the architecture, the training footprint, and every access channel we could verify.
ALLaM is the Arabic-first large language model family developed and commercialized by HUMAIN, Saudi Arabia's national AI champion, a Public Investment Fund company launched on 12 May 2025. The program is grounded in peer-reviewed research (the ICLR 2025 paper "ALLaM: Large Language Models for Arabic and English" by Bari et al.) and has shipped publicly documented variants on Hugging Face, IBM watsonx.ai, Microsoft Azure AI Foundry, and HUMAIN's own consumer chat product. The research and early variants originated inside the Saudi Data and Artificial Intelligence Authority (SDAIA) National Center for AI, with HUMAIN now serving as the commercial home for the family.
We track four ALLaM variants on LM Market Cap today. ALLaM is a purpose-built Arabic-first family: the program was designed around Arabic pretraining data, Arabic alignment, and Arabic evaluation from the start. Our composite leaderboard score is driven by English-language frontier benchmarks (MMLU, GPQA, HumanEval, SWE-bench), so it does not capture the dimension ALLaM was built to serve. The right way to read ALLaM is as a specialist model family anchoring an Arabic-language AI stack for Saudi Arabia and the broader Arabic-speaking world. Every claim in this report links back to a primary source so readers at HUMAIN can verify it directly.
What ALLaM Is
The peer-reviewed paper, the research origins, and the HUMAIN commercial handoff.
The ALLaM program is described in the paper "ALLaM: Large Language Models for Arabic and English" by M. Saiful Bari and the ALLaM team, first posted to arXiv on 22 July 2024 and accepted to ICLR 2025. The paper introduces four variants: a 7B model continued-pretrained from Llama 2, a 13B model, a 70B model, and a 7B model trained from scratch. The full paper is available at arxiv.org/abs/2407.15390.
HUMAIN is the AI company that now carries the ALLaM family forward. HUMAIN was announced on 12 May 2025 as a Public Investment Fund (PIF) company, chaired by HRH Crown Prince Mohammed bin Salman, with CEO Tareq Amin. It is positioned as Saudi Arabia's national champion for AI, data center, and foundation model work. ALLaM-branded model pages on Hugging Face are now mirrored under both the ALLaM-AI and humain-ai organizations.
The research origins of ALLaM sit inside the National Center for AI (NCAI) at the Saudi Data and Artificial Intelligence Authority (SDAIA), which published the ICLR paper and the early model cards. NCAI remains the research origin and HUMAIN the commercial home; together they form the research-to-production pipeline behind ALLaM.
The Model Family
Four publicly documented variants across open weights, IBM watsonx, Azure AI Foundry, and HUMAIN Chat.
Four ALLaM variants have verifiable public footprints as of April 2026. The 70B model discussed in the paper has not been published with open weights and is not listed on a public commercial endpoint we could identify, so we do not track it here.
Hugging Face · Research license
The 7B Instruct preview is the most open, most studied variant. It is published on Hugging Face as ALLaM-AI/ALLaM-7B-Instruct-preview and mirrored byte-identically under humain-ai/ALLaM-7B-Instruct-preview. According to the Hugging Face model card, the publicly released preview is trained from scratch in two stages: 4T English tokens followed by 1.2T mixed Arabic/English tokens, then instruction tuned on curated Arabic and English data.
Managed API · Closed weights
The 13B Instruct tier is hosted as a foundation model on IBM watsonx.ai under the model id sdaia/allam-1-13b-instruct. It is listed in IBM's watsonx foundation model catalog at a published list price of roughly $1.80 per million tokens, applied to both input and output. IBM is a first-party commercial launch partner and the watsonx listing is the only public, metered API endpoint for any ALLaM variant we were able to verify.
Managed endpoint · Closed weights
ALLaM 2 is the second generation 7B Instruct model. It is listed in the Microsoft Azure AI Foundry model catalog as ALLaM-2-7b-instruct and can be deployed as a managed endpoint with pay-per-token billing. Azure pricing varies by region and SKU; we link to the Azure catalog rather than quoting a point-in-time number.
Consumer · Closed weights · No public API
The 34B variant is served exclusively through the HUMAIN Chat consumer product at chat.humain.ai. HUMAIN Chat launched on 25 August 2025 as the public-facing surface for the ALLaM family. Anyone wishing to try this variant today does so through the chat product.
Architecture
Every number here is pulled directly from the config.json on Hugging Face, not inferred.
The 7B Instruct preview ships with a config.json in its Hugging Face repository that spells out the architecture. These are the numbers that a HUMAIN engineer can confirm by opening the file directly.
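As a reading aid, the fields this report cites would appear in that config.json roughly as follows. This is an illustrative excerpt, not the full file: the vocab_size, num_attention_heads, and max_position_embeddings values are the ones cited in this report, while num_key_value_heads and model_type are our inference from the "standard Llama-family configuration with full attention" description.

```json
{
  "model_type": "llama",
  "vocab_size": 64000,
  "num_attention_heads": 32,
  "num_key_value_heads": 32,
  "max_position_embeddings": 4096
}
```

The quickest verification path is to open config.json directly in the Files tab of the ALLaM-AI/ALLaM-7B-Instruct-preview repository.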
Two architectural decisions are worth noting. First, ALLaM 7B runs with 32 full attention heads in a standard Llama-family configuration. Second, the vocabulary was expanded to 64,000 tokens (roughly 2x Llama 2's 32,000) to accommodate Arabic morphology and script. This is a deliberate design choice for an Arabic-first model: a larger Arabic-aware vocabulary gives the model more efficient tokenization on Arabic text, which improves both training efficiency and inference quality on Arabic inputs.
The 4,096 token context window is well sized for the Arabic instruction-following, dialogue, and question-answering workloads the 7B variant was targeted at. For longer-context Arabic workloads, the 13B tier on IBM watsonx and the 34B variant served through HUMAIN Chat offer additional headroom within the ALLaM family.
Training Data and Compute
The ICLR paper discloses the token budget and compute footprint in unusual detail.
The ALLaM paper describes the training footprint for the model family in unusual detail for a national-lab release. Section 3 of the paper gives the token budget and compute cost; the model card for the 7B Instruct preview published on Hugging Face states the release was trained from scratch in two stages.
For context, Meta has stated that Llama-2-70B took about 1.7M A100 GPU-hours, so the ALLaM program's cumulative ~5M A100-hour budget across four model sizes is a substantial but not exotic commitment. The paper explicitly positions this as a choice: the team was not trying to push frontier scale; it was trying to produce a high-quality Arabic-first model within a fixed, disclosed compute budget.
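The comparison is easy to sanity-check. A sketch of the arithmetic (the 1,024-GPU cluster size is a hypothetical chosen for illustration, not a disclosed figure):

```python
ALLAM_TOTAL_A100_HOURS = 5_000_000   # cumulative budget across four sizes (paper)
LLAMA2_70B_A100_HOURS = 1_700_000    # Meta's disclosed figure for Llama-2-70B

# Ratio: the whole ALLaM program vs. one frontier-adjacent training run
print(round(ALLAM_TOTAL_A100_HOURS / LLAMA2_70B_A100_HOURS, 2))  # → 2.94

# Hypothetical wall-clock on a 1,024-GPU cluster at perfect utilization
gpus = 1024
days = ALLAM_TOTAL_A100_HOURS / gpus / 24
print(round(days))  # → 203
```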
Benchmark Performance
Arabic-first evaluation, labeled as self-reported or independent. Read ALLaM on its home turf.
ALLaM is evaluated primarily on Arabic benchmarks. Our LM Market Cap composite is driven by English-language frontier evaluations (MMLU, GPQA, HumanEval, SWE-bench, Terminal-Bench, BrowseComp, etc.), so the composite score on the coding leaderboard does not reflect what ALLaM was built to do. The fair way to read ALLaM is on its home turf: Arabic language understanding, dialogue, and knowledge. This section shows that picture.
Self-reported · 7B Instruct preview
Independent evaluation signal
AraLingBench is the cleanest independent datapoint we found and places ALLaM 7B Instruct in the upper cluster of Arabic-capable open models on that specific benchmark. It aligns with the self-reported picture: Arabic linguistic competence and Arabic knowledge recall are clear strengths of the family.
Because ALLaM is an Arabic-first specialist, it is not a target of the major independent English-language LLM leaderboards (Chatbot Arena, LiveBench, SEAL, MMLU-Pro, HLE), and we did not find it listed in their top tiers as of April 2026. We also checked the public Arabic leaderboards: Scale SEAL Arabic, OALL (Open Arabic LLM Leaderboard), and AraGen. ALLaM is not currently listed in the public top of these either. This is a gap in independent ranking data rather than a statement about model quality; it reflects that ALLaM's primary eval harness is the one published with the ICLR 2025 paper.
Where ALLaM Is Available
Four publicly documented access channels across open weights, two clouds, and HUMAIN's own chat product.
Four access channels are publicly documented. They serve very different audiences.
Hugging Face
Open weights · ALLaM-AI/ALLaM-7B-Instruct-preview on Hugging Face, under a research license. The right channel for academics, Arabic NLP researchers, and anyone who wants to inspect or fine-tune the model.
IBM watsonx.ai
Managed API · sdaia/allam-1-13b-instruct as a managed foundation model. Roughly $1.80 per million tokens on both input and output. Good fit if you are already on IBM watsonx and need an Arabic-tuned model.
Microsoft Azure AI Foundry
Managed endpoint · ALLaM-2-7b-instruct in the Azure AI Foundry catalog. Pricing varies by region. Good fit if you are already on Azure and need an Arabic-tuned option in the same catalog as GPT-5, Llama, and Mistral.
HUMAIN Chat
Consumer · chat.humain.ai/en is the only way to interact with ALLaM 34B. Free, web-based, public-facing. Not a developer API.
The Saudi National AI Program
ALLaM is the flagship language model of a national program that has been building in public since 2019.
ALLaM is best read as the flagship language model of a national AI program that Saudi Arabia has been building in public since 2019. The timeline, the institutions, and the numbers are all on the record, and they explain how a national-scale Arabic model ended up with a peer-reviewed paper, four public variants, and commercial endpoints on two major clouds within a few years.
HUMAIN is the company that now carries the ALLaM family forward commercially. The research lineage runs through the National Center for AI (NCAI), the NSDAI strategy document ties the program to Vision 2030, and the published Vision 2030 targets are the ones the broader national AI program is measured against.
HUMAIN and the Saudi AI Ecosystem
HUMAIN's publicly announced infrastructure and model partnerships, sourced from primary press releases.
HUMAIN is the commercial layer and the engine that gives the ALLaM family its path to scale. As of late 2025 and into 2026, the company has been assembling the largest publicly announced set of AI infrastructure partnerships of any company in the region. The deals are public; the scale is unusual. We list only the partnerships that we could verify from a primary HUMAIN or partner press release.
NVIDIA announced 18,000 Blackwell Ultra GB300 GPUs for HUMAIN, positioned as the initial tranche of a multi-year commitment that both companies have publicly discussed scaling into the hundreds of thousands of GPUs.
AWS announced a $5B+ commitment to build a dedicated AI Zone in Saudi Arabia in partnership with HUMAIN, covering compute, training services, and support for Saudi AI startups.
Google Cloud and PIF announced a joint $10B commitment to build an AI hub in Saudi Arabia. HUMAIN is the PIF-side counterparty for AI workloads.
Groq committed to day-zero availability for OpenAI's gpt-oss open-weights release on HUMAIN's Groq-powered inference stack, giving HUMAIN a high-speed inference option independent of its own training runs.
Qualcomm announced a partnership with HUMAIN to deploy AI200 and AI250 chips across 200 MW of data center capacity in Saudi Arabia, giving HUMAIN a second-source inference path alongside NVIDIA.
Adobe and HUMAIN announced a partnership around Creative Cloud and Firefly deployments into the Saudi market, with Arabic-language localization as a focus area.
xAI and HUMAIN announced an agreement to make Grok available through HUMAIN's distribution channels, positioning HUMAIN as a multi-model distributor inside the kingdom.
Aramco announced a minority stake in HUMAIN in October 2025, extending ownership beyond PIF alone and tying HUMAIN into the Aramco enterprise customer base.
The strategic picture is that ALLaM and the HUMAIN infrastructure program are being built in lockstep. Saudi Arabia is assembling frontier-scale AI infrastructure in parallel with its own Arabic-first model program, and the two reinforce each other: ALLaM anchors the Arabic language capability, and the HUMAIN data center and GPU footprint gives that capability a path to scale. The 2025-2026 releases are the opening chapter of a multi-year program, and the 18,000 GB300 tranche, the AWS AI Zone, and the Google Cloud / PIF hub make the longer trajectory clear.
Transparency Notes
Open items we chose to leave out of the main body until we can anchor them to a primary source.
We tried to source every claim above from a primary document. The items below are things we chose not to include in the main body because we could not anchor them to a single stable public reference. They are listed here as open items to update, not as criticism. If HUMAIN publishes new primary sources, we will fold them into the report.
How to Use ALLaM Today
A practical evaluation and deployment path for Arabic-first teams, as of April 2026.
If you are building an Arabic-first product and want to evaluate ALLaM, this is the practical path as of April 2026:
1. Start with the open 7B Instruct preview on Hugging Face (ALLaM-AI/ALLaM-7B-Instruct-preview) to run your own Arabic evals and fine-tuning experiments under the research license.
2. If you are already on IBM watsonx, trial the 13B tier (sdaia/allam-1-13b-instruct) as a metered API at roughly $1.80 per million tokens.
3. If you are already on Azure, deploy ALLaM-2-7b-instruct as a managed endpoint from the AI Foundry catalog, with region-dependent pricing.
4. For a qualitative read on the strongest variant, try the 34B through HUMAIN Chat at chat.humain.ai, keeping in mind there is no public developer API for it.
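For the open-weights evaluation step, a minimal loading sketch with Hugging Face transformers. This is a sketch under stated assumptions: the repo id is the one from the model card, the chat-template call assumes the repository ships a chat template (as instruct releases typically do), and running it requires transformers, torch, and either a GPU with roughly 16 GB of memory or CPU offload.

```python
REPO = "ALLaM-AI/ALLaM-7B-Instruct-preview"  # mirrored under humain-ai/

def build_messages(prompt: str) -> list[dict]:
    """Chat-format payload in the standard role/content convention."""
    return [{"role": "user", "content": prompt}]

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Download the preview weights and run one generation."""
    # Heavy deps imported lazily: pip install transformers torch accelerate
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tok = AutoTokenizer.from_pretrained(REPO)
    model = AutoModelForCausalLM.from_pretrained(REPO, device_map="auto")
    inputs = tok.apply_chat_template(
        build_messages(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

From here, the same prompts can be replayed against the watsonx and Azure endpoints to compare the 13B and ALLaM 2 tiers on identical Arabic inputs.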
ALLaM on LM Market Cap
Composite scores for cross-category comparison; model-specific pages for Arabic benchmark detail.
We track all four publicly documented ALLaM variants on LM Market Cap. Our composite score is driven by English-language frontier benchmarks, so it measures a different axis than the Arabic-first evaluation ALLaM was designed for. Use the composite for cross-category comparison on context window, pricing, and capability flags, and use the model-specific pages below for Arabic benchmark detail, access channels, and primary-source links.
Primary Sources
Every claim in this report links to one of the primary documents below.