LMC ValueScore compresses two things into a single integer: how good a model is at what you are asking it to do, and how much it costs per million tokens. This page walks through the full formula, the empirical price anchors we use, the rationale behind every design choice, and the failure modes we deliberately engineered around.
The composite quality score (0-100) comes from our standard benchmark-driven scoring pipeline (90 percent benchmarks, 5 percent capabilities, 5 percent context window).
Anchors are computed from 290 priced text models in the Q2 2026 catalog snapshot (38 free-tier models and 24 image/video models excluded). They are frozen quarterly to keep rankings stable: Morningstar, MSCI, and S&P rating systems all use the same frozen-window approach for the same reason.
The high-value pick wins the ValueScore ranking because it hits a good quality score at a blended price near our p10 anchor. The frontier flagship has higher raw quality but sits above the p90 anchor, so the soft-floor pScore pulls its ValueScore down; buyers paying for absolute quality know exactly what trade-off they are making.
A naive ratio would always put free models at the top. Our shadow price (2 times p10) prevents this by giving free models a realistic stand-in for their effective cost, so they compete on quality like everyone else.
In an earlier version Claude Opus 4.6 scored exactly 0 because its price sat at the p90 anchor, producing a meaningless "premium flagship = 0" headline. The soft floor of 1 on pScore means Opus now lands around 15: honest ("not a best-value pick") without being nonsensical.
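The soft floor itself is a clamp on the price score. The exact pScore curve is not specified on this page, so the log-scale interpolation between the anchors below is an assumption; only the floor of 1 and the anchor endpoints come from the text.

```python
import math

P10, P90 = 0.087, 4.80  # Q2 2026 anchors, $/M

def p_score(price: float) -> float:
    """Hypothetical price score: 100 at the p10 anchor, falling to the
    soft floor of 1 at p90. Log-interpolation between anchors is an
    assumption; the floor prevents a hard zero at the p90 price."""
    span = math.log(P90) - math.log(P10)
    raw = 100 * (math.log(P90) - math.log(price)) / span
    return max(1.0, min(100.0, raw))

print(p_score(4.80))   # 1.0 -- floored, no longer a meaningless zero
print(p_score(0.087))  # 100.0
```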
Version 1 hard-gated models below composite 40, turning a 0.1-point input change into a 50-point swing in ValueScore. The smooth sigmoid gate (centered at 50, steepness 0.25) replaces that cliff with a transition band wider than benchmark noise.
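The gate parameters from the text translate directly into a logistic function. How the gate feeds into the final score (e.g. as a multiplier) is not specified here, so this sketch shows only the gate itself.

```python
import math

def quality_gate(composite: float, center: float = 50.0, k: float = 0.25) -> float:
    """Smooth sigmoid gate replacing v1's hard cutoff: near 0 for weak
    models, near 1 for strong ones, with a wide transition band around
    `center` instead of a cliff."""
    return 1.0 / (1.0 + math.exp(-k * (composite - center)))

# A 0.1-point input change now moves the gate smoothly:
print(round(quality_gate(39.9), 3))  # 0.074
print(round(quality_gate(40.0), 3))  # 0.076
```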
Treating list output price as real cost would rank o1-pro, o3, and R1 artificially high. Our reasoning expansion factor multiplies the output contribution by 3x to 15x depending on the model family, based on numbers published in the o1 system card and DeepSeek R1 paper.
If we recomputed anchors on every cache refresh, a single price change from a big provider could reshuffle the ranking for models that did not change at all. Frozen quarterly anchors keep rank movement tied to real changes in either quality or price, not to anchor volatility.
A naive quality-over-price ratio explodes as price approaches zero, so a $0 model with quality 40 would beat a $0.10 model with quality 95. Cobb-Douglas utility is the standard economic substitute: U = Q^alpha times P^(1-alpha), where P is the price score (higher means cheaper), cannot be dominated by sending any one component to an extreme. It also has a clean normative interpretation as constant-elasticity preferences, which gives us a defensible one-sentence explanation.
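With both Q and the price score on a 0-100 scale, the Cobb-Douglas form stays on 0-100 by construction (100^alpha x 100^(1-alpha) = 100). A sketch, using the 0.6 quality weight stated in the next paragraph; the 0-100 normalization of both inputs is an assumption.

```python
ALPHA = 0.6  # quality weight (see the 60/40 discussion on this page)

def value_score(quality: float, price_score: float) -> float:
    """Cobb-Douglas blend U = Q^alpha * P^(1-alpha), where P is the
    price score (higher = cheaper). With both inputs on 0-100 the
    result stays on 0-100, and neither a free price nor a perfect
    quality score can dominate on its own."""
    return (quality ** ALPHA) * (price_score ** (1 - ALPHA))

# A quality-40 model at the best possible price score no longer beats
# a quality-95 model that is nearly as cheap:
free_low = value_score(40, 100)
cheap_high = value_score(95, 90)
assert cheap_high > free_low
```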
We chose 60 percent quality, 40 percent price based on what production buyers repeatedly tell us matters more: a cheap-but-unreliable model is a worse purchase than a slightly-expensive-but-dependable one. A pure 50/50 would let a marginal-quality model leapfrog a clearly-better model on small price differences. Higher than 60 would make price almost irrelevant, which defeats the purpose of a value index. A future version may let you pass your own alpha as a URL parameter.
Quarterly. The current anchors (p10 = $0.087/M, p90 = $4.80/M) were computed from the Q2 2026 snapshot of 290 priced text models. We deliberately avoid recomputing on every cache refresh because unstable anchors produce unstable rankings that churn for reasons unrelated to the models. Morningstar and MSCI rating systems take the same approach: stability is a feature, not a bug.
Free models use a shadow price of 2x the p10 anchor ($0.174/M) to prevent automatic domination of the ranking. Image-generation and video-generation models return null because their cost structures (per-image, per-second) are not comparable to per-token pricing and mixing them would give misleading results. If you need a value-adjusted ranking for image or video models, we are considering a separate index for them.
Yes. Our composite quality score has a standard error of about 2-4 points from benchmark variance alone, and the price anchors add another 1-2 points of uncertainty from the rolling catalog. An LMC ValueScore of 73.4 vs 73.1 is within noise, and reporting decimals would imply a false precision we cannot back up. Integers communicate the true resolution of the underlying data.
Reasoning models (OpenAI o1/o3/o4, DeepSeek R1, Alibaba QwQ, and their distilled variants) bill hidden "thinking" tokens on top of the visible output. Our blended price multiplies the output-token contribution by a family-specific expansion factor published in the model system cards: o1-pro=15x, o1=8x, o3=10x, R1=8x, QwQ=6x. Without this adjustment, reasoning models look 3-15 times cheaper than they actually are.