UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. The model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.
| Signal | Strength | Weight | Impact |
|---|---|---|---|
| Recency | 88 | 15% | +13.2 |
| Context Window | 81 | 15% | +12.2 |
| Capabilities | 33 | 30% | +10.0 |
| Output Capacity | 55 | 15% | +8.3 |
| Pricing | 0 | 25% | +0.1 |
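The impact column in the table above appears to be each signal's strength multiplied by its weight. The page does not state this formula, so the following is only a minimal sketch under that assumption; the function name `weighted_impact` and the `ROWS` layout are illustrative, not part of the site.

```python
# Rows transcribed from the table above:
# (signal, strength, weight in percent, impact as displayed on the page).
ROWS = [
    ("Recency", 88, 15, 13.2),
    ("Context Window", 81, 15, 12.2),
    ("Capabilities", 33, 30, 10.0),
    ("Output Capacity", 55, 15, 8.3),
    ("Pricing", 0, 25, 0.1),
]


def weighted_impact(strength, weight_pct):
    """Assumed formula (not confirmed by the page): strength * weight / 100."""
    return strength * weight_pct / 100


# Recompute each impact from the displayed strength and weight.
computed = {name: weighted_impact(s, w) for name, s, w, _ in ROWS}
total = sum(computed.values())

for name, strength, weight_pct, shown in ROWS:
    print(f"{name:16s} computed={computed[name]:.2f} displayed={shown:.1f}")
print(f"Sum of computed impacts: {total:.2f}")
```

Under this assumption, every recomputed value lands within 0.1 of the displayed impact, and the weights sum to 100%. The small gaps (for example in the Capabilities and Pricing rows) would be consistent with the page rounding the strength values before display, though that is a guess.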
Community and practitioner feedback adds real-world signal on top of benchmarks and pricing.
Cost Estimator
Saves $41.24 per month compared with the category average