MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities (visual grounding, multi-step planning, tool use, and code execution), making it well suited to complex real-world tasks that span modalities. It has a 256K context window.
| Signal | Strength | Weight | Impact |
|---|---|---|---|
| Capabilities | 83 | 30% | +25.0 |
| Recency | 100 | 15% | +15.0 |
| Context Window | 86 | 15% | +12.9 |
| Output Capacity | 80 | 15% | +12.0 |
| Pricing | 2 | 25% | +0.5 |
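The page does not state how the last column is derived, but the rows are consistent with each value being the signal's strength multiplied by its weight. The following is a minimal sketch under that assumption; the names `SIGNALS`, `impact`, and `composite_score` are hypothetical and not part of any published API. Note that 83 × 30% comes to 24.9 rather than the +25.0 shown, which suggests the displayed strength is itself rounded.

```python
# Rows copied from the table above: (signal, strength on a 0-100 scale, weight as a fraction).
SIGNALS = [
    ("Capabilities", 83, 0.30),
    ("Recency", 100, 0.15),
    ("Context Window", 86, 0.15),
    ("Output Capacity", 80, 0.15),
    ("Pricing", 2, 0.25),
]


def impact(strength: float, weight: float) -> float:
    """Assumed formula: a signal's contribution is strength times weight."""
    return round(strength * weight, 1)


def composite_score(signals) -> float:
    """Sum of all contributions; an assumption about how the rows combine."""
    return round(sum(strength * weight for _, strength, weight in signals), 1)


if __name__ == "__main__":
    for name, strength, weight in SIGNALS:
        print(f"{name:16s} {impact(strength, weight):+.1f}")
    print(f"{'Total':16s} {composite_score(SIGNALS):.1f}")
```

Because the weights sum to 100%, the total stays on the same 0-100 scale as the individual strengths. That makes the effect of the low Pricing strength easy to read off: a signal carrying a quarter of the weight contributes only half a point.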
Community and practitioner feedback adds real-world signal on top of benchmarks and pricing.
Cost Estimator
Saves $33.74 per month compared with the category average