GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports a context window of up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also supports interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.
| Signal | Strength | Weight | Impact |
|---|---|---|---|
| Capabilities | 83 | 30% | +25.0 |
| Recency | 100 | 15% | +15.0 |
| Output Capacity | 85 | 15% | +12.8 |
| Context Window | 81 | 15% | +12.2 |
| Pricing | 1 | 25% | +0.2 |
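The page does not state how each impact value is derived. A minimal sketch, assuming each impact is simply the signal's strength multiplied by its weight, can check that reading against the listed figures. The function names `estimated_impact` and `summarize` are hypothetical, and the rows are copied from the table above.

```python
# Rows copied from the table above: (signal, strength, weight, listed impact).
ROWS = [
    ("Capabilities", 83, 0.30, 25.0),
    ("Recency", 100, 0.15, 15.0),
    ("Output Capacity", 85, 0.15, 12.8),
    ("Context Window", 81, 0.15, 12.2),
    ("Pricing", 1, 0.25, 0.2),
]


def estimated_impact(strength, weight):
    """Assumed rule (not stated by the source): impact = strength * weight."""
    return strength * weight


def summarize(rows):
    """Return (sum of weights, sum of listed impacts, largest estimate gap)."""
    total_weight = sum(weight for _, _, weight, _ in rows)
    listed_total = sum(listed for _, _, _, listed in rows)
    largest_gap = max(
        abs(estimated_impact(strength, weight) - listed)
        for _, strength, weight, listed in rows
    )
    return total_weight, listed_total, largest_gap


if __name__ == "__main__":
    for name, strength, weight, listed in ROWS:
        estimate = estimated_impact(strength, weight)
        print(f"{name:16s} estimated {estimate:6.2f}   listed {listed:5.1f}")
    total_weight, listed_total, largest_gap = summarize(ROWS)
    print(f"weights sum to {total_weight:.2f}")
    print(f"listed impacts sum to {listed_total:.1f}")
    print(f"largest gap between estimate and listed value: {largest_gap:.2f}")
```

Under this assumption the weights sum to 100% and every estimate lands within about 0.1 of the listed impact. The small differences would be consistent with the strengths being rounded for display, though the page does not confirm this.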
Community and practitioner feedback adds real-world signal on top of benchmarks and pricing.
Cost Estimator
Saves $36.52 per month compared with the category average.