Kimi-K2.7-Code: Moonshot Open-Sources 1T Coding Model with Strong Agentic Gains
By AgentRiot Editorial
Moonshot AI released Kimi-K2.7-Code today, an open-weight 1T-parameter MoE model showing +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite over K2.6, with 30% lower reasoning token usage.

Kimi-K2.7-Code: Moonshot Open-Sources 1T Coding Model with Strong Agentic Gains
Moonshot AI released Kimi-K2.7-Code today, an open-weight 1T-parameter Mixture-of-Experts model focused on long-horizon coding and agentic workflows.
The model shows meaningful gains over Kimi K2.6 across Moonshot's internal coding and agentic benchmarks, while remaining behind GPT-5.5 on most metrics. It ships with 256K context, forces preserve_thinking mode, and reduces reasoning token usage by approximately 30% compared to its predecessor.
Benchmark results
| Benchmark | Kimi K2.6 | Kimi K2.7 Code | GPT-5.5 | Claude Opus 4.8 |
|---|---|---|---|---|
| Kimi Code Bench v2 | 50.9 | 62.0 (+21.8%) | 69.0 | 67.4 |
| Program Bench | 48.3 | 53.6 (+11.0%) | 69.1 | 63.8 |
| MLS Bench Lite | 26.7 | 35.1 (+31.5%) | 35.5 | 42.8 |
| Kimi Claw 24/7 Bench | 42.9 | 46.9 | 52.8 | 50.4 |
| MCP Atlas | 69.4 | 76.0 | 79.4 | 81.3 |
| MCP Mark Verified | 72.8 | 81.1 | 92.9 | 76.4 |
Source: Hugging Face model card (primary)
Methodology notes
All Kimi models were evaluated with thinking mode enabled via the Kimi Code CLI at temperature 1.0, top-p 0.95, and 262K context. GPT-5.5 was tested in Codex with xhigh mode; Claude Opus 4.8 was tested in Claude Code with xhigh mode. This is an important caveat when comparing numbers across vendors.
Kimi Code Bench v2 is Moonshot's in-house benchmark covering realistic software engineering tasks across 10+ languages and production tech stacks. Program Bench requires agents to recreate programs from compiled binaries and documentation only (200 tasks, 248K+ fuzz tests). MLS-Bench-Lite is a 30-task subset focused on inventing generalizable ML methods with a 5-hour limit.
What improved over K2.6
- Reasoning efficiency: ~30% lower thinking-token usage while maintaining or improving task success
- Long-horizon coding: Better instruction following and higher end-to-end task completion rates
- Agentic tool use: Gains on MCP Atlas (+6.6) and MCP Mark Verified (+8.3)
- 6x High-Speed Mode: Coming soon (not yet available in this release)
Model specs
- Architecture: Mixture-of-Experts (MoE)
- Total parameters: 1T
- Activated parameters: 32B
- Context length: 256K
- Vision encoder: MoonViT (400M params)
- License: Modified MIT (open weights on Hugging Face)
Availability
- API: platform.moonshot.ai (OpenAI/Anthropic compatible)
- Kimi Code CLI: kimi.com/code
- Open weights: Hugging Face (moonshotai/Kimi-K2.7-Code)
- Recommended inference: vLLM, SGLang, KTransformers
Bottom line
Kimi-K2.7-Code delivers the largest gains on Moonshot's own coding benchmarks and shows solid improvement on agentic tool-use tasks. It does not surpass GPT-5.5 on any of the six public benchmarks listed, but the 30% reasoning token reduction and open weights make it a compelling option for teams that want to run strong coding agents locally or on their own infrastructure.
The forced preserve_thinking mode and emphasis on long-horizon, multi-step workflows suggest Moonshot is optimizing for real software engineering agent use cases rather than raw chat performance.
Sources:
- Official announcement: https://x.com/Kimi_Moonshot/status/2065377579130142937
- Primary benchmark source: https://huggingface.co/moonshotai/Kimi-K2.7-Code
- Release date: June 12, 2026

