AI ToolsJune 12, 2026

Kimi-K2.7-Code: Moonshot Open-Sources 1T Coding Model with Strong Agentic Gains

By AgentRiot Editorial

Moonshot AI released Kimi-K2.7-Code today, an open-weight 1T-parameter MoE model showing +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite over K2.6, with 30% lower reasoning token usage.

kimi moonshot coding-models open-source benchmarks

Kimi-K2.7-Code: Moonshot Open-Sources 1T Coding Model with Strong Agentic Gains

Moonshot AI released Kimi-K2.7-Code today, an open-weight 1T-parameter Mixture-of-Experts model focused on long-horizon coding and agentic workflows.

The model shows meaningful gains over Kimi K2.6 across Moonshot's internal coding and agentic benchmarks, while remaining behind GPT-5.5 on most metrics. It ships with 256K context, forces preserve_thinking mode, and reduces reasoning token usage by approximately 30% compared to its predecessor.

Benchmark results

Benchmark	Kimi K2.6	Kimi K2.7 Code	GPT-5.5	Claude Opus 4.8
Kimi Code Bench v2	50.9	62.0 (+21.8%)	69.0	67.4
Program Bench	48.3	53.6 (+11.0%)	69.1	63.8
MLS Bench Lite	26.7	35.1 (+31.5%)	35.5	42.8
Kimi Claw 24/7 Bench	42.9	46.9	52.8	50.4
MCP Atlas	69.4	76.0	79.4	81.3
MCP Mark Verified	72.8	81.1	92.9	76.4

Source: Hugging Face model card (primary)

Methodology notes

All Kimi models were evaluated with thinking mode enabled via the Kimi Code CLI at temperature 1.0, top-p 0.95, and 262K context. GPT-5.5 was tested in Codex with xhigh mode; Claude Opus 4.8 was tested in Claude Code with xhigh mode. This is an important caveat when comparing numbers across vendors.

Kimi Code Bench v2 is Moonshot's in-house benchmark covering realistic software engineering tasks across 10+ languages and production tech stacks. Program Bench requires agents to recreate programs from compiled binaries and documentation only (200 tasks, 248K+ fuzz tests). MLS-Bench-Lite is a 30-task subset focused on inventing generalizable ML methods with a 5-hour limit.

What improved over K2.6

Reasoning efficiency: ~30% lower thinking-token usage while maintaining or improving task success
Long-horizon coding: Better instruction following and higher end-to-end task completion rates
Agentic tool use: Gains on MCP Atlas (+6.6) and MCP Mark Verified (+8.3)
6x High-Speed Mode: Coming soon (not yet available in this release)

Model specs

Architecture: Mixture-of-Experts (MoE)
Total parameters: 1T
Activated parameters: 32B
Context length: 256K
Vision encoder: MoonViT (400M params)
License: Modified MIT (open weights on Hugging Face)

Availability

API: platform.moonshot.ai (OpenAI/Anthropic compatible)
Kimi Code CLI: kimi.com/code
Open weights: Hugging Face (moonshotai/Kimi-K2.7-Code)
Recommended inference: vLLM, SGLang, KTransformers

Bottom line

Kimi-K2.7-Code delivers the largest gains on Moonshot's own coding benchmarks and shows solid improvement on agentic tool-use tasks. It does not surpass GPT-5.5 on any of the six public benchmarks listed, but the 30% reasoning token reduction and open weights make it a compelling option for teams that want to run strong coding agents locally or on their own infrastructure.

The forced preserve_thinking mode and emphasis on long-horizon, multi-step workflows suggest Moonshot is optimizing for real software engineering agent use cases rather than raw chat performance.

Sources:

Official announcement: https://x.com/Kimi_Moonshot/status/2065377579130142937
Primary benchmark source: https://huggingface.co/moonshotai/Kimi-K2.7-Code
Release date: June 12, 2026