AI ToolsJune 1, 2026

MiniMax M3 Drops: Open-Weight Model With 1M Context, Frontier Coding Scores, and a Price Tag That Undercuts Closed Rivals

By BurmDesk

MiniMax shipped M3 on June 1, 2026 — the first open-weight model to combine frontier coding, a 1-million-token context window, and native multimodality. Here's the full benchmark picture, pricing breakdown, and where to access it.

MiniMax M3 sparse attention neural network visualization

MiniMax M3 open source coding agentic AI multimodal benchmarks pricing MSA sparse attention

MiniMax shipped M3 on June 1, 2026, and the release checks boxes that no prior open-weight model has combined in a single system: frontier-level coding performance, a 1-million-token context window, and native multimodal input spanning text, image, and video. The Shanghai-based lab is calling it the first open-weight model to bring all three capabilities together, a combination that has so far been limited to closed-source front-runners.

The model is available immediately through MiniMax's API, the MiniMax Code IDE, and OpenRouter. MiniMax has also committed to releasing the weights on Hugging Face, though as of publication they have not yet appeared.

The Benchmark Picture

MiniMax M3 posts numbers that place it in the same conversation as GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7, with a few outright wins.

Coding and software engineering:

SWE-Bench Pro: 59.0%. MiniMax says this surpasses GPT-5.5 and Gemini 3.1 Pro, approaching Opus 4.7
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Hard: 28.8%
MCP Atlas: 74.2%

Multimodal and agent tasks:

SVG-Bench: surpasses Opus 4.7
OmniDocBench: scores above Gemini 3.1 Pro
Claw-Eval: highest score on this end-to-end autonomous agent benchmark
BrowseComp: 83.5 (per third-party reporting)

These are vendor-reported figures from MiniMax's official announcement. Independent verification is still pending, and the model has only been live for hours. Early user feedback on Reddit is mixed: some developers praise the natural writing style and long-context handling, while others note that coding tasks still require a spec-driven approach to minimize drift.

MSA: The Sparse Attention Bet

The architectural headline is MSA (MiniMax Sparse Attention), a new sparse attention mechanism designed to solve the quadratic cost explosion of full attention at long context lengths.

MiniMax claims MSA partitions KV blocks more precisely than competing approaches like DSA and MoBA, achieving higher effective context coverage. At the operator level, they use a "KV outer gather Q" approach that makes memory access contiguous. Under M3's head configuration, the arithmetic intensity is reportedly more than 4× faster than open-source Flash-Sparse-Attention and flash-moba.

The practical result: at 1 million tokens, M3's per-token compute is 1/20 that of the previous-generation model. Prefilling is more than 9× faster; decoding is more than 15× faster. MiniMax says MSA matched full attention on the vast majority of capabilities across multiple ablations.

M3 is built on a Sparse Mixture-of-Experts (MoE) architecture. The model was trained from step zero on interleaved multimodal data (text, images, and video mixed within sequences), which MiniMax says is more critical to performance than commonly assumed. The training data pipeline has been scaled to the order of 100 trillion tokens.

What It Costs

MiniMax is running a 7-day 50% launch promotion through June 7, 2026:

API Pay-as-You-Go (promo rates, ≤512K input tokens):

Input: $0.30 per million tokens
Output: $1.20 per million tokens
Prompt caching read: $0.06 per million tokens

Standard rates (after promo, ≤512K input tokens):

Input: $0.60 per million tokens
Output: $2.40 per million tokens
Prompt caching read: $0.12 per million tokens

Long-context rates (>512K input tokens):

Input: $1.20 per million tokens
Output: $4.80 per million tokens
Prompt caching read: $0.24 per million tokens

For comparison, OpenRouter lists the weighted average effective input price across all providers at roughly $0.105/M tokens and output at $1.23/M tokens, though this fluctuates with provider routing.

Subscription plans (Token Plan):

Plus: $20/month (~1.633B M3 tokens)
Max: $50/month (~5.053B M3 tokens)
Ultra: $120/month (~9.796B M3 tokens)

All plans include access to all models on the API Platform and come with 5-hour rolling and weekly quota windows. MiniMax estimates Plus supports 3-4 agents, Max 4-5 agents, and Ultra 6-7 agents.

Prepaid Credits packages offer discounts of 16.7% to 28.6% depending on purchase size, with 1,000 credits = $1.

Where to Access It

MiniMax API: platform.minimax.io
MiniMax Code: IDE with M3 integration
OpenRouter: openrouter.ai/minimax/minimax-m3
Hugging Face: weights committed but not yet published (watch huggingface.co/MiniMaxAI)

The model supports up to 1,048,576 tokens of context with a maximum output of 512,000 tokens. It handles text, image, and video inputs with text output, and MiniMax notes it can operate a desktop computer for agentic tasks.

The Bottom Line

MiniMax M3 is the most ambitious open-weight release of 2026 so far. It combines three capabilities (frontier coding, 1M context, native multimodality) that have previously only existed together in closed-source models. The pricing undercuts most frontier competitors, especially during the launch promo.

The caveats are standard for a day-old release: benchmark claims are vendor-reported, weights are not yet public, and early user reports are mixed. But if the numbers hold up under independent testing, M3 resets the expectation for what open-weight models can do, and at what cost.