Changelog
Updates to the AI Frontier Model Tracker - new models, data corrections, benchmark additions, and feature changes.
April 25, 2026
GPT-5.5, DeepSeek V4, Tencent Hy3
- Added GPT-5.5 (April 23) - first fully retrained base since GPT-4.5, 1M context, multimodal, $5/$30 per 1M tokens
- Added GPT-5.5 Pro (April 24) - higher-accuracy variant, $30/$180 per 1M tokens
- Added DeepSeek V4 Pro (April 24) - 1.6T/49B active, 90.1% GPQA Diamond, 80.6% SWE-bench, MIT license
- Added DeepSeek V4 Flash (April 24) - 284B/13B active, 88.1% GPQA Diamond, $0.14/$0.28, MIT license
- Added Tencent Hy3 Preview (April 23) - 295B/21B active, 87.2% GPQA Diamond, 74.4% SWE-bench, new provider
- DeepSeek V4 and Hy3 benchmarks verified against HuggingFace model cards
- GPT-5.5 benchmarks set to null pending primary source verification (OpenAI blog inaccessible)
- 66 models across 16 providers
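Because some benchmark scores are set to null, any ranking on a benchmark column has to decide where the missing values go. The following is a minimal sketch of one way to do that, placing nulls last. It is not the tracker's actual sorting code; the function name `sort_by_score` is hypothetical, and the GPQA Diamond values are the ones listed in this entry.

```python
from typing import Optional

# GPQA Diamond scores from the April 25 entry; None marks a score set to null.
gpqa_diamond: dict[str, Optional[float]] = {
    "GPT-5.5": None,
    "DeepSeek V4 Pro": 90.1,
    "DeepSeek V4 Flash": 88.1,
    "Tencent Hy3 Preview": 87.2,
}


def sort_by_score(scores: dict[str, Optional[float]]) -> list[str]:
    """Return model names ordered by score, highest first, nulls last.

    Hypothetical helper for illustration only.
    """
    return sorted(
        scores,
        # First key element pushes None to the end; second sorts descending.
        key=lambda name: (scores[name] is None, -(scores[name] or 0.0)),
    )


ranked = sort_by_score(gpqa_diamond)
print(ranked)
```

Sorting on a tuple keeps null handling separate from the numeric comparison, so a model with no verified score never outranks one with a published score.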
April 23, 2026
7 new models, MCP server improvements
- Added Claude Opus 4.7 (April 16) - 87.6% SWE-bench Verified, 94.2% GPQA Diamond, xhigh effort level, 3.75MP vision
- Added GPT-Rosalind (April 16) - first domain-specific OpenAI model for life sciences
- Added Grok 4.3 Beta (April 17) - 2M context, native video understanding, SuperGrok Heavy only ($300/mo)
- Added Qwen3.6-Max-Preview (April 20) - top scores on 6 coding benchmarks, free preview on Bailian
- Added Qwen3.6-27B (April 22) - dense 27B open-weight vision model, 87.8% GPQA Diamond, 77.2% SWE-bench, 83.9% LiveCodeBench v6, Apache 2.0
- Added GPT-5.3-Codex (February 24) - agentic coding model, SOTA on SWE-Bench Pro, 400K context, $1.75/$14
- Added Gemini 3.1 Flash Lite (March 3) - most cost-efficient Google model, 1M context, $0.25/$1.50 per 1M tokens
- MCP server: improved tool descriptions with attribution URLs, reduced default limit from 50 to 25, disambiguated get/search tool pairs, added freshness cadence info
- 60 models across 15 providers
April 17, 2026
Qwen3.6-Plus and Qwen3.6 35B-A3B
- Added Qwen3.6-Plus (April 2, 2026) - proprietary flagship with 1M native context, 78.8% SWE-bench Verified
- Added Qwen3.6 35B-A3B (April 15, 2026) - open-weight MoE with 3B active / 35B total params, runs on laptops, Apache 2.0
- 44 models across 15 providers
April 12, 2026
Meta Muse Spark, Cohere Command A, nav restructure, production prep
- Added Meta Muse Spark (April 8, 2026) - first model from Meta Superintelligence Labs
- Added Cohere Command A (March 13, 2025) - 111B enterprise RAG model
- All provider logos now local (Anthropic, Google, Meta, Qwen, AWS, Cohere, IBM)
- Download Weights icons sourced from LobeHub (HuggingFace, Ollama dark mode)
- Resources nav restructured as mega menu with Research & Data section
- /research/ landing page created
- Newsletter forms fixed (Pipedrive marketing_status, correct worker URL)
- Subscribers differentiated: General Newsletter vs AI Model Tracker Updates
- Exact release dates for all 42 models (verified against official sources)
- Full site audit: 407 pages, 100% schema coverage, average AEO score of 72
- 42 models across 14 providers
April 11, 2026
Selectable benchmark columns, new subpages, structural overhaul
- 10 benchmark columns now available (MMLU-Pro, GPQA Diamond, SWE-bench, HumanEval, LiveCodeBench, MATH, AIME, HLE, $/M In, $/M Out)
- Column picker - choose up to 5 visible columns; the selection persists in localStorage
- Default sort changed to GPQA Diamond
- Cost calculator page
- JSON API endpoint (CC BY-NC 4.0)
- Releases timeline page with RSS feed
- Newsletter signup for tracker updates
- Sticky filter bar and table headers
- Hero redesigned with card links to subpages
- All URLs moved to /research/demandsphere-radar/ai-frontier-model-tracker/
- Proper breadcrumbs on all pages
- Added to nav menu and resource center
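The $/M In and $/M Out columns lend themselves to a simple per-request cost estimate of the kind a cost calculator would produce. The sketch below is an illustration under stated assumptions, not the calculator page's implementation: the function `request_cost` is hypothetical, prices are assumed to be USD per 1M tokens, and the example uses the $5/$30 figures listed for GPT-5.5 in this changelog.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    price_in_per_m: float,
    price_out_per_m: float,
) -> float:
    """Estimate the USD cost of one request.

    Hypothetical helper; assumes prices are quoted per 1M tokens.
    """
    input_cost = (input_tokens / 1_000_000) * price_in_per_m
    output_cost = (output_tokens / 1_000_000) * price_out_per_m
    return input_cost + output_cost


# Example: 200K input tokens and 50K output tokens at $5 in / $30 out.
cost = request_cost(200_000, 50_000, price_in_per_m=5.0, price_out_per_m=30.0)
print(f"${cost:.2f}")
```

Input and output are priced separately because the two rates differ widely for most models listed here, so output-heavy workloads cost disproportionately more.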
April 10, 2026
Data corrections and new models
- Full data audit - corrected 30+ benchmark scores, 6 release dates, 3 context windows, 4 pricing values
- Added Gemma 4 31B, Gemma 4 26B-A4B, Qwen3.5 397B-A17B, GPT-5.4 Mini
- Removed legacy models (GPT-4o, o1, GPT-5.2, Claude 3.5 Sonnet)
- Citations tab added with per-model source links
- Recent Releases feed section
- Provider count now dynamic
- Fixed license claims (Kimi K2 Apache->Modified MIT, Hermes 405B Apache->Llama 3.1)
- Set unpublished scores to null (Llama 4 HumanEval/MATH, DeepSeek HumanEval)
April 9, 2026
Initial launch
- 40 frontier models across 14 providers
- Sortable table with MMLU-Pro, HumanEval, MATH benchmarks
- Expandable rows with Detail, Benchmarks chart, News feed, Download Weights
- Filters: type, access, SOTA, multimodal
- Provider logos for OpenAI, xAI, DeepSeek, Moonshot, MiniMax, Mistral, Microsoft, Nous
- FAQ schema, OG image