Feb 23, 2026

Why we no longer evaluate SWE-bench Verified

OpenAI Blog, Feb 23, 2026

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.