Researchers at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (CSAIL), King Abdullah University of Science and Technology, and the artificial intelligence research organisation HUMAIN have released what is being described as the world’s largest and most diverse collection of Olympiad-level mathematics problems, a resource with significant implications for both artificial intelligence research and mathematics education worldwide. The dataset, named MathNet, will be presented at the International Conference on Learning Representations in Brazil later this month and is freely available to the public through CSAIL.
MathNet comprises more than 30,000 expert-authored problems and solutions drawn from 143 competitions across 47 countries and 17 languages, making it five times larger than the next biggest dataset of its kind. The scale alone sets it apart, but what distinguishes MathNet more fundamentally from its predecessors is geographic and linguistic breadth. Previous Olympiad-level datasets draw almost exclusively from competitions in the United States and China; MathNet instead reaches dozens of countries on six continents, includes both text-based and image-based problems and solutions, and stretches across four decades of competition mathematics. Building the dataset required tracking down 1,595 PDF volumes totalling more than 25,000 pages, including decades-old scans in more than a dozen languages. A significant portion of that archive came from co-author Navid Safaei, a longtime figure in the International Mathematical Olympiad community who had been collecting and scanning national competition booklets by hand since 2006, and whose personal archive formed the backbone of the dataset. The solutions in those booklets are expert-written and peer-reviewed, often running to multiple pages as authors walk through several distinct approaches to the same problem, giving artificial intelligence models a far richer training signal than the shorter, informal solutions typical of community-sourced datasets.
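The article does not describe how MathNet is packaged, but a dataset of this shape is naturally modelled as one record per problem. The sketch below is purely illustrative: the OlympiadProblem fields and the by_language helper are hypothetical names invented here, not MathNet’s published schema.

```python
from dataclasses import dataclass, field

# Hypothetical record layout for a MathNet-style entry; the field names
# are illustrative assumptions, not the dataset's actual schema.
@dataclass
class OlympiadProblem:
    statement: str                # problem text (may reference a figure)
    solutions: list[str]          # one or more expert-written solutions
    language: str                 # e.g. "en", "fa", "mn"
    country: str                  # country hosting the competition
    competition: str              # competition name
    year: int
    figure_paths: list[str] = field(default_factory=list)  # empty if text-only


def by_language(problems: list[OlympiadProblem], lang: str) -> list[OlympiadProblem]:
    # Select the subset in one language, the kind of slicing that
    # per-language or text-versus-figure comparisons rely on.
    return [p for p in problems if p.language == lang]
```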
Testing on MathNet reveals that even the most capable frontier models struggle at this level of mathematical reasoning. GPT-5, the top-performing model tested, averaged 69.3 percent on MathNet’s main benchmark of 6,400 problems, failing nearly one in three Olympiad-level problems, and performance drops significantly across the board when problems include figures, exposing visual reasoning as a consistent weak point. Several open-source models scored zero percent on Mongolian-language problems, underlining how brittle current artificial intelligence systems remain in less common languages despite their overall capabilities.

Beyond problem-solving, MathNet introduces a retrieval benchmark that asks whether models can recognise when two problems share the same underlying mathematical structure. Of eight state-of-the-art embedding models tested, even the strongest identified the correct match only about 5 percent of the time on the first attempt, and the models frequently ranked structurally unrelated problems as more similar than mathematically equivalent ones.

For the broader mathematics community, the dataset also addresses a longstanding gap: Olympiad problem booklets shared between national delegations had never been systematically collected and made publicly accessible, leaving students in many countries to train for these competitions largely in isolation. Lead author Shaden Alshammari, an MIT doctoral student, noted that for many students Olympiad preparation had always been an individual effort with no communal resource to draw from, a gap MathNet is now designed to close.
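The retrieval result described above corresponds to a standard top-1 retrieval accuracy. The minimal sketch below shows how such a score is computed, assuming query and candidate problems have already been mapped to embedding vectors; the function name and the array-based interface are illustrative assumptions, not the benchmark’s actual evaluation code.

```python
import numpy as np

def top1_retrieval_accuracy(query_vecs: np.ndarray,
                            corpus_vecs: np.ndarray,
                            gold: list[int]) -> float:
    """Fraction of queries whose nearest candidate (by cosine similarity)
    is the annotated structural match.

    query_vecs  : (Q, D) array, one embedding per query problem
    corpus_vecs : (C, D) array, one embedding per candidate problem
    gold        : gold[i] is the corpus index of query i's true match
    """
    # L2-normalise rows so a plain dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = q @ c.T                    # (Q, C) similarity matrix
    top1 = sims.argmax(axis=1)        # index of nearest candidate per query
    return float((top1 == np.asarray(gold)).mean())
```

On a metric like this, the roughly 5 percent figure would mean the structurally matching problem is the single nearest neighbour for only about one query in twenty, which is the sense in which the embedding models often rank unrelated problems above mathematically equivalent ones.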