Each problem o4-mini couldn't solve would earn the mathematician who came up with it a $7,500 reward. The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would finalize the last batch of challenge questions. The 30 attendees were split into groups of six. For two days, the academics raced to devise problems that they could solve but that would trip up the AI reasoning bot.
By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group’s progress. “I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,” he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler “toy” version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. “It was starting to get really cheeky,” says Ono, who is also a freelance mathematical consultant for Epoch AI. “And at the end, it says, ‘No citation necessary because the mystery number was computed by me!’”
Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. “I was not prepared to be contending with an LLM like this,” he says. “I’ve never seen that kind of reasoning before in models. That’s what a scientist does. That’s frightening.”
Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a “strong collaborator.” Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, “This is what a very, very good graduate student would be doing—in fact, more.”
The bot was also much faster than a professional mathematician, taking mere minutes to complete work that would take a human expert weeks or months.
Sparring with o4-mini was thrilling, but its progress was also alarming. Ono and He express concern that o4-mini’s results might be trusted too much. “There’s proof by induction, proof by contradiction, and then proof by intimidation,” He says. “If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”
By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable “tier five”—questions that even the best mathematicians couldn’t solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning bots to help them discover new mathematical truths, much the same as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics alive for future generations.
“I’ve been telling my colleagues that it’s a grave mistake to say that generalized artificial intelligence will never come, [that] it’s just a computer,” Ono says. “I don’t want to add to the hysteria, but in some ways these large language models are already outperforming most of our best graduate students in the world.”
It's more of what we've already seen: "sassy" computers capable of doing a great deal by themselves. I think having to work with a computer that thought it was funny would be very trying indeed. It would be way worse if it actually WERE funny.
And does it really think of itself as a "me"? That should perhaps terrify us.
But why not? I read a fascinating one from Scientific American the other day about animals and how well they could use language to "think". Turns out you can apparently make up a bunch of buttons with words on them that your pooch can use to create phrases rather than just words. The woman's dog (I think a bloodhound) learned the meaning of "love you" and used it appropriately. But one day the woman was busy with something when the dog wanted to go out for a walk.
So the dog went to the buttons and pressed "love you" and "no". It was having a hissy fit, and conveying that information to its owner. And apparently all kinds of animals take to this technique...