AI COPE
“It is just autocomplete.”
“It only predicts the next token.”
“It cannot reason.”
“It only memorized benchmark answers.”
“Okay, but benchmarks are fake.”
“Okay, but olympiad math is narrow.”
“Okay, but open math problems are searchable.”
“Okay, but humans verified the proof.”
“Okay, but it used known theorems.”
“Okay, but it needs compute.”
“Okay, but it cannot make coffee.”
“Okay, but robotics is different.”
“Okay, but physical labor is safe.”
“Okay, but people prefer human service.”
“Okay, but AGI is not consciousness.”
“Okay, but it did not invent mathematics from raw hydrogen atoms.”
The last surviving goalpost will be floating somewhere beyond Neptune wearing a parachute.
@ns123abc This is an old video from January. The latest OpenAI solution isn't brute force, it is a new connection across fields, and it matters. He is basically commenting on the OpenAI hype from last year, but ironically AI somehow got there so fast.
@GaryMarcus @ls_brd @emollick He gave it some hints (it wasn't done in one shot, it took multiple steps) because he already had the solution. He might not have been able to give those hints without knowing the solution beforehand.
@wtgowers It usually doesn't follow those kinds of rules. With a six-minute thinking time, it definitely searched for the answer. You need to use the API instead of the chat interface.
@jimstewartson @mathelirium Imagine watching a video about decompiling a Transformer’s Feed-Forward Network (FFN) and MoE layers into sparse feature vectors to optimize inference pathways, and your takeaway is "look, a text database." 💀
Confidently talking about architecture you clearly don't understand.
@rickasaurus “Just a search problem” is the new “just a calculator.” The AI found a bridge nobody imagined, using deep number theory, and overturned an 80-year-old belief. If inventing a whole new connection that stuns Fields medalists isn’t new maths, nothing is.
@N8Programs I think it has the same param size as flash 3, and they simply increased the price like Haiku. If it was a new pretrain, then the knowledge cutoff would not still be Jan 2025 lol
@Rob3rtWozny @scaling01 https://t.co/1Mb3aPEtk8 is better than Artificial Analysis, there are some issues with some of their benchmarks. The Flash score is lower mainly because of the Terminal-Bench result, but both Google and Vals reported it is better than 3.1 Pro on Terminal-Bench.
@Lentils80 It's due to the Terminal-Bench score, the overall number is just the average of all of them. It seems they messed something up. It is better on Terminal-Bench, as reported by Google.
@sarthmit The embeddings are not comparable to direct weights. They do not add compute, they just need slightly more RAM. Besides, it has vision and native audio support, which accounts for the additional size. Qwen 3 is not close to it, enable thinking.