Minimax M3 leads Qwen3.7 in all overlapping benches, and significantly in ClawEval.
At least according to their own self-reported numbers. Vibe test consensus on M3 is kinda mid so far.
The earliest she could have got on was around S Lander St, meaning she was driving on light rail tracks for one and a half miles, much of it underground.
How does this even happen⁉️😭
A woman somehow drove her car onto Seattle’s elevated light rail tracks at Mount Baker Station on Wednesday evening, bringing train service to a halt. 😳
Witnesses say the driver told people she was “following GPS” after ending up on the tracks and driving a significant distance before getting stuck. The vehicle had to be removed from the guideway, causing major delays for riders across the 1 Line. #DUBSEA
This is how native Chinese peeps sound before learning English grammar and intricacies. Maybe this is why Mandarin is so token efficient when prompting LLMs?
I just saw Codex leak a thinking trace that might explain why it is more token efficient. Small sample:
"Or if no org scope, keep legacy-only? But then write path not semantic. Could create report? No. Need ask user."
Codex thinks in grug brain to save tokens.