Ornith-1.0-35B vs Qwen-3.6-27B
I asked both to build a Tetris game as a single HTML file. First, here's Ornith 35B.
I think it did a fine job. Every control works, and the game looks good visually.
How did Qwen 3.6 27B do? See post #2.
Gemma 4 is the first multimodal model on Cerebras!
What can you build with Gemma 4 31B running at 1500 tokens per second?
Join the Cerebras x Gemma 4 24-hour virtual hackathon this Sunday to compete for $5,000 in prizes.
Participants get early access to Gemma 4 on Cerebras.
Magnitude scores 75.5% on Terminal-Bench 2.1, making it the top coding agent for GLM 5.2.
npm i -g @magnitudedev/cli
See how we did it (and kept it fair) below ⬇️
ROYAL RUMBLE: M3 Ultra vs M5 Max vs Spark ⚡️ 🧵
- Model: deepreinforce-ai/Ornith-1.0-35B-GGUF:Q8_0
- Server: llama.cpp (9824) - args used in comments
- Cache not used, to simulate a single large context load each time
Notes:
- On large contexts, DGX Spark on llama.cpp is strong, but keep in mind that starting with a 128K context in the first message is not so common, especially in coding sessions
- In the detailed charts below, the DGX Spark results show "M3 Ultra" because I ran the benchmark remotely; I'm working to fix this.
- I will test with MLX soon using 8-bit 💪
Enjoy the results and share your feedback or requests for more tests!
True story: I've stopped thinking about context since GPT 5.3 Codex
Single-project threads, combined with Codex's recent ability to spin off new threads, are goated!
Codex continues and goes through compaction but remembers all the important stuff, and if it doesn't, it'll look back through the session and find the relevant info
This is also why /goal is so effective
Fable is currently export controlled & rumors are that 5.6 will also be subject to an approval framework. Whatever jiu jitsu the Chinese are using to get us to slow down our own frontier models while letting their models run free appears to be working. Who is capturing who? 🧐🇺🇸
The US AI pay-to-play scam is so much more tolerable after switching to a locally hosted GLM-5.2. From the front page of HN, open weights will be the frontier this December. Sorry about your IPOs.
I am so deeply disappointed by the US gov. It was apparently a mistake to think this administration had the good sense to support American AI models.
However, I put even more blame on the cult of Anthropic & Dario Amodei. All that fearmongering both achieved its goal & backfired!
Open source just passed American labs in market share.
US models on OpenRouter collapsed from 73% to 33% in one year.
Chinese open source models surged.
Then the US banned Fable 5.
Then gated GPT 5.6 customer by customer.
We are watching the US hand the developer market to China in real time.
Not as relevant now :-( I had an opportunity to deeply test both Fable 5 and GPT-5.6 Max. 5.6 is clearly better than Opus 4.8 at everything (slightly faster, too, though that depends on the load). Vis-à-vis Fable, it is clearly worse on coding, but better on agentic workloads. I had Fable write code, 5.6 run experiments - dreamy…