Kimi 2.7 ranked 2nd after Fable 5 and before GPT-5 xhigh
We have re-run our ErdosBench smoke test on 14 problems with Kimi 2.7, Qwen 3.7 Max, and Grok 4.3, and compared them with the top performers from previous runs.
Kimi 2.7 is amazingly good. More below.
As a result of a US government directive, we are suspending access to Claude Fable 5 for all users. You can continue to use all other Claude models.
Here's what this means for you:
Across Claude products, new sessions will run on your selected default model or Opus 4.8, and existing Fable 5 sessions will end with an error.
On the Claude Platform, requests to Fable 5 will also return an error. Please update your integrations to other Claude models.
We know this is a disruption to your workflows; we appreciate your patience and support.
A bit more of the thought process behind building Vulcan Bench.
Currently testing workflows like this. Here's an example of how you could compare Opus 4.8 with GPT 5.5 at different reasoning levels:
Step 1:
vulcanbench run --suite v1 --model anthropic:claude-opus-4-8 --effort low --repeat 5
Step 2:
vulcanbench run --suite v1 --model openai:gpt-5.5 --effort medium --repeat 5
Step 3:
vulcanbench leaderboard
I am starting to realize more and more that we can't just look at and benchmark models without comparing different effort levels.
Fable 5 is what pushed me to think about this more.
I'm finding that Fable 5 at low and medium effort produces the same or better output than a lot of other models at high and xhigh.
At the same time, I'm experimenting with just normal routine tasks, and finding even Fable low is overkill.
There are soooo many tasks that Grok Build, Composer 2.5, SWE-1.6, GLM 5.1, and other models can do, at the exact same accuracy level as Fable.
And that's comparing to Fable low, on tasks where Fable Max produces the exact same output. Yes, increasing thinking depth doesn't mean the model gets it more right; sometimes small and medium problems don't need the most bleeding-edge frontier model in the world to reach the optimal solution.
We keep benchmarking models all at the same effort levels, and I think that could be a mistake. We need to look at effort as another key variable, and optimize for a combination of model and effort, coupled with task complexity and codebase size.
This is one of the things I'm thinking through more deeply with @vulcanbench, which I'm going to release, open source, this weekend.
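To make the "model plus effort plus task complexity" idea concrete, here is a rough sketch of how the selection could work: for each task tier, pick the cheapest (model, effort) combination that still clears an accuracy bar. Everything in it is an assumption for illustration: the `Run` record, the `cheapest_passing` helper, the `model-a`/`model-b` names, and the numbers are made-up placeholders, not Vulcan Bench's data model and not real benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One hypothetical benchmark result for a (model, effort) pair on a task tier."""
    model: str
    effort: str
    tier: str        # task complexity bucket, e.g. "small" or "large"
    accuracy: float  # fraction of tasks solved, 0.0 to 1.0
    cost: float      # relative cost per task (arbitrary units)


def cheapest_passing(runs, min_accuracy):
    """Return, per tier, the lowest-cost run whose accuracy meets the bar."""
    best = {}
    for run in runs:
        if run.accuracy < min_accuracy:
            continue  # this combination is not accurate enough for the tier
        current = best.get(run.tier)
        if current is None or run.cost < current.cost:
            best[run.tier] = run
    return best


# Placeholder numbers only, chosen to illustrate the selection logic.
RUNS = [
    Run("model-a", "low", "small", 0.95, 1.0),
    Run("model-a", "high", "small", 0.95, 4.0),
    Run("model-b", "low", "small", 0.95, 0.3),
    Run("model-a", "low", "large", 0.90, 3.0),
    Run("model-b", "high", "large", 0.70, 2.0),
]

picks = cheapest_passing(RUNS, min_accuracy=0.9)
for tier, run in sorted(picks.items()):
    print(f"{tier}: {run.model} @ {run.effort} (cost {run.cost})")
```

With these placeholder numbers, the small tier goes to the cheaper model at low effort because higher effort buys no extra accuracy, while the large tier still needs the stronger model.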
Okay, so I've come to the conclusion that I need a 3090, like yesterday.
Really want to run more powerful LLMs locally at home.
Relatively new territory for me, I'm far from an expert, so was chatting with Perplexity about it.
Here's what it thinks I need, hoping someone like @0xSero or @LottoLabs can let me know how right or wrong it is, and what I really need.
Trying not to break the bank so I'm okay starting small(ish)
We poured our hearts into the hand-painted horizons of Planet of Lana II, our love letter to the sweeping scales and quiet wonder of classic Ghibli adventures. ✨
A new odyssey of friendship and mystery awaits.
Lana and Mui are ready. Are you?
#indiegame #PlanetofLana
Hey Game Devs!
I'm Lya, a cozy game dev. My objective is to try out as many demos as possible during the Steam Next Fest!
If your indie game is participating and you want peer feedback, don't hesitate to drop your demo link below ✨
#SteamNextFest #NextFest #indiegames
I started learning @unity about ten years ago. It's an incredible game engine.
Built five small(ish) games, just for myself, nothing I've been proud enough to share publicly.
Some day I'd love to have the time to build a game that I'm proud enough of to share with the world. But as the founder of a software company, my days, nights, and weekends are spent with our amazing team, investors, and clients, and I wouldn't have it any other way.
That being said, I love playing games, and continue tinkering in Unity, not because I want to make money, or get a bunch of users, but just because I love games.
And for those wondering, no vibe coding in Unity yet, I'm old school, still do all my coding in Unity by hand, but likely going to play around with Codex and Opus to see what they can do.
I've spent thousands of dollars on games over the years, plan to spend thousands more, and always like to put more money into indie games, because those devs are my heroes.
If you go to @Official_GDC this year, make sure to spend a ton of time in the indie game section, that's my favorite spot, I usually spend 90% of my time there.
Here's a photo I took at GDC back in 2022, indie game dev, walking around with a laptop, super cool game, so much fun.