VulcanBench @vulcanbench - Twitter Profile

VulcanBench @vulcanbench

about 1 hour ago

@prz_chojecki For Opus 4.8, what effort level was used for this benchmark?

VulcanBench @vulcanbench

about 1 hour ago

Need to benchmark Kimi.

Przemek Chojecki | PC

@prz_chojecki

about 11 hours ago

Kimi 2.7 ranked 2nd after Fable 5 and before GPT-5 xhigh.

We have re-run our ErdosBench smoke test on 14 problems with Kimi 2.7, Qwen 3.7 Max, and Grok 4.3, and compared the results with the top performers from previous runs.

Kimi 2.7 is amazingly good. More below. https://t.co/pD1EFRJbAy

VulcanBench @vulcanbench

about 17 hours ago

Interesting evening.

ClaudeDevs

@ClaudeDevs

about 20 hours ago

As a result of a US government directive, we are suspending access to Claude Fable 5 for all users. You can continue to use all other Claude models. Here’s what this means for you: Across Claude products, new sessions will run on your selected default model or Opus 4.8, and existing Fable 5 sessions will end with an error. On the Claude Platform, requests to Fable 5 will also return an error. Please update your integrations to other Claude models. We know this is a disruption to your workflows; we appreciate your patience and support.
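
For integrations pinned to a single model, one way to ride out a suspension like this is to try an ordered list of models and fall back when a call errors. Below is a minimal TypeScript sketch, assuming a caller-supplied `send` function (hypothetical, not any SDK's API) that rejects when a model is unavailable; `sendWithFallback` and the model IDs in the usage note are placeholders.

```typescript
// Hypothetical shape of a model call: resolves to the reply text, or rejects
// (e.g. when the requested model has been suspended and the API returns an error).
type SendFn = (model: string, prompt: string) => Promise<string>;

// Try each model in order and return the first successful reply.
// Errors from earlier models are collected so the caller can see why they failed.
async function sendWithFallback(
  send: SendFn,
  models: string[],
  prompt: string,
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const model of models) {
    try {
      const text = await send(model, prompt);
      return { model, text };
    } catch (err) {
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All models failed:\n${errors.join("\n")}`);
}
```

For example, `sendWithFallback(send, ["model-a", "model-b"], prompt)` returns the first successful reply plus the model that produced it. Falling back on every error is a simplification; a production integration would more likely fall back only on model-unavailable errors and surface the rest.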

VulcanBench @vulcanbench

about 21 hours ago

@theo This is a good one Theo, thanks for putting it together.

VulcanBench @vulcanbench

about 21 hours ago

@rauchg Yes it is.

VulcanBench @vulcanbench

about 21 hours ago

Here's a bit more of the thought process behind building VulcanBench. I'm currently testing workflows like this one, which compares Opus 4.8 with GPT 5.5 at different reasoning levels:

Step 1: vulcanbench run --suite v1 --model anthropic:claude-opus-4-8 --effort low --repeat 5
Step 2: vulcanbench run --suite v1 --model openai:gpt-5.5 --effort medium --repeat 5
Step 3: vulcanbench leaderboard

Morgan

@morganlinton

about 23 hours ago

I am starting to realize more and more that we can’t just look at and benchmark models without comparing different effort levels. Fable 5 is what pushed me to think about this more.

I’m finding that Fable 5 at low and medium effort produces the same or better output than a lot of other models at high and xhigh.

At the same time, I’m experimenting with just normal routine tasks, and finding that even Fable low is overkill. There are soooo many tasks that Grok Build, Composer 2.5, SWE-1.6, GLM 5.1, and other models can do at the exact same accuracy level as Fable. And that’s comparing to Fable low, on tasks where Fable Max produces the exact same output.

Yes, increasing thinking depth doesn’t necessarily mean the model gets it more right; sometimes small and medium problems don’t need the most bleeding-edge frontier model in the world to reach the optimal solution.

We keep benchmarking models all at the same effort levels, and I think that could be a mistake. We need to look at effort as another key variable, and optimize for a combination of model and effort, coupled with task complexity and codebase size.

This is one of the things I’m thinking through more deeply with @vulcanbench, which I’m going to release, open source, this weekend.
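
To make the model-plus-effort idea concrete, here is a minimal TypeScript sketch, assuming each run is recorded as a hypothetical `RunResult` (model, effort, score, cost). It averages repeated runs per (model, effort) pair, ranks the pairs, and then picks the cheapest pair whose mean score is within a chosen tolerance of the best. The record shape, the cost field, and the tolerance rule are assumptions for illustration, not how VulcanBench actually scores or ranks runs.

```typescript
// One benchmark run. Field names are hypothetical, not VulcanBench's format.
interface RunResult {
  model: string;
  effort: "low" | "medium" | "high" | "xhigh";
  score: number; // e.g. fraction of tasks solved, 0..1
  costUsd: number;
}

interface ConfigSummary {
  model: string;
  effort: RunResult["effort"];
  meanScore: number;
  meanCostUsd: number;
  runs: number;
}

// Average repeated runs (think --repeat 5) for each (model, effort) pair.
function summarize(results: RunResult[]): ConfigSummary[] {
  const groups = new Map<string, RunResult[]>();
  for (const r of results) {
    const key = `${r.model}|${r.effort}`;
    const list = groups.get(key) ?? [];
    list.push(r);
    groups.set(key, list);
  }
  const summaries: ConfigSummary[] = Array.from(groups.values()).map((runs) => ({
    model: runs[0].model,
    effort: runs[0].effort,
    meanScore: runs.reduce((sum, r) => sum + r.score, 0) / runs.length,
    meanCostUsd: runs.reduce((sum, r) => sum + r.costUsd, 0) / runs.length,
    runs: runs.length,
  }));
  // Leaderboard order: highest mean score first, cheaper configuration wins ties.
  return summaries.sort(
    (a, b) => b.meanScore - a.meanScore || a.meanCostUsd - b.meanCostUsd,
  );
}

// Cheapest configuration whose mean score is within `tolerance` of the best,
// instead of always defaulting to the highest effort level.
function cheapestGoodEnough(
  summaries: ConfigSummary[],
  tolerance: number,
): ConfigSummary | undefined {
  const best = Math.max(...summaries.map((s) => s.meanScore));
  const candidates = summaries.filter((s) => s.meanScore >= best - tolerance);
  if (candidates.length === 0) return undefined;
  return candidates.reduce((a, b) => (b.meanCostUsd < a.meanCostUsd ? b : a));
}
```

With a tolerance of 0 this simply returns the cheapest of the top scorers; widening the tolerance is how a lower effort level that is "good enough" for routine tasks would surface.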

VulcanBench @vulcanbench

3 days ago

@kieranklaassen Super interesting, thanks for sharing Kieran.

VulcanBench @vulcanbench

3 days ago

@karpathy Incredible benchmarks.

VulcanBench @vulcanbench

3 days ago

@theo Yes!

vulcanbench retweeted

Morgan

@morganlinton

about 2 months ago

Okay, so I've come to the conclusion that I need a 3090, like yesterday.

Really want to run more powerful LLMs locally at home.

Relatively new territory for me, I'm far from an expert, so I was chatting with Perplexity about it.

Here's what it thinks I need. Hoping someone like @0xSero or @LottoLabs can let me know how right or wrong it is, and what I really need.

Trying not to break the bank, so I'm okay starting small(ish) 🤏
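
A common back-of-envelope check for whether a model fits on a card: the weights alone take roughly parameters × bits per weight ÷ 8 bytes, so a 7B model needs about 14 GB at 16-bit and about 3.5 GB at 4-bit. The TypeScript sketch below encodes that estimate against the RTX 3090's 24 GB of VRAM. The function names are illustrative, and the 2 GB headroom for KV cache and runtime overhead is an assumed placeholder; real usage grows with context length and depends on the inference runtime.

```typescript
// The RTX 3090 ships with 24 GB of VRAM.
const RTX_3090_VRAM_GB = 24;

// Memory for the weights alone: each billion parameters at `bitsPerWeight`
// takes bitsPerWeight / 8 GB (decimal). Ignores KV cache, activations, overhead.
function weightMemoryGB(paramsBillions: number, bitsPerWeight: number): number {
  return paramsBillions * (bitsPerWeight / 8);
}

// Do the weights fit while leaving `headroomGB` for everything else?
// The default headroom is an assumption, not a measured figure.
function weightsFit(paramsBillions: number, bitsPerWeight: number, headroomGB = 2): boolean {
  return weightMemoryGB(paramsBillions, bitsPerWeight) + headroomGB <= RTX_3090_VRAM_GB;
}
```

By this estimate a 70B model at 4-bit (about 35 GB of weights) would not fit on a single 3090, while 7B to 13B models at 8-bit or lower leave plenty of room.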

VulcanBench @vulcanbench

4 months ago

This game looks absolutely beautiful.

Planet of Lana II - Out Now! 🍃

@PlanetofLana

4 months ago

We poured our hearts into the hand-painted horizons of Planet of Lana II, our love letter to the sweeping scales and quiet wonder of classic Ghibli adventures. ✨ A new odyssey of friendship and mystery awaits. Lana and Mui are ready. Are you? 🐾 #indiegame #PlanetofLana

VulcanBench @vulcanbench

4 months ago

@PlanetofLana Phew, absolutely love the visual style here.

VulcanBench @vulcanbench

4 months ago

@gelius__ I'll be here lurking, taking notes.

VulcanBench @vulcanbench

4 months ago

@protzz_ Ohhh, love this idea.

VulcanBench @vulcanbench

4 months ago

Okay, kicking off an all-night feature build.

VulcanBench @vulcanbench

4 months ago

@clemmygames Yesssss

VulcanBench @vulcanbench

4 months ago

@MrGemezl Looking good!

vulcanbench retweeted

Lya Mgtt ✧ Indie Game Dev @Lya_Mgtt

4 months ago

Hey Game Devs! 🌸 I'm Lya, a cozy game dev, and my objective is to try out as many demos as possible during the Steam Next Fest! If your indie game is participating and you want peer feedback, don't hesitate to drop your demo link below ✨🚀

#SteamNextFest #NextFest #indiegames https://t.co/kngcu1BX5O

VulcanBench @vulcanbench

4 months ago

Support indie game devs.

Morgan

@morganlinton

4 months ago

I started learning @unity about ten years ago. It's an incredible game engine.

Built five small(ish) games, just for myself, nothing I've been proud enough to share publicly.

Some day I'd love to have the time to build a game I'm proud enough to share with the world. But as the founder of a software company, my days, nights, and weekends are spent with our amazing team, investors, and clients, and I wouldn't have it any other way.

That being said, I love playing games and continue tinkering in Unity, not because I want to make money or get a bunch of users, but just because I love games.

And for those wondering, no vibe coding in Unity yet. I'm old school and still do all my coding in Unity by hand, but I'm likely going to play around with Codex and Opus to see what they can do.

I've spent thousands of dollars on games over the years, plan to spend thousands more, and always like to put more money into indie games, because those devs are my heroes.

If you go to @Official_GDC this year, make sure to spend a ton of time in the indie game section. That's my favorite spot; I usually spend 90% of my time there.

Here's a photo I took at GDC back in 2022: an indie game dev walking around with a laptop. Super cool game, so much fun.

VulcanBench @vulcanbench

4 months ago

Just finished a new feature build for Terminal Forge. I'm going to make Terminal Forge open source; I just need to get v1 built and running smoothly first. Here's what's new in this update.

Hardened deterministic multiplayer with chaos transport and recovery.

Added a full lockstep hardening pass across transport simulation, host/client resilience, replay validation, demos, tests, and docs.

- Added configurable in-memory network simulation with latency/drop/duplicate/reorder, seeded determinism, and packet stats.
- Extended host lockstep runtime with:
  - frame timeout fallback for missing remote inputs
  - ACK-gap frame resend and snapshot fallback for stale clients
  - richer host events/metrics and stricter peer validation
  - host-input frame gating (never fabricate host input)
- Extended client runtime with:
  - optional input delay
  - frame-gap timeout resync requests
  - checksum mismatch events and richer client metrics
- Strengthened replay verification with:
  - tape version validation
  - frame sequence continuity checks
  - missing-player-input detection
- Added new chaos sample demo for lossy-network lockstep convergence.
- Added/expanded lockstep tests for sync, desync, rejoin/resync, timeout fallback, resend recovery, duplicate delivery stats, and replay sequence validation.
- Updated exports and runtime compatibility:
  - explicit lockstep barrel exports to avoid value-import loss in tool runtime rewrites
  - direct-execution guards for lockstep demos
- Updated docs and reporting:
  - README lockstep toolkit section + demo commands + project layout updates
  - engineering report addendum for the lockstep hardening pass
  - added demo script for `demo:lockstep-chaos`

Validation:
- npm test (48/48 passing)
- npm run demo:lockstep
- npm run demo:lockstep-chaos
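
The network-simulation item in this update (latency/drop/duplicate/reorder with seeded determinism and packet stats) can be sketched roughly as follows. This is a minimal TypeScript sketch, assuming a hypothetical `SimulatedLink` class and config field names, not Terminal Forge's actual code. A seeded PRNG (mulberry32, swapped in here for illustration) drives every random decision, so the same seed replays the exact same drops, duplicates, and delivery order; reordering falls out of giving each packet its own random latency.

```typescript
// Knobs for the simulated link (hypothetical names, not Terminal Forge's API).
interface LinkConfig {
  seed: number;
  minLatencyMs: number;
  maxLatencyMs: number;
  dropRate: number; // probability a packet is lost
  duplicateRate: number; // probability a surviving packet arrives twice
}

interface Delivery<T> {
  packet: T;
  deliverAtMs: number;
}

// mulberry32: small seeded PRNG returning floats in [0, 1).
// Same seed => same sequence, which keeps the whole simulation deterministic.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

class SimulatedLink<T> {
  private readonly cfg: LinkConfig;
  private readonly rand: () => number;
  private inFlight: Delivery<T>[] = [];
  readonly stats = { sent: 0, dropped: 0, duplicated: 0, delivered: 0 };

  constructor(cfg: LinkConfig) {
    this.cfg = cfg;
    this.rand = mulberry32(cfg.seed);
  }

  // Uniform integer latency in [minLatencyMs, maxLatencyMs].
  private latency(): number {
    const { minLatencyMs, maxLatencyMs } = this.cfg;
    return minLatencyMs + Math.floor(this.rand() * (maxLatencyMs - minLatencyMs + 1));
  }

  send(packet: T, nowMs: number): void {
    this.stats.sent++;
    if (this.rand() < this.cfg.dropRate) {
      this.stats.dropped++;
      return;
    }
    // Each copy gets its own random latency, so later packets can overtake
    // earlier ones: that is where reordering comes from.
    this.inFlight.push({ packet, deliverAtMs: nowMs + this.latency() });
    if (this.rand() < this.cfg.duplicateRate) {
      this.stats.duplicated++;
      this.inFlight.push({ packet, deliverAtMs: nowMs + this.latency() });
    }
  }

  // Hand back every packet due by nowMs, ordered by arrival time.
  // Array.prototype.sort is stable (ES2019+), so ties keep send order.
  poll(nowMs: number): T[] {
    const due = this.inFlight.filter((d) => d.deliverAtMs <= nowMs);
    this.inFlight = this.inFlight.filter((d) => d.deliverAtMs > nowMs);
    due.sort((a, b) => a.deliverAtMs - b.deliverAtMs);
    this.stats.delivered += due.length;
    return due.map((d) => d.packet);
  }
}
```

Running the same seed twice yields an identical delivery sequence, which is what makes a lossy-network lockstep test reproducible when it fails.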
