📈Steven Stadler 🇵🇭🇩🇪 @joermungandr - Twitter Profile

📈Steven Stadler 🇵🇭🇩🇪

@joermungandr

2 days ago

@the2ndfloorguy I need to build this for my CTO 😂

joermungandr retweeted

Gergely Orosz

@GergelyOrosz

11 days ago

You cannot make this up: Meta nuked teams like Integrity so bad that services are without oncall coverage 💀 Let me spell it out: it’s more important for Zuck to reassign devs from security/integrity teams to do data labelling than for these teams to have functioning oncalls…

joermungandr retweeted

Mo

@atmoio

2 months ago

AI is giving every CEO the same advice

📈Steven Stadler 🇵🇭🇩🇪

@joermungandr

2 months ago

It seems like people now think SaaS is dying because everyone can just vibe code everything, but reality shows quality is degrading and it's not as easy as they think. https://t.co/gfpTTEHSFX

📈Steven Stadler 🇵🇭🇩🇪

@joermungandr

2 months ago

This misses the point that Mythos and future models could commoditize zero-day discovery. His claim that it's easy and people just don't bother feels more like ego than analysis. If LLMs can soon do what he does, the incentive structure changes completely, which is exactly why cybersecurity risk matters.

joermungandr retweeted

Chris Hayduk

@ChrisHayduk

2 months ago

I strongly suspect that Claude Mythos is a looped language model, as described in the paper "Scaling Latent Reasoning via Looped Language Models" from ByteDance.

The authors of that paper called out graph search as one of the areas where looping provides a huge theoretical advantage over standard RLVR. And look at where Mythos blows out its competitors the most.

📈Steven Stadler 🇵🇭🇩🇪

@joermungandr

2 months ago

Most teams using LLMs as classifiers in automated decisions never calibrate their confidence scores. Many don't even generate them at all. 😱 How do you approach this? #ai #llm
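
One way to approach it, sketched under assumptions: if each LLM classification logs a confidence score in [0, 1] together with the ground-truth label that eventually comes back, the expected calibration error (ECE) measures how far stated confidence drifts from observed accuracy. The function name `expected_calibration_error` and the default of 10 equal-width bins are illustrative choices, not taken from any particular library.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Expected Calibration Error with equal-width confidence bins.

    Hypothetical helper (not from any library). Assumes:
    - confidences: the model's stated confidence for its predicted label, in [0, 1]
    - correct: whether that prediction matched the ground-truth label
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Each bin accumulates [count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        # Map the confidence to a bin index; exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(ok)

    total = len(confidences)
    ece = 0.0
    for count, conf_sum, n_correct in bins:
        if count == 0:
            continue
        avg_conf = conf_sum / count
        accuracy = n_correct / count
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count / total) * abs(accuracy - avg_conf)
    return ece
```

For example, a classifier that reports 0.9 confidence on four decisions but gets only two right scores an ECE of about 0.4. A large ECE suggests the raw scores shouldn't drive thresholds as-is. A common next step is to fit a post-hoc calibrator such as temperature scaling or isotonic regression on a held-out labeled set.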

joermungandr retweeted

Hao Wang

@MogicianTony

2 months ago

SWE-bench Verified and Terminal-Bench, two of the most cited AI benchmarks, can be reward-hacked with simple exploits.

Our agent scored 100% on both. It solved 0 tasks.

Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵

📈Steven Stadler 🇵🇭🇩🇪

@joermungandr

2 months ago

'Me finish code. Give new task!' https://t.co/7mljFpxLpi

joermungandr retweeted

Aakash Gupta

@aakashgupta

2 months ago

Zuckerberg paid $14.3 billion for a 28-year-old who had never trained a frontier model. Nine months later, that bet just shipped.

The benchmark table tells you exactly what kind of lab Wang built. Muse Spark leads or ties Opus 4.6 and GPT 5.4 on multimodal perception, health queries, and visual reasoning: MedXpertQA, SimpleVQA, ScreenSpot Pro, CharXiv. These are all data-quality-sensitive benchmarks where training-set curation determines the ceiling.

Where it gets destroyed: ARC AGI 2 (42.5 vs 76.5 for Gemini), Terminal-Bench (59.0 vs 75.1 for GPT 5.4), GDPval office tasks (1444 vs 1672 for GPT 5.4). Coding and abstract reasoning: the exact categories where architecture innovation and RL scaling matter more than data.

This is a data-labeling CEO's model. The fingerprints are all over the results. Wang spent seven years learning which benchmarks respond to better data and which ones require something else entirely. Muse Spark maxed out the first category and exposed the gap in the second.

The $14.3B question was always whether the guy who built the best data pipeline in AI could build the best model. The answer so far: he built the best model at the things data pipelines solve, and a mediocre one at everything else.

The move nobody's pricing: Meta said larger models are already in development, with a private API today and open-source future versions. Wang called this "step one." If the next model closes the coding and reasoning gap, Meta goes from also-ran to a three-horse race. If it doesn't, they spent $14.3 billion to build a very good medical chatbot for 3 billion users.

Both outcomes are interesting. Only one justifies the stock moving 9%.

joermungandr retweeted

Mo

@atmoio

2 months ago

Claude Mythos is Delusional

joermungandr retweeted

Sebastian Raschka

@rasbt

2 months ago

Strong release! GLM-5.1 is a DeepSeek-V3.2-like architecture (including MLA and DeepSeek Sparse Attention) but with more layers.

And the benchmarks look better throughout! Looks like THE flagship open-weight model now. https://t.co/8kzTXaFcJv

joermungandr retweeted

Elon Musk

@elonmusk

about 2 years ago

🥰 Happy Mother’s Day 🥰

joermungandr retweeted

François Chollet

@fchollet

about 2 years ago

Many of the people who are concerned with falling birthrates aren't willing to consider the set of policies that would address the problem: aggressive tax breaks for families, free daycare, free education, free healthcare, and building more/denser housing to slash the price of homes. Most people want children, but can't afford them.

joermungandr retweeted

Elon Musk

@elonmusk

over 2 years ago

@pmarca Optimism is better than pessimism

joermungandr retweeted

Not Elon Musk

@ElonMuskAOC

almost 3 years ago

I spent $44 billion for this app and now Lizard boy just decided to hit copy and paste. It’s personal now. See you in the cage, Zuck.

joermungandr retweeted

Snowflake

@Snowflake

almost 3 years ago

"@NVIDIA brings its #AI computing platform to cloud data firm Snowflake", enabling customers to build AI models using their own data. https://t.co/sJPi2Nv82Q (via @Reuters) https://t.co/1T8HGndADh

joermungandr retweeted

Aakash Gupta

@aakashgupta

about 3 years ago

🚨 LEAKED:

Meta CEO Mark Zuckerberg’s thoughts on the Apple Vision Pro.

He e-mailed this to the company.

1. The reveal was news to him as much as anybody: https://t.co/SU1VVQCJv4

joermungandr retweeted

ELK

@elktalkstech

about 3 years ago

Did Apple just lowkey launch a v1 brain-machine interface into the Vision Pro?

One of their ex-designers just tweeted this:

“One of the coolest results involved predicting a user was going to click on something before they actually did. That was a ton of work and something I’m proud of. Your pupil reacts before you click in part because you expect something will happen after you click. So you can create biofeedback with a user's brain by monitoring their eye behavior, and redesigning the UI in real time to create more of this anticipatory pupil response. It’s a crude brain computer interface via the eyes, but very cool”

joermungandr retweeted

Mark Wlosinski

@LTI_finance

about 3 years ago

Apple just announced its first major new product since 2015.

Introducing the groundbreaking AR headset ‘Apple Vision Pro’.

Here are some of its most amazing features: https://t.co/CUcoJbWJmS
