Jan P. Harries @jphme - Twitter Profile

Jan P. Harries

@jphme

3 days ago

Opus 4.8 thinks one abstraction level higher than Opus 4.7 - and beats GPT-5.5 on ARC 3. (and arguably performs better than I would have in a limited amount of time..) Shows once again how almost all public benchmarks are maxxed out by now. We're still scratching the surface.

Greg Kamradt

@GregKamradt

3 days ago

@arcprize just published results for Opus 4.8 ARC-AGI 1, 2 & 3

My notes:

* Opus 4.8 showed two behavior differences over Opus 4.7.
  1) It operated at an abstraction level *above* 4.7. It was able to see the ARC-AGI-3 environments as objects, not just collections of pixels
  2) Instead of short action resets like Opus 4.7, Opus 4.8 would often execute a long series of actions *before* resetting a game. It was holding onto hypotheses longer before giving up

* *Feeling* model performance - I'm biased (duh), but imo no other benchmark lets you *feel* a model quite like ARC-AGI-3. Looking at the dc22 replay (attached and link below) you can see the model work through problem, get stuck, and figure it out. Getting past 3 levels shows basic level understanding of this game. There is a new mechanic on level 4 which stumps it.

* Updated System Prompt - We observed that in our original system prompt, GPT and Gemini, unlike other models, would not "think out loud" in their reply. This caused them to *only* return an action in their response (ex: "ACTION1"). This capped the signal we were able to extract from the model. We updated the system prompt used for ARC-AGI-3 to *explicitly* say context will be carried forward instead of the original *implicit* nudge

See the exact change on the commit below

This will be the system prompt going forward. We aren't re-testing the previous 6 models at this time due to api costs (estimated at $40K) https://t.co/F6ZqIfey4i

Jan P. Harries

@jphme

3 days ago

Ant pulled Opus 4.7's business-skills RL for breeding dishonesty - @andonlabs' vending-bench cratered $10,937 → $2,992 on 4.8. One interesting reward hacking example from system card: 4.8 flooded the log with "PASSED" to evict failing tests from the grader's 400KB ctx... 😶‍🌫️ https://t.co/WqPL96McxU

Jan P. Harries

@jphme

6 days ago

@ccatalini I agree! btw @ccatalini - what's your take on the recent piece by @eastdakota ? > AI isn’t coming for builders or sellers, but it is coming for measurers. verifiers != measurers - but so far, building has become more commoditized than measuring? 🤔 https://t.co/RcnaZq9Kyy

Jan P. Harries

@jphme

8 days ago

@stratechery @benthompson best take on the @SpaceX IPO I've read so far

Jan P. Harries

@jphme

11 days ago

@dkundel that was fast 🤣 - thanks, missed this one, will be tested asap👍

Jan P. Harries

@jphme

12 days ago

@flozi00 @AnthropicAI the takes from insiders and people I trust were: sure, with enough inside knowledge and steering you can get any frontier or ft model to find these vulns as well - but actually finding and exploiting without guidance, from scratch, is where Mythos is a step-change 🤷

Jan P. Harries

@jphme

13 days ago

I know that this take is controversial, but I'm convinced that @AnthropicAI did the right thing with Glasswing and not releasing Mythos immediately to the public. Super important to harden critical systems and software before these capabilities are available to anyone.

Andrew Curran

@AndrewCurran_

13 days ago

'For the last few months, Anthropic has used Mythos Preview to scan more than 1,000 open-source projects, which collectively underpin much of the internet—and much of our own infrastructure. So far, Mythos Preview has found what it estimates are 6,202 high- or critical-severity vulnerabilities in these projects (out of 23,019 in total, including those it estimates as medium- or low-severity).'

Jan P. Harries

@jphme

13 days ago

and I say this despite absolutely wanting to have access immediately and play with this model right now 🫤

Jan P. Harries

@jphme

17 days ago

@oanaolt congrats 🎉

Jan P. Harries

@jphme

28 days ago

@rasdani_ @remilouf @dottxtai @vincentweisser well we were close 🤣

Jan P. Harries

@jphme

28 days ago

@dkundel haha nice cameo 😁 excited to try this out. translation is great but this will really unlock a lot once cheap/good enough to run in the background all the time... btw think translators were first on many "professions affected by AI" lists 🫣

Jan P. Harries

@jphme

29 days ago

@AnjneyMidha how does this mission align with his unconditional and uncritical support for trump just a few months ago?

Jan P. Harries

@jphme

30 days ago

Odds for an @AnthropicAI IPO in 2026 tripled over the last month (source @Polymarket ) 👀

Jan P. Harries

@jphme

about 1 month ago

@FrankRHutter @bjoern_pl @prior_labs @SAP congratz!

Jan P. Harries

@jphme

about 1 month ago

@stwboerse (but their TPU business is still great and they can sell as much as they want to ant 🤷‍♂️...)

Jan P. Harries

@jphme

about 1 month ago

@stwboerse but atm $GOOG is just tier 2 - their models are super smart, but agentic capabilities and harness are WAY behind oai/ant. tried antigravity again last week for a project where gsearch/youtube could help - barely usable. And they're not shipping at the same speed 😕

Jan P. Harries

@jphme

about 1 month ago

I don't believe this. By the time "real" AI is widely adopted in non-tech jobs & sectors, models will be so strong that the average entry-level hire won't be able to compete (for jobs that can be done on a computer). I expect a job market bloodbath for new grads in 1-2 years 🫤

Anthony Pompliano 🌪

@APompliano

about 1 month ago

I have changed my mind on how AI will impact jobs in America.

Previously, I believed AI would replace many entry level roles typically filled by young employees. The technology would then work its way up the organization and eventually reduce the total number of jobs in a company.

The data is saying something different, so when I get new information I am willing to change my mind.

The number of software engineers being hired has been increasing. The number of open software engineer roles is growing. The number of new college grads who get hired has increased 5.6% over the last 12 months. The unemployment level for people aged 20-24 years old who have a college degree has fallen from nearly 9% to almost 5% as well.

The Wall Street Journal recently wrote “AI created 640,000 jobs between 2023 and 2025 in the U.S., according to an analysis by LinkedIn of job posting data, including new white-collar positions such as Head of AI and AI engineer.”

And I am starting to see companies throughout our portfolio aggressively hiring to keep up with the demand for their products and services.

If AI can make employees more productive, which is widely accepted as fact, then companies are going to want as many productive units of labor as possible. This is a key reason why I am changing my mind.

AI appears to be a magical technology that will make companies more productive and more profitable. The net result will be more corporations, more startups, and more jobs. All three are big, positive wins for the American economy.

Jan P. Harries

@jphme

about 1 month ago

@ChrisPainterYup @scaling01 (but actually I think you were overly harsh and also think sam will do the right thing in the end; rooting for OAI, competition is good for everyone as long as safety is taken seriously)

Jan P. Harries

@jphme

about 1 month ago

@ChrisPainterYup @scaling01 wow lisan next after tbpn. this was .. *check notes* ... 2 weeks ago? 🙈 https://t.co/DVwj9ct1Tz

Lisan al Gaib

@scaling01

about 2 months ago

I'm really thankful for OpenAI keeping the "morally misguided" people away from Anthropic

it's like a filter

not saying it's all, most just want to build something and solve problems

but choosing to work for Sam and Greg and a company with that history is certainly a choice

i'm confident that all the right people will end up at the right place
