Kent Johnson

@kentyman

Austin, TX, USA

Joined March 2008

200 Following

80 Followers

1.3K Posts

Kent Johnson @kentyman

4 days ago

@gneubig Out of curiosity, when did OpenHands add Slack integration? Am I remembering correctly that it was present back in 2024?

287

Kent Johnson @kentyman

13 days ago

@MelonLeather @ArtificialAnlys Looks like it's part of their Coding Index, but not their top-level Intelligence Index: https://t.co/PVF6SF5oQu

Artificial Analysis

@ArtificialAnlys

17 days ago

We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top.

DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.

The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.

More below.

114

184

414

570K

Kent Johnson @kentyman

about 1 month ago

@deanwball So bad TAM forecasting? ;)

Kent Johnson @kentyman

3 months ago

@OfficialLoganK What about the Responses API which superseded Chat Completions?

Kent Johnson @kentyman

4 months ago

@steipete @openclaw PRR (Pull Request Request)!

232

Kent Johnson @kentyman

4 months ago

@sama As far as I can tell, it's still a problem with 5.4: https://t.co/gfkS2FtMaZ

Kent Johnson @kentyman

4 months ago

@sama Can you confirm or deny whether this model contamination from gpt-5.3-codex is fixed in gpt-5.4*? We had to abandon it because of this: https://t.co/gBXf4R9d7s

Kent Johnson @kentyman

4 months ago

@ilex_ulmus Can you also post on Instagram?

Kent Johnson @kentyman

8 months ago

@ilex_ulmus Is this real? I searched but only found this, which looks way less pumpkiny: https://t.co/NFqaDsXcMj

Kent Johnson @kentyman

9 months ago

@dwarkesh_sp @RichardSSutton I was very disappointed with how many times he said the word "obvious". If you cannot explain your position without that tactic, it's a huge red flag.

Kent Johnson @kentyman

10 months ago

@controlai Where can I see these more recent podcasts? The feed I have stopped after 3 episodes.

Kent Johnson @kentyman

10 months ago

@alexalbert__ Am I seeing right that Claude Code and its TypeScript SDK are not open source? It looks like their Python SDK is MIT-licensed at least, though not being able to plug in arbitrary models/providers is a bummer.

Kent Johnson @kentyman

12 months ago

@CartoonsHateHer @danshipper You've mentioned your OCD and using AI for therapy. Thoughts on this?

109

Kent Johnson @kentyman

about 1 year ago

@gneubig @nlpxuhui But what if you ask it to roast you? 😉

Kent Johnson @kentyman

about 1 year ago

@AIForHumansShow https://t.co/IaPaU6koSj

Rob Wiblin

@robertwiblin

about 1 year ago

Huge repository of information about OpenAI and Altman just dropped - 'The OpenAI Files'.

There's so much crazy shit in there. Here's what Claude highlighted to me:

1. Altman listed himself as Y Combinator chairman in SEC filings for years - a total fabrication (?!):

"To smooth his exit [from YC], Altman proposed he move from president to chairman. He pre-emptively published a blog post on the firm's website announcing the change. But the firm's partnership had never agreed, and the announcement was later scrubbed from the post."

"...Despite the retraction, Altman continued falsely listing himself as chairman in SEC filings for years, despite never actually holding the position." (WTAF.)

2. OpenAI's profit cap was quietly changed to increase 20% annually - at that rate it would exceed $100 trillion in 40 years. The change was not disclosed and OpenAI continued to take credit for its capped-profit structure without acknowledging the modification.

3. Despite claiming to Congress he has "no equity in OpenAI," Altman held indirect stakes through Sequoia and Y Combinator funds.

4. Altman owns 7.5% of Reddit - when Reddit announced its OpenAI partnership, Altman's net worth jumped $50 million. Altman invested in Rain AI, then OpenAI signed a letter of intent to buy $51 million of chips from them.

5. Rumours suggest Altman may receive a 7% stake worth ~$20 billion in the restructured company.

6. OpenAI had a major security breach in 2023 where a hacker stole AI technology details but didn't report it for over a year. OpenAI fired Leopold Aschenbrenner explicitly because he shared security concerns with the board.

7. Altman denied knowing about equity clawback provisions that threatened departing employees' millions in vested equity if they ever criticised OpenAI. But Vox found he personally signed the documents authorizing them in April 2023. These restrictive NDAs even prohibited employees from acknowledging their existence.

8. Senior employees at Altman's first startup Loopt twice tried to get the board to fire him for "deceptive and chaotic behavior".

9. OpenAI's leading researcher Ilya Sutskever told the board: "I don't think Sam is the guy who should have the finger on the button for AGI". Sutskever provided the board a self-destructing PDF with Slack screenshots documenting "dozens of examples of lying or other toxic behavior."

10. Mira Murati (CTO) said: "I don't feel comfortable about Sam leading us to AGI"

11. The Amodei siblings described Altman's management tactics as "gaslighting" and "psychological abuse".

12. At least 5 other OpenAI executives gave the board similar negative feedback about Altman.

13. Altman owned the OpenAI Startup Fund personally but didn't disclose this to the board for years. Altman demanded to be informed whenever board members spoke to employees, limiting oversight.

14. Altman told board members that other board members wanted someone removed when it was "absolutely false". An independent review after Altman's firing found "many instances" of him "saying different things to different people"

15. OpenAI required employees to waive their federal right to whistleblower compensation. Former employees filed SEC complaints alleging OpenAI illegally prevented them from reporting to regulators.

16. While publicly supporting AI regulation, OpenAI simultaneously lobbied to weaken the EU AI Act. By 2025, Altman completely reversed his stance, calling the government approval he once advocated "disastrous" and OpenAI now supports federal preemption of all state AI safety laws even before any federal regulation exists.

Obviously this is only a fraction of what's in the apparently 10,000 words on the site. Link below if you'd like to look over.

(I've skipped over the issues with OpenAI's restructure which I've written about before already, but in a way that's really the bigger issue.)

24K

13K

112M

Kent Johnson @kentyman

about 1 year ago

@labenz Really wish they used a .md extension! https://t.co/o7Q386XMw8

Kent Johnson @kentyman

about 1 year ago

@AIForHumansShow That's a tilde! This is an en dash: –

Kent Johnson @kentyman

almost 2 years ago

@Timbaland Guess so: https://t.co/S3mYS3KsSW

Kent Johnson @kentyman

almost 2 years ago

@Timbaland Surely someone is trying to fine-tune a GRRLLM to finish A Song of Ice and Fire... 🤔

101

Kent Johnson @kentyman

over 3 years ago · Texcoco

Already missing these vintage NLG sheets from back in the day when the Halcons and 49STA.FE were unstoppable. @ Texcoco, Mexico https://t.co/c7Jwd9wgYw
