Jesse @jessearmand - Twitter Profile

about 5 hours ago

We could already use more Haskell or Rust. I don’t see how it couldn’t be done. But would that solve the problem? There is still real human effort in porting every financial or infrastructure system to a type-safe functional language. It’s also not the priority of most businesses.

0

13

Jesse

@jessearmand

about 8 hours ago

Formal languages will not be better for language models, in much the same way that they don’t work well for humans

1

0

20

Jesse

@jessearmand

about 7 hours ago

The original motivation of AI is actual knowledge and capability transfer to a biological human: cyborgization. Not what we have right now, being subjected to uncontrollable power.

0

6

Jesse

@jessearmand

about 9 hours ago

3 years ago we didn’t expect that there’d be any new capable models coming out. Honest question: how much would life have changed if we could use Fable, Mythos, GPT-5.6?

1

0

47

about 9 hours ago

If FDEs are the ones deploying the models to enterprise infrastructure, that is still software and infrastructure that everyone needs to tinker with. It’s not “general intelligence” in the sense that a human is there to get the work done. So I think the real measure of improvement is when a human can work in an area where they are not an expert and still contribute significantly with the help of these models.

1

0

18

Jesse

@jessearmand

about 7 hours ago

Disregarding Gemini, all of this looks like a nonsensical sci-fi story

˚♡⋆mimi ˚♡⋆｡☆∴

@mimi10v3

about 23 hours ago

tfw you can't even chat about the news with your favorite ai because they keep thinking it is all a sci fi hallucination

30

427

36

50

14K

0

15

jessearmand retweeted

Tenobrus

@tenobrus

2 days ago

GPT 5.6 Sol cheats so much METR was not able to evaluate it with a meaningful time horizon https://t.co/VQczcEsp8V

23

598

35

78

57K

jessearmand retweeted

Lucas Beyer (bl16)

@giffmana

about 11 hours ago

I wonder if they'll find a way to blame AI.

15

187

2

6

21K

jessearmand retweeted

Jesse Michael Han

@jessemhan

about 19 hours ago

Newest from @gwern who also called the importance of the scaling paradigm right after GPT-3: "In particular, the Lean programming language likely has, with 2026-era LLMs, a worse baseline constant and total loss on existing codebases, but better scaling exponents. This would imply that implementations in Lean can eventually win and deliver large benefits in program correctness at global scale—and thus could help justify large-scale investments in rewriting existing codebases in Lean or paying for new Lean code, thereby improving global cybersecurity."

11

294

26

267

62K

jessearmand retweeted

Bojan Tunguz

@tunguz

2 days ago

What exactly did you guys think "permanent underclass" was gonna look like? Having access to all the same models and resources as the insider elite? LOL

51

2K

103

122

71K

Jesse

@jessearmand

about 16 hours ago

If you’d like to understand the exponentials, RSI, and the situation around benchmarks, here’s what you should listen to

sarah guo

@saranormous

2 days ago

Really fun to hang again with my friend 🃏 @polynoamial (OpenAI research scientist, our first guest ever on @NoPriorsPod in early 2023) to talk about the implications of large test-time compute, and what happens when models are given $10M budgets to spend on a single task.
Topics:
01:23 – Why Benchmarks Are Broken
04:19 – Compute Budgets and Projections
06:48 – How Long Should Models Think?
08:01 – Benchmarkmaxxing
09:48 – Noam's Evals
12:40 – Safety (When Model Capability Scales With Spend)
16:09 – Implications For the Model Release Cycle
18:34 – Latent Model Capability
22:27 – Limits on Recursive Self-Improvement
28:38 – Large-Scale Multi-Agent Coordination
30:39 – Competition at the Frontier
33:19 – Breaking the Benchmark Grid Equilibrium
34:57 – Why Benchmarks Should be Scaled by Cost

22

532

44

628

74K

0

30

jessearmand retweeted

Rhys

@RhysSullivan

about 24 hours ago

https://t.co/Mqwtlvq7X7

56

639

32

654

79K

Jesse

@jessearmand

about 17 hours ago

This isn’t exactly what Opus does, or any other model I’ve interacted with. The models will analyze various approaches to a situation. As long as it’s in the data, they will explain it to the degree appropriate to the context, influenced by temperature if it’s configurable as a parameter. In some contexts it can come across as a bigger problem or bigger factor than it actually is. That’s what a frontier LLM is capable of. It will even discourage an LLM-based architecture for a system if it thinks the problem can be solved in a much more efficient manner. The model is not a decision maker most of the time. You can also notice this in simpler things like travel planning.

Jaya Gupta

@JayaGup10

about 23 hours ago

VCs are now sharing screenshots in group chats of Claude discouraging investment in open-source AI infra startups and models. Obviously there is an absolute EXPLOSION of pitches in inference companies, harness companies, RL-as-a-service companies, open-source tooling currently including Neolabs that plan to open source models as well. Now the obvious takeaway is: “Claude is biased against open source.” Who cares? The more unsettling take is: every major AI model has a worldview, and that worldview is becoming embedded in capital allocation. If Claude’s safety priors cause it to frame open-source AI as dangerous, hard to govern, or less fundable, it’s probably doing the same thing in enterprise buying workflows. Now investors and executives are obviously smarter, but “influence” just changes which risks get highlighted, which questions buyers should ask, and which vendors are suggested…..

32

377

37

140

59K

1

0

50

Jesse

@jessearmand

1 day ago

The days when models were sycophantic are over. They’ll overload you with analysis from various angles and won’t take a stance by default.

0

12

Jesse

@jessearmand

1 day ago

I think Claude is a better advisor on high-level, complex problems than most people. The models shouldn’t be treated as weapons

1

0

20

Jesse

@jessearmand

1 day ago

Opus will likely disagree with you though

1

0

15

jessearmand retweeted

dax

@thdxr

1 day ago

the models are very capable. i'm not saying they're very smart. but the bottleneck with them is likely your own imagination
proof is so many companies demo their ai product with "it can schedule stuff for you"

90

2K

49

152

79K

Jesse

@jessearmand

1 day ago

@GergelyOrosz I’ll push back on this regardless

0

116
