Hai-Anh Trinh @NDivergent_AI - Twitter Profile

NDivergent_AI retweeted

Thomas Wolf

@Thom_Wolf

3 months ago

the attack surface keeps increasing

Hai-Anh Trinh @NDivergent_AI

4 months ago

This is important. I hypothesize that RL post-training will eventually require as much compute as pre-training, or more.

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

4 months ago

Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
"constructing a multiple-choice question-answering version of the fill-in-the-middle task"
"Given a source text, we prompt an LLM to identify and mask key reasoning steps, then generate a set of diverse, plausible distractors."
"GooseReason effectively revives models saturated on existing RLVR data"
"GooseReason-Cyber sets a new state-of-the-art in cybersecurity, surpassing a 7B domain-specialized model with extensive domain-specific pre-training and post-training"

Hai-Anh Trinh @NDivergent_AI

4 months ago

1.5M API keys leaked via Moltbook https://t.co/iqq3d1zXZE

NDivergent_AI retweeted

Ryan Grim

@ryangrim

5 months ago

Drop Site obtained harrowing footage of the latest killing which appears to be from the perspective of the woman in pink filming from the sidewalk

NDivergent_AI retweeted

Jonny G 🇺🇦

@dontforgetchaos

5 months ago

I make no apology for posting this photo. I think it’s a photo that will haunt America in the years to come. This is what you have become. If you defend this, have a word with yourself.

Hai-Anh Trinh @NDivergent_AI

5 months ago

15m talk at NeurIPS 2025 https://t.co/ePq71IgeWX

Lin Shi @LinShi592021

5 months ago

Also, super honored to give my first conference talk at NeurIPS 2025 about Terminal Bench, Harbor, and Adapters! If you are interested in our work or want to gain some context in 15 minutes, this might be a great resource👀

Hai-Anh Trinh @NDivergent_AI

5 months ago

A significant community effort to evaluate long-horizon coding agents in terminal environments.

Mike A. Merrill

@Mike_A_Merrill

5 months ago

The Terminal-Bench paper is here! Read it to learn where frontier models still fail and the secrets of how we sourced hundreds of high quality environments from our open source community. 🧵

Hai-Anh Trinh @NDivergent_AI

5 months ago

This addresses a fundamental problem with RL: training sucks the bits of supervision for all the steps/tokens in the answer through a tiny straw, the scalar reward. Textual feedback is much more nuanced and informative.

Yoonho Lee

@yoonholeee

6 months ago

Following the Text Gradient at Scale
We wrote a @StanfordAILab blog post about the limitations of RL methods that learn solely from scalar rewards + a new method that addresses this
Blog: https://t.co/rJ1IcBKDoR
Paper: https://t.co/75pHtElyk3

Hai-Anh Trinh @NDivergent_AI

over 1 year ago

@rakyll This sounds like the origin of the @raydistributed project: distributed agents/actors and real-time RL.

Hai-Anh Trinh @NDivergent_AI

over 1 year ago

@nrehiew_ Source?

NDivergent_AI retweeted

Graham Neubig

@gneubig

over 1 year ago

How far are we from having competent AI co-workers that can perform tasks as varied as software development, project management, administration, and data science? In our new paper, we introduce TheAgentCompany, a benchmark for AI agents on consequential real-world tasks.

Hai-Anh Trinh @NDivergent_AI

over 1 year ago

This has the potential to replace RoBERTa as the workhorse Transformer model.

Jeremy Howard

@jeremyphoward

over 1 year ago

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵

NDivergent_AI retweeted

Ramez Naam

@ramez

over 1 year ago

It's not over until it's over.

NDivergent_AI retweeted

Mehrdad Farajtabar @MFarajtabar

over 1 year ago

1/ Can Large Language Models (LLMs) truly reason? Or are they just sophisticated pattern matchers? In our latest preprint, we explore this key question through a large-scale study of both open-source like Llama, Phi, Gemma, and Mistral and leading closed models, including the recent OpenAI GPT-4o and o1-series. https://t.co/2tv8Pp9MSz Work done with @i_mirzadeh, @KeivanAlizadeh2, Hooman Shahrokhi, Samy Bengio, @OncelTuzel. #LLM #Reasoning #Mathematics #AGI #Research #Apple

Hai-Anh Trinh @NDivergent_AI

over 1 year ago

@polynoamial @scale_AI @cais @OpenAI Not the last exam, but ARC is pretty good https://t.co/kLb8e7DY10

ARC Prize

@arcprize

over 1 year ago

We put OpenAI o1 to the test against ARC Prize. Results: both o1 models beat GPT-4o. And o1-preview is on par with Claude 3.5 Sonnet. Can chain-of-thought scale to AGI? What explains o1's modest scores on ARC-AGI? Our notes: https://t.co/sV6LM1foGx

Hai-Anh Trinh @NDivergent_AI

over 1 year ago

@alexandr_wang Not the last exam, but ARC is pretty good https://t.co/kLb8e7DY10

ARC Prize

@arcprize

over 1 year ago

We put OpenAI o1 to the test against ARC Prize. Results: both o1 models beat GPT-4o. And o1-preview is on par with Claude 3.5 Sonnet. Can chain-of-thought scale to AGI? What explains o1's modest scores on ARC-AGI? Our notes: https://t.co/sV6LM1foGx

NDivergent_AI retweeted

Yann LeCun

@ylecun

over 1 year ago

ZML: a high-performance AI inference stack that can parallelize and run deep learning systems on lots of different hardware. It's out of stealth, impressive, and open source.

NDivergent_AI retweeted

François Chollet

@fchollet

almost 2 years ago

Another paper pointing out in details what we've known for a while: LLMs (used via prompting) cannot make sense of situations that substantially differ from the situations found in their training data. Which is to say, LLMs do not possess general intelligence to any meaningful degree. What LLMs can be good for, is to serve as knowledge/routine stores for an actual AGI. They're a memory -- a representation of a data corpus -- and memory is a necessary component of intelligence. But keep in mind that intelligence is not just memory.

NDivergent_AI retweeted

David Fickling @davidfickling

almost 2 years ago

Famous European wines like Champagne and Barolo have been built on a marriage of weather, terroir and grape varieties. In an era of climate change, the rigidity that makes many of these wines unique will make them vulnerable, too: https://t.co/mqu9WxE5mZ via @opinion

NDivergent_AI retweeted

Jesse D. Jenkins

@JesseJenkins

almost 2 years ago

A reminder that China has pledged to peak their emissions by 2030. If they wind up reaching a peak six years early, even if it's a plateau for some time, that's a major development in the effort to peak and reduce global emissions...
