Carlos Guestrin @guestrin - Twitter Profile

4 months ago

With SDPO, you can now do RL with natural language feedback, like error messages from coding environments or LLMs as judges. You can achieve huge gains over GRPO with scalar rewards!

Jonas Hübotter

@jonashubotter

4 months ago

Training LLMs with verifiable rewards uses 1bit signal per generated response. This hides why the model failed. Today, we introduce a simple algorithm that enables the model to learn from any rich feedback! And then turns it into dense supervision. (1/n)

jonashubotter's tweet photo. Training LLMs with verifiable rewards uses 1bit signal per generated response. This hides why the model failed.

Today, we introduce a simple algorithm that enables the model to learn from any rich feedback!
And then turns it into dense supervision.

(1/n) https://t.co/AR0yWgaKnL

22

1K

138

1K

211K

3

231

17

182

38K

guestrin retweeted

Ronak Malde

@rronak_

4 months ago

I don't think people have realized how crazy the results are from this new TTT + RL paper from Stanford/Nvidia. Training an open source model, they - beat Deepmind AlphaEvolve, discovered new upper bound for Erdos's minimum overlap problem - Developed new A100 GPU kernels 2x faster than the best human kernel - Outperformed the best AI coding attempt and human attempt on AtCoder The idea of Test Time Training is to train a model *while* it's iteratively trying to solve a task. Combining this with RL like they do in this paper opens up the floodgates of possibilities for continual learning Authors: @mertyuksekgonul @LeoXinhaoLee @JedMcCaleb @xiaolonw @jankautz @YejinChoinka @james_y_zou @guestrin @sun_yu_

rronak_'s tweet photo. I don't think people have realized how crazy the results are from this new TTT + RL paper from Stanford/Nvidia.

Training an open source model, they
- beat Deepmind AlphaEvolve, discovered new upper bound for Erdos's minimum overlap problem
- Developed new A100 GPU kernels 2x faster than the best human kernel
- Outperformed the best AI coding attempt and human attempt on AtCoder

The idea of Test Time Training is to train a model *while* it's iteratively trying to solve a task. Combining this with RL like they do in this paper opens up the floodgates of possibilities for continual learning

Authors: @mertyuksekgonul @LeoXinhaoLee @JedMcCaleb @xiaolonw @jankautz @YejinChoinka @james_y_zou @guestrin @sun_yu_

45

2K

197

2K

111K

Carlos Guestrin @guestrin

4 months ago

By specializing the model to each hard task using a simple entropic objective and RL, TTT-Discover writes new high-performing GPU kernels and improves open mathematical bounds.

0

8

0

3

2K

Carlos Guestrin @guestrin

4 months ago

Relational Transformers provide zero shot predictions for complex databases!

rishabh ranjan @_rishabhranjan_

7 months ago

Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly-structured relational data where signal is scattered across rows, columns, linked tables and time. Excited to finally share what I have been working on over the last year: a Foundation Model architecture which brings the power of Transformers to relational domains, enabling large-scale pretraining and zero-shot generalization in enterprise settings. 🧵1/n

_rishabhranjan_'s tweet photo. Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly-structured relational data where signal is scattered across rows, columns, linked tables and time.
Excited to finally share what I have been working on over the last year: a Foundation Model architecture which brings the power of Transformers to relational domains, enabling large-scale pretraining and zero-shot generalization in enterprise settings. 🧵1/n

5

151

40

89

60K

0

36

7

19

5K

Who to follow

Ryan Adams

@ryan_p_adams

Machine Learning Researcher, CS Professor (@PrincetonCS), Dad, Woodworker

AssistProf @CarnegieMellon. Distinguished Eng @NVIDIA. Creator of @XGBoostProject, @ApacheTVM. Member https://t.co/QYyfjQNp4p, @TheASF. Views are on my own

Carlos Guestrin @guestrin

5 months ago

TTT-E2E compresses the context simply through end-to-end test-time training — outperforms the loss of transformers with full attention, and achieves constant-time latency per token.

Karan Dalal

@karansdalal

5 months ago

LLM memory is considered one of the hardest problems in AI. All we have today are endless hacks and workarounds. But the root solution has always been right in front of us. Next-token prediction is already an effective compressor. We don’t need a radical new architecture. The missing piece is to continue training the model at test-time, using context as training data. Our full release of End-to-End Test-Time Training (TTT-E2E) with @NVIDIAAI, @AsteraInstitute, and @StanfordAILab is now available. Blog: https://t.co/woCpiIrq0T Arxiv: https://t.co/3VkFlS3wx3 This has been over a year in the making with @arnuvtandon and an incredible team.

karansdalal's tweet photo. LLM memory is considered one of the hardest problems in AI.

All we have today are endless hacks and workarounds. But the root solution has always been right in front of us.

Next-token prediction is already an effective compressor. We don’t need a radical new architecture. The missing piece is to continue training the model at test-time, using context as training data.

Our full release of End-to-End Test-Time Training (TTT-E2E) with @NVIDIAAI, @AsteraInstitute, and @StanfordAILab is now available.

Blog: https://t.co/woCpiIrq0T
Arxiv: https://t.co/3VkFlS3wx3

This has been over a year in the making with @arnuvtandon and an incredible team.

91

2K

320

2K

574K

0

5

1

794

Carlos Guestrin @guestrin

7 months ago

The Relational Transformer (RT) provides strong zero-shot predictions from relational data, e.g., databases.

rishabh ranjan @_rishabhranjan_

7 months ago

Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly-structured relational data where signal is scattered across rows, columns, linked tables and time. Excited to finally share what I have been working on over the last year: a Foundation Model architecture which brings the power of Transformers to relational domains, enabling large-scale pretraining and zero-shot generalization in enterprise settings. 🧵1/n

5

151

40

89

60K

1

12

0

4

2K

Carlos Guestrin @guestrin

about 1 year ago

SAIL Postdoctoral Fellows is an awesome opportunity to collaborate with the amazing faculty and students in the Stanford AI Lab!

Stanford AI Lab

@StanfordAILab

about 1 year ago

SAIL is still accepting applications for the SAIL Postdoctoral Fellowships! This is an opportunity to work with our wonderful professors and community. Applications received by the end of April 30 will receive full consideration: https://t.co/3fJEqwlkO8

1

13

20

6

9K

1

7

1

1K

Carlos Guestrin @guestrin

about 1 year ago

The SAIL Postdoctoral Fellowship is a fantastic opportunity for recent PhDs to do their most innovative and impactful AI research, while being part of a highly collaborative and welcoming environment at Stanford.

Stanford AI Lab

@StanfordAILab

about 1 year ago

Stanford AI Lab (SAIL) is excited to announce new SAIL Postdoctoral Fellowships! We are looking for outstanding candidates excited to advance the frontiers of AI with our professors and vibrant community. Applications received by the end of April 30 will receive full consideration: https://t.co/YXXggAwW4p

6

202

77

95

63K

0

2

0

1K

Carlos Guestrin @guestrin

about 1 year ago

We are super excited to empower developers to focus on their goal of building innovative AI applications; we’ll take care of safety and security! What an awesome ride with Bo Li @uiuc_aisecure, @sanmikoyejo, @dawnsongtweets and the whole @VirtueAI_co team!

Virtue AI

@VirtueAI_co

about 1 year ago

We’ve raised $30M in Seed + Series A funding led by @lightspeedvp and Walden Catalyst Ventures, with participation from Prosperity7 Ventures, Factory, Osage University Partners (OUP), Lip-Bu Tan, Chris Re, and more. Virtue AI is the first unified platform for securing AI across red teaming, guardrails, and agents. We’re proud to already support forward-thinking companies like @Uber , @glean, and several frontier labs. This investment will help us continue improving our platform, scale agentic workflows, build out integrated enterprise use cases, and grow our team. Grateful to our customers, our team, and our investors for the support.

3

48

12

9

9K

0

17

3

1

3K

guestrin retweeted

Liana @lianapatel_

over 1 year ago

Excited to share our release of LOTUS 1.1.0, which now makes it easier than ever to get started with LLM-powered data processing over a variety of custom knowledge sources. Seamlessly connect to data from the web, your SQL databases, vector databases and more. And process it all with the power of semantic operators. https://t.co/VWp0Y1VsyT

lianapatel_'s tweet photo. Excited to share our release of LOTUS 1.1.0, which now makes it easier than ever to get started with LLM-powered data processing over a variety of custom knowledge sources.

Seamlessly connect to data from the web, your SQL databases, vector databases and more.

And process it all with the power of semantic operators.

https://t.co/VWp0Y1VsyT

7

153

29

96

19K

guestrin retweeted

Stanford AI Lab

@StanfordAILab

over 1 year ago

SAIL is delighted to announce Carlos @guestrin, the Fortinet Founders Professor of Computer Science, as the next Director of @StanfordAILab. Carlos is a talented researcher and leader, known for his work on explainability, graphs, compilation, and boosted trees in AI.

StanfordAILab's tweet photo. SAIL is delighted to announce Carlos @guestrin, the Fortinet Founders Professor of Computer Science, as the next Director of @StanfordAILab. Carlos is a talented researcher and leader, known for his work on explainability, graphs, compilation, and boosted trees in AI. https://t.co/uLq2PWe2dR

6

97

23

2

11K

guestrin retweeted

Liana @lianapatel_

over 1 year ago

We've been building LOTUS at Stanford and Berkeley to make LLM-powered data processing fast, easy and declarative. LOTUS is an open-source query engine that makes programming as easy as writing Pandas and optimizes your programs for up to 400x speedups. To celebrate the holidays, we're excited to share our release of LOTUS 1.0.0 with a batch of new updates that make reasoning over your data faster, easier and better than ever! Code: https://t.co/qQVJ6Vg6fi 🧵👇

lianapatel_'s tweet photo. We've been building LOTUS at Stanford and Berkeley to make LLM-powered data processing fast, easy and declarative.

LOTUS is an open-source query engine that makes programming as easy as writing Pandas and optimizes your programs for up to 400x speedups.

To celebrate the holidays, we're excited to share our release of LOTUS 1.0.0 with a batch of new updates that make reasoning over your data faster, easier and better than ever!

Code: https://t.co/qQVJ6Vg6fi

🧵👇

17

1K

183

1K

160K

guestrin retweeted

Luke Bailey

@LukeBailey181

over 1 year ago

Can interpretability help defend LLMs? We find we can reshape activations while preserving a model’s behavior. This lets us attack latent-space defenses, from SAEs and probes to Circuit Breakers. We can attack so precisely that we make a harmfulness probe output this QR code. 🧵

11

368

82

217

59K

guestrin retweeted

Teddi Worledge @TeddiWorledge

over 1 year ago

🧵LLMs are great at synthesizing info, but unreliable at citing sources. Search engines are the opposite. What lies between them? Our new paper runs human evals on 7 systems across the✨extractive-abstractive spectrum✨for utility, citation quality, time-to-verify, & fluency!

TeddiWorledge's tweet photo. 🧵LLMs are great at synthesizing info, but unreliable at citing sources. Search engines are the opposite. What lies between them?

Our new paper runs human evals on 7 systems across the✨extractive-abstractive spectrum✨for utility, citation quality, time-to-verify, & fluency! https://t.co/IkXz24AMUz

1

66

21

38

14K

guestrin retweeted

Liana @lianapatel_

almost 2 years ago

Super excited to share our work on LOTUS, a query engine for reasoning over large corpuses of data with LLMs! Joint work w/ the amazing @sid_jha1, @matei_zaharia & @guestrin Read the paper: https://t.co/5BdLT7Atrm Try out the code: https://t.co/y7eRsHlbqi 🧵👇

8

185

30

136

34K

guestrin retweeted

Nicole Meister @nicole__meister

over 1 year ago

Prior work has used LLMs to simulate survey responses, yet their ability to match the distribution of views remains uncertain. Our new paper [https://t.co/DleesiPbif] introduces a benchmark to evaluate how distributionally aligned LLMs are with human opinions. 🧵

nicole__meister's tweet photo. Prior work has used LLMs to simulate survey responses, yet their ability to match the distribution of views remains uncertain.

Our new paper [https://t.co/DleesiPbif] introduces a benchmark to evaluate how distributionally aligned LLMs are with human opinions.

🧵 https://t.co/Q2dpSpZg5Q

4

155

36

75

28K

guestrin retweeted

Irena Gao @irena_gao

over 1 year ago

Many providers offer inference APIs for the same models: for example, there were over nine Llama-3 8B APIs in Summer 2024. Do all of these APIs serve the same completion distribution as the original model? In our new paper, ✨Model Equality Testing: Which Model is This API Serving?✨, we formalize this question as a two-sample distribution testing problem: the user collects samples for their task from the API and a reference distribution, and conducts a statistical test to see if the two distributions are the same. We design tests which show nontrivial power in detecting when models have been quantized, watermarked, or finetuned! (🧵 1/5)

irena_gao's tweet photo. Many providers offer inference APIs for the same models: for example, there were over nine Llama-3 8B APIs in Summer 2024. Do all of these APIs serve the same completion distribution as the original model?

In our new paper, ✨Model Equality Testing: Which Model is This API Serving?✨, we formalize this question as a two-sample distribution testing problem: the user collects samples for their task from the API and a reference distribution, and conducts a statistical test to see if the two distributions are the same.

We design tests which show nontrivial power in detecting when models have been quantized, watermarked, or finetuned! (🧵 1/5)

7

168

33

75

46K

Carlos Guestrin @guestrin

about 6 years ago

Our recent work combines Maps data with hospitalization and death records to model the COVID-19 transmission rates., finding that for many states increasing mobility will lead to a significant increase in cases.