Kevin Lu

23 days ago

The technical report includes our motivation, early evaluation results, and technical approach. https://t.co/AFJZ5kH7Ku

23 days ago

we are excited to share our latest work on interactive human-AI collaboration! as intelligence increases, we think progress will be bottlenecked by the ability of AI to work *with* humans -- thereby enabling AI to positively impact the long tail of human experiences

23 days ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku

_kevinlu retweeted

Luozhu

@LuozhuZhang

about 1 month ago

https://t.co/X6NQ8XYazG

about 2 months ago

tinker as a sandbox for giving autoresearch access to RL training infra 🙂

Dylan Huang

@dphuang2

about 2 months ago

I pointed Claude Code at a research task (build a golf forecasting system) and let it run for 49 hours on Tinker. No human in the loop.

It ran 108 experiments. Here's the full trajectory, including the ones that made things worse. https://t.co/z4li9rbHqj

_kevinlu retweeted

YujiaBao @yujia_bao

2 months ago

There's been a lot of excitement around auto-research, but one underappreciated bottleneck: coding agents struggle to run LLM training jobs at scale. A small infrastructure mistake can have major consequences on the output. I recently joined @thinkymachines, and @tinkerapi solves exactly this. It standardizes the training process — training a 1T parameter model is as simple as training a 4B one. That makes auto-research with coding agents like Claude Code actually viable.

_kevinlu retweeted

Mira Murati

@miramurati

3 months ago

Grateful to Jensen and @nvidia team for their support. Together, we’re working to deploy at least 1GW of Vera Rubin systems, bringing adaptable collaborative AI to everyone. https://t.co/FiOL7SRbut

4 months ago

encouraging progress on continued test-time adaptation beyond model deployment!

very excited about the future of personalized models, and developing reliable, easy-to-use pipelines to enable robust & personalized intelligence

i think the "no TTT" baseline from Section 4.5 is particularly neat, justifying training with gradient steps at test time

_kevinlu retweeted

Boyuan Chen

@BoyuanChen0

5 months ago

Introducing Large Video Planner (LVP-14B) — a robot foundation model that actually generalizes. LVP is built on video gen, not VLA. As my final work at @MIT, LVP has all its eval tasks proposed by third parties as a maximum stress test, but it excels!🤗 https://t.co/wjD54YFK3k

6 months ago

in the past couple months of closed beta, Tinker has been used to solve Putnam, has powered our blog posts, and has been accelerating internal research! excited to see the innovation from making trillion-parameter RL broadly available -- Tinker is a dream for multi-agent setups, personalization, and continual adaptation

6 months ago

Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. https://t.co/nvaJHkGxc0

_kevinlu retweeted

Muyu He

@HeMuyu0327

6 months ago

On-policy distillation would revolutionize multi-turn tool-use training beyond RL, but neither Tinker nor TRL, which implements on-policy distillation, supports anything other than single-turn distillation.

We have therefore taken this upon ourselves and implemented this feature in native Tinker.

Specifically, with a trainable Tinker client, a model can now call a list of tools, interact with tool results for multiple turns, and return tokens, logprobs, and reward masks sufficient for a distillation training job (p1-2).

The main engineering work was implementing tool calling and parsing for Tinker models, which is on @thinkymachines's TODO list in their tinker_cookbook code (p3).

Apart from that, we also created a dedicated inference stream that spins up a robust, multi-turn tool loop that can run alongside a training job and sync the weights in real time. It becomes easy to write a simple training loop with KL loss to run on-policy distillation with tool use.

This opens the door to a new application domain for agentic LLMs, because small/medium models now have access to dense, on-policy rewards from a swarm of SOTA large models (deepseek, gpt-oss).

Next, we will begin our training runs and see how they compare with traditional RL/SFT on multi-turn tool use.
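As a rough illustration of the "KL loss" over "tokens, logprobs, and reward masks" mentioned in the thread above, here is a minimal sketch in plain Python. It assumes the common on-policy setup in which tokens are sampled from the student, so the per-token difference between student and teacher log-probabilities is a single-sample estimate of the reverse KL. The function names and the mask convention (1 for model-generated tokens, 0 for tool-result tokens) are hypothetical and are not taken from the Tinker or TRL APIs.

```python
from typing import List, Sequence


def per_token_reverse_kl(
    student_logprobs: Sequence[float],
    teacher_logprobs: Sequence[float],
) -> List[float]:
    """Return log p_student(x_t) - log p_teacher(x_t) for each sampled token.

    Because x_t is assumed to be sampled from the student, each difference is
    a one-sample estimate of the reverse KL at that position.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("student and teacher logprobs must have equal length")
    return [s - t for s, t in zip(student_logprobs, teacher_logprobs)]


def masked_mean_reverse_kl(
    student_logprobs: Sequence[float],
    teacher_logprobs: Sequence[float],
    mask: Sequence[int],
) -> float:
    """Average the per-token estimate over positions where mask is nonzero.

    Hypothetical convention: mask is 1 for tokens the model generated and 0
    for tokens that came from tool results, which should not be trained on.
    """
    if len(mask) != len(student_logprobs):
        raise ValueError("mask must have the same length as the logprobs")
    diffs = per_token_reverse_kl(student_logprobs, teacher_logprobs)
    kept = [d for d, m in zip(diffs, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Toy values: the middle token stands in for a masked-out tool result.
    student = [-1.0, -2.0, -0.5]
    teacher = [-1.5, -2.0, -3.0]
    mask = [1, 0, 1]
    print(masked_mean_reverse_kl(student, teacher, mask))
```

In a real training loop the negated per-token value would typically serve as a dense reward or be minimized directly; how the thread's implementation does this is not stated in the source.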

_kevinlu retweeted

Yangjun Ruan

@YangjunR

6 months ago

I’ll be attending #NeurIPS starting Wednesday as part of @thinkymachines! Feel free to DM me if you’d like to catch up, chat about research, or learn more about Thinky (we have openings!)🤝 https://t.co/IjUWdrtEJj

_kevinlu retweeted

Astropulse

@RealAstropulse

6 months ago

Man being able to trick nano banana into making real pixels opens SO many doors

_kevinlu retweeted

Soumith Chintala

@soumithchintala

7 months ago

thinking machines....the people are incredible

_kevinlu retweeted

7 months ago

Science is best shared! Tell us about what you’ve built or discovered with Tinker, so we can tell the world about it on our blog. More details at https://t.co/2z5U597QZ4

_kevinlu retweeted

Carlos Miguel Patiño

@cmpatino_

7 months ago

We also replicate the "Distillation for personalization" results from @_kevinlu and @thinkymachines by improving the code performance of a model with SFT and then recovering its IFEval scores with distillation. https://t.co/8tsoK2biRB

7 months ago

thanks to multi-tenancy and the incredible engineering effort of the team, tinker is now both a joy to use, and super cheap! hope to see you try it out 🙂

7 months ago

Starting Monday, November 3rd, Tinker is switching to a pricing plan that reflects compute usage. This will ensure we have sufficient capacity to clear our waitlist by the end of the year, allowing anyone to sign up and start Tinkering. https://t.co/RGEEBj4VVo

_kevinlu retweeted

Li Dong @donglixp

7 months ago

On-policy + Reverse KLD = MiniLLM (https://t.co/MSlVNWGclo). Really nice blog by @thinkymachines. Exciting to see it being offered as a service! https://t.co/1ifaucn21r

_kevinlu retweeted

7 months ago

We just added 4 new models to Tinker from the gpt-oss and DeepSeek-V3.1 families. Sign up for the waitlist: https://t.co/CAsOcUduwR

7 months ago

thanks! i think that's probably more of a feature -- you are basically distilling uncertainty / the teacher's value function into the student in this case. insofar as the signal afterwards is less informative, i think @WendaXu2 @agarwl_ & co have some interesting work: https://t.co/dCDO7m0kiZ

7 months ago

in our new post, we walk through great prior work from @agarwl_ & the @Alibaba_Qwen team exploring on-policy distillation using an open source recipe: you can run our experiments on Tinker today! https://t.co/7nkW8YgT7K i'm especially excited by the use of on-policy distillation to enable new "test-time training" personalization methods, allowing the model to learn new domain knowledge without regressing on post-training capabilities