Dharmesh Kakadia @dharmeshkakadia - Twitter Profile

Pinned Tweet

Dharmesh Kakadia @dharmeshkakadia

4 months ago

"Three general-purpose models ought to be enough for everybody." https://t.co/RsiFnB9N0F

0

4

2

357

Dharmesh Kakadia @dharmeshkakadia

5 days ago

@natolambert Congrats! Recently met @latkins and came away very impressed and optimistic about American Open AI! Great fit for both 👏

0

1

207

Dharmesh Kakadia @dharmeshkakadia

5 days ago

@GoingBallistic5 @Rewkang @standardbots yeah @evanbeard is great! R2D2 is very creative. Does that make the visit 32 BBY :P ?

0

1

0

99

Dharmesh Kakadia @dharmeshkakadia

5 days ago

@LusciousPear 💯 I think the biggest unlock is allowing rapid creation & deployment of robot models. Working on giving post-training superpowers to every robot @mixtrainai

0

240

Dharmesh Kakadia @dharmeshkakadia

5 days ago

Why do you need your own model? "We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design)"

Claude

@claudeai

6 days ago

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

5K

102K

14K

21K

51M

1

2

0

107

Dharmesh Kakadia @dharmeshkakadia

6 days ago

@evanbeard Congrats Evan and the entire @standardbots team!

0

1

0

119

Dharmesh Kakadia @dharmeshkakadia

12 days ago

@xxunhuang Congrats Xun and the entire team!

0

1

0

203

Dharmesh Kakadia @dharmeshkakadia

14 days ago

For some reason, lately I have to remind people a lot about https://t.co/W0baTFo75g

0

2

0

1

53

Dharmesh Kakadia @dharmeshkakadia

16 days ago

@dwarkesh_sp Anything to do with time estimation

0

53

Dharmesh Kakadia @dharmeshkakadia

16 days ago

@dwarkesh_sp Writing.

0

32

Dharmesh Kakadia @dharmeshkakadia

18 days ago

@rronak_ @MichaelElabd @QuantumArjun Congrats on the launch!

2

0

150

Dharmesh Kakadia @dharmeshkakadia

29 days ago

@_arohan_ I am gonna steal the Big token phrase.

0

3

0

375

Dharmesh Kakadia @dharmeshkakadia

29 days ago

@bernhardsson I think it makes sense if they consider owning the entire stack worth it as a muscle in the long run (and not just about bps). They have done things in programming languages, compilers, etc. for the same reason, I believe.

0

918

Dharmesh Kakadia @dharmeshkakadia

29 days ago

@lukas_m_ziegler A post-training platform for robotics, so more and more robotics companies can ship task-specific models for every use case x embodiment, while still leveraging the power & economics of generalized learning in large VLA/WAM/VAM.

0

127

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@_advaitpatel @Mind_Robotics 🚀

0

1

0

79

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@eatpraydiehard @madiator This is largely because post-training *platforms* aren’t widespread yet. With the right platform partners, post-training is not a lot more complicated than harnessing around a single model.

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@mixtrainai You can't rely on any lab's post-training offering, because it's not their core focus. You need a platform that is independent/agnostic of a model and brings frontier-grade infra to your problem domains (custom kernels, data curation, eval infra, ...)

0

322

0

1

0

51

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@alexstauffer_ Hey Alex, would love to chat about post training stuff :)

0

52

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@mark_k @OpenAI Your post-training strategy should be model-provider agnostic. Partner with @mixtrainai to get a frontier-grade platform for your custom models https://t.co/EavaavgaTY

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@mixtrainai You can't rely on any lab's post-training offering, because it's not their core focus. You need a platform that is independent/agnostic of a model and brings frontier-grade infra to your problem domains (custom kernels, data curation, eval infra, ...)

0

322

0

1

0

1

218

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@mixtrainai You can't rely on any lab's post-training offering, because it's not their core focus. You need a platform that is independent/agnostic of a model and brings frontier-grade infra to your problem domains (custom kernels, data curation, eval infra, ...)

0

322

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

If you were looking for a platform (and company) dedicated to post training, we at @mixtrainai are fully focused on helping you build the best model for your task+constraint+evaluation...

Mark Kretschmann

@mark_k

about 1 month ago

OpenAI has announced they will be winding down fine tuning. I got the email today. Existing active @OpenAI customers can keep running fine-tuning jobs until January 6, 2027, but after that no new training jobs can be created. Existing fine-tuned models will still run, but only until the underlying base model is eventually deprecated. I get the argument that newer models follow instructions much better, and that prompts plus RAG cover more use cases than before. But not all of them.

26

176

18

57

93K

1

0

198

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@willbitsky @oyhsu @JacobZietek
3. Task-specific models optimized both for inference compute and latency
4. Efficient training env + kernels when co-training video/world models with diffusion heads
5. Data infra for curation and evaluation for physical AI

0

1

0

43

Dharmesh Kakadia @dharmeshkakadia

about 1 month ago

@willbitsky @oyhsu @JacobZietek Agree with all 10, and it's great to see more than 5 are bottlenecked on "engineering/deployment" and not on research any more. A few more to add:
1. Latent vs pixel space prediction
2. Long-horizon context management between autoregressive backbone and diffusion heads

1

2

0

70
