Mudith Jayasekara @mudithj - Twitter Profile

Pinned Tweet

9 months ago

Rewatching the greats in this video never gets old. Grateful to now be making our own dent in defining the next paradigm of AI. Working with the most thoughtful and passionate people I know and backed by incredible investors (from LocalGlobe @svennj @asharoraa, HuggingFace @Thom_Wolf, DeepMind, NHS, and others). Thank you to our customers who care enough to glimpse into what the future of language models looks like. Let's build🫡

Charlie O'Neill

@oneill_c

9 months ago

Today, we’re launching Parsed.

We are incredibly lucky to live in a world where we stand on the shoulders of giants, first in science and now in AI. Our heroes have gotten us to this point, where we have brilliant general intelligence in our pocket. But this is a local minimum. We now have an ecosystem of burgeoning tasks where each requires a different kind of intelligence, a different context, a whole host of implicit assumptions and latent knowledge and domain expertise that is very difficult to cram into a system prompt.

The big labs want you renting their $50k/month amnesiac interns that forget everything between conversations. Generic behemoths that get quantised, versioned and deprecated behind the scenes, where the only element of control you have is your messy monolithic user prompt.

We want people who need their own intelligence to be able to not only access it, but also control it. And whilst the big general models are unbelievably good chatbots and coding agents and purveyors of the world, specialisation of intelligence is required. Clinical scribes, marketing compliance agents, legal red-lining models, insurance policy recommenders; the list goes on.

And so that’s what Parsed does: deploy your own frontier model that actually learns. We eval your specific task, build a custom evaluation harness, optimise a model just for you, and host it with continual learning. We bake all the context and knowledge of your task into the model itself, from your engineers to your domain experts to customer feedback, all in a tight SFT → RL loop, with useful interpretability made possible by the open-source ecosystem we build on top of.

No more 2000-word prompts with seventeen "IMPORTANT: NEVER DO X" clauses. Your model gets better at YOUR job every single day; the amnesiac pseudo-gods have had their run. Your model, your data, your moat. Let's build 🫡

Mudith Jayasekara

@mudithj

2 days ago

@tenderizzation waterloo-pilled

Mudith Jayasekara

@mudithj

10 days ago

LAB is such a useful foundation to explore what model behaviours lead to frontier performance and also to hill climb on. We validated compaction as a strategy to improve legal agent intelligence. We explored naive natural language compaction and less lossy compaction in KV-cache space. This is a first step. There are many open research questions in addition to retrieval, including improving legal reasoning and drafting, that we are actively researching together. When we started @parsedlabs, semi-verifiable domains were where we specialized (and find v fun). Stoked to be working with @gabepereyra, @nikogrupen and the Harvey team to keep climbing.

Gabe Pereyra

@gabepereyra

10 days ago

https://t.co/VniSjWQbOI

Mudith Jayasekara

@mudithj

24 days ago

standing on the shoulders of giants

Tuhin Srivastava

@tuhinone

24 days ago

https://t.co/YPONx4IZSz

Mudith Jayasekara

@mudithj

29 days ago

Being able to take gradient steps on 1T+ sized models at long sequence lengths isn’t trivial, and all the open-source libraries start to break down when pushed. Baseten Loops does the hard work to simplify the gradient update to a couple of lines of code. We want ML teams to not worry about the infra + training library, and spend their time looking at their data and reward shaping. Loops gives everyone the ability to do frontier RL robustly and then deploy using Baseten’s inference stack to make the model go brrrr with all the 9s of uptime. At Baseten research, we’re just getting started. Online RL and ultra-long-context training coming soon...

Raymond Cano

@vim_dzl

29 days ago

https://t.co/yXlylT1Zr2

Mudith Jayasekara

@mudithj

about 1 month ago

Supporting the labs to democratise intelligence

Tuhin Srivastava

@tuhinone

about 1 month ago

Model labs should spend their time pushing the frontier, not thinking about API keys, rate limits, metering, and billing. Today, we're launching Baseten Frontier Gateway: the fastest path from trained weights to a production, white-labeled API. https://t.co/1tmF8Xq9OE

Mudith Jayasekara

@mudithj

about 1 month ago

Working on this with @harvey has really shown how thoughtful they are about embedding frontier legal reasoning into the models they serve. Lots of exciting work to come!

Gabe Pereyra

@gabepereyra

about 1 month ago

https://t.co/AWIhrxBD5c

Mudith Jayasekara

@mudithj

about 2 months ago

@lachygroom @baseten we love to hear it

Mudith Jayasekara

@mudithj

about 2 months ago

@thealexker very the difference between human vs LLM-generated

Mudith Jayasekara

@mudithj

about 2 months ago

@clarejtbirch @TobinSouth I knew there was a reason ant hired him

mudithj retweeted

sshkhr

@sshkhr16

2 months ago

If your startup doesn't have a Tri Dao on your inference team you're ngmi

Mudith Jayasekara

@mudithj

2 months ago

@oneill_c @part_harry_ on ya bike if you don’t

Mudith Jayasekara

@mudithj

2 months ago

So much of the alpha in post-training comes from figuring out what the right learning signal is to gradient update on. Yes, algorithmic improvements are exciting for researchers, but LOOK AT YOUR DATA, LOOK AT YOUR DATA, LOOK AT YOUR DATA (@part_harry_ @oneill_c) is always what we end up falling back to as the most important lift.

Leonard Tang

@leonardtang_

2 months ago

https://t.co/1b7biWdqAs

Mudith Jayasekara

@mudithj

2 months ago

Finding an intermediate memory layer between the full KV cache and lossy compression methods like natural language memory files is essential for real human work: the kind of work that will be done by long-horizon agentic workflows. This is some of the most exciting work we've done at Baseten yet, phase 2 and 3 coming soon.

Charlie O'Neill

@oneill_c

2 months ago

https://t.co/DpJGks81oW
