Erica Yuen @ NeurIPS 2025 ☀️ @ericajiyuen - Twitter Profile

Erica Yuen @ NeurIPS 2025 ☀️ @ericajiyuen

7 months ago

@code_star I’m at NeurIPS as well!

1

0

59

Erica Yuen @ NeurIPS 2025 ☀️ @ericajiyuen

7 months ago

Come say hi at the Databricks booth tomorrow 9am - 1pm!

Dan Zhang @ ICML @DZhang50

7 months ago

look how they massacred my boy 😭

22

2K

20

85

173K

0

1

0

130

Erica Yuen @ NeurIPS 2025 ☀️ @ericajiyuen

7 months ago

Looking forward to catching some sun at #NeurIPS2025 this week! I’ll be at 2 workshops presenting this work at the poster sessions:
- Continual and Compatible Foundation Model Updates (CCFM)
- Evaluating the Evolving LLM Lifecycle: Benchmarks, Emergent Abilities, and Scaling

Jacob Portes

@JacobianNeuro

7 months ago

@ilyasut says the age of scaling is over - good thing we put this paper out in time! Many recent embedding models are finetuned versions of pretrained LLMs. We asked 🤓: How does retrieval performance scale with pretraining FLOPs? 📄 paper: https://t.co/xKbLeDr5aa

3

59

10

31

12K

0

3

0

214

ericajiyuen retweeted

Christina Farhat

@farhatchristina

almost 2 years ago

#NYFW FW24 @nvidia 🔋✨@tessybarton

234

14K

1K

2K

2M

ericajiyuen retweeted

Han

@HanchungLee

almost 2 years ago

vector search bm25 search

5

444

34

71

39K

ericajiyuen retweeted

Jacob Portes

@JacobianNeuro

over 2 years ago

At NeurIPS and want to learn some tips and tricks for speeding up LLM training? I'll be presenting our work on MosaicBERT, an encoder optimized for fast pretraining, today (Tues. Dec 12, 5:15-7:15 CST). Come say hi! https://t.co/6zTLtkUtSR

2

27

6

2

3K

ericajiyuen retweeted

jasmine collins @jazco

over 2 years ago

at #neurips23?! stop by our booth (1009) in the expo hall to chat about the awesome diffusion models our team has been building!!

2

39

7

2

17K

Erica Yuen @ NeurIPS 2025 ☀️ @ericajiyuen

over 2 years ago

I’m at NeurIPS this week! Reach out if you want to catch up.

Databricks AI Research

@DbrxMosaicAI

over 2 years ago

It's the most wonderful time of the year...come see us and the @databricks team at #NeurIPS2023 for a week of talks, parties, and connection! First up: join our Expo Day talk at 10 AM tomorrow to learn more about optimizing and reasoning on #LLM inference.

1

25

6

2

47K

0

11

0

1

725

Erica Yuen @ NeurIPS 2025 ☀️ @ericajiyuen

over 2 years ago

@PatronusAI There is so much work to be done in the LLM eval space in industry, and I’ve been impressed by the thoughtful work @PatronusAI has done: the UX of their platform, its focus on a real industry need, and the meticulous curation of the EnterprisePII dataset. Well done!

0

2

0

79

ericajiyuen retweeted

Databricks AI Research

@DbrxMosaicAI

almost 3 years ago

📦 To evaluate the coding capabilities of LLMs, you need to execute the code. But what if the LLM spits out malicious code?😱 With MosaicML, you can now evaluate #LLMs on code gen benchmarks (e.g., HumanEval) in an effortless, end-to-end secure framework. https://t.co/mDD4ic7msb

1

54

11

14

15K

ericajiyuen retweeted

PatronusAI

@PatronusAI

almost 3 years ago

security in 2023:

0

5

1

0

471

ericajiyuen retweeted

Jacob Portes

@JacobianNeuro

almost 3 years ago

If you're at #ICML 🌴on Saturday, make sure to check out the https://t.co/H7Rfblz4lM workshop on efficient training of LLMs! @abhi_venigalla and @jefrankle will be at our poster on optimized pretraining of MosaicBERT ⚡️🚄 📜workshop paper: https://t.co/MjHrqiyj6J 🧵

1

46

16

18

7K

ericajiyuen retweeted

Cameron R. Wolfe, Ph.D.

@cwolferesearch

about 3 years ago

The MPT suite of large language models (LLMs) by MosaicML has become incredibly popular. But, what makes these models so special? Although there are a variety of reasons for the popularity of MPT, I find these models to be especially useful due to a few unique components…

Fully open-source. MPT models, including MPT-7B and MPT-30B, carry an Apache 2.0 license, meaning that they can be used commercially without any limitations. Plus, these models are accompanied by an entire open-source code repository for fine-tuning, evaluating, or even pre-training these models from scratch (see replies for more details). Given that pre-training a base LLM is the most prohibitive/expensive component of any LLM-based system, the MPT foundation series is a great starting point for building specialized LLMs that solve domain-specific problems.

Fast inference. MPT models are based upon a typical, decoder-only transformer architecture. But, they make a few key modifications to this architecture, including:

- Low precision layer norm
- Flash Attention
- ALiBi (instead of normal positional embeddings)

Due to these modifications, MPT models perform inference very quickly (i.e., 1.5-2X faster than similarly-sized LLaMA models) with HuggingFace inference pipelines. Plus, MPT models are completely compatible with libraries like FasterTransformer, which could be used to further boost inference speed.

Context length. Due to their use of ALiBi, MPT-7B and 30B are capable of handling large context windows and can even extrapolate to context lengths that are beyond data seen during training. To show this, MPT-7B is fine-tuned on data with a 64K token context length (derived from books3 corpus of fiction novels). Researchers at MosaicML found that this MPT-StoryWriter-7B model was capable of handling large context lengths and could even extrapolate further to context windows as large as 84K. They even ingested the entire Great Gatsby book and generated an epilogue!

Performance. Finally, MPT models perform really well. MPT-7B achieves performance on-par with LLaMA-7B across a variety of standard benchmarks. MPT-30B lags slightly behind the performance of LLaMA-30B and Falcon-40B on text-based tasks, but it tends to perform better on programming tasks. Plus, MPT-30B seems to exceed the quality of GPT-3. Put simply, these base models are high-quality and serve as a great foundation for creating open-source alternatives to proprietary systems like ChatGPT or GPT-4.
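
The thread credits ALiBi for MPT's ability to extrapolate to longer contexts but does not show what it does. As a rough illustration only: ALiBi replaces positional embeddings with a per-head linear penalty added to the attention scores, so that more distant keys score lower. The sketch below assumes the formulation from the original ALiBi paper with a power-of-two head count; the function names are hypothetical and this is not MPT's actual implementation.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...

    Assumes num_heads is a power of two, the simple case in the ALiBi paper.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2 ** (-8 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Bias to add to attention scores, indexed as [head][query][key].

    Keys at or before the query get slope * (key - query), which is <= 0;
    future keys are masked with -inf (causal attention).
    """
    return [
        [
            [m * (j - i) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    bias = alibi_bias(num_heads=8, seq_len=4)
    # Last query row of the first head: the penalty grows with distance.
    print(bias[0][3])
```

Because the penalty depends only on the distance between query and key, nothing in it is tied to the sequence length seen during training, which is the intuition behind the extrapolation claim above.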

8

454

87

373

124K

ericajiyuen retweeted

Ali Ghodsi

@alighodsi

about 3 years ago

Big news: we've agreed to acquire @MosaicML, a leading generative AI platform. I couldn’t be more excited to join forces once the deal closes. https://t.co/L4TyrruUEU

31

1K

197

112

482K

ericajiyuen retweeted

Michael Carbin

@mcarbin

about 3 years ago

Boom! We at @MosaicML plan to unite with an amazing group of colleagues at @Databricks! And don’t worry, still the same great @MosaicML taste: our brand, products, and mission remain. But, going bigger, much bigger. So watch out for more from a truly amazing team! Bravo team!

7

126

7

0

24K

ericajiyuen retweeted

Sam Havens @sam_havens

about 3 years ago

Working on the MPT-30B-Chat and Instruct models was incredibly exciting. The team, the software, and the hardware were all exceptional (FYI, H100s are _really_ fast)

3

69

5

4

9K

ericajiyuen retweeted

Databricks AI Research

@DbrxMosaicAI

about 3 years ago

🚨 A few months ago we announced that you can train Stable Diffusion from scratch for less than $125k using the MosaicML platform. A major price drop is coming...and we have the training run to back it up. Stay tuned for a major announcement this week!

0

70

9

7

28K

Erica Yuen @ NeurIPS 2025 ☀️ @ericajiyuen

over 3 years ago

@juliechoi 💜💜💜💜💜💜💜

0

2

0

63

Erica Yuen @ NeurIPS 2025 ☀️ @ericajiyuen

over 3 years ago

@calumbirdo Thanks for highlighting the graphic I made for the @MosaicML blog post about training GPT-3 quality models for <500k! I can’t wait to see what kind of graphs you add to your calculator for us visual learners. https://t.co/AJkxRmaSRs

1

11

2

1

299
