Kelly Marchisio @NeurIPS @cheeesio - Twitter Profile

Kelly Marchisio @NeurIPS @cheeesio

2 months ago

sidenote, this comic is 19 years old. How... how did that happen.


Kelly Marchisio @NeurIPS @cheeesio

2 months ago

Omg. First day *really* agentic coding. This me rn 🤺⚔️ https://t.co/W50byjkN4t


cheeesio retweeted

Slator

@slatornews

3 months ago

👉 https://t.co/6CY6PLvLI7 While reasoning-enabled #LLMs are among the strongest 🔝 performers in #AI #translation benchmarks, a new study suggests that prompting them to explain their reasoning before translating can hurt translation quality. @rajaee_sara @mziizm @cheeesio @Cohere_Labs @UvA_Amsterdam @cohere #xl8 #t9n


Kelly Marchisio @NeurIPS @cheeesio

4 months ago

“What language is this text written in?” - a question so easy to pose, yet surprisingly difficult to answer!

EleutherAI @AiEleuther

4 months ago

Announcing our latest paper: CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data In collaboration with @CommonCrawl @MLCommons and @JohnsHopkins we worked with 80+ native speaker annotators to build a LID benchmark on actual Common Crawl text covering 109 languages. Existing evaluations overestimate how well LangID works on web data.



Kelly Marchisio @NeurIPS @cheeesio

4 months ago

@BlancheMinerva Wow this is awesome, and so needed!! A great contribution to the community!


cheeesio retweeted

Piotr Nawrot

@p_nawrot

4 months ago

🌟🚀Sparse Attention Models Can Get Sparser We've updated The Sparse Frontier—the largest empirical analysis of training-free sparse attention to date—from Qwen 2.5 to 3 model families, now including Llama 3.1 and Gemma 3. Key findings: 📊 Larger sparse models outperform smaller dense ones at equal compute cost. Only high-sparsity configs lie on the Pareto frontier for long sequences. 🔬 Already sparse? You can go sparser. Gemma 3 has 5/6 layers as Sliding Window Attention by design—yet additional sparsification of the remaining dense layers still yields efficiency gains at scale. 📈 Longer sequences tolerate higher sparsity. From 9 models × 6 methods × 9 tasks: fixed-budget methods in production are suboptimal. Token budget should grow sublinearly with context length. Co-authors: Robert Li, Renjie Huang, @seb_ruder, @cheeesio, @PontiEdoardo. Special shout-out to @faridlazuarda who updated our repo to vLLM v1 and made Gemma3 evaluations possible. Links in the comments ⬇️



Kelly Marchisio @NeurIPS @cheeesio

5 months ago

Our Multilingual Team at @cohere is hiring interns! If you are a current PhD student working in multilinguality and would like to work with our team, please apply below or reach out! 🌍🌏🌎 (The below says “Winter”, but we hire year-round) https://t.co/Uwhh7aE5LE


Kelly Marchisio @NeurIPS @cheeesio

5 months ago

Three emails in the past ~week addressed to “Emma”. This can’t be a coincidence — hive-mind, what is this?


cheeesio retweeted

MT Group at FBK @fbk_mt

6 months ago

Our pick of the week by @dhairya_su47605: "How Does #Quantization Affect #Multilingual #LLMs?" by @cheeesio, @TheyCallMeMr_, Hongyu Chen, @d_aumiller, @ahmetustun89, @sarahookr, @seb_ruder (Findings EMNLP, 2024)


cheeesio retweeted

Dhairya Suman @dhairya_su47605

6 months ago

Pick of the week @fbk_mt: How Does Quantization Affect Multilingual LLMs? Quantization has become a widely adopted technique for model compression. This work investigates the impact of quantization on different languages in multilingual LLMs. https://t.co/A82iW9Myek


Kelly Marchisio @NeurIPS @cheeesio

6 months ago

Announcing - Moms Who ML! 🐣🍼 I landed in San Diego to a video call from my 15-month-old -- she licked the camera then put me in this bag of Mega Bloks👅 If you can relate, let's support one another! Search for the group on the #NeurIPS2025 app, and we'll expand after! https://t.co/uiYYMn0izu


Kelly Marchisio @NeurIPS @cheeesio

7 months ago

Looking forward to #neurips2025 next week! Come say hi at the @cohere booth!


cheeesio retweeted

David Ifeoluwa Adelani 🇳🇬 @davlanade

7 months ago

First invited talk by Kelly Marchisio @cheeesio


cheeesio retweeted

Dwarak

@DwaraknathG

8 months ago

I am hiring highly skilled performance engineers for my team! You will be working on optimising pretraining for models >100B params on O(1000s) of GPUs, and hardware-aligned architecture design. We are cooking a lot of very exciting projects and I can safely say you will have a lot of fun! Link in thread. <3


Kelly Marchisio @NeurIPS @cheeesio

8 months ago

Big thanks to @slatornews for hosting me at #SlatorCon Silicon Valley last month. I loved giving attendees an inside look into how SOTA multilingual LLMs are built 🤖🌍

Slator

@slatornews

8 months ago

Kelly Marchisio, Multilingual Team Lead at @cohere, shared an inside look at building 🧑‍💻 a #multilingual #LLM and advancing #AI #translation at #SlatorCon Silicon Valley 2025. #Cohere #LLMs #xl8 #t9n @cheeesio https://t.co/AonaWd7MQ0


Kelly Marchisio @NeurIPS @cheeesio

9 months ago

I had a great time speaking at #SlatorConSV25 today!

Slator

@slatornews

9 months ago

The future of multilingual AI 🚀 is here. At #SlatorConSV25, @cohere's @cheeesio explains how to build massively multilingual LLMs, from technical foundations to the current landscape and what comes next. #MultilingualAI #LLMs #CommandA https://t.co/buewQdrxpa


Kelly Marchisio @NeurIPS @cheeesio

9 months ago

@taneemishere Cohere does hire in multimodality: https://t.co/DVNg39GzE1


Kelly Marchisio @NeurIPS @cheeesio

10 months ago

From the Multilingual Team kitchen straight to you 🧑‍🍳 Enjoy Command A Translate! (we're hiring!)

Cohere

@cohere

10 months ago

Introducing Command A Translate, our state-of-the-art model designed for high-quality translation tasks.


cheeesio retweeted

Matthias Gallé @mgalle

10 months ago

Where machine translation goes, AI goes. This has been true since 1954 (MT made front-page of NYT, 2 years before the Dartmouth workshop) We just released the best translation model ever https://t.co/zwXx2lxdfP

