KP @kexecv - Twitter Profile

Interestingly, despite pre-training on 19T tokens, the LFM2.5 230M and 350M base models underperform on benchmarks like ARC, CSQA, and HellaSwag compared to the similarly sized SmolLM series. However, they absolutely dominate in GSM8K. Overall, quite a solid model.

Liquid AI

@liquidai

5 days ago

Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic tasks on phones, robots, home and network automation devices.

> 230M parameters, built on the LFM2 architecture
> Pre-trained on 19T tokens, with a 32K context extension
> Post-trained with distillation from LFM2.5-350M
> 213 tok/s decode speed on Galaxy S25 Ultra (CPU)
> 42 tok/s on a Raspberry Pi 5 (CPU)
> Competes with and often beats models more than twice its size on instruction following, data extraction, and tool use.
> Use it for large-scale data extraction pipelines or lightweight on-device agentic workloads.

🧵

2 days ago

@goyalaman03 Yes, because they used a custom metric with an older version of Lighteval, while I use a slightly different one with a newer version of Lighteval, although my code is derived from their eval code.

KP

@kexecv

2 days ago

@goyalaman03 Lighteval, I have probably mentioned this earlier.

KP

@kexecv

3 days ago

@varmology @SarvamAI The 68k vocab was the older one. The newer one, used in their latest model, is inspired by Gemma's tokenizer, with a 256k vocab size.

KP

@kexecv

5 days ago

@Shreeyash_0209 Ecosystem is actually good, even though alternative methods exist. It’s hard to leave it now 🫣

KP

@kexecv

5 days ago

@FrontiersMind Quite an interesting idea!

KP

@kexecv

7 days ago

Do you remember when you joined X? I do! #MyXAnniversary

KP

@kexecv

9 days ago

@neural_avb Very close to finding an efficient enough linear architecture for YALM2. Will be using it with attention in a hybrid design.

KP

@kexecv

11 days ago

@yxxshly Looks good. How many tokens were used to train it?

kexecv retweeted

altra @catboosted

11 days ago

“Unemployed? No. I am an AI Researcher.”

KP

@kexecv

13 days ago

@shantanugoel Yes, it’s easier than many people realise. You just need to adapt to Apple’s restrictions.

KP

@kexecv

14 days ago

@himanshustwts 2) why

KP

@kexecv

15 days ago

@kingnish24 Looks cool

KP

@kexecv

15 days ago

@upperwal Yes, the Gemma one has more coverage of CJK or Mandarin compared to the Sarvam one, which has more coverage of Indic languages.

KP

@kexecv

16 days ago

@original_ngv Then some have to wait ages to get that lucky.

KP

@kexecv

17 days ago

@goyalaman03 This is how the scene in India has always been. We definitely need VCs who are ready to take risks on sovereign tech, especially deep tech. Something like a YC for India, but one that focuses on global problems rather than necessarily on Indian ones.

KP

@kexecv

17 days ago

Investment is necessary for computing, so the current bottleneck is investors. However, I believe they are not entirely wrong. Even if young people show intent to build something, many ventures aren’t profitable in the long run, and investors do expect some return from their ventures, so priority always goes to business models that can work over the long term.
