Interestingly, despite pre-training on 19T tokens, the LFM2.5 230M and 350M base models underperform the similarly sized SmolLM series on benchmarks like ARC, CSQA, and HellaSwag. However, they absolutely dominate on GSM8K. Overall, quite a solid model.
Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic tasks on phones, robots, home and network automation devices.
> 230M parameters, built on the LFM2 architecture
> Pre-trained on 19T tokens, with a 32K context extension
> Post-trained with distillation from LFM2.5-350M
> 213 tok/s decode speed on Galaxy S25 Ultra (CPU)
> 42 tok/s on a Raspberry Pi 5 (CPU)
> Competes with and often beats models more than twice its size on instruction following, data extraction, and tool use.
> Use it for large-scale data extraction pipelines or lightweight on-device agentic workloads.
🧵
@goyalaman03 Yes, because they used a custom metric with an older version of LightEval, while I use a slightly different one with a newer version of LightEval, even though my code is derived from their eval code.
@goyalaman03 This is how the scene in India has always been. We definitely need VCs who are ready to take risks on sovereign tech, especially deep tech. Something like YC for India, but one that doesn't necessarily focus on Indian problems; rather, on global ones.
Investment is necessary for compute, so the current bottleneck is investors. However, I believe they are not entirely wrong. Even if young people show intent to build something, many ventures aren't profitable in the long run, and many investors do expect some return, so priority is always given to business models that can work in the long term.