How can we use small LLMs to shift more AI workloads onto our laptops and phones?
In our paper and open-source code, we pair on-device LLMs (@ollama) with frontier LLMs in the cloud (@openai, @together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost while maintaining 97.9% of the accuracy.
See Gru and the Minions in action below, 🔉on please (h/t @cartesia)!
Introducing DeepSeek-R1 optimizations for Blackwell, delivering 25x more revenue at 20x lower cost per token, compared with NVIDIA H100 just four weeks ago.
These gains are fueled by TensorRT DeepSeek optimizations for our Blackwell architecture, including FP4 performance with state-of-the-art production accuracy: the FP4 model scored 99.8% of the FP8 score on the MMLU general intelligence benchmark.
FP4-optimized DeepSeek checkpoint now available on @huggingface: https://t.co/NxLukbCESw
🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!
Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection
💡 With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.
📖 For more details, check out our paper here: https://t.co/HJiqzwnUV7
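A toy sketch of how coarse-grained compression and fine-grained selection could fit together, for intuition only. This is not the NSA implementation: mean-pooling as the compression step, dot-product block scoring, and all function names (`compress_blocks`, `select_blocks`, `sparse_attention`) are assumptions made for illustration, and the real mechanism described in the paper may differ substantially.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def compress_blocks(keys, block_size):
    # Coarse-grained step (assumed): mean-pool each block of keys
    # into a single summary vector.
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    return [[sum(col) / len(block) for col in zip(*block)] for block in blocks]

def select_blocks(query, summaries, top_k):
    # Fine-grained step (assumed): keep the top_k blocks whose summary
    # scores highest against the query.
    scores = [dot(query, s) for s in summaries]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])

def sparse_attention(query, keys, values, block_size=2, top_k=1):
    # Attend only over the tokens inside the selected blocks.
    summaries = compress_blocks(keys, block_size)
    chosen = select_blocks(query, summaries, top_k)
    idx = [t for b in chosen
           for t in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[t]) * scale for t in idx])
    out = [sum(w * values[t][d] for w, t in zip(weights, idx))
           for d in range(len(values[0]))]
    return out, idx

keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
output, attended = sparse_attention([0.0, 1.0], keys, values)
```

In this toy example the query only attends to the second block of tokens, so the cost of the final softmax scales with the number of selected tokens instead of the full sequence length.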
@tomcocobrico @LightningAI That's great @tomcocobrico. Feel free to join our Discord if you need help with anything, and you can publish to the Templates Gallery too https://t.co/gSmrHvhOx8 ;)
A while ago I complained here about persistent storage in Google Colab.
Have been using @LightningAI Studios for a while now for:
- Full VSCode (incl. GH Copilot)
- Persisted files shared across notebooks
- Multi-GPU/node (!!)
It's been great. Feels like a remote ML workstation
@bhimrazyadav @LightningAI Hey @bhimrazyadav. That's great! BTW, do you know about LitData: https://t.co/0JLt5J6xGr? This is the library we built to make data processing on @LightningAI fast and scalable.
@elitepax @LightningAI That's great to hear @elitepax. You can have a look at our published Studios: https://t.co/0l9MqUVTZn. There is a ton to learn there, and everything is ready to go 1 click away.
@AnindyadeepS @LightningAI @williamfalcon @lantiga Hey @AnindyadeepS, thanks!
Great timing! I am working on the docs right now.
They should be available in the coming weeks!
Would you mind joining our Slack? You can reach out to me directly there. My username is tchaton.
You can duplicate the Studio and you will get everything: the dependencies, the data, the code, etc. Finally, a benchmark you can reproduce yourself with a click!
We just finished benchmarking cloud data-loading libraries on ImageNet 1.2M:
- Lightning AI Streaming Dataset
- WebDataset
- MosaicML Streaming
Conclusion: Lightning AI is the fastest (up to 80% faster) 🚀
https://t.co/R4O7lH0c8i
Prepare a 1 trillion token dataset to train LLMs from scratch in under 4 hours instead of days with @LightningAI Studio!
Everything is included: the final datasets, the code, the dependencies, etc.
Get started in seconds as no setup is needed.
https://t.co/uHeGx0qTK1