🚀 Introducing FlashQLA: high-performance linear attention kernels built on TileLang.
⚡ 2–3× forward speedup. 2× backward speedup.
💻 Purpose-built for agentic AI on your personal devices.
💡 Key insights:
1. Gate-driven automatic intra-card context parallelism (CP).
2. Hardware-friendly algebraic reformulation.
3. TileLang fused warp-specialized kernels.
FlashQLA boosts SM utilization via automatic intra-device CP. The gains are especially pronounced for tensor-parallel (TP) setups, small models, and long-context workloads.
Instead of fusing the entire Gated DeltaNet (GDN) flow into a single kernel, we split it into two kernels optimized for CP and backward efficiency. At large batch sizes this incurs extra memory I/O overhead compared with a fully fused approach, but it delivers better real-world performance on edge devices and long-context workloads.
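To make the CP idea concrete, here is a minimal pure-Python sketch of why a gated linear recurrence can be split along the sequence. It is not FlashQLA's kernel and not the GDN delta rule: it assumes the simpler scalar-gated update S_t = g_t * S_{t-1} + outer(k_t, v_t) with output o_t = q_t @ S_t, and every function name below is hypothetical. Each segment first computes a local end state and its total decay independently, a short scan over segments supplies the correct incoming state, and a second independent pass produces the outputs. Whether FlashQLA's two kernels correspond to these two passes is not stated in the post.

```python
import random


def zeros(dk, dv):
    return [[0.0] * dv for _ in range(dk)]


def step(S, g, k, v):
    """One gated update (assumed form): S <- g * S + outer(k, v)."""
    return [[g * S[i][j] + k[i] * v[j] for j in range(len(v))]
            for i in range(len(k))]


def readout(q, S):
    """Output row o = q @ S."""
    return [sum(q[i] * S[i][j] for i in range(len(q)))
            for j in range(len(S[0]))]


def sequential(q, k, v, g):
    """Reference: walk the whole sequence with a single running state."""
    S = zeros(len(k[0]), len(v[0]))
    out = []
    for t in range(len(q)):
        S = step(S, g[t], k[t], v[t])
        out.append(readout(q[t], S))
    return out


def context_parallel(q, k, v, g, num_segments):
    """Same result, but each segment's work is independent of the others."""
    T, dk, dv = len(q), len(k[0]), len(v[0])
    bounds = [(s * T // num_segments, (s + 1) * T // num_segments)
              for s in range(num_segments)]

    # Pass 1 (parallel over segments): end state from a zero start,
    # plus the product of gates across the segment.
    summaries = []
    for lo, hi in bounds:
        S, decay = zeros(dk, dv), 1.0
        for t in range(lo, hi):
            S = step(S, g[t], k[t], v[t])
            decay *= g[t]
        summaries.append((S, decay))

    # Short serial scan over segments: the state entering each segment.
    # Linearity gives S_end = decay * S_in + S_local.
    incoming, carry = [], zeros(dk, dv)
    for S_local, decay in summaries:
        incoming.append(carry)
        carry = [[decay * carry[i][j] + S_local[i][j] for j in range(dv)]
                 for i in range(dk)]

    # Pass 2 (parallel over segments): replay each segment from its
    # correct incoming state and emit the outputs.
    out = []
    for (lo, hi), S in zip(bounds, incoming):
        for t in range(lo, hi):
            S = step(S, g[t], k[t], v[t])
            out.append(readout(q[t], S))
    return out


def random_inputs(T, dk, dv, seed=0):
    """Toy inputs; gates are drawn from [0.5, 1.0] purely for illustration."""
    rng = random.Random(seed)
    vec = lambda d: [rng.uniform(-1.0, 1.0) for _ in range(d)]
    q = [vec(dk) for _ in range(T)]
    k = [vec(dk) for _ in range(T)]
    v = [vec(dv) for _ in range(T)]
    g = [rng.uniform(0.5, 1.0) for _ in range(T)]
    return q, k, v, g
```

In this toy version the only serial work is the scan over segment summaries, so with a single long sequence the two passes can still be spread across many SMs. The price is that each segment is traversed twice, which is consistent with the extra memory I/O the post mentions for large batches.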
The backward pass was the hardest part: we built a 16-stage warp-specialized pipeline under extremely tight on-chip memory constraints, ultimately achieving 2×+ kernel-level speedups.
We hope this is useful to the community!🫶🫶
Learn more:
📖 Blog: https://t.co/HF6opiR4yf
💻 Code: https://t.co/G3oaf5L1AZ
"Gérard Depardieu makes France proud"
- E. Macron -
"If there are any stupid bitches, we'll throw them out"
- B. Macron -
From the couple who had made violence against women a great national cause.
For those who don't know it yet, the State conceals the true cost of pensions.
In some ministries, including @education_gouv, 25% of the budget disappears into pensions.
As slop floods the Internet and humans rely more and more on generative AI, it's inevitable that future models will be trained mostly on slop (except for verifiable reasoning tasks, where the training will be done in sims). Culture will turn into slop remixed from slop remixed from slop.
A magnificent finale that made the most of Paris's terrain and Montmartre: what an atmosphere, what a race. By freezing the times ahead of the finale, the Tour wins on every count; the Games are done and we are proud of the legacy. 👏
Like it or not, the retirement age will end up rising to 70.
Then one fine day, not so far off, a politician will finally have the courage to admit to the French that our pay-as-you-go system is no longer viable given the country's demographics.
Invest your money.
Whatever angle journalists take, the topic of pension amounts is likely to break out of the Twitter bubble. And since we all have elderly people around us, I've put together a little guide for you:
How do you talk to boomers about giant pensions? ⤵️
Nations should prepare for a world with far fewer necessary jobs, and with corporations running on data centers and robots. They need to start devising a modern tax and social-benefits structure that can scale, because I guarantee it will have to.
This is one of the developers who made TypeScript 10x faster. And he just got laid off by Microsoft.
There will be countless stories like his: engineers who went above and beyond, shipped game-changing features, improved the dev experience for millions... and still found themselves out of a job.
This week, @Microsoft laid off around 6,000 employees, roughly 3% of its global workforce.
Not because they underperformed. Not because they didn’t deliver.
But because AI just got “good enough” to justify replacing thousands overnight.
The takeaway?
A bitter reminder that no matter how hard you work or how much impact you create, companies will always do what’s best for the business.
So let’s do what’s best for us too.
Build your network. Keep your options open. Protect your peace.
Because loyalty is admirable, but so is self-preservation.
No one else is going to look out for your career like you will.