Parmanu (Hindi for Atom) is live.
Parmanu is part of the Laboratory for Computational Social Systems (LCS2) @lcs2lab at IIT Delhi, led by Prof. Tanmoy Chakraborty @Tanmoy_Chak, and is our dedicated home for Efficient Large Language Models (LLMs) and Small Language Models (SLMs).
We're at a turning point in AI. The future won't be defined by scaling alone - it will be shaped by efficiency, accessibility, and real-world deployability. Parmanu is our effort to push this efficiency-first vision forward.
Explore the project page: https://t.co/siVwaKY1GN
Why Parmanu matters
• A centralized hub for our research, with papers accepted at ICLR, ICML, NeurIPS, TACL, ACL, and TMLR
• Open access to tools, code, and artifacts spanning model compression, KV efficiency, PEFT, inference optimization, knowledge distillation, and model coordination
• A growing ecosystem focused on making strong language models smarter per parameter, not just larger
What this means for the community
For researchers
A curated, evolving resource tied to top-tier venues
Reproducible artifacts and principled problem formulations
A shared space to advance efficiency-centric LLM research
For practitioners
Practical techniques to deploy LLMs under tight latency and memory budgets
Faster paths from paper to production
Tools that actually work under real deployment constraints
What's coming in 2026
• Efficient LLM/SLM leaderboards
• Open-sourced efficient LLM artifacts
• More tools for compression, distillation, and inference
• Deep integration with Hugging Face and other popular libraries
If you're excited about efficient, sustainable, and scalable AI, check out Parmanu, share feedback, and collaborate with us. The next wave of LLMs won't just be bigger - they'll be leaner, faster, and more impactful.
#EfficientLLMs #SLMs #ModelCompression #InferenceOptimization #KnowledgeDistillation #AIResearch #NLP #ICLR #ICML #NeurIPS #ACL #TACL #TMLR #IITDelhi #LCS2 #Parmanu
Three 2026 papers (AOPD, SOD, OPSD) are patching on-policy distillation one failure mode at a time.
Our 2023 work, MPDistil (ICLR 2024 poster), already had structural answers to all four.
A short article on the convergence. @Tanmoy_Chak @lcs2lab
@ysu_ChatData @Tanmoy_Chak We introduce Global Eviction Ratio (GER), which tracks whether answer-critical tokens remain reachable across attention heads. GER spikes before benchmark accuracy drops and strongly correlates with the hallucination safety cliff.
@ysu_ChatData @Tanmoy_Chak We observe stronger routing rigidity with global head-wise pruning methods (e.g., AdaKV), which increase head-level consensus. Chunk-based pruning (e.g., FINCH) tends to preserve more routing diversity and flexibility.
New Podcast Alert
Prof. Tanmoy Chakraborty @Tanmoy_Chak joins Rudraditya on The Inner Circle With Rudraditya podcast for a sharp, no-hype conversation on the realities, risks, and future of AI. #AI #MachineLearning #SLM #AIGovernance #NLProc
https://t.co/KSUw5bYoge
@MeatigoOfficial didn't realize you guys are into fraud. My order Mea41Vmm72034 is showing as delivered, but the rider called and said he can't deliver it because he is far from the location, and he wants me to pick it up. Apparently he is now returning home and will deliver it tomorrow. Wow!!
New paper on test-time scaling! We analyze 30 billion tokens from 8 LLMs (32B to 235B) and 4 reasoning datasets and propose a practical recipe for effective scaling of inference-time compute.
#NeurIPS25 Sneak Peek!
Thrilled to present a KV cache compression method for LLMs, now part of the @nvidia KVPress library.
Value-Guided KV Compression for LLMs via Approximated CUR Decomposition
@ayans007, @codetalker07, @Tanmoy_Chak
Arxiv: https://t.co/tJZdqicXHq
Excited to see LoRA back in the spotlight! LoRA, and PEFT methods in general, have been key for efficient train-time adaptation of foundation models. In our recent TACL paper, we introduced a new PEFT method that outperforms LoRA across a range of NLP and math tasks!
Excited to share that our recent paper on value-guided key-value compression for LLMs got accepted at NeurIPS 2025! Our paper is motivated by a simple observation - value-guided KV compression can achieve lower post-eviction loss. Preprint available - https://t.co/kuSrstj6G1. @Tanmoy_Chak
Our work on KV Cache Compression for LLMs, accepted in #NeurIPS2025.
Our constant attempt to design small models continues -- This time, we focus on KV Cache Compression for LLMs. Many SOTA approaches rely on "attention scores" to decide which tokens to evict -- assuming that higher scores indicate greater importance. While effective, this overlooks the role of "value vectors" in shaping the final attention outputs.
To address this, we introduce CurDKV -- a novel method inspired by the classic CUR decomposition from matrix sketching theory. Instead of focusing only on keys, CurDKV selects the most important keys and values together using leverage scores.
Highlights:
- Outperforms leading baselines (including SnapKV and adaptive variants) across a range of compression ratios.
- Achieves up to 40% reduction in generation latency while maintaining strong model quality.
- Grounded in elegant low-rank matrix approximation theory, yet highly practical for modern LLMs.
Preprint: https://t.co/jdqmoyC0ws
@lcs2lab @iitdelhi
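To make the leverage-score idea concrete, here is a minimal NumPy sketch. It is not the CurDKV implementation: the post describes an approximated CUR decomposition, whereas this sketch computes exact row leverage scores from a full SVD and, purely as an assumption, combines the key and value scores by multiplying them. The function names `row_leverage_scores` and `select_tokens`, the `keep` and `rank` parameters, and the random toy matrices are all hypothetical.

```python
import numpy as np


def row_leverage_scores(M, rank):
    """Row leverage scores of M: squared row norms of its top-`rank` left singular vectors."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return np.sum(U[:, :rank] ** 2, axis=1)


def select_tokens(K, V, keep, rank=None):
    """Return sorted indices of the `keep` tokens with the highest combined score.

    K and V are (num_tokens, head_dim) matrices for a single attention head.
    Combining the two scores by a product is an illustrative assumption.
    """
    n = K.shape[0]
    if not 0 < keep <= n:
        raise ValueError("keep must be between 1 and the number of tokens")
    if rank is None:
        rank = min(min(K.shape), min(V.shape))
    scores = row_leverage_scores(K, rank) * row_leverage_scores(V, rank)
    # argsort is ascending, so the last `keep` entries are the highest scores.
    return np.sort(np.argsort(scores)[-keep:])


# Toy cache: 16 tokens, head dimension 4 (random data, for illustration only).
rng = np.random.default_rng(0)
K = rng.standard_normal((16, 4))
V = rng.standard_normal((16, 4))

kept = select_tokens(K, V, keep=8)
K_small, V_small = K[kept], V[kept]  # compressed cache for this head
```

Scoring keys and values jointly means a token is kept only if it matters in both matrices, which is the contrast with attention-score-only eviction drawn in the post.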
Just as I advocate for downscaling LLMs, I also call for downscaling AI conferences.
Growing to 20k+ submissions is not a success metric -- it incentivizes negative science, overwhelms the community, and dilutes quality.
Bigger isn't always better.
There is no pride in saying: "We received 20k+ papers -- 3x more than last year -- look how popular our venue is!" Popularity does not equal progress.
I request the leaders of these venues to think about this and place more restrictions on paper submissions.
@RealAAAI @emnlpmeeting @NeurosamaAI @icmlconf @iclr_conf @CVPR @IJCAIconf @aclmeeting
@rosinality In 2023 we showed that it is indeed possible to design Transformers with a constant Lipschitz bound, given suitable activation functions and parameterization. Lipschitz continuity helps with efficiency as well as interpretability. https://t.co/wWJ0ysnwga
@AradhyeAgarwal This is predominantly due to the >1 activation factor of the non-linear activation functions (ReLU and ELU have an activation factor of 1; most other functions have >1). As we pile up more layers, the activation factor increases linearly. We explored this in https://t.co/wWJ0ysmYqC
Big news!
Our paper "Step-by-Step Unmasking for Parameter-Efficient Fine-tuning of LLMs" has been accepted to TACL, a top-tier ACL-sponsored journal (Impact Factor > 9)!
Paper: https://t.co/iv03ftyu10
Code: https://t.co/SuPQqc2t3S
Thread below