A matter of great pride to meet the honorable @PMOIndia to present @soketlabs and @iitgn dream on futuristic #AI which will be built in India & for the world. Lots of insights by PM to what to focus on, what to avoid and how to make AI impactful. 2026 is India’s AI year. 1/2
Another really interesting paper from my 2025 bookmarked papers: On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (https://t.co/UjhiJW643U).
In short, RL is most effective when applied to data that is neither too close to nor too far from the pre-training distribution.
If the data is too in-distribution, RL adds little beyond supervised training. If it is too far out-of-distribution, RL struggles because the model lacks the necessary priors.
This has been known before, but it's nice to see it formalized with data and figures to reference.
New research from Google: "The Illusion of Deep Learning Architecture".
For those following research on continual learning, you may want to bookmark this one.
Instead of stacking more layers, what if we give neural networks more levels of learning?
The default approach to building more capable AI systems today remains adding depth. More layers, more parameters, more pre-training data. This design philosophy has driven progress from CNNs to Transformers to LLMs.
But there's a ceiling that's often not discussed. Current models suffer from what the authors call "computational anterograde amnesia." Their knowledge is frozen after pre-training. They can't continually learn.
They can't acquire new skills beyond what fits in their immediate context window.
This new research introduces Nested Learning (NL), a paradigm that reframes ML models as interconnected systems of multi-level optimization problems, each with its own "context flow" and update frequency.
Optimizers and architectures are fundamentally the same thing. Both are associative memories that compress their own context. Adam and SGD are memory modules that compress gradients. Transformers are memory modules that compress tokens. Pre-training itself is just in-context learning where the context is the entire training dataset.
Why does this work matter?
NL adds a new design axis beyond depth and width. Instead of deeper networks, you build systems with more levels of nested optimization, each updating at different frequencies. This mirrors how the human brain works, where gamma waves (30-150 Hz) handle sensory information while theta waves (0.5-8 Hz) handle memory consolidation.
Building on this framework, the researchers present Hope, an architecture combining self-modifying memory with a continuum memory system that replaces the traditional "long-term/short-term" memory dichotomy with a spectrum of update frequencies.
The results:
> Hope achieves 100% accuracy on needle-in-a-haystack tasks up to 16K context, where Transformers score 79.8%.
> On BABILong, Hope maintains performance at 10M context length, where GPT-4 fails around 128K.
> In continual learning, Hope outperforms in-context learning, EWC, and external-learner methods on class-incremental classification.
> On language modeling at 1.3B parameters, Hope achieves 14.39 perplexity on WikiText versus 17.92 for Transformer++.
Instead of asking "how do we make networks deeper," NL asks "how do we give networks more levels of learning." The path to continual learning may not be bigger models but models that learn at multiple timescales simultaneously.
Paper: https://t.co/ArKfAZUCLu
Learn to build with AI agents in our academy: https://t.co/zQXQt0PMbG
Very interesting work emerging from the #Eka Project, where we aim to discover the optimal task mixtures in resource-constrained environments, leading to 2-3x reduction in training costs.
Paper link: https://t.co/DJo8e4TAuB
#PritamKadasi , @upperwal@iitgn@soketlabs
Interesting audit of Hugging Face models (on sentiment analysis): Popularity doesn’t equal performance. Many authors exaggerate results, and documentation is often sparse.
https://t.co/mUqE8lLJ71
@gvrkiran I was exactly thinking in this direction, thought of writing an position paper, but don't know how to proceed further, thought to take this up with my advisor, but yeah, here we have this paper. Thanks for sharing.
Seems like Llama 4’s reputation is maybe irreparably tarnished by having a separate unreleased model that was overfit to LMArena.
Actual model is good, but shows again how crucial messaging and details are.
We evaluate several frontier models on PaperBench, finding that the best-performing tested agent, Claude 3.5 Sonnet (New) with open-source scaffolding, achieves an average replication score of 21.0%. Finally, we recruit top ML PhDs to attempt a subset of PaperBench, finding that models do not yet outperform the human baseline.
Thrilled to announce that our paper "Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation" has been accepted at @icwsm ! We examined 500+ models on @HuggingFace to understand what makes AI models popular and how documentation affects adoption. #ICWSM.
Thrilled to announce that our paper "Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation" has been accepted at @icwsm ! We examined 500+ models on @HuggingFace to understand what makes AI models popular and how documentation affects adoption. #ICWSM.
Ratan Tata ’59, B.Arch. ’62, the university’s most generous international donor and one of India's most respected business leaders and philanthropists, passed Oct. 9. We will remember his legacy of transformative giving to Cornell.
https://t.co/0v0zYb6aGl
"I'm in a cheap hotel in California which doesn't have a good internet or phone connection. I was going to have an MRI scan today but I'll have to cancel that!"
- New physics laureate Geoffrey Hinton speaking at today’s press conference where his #NobelPrize was announced.
“I'd also like to acknowledge my students (…) they've gone on to do many great things.
I'm particularly proud of the fact that one of my students fired Sam Altman.“
😳🫡
Marathi is India’s pride.
Congratulations on this phenomenal language being accorded the status of a Classical Language. This honour acknowledges the rich cultural contribution of Marathi in our nation’s history. Marathi has always been a cornerstone of Indian heritage.
I am sure with the status of a Classical Language, many more people will be motivated to learn it.