Pritam Kadasi @prit_sk - Twitter Profile

prit_sk retweeted

Math Files

@Math_files

22 days ago

prit_sk retweeted

Mayank @mayank_iitgn

6 months ago

A matter of great pride to meet the honorable @PMOIndia to present the @soketlabs and @iitgn dream of futuristic #AI, which will be built in India & for the world. Lots of insights from the PM on what to focus on, what to avoid, and how to make AI impactful. 2026 is India’s AI year. 1/2

prit_sk retweeted

Sebastian Raschka

@rasbt

6 months ago

Another really interesting paper from my 2025 bookmarked papers: On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models (https://t.co/UjhiJW643U).

In short, RL is most effective when applied to data that is neither too close to nor too far from the pre-training distribution.

If the data is too in-distribution, RL adds little beyond supervised training. If it is too far out-of-distribution, RL struggles because the model lacks the necessary priors.

This has been known before, but it's nice to see it formalized with data and figures to reference.

prit_sk retweeted

DAIR.AI

@dair_ai

7 months ago

New research from Google: "The Illusion of Deep Learning Architecture".

For those following research on continual learning, you may want to bookmark this one.

Instead of stacking more layers, what if we give neural networks more levels of learning?

The default approach to building more capable AI systems today remains adding depth. More layers, more parameters, more pre-training data. This design philosophy has driven progress from CNNs to Transformers to LLMs.

But there's a ceiling that's often not discussed. Current models suffer from what the authors call "computational anterograde amnesia." Their knowledge is frozen after pre-training. They can't continually learn.

They can't acquire new skills beyond what fits in their immediate context window.

This new research introduces Nested Learning (NL), a paradigm that reframes ML models as interconnected systems of multi-level optimization problems, each with its own "context flow" and update frequency.

Optimizers and architectures are fundamentally the same thing. Both are associative memories that compress their own context. Adam and SGD are memory modules that compress gradients. Transformers are memory modules that compress tokens. Pre-training itself is just in-context learning where the context is the entire training dataset.

Why does this work matter?

NL adds a new design axis beyond depth and width. Instead of deeper networks, you build systems with more levels of nested optimization, each updating at different frequencies. This mirrors how the human brain works, where gamma waves (30-150 Hz) handle sensory information while theta waves (0.5-8 Hz) handle memory consolidation.

Building on this framework, the researchers present Hope, an architecture combining self-modifying memory with a continuum memory system that replaces the traditional "long-term/short-term" memory dichotomy with a spectrum of update frequencies.

The results:

> Hope achieves 100% accuracy on needle-in-a-haystack tasks up to 16K context, where Transformers score 79.8%.
> On BABILong, Hope maintains performance at 10M context length, where GPT-4 fails around 128K.
> In continual learning, Hope outperforms in-context learning, EWC, and external-learner methods on class-incremental classification.
> On language modeling at 1.3B parameters, Hope achieves 14.39 perplexity on WikiText versus 17.92 for Transformer++.

Instead of asking "how do we make networks deeper," NL asks "how do we give networks more levels of learning." The path to continual learning may not be bigger models but models that learn at multiple timescales simultaneously.

Paper: https://t.co/ArKfAZUCLu
Learn to build with AI agents in our academy: https://t.co/zQXQt0PMbG

prit_sk retweeted

Mayank @mayank_iitgn

7 months ago

Very interesting work emerging from the #Eka Project, where we aim to discover optimal task mixtures in resource-constrained environments, leading to a 2-3x reduction in training costs. Paper link: https://t.co/DJo8e4TAuB #PritamKadasi, @upperwal @iitgn @soketlabs

prit_sk retweeted

Kiran Garimella @gvrkiran

about 1 year ago

Interesting audit of Hugging Face models (on sentiment analysis): Popularity doesn’t equal performance. Many authors exaggerate results, and documentation is often sparse. https://t.co/mUqE8lLJ71

prit_sk retweeted

hardmaru

@hardmaru

about 1 year ago

We should host more top ML conferences (ICLR, ICML, NeurIPS) in Asia

Pritam Kadasi @prit_sk

about 1 year ago

@gvrkiran I was thinking in exactly this direction and thought of writing a position paper, but didn't know how to proceed further. I was planning to take this up with my advisor, but here we have this paper. Thanks for sharing.

prit_sk retweeted

Kelsey Piper

@KelseyTuoc

about 1 year ago

prit_sk retweeted

Nathan Lambert

@natolambert

about 1 year ago

Seems like Llama 4’s reputation is maybe irreparably tarnished by having a separate unreleased model that was overfit to LMArena. Actual model is good, but shows again how crucial messaging and details are.

prit_sk retweeted

OpenAI

@OpenAI

over 1 year ago

We evaluate several frontier models on PaperBench, finding that the best-performing tested agent, Claude 3.5 Sonnet (New) with open-source scaffolding, achieves an average replication score of 21.0%. Finally, we recruit top ML PhDs to attempt a subset of PaperBench, finding that models do not yet outperform the human baseline.

Pritam Kadasi @prit_sk

over 1 year ago

Paper Link: https://t.co/5NNJltmK7u

Lingo IITGN @lingoiitgn

over 1 year ago

Thrilled to announce that our paper "Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation" has been accepted at @icwsm! We examined 500+ models on @HuggingFace to understand what makes AI models popular and how documentation affects adoption. #ICWSM.

prit_sk retweeted

Lingo IITGN @lingoiitgn

over 1 year ago

Thrilled to announce that our paper "Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation" has been accepted at @icwsm! We examined 500+ models on @HuggingFace to understand what makes AI models popular and how documentation affects adoption. #ICWSM.

prit_sk retweeted

Cornell University

@Cornell

over 1 year ago

Ratan Tata ’59, B.Arch. ’62, the university’s most generous international donor and one of India's most respected business leaders and philanthropists, passed Oct. 9. We will remember his legacy of transformative giving to Cornell. https://t.co/0v0zYb6aGl

prit_sk retweeted

Peyman Milanfar

@docmilanfar

over 1 year ago

advisor helping new PhD student write their 1st paper

prit_sk retweeted

2000s

@PopCulture2000s

over 1 year ago

23 years ago, Linkin Park released ‘In the End’ https://t.co/7p5A00g03g

prit_sk retweeted

The Nobel Prize

@NobelPrize

over 1 year ago

"I'm in a cheap hotel in California which doesn't have a good internet or phone connection. I was going to have an MRI scan today but I'll have to cancel that!" - New physics laureate Geoffrey Hinton speaking at today’s press conference where his #NobelPrize was announced.

prit_sk retweeted

Jonathan Mannhart 🔎🔸 @JMannhart

over 1 year ago

“I'd also like to acknowledge my students (…) they've gone on to do many great things. I'm particularly proud of the fact that one of my students fired Sam Altman.“ 😳🫡

prit_sk retweeted

MatLab crashes

@memecrashes

over 1 year ago

#NobelPrize2024

prit_sk retweeted

Narendra Modi

@narendramodi

almost 2 years ago

Marathi is India’s pride. Congratulations on this phenomenal language being accorded the status of a Classical Language. This honour acknowledges the rich cultural contribution of Marathi in our nation’s history. Marathi has always been a cornerstone of Indian heritage. I am sure with the status of a Classical Language, many more people will be motivated to learn it.