Arinjay Pathak - e/acc

@arinjay_pathak

PhDing #NLProc @YardiScAI @iitdelhi | Research Scientist @lcs2lab | Prev- CS bachelor @TIETofficial

India

Joined May 2021

396 Following

81 Followers

198 Posts

arinjay_pathak retweeted

JUMPERZ

@jumperz

2 months ago

karpathy is showing one of the simplest AI architectures that actually works..

dump research into a folder, let the model organise it into a wiki, ask questions, then file the answers back in.

the real insight is the loop...every query makes the wiki better. it compounds.. now thats a second brain building itself.

i think this is so good for agents if applied right

instead of pulling from shared memory every session, they build a living knowledge base that stays.

your coordinator is not just coordinating tasks anymore.. it is maintaining institutional knowledge so every execution adds something back to the base.

the bigger implication is crazy tho.

agents that own their own knowledge layer do not need infinite context windows, they need good file organisation and the ability to read their own indexes.

way cheaper, way more scalable, and way more inspectable than stuffing everything into one giant prompt.
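A minimal sketch of the loop described in this post, in Python. Everything in it is an assumption made for illustration: the `WikiMemory` class, the `INDEX.md` layout, the keyword lookup, and `stub_model`, which stands in for a real LLM call. It is not Karpathy's or the poster's implementation; it only shows the "read the index, answer, file the answer back" cycle on plain files.

```python
import re
import tempfile
from pathlib import Path


def stub_model(question: str, context: str) -> str:
    """Hypothetical stand-in for an LLM call; reports how much context it saw."""
    return f"Answer to {question!r} using {len(context.split())} words of notes."


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens, used for a naive keyword match."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class WikiMemory:
    """A folder of markdown pages plus a plain-text index the agent can read."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.index = self.root / "INDEX.md"
        self.index.touch()

    def add_page(self, title: str, body: str) -> Path:
        """Write (or overwrite) a page and make sure the index lists it once."""
        slug = "-".join(re.findall(r"[a-z0-9]+", title.lower())) or "untitled"
        page = self.root / f"{slug}.md"
        page.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
        entry = f"{slug}.md | {title}"
        entries = self.index.read_text(encoding="utf-8").splitlines()
        if entry not in entries:
            entries.append(entry)
            self.index.write_text("\n".join(entries) + "\n", encoding="utf-8")
        return page

    def lookup(self, question: str) -> list:
        """Return pages whose indexed title shares a token with the question."""
        wanted = tokens(question)
        hits = []
        for line in self.index.read_text(encoding="utf-8").splitlines():
            filename, title = line.split(" | ", 1)
            if wanted & tokens(title):
                hits.append(self.root / filename)
        return hits

    def ask(self, question: str) -> str:
        """Answer from matching pages, then file the answer back as a new page."""
        context = "\n".join(
            p.read_text(encoding="utf-8") for p in self.lookup(question)
        )
        answer = stub_model(question, context)
        self.add_page(f"Q: {question}", answer)
        return answer


if __name__ == "__main__":
    wiki = WikiMemory(Path(tempfile.mkdtemp()))
    wiki.add_page("Attention notes", "Attention weighs tokens by relevance.")
    print(wiki.ask("How does attention work?"))
    print(wiki.index.read_text(encoding="utf-8"))
```

Because each answer is written back as a page, asking the same question again gives the stub more context than the first time, which is the compounding effect the post points at. A real system would replace the keyword match with whatever retrieval the agent actually uses.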

139

722

15K

934K

arinjay_pathak retweeted

Jack Morris

@jxmnop

3 months ago

Learning to write kernels might be the highest-ROI activity for displaced SWEs:
→ prereq: reasonable engineering ability
→ six to twelve months of study
→ millions of dollars, mark zuckerberg showing up at your house to hire you, etc.
i wish this were an exaggeration

124K

Arinjay Pathak - e/acc @arinjay_pathak

3 months ago

@prajdabre My guess is that if you consider the model as a point in the loss landscape, some weight-matrix permutation would mean the model lies in a different low-loss path/region despite having the same vocab, breaking the linear mode connectivity (LMC) prerequisite for linear model merging to work.
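A toy illustration of the permutation point, in Python. This is a sketch under stated assumptions, not the setup discussed in the thread: the two-layer ReLU network, its weights, and the helper names (`mlp`, `permute_hidden`, `average`) are all made up for the example. It shows that reordering hidden units leaves the function unchanged, that naively averaging the original and the reordered weights changes the output, and that undoing the permutation before averaging restores it.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def mlp(W1, W2, x):
    """Two-layer network: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))


def permute_hidden(W1, W2, perm):
    """Reorder hidden units: rows of W1 and the matching columns of W2."""
    W1p = [W1[i] for i in perm]
    W2p = [[row[i] for i in perm] for row in W2]
    return W1p, W2p


def average(A, B):
    """Element-wise mean of two weight matrices (naive linear merge)."""
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse(perm):
    inv = [0] * len(perm)
    for position, source in enumerate(perm):
        inv[source] = position
    return inv


# Hypothetical weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [[1.0, -1.0, 0.5]]
x = [2.0, 1.0]
perm = [2, 0, 1]

W1p, W2p = permute_hidden(W1, W2, perm)
original = mlp(W1, W2, x)    # same function ...
permuted = mlp(W1p, W2p, x)  # ... under a different weight layout

# Naive merge: average weights position by position.
naive = mlp(average(W1, W1p), average(W2, W2p), x)

# Aligned merge: undo the permutation first, then average.
W1a, W2a = permute_hidden(W1p, W2p, inverse(perm))
aligned = mlp(average(W1, W1a), average(W2, W2a), x)

if __name__ == "__main__":
    print(original, permuted, naive, aligned)
```

In this toy case the two weight sets compute the same function but their midpoint does not, which is one way a straight linear merge can fail even when both endpoints are equally good.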

678

arinjay_pathak retweeted

ayan sengupta

@ayans007

6 months ago

🔬 Parmanu (Hindi for Atom) is live.

Parmanu is part of the Computational Social Systems (LCS2) @lcs2lab at IIT Delhi, led by Prof. Tanmoy Chakraborty @Tanmoy_Chak, and is our dedicated home for Efficient Large Language Models (LLMs) and Small Language Models (SLMs).

We’re at a turning point in AI. The future won’t be defined by scaling alone - it will be shaped by efficiency, accessibility, and real-world deployability. Parmanu is our effort to push this efficiency-first vision forward. ✨🤖

🔗 Explore the project page: https://t.co/siVwaKY1GN

Why Parmanu matters 🔥
• 📚 A centralized hub for our research, with papers accepted at ICLR, ICML, NeurIPS, TACL, ACL, and TMLR
• 🛠️ Open access to tools, code, and artifacts spanning model compression, KV efficiency, PEFT, inference optimization, knowledge distillation, and model coordination
• 🧠 A growing ecosystem focused on making strong language models smarter per parameter, not just larger

What this means for the community

For researchers 👩‍🔬👨‍🔬
• A curated, evolving resource tied to top-tier venues
• Reproducible artifacts and principled problem formulations
• A shared space to advance efficiency-centric LLM research

For practitioners 👩‍💻👨‍💻
• Practical techniques to deploy LLMs under tight latency and memory budgets
• Faster paths from paper → production
• Tools that actually work under real deployment constraints

What’s coming in 2026 🚀🔮
• 📊 Efficient LLM/SLM leaderboards
• 🧪 Open-sourced efficient LLM artifacts
• ⚙️ More tools for compression, distillation, and inference
• 🤝 Deep integration with Hugging Face and other popular libraries

If you’re excited about efficient, sustainable, and scalable AI, check out Parmanu, share feedback, and collaborate with us. The next wave of LLMs won’t just be bigger - they’ll be leaner, faster, and more impactful. 🌟

#EfficientLLMs #SLMs #ModelCompression #InferenceOptimization #KnowledgeDistillation #AIResearch #NLP #ICLR #ICML #NeurIPS #ACL #TACL #TMLR #IITDelhi #LCS2 #Parmanu

298

arinjay_pathak retweeted

Mehdi (e/λ)

@BetterCallMedhi

7 months ago

this take is pure BS and misses how deep tech innovation actually works

Ilya has a PhD in CS from Toronto under Geoff Hinton where he co-invented AlexNet & literally helped birth the modern DL revolution before founding OpenAI

Adam has degrees in CS and Mathematics & built PyTorch during research internships at FAIR with some of the best systems researchers in the world

the Cursor team are MIT grads who went through CSAIL & OpenAI’s accelerator before building their stack

these aren’t people who just decided to do things and figured it out, they spent years building foundational knowledge in optimization theory & systems architecture & distributed computing before they had the domain expertise to even identify the right problems to solve

the real insight is that credentials don’t matter but deep technical fluency absolutely does & that fluency comes from thousands of hours immersed in the mathematical foundations & implementation details whether that’s in a PhD program or grinding through papers and codebases on your own

what separates great engineers from people who just ship code is understanding the loss landscape well enough to know when you’re stuck in a local minimum VS when you need to completely rethink your architecture

you can’t build a novel neural architecture without understanding information theory & backpropagation from first principles & you can’t optimize distributed training without reasoning from the ground up about communication overhead & gradient synchronization

YES the path doesn’t matter but the depth does & there’s no shortcut to internalizing how systems actually compose

100

490

478K

arinjay_pathak retweeted

Jack Morris

@jxmnop

10 months ago

i haven't heard it discussed yet but AI basically killed hackathons. pretty much anything you could possibly make at a hackathon in 2019 can be built better and faster by AI in 2025

217

247

207K

arinjay_pathak retweeted

Fernando Cao

@thefernandocz

12 months ago

Patrick's learning method was deceptively simple:

1. Reverse-engineer everything obsessively
2. Question every assumption
3. Talk to insiders who built the system
4. Build rapid prototypes from first principles

No fancy degrees.

Just raw curiosity and relentless execution. https://t.co/8HYNPNOipG

175

160K

arinjay_pathak retweeted

Tivadar Danka

@TivadarDanka

12 months ago

The single most undervalued fact of linear algebra: matrices are graphs, and graphs are matrices.

Encoding matrices as graphs is a cheat code, making complex behavior simple to study.

Let me show you how! https://t.co/9hanRcDVFl
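A small Python sketch of the matrix-graph correspondence this post refers to. The convention used here is an assumption (the linked thread may use a different one): a nonzero entry `A[i][j]` is read as a directed edge from node `i` to node `j` with that weight. The helper names are made up for the example. With a 0/1 matrix, entry `(i, j)` of the k-th power counts the walks of length k from `i` to `j`, so a graph with no cycles has a matrix whose powers eventually become zero.

```python
def matrix_to_edges(A):
    """List (source, target, weight) for every nonzero entry of A."""
    return [
        (i, j, A[i][j])
        for i in range(len(A))
        for j in range(len(A[i]))
        if A[i][j] != 0
    ]


def matmul(A, B):
    """Plain square-matrix product."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def matrix_power(A, k):
    """A multiplied by itself k times; k = 0 gives the identity."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result


# Hypothetical example: edges 0 -> 1, 0 -> 2, 1 -> 2 (no cycles).
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

edges = matrix_to_edges(A)
walks_len_2 = matrix_power(A, 2)  # only 0 -> 1 -> 2 has length 2
walks_len_3 = matrix_power(A, 3)  # no walk of length 3 exists

if __name__ == "__main__":
    print(edges)
    print(walks_len_2)
    print(walks_len_3)
```

Reading the matrix as a graph turns an algebraic property (some power of the matrix is zero) into a visual one (the graph has no cycles), which is the kind of simplification the post describes.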

114

11K

arinjay_pathak retweeted

Michael A. Arouet

@MichaelAArouet

about 1 year ago

Socialists are really a strange species, they are so detached from reality that they repeat the same mistakes over and over again. Will they ever get it? https://t.co/vggwFh44Hn

114

380

215

96K

arinjay_pathak retweeted

Xin Eric Wang

@xwang_lk

about 1 year ago

It happened again. Reviewers asked about models released after the paper submission deadline, as a major weakness.

103

13K

arinjay_pathak retweeted

Andrew Ng

@AndrewYNg

about 1 year ago

Some people today are discouraging others from learning programming on the grounds AI will automate it. This advice will be seen as some of the worst career advice ever given. I disagree with the Turing Award and Nobel prize winner who wrote, “It is far more likely that the programming occupation will become extinct [...] than that it will become all-powerful. More and more, computers will program themselves.” Statements discouraging people from learning to code are harmful!

In the 1960s, when programming moved from punchcards (where a programmer had to laboriously make holes in physical cards to write code character by character) to keyboards with terminals, programming became easier. And that made it a better time than before to begin programming. Yet it was in this era that Nobel laureate Herb Simon wrote the words quoted in the first paragraph. Today’s arguments not to learn to code continue to echo his comment. As coding becomes easier, more people should code, not fewer!

Over the past few decades, as programming has moved from assembly language to higher-level languages like C, from desktop to cloud, from raw text editors to IDEs to AI assisted coding where sometimes one barely even looks at the generated code (which some coders recently started to call vibe coding), it is getting easier with each step.

I wrote previously that I see tech-savvy people coordinating AI tools to move toward being 10x professionals — individuals who have 10 times the impact of the average person in their field. I am increasingly convinced that the best way for many people to accomplish this is not to be just consumers of AI applications, but to learn enough coding to use AI-assisted coding tools effectively.

One question I’m asked most often is what someone should do who is worried about job displacement by AI. My answer is: Learn about AI and take control of it, because one of the most important skills in the future will be the ability to tell a computer exactly what you want, so it can do that for you. Coding (or getting AI to code for you) is a great way to do that.

When I was working on the course Generative AI for Everyone and needed to generate AI artwork for the background images, I worked with a collaborator who had studied art history and knew the language of art. He prompted Midjourney with terminology based on the historical style, palette, artist inspiration and so on — using the language of art — to get the result he wanted. I didn’t know this language, and my paltry attempts at prompting could not deliver as effective a result.

Similarly, scientists, analysts, marketers, recruiters, and people of a wide range of professions who understand the language of software through their knowledge of coding can tell an LLM or an AI-enabled IDE what they want much more precisely, and get much better results. As these tools are continuing to make coding easier, this is the best time yet to learn to code, to learn the language of software, and learn to make computers do exactly what you want them to do.

[Original text: https://t.co/HdI3Jb9HmF ]

514

12K

arinjay_pathak retweeted

Bojan Tunguz

@tunguz

about 1 year ago

I’ve tried various “SOTA” AI tools with several old @Kaggle competitions. Just gave them the datasets and the description of the competition. All of them gave me the code that had to be fixed manually. The models they came up with were functional but EXTREMELY pedestrian and unimpressive. Far below what the auto ML tools were able to come up with a decade ago. I think that the gap between the high-end Data Science ML modeling and what AI can come up with is vastly larger than between what it can produce for the SWE tasks.

336

37K

arinjay_pathak retweeted

Christopher Manning

@chrmanning

over 1 year ago

Maybe vilifying China & Chinese people, and increasing visa hassles & FBI investigations aren’t smart moves for keeping the US lead in AI—given the number of young engineers trained in China vs the US? “Many Chinese students are not that interested in full-time jobs in the US.”

439

67K

arinjay_pathak retweeted

Teknium 🪽

@Teknium

over 1 year ago

This is the entire code needed to reproduce R1 lol Hundreds of Billions of Dollars Later

398

18K

13K

arinjay_pathak retweeted

wordgrammer

@wordgrammer

over 1 year ago

Okay. Thanks for the nerd snipe guys. I spent the day learning exactly how DeepSeek trained at 1/30 the price, instead of working on my pitch deck. The tl;dr to everything, according to their papers:

350

23K

17K

arinjay_pathak retweeted

Armen Aghajanyan

@ArmenAgha

over 1 year ago

There is an unprecedented level of cope around DeepSeek, and very little signal on X around R1. I recommend unfollowing anyone spreading conspiracy theories around R1/DeepSeek in general. (1/9)

121

473