karpathy is showing one of the simplest AI architectures that actually works..
dump research into a folder, let the model organise it into a wiki, ask questions, then file the answers back in.
the real insight is the loop... every query makes the wiki better. it compounds.. now that's a second brain building itself.
i think this is so good for agents if applied right
instead of pulling from shared memory every session, they build a living knowledge base that stays.
your coordinator is not just coordinating tasks anymore.. it is maintaining institutional knowledge so every execution adds something back to the base.
the bigger implication is crazy tho.
agents that own their own knowledge layer do not need infinite context windows, they need good file organisation and the ability to read their own indexes.
way cheaper, way more scalable, and way more inspectable than stuffing everything into one giant prompt.
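a rough sketch of that loop, to make it concrete. this is not karpathy's actual setup: the folder layout, the word-level index, the `ask` helper and the `stub_model` placeholder (standing in for a real LLM call) are all assumptions made up for illustration.

```python
import tempfile
from pathlib import Path


def build_index(wiki: Path) -> dict[str, list[str]]:
    """Map each lowercase word to the names of the pages that contain it."""
    index: dict[str, list[str]] = {}
    for page in sorted(wiki.glob("*.md")):
        for word in set(page.read_text().lower().split()):
            index.setdefault(word, []).append(page.name)
    return index


def ask(wiki: Path, question: str, answer_fn) -> str:
    """Answer from the relevant pages, then file the answer back as a new page."""
    index = build_index(wiki)
    hits = sorted({name for w in question.lower().split() for name in index.get(w, [])})
    context = "\n".join((wiki / name).read_text().strip() for name in hits)
    answer = answer_fn(question, context)
    # Filing the answer back is the compounding step: later queries can hit it.
    n = len(list(wiki.glob("answer-*.md")))
    (wiki / f"answer-{n:04d}.md").write_text(f"Q: {question}\nA: {answer}\n")
    return answer


def stub_model(question: str, context: str) -> str:
    # Hypothetical placeholder for an LLM call; it only reports how much context it saw.
    return f"saw {len(context.splitlines())} lines of context"


wiki = Path(tempfile.mkdtemp())
(wiki / "kernels.md").write_text("cuda kernels fuse memory ops\n")
first = ask(wiki, "what do kernels fuse", stub_model)
second = ask(wiki, "what do kernels fuse", stub_model)
print(first)
print(second)
```

the second call sees more context than the first because the first answer was written back into the folder. a real version would presumably swap the word index for whatever index files the model maintains itself.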
Learning to write kernels might be the highest-ROI activity for displaced SWEs:
→ prereq: reasonable engineering ability
→ six to twelve months of study
→ millions of dollars, mark zuckerberg showing up at your house to hire you, etc.
i wish this were an exaggeration
@prajdabre My guess is, if you consider the model as a point in the loss landscape, some weight matrix permutation would mean the model lies in a different low-loss path/region despite the same vocab, breaking the linear mode connectivity (LMC) prerequisite for linear model merging to work.
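A toy illustration of the permutation point, under heavy assumptions: a hand-picked two-unit ReLU network rather than an LLM, with helper names (`mlp`, `permute_hidden`, `average`) invented for this sketch. Permuting the hidden units leaves the function unchanged, yet naively averaging the original and permuted weights gives a different function. That is the kind of failure the guess above is pointing at. It says nothing about whether this is what actually happened in the case under discussion.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def mlp(W1, w2, x):
    """Tiny network: y = w2 . relu(W1 x), with no biases."""
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) for row in W1])
    return sum(w * h for w, h in zip(w2, hidden))


def permute_hidden(W1, w2, perm):
    """Reorder hidden units: rows of W1 and entries of w2 move together."""
    return [W1[i] for i in perm], [w2[i] for i in perm]


def average(W1a, w2a, W1b, w2b):
    """Naive linear merge: the midpoint of the two weight sets."""
    W1 = [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(W1a, W1b)]
    w2 = [(a + b) / 2 for a, b in zip(w2a, w2b)]
    return W1, w2


W1, w2 = [[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0]  # f(x) = relu(x0) - relu(x1)
W1p, w2p = permute_hidden(W1, w2, [1, 0])       # same function, different weights
W1m, w2m = average(W1, w2, W1p, w2p)            # midpoint of the two weight sets

x = [1.0, 0.0]
y_orig = mlp(W1, w2, x)
y_perm = mlp(W1p, w2p, x)
y_merged = mlp(W1m, w2m, x)
print(y_orig, y_perm, y_merged)
```

Here the two endpoints agree on every input, but their midpoint has output weights of zero, so the merged network has collapsed. Merging methods that first align the hidden units before averaging exist to avoid exactly this.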
🔬 Parmanu (Hindi for Atom) is live.
Parmanu is part of the Laboratory for Computational Social Systems (LCS2) @lcs2lab at IIT Delhi, led by Prof. Tanmoy Chakraborty @Tanmoy_Chak, and is our dedicated home for Efficient Large Language Models (LLMs) and Small Language Models (SLMs).
We’re at a turning point in AI. The future won’t be defined by scaling alone - it will be shaped by efficiency, accessibility, and real-world deployability. Parmanu is our effort to push this efficiency-first vision forward. ✨🤖
🔗 Explore the project page: https://t.co/siVwaKY1GN
Why Parmanu matters 🔥
• 📚 A centralized hub for our research, with papers accepted at ICLR, ICML, NeurIPS, TACL, ACL, and TMLR
• 🛠️ Open access to tools, code, and artifacts spanning model compression, KV efficiency, PEFT, inference optimization, knowledge distillation, and model coordination
• 🧠 A growing ecosystem focused on making strong language models smarter per parameter, not just larger
What this means for the community
For researchers 👩‍🔬👨‍🔬
A curated, evolving resource tied to top-tier venues
Reproducible artifacts and principled problem formulations
A shared space to advance efficiency-centric LLM research
For practitioners 👩‍💻👨‍💻
Practical techniques to deploy LLMs under tight latency and memory budgets
Faster paths from paper → production
Tools that actually work under real deployment constraints
What’s coming in 2026 🚀🔮
• 📊 Efficient LLM/SLM leaderboards
• 🧪 Open-sourced efficient LLM artifacts
• ⚙️ More tools for compression, distillation, and inference
• 🤝 Deep integration with Hugging Face and other popular libraries
If you’re excited about efficient, sustainable, and scalable AI, check out Parmanu, share feedback, and collaborate with us. The next wave of LLMs won’t just be bigger - they’ll be leaner, faster, and more impactful. 🌟
#EfficientLLMs #SLMs #ModelCompression #InferenceOptimization #KnowledgeDistillation #AIResearch #NLP #ICLR #ICML #NeurIPS #ACL #TACL #TMLR #IITDelhi #LCS2 #Parmanu
this take is pure BS and misses how deep tech innovation actually works
Ilya has a PhD in CS from Toronto under Geoff Hinton where he co-invented AlexNet & literally helped birth the modern DL revolution before founding OpenAI
Adam has degrees in CS and Mathematics & built PyTorch during research internships at FAIR with some of the best systems researchers in the world
the Cursor team are MIT grads who went through CSAIL & OpenAI’s accelerator before building their stack
these aren’t people who just decided to do things and figured it out, they spent years building foundational knowledge in optimization theory & systems architecture & distributed computing before they had the domain expertise to even identify the right problems to solve
the real insight is that credentials don’t matter but deep technical fluency absolutely does & that fluency comes from thousands of hours immersed in the mathematical foundations & implementation details whether that’s in a PhD program or grinding through papers and codebases on your own
what separates great engineers from people who just ship code is understanding the loss landscape well enough to know when you’re stuck in a local minimum VS when you need to completely rethink your architecture
you can’t build a novel neural architecture without understanding information theory & backpropagation from first principles
&
you can’t optimize distributed training without reasoning from the ground up about communication overhead & gradient synchronization
YES the path doesn’t matter but the depth does & there’s no shortcut to internalizing how systems actually compose
i haven't heard it discussed yet but AI basically killed hackathons. pretty much anything you could possibly make at a hackathon in 2019 can be built better and faster by AI in 2025
Patrick's learning method was deceptively simple:
1. Reverse-engineer everything obsessively
2. Question every assumption
3. Talk to insiders who built the system
4. Build rapid prototypes from first principles
No fancy degrees.
Just raw curiosity and relentless execution.
The single most undervalued fact of linear algebra: matrices are graphs, and graphs are matrices.
Encoding matrices as graphs is a cheat code, making complex behavior simple to study.
Let me show you how!
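A minimal sketch of the correspondence, assuming one particular convention (a nonzero entry `A[i][j]` is read as a weighted edge from node `i` to node `j`; the opposite direction is equally common). The helper names are made up for this example.

```python
def matrix_to_edges(A):
    """Read every nonzero entry A[i][j] as a weighted edge i -> j."""
    return [(i, j, w) for i, row in enumerate(A) for j, w in enumerate(row) if w != 0]


def edges_to_matrix(edges, n):
    """Inverse direction: rebuild the n x n matrix from the edge list."""
    A = [[0] * n for _ in range(n)]
    for i, j, w in edges:
        A[i][j] = w
    return A


def matmul(A, B):
    """Plain matrix product on nested lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# A directed 3-cycle: 0 -> 1 -> 2 -> 0
A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

edges = matrix_to_edges(A)
A2 = matmul(A, A)   # entry (i, j) counts walks of length 2 from i to j
A3 = matmul(A2, A)  # walks of length 3: every node returns to itself
print(edges)
print(A3)
```

For a 0/1 matrix, entry (i, j) of the k-th power counts the walks of length k from i to j, so a statement about matrix powers becomes a statement about paths in the graph.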
Socialists are really a strange species, they are so detached from reality that they repeat the same mistakes over and over again. Will they ever get it?
Some people today are discouraging others from learning programming on the grounds AI will automate it. This advice will be seen as some of the worst career advice ever given. I disagree with the Turing Award and Nobel prize winner who wrote, “It is far more likely that the programming occupation will become extinct [...] than that it will become all-powerful. More and more, computers will program themselves.” Statements discouraging people from learning to code are harmful!
In the 1960s, when programming moved from punchcards (where a programmer had to laboriously make holes in physical cards to write code character by character) to keyboards with terminals, programming became easier. And that made it a better time than before to begin programming. Yet it was in this era that Nobel laureate Herb Simon wrote the words quoted in the first paragraph. Today’s arguments not to learn to code continue to echo his comment.
As coding becomes easier, more people should code, not fewer!
Over the past few decades, as programming has moved from assembly language to higher-level languages like C, from desktop to cloud, from raw text editors to IDEs to AI-assisted coding where sometimes one barely even looks at the generated code (which some coders recently started to call vibe coding), it is getting easier with each step.
I wrote previously that I see tech-savvy people coordinating AI tools to move toward being 10x professionals — individuals who have 10 times the impact of the average person in their field. I am increasingly convinced that the best way for many people to accomplish this is not to be just consumers of AI applications, but to learn enough coding to use AI-assisted coding tools effectively.
One question I’m asked most often is what someone should do who is worried about job displacement by AI. My answer is: Learn about AI and take control of it, because one of the most important skills in the future will be the ability to tell a computer exactly what you want, so it can do that for you. Coding (or getting AI to code for you) is a great way to do that.
When I was working on the course Generative AI for Everyone and needed to generate AI artwork for the background images, I worked with a collaborator who had studied art history and knew the language of art. He prompted Midjourney with terminology based on the historical style, palette, artist inspiration and so on — using the language of art — to get the result he wanted. I didn’t know this language, and my paltry attempts at prompting could not deliver as effective a result.
Similarly, scientists, analysts, marketers, recruiters, and people of a wide range of professions who understand the language of software through their knowledge of coding can tell an LLM or an AI-enabled IDE what they want much more precisely, and get much better results. As these tools are continuing to make coding easier, this is the best time yet to learn to code, to learn the language of software, and learn to make computers do exactly what you want them to do.
[Original text: https://t.co/HdI3Jb9HmF ]
I’ve tried various “SOTA” AI tools with several old @Kaggle competitions. Just gave them the datasets and the description of the competition. All of them gave me code that had to be fixed manually. The models they came up with were functional but EXTREMELY pedestrian and unimpressive. Far below what AutoML tools were able to come up with a decade ago. I think the gap between high-end Data Science ML modeling and what AI can come up with is vastly larger than the corresponding gap for SWE tasks.
Maybe vilifying China & Chinese people, and increasing visa hassles & FBI investigations aren’t smart moves for keeping the US lead in AI—given the number of young engineers trained in China vs the US?
“Many Chinese students are not that interested in full-time jobs in the US.”
Okay. Thanks for the nerd snipe guys. I spent the day learning exactly how DeepSeek trained at 1/30 the price, instead of working on my pitch deck. The tl;dr to everything, according to their papers:
There is an unprecedented level of cope around DeepSeek, and very little signal on X around R1. I recommend unfollowing anyone spreading conspiracy theories around R1/DeepSeek in general. (1/9)