Embeddings power every modern LLM. But what do they actually learn?
This Berkeley (BAIR) paper is one of the clearest reads on how AI systems learn and why embeddings really work.
https://t.co/qj10TMZjnp
// Survey on Multi-Agent Systems //
The paper traces the landscape from classical paradigms (consensus, distributed control, swarm intelligence, cooperative learning) to foundation-model-enabled MAS (LLM-based planning, role specialization, task decomposition, multi-modal coordination).
It highlights the hard open problems that neither camp has solved on its own: scalability in heterogeneous systems, alignment across agent collectives, efficient knowledge transfer, and real-time adaptation.
Paper: https://t.co/FhDcRjmBUO
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7deE
LLM Knowledge Bases
Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:
Data ingest:
I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.
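A minimal sketch of the bookkeeping half of this ingest step, assuming a local `raw/` directory. The function name `build_index` and the grouping by file extension are illustrative choices, not the author's actual setup, and the LLM "compile" pass itself is deliberately left out:

```python
from pathlib import Path


def build_index(raw_dir: Path) -> str:
    """Return a markdown index of every file under raw_dir, grouped by extension.

    Hypothetical helper: an LLM could read this index to decide which
    sources still need a summary or a wiki article.
    """
    groups: dict[str, list[Path]] = {}
    for path in sorted(raw_dir.rglob("*")):
        if path.is_file():
            key = path.suffix.lower() or "(none)"
            groups.setdefault(key, []).append(path)

    lines = ["# Raw sources", ""]
    for ext in sorted(groups):
        lines.append(f"## {ext}")
        for path in groups[ext]:
            rel = path.relative_to(raw_dir).as_posix()
            lines.append(f"- [{rel}]({rel})")
        lines.append("")
    return "\n".join(lines)
```

Writing the returned string to something like `wiki/index.md` (a made-up path) gives the LLM one cheap file to read before it opens individual sources.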
IDE:
I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).
Q&A:
Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.
Output:
Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.
Linting:
I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.
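Some of these health checks need no LLM at all. As a rough sketch, assuming the wiki uses Obsidian-style `[[Page]]` links (the post does not say which link syntax it uses), a hypothetical `find_broken_links` helper could flag links whose target file does not exist:

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_broken_links(wiki_dir: Path) -> dict[str, list[str]]:
    """Map each page (relative path) to the link targets that match no file."""
    files = [p for p in wiki_dir.rglob("*") if p.is_file()]
    # Pages are linked by stem ("b" for b.md); embeds by full name ("plot.png").
    known = {p.stem for p in files if p.suffix == ".md"} | {p.name for p in files}

    broken: dict[str, list[str]] = {}
    for page in sorted(p for p in files if p.suffix == ".md"):
        text = page.read_text(encoding="utf-8")
        targets = {t.strip() for t in WIKILINK.findall(text)}
        missing = sorted(targets - known)
        if missing:
            broken[page.relative_to(wiki_dir).as_posix()] = missing
    return broken
```

The resulting report can be handed to the LLM as a to-do list, e.g. "write the missing article or fix the link".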
Extra tools:
I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.
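For a sense of how small such a tool can be, here is a sketch of a naive keyword search over a directory of `.md` files. It is not the author's implementation: the term-count scoring, the function names, and the CLI flags are all assumptions made for illustration.

```python
import argparse
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(wiki_dir: Path, query: str, top_k: int = 5) -> list[tuple[int, str]]:
    """Rank pages by how often the query terms occur in them."""
    terms = set(tokenize(query))
    results = []
    for page in wiki_dir.rglob("*.md"):
        counts = Counter(tokenize(page.read_text(encoding="utf-8")))
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((score, page.relative_to(wiki_dir).as_posix()))
    # Highest score first; ties broken alphabetically by path.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:top_k]


def main(argv: list[str]) -> None:
    """CLI entry point, so an LLM agent can call the search as a tool."""
    parser = argparse.ArgumentParser(description="Naive keyword search over a markdown wiki")
    parser.add_argument("wiki_dir", type=Path)
    parser.add_argument("query")
    parser.add_argument("--top-k", type=int, default=5)
    args = parser.parse_args(argv)
    for score, path in search(args.wiki_dir, args.query, args.top_k):
        print(f"{score}\t{path}")
```

Calling `main(sys.argv[1:])` from a script turns this into a command-line tool; the tab-separated output is easy for both a person and an LLM to read.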
Further explorations:
As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.
TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
Don't think of LLMs as entities but as simulators. For example, when exploring a topic, don't ask:
"What do you think about xyz?"
There is no "you". Next time try:
"What would be a good group of people to explore xyz? What would they say?"
The LLM can channel/simulate many perspectives but it hasn't "thought about" xyz for a while and over time and formed its own opinions in the way we're used to. If you force it via the use of "you", it will give you something by adopting a personality embedding vector implied by the statistics of its finetuning data and then simulate that. It's fine to do, but there is a lot less mystique to it than I find people naively attribute to "asking an AI".
The hot topic at #ICCV2025 was World Models.
They come in different flavors: (interactive) video models, neural simulators, reconstruction models, etc. But the overarching goal is clear: generative AI that predicts and simulates how the real world works.
Evolution of Deep Learning by Hand ✍️ As my tribute to Geoff Hinton's Nobel Prize, I drew this animation to illustrate the key idea behind Hinton's major contributions to deep learning over the years, with artistic liberty.
----
100% original, made by hand ✍️
Join 40k readers of my newsletter: https://t.co/fFt8roc8D9
< Choosing a Vision Backbone >
your model’s backbone is its perspective
pick ResNet, and it sees in edges
pick a ViT, and it sees in patches
the backbone decides how your model thinks
here are some of the most practical backbones and when you should choose them, from the paper "Battle of the Backbones" (2023):
> ResNet - good for fast prototyping, small models, and edge devices
> ConvNeXt - great all-purpose backbone; strong for detection & segmentation
> Swin Transformer (V2) - best for large-scale detection, segmentation, and high-res inputs
> ViT (Vision Transformer) - good when you have huge datasets; less bias, more global context
> CLIP - best for vision-language, zero-shot, and retrieval tasks
> DINO / MoCo / MAE (SSL) - great when you have little or no labeled data
> MiDaS - surprisingly strong if you care about depth, geometry, or robotics perception
> Stable Diffusion Encoder - useful for creative or aesthetic tasks; not for accuracy-critical CV
> EfficientNet / RegNet / ResNet-18 - good lightweight options for edge or mobile deployment
In 2016 Geoffrey Hinton said “we should stop training radiologists now” since AI would soon be better at their jobs.
He was right: models have outperformed radiologists on benchmarks for ~a decade.
Yet radiology jobs are at record highs, with an average salary of $520k.
Why?
Can AI help understand how the brain learns to see the world?
Our latest study, led by @JRaugel from FAIR at @AIatMeta and @ENS_ULM, is now out!
📄 https://t.co/y2Y3GP3bI5
🧵 A thread:
“AI Engineering is just software engineering with AI models thrown into the stack”
- @abacaj in the book AI Engineering by @chipro
Totally agree - and it’s why learning how to integrate LLMs is such a big win for devs!
Training of a 2 → 100 → 2 → 5 fully connected ReLU neural net via cross-entropy minimisation.
• it starts outputting small embeddings
• around epoch 300 learns an identity function
• takes 1700 more epochs to unwind the data manifold
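To make the architecture concrete, here is a dependency-free sketch of the forward pass and loss for a 2 → 100 → 2 → 5 network. The post does not say where the ReLUs sit or how the weights are initialised, so two things here are assumptions: the ReLU is applied only after the 100-unit layer (leaving the 2-D embedding linear), and the weights are small Gaussians. Training (backpropagation) is omitted.

```python
import math
import random


def init_layer(n_in, n_out, rng, scale=0.1):
    """Small Gaussian weights and zero biases (an assumed initialisation)."""
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def linear(x, W, b):
    """Affine map W @ x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(x, params):
    """Return the 2-D embedding and the 5 class logits for one input point."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h = relu(linear(x, W1, b1))    # 2 -> 100, ReLU
    z = linear(h, W2, b2)          # 100 -> 2, the embedding being visualised
    logits = linear(z, W3, b3)     # 2 -> 5, linear classifier on the embedding
    return z, logits


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, via a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[label]


rng = random.Random(0)
params = [init_layer(2, 100, rng), init_layer(100, 2, rng), init_layer(2, 5, rng)]
z, logits = forward([0.5, -0.3], params)
loss = cross_entropy(logits, label=3)
```

With an initialisation this small, the embedding `z` starts close to the origin and the loss sits near log 5 (the uniform guess over 5 classes), which matches the "small embeddings" stage in the first bullet.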