LLM Knowledge Bases
Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:
Data ingest:
I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.
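A minimal sketch of the "incremental" part of that compile step: working out which files in raw/ still lack a summary, so only those go to the LLM. The layout is an assumption for illustration (one summary per source, named after the source file's stem, in a hypothetical summaries folder), not a description of the actual setup.

```python
from pathlib import Path


def pending_sources(raw_dir: Path, summary_dir: Path) -> list[Path]:
    """Return raw files that have no matching summary .md yet.

    Assumes (hypothetically) that raw/foo.pdf is summarized in
    <summary_dir>/foo.md, i.e. summaries are keyed by file stem.
    """
    done = {p.stem for p in summary_dir.glob("*.md")}
    return sorted(
        p for p in raw_dir.iterdir()
        if p.is_file() and p.stem not in done
    )
```

Each path it returns would then be sent to the LLM for summarizing and linking.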
IDE:
I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).
Q&A:
Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.
Output:
Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.
Linting:
I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.
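The LLM-driven checks above are judgment calls, but a cheap deterministic check can run alongside them. As a sketch, here is a dangling-link finder. It assumes Obsidian-style `[[Note]]` links that refer to notes by bare file name (no folder paths), and it skips `![[...]]` embeds such as images. The function name is hypothetical.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]],
# but not embeds like ![[image.png]].
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")


def broken_links(wiki_dir: Path) -> dict[str, list[str]]:
    """Map each page name to the link targets that have no .md file."""
    files = sorted(wiki_dir.rglob("*.md"))
    pages = {p.stem for p in files}
    report: dict[str, list[str]] = {}
    for page in files:
        text = page.read_text(encoding="utf-8")
        targets = {t.strip() for t in WIKILINK.findall(text)}
        missing = sorted(t for t in targets if t not in pages)
        if missing:
            report[page.stem] = missing
    return report
```

The resulting report is something you could hand back to the LLM as a list of articles to write or links to repair.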
Extra tools:
I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.
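For a sense of how small "small and naive" can be, here is a sketch of a term-count search over the .md files. It is not the actual tool: the scoring (raw count of query terms per file) and all function names are assumptions chosen for brevity.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(wiki_dir: Path) -> dict[str, Counter]:
    """Map each .md file (path relative to wiki_dir) to its token counts."""
    return {
        str(p.relative_to(wiki_dir)): Counter(tokenize(p.read_text(encoding="utf-8")))
        for p in sorted(wiki_dir.rglob("*.md"))
    }


def search(index: dict[str, Counter], query: str, k: int = 5) -> list[str]:
    """Return up to k file names, ranked by total count of query terms."""
    terms = tokenize(query)
    scored = [(sum(counts[t] for t in terms), name) for name, counts in index.items()]
    hits = [(score, name) for score, name in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by name
    return [name for _, name in hits[:k]]
```

Wrapped in a small CLI, something like this could be handed to an LLM agent as a tool, with the agent reading the top hits itself.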
Further explorations:
As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.
TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
Bought a new Mac mini to properly tinker with claws over the weekend. The apple store person told me they are selling like hotcakes and everyone is confused :)
I'm definitely a bit sus'd to run OpenClaw specifically - giving my private data/keys to 400K lines of vibe coded monster that is being actively attacked at scale is not very appealing at all. Already seeing reports of exposed instances, RCE vulnerabilities, supply chain poisoning, malicious or compromised skills in the registry, it feels like a complete wild west and a security nightmare. But I do love the concept and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to a next level.
Looking around, and given that the high level idea is clear, there are a lot of smaller Claws starting to pop out. For example, on a quick skim NanoClaw looks really interesting in that the core engine is ~4000 lines of code (fits into both my head and that of AI agents, so it feels manageable, auditable, flexible, etc.) and runs everything in containers by default. I also love their approach to configurability - it's not done via config files it's done via skills! For example, /add-telegram instructs your AI agent how to modify the actual code to integrate Telegram. I haven't come across this yet and it slightly blew my mind earlier today as a new, AI-enabled approach to preventing config mess and if-then-else monsters. Basically - the implied new meta is to write the most maximally forkable repo and then have skills that fork it into any desired more exotic configuration. Very cool.
Anyway there are many others - e.g. nanobot, zeroclaw, ironclaw, picoclaw (lol @ prefixes). There are also cloud-hosted alternatives but tbh I don't love these because it feels much harder to tinker with. In particular, local setup allows easy connection to home automation gadgets on the local network. And I don't know, there is something aesthetically pleasing about there being a physical device 'possessed' by a little ghost of a personal digital house elf.
Not 100% sure what my setup ends up looking like just yet but Claws are an awesome, exciting new layer of the AI stack.
A hill I’m willing to die on. When it comes to leadership, vulnerability is a strength, not a weakness.
Most leaders carry real insecurities or imposter syndrome. That shows up in subtle but damaging ways. They avoid admitting mistakes, resist asking for help, and stick with decisions long after it is obvious they were wrong, all because they fear looking weak or losing authority.
In reality, the opposite is true. Nothing builds credibility faster than a leader who can say they were wrong or that they need help. It sends a clear signal that truth matters more than ego and that learning matters more than the perception that you have all the answers.
That signal changes behavior. People stop posturing and start being honest. Problems surface earlier. Debate improves. Accountability becomes real instead of performative. Over time, that kind of environment compounds. The organization gets sharper, faster, and more resilient because the truth stops being negotiated and starts driving decisions.
A masterclass in how innovation democratizes advanced AI and changes the world. By allowing sophisticated models to run efficiently on low cost hardware, these clever engineering techniques move advanced intelligence from exclusive high end systems to everyday devices & people.
An IIT Delhi friend shared something shocking. When he graduated 95% of CS grads moved to the US. This year only 5% did. IITs don’t monopolize talent, but America’s comp adv has always been attracting the best. If many no longer want or can’t come, what does that mean for the US?
We just launched Voyage-context-3, a new embedding model that gives AI a full-document view while preserving chunk-level precision, offering better retrieval performance than leading alternatives.
When building AI that reads and reasons over documents (such as reports, contracts, or medical records), it’s critical to break those documents into smaller pieces, or “chunks,” while still maintaining an understanding of the big picture. Most systems today lose important context, or require complicated workarounds to stitch it back together.
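To make the chunking problem concrete, here is a sketch of one common workaround: overlapping word windows with the document title prepended to every chunk. This is only an illustration of the "stitch context back in" approach, not how Voyage-context-3 works. The function name and default sizes are assumptions.

```python
def chunk_with_context(title: str, text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows, each prefixed with the title."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(f"{title}: " + " ".join(window))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks
```

The title prefix gives each chunk only a crude hint of the bigger picture, which is the limitation the post describes.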
https://t.co/OcxvTzfXah
It's been a wild ride through countless hardware iterations and architectural decisions. I've run everything on it, from simple hobby apps to full-blown #Kubernetes and recently AI models. It's been a decade of invaluable, hands-on learning! 🛠️ #DevOps #SRE #Infrastructure
My self-hosted private cloud is 10 years old today! 🎂☁️ What started as a hobby to host projects and explore cloud tech became the foundation of my entire career. #Homelab #CloudNative
"I use AI in a separate window. I don't enjoy Cursor or Windsurf, I can literally feel competence draining out of my fingers."
@dhh, the legendary programmer and creator of Ruby on Rails has the most beautiful and philosophical idea about what AI takes away from programmers.
Life is Lived in The Arena
https://t.co/im6iOtl2cU
@naval : Life is lived in the arena. You only learn by doing. And if you’re not doing, then all the learning you’re picking up is too general and too abstract. Then it truly is hallmark aphorisms. You don’t know what applies where and when. 🤯
Context Engineering Guide
I'm writing a detailed guide on context engineering for AI devs.
v1 is out now! (bookmark it)
I use a concrete deep research multi-agent example to show what context engineering involves.
Always the same story: Google builds an amazing *internal-only* tool that is better than anything else out there. Never turns it into an external product.
Microsoft meanwhile uses the same products they build for external devs.
And this is why Microsoft beats Google w dev tools
I’ve lived in India, Canada, the UK & Nigeria—and traveled the world—but the USA gave my family opportunities we couldn’t find anywhere else. It’s not perfect, but it’s uniquely generous to those who dream big and work hard. So grateful. 🇺🇸 #July4th
The future of software development is agent-native. At MongoDB, we’re already seeing big gains using Factory (powered by @VoyageAI) to accelerate dev workflows and automate tasks. This is just the beginning.
Louder for people in the back
I’m sick and tired of half-baked AI features released that make products worse and cannot be turned off
Gemini for Google Docs / Sheets is one of many examples (this reminds me daily of what a feature that should not have been released looks like)
There's a new paper circulating looking in detail at LMArena leaderboard: "The Leaderboard Illusion"
https://t.co/tVMrx68zwa
I first became a bit suspicious when at one point a while back, a Gemini model scored #1 way above the second best, but when I tried to switch for a few days it was worse than what I was used to. Conversely as an example, around the same time Claude 3.5 was a top tier model in my personal use but it ranked very low on the arena. I heard similar sentiments both online and in person. And there were a number of other relatively random models, often suspiciously small, with little to no real-world knowledge as far as I know, yet they ranked quite high too.
"When the data and the anecdotes disagree, the anecdotes are usually right." (Jeff Bezos on a recent pod, though I share the same experience personally). I think these teams have placed different amounts of internal focus and decision making around LM Arena scores specifically. And unfortunately they are not getting better models overall but better LM Arena models, whatever that is. Possibly something with a lot of nested lists, bullet points and emoji.
It's quite likely that LM Arena (and LLM providers) can continue to iterate and improve within this paradigm, but in addition I also have a new candidate in mind to potentially join the ranks of "top tier eval". It is the @openrouter LLM rankings:
https://t.co/N1NCZyVCv3
Basically, OpenRouter allows people/companies to quickly switch APIs between LLM providers. All of them have real use cases (not toy problems or puzzles), they have their own private evals, and all of them have an incentive to get their choices right, so by choosing one LLM over another they are directly voting for some combo of capability+cost. I don't think OpenRouter is there just yet in both the quantity and diversity of use, but something of this kind I think has great potential to grow into a very nice, very difficult to game eval.
I'm now doing "queries" like this (that I could not do before):
"Has there been an unusual spike of signups in the last 2 months?"
"What are suspicious-looking emails signing up recently? Any patterns?"
"Which domains have the most signups?"
"How many unclaimed promo codes are left?"
The revolutionary part is not that this can be done in the IDE, TBH. It's that an LLM "layer" can be so easily added in front of my database!
I could have answered all of those questions by writing queries or even small batch-processing scripts... but given the effort, I would not have bothered.
But like this: it's fast! And I discovered a batch of what looks like fraudulent signups that would have gone undetected if it was not so easy to interact with my DB! (now working on cancelling out those codes!)
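For reference, this is the sort of query the LLM layer ends up writing for a question like "Which domains have the most signups?". It is a sketch only: the `signups` table with an `email` column is a hypothetical schema, and SQLite stands in for whatever database is actually behind it.

```python
import sqlite3


def top_signup_domains(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Count signups per email domain, most common first.

    Assumes a hypothetical table signups(email TEXT).
    """
    query = """
        SELECT lower(substr(email, instr(email, '@') + 1)) AS domain,
               COUNT(*) AS n
        FROM signups
        WHERE instr(email, '@') > 0
        GROUP BY domain
        ORDER BY n DESC, domain ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()
```

An unfamiliar domain near the top of that list is one pattern that can hint at a batch of fraudulent signups.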
About the "Red Hat Middleware moving to IBM" subject, this was the best read I have had in recent weeks. Thanks @nmcl ! You are an inspiration for everyone on the team. https://t.co/onSuaptOzB