Wojtek (WojTech) Trocki

2 months ago

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
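The "small and naive search engine over the wiki" mentioned in the post is not shown, so the following is only a guess at what such a tool could look like: a minimal Python sketch that ranks .md files by a simple TF-IDF score. The function names (`load_wiki`, `build_index`, `search`) and the scoring formula are assumptions made for illustration, not the author's implementation.

```python
import math
import re
import sys
from collections import Counter
from pathlib import Path

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def load_wiki(root):
    """Read every .md file under root into {relative_path: text}."""
    root = Path(root)
    return {
        str(path.relative_to(root)): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.md"))
    }


def build_index(docs):
    """Return per-document term counts and per-term document frequencies."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    return counts, doc_freq


def search(query, counts, doc_freq, top_k=5):
    """Rank documents by summed TF-IDF of the query terms (hypothetical scoring)."""
    n_docs = len(counts)
    scores = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values()) or 1
        score = 0.0
        for term in tokenize(query):
            if term in term_counts:
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += (term_counts[term] / total) * idf
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken by file name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Assumed CLI shape: python search.py <wiki_dir> <query words...>
    wiki_counts, wiki_doc_freq = build_index(load_wiki(sys.argv[1]))
    for name, score in search(" ".join(sys.argv[2:]), wiki_counts, wiki_doc_freq):
        print(f"{score:.4f}  {name}")
```

Printing one scored path per line keeps the output easy for an LLM agent to consume when the script is handed to it as a CLI tool, which is the use the post describes.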

typeapi retweeted

3 months ago

Bought a new Mac mini to properly tinker with claws over the weekend. The apple store person told me they are selling like hotcakes and everyone is confused :)

I'm definitely a bit sus'd to run OpenClaw specifically - giving my private data/keys to 400K lines of vibe coded monster that is being actively attacked at scale is not very appealing at all. Already seeing reports of exposed instances, RCE vulnerabilities, supply chain poisoning, malicious or compromised skills in the registry, it feels like a complete wild west and a security nightmare. But I do love the concept and I think that just like LLM agents were a new layer on top of LLMs, Claws are now a new layer on top of LLM agents, taking the orchestration, scheduling, context, tool calls and a kind of persistence to a next level.

Looking around, and given that the high level idea is clear, there are a lot of smaller Claws starting to pop out. For example, on a quick skim NanoClaw looks really interesting in that the core engine is ~4000 lines of code (fits into both my head and that of AI agents, so it feels manageable, auditable, flexible, etc.) and runs everything in containers by default. I also love their approach to configurability - it's not done via config files it's done via skills! For example, /add-telegram instructs your AI agent how to modify the actual code to integrate Telegram. I haven't come across this yet and it slightly blew my mind earlier today as a new, AI-enabled approach to preventing config mess and if-then-else monsters. Basically - the implied new meta is to write the most maximally forkable repo and then have skills that fork it into any desired more exotic configuration. Very cool.

Anyway there are many others - e.g. nanobot, zeroclaw, ironclaw, picoclaw (lol @ prefixes). There are also cloud-hosted alternatives but tbh I don't love these because it feels much harder to tinker with. In particular, local setup allows easy connection to home automation gadgets on the local network. And I don't know, there is something aesthetically pleasing about there being a physical device 'possessed' by a little ghost of a personal digital house elf. Not 100% sure what my setup ends up looking like just yet but Claws are an awesome, exciting new layer of the AI stack.

typeapi retweeted

4 months ago

A hill I’m willing to die on. When it comes to leadership, vulnerability is a strength, not a weakness. Most leaders carry real insecurities or imposter syndrome. That shows up in subtle but damaging ways. They avoid admitting mistakes, resist asking for help, and stick with decisions long after it is obvious they were wrong, all because they fear looking weak or losing authority. In reality, the opposite is true. Nothing builds credibility faster than a leader who can say they were wrong or that they need help. It sends a clear signal that truth matters more than ego and that learning matters more than the perception that you have all the answers. That signal changes behavior. People stop posturing and start being honest. Problems surface earlier. Debate improves. Accountability becomes real instead of performative. Over time, that kind of environment compounds. The organization gets sharper, faster, and more resilient because the truth stops being negotiated and starts driving decisions.

typeapi retweeted

CEO @ Alludium AI. CTO & Venture Partner @ Sure Valley Ventures

5 months ago

A masterclass in how innovation democratizes advanced AI and changes the world. By allowing sophisticated models to run efficiently on low cost hardware, these clever engineering techniques move advanced intelligence from exclusive high end systems to everyday devices & people.

typeapi retweeted

9 months ago

An IIT Delhi friend shared something shocking. When he graduated 95% of CS grads moved to the US. This year only 5% did. IITs don’t monopolize talent, but America’s comp adv has always been attracting the best. If many no longer want or can’t come, what does that mean for the US?

typeapi retweeted

Wojtek (WojTech) Trocki @typeapi

11 months ago

We just launched Voyage-context-3, a new embedding model that gives AI a full-document view while preserving chunk-level precision, and that offers better retrieval performance than leading alternatives. When building AI that reads and reasons over documents (such as reports, contracts, or medical records), it’s critical to break those documents into smaller pieces, or “chunks,” while still maintaining an understanding of the big picture. Most systems today lose important context, or require complicated workarounds to stitch it back together. https://t.co/OcxvTzfXah
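To make the chunking problem in the post concrete, here is a minimal Python sketch of one common workaround: split a document into overlapping word windows and tag each chunk with its document title and position before embedding. This is not how Voyage-context-3 works internally, and it calls no Voyage API; `chunk_words`, `with_document_context`, and the default sizes are hypothetical names and values chosen for illustration.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping windows of at most `size` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def with_document_context(title, text, size=50, overlap=10):
    """Attach document-level metadata to each chunk (a manual workaround)."""
    chunks = chunk_words(text, size, overlap)
    return [
        {
            "doc": title,
            "position": index,
            "of": len(chunks),
            # Prefixing the title is a crude way to keep the big picture
            # attached to a chunk that will be embedded on its own.
            "text": f"[{title}] {chunk}",
        }
        for index, chunk in enumerate(chunks)
    ]
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, and the title prefix is the kind of stitching the post calls a complicated workaround.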

11 months ago

It's been a wild ride through countless hardware iterations and architectural decisions. I've run everything on it, from simple hobby apps to full-blown #Kubernetes and recently AI models. It's been a decade of invaluable, hands-on learning! 🛠️ #DevOps #SRE #Infrastructure

Wojtek (WojTech) Trocki @typeapi

11 months ago

My self-hosted private cloud is 10 years old today! 🎂☁️ What started as a hobby to host projects and explore cloud tech became the foundation of my entire career. #Homelab #CloudNative

typeapi retweeted

ℏεsam

@Hesamation

11 months ago

"I use AI in a separate window. I don't enjoy Cursor or Windsurf, I can literally feel competence draining out of my fingers." @dhh, the legendary programmer and creator of Ruby on Rails has the most beautiful and philosophical idea about what AI takes away from programmers.

typeapi retweeted

Bilgin Ibryam

@bibryam

11 months ago

Life is Lived in The Arena https://t.co/im6iOtl2cU @naval: Life is lived in the arena. You only learn by doing. And if you’re not doing, then all the learning you’re picking up is too general and too abstract. Then it truly is Hallmark aphorisms. You don’t know what applies where and when. 🤯

typeapi retweeted

elvis

@omarsar0

11 months ago

Context Engineering Guide

I'm writing a detailed guide on context engineering for AI devs.

v1 is out now! (bookmark it)

I use a concrete deep research multi-agent example to show what context engineering involves. https://t.co/LQQkpPC3la

typeapi retweeted

Gergely Orosz

@GergelyOrosz

11 months ago

Always the same story: Google builds an amazing *internal-only* tool that is better than anything else out there. Never turns it into an external product. Microsoft meanwhile uses the same products they build for external devs. And this is why Microsoft beats Google w dev tools

typeapi retweeted

11 months ago

I’ve lived in India, Canada, the UK & Nigeria—and traveled the world—but the USA gave my family opportunities we couldn’t find anywhere else. It’s not perfect, but it’s uniquely generous to those who dream big and work hard. So grateful. 🇺🇸 #July4th

typeapi retweeted

about 1 year ago

The future of software development is agent-native. At MongoDB, we’re already seeing big gains using Factory (powered by @VoyageAI) to accelerate dev workflows and automate tasks. This is just the beginning.

typeapi retweeted

about 1 year ago

more context around the claude prompt https://t.co/arycVAPLiB

typeapi retweeted

Gergely Orosz

@GergelyOrosz

about 1 year ago

Louder for people in the back

I’m sick and tired of half-baked AI features released that make products worse and cannot be turned off

Gemini for Google Docs / Sheets is one of many examples (this reminds me daily of what a feature that should have not been released looks like)

typeapi retweeted