LLM Knowledge Bases
Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:
Data ingest:
I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/ and backlinks; the LLM then categorizes the data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local disk so that my LLM can easily reference them.
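One way to make the "incremental" part concrete is to track which files in raw/ are new or changed since the last compile pass. This is only a sketch under my own assumptions: the manifest file, its JSON layout, and the function names `pending_sources` and `mark_compiled` are hypothetical, and the actual LLM call that writes wiki articles is left out. The manifest should live outside raw/ so it is not picked up as a source.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_manifest(manifest_path: Path) -> dict:
    """Load the {relative path: digest} map, or an empty map on first run."""
    if manifest_path.exists():
        return json.loads(manifest_path.read_text(encoding="utf-8"))
    return {}


def pending_sources(raw_dir: Path, manifest_path: Path) -> list[Path]:
    """List files under raw_dir that are new or changed since the last compile."""
    seen = load_manifest(manifest_path)
    pending = []
    for path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        key = path.relative_to(raw_dir).as_posix()
        if seen.get(key) != file_digest(path):
            pending.append(path)
    return pending


def mark_compiled(raw_dir: Path, manifest_path: Path, paths: list[Path]) -> None:
    """Record the current digest of each path once it has been processed."""
    seen = load_manifest(manifest_path)
    for path in paths:
        seen[path.relative_to(raw_dir).as_posix()] = file_digest(path)
    manifest_path.write_text(json.dumps(seen, indent=2, sort_keys=True), encoding="utf-8")
```

The idea would be to hand only the `pending_sources(...)` list to the LLM on each pass, then call `mark_compiled(...)` for whatever it finished.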
IDE:
I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).
Q&A:
Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.
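To give a feel for what an index file with brief summaries could look like, here is a rough sketch. In the workflow above the LLM maintains these files itself, so this is a mechanical stand-in, not that method. It assumes (my assumption) that each article optionally starts with a `#` heading and that its first non-heading line works as a one-line summary; `summarize` and `build_index` are hypothetical names. Links use Obsidian's `[[target|alias]]` wikilink form.

```python
from pathlib import Path


def summarize(md_text: str, fallback_title: str) -> tuple[str, str]:
    """Return (title, first non-heading line) for one markdown article."""
    title = None
    for line in md_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Use the first heading as the title; skip any later headings.
            if title is None:
                title = line.lstrip("#").strip()
            continue
        return title or fallback_title, line
    return title or fallback_title, ""


def build_index(wiki_dir: Path, index_name: str = "index.md") -> str:
    """Build one index line per article: wikilink, title, short summary."""
    lines = ["# Index", ""]
    for path in sorted(wiki_dir.rglob("*.md")):
        rel = path.relative_to(wiki_dir)
        if rel.as_posix() == index_name:
            continue  # do not index the index itself
        title, summary = summarize(path.read_text(encoding="utf-8"), path.stem)
        target = rel.with_suffix("").as_posix()
        lines.append(f"- [[{target}|{title}]]: {summary}")
    return "\n".join(lines) + "\n"
```

An agent can then read the index first and open only the articles whose summaries look relevant, which is roughly why plain files can be enough at this scale.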
Output:
Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.
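As an illustration of the slide-show output, a Marp deck is just a markdown file with `marp: true` in the front matter and `---` lines between slides. The helper below is a hypothetical sketch (the name `to_marp` and the title-plus-bullets slide shape are my assumptions) of turning an answer into such a file.

```python
def to_marp(title: str, sections: list[tuple[str, list[str]]]) -> str:
    """Render a title slide plus one bullet slide per (heading, bullets) pair."""
    # Front matter enables Marp rendering; the title slide follows it directly.
    slides = [f"---\nmarp: true\n---\n\n# {title}"]
    for heading, bullets in sections:
        body = "\n".join(f"- {b}" for b in bullets)
        slides.append(f"## {heading}\n\n{body}")
    # A line containing only --- separates consecutive slides.
    return "\n\n---\n\n".join(slides) + "\n"
```

Writing the returned string to a .md file inside the vault makes it viewable next to the rest of the wiki.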
Linting:
I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.
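Some health checks don't need an LLM at all. As one small example (not the checks described above), this sketch reports wikilinks that point at pages that don't exist. It assumes Obsidian-style `[[Page]]`, `[[Page|alias]]`, and `[[Page#Section]]` links that resolve either by file name or by path relative to the wiki root; `broken_links` is a hypothetical name. Embeds of non-markdown files such as images are not handled and would show up as missing.

```python
import re
from pathlib import Path

# Capture the link target up to the first "|", "#", or "]".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def broken_links(wiki_dir: Path) -> dict[str, list[str]]:
    """Map each page (relative path) to the link targets that match no page."""
    pages = list(wiki_dir.rglob("*.md"))
    known = set()
    for page in pages:
        known.add(page.stem)  # bare note name, e.g. "attention"
        known.add(page.relative_to(wiki_dir).with_suffix("").as_posix())
    report = {}
    for page in sorted(pages):
        text = page.read_text(encoding="utf-8")
        targets = {t.strip() for t in WIKILINK.findall(text)}
        missing = sorted(t for t in targets if t not in known)
        if missing:
            report[page.relative_to(wiki_dir).as_posix()] = missing
    return report
```

The resulting report is also a reasonable thing to hand to the LLM as a to-do list of article candidates.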
Extra tools:
I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.
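For a sense of how small a naive search engine over markdown files can be, here is a sketch using plain TF-IDF scoring with an argparse CLI wrapper. It is not the tool described above; the scoring formula, the names `search` and `main`, and the CLI arguments are all my assumptions.

```python
import argparse
import math
import re
from collections import Counter
from pathlib import Path

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def search(docs: dict[str, str], query: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Rank documents by the summed TF-IDF of the query terms they contain."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    scores: dict[str, float] = {}
    for term in set(tokenize(query)):
        df = sum(1 for c in counts.values() if term in c)
        if df == 0:
            continue  # term appears nowhere in the wiki
        idf = math.log(1 + len(counts) / df)
        for name, c in counts.items():
            if term in c:
                tf = c[term] / sum(c.values())
                scores[name] = scores.get(name, 0.0) + tf * idf
    # Highest score first; ties broken by name for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def main(argv=None) -> None:
    """CLI entry point: print 'score<TAB>path' for the top matches."""
    parser = argparse.ArgumentParser(description="Naive TF-IDF search over a wiki")
    parser.add_argument("wiki_dir", type=Path)
    parser.add_argument("query")
    parser.add_argument("-k", "--top-k", type=int, default=5)
    args = parser.parse_args(argv)
    docs = {
        p.relative_to(args.wiki_dir).as_posix(): p.read_text(encoding="utf-8")
        for p in args.wiki_dir.rglob("*.md")
    }
    for name, score in search(docs, args.query, args.top_k):
        print(f"{score:.4f}\t{name}")
```

Calling `main()` under the usual `if __name__ == "__main__":` guard turns this into a script an agent can invoke as a tool; the tab-separated output is easy for it to parse.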
Further explorations:
As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.
TLDR: raw data from any number of sources is collected, compiled by an LLM into a .md wiki, then operated on by the LLM through various CLIs to do Q&A and to incrementally enhance the wiki, with all of it viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
🎵 My DJ set last week @PopUpFest_xyz at the beautiful @4seasDeSoc 🪷
https://t.co/ilybjIIxFl
Shout out to @akprettyok @timourxyz for awesome b3b session towards the end ❤️🔥
Much love for everything we co-created 💕
🎉 Last day of Chiang Mai Pop Up Season!
On November 9 at 4 PM, come to 4Seas Mountain View for the grand closing celebration and music festival of Pop Up Season!
Pop Up Fest ! 🧑🔬👩🎨🎃🤖🤡🤠👦🤹♀️
Pop Up Fest ! 🍀🎋🌴🪵🌿🌹🌸🌱
Pop Up Fest ! 🪩🔥🎨🥁🎸🎻🎉❤️
Pop Up Fest ! 🎭🚩🧘🛖🏡👐💡💓
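📍 Location: 4 Seas Mountain View
⏰ Time: 4 PM today
🎪 Headline: Pop Up Fest, the super party for all the Pop Up Villages, is about to begin! Let's come together here:
- A wide open lawn 🌿
- Music in the mountains and forest 🎵
- Art installations 🎨
- Dancing to your heart's content 💃
- Treasured friends 🫂
✨ Something special: the spirits of Doi Suthep have given us the most perfect sunny day of the rainy season! ☀️
💫 Jointly organized by the leaders and residents of all the pop-up villages 🤝 We will gather, celebrate, and say our goodbyes here 👋 See you again tomorrow at Devcon!
🎡 This will be an unforgettable gathering on par with Woodstock and Burning Man! Come join the Pop Up Fest closing ceremony! 🎊
Thanks to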
4Seas
Edge City
Web3 Village
Web3.0 Meta Hub
Lovepunk
AuraVerse
Shanhaiwoo
ZuGarden
The mu
WAMO
#PopUpFest #ChiangMai #PopUpVillage #DevCon2024
🌟 POPUP FEST IS HERE! 🌟
Join us for a FREE celebration of pop-up communities, closing out this incredible congregation of villages! 🙌
📅 Nov 9th, 4PM-12AM
📍 4seas Mountainview, Chiang Mai
🚌 FREE TRANSPORT:
- Buses from Alt_PingRiver: 3PM-4PM (every 15min)
- Return buses starting from 8:30PM-12:30AM
- Red trucks loop from 4Seas Nimman: 3PM-2AM
→ BE AT BUSES FROM 3PM! First come, first serve!
🎵 BRING YOUR VIBE:
- Dress funky! Express yourself!
- Instruments, art, creative projects & decoration strongly encouraged!
- Good energy!
🎒 PACK SMART:
- Weather-ready gear (might be hot/rainy)
- Sturdy shoes, maybe something to sit on (or just keep dancing 💃)
🍾 BYO [insert vice]:
- BYOB (some beer available)
- Treats to share with others (food trucks provided)
💫 Open-source & Decentralized
(for the community by the community):
→ This is YOUR festival! The more you bring, share & create, the more magical it becomes!
Let's make this epic! ✨🚀
P.S. help is always needed... if you want to come to the "pre-party", come between 12-3PM to support with the final finishes... please! :)
Links below 👇