Super excited to receive a Laude grant for DocWriter! AI is revolutionizing knowledge work, but AI-assisted writing is still atrocious. So we're building a new harness and UI for writing. Along the way we'll tackle some broadly-applicable challenges: (i) harnesses that natively support async human-AI collaboration, (ii) steering long-horizon (e.g., multi-month) agents, and (iii) open-source frameworks for automatic, continual evals as models and human behavior drift (e.g., new slopwords arise).
https://t.co/8wVRNlVZl7 (email us if you write a lot for your profession, need that writing to be high-quality, and want to take part in our user studies)
If MySQL had a better extensions story, it would almost certainly still be the #1 relational database.
It's so good in every other way.
Perhaps the only thing Postgres would have on it is a better license.
Really excited to open source a new project: Omnigent, a meta-harness for AI agents.
It lets you build multi-agent coding setups and custom agents. It sits above Claude Code, Codex, Pi, and agent SDKs so you can compose them, and it adds live collaboration and rich control policies.
The web was never meant to be flattened into text.
Yet most web RAG systems start by parsing HTML, a complex and lossy process.
🔥 Introducing PixelRAG: the first RAG system that retrieves and reads 30M+ web pages as pixels.
Instead of extracting text, PixelRAG retrieves screenshots and lets a VLM read them directly.
PixelRAG not only preserves visual information, but also outperforms text-based RAG on text-only QA benchmarks by +18.1%.
Why?
(1) HTML-to-text conversion often discards layout, structure, tables, and other useful signals.
(2) We continued pretraining a VLM on web page screenshots and turned it into a surprisingly strong visual retriever.
(3) Recent VLMs are remarkably good at understanding web pages, often with better accuracy and token efficiency than text-only pipelines.
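To make point (1) concrete, here is a minimal sketch of a naive HTML-to-text step. It assumes an extractor that keeps only text nodes and drops every tag. This is an illustrative stand-in, not PixelRAG's pipeline or any particular RAG system's parser, and the pricing table is made up. The pairing of each price with its plan lives in the row/cell structure, so it disappears once the tags are gone:

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Collects only text nodes, discarding every tag (and thus all layout)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        # Skip whitespace-only runs between tags; keep everything else.
        text = data.strip()
        if text:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Hypothetical helper: flatten an HTML string into one line of text."""
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: which price belongs to which plan is encoded
# by the table structure, not by the words themselves.
page = """
<table>
  <tr><th>Plan</th><th>Basic</th><th>Pro</th></tr>
  <tr><td>Price</td><td>$5</td><td>$20</td></tr>
  <tr><td>Storage</td><td>10 GB</td><td>1 TB</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_to_text(page))
    # Plan Basic Pro Price $5 $20 Storage 10 GB 1 TB
```

Real extractors are smarter than this, but any step that serializes a 2-D layout into a 1-D string has to decide what to throw away. Avoiding that decision is the motivation the thread gives for reading screenshots instead.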
Takeaway: HTML parsing may be one of the biggest self-inflicted bottlenecks in web RAG.
Demo below 👇
Code: https://t.co/ssDF0nnVwZ
Paper: https://t.co/OIpQ26Vb8H
Playground: https://t.co/UdzM7GQmu3
🚀 Beyond excited to share we're releasing LOTUSPlan, a new API & optimizer for higher-performance LLM-powered data processing, from our team at Berkeley & Stanford.
LOTUS now lets you write your LLM-based queries and optimize them for up to 2.4× lower cost and 4.6× higher accuracy on tasks like agent trace analysis, LLM-judge evals, RAG, document extraction, and deep research.
✨ Check out our new blog: https://t.co/pbVRwTIDmD
🧵
Hello world!
We're organizing a new interdisciplinary workshop @VLDBconf focused on data management challenges in biomedical research and healthcare.
🗓️ Submit talk proposals by May 15th 🗓️
➡️ More info: https://t.co/SXLJYvOioT ⬅️
💬 Discord server: https://t.co/NgrpsjrD6E 💬
We now offer @CMUDB's Database Systems course offline to incarcerated students across US prisons. No WiFi, completely free. Locked in by the system, freed by the lock manager: https://t.co/rFYRvEYoq3
Thanks to @convex for helping make sure the database game is for everybody.
Database transactions don't get enough love, but the ability to execute a bunch of code, change a bunch of data, and commit only once you've validated the results is going to be critical in the AI era.
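A minimal sketch of that "change, validate, then commit" pattern, using Python's built-in sqlite3 module. The accounts table, the transfer function, and the no-negative-balance rule are hypothetical stand-ins for the "bunch of code" an agent might run. Using the connection as a context manager commits if the block finishes cleanly and rolls back if an exception escapes, so a failed validation leaves the data untouched:

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical demo schema: two accounts with integer balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Apply a change, validate the result, and commit only if it checks out."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            # Validate the uncommitted state: no account may go negative.
            (negatives,) = conn.execute(
                "SELECT COUNT(*) FROM accounts WHERE balance < 0"
            ).fetchone()
            if negatives:
                raise ValueError("validation failed: negative balance")
        return True
    except ValueError:
        return False  # the with-block already rolled the changes back


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    db = make_demo_db()
    print(transfer(db, "alice", "bob", 30))   # True: committed
    print(transfer(db, "alice", "bob", 500))  # False: rolled back
    print(balances(db))                       # {'alice': 70, 'bob': 30}
```

The validation query runs inside the same transaction, so it sees the tentative writes while other connections do not; only a passing check makes them durable.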
Congrats to Ling Zhang for defending & completing her PhD! During her PhD, she specialized in log processing and management, structured text search, and database query processing! She is now at @databricks on their Query Optimization team!
We're incredibly proud to congratulate our co-founder and CTO, @matei_zaharia, on receiving the ACM Prize in Computing for his development of distributed data systems that have enabled large-scale machine learning, analytics, and AI.
Matei's open-source contributions have fundamentally changed how organizations work with data and AI — including Apache Spark™, Delta Lake, and MLflow. Researchers, nonprofits, startups, and enterprises across every industry have built on the foundation he helped create.
Now he's pushing the frontier further, focusing on building and scaling reliable AI agents through open-source research like DSPy and GEPA.
Matei, this recognition is so well deserved. We're honored to build alongside you every day. https://t.co/mgBvBc3QnP
Congratulations to Matei Zaharia on receiving the 2025 #ACMPrize in #Computing! He is recognized for his visionary development of distributed data systems and computing infrastructure. Learn more: https://t.co/Yu4dl9PzDM
I first tried to read this book in 2018 and couldn't make it through because I thought it was too hard.
8 years later, it's the one book I recommend every developer read, and I had the chance to review the 2nd edition.
Join a study group, give it a read.