Everyone talks about AI models.
Almost nobody talks about the database behind them.
ClickHouse is already used by OpenAI, Anthropic, LangChain, W&B, Sierra, Modal Labs and others.
This is what’s powering the agent wave.
Anthropic is acquiring @bunjavascript to further accelerate Claude Code’s growth.
We're delighted that Bun—which has dramatically improved the JavaScript and TypeScript developer experience—is joining us to make Claude Code even better.
Read more: https://t.co/aQd3XRdUfR
New @GoogleDeepMind paper builds a new benchmark and agent design so language models can actually learn from their own experience.
Right now most language model agents only keep chat logs or facts, so they remember what happened but not how to solve similar tasks better, and the authors call this conversational recall versus experience reuse.
Evo Memory turns existing benchmarks into streams of tasks arriving 1 after another, and forces agents to search past experiences, use them, then update memory each time.
The simple baseline ExpRAG stores each solved task as a short text record, retrieves a few similar ones for a new task, and inserts them into the prompt.
ReMem goes further by letting the agent choose at each step to think, act, or refine memory, actively pulling useful experiences and pruning or rewriting unhelpful ones.
Across math, question answering, tool use, and interactive environments, these self evolving memories, especially ReMem and even simple ExpRAG, boost accuracy, need fewer steps, and make smaller models behave much stronger without any retraining.
----
Paper Link – arxiv. org/abs/2511.20857
Paper Title: "Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory"
Claude Opus 4.5 is the first Claude I think is reasonably usable for decent math work (Claude interface is great for iterating minus the timeouts & mobile slowdowns)
The big thing here though that I've noted using Opus 4.5 usage is thinking quality = non-thinking quality (!)
Sharing an interesting recent conversation on AI's impact on the economy.
AI has been compared to various historical precedents: electricity, industrial revolution, etc., I think the strongest analogy is that of AI as a new computing paradigm (Software 2.0) because both are fundamentally about the automation of digital information processing.
If you were to forecast the impact of computing on the job market in ~1980s, the most predictive feature of a task/job you'd look at is to what extent the algorithm of it is fixed, i.e. are you just mechanically transforming information according to rote, easy to specify rules (e.g. typing, bookkeeping, human calculators, etc.)? Back then, this was the class of programs that the computing capability of that era allowed us to write (by hand, manually).
With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward functions), and we search the program space via gradient descent to find neural networks that work well against that objective. This is my Software 2.0 blog post from a while ago. In this new programming paradigm then, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It's about to what extent an AI can "practice" something. The environment has to be resettable (you can start a new attempt), efficient (a lot attempts can be made), and rewardable (there is some automated process to reward any specific attempt that was made).
The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out from neural net magic of generalization fingers crossed, or via weaker means like imitation. This is what's driving the "jagged" frontier of progress in LLMs. Tasks that are verifiable progress rapidly, including possibly beyond the ability of top experts (e.g. math, code, amount of time spent watching videos, anything that looks like puzzles with correct answers), while many others lag by comparison (creative, strategic, tasks that combine real-world knowledge, state, context and common sense).
Software 1.0 easily automates what you can specify.
Software 2.0 easily automates what you can verify.
As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently:
"openai/gpt-5.1",
"google/gemini-3-pro-preview",
"anthropic/claude-sonnet-4.5",
"x-ai/grok-4",
Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response.
It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses.
Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain.
That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored.
I pushed the vibe coded app to
https://t.co/EZyOqwXd2k
if others would like to play. ty nano banana pro for fun header image for the repo
Kalonzo Musyoka to President Ruto: What are you afraid of? Even if you shut down media stations, what about those with phones? The horse has left the stable #June25th
We don’t need you at CBD we need you at Statehouse road
They have started killing us already
Don't say anything, Just Retweet!
#SiriNiNumbers#OccupyStatehouse2025#June25th#OccupyUntilVictory
RECORD EVERYTHING
Communications Authority
Today is Today
WE ARE THE MEDIA
KTN KTN KTN. We shall remember those who stood by the people during the struggle for revolution. KTN have provided WhatsApp numbers to share videos that will be used to send those rogue officers to ICC. These are the numbers; 0732142311 and 0732142590.