BaddieForPipelines @devmarrie - Twitter Profile

7 months ago

Everyone talks about AI models. Almost nobody talks about the database behind them. ClickHouse is already used by OpenAI, Anthropic, LangChain, W&B, Sierra, Modal Labs and others. This is what’s powering the agent wave.

0

1

0

346

devmarrie retweeted

Anthropic

@AnthropicAI

7 months ago

Anthropic is acquiring @bunjavascript to further accelerate Claude Code’s growth. We're delighted that Bun—which has dramatically improved the JavaScript and TypeScript developer experience—is joining us to make Claude Code even better. Read more: https://t.co/aQd3XRdUfR

717

9K

1K

8M

devmarrie retweeted

Rohan Paul

@rohanpaul_ai

7 months ago

New @GoogleDeepMind paper builds a new benchmark and agent design so language models can actually learn from their own experience.

Right now most language model agents only keep chat logs or facts, so they remember what happened but not how to solve similar tasks better, and the authors call this conversational recall versus experience reuse.

Evo Memory turns existing benchmarks into streams of tasks arriving 1 after another, and forces agents to search past experiences, use them, then update memory each time.

The simple baseline ExpRAG stores each solved task as a short text record, retrieves a few similar ones for a new task, and inserts them into the prompt.

ReMem goes further by letting the agent choose at each step to think, act, or refine memory, actively pulling useful experiences and pruning or rewriting unhelpful ones.

Across math, question answering, tool use, and interactive environments, these self evolving memories, especially ReMem and even simple ExpRAG, boost accuracy, need fewer steps, and make smaller models behave much stronger without any retraining.

----

Paper Link – arxiv.org/abs/2511.20857

Paper Title: "Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory"
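A rough sketch of the store / retrieve / insert-into-prompt loop the tweet attributes to ExpRAG. This is not the paper's code: the names `Experience`, `ExperienceMemory`, and `similarity` are hypothetical, and word-overlap (Jaccard) scoring is an assumed stand-in for whatever retriever the authors actually use.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One solved task, kept as a short text record (hypothetical schema)."""
    task: str
    outcome: str


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets; an assumed stand-in retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class ExperienceMemory:
    records: list[Experience] = field(default_factory=list)

    def retrieve(self, task: str, k: int = 2) -> list[Experience]:
        # Rank stored records by similarity to the new task, keep the top k.
        ranked = sorted(
            self.records,
            key=lambda r: similarity(task, r.task),
            reverse=True,
        )
        return ranked[:k]

    def build_prompt(self, task: str, k: int = 2) -> str:
        # Insert the retrieved records ahead of the new task.
        parts = [
            f"Past task: {r.task}\nWhat worked: {r.outcome}"
            for r in self.retrieve(task, k)
        ]
        parts.append(f"New task: {task}")
        return "\n\n".join(parts)

    def update(self, task: str, outcome: str) -> None:
        # After each task in the stream, append its record to memory.
        self.records.append(Experience(task, outcome))
```

In a task stream, each step would call `build_prompt` before the model runs and `update` afterwards, so later tasks can reuse earlier experience.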

27

622

108

515

48K

devmarrie retweeted

Jeff Schneider @jeffrschneider

7 months ago

Agents Building Agents. Humans nudge it along.

1

2

1

2K

devmarrie retweeted

Emad

@EMostaque

7 months ago

Claude Opus 4.5 is the first Claude I think is reasonably usable for decent math work (the Claude interface is great for iterating, minus the timeouts & mobile slowdowns). The big thing I've noted while using Opus 4.5, though, is that thinking quality = non-thinking quality (!)

12

71

2

8

13K

devmarrie retweeted

Andrej Karpathy

@karpathy

7 months ago

Sharing an interesting recent conversation on AI's impact on the economy.

AI has been compared to various historical precedents: electricity, industrial revolution, etc. I think the strongest analogy is that of AI as a new computing paradigm (Software 2.0) because both are fundamentally about the automation of digital information processing.

If you were to forecast the impact of computing on the job market in ~1980s, the most predictive feature of a task/job you'd look at is to what extent the algorithm of it is fixed, i.e. are you just mechanically transforming information according to rote, easy to specify rules (e.g. typing, bookkeeping, human calculators, etc.)? Back then, this was the class of programs that the computing capability of that era allowed us to write (by hand, manually).

With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward functions), and we search the program space via gradient descent to find neural networks that work well against that objective. This is my Software 2.0 blog post from a while ago. In this new programming paradigm then, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It's about to what extent an AI can "practice" something. The environment has to be resettable (you can start a new attempt), efficient (a lot of attempts can be made), and rewardable (there is some automated process to reward any specific attempt that was made).

The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out from neural net magic of generalization fingers crossed, or via weaker means like imitation. This is what's driving the "jagged" frontier of progress in LLMs. Tasks that are verifiable progress rapidly, including possibly beyond the ability of top experts (e.g. math, code, amount of time spent watching videos, anything that looks like puzzles with correct answers), while many others lag by comparison (creative, strategic, tasks that combine real-world knowledge, state, context and common sense).

Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify.

552

12K

1K

8K

2M

devmarrie retweeted

Andrej Karpathy

@karpathy

7 months ago

As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently:

"openai/gpt-5.1",
"google/gemini-3-pro-preview",
"anthropic/claude-sonnet-4.5",
"x-ai/grok-4",

Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response.

It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses.

Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain.

That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored.

I pushed the vibe coded app to https://t.co/EZyOqwXd2k if others would like to play. ty nano banana pro for fun header image for the repo
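The three stages described in the tweet (dispatch, anonymized peer ranking, chairman synthesis) could be wired together roughly as follows. This is a sketch, not the code from the linked repo: `run_council`, the `Responder` and `Reviewer` callables, and the simple positional scoring are all assumptions, and the models are plain Python callables rather than real OpenRouter calls.

```python
from typing import Callable

# Hypothetical interfaces: each model is just a function here.
Responder = Callable[[str], str]                       # query -> answer
Reviewer = Callable[[str, dict[str, str]], list[str]]  # -> labels, best first
Chairman = Callable[[str], str]                        # prompt -> final answer


def run_council(
    query: str,
    responders: dict[str, Responder],
    reviewers: dict[str, Reviewer],
    chairman: Chairman,
) -> tuple[str, dict[str, int]]:
    # Stage 1: send the same query to every council member.
    answers = {name: ask(query) for name, ask in responders.items()}

    # Hide model names behind neutral labels before peer review.
    label_to_name = {
        f"Response {chr(65 + i)}": name for i, name in enumerate(answers)
    }
    anonymized = {label: answers[name] for label, name in label_to_name.items()}

    # Stage 2: each reviewer ranks the anonymized answers; earlier positions
    # earn more points (an assumed aggregation rule).
    scores = {label: 0 for label in anonymized}
    for review in reviewers.values():
        ranking = review(query, anonymized)
        for position, label in enumerate(ranking):
            if label in scores:  # ignore labels the reviewer made up
                scores[label] += len(ranking) - position

    # Stage 3: the chairman sees every answer plus its score.
    context = "\n".join(
        f"{label} (score {scores[label]}): {text}"
        for label, text in anonymized.items()
    )
    final = chairman(f"Question: {query}\n{context}\nWrite the final answer.")
    return final, {label_to_name[label]: s for label, s in scores.items()}
```

Returning the per-model scores alongside the final answer makes it possible to compare the council's rankings with your own judgment, which is the evaluation angle the tweet highlights.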

906

17K

1K

13K

5M

devmarrie retweeted

Franje @oaksupreme

12 months ago

We're buying Standard shares; that sleeping giant will wake up and run.

91

18K

7K

187

426K

devmarrie retweeted

aleya @aleyakassam

12 months ago

Kenyans you are exactly and precisely who the fuck you think you are!!!

62

22K

8K

245

412K

devmarrie retweeted

Adelle Onyango @ADELLEO

12 months ago

Fuck you @CA_Kenya for your ILLEGAL ACTIONS just to support a dictator? We shall revisit! #SiriNiNumbers

1

6K

4K

17

84K

devmarrie retweeted

Ike Ojuku

@IkeOjuku

12 months ago

Rapid Deployment Unit (RDU)- leaving Allsops GSU camp headed to Town, Nairobi CBD via @Georgeson_ #June25th

245

2K

610

100

301K

devmarrie retweeted

Citizen TV Kenya

@citizentvkenya

12 months ago

Kalonzo Musyoka to President Ruto: What are you afraid of? Even if you shut down media stations, what about those with phones? The horse has left the stable #June25th

43

5K

1K

56

135K

devmarrie retweeted

Thee Don @Officialtheedon

12 months ago

Scenes at Allsops, man; this one has seen dust

120

4K

1K

398

235K

devmarrie retweeted

Kydd Khuka @kyddkhuka007

12 months ago

We don’t need you at CBD we need you at Statehouse road
They have started killing us already

Don't say anything, Just Retweet!

#SiriNiNumbers #OccupyStatehouse2025 #June25th #OccupyUntilVictory
RECORD EVERYTHING
Communications Authority
Today is Today
WE ARE THE MEDIA

20

1K

2K

46

62K

devmarrie retweeted

George T. Diano

@georgediano

12 months ago

KTN KTN KTN. We shall remember those who stood by the people during the struggle for revolution. KTN have provided WhatsApp numbers to share videos that will be used to send those rogue officers to ICC. These are the numbers; 0732142311 and 0732142590.

80

13K

6K

121

185K

devmarrie retweeted

FERDINAND OMONDI

@FerdyOmondi

12 months ago · Shimo la Tewa

"If you leave, we will get killed". This is why media presence is important. This is why you must record everything.

5

5K

3K

82

97K

devmarrie retweeted

Njugush

@BlessedNjugush

12 months ago

Moi error... But times are different. They are live on YouTube.

36

3K

1K

9

50K

devmarrie retweeted

Tom

@AiwithVitor

12 months ago

Comrades, a million tweets for the release of Ndiangui. #FreeNdiangui

62

10K

8K

38

211K

devmarrie retweeted

Master Wong 🥷 @kiruik

12 months ago

We know some people can't make it to town. Here is how you can help. Every coin will help a person fighting for our country & in need #SiriNiNumbers #RutoMustGo #RutoMustGoNow #June25th

7

20

75

4

6K
