Data Science lead @SafaricomPLC; former Head of Data Science at Absa Bank (formerly Barclays Bank) Kenya. Open to offers. Retweets are not necessarily endorsements.
I recommend Columbia mathematician Michael Harris’s wide-ranging, informative and thought-provoking essay in Boston Review on AI and mathematics:
https://t.co/igA4fbmhgc
Harris rightly worries about the possible negative effects of AI-generated proofs and mathematics on collective knowledge in the field, as I have also argued here: https://t.co/oKhvHdjpP3
Of course, used in the right way, AI is a tool that can be beneficial in many fields. The question is whether we can develop institutions, norms and practices to support its beneficial use and whether the current direction of the technology in Silicon Valley will enable us to do so.
This TRM variant makes a transformer block a contractive map, so looping it becomes a fixed-point iteration. They leverage this by approximating gradients with a truncated Neumann series (akin to truncated BPTT). Very cool work!
🔗https://t.co/ySlb492R37
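To make the idea concrete, here is a minimal NumPy sketch. It is not the paper's code. A toy block `f(z, x) = tanh(W z + x)`, a hypothetical stand-in for a transformer block, is made contractive by rescaling `W` to spectral norm 0.5. It is then looped to its fixed point, and the implicit gradient g^T (I - J)^{-1} is approximated with a truncated Neumann series. Truncating after k terms roughly corresponds to backpropagating through only the last k loop iterations at the fixed point, which is the connection to truncated BPTT.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Toy stand-in for a looped block: f(z, x) = tanh(W z + x).
# Rescaling W to spectral norm 0.5 makes f a contraction in z (tanh is 1-Lipschitz).
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)
x = rng.standard_normal(d)


def f(z, x):
    return np.tanh(W @ z + x)


def fixed_point(x, tol=1e-10, max_iter=500):
    """Loop the block until the state stops changing (Banach iteration)."""
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def jacobian_z(z, x):
    """df/dz at (z, x), which equals diag(1 - tanh(Wz + x)^2) @ W."""
    s = 1.0 - np.tanh(W @ z + x) ** 2
    return s[:, None] * W


def neumann_vjp(g, J, k):
    """Approximate g^T (I - J)^{-1} by the truncated series sum_{i=0..k} g^T J^i."""
    out = g.copy()
    term = g.copy()
    for _ in range(k):
        term = term @ J  # next power: g^T J^(i+1)
        out = out + term
    return out


z_star = fixed_point(x)
J = jacobian_z(z_star, x)
g = rng.standard_normal(d)  # upstream gradient dL/dz*

exact = np.linalg.solve((np.eye(d) - J).T, g)  # implicit-function-theorem VJP
approx = neumann_vjp(g, J, k=20)
print("max abs error:", np.max(np.abs(exact - approx)))
```

Here ‖J‖ ≤ 0.5, so each extra term cuts the remaining error at least in half, and 20 terms already match the exact linear solve closely. A weaker contraction would need more terms.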
Since a lot of discourse is going on around "how to do research", these pieces are well worth a read.
https://t.co/pA0MkOMlKS
https://t.co/rw9uMiwlCj
https://t.co/H1AGvnb7LP
https://t.co/FTyAabr9Rx
NEW: Inside the 24 hours before the WH slapped export controls on Anthropic
- Last Thursday, Amazon CEO Andy Jassy raised concerns about a Fable jailbreak with the Trump admin
- Friday AM, Sean Cairncross, Bessent, Susie etc. held WH call to discuss
- Then White House started reaching out to Anthropic to speak with Dario Amodei, who was at a wellness retreat.
- When Amodei was finally available past 1pm, he had three tense phone calls with a combo of ppl including Cairncross, Bessent, Lutnick, Kessler, Will Scharf, Richard Walters, and Walker Barrett.
- Amodei tried to clear up what he assumed was a misunderstanding. He defended the guardrails and distinguished between universal and non-universal jailbreaks
- Cairncross and Bessent were unmoved and asked Amodei to take down Fable and work with the admin to fix the vulnerabilities. (A WH official said Amazon’s findings were run past the NSA and they felt they had “proof.”)
- Amodei asked for more time and info, but he made no commitments to pull the model
- Bessent told Amodei directly at one point that he was making a “bad decision”
- By Friday evening, the Trump admin imposed its export controls.
- “Export controls were a last resort after begging them for hours to work with us,” senior WH official said.
W/ @cheyennehaslett
https://t.co/0Rwny9md3p
Most memory setups for agentic systems are centralized.
They either provide memory only to the orchestrator, or expose one shared pool every agent reads from and writes to. This makes sense because a naive decentralized memory would just mean that each agent has its own isolated context, which goes against the goal of collaboration.
However, centralized memory can hurt multi-agent systems.
It gives every agent the same context, which blurs the distinct roles each one is supposed to play. The shared pool is also computationally expensive: the agents prefill far more information than they may need, and this gets worse over time as every update grows the shared repository.
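Here is a back-of-the-envelope sketch of why the shared pool gets more expensive over time. The numbers (4 agents, 200 tokens written per agent per task, 50 tasks) are hypothetical. The model also assumes that every agent prefills all the memory it can see, which real systems with retrieval avoid.

```python
def prefill_tokens(num_agents, tokens_per_write, num_tasks, shared):
    """Total memory tokens prefilled across all agents over num_tasks tasks.

    Toy assumptions (hypothetical): after every task each agent appends
    tokens_per_write tokens of memory; before every task each agent prefills
    all memory it can see (the whole shared pool if shared=True, otherwise
    only its own entries).
    """
    total = 0
    for t in range(num_tasks):
        own = t * tokens_per_write      # this agent's accumulated entries
        pool = num_agents * own         # everything all agents have written
        per_agent = pool if shared else own
        total += num_agents * per_agent
    return total


shared = prefill_tokens(num_agents=4, tokens_per_write=200, num_tasks=50, shared=True)
private = prefill_tokens(num_agents=4, tokens_per_write=200, num_tasks=50, shared=False)
print(shared, private, shared / private)
```

In this toy model the shared pool costs exactly `num_agents` times more prefill, and both totals grow quadratically in the number of tasks. Retrieval-based systems prefill only a subset of memory, so measured savings are smaller. The thread reports up to ~49% fewer tokens for DecentMem.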
Therefore, we (@GuangyaHao666) tried something different, something that's decentralized, but still collaborative.
In our latest paper, DecentMem, the agents still work together the usual way, in whatever agent structure they already have. But we let each agent keep its own private memory instead of pooling everything into a shared repository. And we make the private memory stay collaboration-aware by remembering how a task got solved and who handled each piece, so decentralizing the memory doesn't throw out the coordination signal.
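The thread does not spell out DecentMem's memory format. The sketch below shows one way a private memory could still record who handled each piece of a task. All class names are hypothetical, and retrieval uses naive word overlap as a stand-in for whatever similarity the paper uses.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    agent: str    # which agent handled this piece of the task
    summary: str  # what that agent did


@dataclass
class Episode:
    task: str
    steps: list[Step]
    success: bool


@dataclass
class PrivateMemory:
    """One agent's own memory (hypothetical); other agents never read it directly."""
    owner: str
    episodes: list[Episode] = field(default_factory=list)

    def write(self, episode: Episode) -> None:
        # Keep the whole trajectory, including steps handled by *other* agents,
        # so the coordination signal (who did what) survives decentralization.
        self.episodes.append(episode)

    def retrieve(self, task: str, k: int = 2) -> list[Episode]:
        """Up to k successful episodes, ranked by word overlap (Jaccard) with task."""
        query = set(task.lower().split())

        def score(ep: Episode) -> float:
            words = set(ep.task.lower().split())
            return len(query & words) / max(len(query | words), 1)

        hits = [ep for ep in self.episodes if ep.success]
        return sorted(hits, key=score, reverse=True)[:k]

    def my_role_in(self, episode: Episode) -> list[str]:
        """The parts of a past episode this agent handled itself."""
        return [s.summary for s in episode.steps if s.agent == self.owner]


coder = PrivateMemory(owner="coder")
coder.write(Episode(
    task="sum the even numbers in a list",
    steps=[Step("planner", "split into filter + sum"),
           Step("coder", "wrote sum(x for x in xs if x % 2 == 0)")],
    success=True,
))
coder.write(Episode("parse a csv file", [Step("coder", "used str.split")], success=False))

best = coder.retrieve("sum the odd numbers in a list")[0]
print(coder.my_role_in(best))
```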
Specifically, each agent's memory has two halves: an exploitation pool of past trajectories it can reuse, and an exploration pool of fresh LLM-generated candidates for situations it hasn't seen yet. A lightweight online router reweights the two using stage-wise feedback from a judge, so each agent works out its own exploit/explore balance instead of following a hard-coded schedule.
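Below is a minimal sketch of the router half. It assumes the two pools can be treated as two bandit arms and that the judge returns a reward in [0, 1] per stage. UCB1 is a swapped-in choice and not necessarily the paper's algorithm. `fake_judge`, with its 70%/40% success rates, is purely hypothetical. UCB1 picks one pool per stage, so the pull frequencies play the role of the exploit/explore weights.

```python
import math
import random


class UCB1Router:
    """Chooses between the exploit and explore pools with the UCB1 rule (assumed)."""

    def __init__(self, arms=("exploit", "explore")):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.means = {a: 0.0 for a in self.arms}
        self.t = 0

    def choose(self) -> str:
        self.t += 1
        for a in self.arms:  # try each pool once before using the bound
            if self.counts[a] == 0:
                return a
        return max(
            self.arms,
            key=lambda a: self.means[a]
            + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm: str, reward: float) -> None:
        """Running mean of judge rewards in [0, 1] for this pool."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def fake_judge(arm: str, rng: random.Random) -> float:
    # Hypothetical judge: stored trajectories succeed 70% of the time for this
    # agent, fresh LLM candidates 40% of the time.
    p = 0.7 if arm == "exploit" else 0.4
    return 1.0 if rng.random() < p else 0.0


rng = random.Random(0)
router = UCB1Router()
for _ in range(2000):
    arm = router.choose()
    router.update(arm, fake_judge(arm, rng))
print(router.counts)
```

UCB1 is a natural stand-in because its regret on stochastic bandits with bounded rewards grows as O(log T), the same rate the thread claims for DecentMem's router.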
Theoretically, we model each agent's search as a random walk over a graph of candidate strategies, where the two pools act as two kinds of moves: the exploitation pool is a local walk over strategies the agent already knows, and the exploration pool is a teleport that can jump anywhere in its space through the LLM prior. Under mild assumptions, that combination guarantees that no agent gets permanently stuck in a local region, since the search can always reach any strategy. We also cast the router as a bandit problem and show it converges toward the right exploit/explore balance at an O(log T) regret rate, which is about the best rate this kind of online balancing can achieve.
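The random-walk argument resembles PageRank's teleportation trick. The toy below assumes the strategy graph, the teleport probability `alpha = 0.15`, and a uniform "LLM prior". On a disconnected strategy graph, a purely local walk stays trapped in one cluster. Mixing in teleports makes every strategy reachable and yields a stationary distribution with full support. It illustrates the reachability intuition only, not the paper's exact conditions.

```python
import numpy as np

# Hypothetical strategy graph with two disconnected clusters {0, 1, 2} and {3, 4}.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
n = A.shape[0]
P_local = A / A.sum(axis=1, keepdims=True)  # exploitation: step to a neighbor

prior = np.full(n, 1.0 / n)  # stand-in for the LLM prior (uniform, assumed)
alpha = 0.15                 # teleport probability (assumed)
P = (1 - alpha) * P_local + alpha * prior  # exploration: jump anywhere via the prior


def reachable_from(P, start, steps=50):
    """States that get nonzero probability within `steps` moves from `start`."""
    dist = np.zeros(P.shape[0])
    dist[start] = 1.0
    seen = dist > 0
    for _ in range(steps):
        dist = dist @ P
        seen |= dist > 0
    return np.flatnonzero(seen)


print(reachable_from(P_local, 0))  # local walk alone never leaves the first cluster
print(reachable_from(P, 0))        # with teleports every strategy is reachable

# Teleports make the chain irreducible and aperiodic, so power iteration
# converges to a stationary distribution that is positive everywhere.
pi = np.full(n, 1.0 / n)
for _ in range(1000):
    pi = pi @ P
print(pi.round(3))
```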
Empirically, across 3 MAS frameworks (AutoGen, DyLAN, AgentNet), 5 backbones (Qwen3-4B/8B/14B, Gemma4-E2B/E4B), and 5 benchmarks spanning math, code, QA, and embodied tasks, DecentMem comes out ahead of the strongest centralized baseline by ~9% on average — up to ~24% in the best case — and the no-memory baseline by ~26%. It also uses up to ~49% fewer tokens, since each agent only touches its own memory instead of the whole shared repository.
We also watched how this plays out as the agents pile up experience, since a memory system should naturally support self-evolution and help the system keep improving. We show that DecentMem helps the agentic system evolve faster than every centralized baseline as it sees more tasks — on DyLAN it reaches strong accuracy roughly 2.5× sooner.
Another interesting result is that the improvement gets bigger when the agent coordination is looser and more free-form.
Going from AutoGen's fixed, scripted workflows to AgentNet's improvised, on-the-fly coordination, the relative gain widens pretty steadily, and on the loosest setup, DecentMem even lands on strategies the shared-pool baselines never reach. Our read is that keeping memory private lets different agents keep chasing different solution paths, while a shared pool drags everyone toward the same stored answers — and that variety pays off most when coordination is loose.
Zooming out, the takeaway may not be that decentralized beats centralized. It's that each agent's memory should be scoped and structured more carefully, and more personalized to that agent, which is something I think most multi-agent systems and memory designs still leave on the table.
📑 Paper: https://t.co/Vv8XdBd8N0
MLEvolve: Graph Search + Self-Evolving Memory for Code Evolution
💡 1: MLEvolve breaks the usual tree constraint: reference edges turn the search tree into a directed graph that lets knowledge flow across branches, combined with a progressive explore-exploit schedule.
💡 2: Unlike archive databases or memoryless search, MLEvolve combines a cold-start domain knowledge base with a dynamic global memory that automatically accumulates and retrieves task-specific experience.
💡 3: MLEvolve decouples strategic planning (what/why to modify) from coding (how) and adaptively selects among three coding modes based on search state.
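Point 1 is the easiest to picture in code. Below is a hypothetical sketch, not MLEvolve's implementation. Each new candidate keeps an ordinary parent (tree) edge, and it also gets reference edges to the best-scoring nodes in *other* branches. A linearly decaying exploration probability stands in for the "progressive" schedule. The scorer, the schedule shape, and the top-k reference rule are all assumptions. The knowledge base, global memory, and coding modes from points 2 and 3 are left out.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity
class Node:
    node_id: int
    score: float
    parent: "Node | None" = None
    refs: list = field(default_factory=list)  # reference edges into other branches

    def branch(self) -> "Node":
        """The ancestor directly below the root, i.e. which subtree this node is in."""
        node = self
        while node.parent is not None and node.parent.parent is not None:
            node = node.parent
        return node


def explore_prob(step, total, start=0.9, end=0.1):
    """Assumed schedule: shift linearly from exploring to exploiting."""
    frac = step / max(total - 1, 1)
    return start + (end - start) * frac


def search(evaluate, total_steps=30, num_refs=2, seed=0):
    rng = random.Random(seed)
    root = Node(node_id=0, score=evaluate(None, []))
    nodes = [root]
    for step in range(total_steps):
        if rng.random() < explore_prob(step, total_steps):
            parent = rng.choice(nodes)                  # explore: any node
        else:
            parent = max(nodes, key=lambda n: n.score)  # exploit: best node
        # Reference edges: borrow the best nodes from *other* branches, so the
        # new candidate is not limited to its own ancestry.
        others = [n for n in nodes if n.branch() is not parent.branch()]
        refs = sorted(others, key=lambda n: n.score, reverse=True)[:num_refs]
        score = evaluate(parent, refs)
        nodes.append(Node(node_id=len(nodes), score=score, parent=parent, refs=refs))
    return nodes


def make_toy_evaluator(seed=1):
    """Hypothetical scorer: a child perturbs its parent's score and gets a small
    bonus from the best node it references."""
    rng = random.Random(seed)

    def evaluate(parent, refs):
        if parent is None:
            return 0.0
        bonus = 0.1 * max((r.score for r in refs), default=0.0)
        return parent.score + rng.uniform(-0.2, 0.5) + bonus

    return evaluate


nodes = search(make_toy_evaluator())
tree_edges = sum(1 for n in nodes if n.parent is not None)
ref_edges = sum(len(n.refs) for n in nodes)
print(f"{len(nodes)} nodes, {tree_edges} tree edges, {ref_edges} reference edges")
```

Because reference edges are added on top of the tree edges, the structure becomes a directed graph rather than a tree.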
📝 https://t.co/j7Xnf9TaYV
🧑💻 https://t.co/y60xUV9cXW
Cool new open-weight model by Cohere: a lightweight 30B model for agentic coding tasks.
This one builds on Command A+ using the parallel transformer design. Interestingly, even though it's almost half as big, it almost doubles the number of layers.
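The "parallel" formulation, popularized by GPT-J and PaLM, feeds the same normalized input to attention and to the MLP and adds both outputs to the residual. A sequential block instead runs the MLP on attention's output. Below is a toy single-head NumPy sketch of the two variants. Cohere's actual block may differ in norm placement, head count, and activation, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, d_ff = 4, 16, 64  # sequence length, model width, MLP width (toy sizes)


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def make_params():
    s = 1 / np.sqrt(d)
    return {
        "Wq": rng.standard_normal((d, d)) * s,
        "Wk": rng.standard_normal((d, d)) * s,
        "Wv": rng.standard_normal((d, d)) * s,
        "Wo": rng.standard_normal((d, d)) * s,
        "W1": rng.standard_normal((d, d_ff)) * s,
        "W2": rng.standard_normal((d_ff, d)) / np.sqrt(d_ff),
    }


def attention(h, p):
    """Single-head causal self-attention."""
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((len(h), len(h)), dtype=bool), k=1)  # hide future tokens
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v @ p["Wo"]


def mlp(h, p):
    return np.maximum(h @ p["W1"], 0.0) @ p["W2"]


def sequential_block(x, p):
    # Standard pre-norm block: the MLP sees the output of attention.
    h = x + attention(layer_norm(x), p)
    return h + mlp(layer_norm(h), p)


def parallel_block(x, p):
    # Parallel block: both branches read the same normalized input.
    n = layer_norm(x)
    return x + attention(n, p) + mlp(n, p)


p = make_params()
x = rng.standard_normal((T, d))
print(sequential_block(x, p).shape, parallel_block(x, p).shape)
```

Both branches read `n`, so their input projections can be fused into a single matmul. That is the usual efficiency argument for this design.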
Also, they say that it's been specifically developed for agentic coding, not just coding. I.e., the evaluation is inside a workflow, not just on a single prompt-to-code-answer task.
For Terminal-Bench, the model has to use a terminal, inspect the environment, run commands, read outputs, etc.
For SWE-Bench the model works on real GitHub-style software issues where it has to understand the repository, find relevant files, make a patch, pass tests, etc.
SciCode and LiveCodeBench are more traditional because they mostly test whether the model can produce correct code for a specified problem. Sure, this still requires reasoning, but it's more like “Implement a numerical routine to compute a scientific quantity from given equations and inputs,” which doesn't require any interaction with the environment, existing files, tests, etc.
The focus on the agentic coding benchmarks is probably why it's far ahead of Gemma 4 on those.
Overall, it's pretty competitive, although not quite at Qwen3.6-level performance.
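The difference between the two evaluation styles is easiest to see in code. Below is a hypothetical, heavily simplified harness, not the actual Terminal-Bench or SWE-Bench harness. The policy issues a command, the harness runs it and feeds the output back, and the loop repeats. A single-shot benchmark would roughly correspond to one policy call with no observations at all.

```python
import subprocess
import sys


def run_command(argv, timeout=10):
    """Run one command and return what an agent would 'see' in the terminal."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


def agent_episode(policy, task, max_turns=5):
    """Multi-turn loop: the policy acts, observes the output, and acts again.

    `policy` is a hypothetical callable (history -> argv list, or None to stop);
    in a real benchmark it would be the model under evaluation.
    """
    history = [("task", task)]
    for _ in range(max_turns):
        action = policy(history)
        if action is None:
            break
        history.append(("action", action))
        history.append(("observation", run_command(action)))
    return history


def scripted_policy(history):
    # Stand-in for a model: a fixed two-step script. A real model would choose
    # each command based on the observations so far.
    observations = [text for kind, text in history if kind == "observation"]
    if not observations:
        return [sys.executable, "-c", "import platform; print(platform.python_version())"]
    if len(observations) == 1:
        return [sys.executable, "-c", "print(2 + 2)"]
    return None


for kind, text in agent_episode(scripted_policy, "check the Python version, then compute 2 + 2"):
    print(kind, "->", text)
```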
Study the past if you would define the future. ~ Confucius.
A reminder that remembering history is crucial for understanding where we are, and where we are headed, in the name of Affordable Housing.
The writing was and remains on the wall. Let him just finish and go.
#RutoMustGo