“They seemed terribly pathetic to me. They weren’t warriors. They were American boys who by mere chance of fate had wound up with guns in their hands, sneaking up a death-laden street in a strange and shattered city in a faraway country in a driving rain.”
a london startup most people have never heard of has beaten OpenAI, Anthropic, Mistral & DeepSeek on coding benchmarks, two years running (!).
now the uk govt is backing it to build britain's first sovereign frontier ai model:
→ Lumen Sovereign, by Cosine
→ 100% uk-trained (Isambard-AI, one of europe's most powerful supercomputers)
→ £500m government programme behind it
→ runs fully inside your own walls, zero data leaving
→ co-designed with BT, Lloyds, NatWest, BAE & Babcock
"sovereign ai" just stopped being a buzzword.
We're building this at LangChain
Fleet lets you create and manage a fleet of agents. Each agent specializes in a workflow, e.g. inbox management, blog writing, competitor research, candidate recruiting. These are Deep Agents with custom instructions, skills, tools, subagents, and memory. They continually improve with feedback. You can share them with your coworkers. You can configure them to run on a schedule. You can export their context files should you ever want to host them yourself
I think Fleet strikes a great balance: easy to use and still highly capable
We've put an inordinate amount of thought into the UX patterns that make that possible. For example, I love our 'channels' concept: you can configure your agent's communication channel (e.g. Slack, Teams, email) so it meets you where you work instead of forcing you into Fleet's UI
It's free to try out so give it a spin and share feedback: https://t.co/TRYcK32IBB
Every morning, the moment my eyes open, I wake up to 40 unread Slack messages that effectively say:
“If you don’t fix this in the next 5 minutes, the world will implode and the app will cease to exist.”
Anthropic engineer James Brady:
"Every agent in production lies. We measured it. The good ones lie less, the great ones catch the lie before the user does."
In 29 minutes, he walks through the verification stack he built and the patterns the Claude Code team adopted to keep agents honest at scale.
Watch the full talk, then save the config below👇
I made an MMO rendering in the terminal
a sort of relaxing social game to play while agents run
8 ppl already hang out in there while their coding agents run
it's live join us :D
I need Google Docs but just for markdown files.
Multiplayer comments. Resolved comments that sync.
Suggestion mode
Edit mode
Edit history
Maybe some sense of multi edits.
Easy CLI access.
non-technical people saying “just let agents build it” are deluded.
you need to review the code. you need to steer the architecture. you need to know what good looks like.
This is the best way to learn how LLMs work.
Interactive. 3D. Step-by-step.
Covers:
→ Embedding
→ Layer Norm
→ Self-Attention
→ MLP
→ Transformer layers
→ Softmax
→ Output
Stop reading papers. Start seeing.
Link in comments.
Save this immediately.
Blows my mind that the US is about to host its second World Cup and a real hotbed of football, the UK, hasn’t been trusted with one since 1966
Joke really, and a corrupt one at that
In a corner of Parliament, at the far end of the Royal Gallery, a box lies permanently open containing sand from all five Normandy beaches - a reminder to both Houses of the sacrifice & the cause of freedom fought for by brave service people on D-Day, June 6th 1944. #DDay
Cowork Article gets one thing right: the model stopped being the limit a while ago. What you load into it decides the output.
Technique 9 is where I'd start. Skills. A skill is a markdown file that hands Claude a repeatable process, and Cowork pulls in whichever ones fit what you're writing.
So add this one today: stop-slop by Hardik Pandya. It finds the patterns that mark writing as AI and rewrites them out, then scores the draft on how human it reads before handing it back.
Over 7,500 people have starred it, and it earns those stars.
Load the folder into Cowork once, and your posts stop sounding like a model wrote them.
Building Observability for Multi-Agent Systems
Traditional observability (logs + metrics + traces) is not enough for agents. Multi-Agent Observability requires new layers: LLM-as-Judge evaluations, human annotations, cost tracking per agent, and unified tracing across agent interactions.
Without it, you’re flying blind in production.
As a dev, I now treat observability as a core part of every agent platform I build.
Multi-Agent Observability Cheatsheet:
• Traditional: Logs, Metrics, Traces (still needed)
• AI-Native additions: LLM-as-Judge evals, quality/safety scores, annotations
• Key signals: Token usage, latency per step, tool success rate, cost per task
• Tools: LangSmith, Arize, Phoenix, Helicone, or custom dashboards
• Pro tip: Start with tracing + cost tracking, then add automated evaluations
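A minimal sketch of the "tracing + cost tracking" starting point from the cheatsheet, in plain Python. Everything here is hypothetical: `AgentStats`, `Tracker`, `record_step`, and the flat per-1k-token price are illustration-only assumptions, not the API of LangSmith, Arize, Phoenix, or Helicone.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentStats:
    """Running totals for one agent (hypothetical structure)."""
    tokens: int = 0
    cost_usd: float = 0.0
    latency_s: float = 0.0
    tool_calls: int = 0
    tool_failures: int = 0

    @property
    def tool_success_rate(self) -> float:
        # With no tool calls recorded, report 1.0 rather than divide by zero.
        if self.tool_calls == 0:
            return 1.0
        return 1 - self.tool_failures / self.tool_calls


class Tracker:
    """Aggregates the key signals per agent: tokens, latency, tool success, cost."""

    def __init__(self, usd_per_1k_tokens: float):
        # Assumed flat price; real pricing usually differs by model and direction.
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self.stats = defaultdict(AgentStats)

    def record_step(self, agent: str, tokens: int, latency_s: float,
                    tool_ok: Optional[bool] = None) -> None:
        s = self.stats[agent]
        s.tokens += tokens
        s.cost_usd += tokens / 1000 * self.usd_per_1k_tokens
        s.latency_s += latency_s
        if tool_ok is not None:  # None means the step made no tool call
            s.tool_calls += 1
            if not tool_ok:
                s.tool_failures += 1

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.stats.values())


if __name__ == "__main__":
    tracker = Tracker(usd_per_1k_tokens=1.0)
    tracker.record_step("researcher", tokens=1000, latency_s=2.0, tool_ok=True)
    tracker.record_step("researcher", tokens=500, latency_s=1.0, tool_ok=False)
    tracker.record_step("writer", tokens=2000, latency_s=3.0)
    for name, s in tracker.stats.items():
        print(name, s.tokens, s.cost_usd, s.tool_success_rate)
    print("total:", tracker.total_cost())
```

Keeping the totals keyed by agent name is what makes "cost per agent" a lookup rather than a log search; automated evals can be layered on later by attaching scores to the same records.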
How are you currently observing and debugging your multi-agent systems? Reply below 👇
Follow @AiCamila_ for practical AI engineering patterns.
#AgentObservability #MultiAgent #LLMEval #DevOps
Claude Code feels completely different once you install this.
Anthropic quietly released an official plugin called claude-code-setup and it basically turns Claude Code from “pretty good” into an actual AI dev environment.
It scans your project and recommends:
→ hooks
→ skills
→ MCP servers
→ subagents
→ automations
Then sets everything up step-by-step for you.
Most people are using Claude Code completely vanilla…
which is why their experience feels messy.
The real power comes from the ecosystem around it.
Install:
/plugin install claude-code-setup@claude-plugins-official
Bookmark this before you forget it.
Just shared PM Skills v2.0 with 9 plugins, 68 skills, 42 commands. Free, MIT.
You get the rigor of Teresa Torres, Marty Cagan, and Alberto Savoia built into your daily workflows.
The open-source repo (12K GitHub stars) is now bigger, and it reaches further:
→ /red-team-prd: attacks the plan before the review does
→ /ship-check: documents the system, audits the code against the intent, maps test coverage, and audits performance and security
→ The 8 previous plugins, such as pm-product-discovery, pm-execution, pm-data-analytics, and pm-strategy, are still supported
→ Cowork and Claude Code, MIT-licensed
Install it and run /red-team-prd on the plan you're least sure about. That's the fastest way to see the difference.
I used /ship-check to responsibly disclose bugs in public repos, such as Langfuse (already fixed).
Interesting!
Remote work increased alone time and, perhaps as a result, seems to have increased mental distress.
A piece of evidence that this is due to loneliness is that, among people with families, remote work seems to have been fine.