Just launched LoopTroop – AI that actually gets the PR right
Most AI coding tools chase speed, so you end up with code that’s close to but not quite what you had in your head, or outright AI slop.
I wanted the opposite: quality over speed, so the result actually matches the idea.
LoopTroop takes your ticket or rough idea and turns it into a proper PR for your existing projects or new ones. It uses an LLM council to plan it right, Ralph loops to recover when things break, beads to break the work down, context engineering, isolated git worktrees and more under the hood.
Just dropped a 16-min video explaining the concepts (demo starts around ). The app is fully open source:
https://t.co/v8k1xRfvP6
Give it a spin and let me know what you think.
#AIAgents #OpenSource #DevTools #BuildInPublic
@ThinksDylan Shameless plug for my open-source project - https://t.co/2vr02lLm0M - its main pillar is context engineering (every step is done with the minimum context necessary, so it won't be polluted with useless info from previous conversations)
Claude will gaslight you, until you install this skill.
It's called The LLM Council.
You ask a question. 5 advisors attack it from different angles. Then they peer-review each other before giving you the verdict.
How it works:
1. You ask a real decision question.
2. 5 advisors attack it from different angles.
3. They grade each other's work anonymously.
4. Chairman synthesises one verdict and the next step.
Install in 4 steps:
1. Download the skill
https://t.co/mnpPNSnDXu
2. Open Customise skills in Claude
3. Upload the SKILL.md file
4. Type /llm-council
One Claude tells you you're right.
Five Claudes show you where you're wrong.
Get more free AI guides here https://t.co/1F12fOTjss
Repost ♻️ to help someone in your network.
P.S. Credit to Ole Lehmann for building it.
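For anyone curious what the four-step council flow above could look like in code, here is a minimal sketch. It is not the skill's actual implementation: the `Advisor` class, `run_council`, the stub advisors, and the "highest average score wins" chairman rule are all assumptions made for illustration, and no real model is called.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Advisor:
    name: str
    answer: Callable[[str], str]        # question -> proposed answer
    grade: Callable[[str, str], float]  # (question, anonymous answer) -> score


def run_council(question: str, advisors: List[Advisor], seed: int = 0) -> Dict:
    # Step 2: every advisor answers independently.
    answers = {a.name: a.answer(question) for a in advisors}

    # Step 3: hide authorship behind shuffled labels, then peer-review.
    names = list(answers)
    random.Random(seed).shuffle(names)
    labels = {f"Response {i + 1}": name for i, name in enumerate(names)}

    scores: Dict[str, List[float]] = {label: [] for label in labels}
    for reviewer in advisors:
        for label, author in labels.items():
            if author != reviewer.name:  # nobody grades their own answer
                scores[label].append(reviewer.grade(question, answers[author]))

    # Step 4: the "chairman" here simply picks the best average score.
    averages = {label: mean(s) for label, s in scores.items()}
    winner = max(averages, key=averages.get)
    return {"verdict": answers[labels[winner]], "averages": averages}


# Stub advisors (hypothetical): canned answers, graded by length.
demo_advisors = [
    Advisor(f"advisor-{i}", lambda q, i=i: "a" * i, lambda q, ans: float(len(ans)))
    for i in range(1, 6)
]
print(run_council("Should we ship on Friday?", demo_advisors)["verdict"])
```

In a real setup each `answer` and `grade` callable would wrap a model call with a different system prompt, and the chairman would likely be another model call rather than a simple `max`.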
BYTEDANCE 🔥: Seedance 2.5 has been officially announced, along with an updated Seedance 2.0.
- Seedance 2.0 now supports 4K output
- Seedance 2.5 will be able to generate 30-second videos in one go
- ByteDance also announced a new AI copyright commercialization platform
This video ad is stunning 👀
We've kept hearing how GLM-5.2 beats Opus 4.8, and are skeptical of benchmarks - so we tested them on a real bug from the Cline repo. While both models fixed the issue, GLM was the winner in terms of cost and code quality:
- GLM used about 1.7x as many tokens (GLM 1.1M vs Opus 660K) but cost about half as much (GLM $0.41 vs Opus $0.81)
- Opus finished quicker - 1.6 min and 12 tool calls vs GLM 4.7 min and 28 tool calls
- GLM cleaned up dead code and verified the build compiled before completing. Opus didn't - it left type errors that passed tests but broke the production build.
Both runs used the same Cline harness prompting and tools, so it seems GLM is RL-trained to spend more tokens verifying its work before completing. Impressive work by the @Zai_org team!
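A quick way to sanity-check the token and cost figures quoted above is to turn them into an effective price per million tokens. This is only arithmetic on the numbers in the post; the resulting rates are blended totals for this one run, not either provider's list price.

```python
# Figures as quoted in the post above.
runs = {
    "GLM": {"tokens": 1_100_000, "cost_usd": 0.41},
    "Opus": {"tokens": 660_000, "cost_usd": 0.81},
}


def usd_per_million_tokens(tokens: int, cost_usd: float) -> float:
    """Effective blended price for one run, in USD per million tokens."""
    return cost_usd / tokens * 1_000_000


rates = {name: usd_per_million_tokens(**run) for name, run in runs.items()}
token_ratio = runs["GLM"]["tokens"] / runs["Opus"]["tokens"]
cost_ratio = runs["GLM"]["cost_usd"] / runs["Opus"]["cost_usd"]

for name, rate in rates.items():
    print(f"{name}: ${rate:.2f} per million tokens")
print(f"GLM used {token_ratio:.2f}x the tokens at {cost_ratio:.2f}x the cost")
```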
LoopTroop (https://t.co/Ly4uAo6lvQ) does exactly this in practice: it runs structured Ralph loops for execution + recovery (fresh worktrees on failure, compact error traces), paired with LLM councils for planning and bead-based tasks. Focuses on reliable, long-running, high-correctness agent work with human approval gates. Fully local & open source
Before Fable got released (and pulled), @mozilla was quietly testing Claude Mythos against Firefox's 10M-line codebase.
The result? Over 400 security bug fixes, including for bugs that had been hiding in the codebase for over a decade.
@bgrins, distinguished engineer at Mozilla, walked me through the agentic bug-finding harness behind the model. His take? It was 50% Mythos / 50% setup.
In this ep, Brian walks through:
- why you can't just point a model at 10M lines of code
- how to write a good goal/loop pattern
- killing false positives with a verifier
- why it's good to "lie to the agent"
And guess what? This isn't magic - you can write your own similar harness in less than an afternoon.
Watch now on YT: https://t.co/pBQJZHIM6D
Claude Code subagents can nest 5 levels deep now
@bcherny announced it, and today I finally got to try it. Here's the full chain running end to end:
- main
- project-auditor // level 1
- structure-checker // level 2
- import-validator // level 3
- dependency-tracer // level 4
- style-sync // level 5
Each level runs in its own context window
Only the top-level summary returns to main. Depth 5 is the hard cap: that agent can't spawn further
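To make the depth cap concrete, here is a toy model of the chain above. It is not Claude Code's API or implementation: `run_agent`, `DepthLimitError`, and the summary strings are hypothetical, and the only facts taken from the post are the agent names, the cap of 5, and the idea that each level keeps its own context and passes back only a summary.

```python
from typing import List

MAX_DEPTH = 5  # the cap named in the post; main counts as depth 0


class DepthLimitError(RuntimeError):
    pass


def run_agent(chain: List[str], depth: int = 0) -> str:
    """Run chain[0], delegating the rest of the chain to a nested subagent."""
    name, rest = chain[0], chain[1:]
    context = [f"{name} working at depth {depth}"]  # private to this level

    if rest:
        if depth >= MAX_DEPTH:
            raise DepthLimitError(f"{name} is at depth {depth} and cannot spawn {rest[0]}")
        # Only the child's summary string crosses the boundary, not its context.
        context.append(run_agent(rest, depth + 1))

    return f"{name}: done ({len(context)} context entries)"


chain = [
    "main",
    "project-auditor",
    "structure-checker",
    "import-validator",
    "dependency-tracer",
    "style-sync",
]
print(run_agent(chain))
```

Appending a seventh name to `chain` makes `style-sync` try to spawn at depth 5, which raises `DepthLimitError` in this sketch.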
the most impressive thing about this isn't that some random japanese company created a mythos-level model - it's *how* they did it:
-> their ai model isn't actually a model, it's an API that calls *other models* (e.g. chatgpt, claude, their own)
-> their orchestrator selects different models to do different parts of your prompt. if a cheaper model can be used they'll do that. that's how they cut costs.
-> if a task is challenging then they'll use a frontier model (e.g. claude) to design a solution, then use a cheaper model to build it.
point is - frontier ai capability is no longer solely dependent on how good the model weights are, it's how MANY model instances you can get to debate and come up with an answer between themselves
more models going back and forth = better cheaper answer.
we're moving from mono-model to multi-model
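here's a rough sketch of the routing idea described above: cheap model by default, frontier model designs and cheap model builds when a subtask is hard. everything in it is an assumption for illustration (the `Model` class, `orchestrate`, the made-up per-call prices, and the pre-labelled `is_hard` flag), it calls no real API, and it leaves out the debate part entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Model:
    name: str
    cents_per_call: int  # made-up price, for illustration only

    def run(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"  # stand-in for a real API call


def orchestrate(subtasks: List[Tuple[str, bool]], cheap: Model, frontier: Model):
    """subtasks are (description, is_hard) pairs; returns (outputs, total cents)."""
    outputs, cents = [], 0
    for description, is_hard in subtasks:
        if is_hard:
            # Hard task: the frontier model designs, the cheap model builds.
            design = frontier.run(f"design {description}")
            cents += frontier.cents_per_call
            prompt = f"build from {design}"
        else:
            prompt = description
        outputs.append(cheap.run(prompt))
        cents += cheap.cents_per_call
    return outputs, cents


cheap = Model("cheap", 1)
frontier = Model("frontier", 20)
outputs, cents = orchestrate(
    [("rename a variable", False), ("add a cache layer", True)], cheap, frontier
)
print(outputs, cents)
```

in a real orchestrator the hard/easy decision would itself be made by a model or a classifier rather than passed in by hand.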
GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-world agentic work benchmark
GLM-5.2 from @Zai_org scores 1524 Elo on GDPval-AA, which measures performance on real-world, economically valuable knowledge work through long-horizon, multi-turn tasks.
Key takeaways:
➤ #3 overall, behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615), and level with GPT-5.5 (xhigh, 1509)
➤ The leading open weights model by a wide margin: the next open model, MiniMax-M3, scores 1408
➤ Ahead of many proprietary models, including Google's Gemini 3.5 Flash (1357), Qwen 3.7 Max (1289), Muse Spark (1158)
➤ The tasks are agentic. GLM-5.2 averaged ~31 turns per task across 1,999 matches
➤ Consistent with the rest of its launch, GLM-5.2 also leads open weights on the Artificial Analysis Intelligence Index, ranks #3 on the Agentic Index, and #3 on AA-Briefcase
we've added unique user rankings
some models are token heavy so they skew upwards in rankings - unique people using the model is a more accurate ranking
we'll orient more of our data around this metric
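a tiny sketch of why the two rankings can disagree. the usage log below is made up purely for illustration (it is not real data from any leaderboard), and `rank` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up usage log for illustration: (user, model, tokens)
log = [
    ("u1", "model-a", 900_000),
    ("u1", "model-a", 800_000),
    ("u2", "model-b", 50_000),
    ("u3", "model-b", 40_000),
    ("u4", "model-b", 30_000),
]


def rank(log):
    """Return model names ranked by total tokens and by unique users."""
    tokens = defaultdict(int)
    users = defaultdict(set)
    for user, model, n in log:
        tokens[model] += n
        users[model].add(user)
    by_tokens = sorted(tokens, key=tokens.get, reverse=True)
    by_users = sorted(users, key=lambda m: len(users[m]), reverse=True)
    return by_tokens, by_users


by_tokens, by_users = rank(log)
print("by tokens:", by_tokens)
print("by unique users:", by_users)
```

one heavy user puts model-a on top by tokens, while model-b leads once each person counts once.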
Most AI coding agents derail on big tasks because their context fills up with garbage - errors, dead code, hallucinations. That's "context-rot."
#LoopTroop fixes it locally and open-source.
An LLM Council plans the work (multiple models draft & vote anonymously), atomic Beads execute in isolated git worktrees, and a Ralph Loop retries failures with a fresh context carrying the lesson forward. Every step is transparent from one pane.
Go from a repo issue straight to a clean PR — across multiple projects at once — without ever leaving the app.
https://t.co/v8k1xRfvP6
Free, local, open-source. MIT.
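As a rough illustration of the "retry with a fresh context carrying the lesson forward" idea above, here is a minimal sketch. It is not LoopTroop's code: `ralph_loop`, the `Attempt` signature, and the `flaky` stub are hypothetical, and the git worktree isolation is left out entirely.

```python
from typing import Callable, Dict, List, Tuple

Attempt = Callable[[Dict], Tuple[bool, str]]  # context -> (succeeded, output or error)


def ralph_loop(task: str, attempt: Attempt, max_attempts: int = 3) -> Tuple[str, int]:
    lessons: List[str] = []
    for n in range(1, max_attempts + 1):
        # Fresh context each time: only the task and the compact lessons survive.
        context = {"task": task, "lessons": list(lessons)}
        ok, output = attempt(context)
        if ok:
            return output, n
        # Keep only the last line of the error, truncated, as the lesson.
        last_line = (output.strip().splitlines() or ["unknown error"])[-1]
        lessons.append(last_line[:200])
    raise RuntimeError(f"gave up after {max_attempts} attempts: {lessons}")


def flaky(context: Dict) -> Tuple[bool, str]:
    """Stub worker: fails until it has at least one lesson to learn from."""
    if not context["lessons"]:
        return False, "Traceback (most recent call last):\nTypeError: expected str"
    return True, f"fixed after learning: {context['lessons'][-1]}"


result, attempts = ralph_loop("fix the bug", flaky)
print(result, attempts)
```

The point of the design is that a failed attempt's full transcript never reaches the next attempt; only the short lesson does, which keeps the context small.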
🇺🇸🇮🇷🇵🇰 The feelings of shock and disappointment on the face of Pakistan's Prime Minister, after he was informed by the Iranian delegation they are leaving after Trump threatened them.
9 years later, none of the "Attention Is All You Need" paper authors are at Google.
Ashish Vaswani - cofounder of Essential AI (recently exited)
Noam Shazeer - just moved to OpenAI
Niki Parmar - at Anthropic
Jakob Uszkoreit - cofounder of Inceptive
Llion Jones - cofounder of Sakana AI
Aidan Gomez - cofounder of Cohere
Lukasz Kaiser - at OpenAI
Illia Polosukhin - cofounder of NEAR Protocol
good luck paying $20k only to find out you can generate 20 tok/s. even running 24/7, that's just 50m tokens/month. for glm, at $4.40/m, this is $228 in value. any $200 sub gives you significantly more. and this math means the break-even is 7.3 years not 6 months. by that time the hardware would die if it's running 24/7.
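the break-even math above, spelled out. it uses only the numbers in the post plus one assumption (a 30-day month), and it ignores electricity and any other running costs.

```python
HARDWARE_USD = 20_000
TOKENS_PER_SECOND = 20
USD_PER_MILLION_TOKENS = 4.40
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month, running 24/7

tokens_per_month = TOKENS_PER_SECOND * SECONDS_PER_MONTH
monthly_value_usd = tokens_per_month / 1_000_000 * USD_PER_MILLION_TOKENS
break_even_years = HARDWARE_USD / monthly_value_usd / 12

print(f"{tokens_per_month / 1e6:.1f}M tokens/month")
print(f"${monthly_value_usd:.0f} of API value per month")
print(f"break-even after {break_even_years:.1f} years")
```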