Stefanescu Liviu @liviusa - Twitter Profile

Pinned Tweet

7 days ago

Just launched LoopTroop - AI that actually gets the PR right

Most AI coding tools chase speed, and you end up with code that's close but not really what you had in your head, or outright AI slop. I wanted the opposite: to focus on quality over speed so the result actually matches the idea.

LoopTroop takes your ticket or rough idea and turns it into a proper PR for your existing projects or new ones. It uses an LLM council to plan it right, Ralph loops to recover when things break, beads to break the work down, context engineering, isolated git worktrees and more under the hood.

Just dropped a 16min video explaining the concepts (demo starts around ).

The app is fully open source: https://t.co/v8k1xRfvP6

Give it a spin and let me know what you think.

#AIAgents #OpenSource #DevTools #BuildInPublic

0

3

0

1K

Stefanescu Liviu

@liviusa

about 1 hour ago

@ThinksDylan Shameless plug of my open-source project - https://t.co/2vr02lLm0M - main pillar is context engineering (every step is done with the minimum context necessary so it won't be polluted with useless info from previous conversations)

1

0

19

liviusa retweeted

Charlie Hills

@charliejhills

about 2 months ago

Claude will gaslight you, until you install this skill.

It's called The LLM Council.

You ask a question. 5 advisors attack it from different angles. Then they peer-review each other before giving you the verdict.

How it works:

1. You ask a real decision question.
2. 5 advisors attack it from different angles.
3. They grade each other's work anonymously.
4. Chairman synthesises one verdict and the next step.

Install in 4 steps:

1. Download the skill

https://t.co/mnpPNSnDXu

2. Open Customise skills in Claude
3. Upload the SKILL.md file
4. Type /llm-council

One Claude tells you you're right.

Five Claudes show you where you're wrong.

Get more free AI guides here https://t.co/1F12fOTjss

Repost ♻️ to help someone in your network.

P.S. Credit to Ole Lehmann for building it.

116

3K

388

8K

333K
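
The four "How it works" steps in the tweet above can be sketched in a few lines. This is only an illustration of the general pattern, not the actual skill: `run_council`, `grade`, and `synthesise` are hypothetical names, and the advisors here are stub functions standing in for real model calls.

```python
from statistics import mean


def run_council(question, advisors, grade, synthesise):
    """Toy council: answer, peer-review anonymously, then synthesise."""
    # Steps 1-2: every advisor answers the same question independently.
    answers = [advisor(question) for advisor in advisors]

    # Step 3: each answer is graded by every *other* advisor. Graders receive
    # only the answer text, never the author's identity.
    scores = []
    for author, answer in enumerate(answers):
        peer_grades = [
            grade(grader, answer)
            for grader in range(len(advisors))
            if grader != author
        ]
        scores.append(mean(peer_grades))

    # Step 4: the chairman sees the answers ordered best-first.
    order = sorted(range(len(answers)), key=lambda i: scores[i], reverse=True)
    return synthesise(question, [answers[i] for i in order])


# Stub advisors and a stub grader stand in for real model calls (assumptions).
advisors = [
    lambda q: "Ship it.",
    lambda q: "Ship it, but add a rollback plan first.",
    lambda q: "Do not ship.",
]
grade = lambda grader, answer: len(answer)   # placeholder scoring rule
synthesise = lambda q, ranked: ranked[0]     # placeholder chairman

verdict = run_council("Should we ship on Friday?", advisors, grade, synthesise)
print(verdict)
```

In a real setup the grader and chairman would themselves be model calls; the length-based score is only there to make the sketch runnable.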

liviusa retweeted

🚨 AI News | TestingCatalog

@testingcatalog

about 2 hours ago

BYTEDANCE 🔥: Seedance 2.5 has been officially announced, along with an updated Seedance 2.0.

- Seedance 2.0 now supports 4k output
- Seedance 2.5 will be able to generate 30-second videos in one go
- ByteDance also announced a new AI copyright commercialization platform

This video ad is stunning 👀

24

371

31

88

33K

liviusa retweeted

shirish

@shiri_shh

about 5 hours ago

This week is going to be absolutely insane.

10

49

3

2K

liviusa retweeted

Cline

@cline

about 11 hours ago

We've kept hearing how GLM-5.2 beats Opus 4.8, and are skeptical of benchmarks - so we tested them on a real bug from the Cline repo. While both models fixed the issue, GLM was the winner in terms of cost and code quality:

- GLM used twice as many tokens (GLM 1.1m vs Opus 660K) but cost half as much (GLM $0.41 vs Opus $0.81)

- Opus finished quicker - 1.6 min and 12 tool calls vs GLM 4.7 min and 28 tool calls

- GLM cleaned up dead code and verified the build compiled before completing. Opus didn't - it left type errors that passed tests but broke the production build.

Both runs used the same Cline harness prompting and tools, so it seems GLM is RL trained to spend more tokens verifying its work before completing. Impressive work by the @Zai_org team!

137

5K

397

950

421K
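
The token and cost figures quoted in the Cline tweet above imply an effective per-token price for each run. A quick check, using only the numbers from the tweet (the variable names are ours, and the figures are the tweet's rounded values, so the results are approximate):

```python
# Figures as quoted in the tweet: total tokens used and total cost per run.
glm_tokens, glm_cost_usd = 1_100_000, 0.41
opus_tokens, opus_cost_usd = 660_000, 0.81

# Effective price per million tokens for each run.
glm_per_million = glm_cost_usd / glm_tokens * 1_000_000
opus_per_million = opus_cost_usd / opus_tokens * 1_000_000

# How many times more tokens GLM used, and how its bill compares to Opus's.
token_ratio = glm_tokens / opus_tokens
cost_ratio = glm_cost_usd / opus_cost_usd

print(f"GLM:  ${glm_per_million:.2f} per 1M tokens")
print(f"Opus: ${opus_per_million:.2f} per 1M tokens")
print(f"GLM used {token_ratio:.2f}x the tokens at {cost_ratio:.2f}x the cost")
```

On these numbers GLM used about 1.67x the tokens (the tweet rounds this to "twice") at roughly half the cost, which works out to an effective per-token price a bit over three times lower.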

Stefanescu Liviu

@liviusa

about 12 hours ago

LoopTroop (https://t.co/Ly4uAo6lvQ) does exactly this in practice: it runs structured Ralph loops for execution + recovery (fresh worktrees on failure, compact error traces), paired with LLM councils for planning and bead-based tasks. Focuses on reliable, long-running, high-correctness agent work with human approval gates. Fully local & open source

0

1

0

1

226

liviusa retweeted

claire vo 🖤

@clairevo

about 14 hours ago

Before Fable got released (and pulled) @mozilla was quietly testing Claude Mythos against Firefox's 10M line codebase. The result? Over 400 security bug fixes, including ones that had been hiding in the codebase for over a decade.

@bgrins, distinguished engineer at Mozilla, walked me through the agentic bug-finding harness behind the model. His take? It was 50% mythos / 50% setup.

In this ep, Brian walks through:

- why you can't just point a model at 10M lines of code
- how to write a good goal/loop pattern
- killing false positives with a verifier
- why it's good to "lie to the agent"

And guess what? This isn't magic - you can write your own similar harness in less than an afternoon.

Watch now on YT: https://t.co/pBQJZHIM6D

23

397

27

792

155K

liviusa retweeted

Daniel San

@dani_avila7

1 day ago

Claude Code subagents can nest 5 levels deep now

@bcherny announced it, and today I finally got to try it. Here's the full chain running end to end:

- main
- project-auditor // level 1
- structure-checker // level 2
- import-validator // level 3
- dependency-tracer // level 4
- style-sync // level 5

Each level runs in its own context window.

Only the top-level summary returns to main. Depth 5 is the hard cap; that agent can't spawn further.

67

821

61

911

131K

liviusa retweeted

Ejaaz

@cryptopunk7213

about 18 hours ago

the most impressive thing about this isn't that some random japanese company created a mythos-level model - its *how* they did it:

-> their ai model isn't actually a model, it's an API that calls *other models* (e.g. chatgpt, claude, their own)

-> their orchestrator selects different models to do different parts of your prompt. if a cheaper model can be used they'll do that. thats how they cut costs.

-> if a task is challenging then they'll use a frontier model (e.g. claude) to design a solution, then use a cheaper model to build it.

point is - frontier ai capability is no longer solely dependent on how good the model weights are, its how MANY model instances you can get to debate and come up with an answer between themselves

more models going back and forth = better cheaper answer.

we're moving from mono-model to multi-model

18

131

18

48

13K

liviusa retweeted

Artificial Analysis

@ArtificialAnlys

about 15 hours ago

GLM-5.2 leads open weights models and sits at #3 overall on GDPval-AA, a real-world agentic work benchmark

GLM-5.2 from @Zai_org scores 1524 Elo on GDPval-AA, which measures performance on real-world, economically valuable knowledge work through long-horizon, multi-turn tasks.

Key takeaways:

➤ #3 overall, behind only Claude Fable 5 (1783) and Claude Opus 4.8 (1615), and level with GPT-5.5 (xhigh, 1509)

➤ The leading open weights model by a wide margin: the next open model, MiniMax-M3, scores 1408

➤ Ahead of many proprietary models, including Google's Gemini 3.5 Flash (1357), Qwen 3.7 Max (1289), Muse Spark (1158)

➤ The tasks are agentic. GLM-5.2 averaged ~31 turns per task across 1,999 matches

➤ Consistent with the rest of its launch, GLM-5.2 also leads open weights on the Artificial Analysis Intelligence Index, ranks #3 on the Agentic Index, and #3 on AA-Briefcase

29

786

97

142

280K

liviusa retweeted

OpenCode

@opencode

about 13 hours ago

we've added unique user rankings

some models are token heavy so they skew upwards in rankings - unique people using the model is a more accurate ranking

we'll orient more of our data around this metric https://t.co/xshchcfGIc

54

1K

52

133

67K

Stefanescu Liviu

@liviusa

about 16 hours ago

Most AI coding agents derail on big tasks because their context fills up with garbage - errors, dead code, hallucinations. That's "context-rot." #LoopTroop fixes it locally and open-source. An LLM Council plans the work (multiple models draft & vote anonymously), atomic Beads execute in isolated git worktrees, and a Ralph Loop retries failures with a fresh context carrying the lesson forward. Every step is transparent from one pane. Go from a repo issue straight to a clean PR — across multiple projects at once — without ever leaving the app. https://t.co/v8k1xRfvP6 Free, local, open-source. MIT.

0

1

22
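
The retry behaviour described in the tweet above (a failed step is retried with a fresh context that carries only the lesson forward) can be sketched as follows. This is a minimal illustration under our own assumptions, not LoopTroop's actual code: `ralph_loop`, `attempt`, and `trace_limit` are hypothetical names, and a plain dict stands in for the agent's context.

```python
def ralph_loop(task, attempt, max_attempts=3, trace_limit=200):
    """Retry `attempt`, rebuilding the context from scratch on every try."""
    lessons = []  # compact error traces from earlier failures
    for n in range(1, max_attempts + 1):
        # Fresh context: only the task and the short lessons, nothing else
        # from the failed run is carried over.
        context = {"task": task, "lessons": list(lessons)}
        try:
            return attempt(context)
        except Exception as exc:
            trace = f"attempt {n}: {type(exc).__name__}: {exc}"
            lessons.append(trace[:trace_limit])  # keep the trace compact
    raise RuntimeError(f"gave up after {max_attempts} attempts")


def flaky_attempt(context):
    # Stand-in for an agent step: fails until it has a lesson to learn from.
    if not context["lessons"]:
        raise ValueError("build failed: missing import")
    return f"done after {len(context['lessons'])} failed attempt(s)"


result = ralph_loop("fix the failing build", flaky_attempt)
print(result)
```

The point of the design is that each retry sees a small, clean context rather than the accumulated history of the failed run.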

liviusa retweeted

Beff (e/acc)

@beffjezos

1 day ago

New Fable-equivalent from Sakana(!!!)

42

1K

64

537

233K

liviusa retweeted

Megatron

@Megatron_ron

1 day ago

🇺🇸🇮🇷🇵🇰 The feelings of shock and disappointment on the face of Pakistan's Prime Minister, after he was informed by the Iranian delegation they are leaving after Trump threatened them.

2K

74K

13K

7K

5M

liviusa retweeted

Chubby♨️

@kimmonismus

2 days ago

Even the Vercel CEO is impressed/shocked at how good GLM-5.2 in coding is. open source, open weights.

30

931

37

108

93K

liviusa retweeted

RWR 🥂 🇬🇧 🇸🇪

@PunishedRWR

4 days ago

This photo is still incredible

104

23K

715

783

554K

liviusa retweeted

MatLab crashes

@memecrashes

3 days ago

aren't you concerned that the left window is going at like 270km/h and the right window only at like 50km/h?

1K

346K

14K

24M

liviusa retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

3 days ago

9 years later, none of the "Attention Is All You Need" paper authors are at Google.

Ashish Vaswani - cofounder of Essential AI (recently exited)

Noam Shazeer - just moved to OpenAI

Niki Parmar - at Anthropic

Jakob Uszkoreit - cofounder of Inceptive

Llion Jones - cofounder of Sakana AI

Aidan Gomez - cofounder of Cohere

Lukasz Kaiser - at OpenAI

Illia Polosukhin - cofounder of NEAR Protocol

31

674

89

255

96K

liviusa retweeted

Jay

@jayair

4 days ago

OpenCode was officially launched a year ago on June 19 2025

88

2K

61

47

95K

liviusa retweeted

banteg

@banteg

4 days ago

good luck paying $20k only to find out you can generate 20 tok/s. even running 24/7, that's just 50m tokens/month. for glm, at $4.40/m, this is $228 in value. any $200 sub gives you significantly more. and this math means the break-even is 7.3 years not 6 months. by that time the hardware would die if it's running 24/7.

186

2K

99

635

344K
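
The break-even arithmetic in the tweet above can be reproduced directly. The inputs are the tweet's own figures; the 30-day month is our assumption.

```python
TOKENS_PER_SECOND = 20
SECONDS_PER_MONTH = 60 * 60 * 24 * 30   # assumes a 30-day month, running 24/7
PRICE_PER_MILLION_USD = 4.40            # quoted GLM price per 1M tokens
HARDWARE_COST_USD = 20_000

# Tokens generated per month at a constant 20 tok/s.
tokens_per_month = TOKENS_PER_SECOND * SECONDS_PER_MONTH

# What those tokens would cost at the quoted API price.
value_per_month = tokens_per_month / 1_000_000 * PRICE_PER_MILLION_USD

# Months (and years) until the hardware pays for itself at that rate.
break_even_months = HARDWARE_COST_USD / value_per_month
break_even_years = break_even_months / 12

print(f"{tokens_per_month:,} tokens/month")
print(f"${value_per_month:.0f} of API value per month")
print(f"break-even after {break_even_years:.1f} years")
```

This gives about 51.8M tokens per month (the tweet's "50m"), roughly $228 of value per month, and a break-even of about 7.3 years, matching the tweet.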
