Filip Balucha @filbalu - Twitter Profile

filbalu retweeted

4 months ago

psa to saas founders:
convert everything into api / mcp services asap and charge by usage.
allow connectors to all agents and make your own agent with pre-built context
all ui/ux/dashboards will be vibed and dynamically generated
if you don’t have prop data, you’re fucked

Filip Balucha

@filbalu

4 months ago

imagine if you could one-shot a coding agent and add it to your CI or call it from slack. check out @trywoz

Filip Balucha

@filbalu

4 months ago

@garrytan Coding agents also have a "memory palace", and compaction shatters it. OpenSpec can help so much with this though!

filbalu retweeted

Vinod Khosla

@vkhosla

4 months ago

Well well… ARC-AGI-2 (François Chollet’s “hardest” benchmark) is starting to smell like toast. 🍞🔥 @agenticasdk just set a new SOTA: 85.28% with an Agentica agent (~350 lines) that writes & runs code. Best part: it’s not ARC-specialized—it's a general system that’s strong across other benchmarks too. Details at https://t.co/JmVuJiUp83 What benchmark should we throw at it next?

Filip Balucha

@filbalu

4 months ago

@rauchg The great thing about building for agents is that the feedback loop is so much tighter. Your user is there in the terminal, 24/7.

filbalu retweeted

Huaxiu Yao

@HuaxiuYaoML

4 months ago

Why do most LLM agents hit a wall?
They don’t accumulate skills.

Introducing SkillRL📚 — recursive skill-augmented reinforcement learning that lets agents learn skills from failure and evolve over time.

🔥A 7B model:
• +41% over GPT-4o
• ~20% fewer training tokens
• 33% faster convergence

SkillRL bridges raw experience → policy improvement by distilling trajectories into structured, co-evolving skills during RL.

Most agents forget.
SkillRL evolves. 🔄

📄 Paper: https://t.co/6VoxpGoPR6
💻 Code: https://t.co/qVDnIaci2K

Great work @richardxp888, Jianwen Chen, Hanyang Wang, @JiaqiLiu835914, @lillianwei423, @AiYiyangZ, and nice collab. w/ @__YuWang__, @XujiangZhao, Haifeng Chen, Zeyu Zheng, @cihangxie.

filbalu retweeted

James Evans

@dazzeloid

4 months ago

Quarterly reminder that you do not need any of these things to apply to YC:
- revenue
- users
- a good name
- a good domain

You only need the fire

Filip Balucha

@filbalu

4 months ago

using an agent to create an agent that will create agents

filbalu retweeted

Guillermo Rauch

@rauchg

5 months ago

The new engineering is building the agents that "take your job", but now do it at 100x the scale. Agents give developers horizontal scalability.

The simple version of this is Ghostty splits and tabs, 𝚝𝚖𝚞𝚡 sessions and the like, running CLI agents in parallel.

Skills and MCPs help you direct the behavior of these agents. Sandboxes give the ultimate leverage: ~infinite parallelism, run while you sleep, on PRs, when an incident is filed, a customer reports an issue…

Automating the full product development loop is now your job, and your edge.

Filip Balucha

@filbalu

5 months ago

data centers drive ram demand globally, claude code locally

filbalu retweeted

Cole

@colderoshay

5 months ago

the holy trinity of agentic UI:

- https://t.co/ymclHB0RDA from @elirousso
- https://t.co/DZLnezoft4 from @Ibelick
- https://t.co/xzdoVQzSd5 from @vercel

Filip Balucha

@filbalu

5 months ago

@greptile dumbing it down for me after 10h of vibe coding

filbalu retweeted

Vercel Developers

@vercel_dev

5 months ago

① Install the skill:
$ npx add-skill vercel-labs/agent-skills

② Paste this prompt:
Assess this repo against React best practices. Make a prioritized list of quick wins and top fixes.

③ Review and prompt to "make the fixes"

filbalu retweeted

Alex Albert

@alexalbert__

5 months ago

I'm happy to share that we (@AnthropicAI) are investing $1.5 million in support of the Python Software Foundation and open source security. Python powers so much of the AI industry. Supporting the folks that make our work possible is an honor.

filbalu retweeted

Guillermo Rauch

@rauchg

5 months ago

We're encapsulating all our knowledge of @reactjs & @nextjs frontend optimization into a set of reusable skills for agents. This is a 10+ years of experience from the likes of @shuding, distilled for the benefit of every Ralph

rauchg's tweet photo. We're encapsulating all our knowledge of @reactjs & @nextjs frontend optimization into a set of reusable skills for agents. This is a 10+ years of experience from the likes of @shuding, distilled for the benefit of every Ralph https://t.co/2QrIl5xa5W

filbalu retweeted

near

@nearcyan

over 1 year ago

i'm assembling a team

filbalu retweeted

Guillermo Rauch

@rauchg

5 months ago

There's an app I use regularly that's defective. I'm weighing in my mind whether I should send feedback and hope it gets attention from its creators, or re-build it with AI from scratch. This is a tiny app so it's plausible for me to do it. But if you're in the business of selling software, this is how your every customer is thinking now, or how they'll be thinking soon. Iteration velocity matters more than ever before. How quickly you fix, improve, and ship is your counter-signal.

filbalu retweeted

swyx

@swyx

5 months ago

so many ambitious startups making "the LLM OS" tried all these fancy UXes and failed

so many ambitious startups making "the AI browser" tried to book your flights for you and failed

meanwhile Claude Code started unpretentiously as a CLI and now can run your browser and operate your system.

classic disruption theory

filbalu retweeted

Felix Rieseberg

@felixrieseberg

5 months ago

Claude Code doesn't just resonate with developers anymore. Non-technical people are using it to build things. Technical people are using it for non-technical work. The line is blurring.

I'm by far not the first to think about this. Multiple teams at Anthropic have been working on "agentic experiences" for months - Claude not just as a chat partner, but as something that helps you do real work.

@bcherny nudged me: can we take what we've built internally and ship an early, scoped-down version in a few days? So we took a small team, set an aggressive deadline ("Monday sound good?"), and got to work.

@claudeai wrote Cowork. Us humans meet in-person to discuss foundational architectural and product decisions, but all of us devs manage anywhere between 3 to 8 Claude instances implementing features, fixing bugs, or researching potential solutions.

For native code, we use local Git worktrees on our local machines. For smaller or web-code only changes, we just tell Claude to go implement it.

When someone reports a bug in Slack, we often just @-mention Claude and tell it to fix it. A human (and another Claude) reviews all code before it's merged, but we're now spending most of our time orchestrating a fleet of Claudes and making decisions than artisanally writing individual lines of code.

We're releasing Cowork early. It has rough edges. But figuring out what to build is increasingly the hardest part of software engineering - and we think getting feedback early and hearing what users actually need is how we build something truly good.

filbalu retweeted

Lee Robinson

@leerob

6 months ago

Coding agents running in cloud sandboxes will be a big part of 2026. Kick off a task, close your computer/phone, enjoy your life, and come back in a few hours to review the work.
