Hugo Ángel🟢® (Self Certified Account)

@hangel

🇨🇴, Dad, EE, Happy, Friendly, Boundlessly imaginative, Open Minded, Tenacious, Learner. Co-Founder: COBOLpro (@COBOLagency) Blues: @RoryGallagher.

Medellin, Colombia

Joined April 2007

5K Following

500 Followers

10K Posts

hangel retweeted

Nav Toor

@heynavtoor

about 24 hours ago

stop asking Claude one question and thinking you understand the topic. you don't. Stanford proved a better way. it's called STORM. peer reviewed. 25% more organized output. open source. the trick: don't ask one question. ask five. from five different experts. >the practitioner: what do they know that academics miss? >the skeptic: what's the strongest counterargument? >the economist: who profits from the current narrative? >the historian: what pattern has played out before? >the academic: what does the evidence actually say? 4 prompts. 5 minutes. no software. no GitHub. just paste into Claude. single prompts give you what everyone already knows. STORM gives you what nobody else found. this article has all 4 prompts ready to copy. pick your hardest topic. paste prompt 1. you'll know more in 5 minutes than people who spent days reading.

434

10K

hangel retweeted

mousepotato

@iluciddreaming

2 days ago

NVIDIA 丟了一個只有 0.6B 的語音辨識模型。叫 Nemotron-3.5-ASR。支援 40 種以上語言，即時串流輸出。純 CPU 就能跑，不需要 GPU。速度是官方 Nemo runtime 的 2.5 倍，辨識結果卻完全一致。離線環境直接用，還能無縫整進你的 agent pipeline。語音這塊，本地 agent 又多了一個又小又快的選擇。

696

369K

hangel retweeted

SpaceX

@SpaceX

2 days ago

SpaceX has exercised the option to acquire @cursor_ai in an all-stock transaction with the goal of building the world’s most useful AI models. For the past few months, SpaceXAI has been jointly training a model with Cursor, which will be released in Cursor and Grok Build soon. We look forward to working closely with the Cursor team to advance our frontier AI capabilities

36K

25M

hangel retweeted

Nico

@nicos_ai

2 days ago

GOOGLE HA LIBERADO EN SILENCIO UNA IA QUE PREDICE PATRONES Ventas. Precios de mercado. Tráfico web. Demanda energética. Volatilidad cripto. Se llama TimesFM: → Entrenada con 100B de datos reales → Forecasting zero-shot, sin fine-tuning → Corre en local. 100% Gratis y Open Source. Enlace abajo👇

602

333K

Who to follow

POCIT

@pocintech

The leading media and career platform the underrepresented in tech. 👩🏾‍💻👨🏾‍💻 Resetting the algorithm. Get your next role: https://t.co/iJjerryf73

Bill Scott

@BillScottIII

We are the people our parents warned us about..... Jimmy Buffett

Lina Rengifo Calle

@linarengifoc

Connecting Startups & Corps via Open Innovation Engineer Educator | Curious Mindset | Runner Finding my path Books Wine & Art enthusiast

hangel retweeted

Sharbel

@sharbel

2 days ago

Someone built a free collection of production-grade engineering skills that teaches your AI coding agent to work exactly like a senior engineer. It's called agent-skills. 60,800+ stars on GitHub. You drop it into Claude Code, Codex, Cursor, or Gemini CLI. Here's what it does: → `/spec` forces the agent to define what to build before touching code. Spec before code. Every time. → `/plan` breaks the spec into small, atomic tasks. No giant PRs. No mystery diffs. → `/build` implements one slice at a time. Each task is test-driven and committed individually. → `/build auto` generates the plan and runs every task in a single approved pass. You approve once. It executes autonomously. Pauses on failures or risky steps. → `/test` proves the code works. Tests are treated as proof, not afterthought. → `/review` enforces code health before merge. A real quality gate, not a vibe check. → `/code-simplify` rewrites for clarity over cleverness. Kills the clever nonsense your agent wrote at 2am. → `/ship` runs the full production checklist. Faster is safer only when nothing is skipped. → Skills activate automatically based on context. Building an API triggers `api-and-interface-design`. Building UI triggers `frontend-ui-engineering`. No manual configuration. 100% Open Source. Github repo link: https://t.co/DUlNIoUr7u

sharbel's tweet photo. Someone built a free collection of production-grade engineering skills that teaches your AI coding agent to work exactly like a senior engineer.

It's called agent-skills. 60,800+ stars on GitHub.

You drop it into Claude Code, Codex, Cursor, or Gemini CLI.

Here's what it does:

→ `/spec` forces the agent to define what to build before touching code. Spec before code. Every time.
→ `/plan` breaks the spec into small, atomic tasks. No giant PRs. No mystery diffs.
→ `/build` implements one slice at a time. Each task is test-driven and committed individually.
→ `/build auto` generates the plan and runs every task in a single approved pass. You approve once. It executes autonomously. Pauses on failures or risky steps.
→ `/test` proves the code works. Tests are treated as proof, not afterthought.
→ `/review` enforces code health before merge. A real quality gate, not a vibe check.
→ `/code-simplify` rewrites for clarity over cleverness. Kills the clever nonsense your agent wrote at 2am.
→ `/ship` runs the full production checklist. Faster is safer only when nothing is skipped.
→ Skills activate automatically based on context. Building an API triggers `api-and-interface-design`. Building UI triggers `frontend-ui-engineering`. No manual configuration.

100% Open Source.

Github repo link: https://t.co/DUlNIoUr7u

768

136

36K

hangel retweeted

Jaynit

@jaynitx

3 days ago

Elon Musk explains his 5-step algorithm for solving any problem: "The most common mistake of smart engineers is to optimize a thing that should not exist." "I have this very basic first principles algorithm that I run as a mantra." Elon breaks it down: Step 1: Question the requirements. "Make the requirements less dumb. The requirements are always dumb to some degree, no matter how smart the person who gave you those requirements. You have to start there, because otherwise you could get the perfect answer to the wrong question." Step 2: Try to delete it. "Try to delete the part or the process step entirely. If you're not forced to put back at least 10% of what you delete, you're not deleting enough. Most people feel like they've succeeded if they haven't been forced to put things back in. But actually they haven't, they've been overly conservative and left things in that shouldn't be there." Step 3: Optimize or simplify. "The most common mistake of smart engineers is to optimize a thing that should not exist. So you don't optimize until after you've tried to delete." Step 4: Speed it up. "Any given thing can be done faster than you think. But you shouldn't speed things up until you've tried to delete it and optimize it otherwise, you're speeding up something that shouldn't exist." Step 5: Automate. "And then the fifth thing is to automate it." Elon explains why the order matters: "I've gone backwards so many times where I've automated something, sped it up, simplified it, and then deleted it. I got tired of doing that. So that's why I have this mantra."

hangel retweeted

Akshay 🚀

@akshay_pachaar

2 days ago

HarnessX: a harness that compiles itself. every harness improvement so far has come from a human editing code by hand. Anthropic strips planning steps out of Claude Code when a stronger model ships. Manus rebuilt its agent five times in six months, removing complexity each round. the craft runs on human judgment about what to change and when. HarnessX is what happens when a system makes those edits itself. the trick is to treat the harness as a first-class object, the way we already treat model weights. once it's a typed, editable artifact, it can be optimized from its own execution traces. the framing they use is an operational mirror. evolving a harness maps cleanly onto reinforcement learning. the harness is the state. an edit is the action. the trace plus a score is the feedback. a new version is the update. once you see it that way, the failure modes come for free. reward hacking, catastrophic forgetting, under-exploration. the same problems that break model training show up when a system edits its own scaffolding. so edits never ship blind. each round, a loop reads the traces, plans a change, writes the edit, then critiques it. a gate keeps the new version only if it beats the current one on tasks it hasn't seen. what makes this safe is the structure underneath. the harness is built from typed components the system can swap without breaking the rest. that is what compiles really means here. every candidate harness is type-checked before it runs. here is the result that matters. the weakest model improved the most. the strongest barely moved. an evolved harness closes the gaps a weak model cannot fix on its own. the weights never changed. the environment around them got smarter. this is the natural next phase of harness engineering. we moved from weights, to context, to hand-built harnesses. the harness was the last piece we still tuned by hand. i wrote a deep dive on agent harness engineering a while back, covering the orchestration loop, tools, memory, context management, and everything that turns a stateless LLM into a capable agent. the article is below. paper: HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry: https://t.co/L0GeUKCgef

akshay_pachaar's tweet photo. HarnessX: a harness that compiles itself.

every harness improvement so far has come from a human editing code by hand.

Anthropic strips planning steps out of Claude Code when a stronger model ships. Manus rebuilt its agent five times in six months, removing complexity each round.

the craft runs on human judgment about what to change and when. HarnessX is what happens when a system makes those edits itself.

the trick is to treat the harness as a first-class object, the way we already treat model weights.

once it's a typed, editable artifact, it can be optimized from its own execution traces.

the framing they use is an operational mirror. evolving a harness maps cleanly onto reinforcement learning.

the harness is the state. an edit is the action. the trace plus a score is the feedback. a new version is the update.

once you see it that way, the failure modes come for free. reward hacking, catastrophic forgetting, under-exploration.

the same problems that break model training show up when a system edits its own scaffolding.

so edits never ship blind. each round, a loop reads the traces, plans a change, writes the edit, then critiques it.

a gate keeps the new version only if it beats the current one on tasks it hasn't seen.

what makes this safe is the structure underneath. the harness is built from typed components the system can swap without breaking the rest.

that is what compiles really means here. every candidate harness is type-checked before it runs.

here is the result that matters. the weakest model improved the most. the strongest barely moved.

an evolved harness closes the gaps a weak model cannot fix on its own. the weights never changed. the environment around them got smarter.

this is the natural next phase of harness engineering. we moved from weights, to context, to hand-built harnesses.

the harness was the last piece we still tuned by hand.

i wrote a deep dive on agent harness engineering a while back, covering the orchestration loop, tools, memory, context management, and everything that turns a stateless LLM into a capable agent. the article is below.

paper: HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry: https://t.co/L0GeUKCgef

774

124

104K

hangel retweeted

Yarchi

@undefinedKi

3 days ago

ANTHROPIC JUST QUIETLY SHIPPED A FEATURE THAT LETS CLAUDE SPAWN A WHOLE TEAM OF AGENTS THAT MESSAGE EACH OTHER AND REVIEW EACH OTHER'S WORK. It's a Claude Code feature called agent teams. The team lead spawns multiple agents that share a task list and message each other directly, not subagents reporting back, actual peers. In the demo a QA agent caught three bugs, sent the work back to the front-end and back-end devs, they fixed it, app shipped in one pass. How to run it: 1. Enable it. Needs Claude Code v2.1.32+. Add to settings.json: "env": { "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" }. Or paste that to Claude and say "add this to my settings." Restart. 2. Prompt in plain English. Start with a goal (agents wake with zero context), then "create a team of 3 using Sonnet," describe each role, its deliverable, and who it messages when done. 3. The rules: each agent owns its own files, define exact outputs, name who talks to who, keep it to 3-5 agents. Use it for complex work with separate parts running in parallel. Skip it for simple or sequential tasks, teams cost 3-4x the tokens. Bookmark this.

261

445K

hangel retweeted

Daily Dose of Data Science

@DailyDoseOfDS_

5 days ago

Claude Code fully dissected! Researchers from UCL reverse-engineered the leaked Claude source. What they found changes how you should think about agent design. Only 1.6% of the codebase is AI decision logic. The other 98.4% is operational infrastructure. Permission gates, tool routing, context compaction, recovery logic, session persistence. The model reasons. The harness does everything else. This is the opposite of what most agent frameworks do today. LangGraph routes model outputs through explicit state machines. Devin bolts heavy planners onto operational scaffolding. Claude Code gives the model maximum decision latitude inside a rich deterministic harness, and invests all its engineering effort in that harness. The core loop is a simple while-true. Call model, run tools, repeat. But the systems around that loop are where the real design lives: A permission system with 7 modes and an ML classifier. Users approve 93% of prompts anyway, so the architecture compensates with automated layers instead of adding more warnings. A 5-layer context compaction pipeline. Each layer runs only when cheaper ones fail. Budget reduction, snip, microcompact, context collapse, auto-compact. Four extension mechanisms ordered by context cost. Hooks (zero), skills (low), plugins (medium), MCP (high). Each answers a different integration problem. Subagents return only summary text to the parent. Their full transcripts live in sidechain files. Agent teams still cost roughly 7x the tokens of a standard session. Resume does not restore session-scoped permissions. Trust is re-established every session. That friction is the point. The bet behind all of this is simple. As frontier models converge on raw coding ability, the quality of the harness becomes the differentiator, not the model. Paper: Dive into Claude Code (arXiv:2604.14228) We've shared an article on Agent Harness and what every big company is building. Read it below.

DailyDoseOfDS_'s tweet photo. Claude Code fully dissected!

Researchers from UCL reverse-engineered the leaked Claude source. What they found changes how you should think about agent design.

Only 1.6% of the codebase is AI decision logic.

The other 98.4% is operational infrastructure. Permission gates, tool routing, context compaction, recovery logic, session persistence. The model reasons. The harness does everything else.

This is the opposite of what most agent frameworks do today.

LangGraph routes model outputs through explicit state machines. Devin bolts heavy planners onto operational scaffolding. Claude Code gives the model maximum decision latitude inside a rich deterministic harness, and invests all its engineering effort in that harness.

The core loop is a simple while-true. Call model, run tools, repeat.

But the systems around that loop are where the real design lives:

A permission system with 7 modes and an ML classifier. Users approve 93% of prompts anyway, so the architecture compensates with automated layers instead of adding more warnings.

A 5-layer context compaction pipeline. Each layer runs only when cheaper ones fail. Budget reduction, snip, microcompact, context collapse, auto-compact.

Four extension mechanisms ordered by context cost. Hooks (zero), skills (low), plugins (medium), MCP (high). Each answers a different integration problem.

Subagents return only summary text to the parent. Their full transcripts live in sidechain files. Agent teams still cost roughly 7x the tokens of a standard session.

Resume does not restore session-scoped permissions. Trust is re-established every session. That friction is the point.

The bet behind all of this is simple. As frontier models converge on raw coding ability, the quality of the harness becomes the differentiator, not the model.

Paper: Dive into Claude Code (arXiv:2604.14228)

We've shared an article on Agent Harness and what every big company is building.

Read it below.

300

219K

hangel retweeted

Avi Chawla

@_avichawla

5 days ago

Karpathy said something you'll regret ignoring: "Remove yourself as the bottleneck. Maximize your leverage. Put in very few tokens, and a huge amount of stuff happens on your behalf." Loop engineering is the exact thing that does that. In a hand-run session, the operator handles two things: - deciding what the agent runs next - and checking its output before the next step Both are manual, and both decide how far the agent gets on its own without the operator. Loop engineering moves both steps into the system. A core operating structure surrounds the loop, and the diagram below depicts it. - A schedule decides what to run - Loop is the maker that produces the work - A separate checker agent grades the output - A file on disk holds the state they both read. The loop runs until either done, max iterations, or an exhausted budget. Here are some practical engineering considerations: 1) A model grading its own output justifies what it already did instead of catching where it failed. That's why a separate checker's findings return to the maker as the next instruction. And the cycle repeats until the checker finds nothing left to fix. 2) A loop with no stop condition burns tokens, and the cost climbs fast once sub-agents and long runs add up. That's why the exit must be set before the loop runs, not while it is running. A simple exit could be: ↳ fix only the major issues, run one final pass, and stop after two loops, with "all tests pass and lint clean" as the rule that ends it. 3) State has to live on disk, not in context. The model forgets everything between runs, so an MD file or a knowledge graph holds what is done and what is still open. Each run reads it and writes back to it, which lets a loop pick up again after days. 4) The lower the verification bar, the safer the loop. Boring, repetitive checks like a stale version string or a missing test are trivial to verify, so a loop runs them with little risk while the operator is away. Judgment-heavy work is loopable too, but only as far as the checker can confirm the result. Let's look at how an unattended loop fails in two ways. 1) It reports done when nothing is actually verified. The separate checker exists to prevent it, but it merges code faster than anyone reads it, so over weeks, the team stops understanding its own codebase while every check stays green. Green tests say the code passed the tests, not that anyone knows what shipped. Someone still has to read what the loop merges. 2) The checker keeps a running loop honest, but it only catches failures inside a run. The harness around the loop, like the prompts, tools, and checks wrapped around the model, still drifts and breaks in production as models change. That repair loop is usually run by hand based on observability traces. My co-founder wrote a detailed walkthrough (with code) on making that harness repair itself, where a failing trace gets diagnosed, the fix is verified against the exact input that failed, and the failure is locked as a regression test so it cannot recur. Read it below.

548

649K

hangel retweeted

Nanjing University

@NJU1902

19 days ago

#NJU research team, led by Professor Wang Xinran and Associate Professor Qiu Hao from the College of Integrated Circuits, in collaboration with Suzhou National Laboratory and Huawei Technologies Co., Ltd., has successfully developed the Mengqi-1000: the world's first molybdenum disulfide-based multi-bit parallel microprocessor. Mengqi-1000's transistor integration density sets a new record among emerging non-silicon digital circuits. This achievement marks that China's research on two-dimensional semiconductors has entered a new stage of integration with industrial production lines. The achievement was published in Nature Electronics @NatureElectron on May 26, 2026: https://t.co/25c3vEL6ic #NJUresearch

NJU1902's tweet photo. #NJU research team, led by Professor Wang Xinran and Associate Professor Qiu Hao from the College of Integrated Circuits, in collaboration with Suzhou National Laboratory and Huawei Technologies Co., Ltd., has successfully developed the Mengqi-1000: the world's first molybdenum disulfide-based multi-bit parallel microprocessor.

Mengqi-1000's transistor integration density sets a new record among emerging non-silicon digital circuits. This achievement marks that China's research on two-dimensional semiconductors has entered a new stage of integration with industrial production lines.

The achievement was published in Nature Electronics @NatureElectron on May 26, 2026: https://t.co/25c3vEL6ic

#NJUresearch

203

hangel retweeted

Miles Deutscher

@milesdeutscher

7 days ago

Anthropic just literally spoon-fed you how to use Fable properly. 99% of Claude users missed it. The way you need to prompt Fable is fundamentally different from all other AI models. I translated their entire new Fable prompting handbook:

milesdeutscher's tweet photo. Anthropic just literally spoon-fed you how to use Fable properly.

99% of Claude users missed it.

The way you need to prompt Fable is fundamentally different from all other AI models.

I translated their entire new Fable prompting handbook: https://t.co/CnyrnOEWrN

328

702K

hangel retweeted

Claude

@claudeai

9 days ago

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

105K

15K

22K

56M

hangel retweeted

How To Prompt

@HowToPrompt__

9 days ago

Stanford + Meta just dropped the paper that flips everything about AI agents. It's called "Code as Agent Harness." Right now, we treat large language models as text generators. When they need to solve a complex problem, they rely on a "chain of thought." But natural language is slippery. It's vague. It loses context. When an agent hallucinates in English, it just keeps talking. So they introduced a framework that changes the entire architecture of autonomy: "Code as Agent Harness." They stopped asking the AI to reason in words, and forced it to reason in code. Code isn't just the final output anymore. It is the memory. It is the environment. It is the boundary. Instead of writing a paragraph about how to solve a problem, the agent writes a script, executes it, and reads the output. Tests become its senses. Execution logs become its memory. Sandboxes become its physics. If an agent makes a mistake in English, it apologizes and hallucinates again. If an agent makes a mistake in code, the compiler throws an error. The trace tells it exactly what broke. The system forces it to fix it. This is where prompt engineering dies, and systems engineering takes over. The paper proves that reliability doesn't come from a smarter base model. It comes from the "harness" wrapped around it: - The model proposes. - The harness executes. - The environment returns feedback. - The verifier checks.

HowToPrompt__'s tweet photo. Stanford + Meta just dropped the paper that flips everything about AI agents.

It's called "Code as Agent Harness."

Right now, we treat large language models as text generators. When they need to solve a complex problem, they rely on a "chain of thought."

But natural language is slippery. It's vague. It loses context. When an agent hallucinates in English, it just keeps talking.

So they introduced a framework that changes the entire architecture of autonomy: "Code as Agent Harness."

They stopped asking the AI to reason in words, and forced it to reason in code.

Code isn't just the final output anymore. It is the memory. It is the environment. It is the boundary.

Instead of writing a paragraph about how to solve a problem, the agent writes a script, executes it, and reads the output.

Tests become its senses. Execution logs become its memory. Sandboxes become its physics.

If an agent makes a mistake in English, it apologizes and hallucinates again.

If an agent makes a mistake in code, the compiler throws an error. The trace tells it exactly what broke. The system forces it to fix it.

This is where prompt engineering dies, and systems engineering takes over.

The paper proves that reliability doesn't come from a smarter base model. It comes from the "harness" wrapped around it:

- The model proposes.
- The harness executes.
- The environment returns feedback.
- The verifier checks.

192

75K

hangel retweeted

Shann³

@shannholmberg

10 days ago

what is agent looping for the last two years we prompted agents one task at a time. that is starting to change instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up at its simplest, looping is one agent working on itself: > researches > drafts > checks the draft against a goal > fixes what is weak > runs that cycle again until the work clears the requirements you are not prompting each step anymore. the agent repeats the cycle for you the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end you create a goal, and the system runs the loop until it finishes within the reqs you set open and closed looping: OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time the catch is cost, an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine CLOSED LOOPING is bounded. a human designs the end-to-end path first: > clear goal > defined steps > an eval at each step > a point where it stops or hands back to you (and feeds back performance data) the agents still loop, but inside framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight. for most marketing work, closed is the one that pays off today. > the orchestrator owns the goal > the specialists own the steps > the subagents do the narrow work > an eval gate make sure its not slop

shannholmberg's tweet photo. what is agent looping

for the last two years we prompted agents one task at a time. that is starting to change

instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met

looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up

at its simplest, looping is one agent working on itself:

> researches
> drafts
> checks the draft against a goal
> fixes what is weak
> runs that cycle again until the work clears the requirements

you are not prompting each step anymore. the agent repeats the cycle for you

the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents

the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met

one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end

you create a goal, and the system runs the loop until it finishes within the reqs you set

open and closed looping:

OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out

this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time

the catch is cost, an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine

CLOSED LOOPING is bounded. a human designs the end-to-end path first:

> clear goal
> defined steps
> an eval at each step
> a point where it stops or hands back to you (and feeds back performance data)

the agents still loop, but inside framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight.

for most marketing work, closed is the one that pays off today.

> the orchestrator owns the goal
> the specialists own the steps
> the subagents do the narrow work
> an eval gate make sure its not slop

200

698

10K

743K

hangel retweeted

Chrome

@0xchromium

12 days ago

Andrej Karpathy spent 2h showing how he actually uses AI day to day he's a co-founder of OpenAI and led AI at Tesla, so when he shows how he works, it’s worth watching and the whole session is just him telling the machine what he wants in simple terms, like he's briefing a coworker watch what's actually happening the entire time: > he describes the task in normal words > it goes off and does the work > he glances at the result and nudges it with one more sentence that's the whole skill, and you've had it since you learned to talk the only gap between that and a worker that runs on its own is handing that sentence a schedule and the tools to act check his work, then build the version that keeps working when you stop

130

11K

30K

hangel retweeted

Rahul

@sairahul1

11 days ago

This is the best site on the internet to learn harness engineering. Free. Completely. Most AI engineers have never heard the term. https://t.co/bwDbTTYsjM Bookmark this site. Then read this setup ↓

sairahul1's tweet photo. This is the best site on the internet to learn harness engineering.

Free. Completely.

Most AI engineers have never heard the term.

https://t.co/bwDbTTYsjM

Bookmark this site.

Then read this setup ↓ https://t.co/ddEP0XowXM

444

445K

hangel retweeted

鸟哥 | 蓝鸟会🕊️

@NFTCPS

13 days ago

家人们，今天给你们扒10个GitHub上免费到离谱的仓库，每一个都能干掉你正在按月交钱的软件，看完别说鸟哥没提醒你。 1️⃣ TradingAgents 一整支AI分析师团队在你电脑里吵架做单。基本面、情绪、新闻、技术分析师同时开工，最后风险经理和执行代理拍板。等于把一支华尔街团队7×24小时塞进你笔记本，还不要工资。 🔗 https://t.co/kwYRkz5PuO 2️⃣ LibreChat 一个界面把ChatGPT、Claude、Gemini、DeepSeek等20多个模型全收了，支持自托管和原生MCP。数据是你的，基础设施是你的。能自己跑，凭啥还交月费？ 🔗 https://t.co/V3iCwASOCA 3️⃣ HyperFrames HeyGen把自家视频引擎开源了。写个HTML就能产出生产级MP4，原生支持GSAP、Lottie、Three.js，同样输入永远同样输出。 🔗 https://t.co/6Gp8NAb38O 4️⃣ Fincept Terminal 笔记本上跑的彭博终端平替，AI投资代理灵感来自传奇投资人。深度财报分析、市场情报、几十个数据源，企业级的钱一分不用掏。 🔗 https://t.co/e0Fgr7sK1S 5️⃣ MoneyPrinterTurbo 丢个关键词，脚本、画面、字幕、音乐、成片一条龙。横屏竖屏随便选，基本不用动手剪。 🔗 https://t.co/yFPFib4nBr 6️⃣ Agentic Inbox Cloudflare开源的AI邮件客户端，AI帮你读收件箱、起草回复，全程数据不出你的地盘。没有外部服务器，没有订阅费。 🔗 https://t.co/tbEaZSfis8 7️⃣ VoxCPM 几秒音频就能克隆声音，几十种语言高质量输出，还能凭一句文字描述捏出定制嗓音。 🔗 https://t.co/SlEimVxLez 8️⃣ Flowsint 输个域名，关联的IP、子域名、邮箱、加密钱包、社交账号全给你画成图。纯本地跑，做私密调查和OSINT情报的神器。 🔗 https://t.co/5UguHbZpwx 9️⃣ agent-skills 谷歌工程师Addy Osmani放出了他生产环境验证过的Claude Code技能库，API设计、调试、代码审查、CI/CD、前端工程全覆盖。 🔗 https://t.co/52Ef6XqvAz 🔟 Nango 企业每年砸几千刀的集成层，几百个预置API、托管OAuth、AI生成集成代码，一堆高速创业公司和企业团队都在用。 🔗 https://t.co/tlv7hGZZs1 这些可不是练手的玩具项目，每一个都能替掉你还在月月付费的软件。挑一个，装上，塞进你的工作流。100%免费，100%开源。省下的钱，记得请鸟哥喝杯咖啡。

NFTCPS's tweet photo. 家人们，今天给你们扒10个GitHub上免费到离谱的仓库，每一个都能干掉你正在按月交钱的软件，看完别说鸟哥没提醒你。

1️⃣ TradingAgents
一整支AI分析师团队在你电脑里吵架做单。基本面、情绪、新闻、技术分析师同时开工，最后风险经理和执行代理拍板。等于把一支华尔街团队7×24小时塞进你笔记本，还不要工资。
🔗 https://t.co/kwYRkz5PuO

2️⃣ LibreChat
一个界面把ChatGPT、Claude、Gemini、DeepSeek等20多个模型全收了，支持自托管和原生MCP。数据是你的，基础设施是你的。能自己跑，凭啥还交月费？
🔗 https://t.co/V3iCwASOCA

3️⃣ HyperFrames
HeyGen把自家视频引擎开源了。写个HTML就能产出生产级MP4，原生支持GSAP、Lottie、Three.js，同样输入永远同样输出。
🔗 https://t.co/6Gp8NAb38O

4️⃣ Fincept Terminal
笔记本上跑的彭博终端平替，AI投资代理灵感来自传奇投资人。深度财报分析、市场情报、几十个数据源，企业级的钱一分不用掏。
🔗 https://t.co/e0Fgr7sK1S

5️⃣ MoneyPrinterTurbo
丢个关键词，脚本、画面、字幕、音乐、成片一条龙。横屏竖屏随便选，基本不用动手剪。
🔗 https://t.co/yFPFib4nBr

6️⃣ Agentic Inbox
Cloudflare开源的AI邮件客户端，AI帮你读收件箱、起草回复，全程数据不出你的地盘。没有外部服务器，没有订阅费。
🔗 https://t.co/tbEaZSfis8

7️⃣ VoxCPM
几秒音频就能克隆声音，几十种语言高质量输出，还能凭一句文字描述捏出定制嗓音。
🔗 https://t.co/SlEimVxLez

8️⃣ Flowsint
输个域名，关联的IP、子域名、邮箱、加密钱包、社交账号全给你画成图。纯本地跑，做私密调查和OSINT情报的神器。
🔗 https://t.co/5UguHbZpwx

9️⃣ agent-skills
谷歌工程师Addy Osmani放出了他生产环境验证过的Claude Code技能库，API设计、调试、代码审查、CI/CD、前端工程全覆盖。
🔗 https://t.co/52Ef6XqvAz

🔟 Nango
企业每年砸几千刀的集成层，几百个预置API、托管OAuth、AI生成集成代码，一堆高速创业公司和企业团队都在用。
🔗 https://t.co/tlv7hGZZs1

这些可不是练手的玩具项目，每一个都能替掉你还在月月付费的软件。

挑一个，装上，塞进你的工作流。100%免费，100%开源。

省下的钱，记得请鸟哥喝杯咖啡。

674

216K

hangel retweeted

Avi Chawla

@_avichawla

14 days ago

Anthropic's in trouble, again! They spent years building what's now fully open-source. What made Claude feel different from a normal app is that the agent could act inside the interface instead of only talking in a chat box. For instance, Claude Artifacts let an agent render real UI, charts, dashboards, and interactive components that assemble live inside the response. Every major AI product tried to replicate it. But the problem was that unlike reasoning, planning, tool-calling, etc., none of it shipped natively with LangGraph, CrewAI, or Google ADK. So teams started building an owned version that required engineering the entire interface layer from scratch. Most teams, however, just settled for shipping the agent as a backend API in a chat box since rendering the UI is only one piece of it. To actually make it work, the interface layer also needed real-time streaming, state kept in sync between agent and UI, conversations that persist across sessions, and reconnection when a user refreshes mid-run. @CopilotKit is now the only open-source framework that actually lets you build your own full-stack Claude-like apps. It decouples the agent from the interface, talking over AG-UI (an open protocol for agent-to-user communication). Being a standard protocol, the frontend never needs to know whether it is talking to a LangGraph or a CrewAI agent. You can change the backend anytime and the UI will never notice. In practice, CopilotKit's interface layer gives several pre-implemented React building blocks that wire the agent directly into the app, like: - generative UI, so the agent renders real components instead of text - chat windows, sidebars, and popups, or a fully headless setup - shared state, so the agent and app stay in sync - human-in-the-loop approvals, where the agent waits before acting - persistent threads that store the whole session, including the agent-user interactions and generated UI, not just text And because that full history is captured, those interactions can feed a self-learning layer that also improves the agent from real usage over time. The interface layer that Anthropic spent years engineering in-house is now literally available to any developer/team. CopilotKit is open-source with 30k+ GitHub stars, and AG-UI, the protocol underneath, is already supported across every major agent framework: LangGraph, CrewAI, Mastra, Google ADK, and more. CopilotKit GitHub repo → https://t.co/wkQ1taF0rM (don't forget to star it ⭐ ) If you want to go deeper, I found a detailed breakdown by Shubham Saboo recently on the three Generative UI patterns, with implementation. Read it below.