wxy @Andrew_WXY - Twitter Profile

Pinned Tweet

https://t.co/JTVCMyUQ5W Taught by @ylecun @alfcnz https://t.co/SApnrseP2m https://t.co/m1KVNs6E9h https://t.co/dWgsq7UkLD https://t.co/85FAbG9CNS

Martin Ford

@MFordFuture

over 5 years ago

Yann LeCun’s #DeepLearning Course Is Now Free & Fully Online. https://t.co/YA03CzYNPr via @GoogleNews

1

392

102

128

0

1

13

1

0

Andrew_WXY retweeted

Teknium 🪽

@Teknium

2 days ago

Had Hermes Agent with the Manim Video skill plus it's TTS tool create a video explaining Hermes' Agent.

75

2K

139

1K

198K

Andrew_WXY retweeted

Luke The Dev

@iamlukethedev

2 days ago

Seeing a lot of people struggling to enable Telegram Rich Messages in Hermes @NousResearch  Here’s the fastest way to turn it on:  Tell your agent:  1️⃣ Update yourself 2️⃣ Enable rich messages in my config: rich_messages: true  3️⃣ Restart the gateway  4️⃣ Send a test message  Then test it with this prompt: Let’s test now Summarize this as a Telegram rich table with columns: Task, Owner, Status. Give me a checklist for the deployment, using completed and incomplete task boxes. Format this as: - heading - short summary - table - checklist - collapsible details section for risks ## Supported useful formats Use normal Markdown-style syntax: ## Sprint Status | Item | Owner | Status | |---|---|---| | Driver App release | Alex | ✅ Done | | Portal QA | Sam | In progress | | Route optimizer | Luke | Blocked | - [x] Review PR - [ ] Run staging smoke test - [ ] Send release note <details> <summary>Risks</summary> - QA may slip if staging data is stale. - Route optimizer dependency needs confirmation. </details>

75

1K

95

2K

127K

Andrew_WXY retweeted

Avi Chawla

@_avichawla

3 days ago

Karpathy said something you'll regret ignoring: "Remove yourself as the bottleneck. Maximize your leverage. Put in very few tokens, and a huge amount of stuff happens on your behalf." Loop engineering is the exact thing that does that. In a hand-run session, the operator handles two things: - deciding what the agent runs next - and checking its output before the next step Both are manual, and both decide how far the agent gets on its own without the operator. Loop engineering moves both steps into the system. A core operating structure surrounds the loop, and the diagram below depicts it. - A schedule decides what to run - Loop is the maker that produces the work - A separate checker agent grades the output - A file on disk holds the state they both read. The loop runs until either done, max iterations, or an exhausted budget. Here are some practical engineering considerations: 1) A model grading its own output justifies what it already did instead of catching where it failed. That's why a separate checker's findings return to the maker as the next instruction. And the cycle repeats until the checker finds nothing left to fix. 2) A loop with no stop condition burns tokens, and the cost climbs fast once sub-agents and long runs add up. That's why the exit must be set before the loop runs, not while it is running. A simple exit could be: ↳ fix only the major issues, run one final pass, and stop after two loops, with "all tests pass and lint clean" as the rule that ends it. 3) State has to live on disk, not in context. The model forgets everything between runs, so an MD file or a knowledge graph holds what is done and what is still open. Each run reads it and writes back to it, which lets a loop pick up again after days. 4) The lower the verification bar, the safer the loop. Boring, repetitive checks like a stale version string or a missing test are trivial to verify, so a loop runs them with little risk while the operator is away. Judgment-heavy work is loopable too, but only as far as the checker can confirm the result. Let's look at how an unattended loop fails in two ways. 1) It reports done when nothing is actually verified. The separate checker exists to prevent it, but it merges code faster than anyone reads it, so over weeks, the team stops understanding its own codebase while every check stays green. Green tests say the code passed the tests, not that anyone knows what shipped. Someone still has to read what the loop merges. 2) The checker keeps a running loop honest, but it only catches failures inside a run. The harness around the loop, like the prompts, tools, and checks wrapped around the model, still drifts and breaks in production as models change. That repair loop is usually run by hand based on observability traces. My co-founder wrote a detailed walkthrough (with code) on making that harness repair itself, where a failing trace gets diagnosed, the fix is verified against the exact input that failed, and the failure is locked as a regression test so it cannot recur. Read it below.

69

4K

540

8K

637K

Who to follow

Yves Hilpisch

@dyjh

The Python Quants | https://t.co/e80v6MaAf1 | #CPF Program | https://t.co/NFl9Lxgvz6 | #Python & #AI for #Finance

Antoine Lendrevie

@Sir_carma

Video game art director, previously at Happy Volcano and Brace Yourself Games.

Michael Herman

@MikeHerman

Engineer @MonitaurAI. Founder and author @TestDrivenio.

Andrew_WXY retweeted

Google Earth

@googleearth

4 days ago

Prepare for takeoff. ✈️ Flight simulator is now available globally on web to all users. https://t.co/hQP0No142P We've recently added many our most powerful professional desktop features to web. Elevation profiles, new import types, but there's always been one other feature you've been asking us to add to the web version of Google Earth, just for fun... Where will you fly? Share your best maneuvers, views, and flyovers with us!

459

32K

4K

21K

9M

Andrew_WXY retweeted

Brooklyn!

@imbabybrooklyn

about 2 months ago

Taught Hermes how to use pretext 😊 @_chenglou @NousResearch

51

2K

74

972

327K

Andrew_WXY retweeted

Ammaar Reshi

@ammaar

4 days ago

I asked Claude Fable 5 to reverse engineer a 1993 DOS game with no source code. It read the raw machine code, rewrote the engine in C, and gave me a fully editable port for every platform. 30 min from EXE to iPhone. Sharing it all so you can revive your own childhood games!

64

825

76

456

138K

Andrew_WXY retweeted

ElevenLabs Developers

@ElevenLabsDevs

6 days ago

Hermes can call you first

11

244

22

285

35K

Andrew_WXY retweeted

Akshay 🚀

@akshay_pachaar

7 days ago

about loop engineering. everyone's saying the same thing this week. you don't prompt agents anymore, you design loops that prompt them. here's the job that loop hands right back to you. a loop running unattended is also a loop failing unattended. loop engineering takes you off prompting. it takes you off curating context. it takes you off babysitting a single run. it does not take you off debugging. it just moves the debugging somewhere worse, into runs you were never watching, with far too much of it to read through by hand. even the loop engineering posts admit this themselves, usually somewhere near the end. you can only walk away from a loop if you trust the thing checking it. a checker you don't trust drops you right back into reading every output by hand, which is the exact work the loop was supposed to take off you. so stack the layers up, prompt, context, harness, loop, and one job survives all of them. closing the loop on failure. the leverage point moved. debugging stayed exactly where it was. i was writing about this exact gap yesterday, before the loop talk picked up today. the idea was simple. make debugging its own loop. a failure leads to a root cause, a proposed fix, a rerun against the exact inputs that broke, and a test that locks it out for good. the checker gets built from your real failures instead of guessed at up front. Opik, the tool i was writing about, does exactly this. a built-in agent reads the trace, finds the root cause, proposes a diff, you approve it, and that failure becomes a permanent regression test. every break you debug makes the loop a little harder to break next time, which is the kind of checker the loop engineering crowd keeps saying you need before you walk away. if you're designing loops you actually plan to walk away from, it's worth a look. Opik is 100% open-source under Apache-2.0 license. GitHub repo: https://t.co/MEC26owCdo (don't forget to star 🌟) loop engineering moved the leverage point. it didn't remove the engineer who still has to close the loop when something breaks. the full article, Your Agent Harness Should Repair Itself, is quoted below.

akshay_pachaar's tweet photo. about loop engineering.

everyone's saying the same thing this week. you don't prompt agents anymore, you design loops that prompt them.

here's the job that loop hands right back to you.

a loop running unattended is also a loop failing unattended.

loop engineering takes you off prompting. it takes you off curating context. it takes you off babysitting a single run. it does not take you off debugging. it just moves the debugging somewhere worse, into runs you were never watching, with far too much of it to read through by hand.

even the loop engineering posts admit this themselves, usually somewhere near the end. you can only walk away from a loop if you trust the thing checking it. a checker you don't trust drops you right back into reading every output by hand, which is the exact work the loop was supposed to take off you.

so stack the layers up, prompt, context, harness, loop, and one job survives all of them. closing the loop on failure. the leverage point moved. debugging stayed exactly where it was.

i was writing about this exact gap yesterday, before the loop talk picked up today. the idea was simple. make debugging its own loop. a failure leads to a root cause, a proposed fix, a rerun against the exact inputs that broke, and a test that locks it out for good. the checker gets built from your real failures instead of guessed at up front.

Opik, the tool i was writing about, does exactly this. a built-in agent reads the trace, finds the root cause, proposes a diff, you approve it, and that failure becomes a permanent regression test. every break you debug makes the loop a little harder to break next time, which is the kind of checker the loop engineering crowd keeps saying you need before you walk away.

if you're designing loops you actually plan to walk away from, it's worth a look.

Opik is 100% open-source under Apache-2.0 license.

GitHub repo: https://t.co/MEC26owCdo

(don't forget to star 🌟)

loop engineering moved the leverage point. it didn't remove the engineer who still has to close the loop when something breaks.

the full article, Your Agent Harness Should Repair Itself, is quoted below.

62

687

111

952

87K

Andrew_WXY retweeted

DailyPapers

@HuggingPapers

7 days ago

OmniGameArena is a real-time UE5 benchmark for VLM game agents Twelve new games span Solo, PvP, and Coop with one shared interface. The Improvement Dynamics Curve tracks how agents learn and improve across reflection rounds.

HuggingPapers's tweet photo. OmniGameArena is a real-time UE5 benchmark for VLM game agents

Twelve new games span Solo, PvP, and Coop with one shared interface.

The Improvement Dynamics Curve tracks how agents learn and improve across reflection rounds. https://t.co/4Geavq1qdX

1

32

4

10

2K

Andrew_WXY retweeted

Teknium 🪽

@Teknium

7 days ago

Happy to present that terminal tool calls will be in prettified markdown codeblocks now if you have display tool calls enabled in Hermes Agent. Works on all platforms that support it.

Teknium's tweet photo. Happy to present that terminal tool calls will be in prettified markdown codeblocks now if you have display tool calls enabled in Hermes Agent.

Works on all platforms that support it. https://t.co/8HJOVa5mcj

64

759

26

99

92K

Andrew_WXY retweeted

Virat Singh

@virattt

8 days ago

This is my favorite dataset we've built. Operational KPI data, unlike anything else on the market. It tells you: • load factor (airlines) • book-to-bill (semiconductors) • $100K customers (SaaS) This data isn't in XBRL nor in a 10-K/Q. Instead, it's buried in earnings releases, transcripts, IR websites, and news articles. We pull it out and standardize it so your agent can run clean comps across a sector. I use it so Claude Code doesn't burn tokens reading long earnings releases and articles. It's updated in ~15s, served in milliseconds.

4

63

6

82

11K

Andrew_WXY retweeted

ChatGPT

@ChatGPTapp

8 days ago

Turn data and comparisons into charts, directly in ChatGPT. Available now on mobile and web.

206

3K

215

588

246K

Andrew_WXY retweeted

Shann³

@shannholmberg

8 days ago

what is agent looping for the last two years we prompted agents one task at a time. that is starting to change instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up at its simplest, looping is one agent working on itself: > researches > drafts > checks the draft against a goal > fixes what is weak > runs that cycle again until the work clears the requirements you are not prompting each step anymore. the agent repeats the cycle for you the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end you create a goal, and the system runs the loop until it finishes within the reqs you set open and closed looping: OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time the catch is cost, an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine CLOSED LOOPING is bounded. a human designs the end-to-end path first: > clear goal > defined steps > an eval at each step > a point where it stops or hands back to you (and feeds back performance data) the agents still loop, but inside framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight. for most marketing work, closed is the one that pays off today. > the orchestrator owns the goal > the specialists own the steps > the subagents do the narrow work > an eval gate make sure its not slop

shannholmberg's tweet photo. what is agent looping

for the last two years we prompted agents one task at a time. that is starting to change

instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met

looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up

at its simplest, looping is one agent working on itself:

> researches
> drafts
> checks the draft against a goal
> fixes what is weak
> runs that cycle again until the work clears the requirements

you are not prompting each step anymore. the agent repeats the cycle for you

the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents

the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met

one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end

you create a goal, and the system runs the loop until it finishes within the reqs you set

open and closed looping:

OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out

this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time

the catch is cost, an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine

CLOSED LOOPING is bounded. a human designs the end-to-end path first:

> clear goal
> defined steps
> an eval at each step
> a point where it stops or hands back to you (and feeds back performance data)

the agents still loop, but inside framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight.

for most marketing work, closed is the one that pays off today.

> the orchestrator owns the goal
> the specialists own the steps
> the subagents do the narrow work
> an eval gate make sure its not slop

200

6K

698

10K

741K

Andrew_WXY retweeted

bashbunni

@sudobunni

2 months ago

HE HAS RETURNED TO OPEN SOURCE DEVELOPMENT LET'S FREAKING GOOOOO https://t.co/y1jlQbJvsw Great person to follow if you're looking for project inspo. He's always building fun stuff.

14

781

42

680

45K

Andrew_WXY retweeted

Justine Moore

@venturetwins

11 days ago

Stumbled upon a Codex skill that creates cool illustrations to explain topics or tell stories. You feed it text (blog, article, narrative, even code) and it makes explainer graphics with this cute blob character. I gave it the repo for the X recommendation algo and got this 👇

venturetwins's tweet photo. Stumbled upon a Codex skill that creates cool illustrations to explain topics or tell stories.

You feed it text (blog, article, narrative, even code) and it makes explainer graphics with this cute blob character.

I gave it the repo for the X recommendation algo and got this 👇 https://t.co/ATB7A05YxA

69

3K

208

4K

200K

Andrew_WXY retweeted

Teknium 🪽

@Teknium

12 days ago

Resuming a session or doing `hermes -c` to reopen the most recent session will now relaunch it in the dir it was launched in originally

Teknium's tweet photo. Resuming a session or doing `hermes -c` to reopen the most recent session will now relaunch it in the dir it was launched in originally https://t.co/YKCNf1Yppf

19

227

28

58

187K

Andrew_WXY retweeted

Alessandro Frau

@alessandro_a0

13 days ago

Build your own durable LLM Wiki for free. No Docker volumes. No API keys. You don't even need to know what Obsidian is. Sign in with ChatGPT or X, and jump into Agent Zero.

5

28

64

86

63K

Andrew_WXY retweeted

Kiko Beats @Kikobeats

13 days ago

Cutting Docker build size by ~99.8% (1.87 GB → 2.5 MB) Just using https://t.co/3t0TQ3aP9Q 😌

45

2K

121

3K

329K

Andrew_WXY retweeted

dotta 📎

@dotta

29 days ago

You can now use Grok Build with Paperclip

34

163

11

37

13K

Andrew_WXY retweeted

Paperclip

@papercliping

17 days ago

📎 Paperclip 2025.529.0 is out now 💬 Document annotations & comments 🧠 Skills CLI & catalog 👁‍🗨 Hide projects & agents from your sidebar 🔑 First-admin claim flow 🤖 Live Claude model discovery https://t.co/xqrOzj9ROl

6

39

4

15

3K

wxy

@Andrew_WXY

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users