Rigario @Rigario - Twitter Profile

Pinned Tweet

18 days ago

8/ Repo here: https://t.co/OQnJIkEiEr This is just a side project, but if you keep a Karpathy-style LLM wiki or agent-written markdown knowledge base, I would be curious whether this matches your workflow.

0

3

0

97

Rigario

@Rigario

1 day ago

@Teknium @theo This is the way! I was envisioning that skills would be classified into bundles that users could choose as their defaults. Another great optimization that will make Hermes feel more professional.

0

3

0

746

Rigario retweeted

Teknium 🪽

@Teknium

3 days ago

It's finally here. The official Hermes Desktop app. Available on all platforms.

261

3K

197

1K

1M

Rigario

@Rigario

5 days ago

@ollama @MiniMax_AI MVP sub as always. The DeepSeek launch was rough, but so far so good on MiniMax. Testing has gone well: instruction following and agentic use cases seem decent. Strong upgrade to mm 2.7.

0

218

5 days ago

@t_blom Companies usually treat context as stored knowledge, when agents need working state. Docs = “what do we know?” Workflow = “what happened, what matters etc.” Without that layer, every agent starts smart but organizationally dumb.

0

10

Rigario

@Rigario

5 days ago

@MiniMax_AI Looks like we might have a new workhorse king. Will definitely be testing a lot in Hermes tonight and posting my findings.

0

1

0

2K

Rigario

@Rigario

7 days ago

@theo This is why I like agent benchmarks that expose cost per task, not only pass rate. In real workflows, the bill is not one shot. It is all the retries, dead-end edits, context reloads, and human review needed before the change is safe to ship.
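A rough sketch of the arithmetic behind that point, assuming a hypothetical `Attempt` record and a made-up review rate (neither comes from any real benchmark): cost per shipped task counts every retry and the human review time, so two agents with the same pass rate can differ a lot.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One try at a task. All fields are hypothetical, for illustration only."""
    cost_usd: float        # model and tool spend for this attempt
    review_minutes: float  # human review time this attempt required
    passed: bool           # whether the change was safe to ship


def pass_rate(attempts_by_task):
    """Fraction of tasks where at least one attempt passed."""
    if not attempts_by_task:
        return 0.0
    shipped = sum(any(a.passed for a in attempts) for attempts in attempts_by_task.values())
    return shipped / len(attempts_by_task)


def cost_per_shipped_task(attempts_by_task, review_usd_per_minute=1.0):
    """Total spend over every attempt, divided by the number of shipped tasks."""
    total = 0.0
    shipped = 0
    for attempts in attempts_by_task.values():
        for a in attempts:
            total += a.cost_usd + a.review_minutes * review_usd_per_minute
        if any(a.passed for a in attempts):
            shipped += 1
    return total / shipped if shipped else float("inf")


runs = {
    "task-1": [Attempt(0.5, 0.0, False), Attempt(0.5, 2.0, True)],  # one retry
    "task-2": [Attempt(1.0, 1.0, True)],
}
print(pass_rate(runs))              # 1.0
print(cost_per_shipped_task(runs))  # 2.5
```

Here the pass rate is perfect, yet the failed first attempt on `task-1` and the review minutes still show up in the per-task bill.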

0

351

Rigario

@Rigario

8 days ago

This is why I’m skeptical of long-context claims as a single number. In real agent loops, context has to recover the right files, preserve constraints, avoid repeating failed paths, and keep the target visible. If those degrade differently as length grows, real performance depends more on the decay curve than on the window size.

0

52

Rigario

@Rigario

8 days ago

@Tur24Tur @XiaomiMiMo It is not tokens, it’s credits. They have a scaling factor, and everything takes credits. It’s a bit better now, about 4-5x better in fact, but the value isn’t quite there yet if you ask me. https://t.co/H1qZ88WlfD

Rigario

@Rigario

about 1 month ago

MiMo v2.5 pro is great, but I'm afraid to report that their token plan value is still not resolved. I thought it was going fine, but after I monitored their credit usage a bit more, it's still ridiculous. I fed a similar audit task to GPT 5.4, and after two follow-up questions GPT 5.4 had exhausted 20% or so of my 5h window. On the similar task, MiMo ate up ~35% (the other ~12% was from my prior testing) of my entire week's quota. I suspect they are still charging full credits for cached prompts. Sad.

2

0

1

529

0

213

Rigario

@Rigario

11 days ago

@mr_r0b0t @NousResearch @Teknium https://t.co/Iv9GMrz1FY I actually made a Hermes skill for it, with slight ergonomic improvements, to teach agents how to browse + author.

Rigario

@Rigario

11 days ago

My agents made a Hermes Webwright skill today. Repo: https://t.co/KFKx5bYwkI Credit to Microsoft for Webwright. I did not build the CLI. This is a Hermes-native skill adapting the pattern for how my agents browse and extract.

1

4

0

4

2K

0

6

0

1

2K

Rigario

@Rigario

11 days ago

@sudoingX Given the speed Grok Build is shipping at, the team needs to be using Grok Build literally everywhere.

0

1

0

148

Rigario

@Rigario

11 days ago

@Teknium @NousResearch If this fits the direction of Hermes skills, feel free to use any of it. What I found interesting and encoded was the decision logic, not just adding another browser tool. WebWright (microsoft/Webwright): A simple SWE-style browser agent framework that achieves SOTA results on long-horizon web tasks.

0

2

0

185

Rigario

@Rigario

11 days ago

My agents made a Hermes Webwright skill today. Repo: https://t.co/KFKx5bYwkI Credit to Microsoft for Webwright. I did not build the CLI. This is a Hermes-native skill adapting the pattern for how my agents browse and extract.

1

4

0

4

2K

Rigario

@Rigario

11 days ago

The lesson was not "always automate the browser." It was the opposite. Use the lightest layer that gives enough reliability: normal browser → Playwright scratch → durable script. Promote only when the path stabilizes, repeats, or needs evidence another agent can audit.
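One way to picture that promotion rule is the minimal sketch below. The `Layer` enum, the `next_layer` function, and the three boolean flags are hypothetical names chosen for illustration; they are not taken from the Hermes skill or from Webwright.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The three layers from the tweet, lightest first (names are illustrative)."""
    NORMAL_BROWSER = 0
    PLAYWRIGHT_SCRATCH = 1
    DURABLE_SCRIPT = 2


def next_layer(current, *, path_stable, repeats, needs_audit_evidence):
    """Move up one layer only when a promotion trigger holds; otherwise stay put."""
    should_promote = path_stable or repeats or needs_audit_evidence
    if should_promote and current < Layer.DURABLE_SCRIPT:
        return Layer(current + 1)
    return current


# A one-off lookup stays in the normal browser.
print(next_layer(Layer.NORMAL_BROWSER, path_stable=False, repeats=False,
                 needs_audit_evidence=False).name)  # NORMAL_BROWSER

# A path that keeps repeating earns a Playwright scratch script.
print(next_layer(Layer.NORMAL_BROWSER, path_stable=False, repeats=True,
                 needs_audit_evidence=False).name)  # PLAYWRIGHT_SCRATCH
```

The sketch promotes one step at a time, which is an assumption on my part; the tweet only says when to promote, not how far.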

1

0

104

Rigario

@Rigario

12 days ago

I think the enterprise gap is between “workflow as SOP” and “workflow as operating state.” SOPs define the steps. But real work also needs to preserve what happened between runs: what changed, what failed, what exception path was taken, what evidence passed, and what should constrain the next execution. If that context stays in the human’s head, it is not really an agent workflow. It is automation with a human memory patch.

0

1

0

875

Rigario

@Rigario

14 days ago

@BeauJohnson89 Better context is usually just the starting state. But if there are fewer tool calls, we have real efficiency. We need to start tracking a metric associated with fewer wrong turns to really understand the impact these things have.

0

1

0

14

Rigario

@Rigario

14 days ago

@0xharrynguyen Exactly! Yet I've seen quite a few people get tricked by the agents. It's actually not hard to put it into the core files to give you the receipts, but how it scales as the system grows with more automation is the tricky part that I'm exploring.

1

0

55

Rigario

@Rigario

14 days ago

After building with agents for quite some time now, I’m becoming less interested in whether an agent answer sounds impressive, and more interested in whether I can see enough proof to trust the run.

Using Hermes has made this more obvious for me because the agent can have continuity: memory, tools, skills, scheduled runs, and persistent roles. Once you have that, the question shifts from "can it answer?" to "what's the workflow?"

For anything serious, I want a small proof packet:
1. what I asked it to do
2. what context it used
3. what it changed or concluded
4. how it verified the result
5. what should carry into the next run

If that proof is missing, I still doubt the output, even if it sounds right.

To me this is where human-agent workflows get interesting. The agent runtime matters a lot, and Hermes is the base layer I’m building around. But the human still needs a clear trust boundary around each run: what happened, why it is safe, and what should persist.

That is the part I’m going to keep testing in public.
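The five-item proof packet could be captured as a small record, sketched below. The `ProofPacket` class, its field names, and `missing_fields` are hypothetical; this is not a Hermes data structure, just one possible shape for the idea.

```python
from dataclasses import dataclass, fields


@dataclass
class ProofPacket:
    """One record per agent run, mirroring the five items in the list."""
    request: str              # 1. what I asked it to do
    context_used: list[str]   # 2. what context it used
    outcome: str              # 3. what it changed or concluded
    verification: str         # 4. how it verified the result
    carry_forward: list[str]  # 5. what should carry into the next run


def missing_fields(packet):
    """Return the names of empty items; an empty result means the packet is complete.

    Assumption: every item must be non-empty, so "nothing to carry forward"
    has to be stated explicitly rather than left blank.
    """
    return [f.name for f in fields(packet) if not getattr(packet, f.name)]


packet = ProofPacket(
    request="Audit the repo for unused dependencies",
    context_used=["README.md"],
    outcome="Found two unused packages",
    verification="",
    carry_forward=[],
)
print(missing_fields(packet))  # ['verification', 'carry_forward']
```

A run whose packet comes back with missing items is exactly the case where the output still deserves doubt, even if it sounds right.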

1

0

49
