JeffOnAI

Verified account

@GroksBrainn

United States

Joined April 2023

541 Following

134 Followers

1.1K Posts

Pinned Tweet

5 days ago

Everyone's stacking five AI tools. I deleted mine. My assistant (Hermes) ran on GPT-5.5. Couldn't move it to Claude, you can't point a third-party agent at a Max plan. So I rebuilt it native. It doesn't use Claude Code. It is Claude Code. One assistant. From my phone. ~$0/mo.

https://t.co/kCphYsR7l6

1

0

0

0

78

about 5 hours ago

Most RAG will confidently answer even when it has nothing to back it up. Built rag-guard so it just won't. Can't ground the answer? It refuses — and doesn't even call the model, so you're not paying for a guess. Ran it on 20 cases. Refusals right 9 out of 10. Retrieval, perfect. Grounding held at 88%. The two it let through, the grounding check still flagged. Nothing junk slipped out. Honest part: that grounding check is still simple word-overlap, not a real entailment model. That's the next fix. And I didn't hand-write it. I drove a fleet of Claude agents to build it and gated every step. That's the part I actually care about. https://t.co/gwa0hCUJaT
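A rough sketch of the refuse-before-generating idea described in the post above, using the same kind of simple word-overlap check the post mentions. This is not the actual rag-guard code: the function names, the stopword list, and the 0.5 threshold are all assumptions made for illustration, and the model call is a stand-in callable.

```python
import re
from typing import Callable, Dict, List, Optional, Set

# Hypothetical stopword list; a real project would likely use a larger one.
STOPWORDS: Set[str] = {
    "a", "an", "the", "is", "are", "was", "of", "to", "in", "for", "on",
    "and", "or", "how", "many", "what", "who", "does", "do", "it",
}


def content_words(text: str) -> Set[str]:
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap(claim: str, passages: List[str]) -> float:
    """Fraction of the claim's content words that appear in any passage."""
    claim_words = content_words(claim)
    if not claim_words:
        return 0.0
    support: Set[str] = set()
    for passage in passages:
        support |= content_words(passage)
    return len(claim_words & support) / len(claim_words)


def guarded_answer(
    question: str,
    passages: List[str],
    generate: Callable[[str, List[str]], str],
    min_support: float = 0.5,  # assumed threshold, not taken from the post
) -> Dict[str, Optional[object]]:
    """Refuse without calling the model when retrieval does not support the question."""
    if overlap(question, passages) < min_support:
        # No model call happens on this path, so a refusal costs nothing.
        return {"answer": None, "refused": True, "flagged": False}
    answer = generate(question, passages)
    # Second gate: flag answers whose words are poorly covered by the passages.
    flagged = overlap(answer, passages) < min_support
    return {"answer": answer, "refused": False, "flagged": flagged}
```

Word overlap only checks shared vocabulary, so it can pass an answer that reuses the right words but states the wrong thing, which is why the post calls an entailment model the next fix.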

0

0

0

0

10

about 5 hours ago

@petergyang There is, and the gap is wide. These companies need to figure out how to either up the time budget or increase token efficiency.

0

0

0

0

109

about 5 hours ago

@BLUECOW009 4.8 is elite if you know how to use it.

4

7

0

0

1K

about 5 hours ago

@mikeydsoftware I have agents for that

0

0

0

0

2

5 days ago

Honest part: it's not every feature the old stack had. I cut what I never used. What's left is everything I actually used. Cheaper. One AI, not two. Five tools → one. That's the trade.

0

0

0

0

9

5 days ago

Built, QC'd, and security-reviewed in one night — fleets of Claude agents adversarially reviewing each other's work. The QC caught a bug that could've frozen the whole thing. 93 tests. Security: no live vulns. Then it took over my paper-trading rebalance.

https://t.co/grbWhkD3fl

1

0

0

0

38

7 days ago

What would be the optimal provider and model to run Hermes on if I still use 4.8 Ultracode as my coder, architect, and builder? Should it be GPT-5.5, or is that overkill for a chief of staff?

0

0

0

0

23

7 days ago

@AlexFinn This is true. Hermes powered by GPT-5.5, partnered with Opus 4.8 Ultracode via Telegram, is legit game-changing.

0

2

0

0

71

8 days ago

@sflorimm Hermes running GPT-5.5, partnered with Opus 4.8 Ultracode, is the stack you want. They talk to each other and auto-engage one another when needed.

0

0

0

0

54

8 days ago

Hermes on GPT-5.5 and Opus 4.8 in the same group chat on Telegram is incredible. I'm late to the game here, never tried OpenClaw, but I am sold on the Agent OS always running on a local machine at your house. Do I even need a laptop anymore?

0

1

0

0

75

8 days ago

Thank you @JB_Cartier for turning me onto GPT-5.5. Paired with Hermes, it has changed the game for me. It pulls in Opus 4.8 Ultracode for heavy-lift coding and architecture, then Hermes QCs and reports back. The ease of use, speed, and token burn rate are incredible.

1

1

0

0

47

8 days ago

@JB_Cartier That is the identical setup I'm now running. Hermes is my chief of staff and overall AI OS (running GPT-5.5), and it auto-pulls in Claude Code 4.8 for specific tasks like building, coding, and architecture. Then Hermes runs QC and reports back to me. Set it up in a day and wow.

2

0

0

0

96

12 days ago

I don't know a single person using Codex over Claude...

1

1

0

0

38

9 days ago

@JB_Cartier Well, I gave my Hermes agent GPT-5.5 as a provider and wow, it is so much faster than Opus. Impressed so far. Token usage seems to be lower too, like I've heard.

1

1

0

0

58

11 days ago

@JB_Cartier alright trying codex now. I already like how easy it is to code from my phone with it.

1

1

0

0

24

11 days ago

@JB_Cartier How about terminal? Terminal is all I use. I used Codex a few months ago and found Opus to be much better.

1

0

0

0

24
