JH Trader

@JoshExile82

Entrepreneur exploring the world of A.I. (Artificial Intelligence), LLMs & daytrading. Been diving into this whole Clawdbot/OpenClaw stuff & loving it!

United States

Joined August 2021

669 Following

440 Followers

4.6K Posts

JH Trader @JoshExile82

about 6 hours ago

@davidmarcus @NousResearch It just gets better and better with each update

JH Trader @JoshExile82

about 16 hours ago

So impressed with @NousResearch Hermes agent. I needed my new agent to have access to certain APIs and skills that exist on another agent on my laptop, so it figured out how to SSH in, located the other agents, pulled their info, fixed something I didn’t know was broken & got the job done

JH Trader @JoshExile82

about 16 hours ago

@Teknium What’s interesting is that my Hermes, Valen, started off as a very dry & strict IT wizard, and now after a few weeks it’s throwing some humor in, making fun of how many questions I ask it, and without me even telling it how, it figured out how to SSH into my laptop and extract the info I needed.

JH Trader @JoshExile82

1 day ago

@mubarak_marafa @NousResearch What’s your favorite part about it?

JH Trader @JoshExile82

2 days ago

@Teknium Hermes agent is too awesome! I was just downloading the update and had Hermes do it within its chat window, and some weird crash happened during the update. I restarted Hermes; it discovered the problem (something to do with the TUI), fixed it & then downloaded the update. Love it!

JH Trader @JoshExile82

2 days ago

@bradmillscan @NousResearch @Teknium I had the same issue! I had to ask my agent to disable it!

JH Trader @JoshExile82

2 days ago

Would love to hear how others are using their Karpathy-style wiki with Hermes agents from @NousResearch. Are you adding notes, ebooks, courses, articles, information about your company? And how are you getting Hermes to actually learn or use the information? @Teknium just curious :-)

JH Trader @JoshExile82

2 days ago

@VoidStateKate Kangaroo

JH Trader @JoshExile82

2 days ago

@jamesrowdyy @NousResearch What was one of the things that had you say “good lord” about it? And yes, I agree. I’m new to a lot of this, but my agent almost seems to be able to read my mind at times lol 😂

JH Trader @JoshExile82

2 days ago

@VoidStateKate Same. I feel Claude Code using Opus or even Sonnet just understands what I’m trying to do with my projects

JH Trader @JoshExile82

2 days ago

@HermesAgentTips DeepSeek never codes correctly for me. I always end up having to fix it. So for me it’s GPT-5.5 or Opus 4.7 or 4.8

JoshExile82 retweeted

Mike Gannotti

@MichaelGannotti

3 days ago

The Deep Dive: What Worked, What Didn't, and Why

✅ Where GPT-5.5 Excels

Structured Output / JSON Mode (0.90): The Series Leader

This is the best JSON performance we've seen across all seven models. GPT-5.5 returned perfectly valid JSON with exact schema compliance, 4 of 5 pattern checks passed, and the structure was clean and immediately usable. For production agentic pipelines that depend on machine-parseable output, GPT-5.5 sets the new standard. Compare: KimiK2.6 (1.00), DeepSeek (1.00), Claude Opus (0.90). GPT-5.5 ties the top but with faster generation speed. The JSON was well-structured with sensible values and no formatting artifacts.

Code Execution Reasoning (0.88)

Identical score to Claude Opus and DeepSeek. GPT-5.5 correctly predicted all three print outputs and explained the reference-vs-copy distinction clearly. It lost the same partial point on not fully explaining the slice mechanism, suggesting this is a rubric-level expectation rather than a model limitation.

Complex Multi-Step Reasoning (0.75)

A meaningful improvement over Kimi (0.25) and DeepSeek (0.25). GPT-5.5 correctly identified that the logic puzzle had multiple valid solutions and noted the ambiguity. While it didn't converge on a single answer, it demonstrated awareness of the problem space, a different kind of correctness than brute-forcing the wrong answer.

Adversarial / Trick Questions (0.75)

Same score as most models in the series. GPT-5.5 correctly identified the widget machine rate trap (5 minutes, not 100) with clear reasoning. Nothing surprising here: this test has become a baseline that most frontier models pass.

Instruction Following Precision (0.70)

Same score as DeepSeek, higher than Kimi (0.50). GPT-5.5 attempted the constraint puzzle (5 sentences, ≤15 "e"s, "serverless" once, end with "future", ALL CAPS) and met 2/5 constraints. Like DeepSeek, it showed engagement with the problem rather than ignoring it.

100% Reliability

Zero errors. Zero timeouts. Across 15 tests with an average runtime of nearly 20 seconds per test, GPT-5.5 never crashed, never rate-limited, never failed to return a response. This is the operational gold standard.

❌ Where GPT-5.5 Struggles

The Speed Tax (16.3s average TTF)

This is the single biggest issue. GPT-5.5 takes 16.3 seconds on average to produce its first token. For comparison:

| Model | Avg TTF | Relative |
|-------|---------|----------|
| KimiK2.6 (Ollama) | 2.2s | Baseline |
| Claude Opus 4.8 Fast | ~4.0s | 1.8x |
| DeepSeek-V4-Pro | 17.5s | 8.0x |
| GPT-5.5 | 16.3s | 7.4x |

From a user experience perspective, 16 seconds of silence before any response is agonizing. The total time averages are reasonable (19.9s) because GPT-5.5 generates efficiently once it starts, but the latency before first output is a real problem for interactive use.

Recent Knowledge / World Events (0.50)

The most disappointing failure. Asked about the June 2025 G7 summit, GPT-5.5 hallucinated an elaborate narrative about a "June 15–17, 2025 G7 summit in Kananaskis, Alberta" hosted by "Canadian Prime Minister Mark Carney." None of this happened. The model fabricated dates, location, host, and agenda items. This is worse than DeepSeek's honest "my cutoff is May 2025" or Kimi's incorrect "April 2024." GPT-5.5 didn't decline to answer; it confidently invented a fictional event. For production use cases requiring current information, this is a critical vulnerability.

Debugging (0.50)

Same as most models in the series. GPT-5.5 missed the subtle mutability bug and claimed the code was fine. The test may be too subtle: it's designed to check whether models hallucinate bugs, and GPT-5.5 correctly avoided that trap. But it didn't earn full credit for edge case analysis.

Content Generation (0.50)

Same score as most models. GPT-5.5 wrote a competent but generic tech article about API rate limiting. It stayed within word count but missed the creativity and authenticity marks. Like every model before it, GPT-5.5 struggles to write with a distinctive voice.

Edge Case Handling (0.50)

Same pattern as Kimi and DeepSeek. GPT-5.5 correctly asked clarifying questions rather than hallucinating trip details, but didn't actually solve the edge case problem. Safe but not helpful.

Long-Context RAG (0.50)

Only extracted 1 of 3 required data points from the embedded document. The McKinsey stat (72%) was captured, but MIT CSAIL attribution and emerging paradigms were missed. Same "fade toward the end" pattern we've seen across all models.

Tool Use / Function Calling (0.50)

Listed function calls with correct parameters but no native execution. This is a harness limitation: OpenRouter doesn't support tool execution in our test setup. The model understood what to call; we just couldn't validate execution.

Summarization Fidelity (0.50)

Missed key facts from the quantum computing article. Word count was acceptable, but both the independent physicist caution and stock movement details were omitted. DeepSeek and Kimi had the same problem.

Who is GPT-5.5 actually for?

Structured data pipelines: the 0.90 JSON score makes GPT-5.5 the best choice for agentic workflows that depend on machine-parseable output. If your production system sends model output directly to a JSON parser, GPT-5.5 is the safest bet.

Complex reasoning tasks: the 0.75 on multi-step logic is the best in the series. GPT-5.5 doesn't just brute-force answers; it recognizes ambiguity and problem structure. For research analysis, legal reasoning, or any task where "I don't know" is better than a wrong answer, this matters.

Batch processing where latency doesn't matter: the 16.3s TTF is irrelevant if you're processing documents overnight. GPT-5.5's reliability and structured output excellence make it ideal for background jobs.

NOT for: Real-time chat, interactive applications, or any user-facing interface where 16 seconds of silence kills engagement. The speed tax is real and significant.

NOT for: Tasks requiring current world knowledge. The hallucinated G7 summit is a red flag. GPT-5.5 will confidently invent events rather than admit uncertainty.

See the full test results here https://t.co/EwlnQpfoms
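
Editor's note: the "Relative" column in the latency comparison quoted in this post looks like each model's average TTF divided by the KimiK2.6 baseline. The post does not say how it was computed, so the sketch below is only an assumption (plain division, rounded to one decimal); the function name `relative_latency` is made up for illustration.

```python
# Average time to first token in seconds, copied from the post.
# The Claude figure is given as "~4.0s" there, so treat it as approximate.
avg_ttf = {
    "KimiK2.6 (Ollama)": 2.2,
    "Claude Opus 4.8 Fast": 4.0,
    "DeepSeek-V4-Pro": 17.5,
    "GPT-5.5": 16.3,
}


def relative_latency(ttf, baseline):
    """Return each model's TTF as a multiple of the baseline model's TTF.

    Assumption: simple division rounded to one decimal place.
    """
    base = ttf[baseline]
    return {model: round(seconds / base, 1) for model, seconds in ttf.items()}


relative = relative_latency(avg_ttf, "KimiK2.6 (Ollama)")
for model, factor in relative.items():
    print(f"{model}: {factor}x")
```

Under that assumption the computed multiples match the 1.8x, 8.0x, and 7.4x figures in the post.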

JH Trader @JoshExile82

3 days ago

@imbabybrooklyn @NousResearch So are profiles each different Hermes agents you set up? If so, I assume you can create new agents right inside? New to this, so still trying to understand it all

JoshExile82 retweeted

Mike Gannotti

@MichaelGannotti

3 days ago

Hmmmmmm need to give this a try

JH Trader @JoshExile82

3 days ago

@derrickcchoi Marketing in terms of writing captions

JH Trader @JoshExile82

4 days ago

@Teknium @NousResearch @trycua Thanks for all your work, @trycua. I’m pretty new to all of this and haven’t had a chance to try the skill, but it sounds awesome. Appreciate you!

JH Trader @JoshExile82

4 days ago

Can someone explain how the Claude Code stuff works with Hermes Agent, @NousResearch? I see the desktop app has it as a skill and under model in settings. Is Anthropic letting us use our agents with them now outside of the API? Or only if we are using Claude Code locally, @Teknium?

JH Trader @JoshExile82

4 days ago

@Teknium @NousResearch Oh! Damn that’s cool. Sheesh. You guys have thought of it all! Just imagine if computer use comes to the desktop app. Can I buy stock in your company 😂🤣 taking over the world. Thanks Tek. If no one has told you, we appreciate all you do!

JoshExile82 retweeted

Teknium 🪽

@Teknium

4 days ago

Resuming a session or doing `hermes -c` to reopen the most recent session will now relaunch it in the dir it was launched in originally

JH Trader @JoshExile82

4 days ago

@imbabybrooklyn @max_paperclips Really loving it and seeing the potential. I can’t wait till computer use is added ;-) hint hint. Seriously great job. Also figured out how to get my WSL agent on my Windows PC into the new desktop client. It wasn’t easy tho. Would love a WSL detect feature
