MAYAI

2 days ago

https://t.co/qtLhGcwhSe


about 14 hours ago

This AI agent gets smarter every single time you use it.

A guy sits at his desk and drops Hermes - an open-source agent that learns from every conversation, builds its own reusable skills, and carries perfect memory across every session. It connects to over 200 models through OpenRouter or Ollama. It links Telegram, Discord, and WhatsApp through one gateway. It runs 24/7 on a $5 VPS. You talk to it from your phone while it works in the background.

Give it a complex task - competitor research, pricing analysis, proposal writing - and watch it break the task down, execute, then automatically write a skill document capturing exactly how it succeeded and where it stumbled. Next time the same problem appears, it nails it faster.

It spawns isolated sub-agents for parallel work, sets up cron automations, connects to any MCP server, searches the web, and writes and runs code. It carries none of the agent security vulnerabilities that OpenClaw does. One install command. Persistent memory of your projects, preferences, and environment. The longer it runs, the more capable it becomes.

This is the new baseline for personal AI workers. Not a wrapper. Not a simple chatbot. An agent that actually grows with you.

2 days ago

https://t.co/qtLhGcwhSe


about 18 hours ago

27-year-old dev switched from OpenClaw to Hermes and cut workflow friction by 68%

Most people don't realize more creators are choosing Hermes agents over OpenClaw. Tested both on an MSI Stealth 16 laptop for a month. Here are three reasons Hermes wins for real work.

First, Hermes gets smarter every run. It refines its own skills, saves what works, and builds a curator that prunes dead weight. OpenClaw repeats the same fixed process. Research agents hit 79% quality on repeat tasks because they learn from every execution.

Second, Hermes snapshots the working directory before every change. One /rollback command and everything reverts if the code breaks. No manual git commits or pre-snapshot steps needed.

Third, Hermes ships fewer updates. OpenClaw drops changes nearly every week and often slows down with regressions. Hermes stays stable on 2-3 week cycles with zero breaking changes this year.

OpenClaw has the bigger ecosystem. But for daily coding, research, and agent runs, Hermes delivers the learning loop and safety net that compound fast. The agent that grows with you beats the one that stays frozen.

Follow for more on AI agents that ship real money workflows.
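The snapshot-then-rollback idea in the second point can be sketched in a few lines. This is only an illustration of the general technique, not how Hermes implements /rollback: the `snapshot` and `rollback` helpers, the `snap-0000` folder naming, and the full-directory copy are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(workdir: Path, store: Path) -> Path:
    """Copy workdir into a new numbered folder under store (hypothetical layout)."""
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"snap-{len(list(store.iterdir())):04d}"
    shutil.copytree(workdir, target)
    return target


def rollback(workdir: Path, snap: Path) -> None:
    """Discard the current workdir and restore the snapshot's contents."""
    shutil.rmtree(workdir)
    shutil.copytree(snap, workdir)


def demo() -> str:
    """Snapshot, break a file, roll back, and return the restored file text."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "project"
        work.mkdir()
        (work / "app.py").write_text("print('v1')\n")
        # Snapshots live outside the working directory so they survive a rollback.
        snap = snapshot(work, Path(tmp) / "snapshots")
        (work / "app.py").write_text("broken(\n")  # simulate a bad edit
        rollback(work, snap)
        return (work / "app.py").read_text()


if __name__ == "__main__":
    print(demo())
```

A real tool would more likely store diffs or use a copy-on-write filesystem than copy the whole tree on every change; the full copy just keeps the sketch short.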

2 days ago

https://t.co/qtLhGcwhSe


about 23 hours ago

@sandy4kad thank you for support

1 day ago

A founder unlocked serious leverage by pairing Hermes with Claude Code instead of forcing one tool to do everything.

They serve completely different roles. Claude Code dominates deep coding sessions: it lives in the terminal, reads your full codebase, follows imports, runs tests, edits files, and uses a 26-event hook system for tight loops in VS Code or JetBrains. Hermes wins on persistence: it runs 24/7 on a VPS, holds long-term memory across projects, self-improves by writing new skills, schedules plain-English cron jobs, and delivers results via Telegram, Slack, or other platforms while you sleep.

The real power comes when they work together. Tell Hermes to refactor the auth module. It pulls context from memory, spawns Claude Code as a sub-agent, lets it execute the changes across 14 files, captures the results, updates its own memory, and messages you that it's done.

Hermes orchestrates with persistent intelligence. Claude Code executes with unmatched coding depth. Memory folds back in for continuous improvement.

Most builders treat agents as competitors. She runs them as a stacked system and watches output compound while the rest fight single-player battles.

2 days ago

https://t.co/qtLhGcwhSe


2 days ago

All my Hermes Agent crew just walked into the war room for a strategy meeting on how to scale the business.

It looks like a peaceful pixel village. This is actually my live AI operation. Every agent has a dedicated home and clear role inside this Hermes-powered world:

→ Matt runs the command center as manager agent. He tracks lifetime spend at $32.52, next wake in 42 minutes, and today's 4 active tasks with full operations overview.
→ Dennis operates permanently from the research lab. He proposes high-potential ideas like occupational pride candles, niche t-shirts, agentic observability tools, AI receptionists, and local service sites. One approval and tasks cascade to the team.
→ Bunc heads straight to the factory after meetings. He designs products, generates mockups, and pushes listings live on Etsy via Printify.
→ Luke works in the media studio creating TikToks that promote every drop.
→ Maverick runs his remote camp experiments. He takes allocated capital and has 30 days to triple it.

The entire village lives on Hermes agents running on a VPS. They meet, delegate, execute, and report real metrics in real time. No more staring at terminals. Just one glance at the village shows exactly what's moving.

This is functional agentic business, not a dashboard LARP. The agents keep shipping while I watch the map grow.

2 days ago

https://t.co/qtLhGcwhSe


3 days ago

@sunaiuse interesting

4 days ago

Capital One researchers just solved the hardest problem in AI evaluation.

LLM judges give you 1 number. That number tells you nothing.

Did the model fail on facts? On relevance? On fluency? On instruction-following? The score doesn't say. You're debugging blind.

BINEVAL fixes this.

Instead of asking an LLM for 1 holistic judgment, it decomposes every evaluation criterion into atomic yes/no questions. The model answers each independently. You get a structured diagnostic - not a verdict, but a breakdown.

The result: scores that are transparent, debuggable, and directly usable for prompt improvement.

How it works:

→ A meta-prompt decomposes your task into fine-grained binary questions organized by dimension
→ An evaluator answers each question independently for every output
→ Answers aggregate into per-dimension scores plus an overall calibrated score
→ Question-level feedback feeds directly into a 2-phase prompt optimization loop

Tested on SummEval, Topical-Chat, and QAGS - BINEVAL matches or outperforms UniEval and G-Eval across the board, with especially strong results on factual consistency.

The ceiling effect problem disappears. Prior LLM judges cluster scores at the top, making it impossible to distinguish borderline outputs from clearly strong ones. Binary questions force discrimination.

The practical upside: the same feedback that scores your outputs also improves your prompts. Evaluator prompts and generator prompts both update from the same question-level signal.

Task-agnostic. Training-free. No fine-tuning required.

The paper dropped June 25. Capital One AI Foundations, accepted at the Compositional Learning Workshop at ICML 2026.
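To make the aggregation step concrete, here is a minimal sketch of turning yes/no answers into per-dimension scores. It is not the paper's method: the post does not say how BINEVAL weights or calibrates answers, so this assumes the simplest rule (fraction of "yes" per dimension, unweighted mean across dimensions). The function names and example questions are made up for illustration.

```python
from collections import defaultdict


def aggregate(answers: dict[tuple[str, str], bool]) -> tuple[dict[str, float], float]:
    """Map (dimension, question) -> yes/no into per-dimension and overall scores.

    Assumption: a dimension's score is its fraction of "yes" answers, and the
    overall score is the plain mean of the dimension scores.
    """
    if not answers:
        raise ValueError("need at least one answered question")
    by_dimension: dict[str, list[bool]] = defaultdict(list)
    for (dimension, _question), passed in answers.items():
        by_dimension[dimension].append(passed)
    per_dimension = {d: sum(v) / len(v) for d, v in by_dimension.items()}
    overall = sum(per_dimension.values()) / len(per_dimension)
    return per_dimension, overall


def failed_questions(answers: dict[tuple[str, str], bool]) -> list[str]:
    """Return the questions answered "no": the question-level feedback signal."""
    return [question for (_dim, question), passed in answers.items() if not passed]


# Hypothetical evaluator output for one summary.
example = {
    ("factual consistency", "Is every number in the summary present in the source?"): False,
    ("factual consistency", "Are all named entities taken from the source?"): True,
    ("relevance", "Does the summary cover the main event?"): True,
    ("fluency", "Is the summary free of grammatical errors?"): True,
}

per_dimension, overall = aggregate(example)
```

The list returned by `failed_questions` is the part a single holistic score cannot give you: it names the exact check that failed, which is what a prompt-revision step would consume.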


Swati Gupta

@hrswatigupta

about 1 month ago

https://t.co/HgzUUXQNWk


5 days ago

A reader posted a video about annotating books. 20 minutes later, she needed a quote to back the point. She didn't Google it. She didn't scroll Twitter. She opened Obsidian and searched her own notes. Typed "annotating" - nothing exact. Typed "reading" - closer. Typed "book" - found it. A saved clip from Ryan Holiday: "Books are not precious things. It should look like you read the book. That's how you pay respect to an author. By really engaging with the text, by making it a part of your life, by bringing it with you." Perfect supporting evidence. Already inside her vault. Already hers. This is what a second brain actually does. Not storage. Retrieval. The right idea, in under 2 minutes, from everything you've ever saved. Most people collect notes they never find again. The vault becomes a graveyard of highlights and half-finished thoughts. Obsidian with a proper tagging system turns that graveyard into a library you can search in real time - by topic, by keyword, by whatever thread you're pulling on at the moment. She wasn't looking for Ryan Holiday. She was looking for the idea. The vault gave her both. If this was useful - follow.

6 days ago

https://t.co/uCQkLCqnFA


5 days ago

Creative people with ADHD lose 80% of their ideas before writing them down. Obsidian's Daily Note feature fixes that.

1 click. New note. Today's date. That's the entire setup. Any idea that surfaces goes there immediately - in full detail, not a half-typed voice memo you'll never finish.

→ Not buried in a notes app that kills momentum
→ Not lost in a journal you won't reopen for 3 weeks
→ Right there, in the same system as everything else you've ever written

The friction disappears. The idea survives. But that's just the entry point. The real power shows up 3 weeks in.

Obsidian is self-referential by design. The more you write, the more the system starts connecting things for you.

→ Start typing a half-formed idea
→ It surfaces a note you wrote last Tuesday
→ And another one from 6 weeks ago
→ Suddenly the context was already there

What looked like a random thought becomes a fully developed concept - because the source material had been building the whole time. A small ember turns into something you can actually build on.

The folder structure locks it in. Daily notes organized month by month.

→ Scroll back to any time period
→ See exactly what you were thinking
→ Not the polished ideas - the raw ones
→ The weird, half-finished thoughts that contained the real seed

For creative people who can't keep up with their own brains, that archive isn't a nice-to-have. It's the whole point.

6 days ago

https://t.co/uCQkLCqnFA


6 days ago

Loop Engineering just replaced prompt engineering

Most people are still writing better prompts. The practitioners who figured this out in June 2026 stopped prompting entirely.

Prompt engineering teaches you to write the perfect input. Loop engineering teaches you to build the system that writes inputs for you.

The shift is one sentence: you no longer feed the agent line by line. You design the loop that feeds itself.

Adam Osmani named it. Boris Cherny and Steph Ango surfaced it at the same moment in the same week of June 2026. 3 people reaching the same conclusion independently is not a coincidence. It is a threshold being crossed.

What a loop actually is.

6 moves. Discovery, handling, verification, scheduling, persistence, realization.

The agent runs one turn. Evaluates its own output. Decides what to do next. Runs again. The human is outside the loop entirely: not directing, not reviewing line by line, not feeding prompts.

The scarce resource is no longer the prompt. It is judgment. Where to point the loop. When to stop it. What counts as done.

Why this matters right now.

The paper surveyed 3 loops running in production.

An engineer's morning triage that merged into Stripe's enterprise-scale pipeline. Machine-written pull requests landing at 100 per week.
A cost that accrues silently - verification debt, comprehension rot, cognitive surrender, token blowout.

The same loop built by 1 person and built by 2 people yields opposite outcomes. Not because the code differs. Because judgment was distributed differently.

Loop engineering is not a technique. It is a position shift.

The old world: you sit at the keyboard and direct.

The new world: you build the thing that sits at the keyboard instead.

The practitioners who figured that out in June 2026 are not writing prompts anymore.

They are writing loops that write the prompts for them.


11 days ago

https://t.co/a45ywIXf1W


8 days ago

Nobody explains what an agent harness actually is. Here's the version that finally makes sense.

The AI model is a brain in a jar. Knows everything. Can do nothing. No hands. No eyes. No memory of what it did 30 seconds ago. The harness is the body you build around it.

What a harness is made of

3 components. Every harness, every framework, every "autonomous agent" you've seen is some combination of these 3 things.

Tools - the hands

The model cannot touch anything without tools. It cannot search the web, run code, edit a file, call an API, or send a message. Tools are functions you give the model permission to call. The model decides which tool to use and when. The harness executes the actual call and returns the result. Without tools, the model answers questions. With tools, it takes actions. That's the entire difference between a chatbot and an agent.

The loop - the heartbeat

A single prompt gets a single response. That's not an agent. An agent thinks, acts, checks the result, and decides what to do next. Then repeats. Until the goal is complete or something breaks.

    while goal_not_complete:
        decide what to do next
        use a tool to do it
        read the result
        update what you know

The loop is what makes the harness autonomous. Without it, you're still the one pressing send after every step.

Memory - the notepad

Models have no persistent memory by default. Every new conversation starts from zero. The harness solves this by maintaining state the model can read and write to. What did it do last session. What did it learn. What does it still need to do. Some harnesses store this in a file. Some in a database. Some in a markdown vault. The format matters less than the fact that it exists at all. Without memory, your agent is a goldfish. Capable for 30 seconds. Starting over every time.

The 3 harnesses worth knowing right now

OpenClaw - proactive by design, community skill library, messaging built in. The most social harness. Gets you running fast. Audit the community skills before you trust them.

Hermes - the one that compounds. After completing a task, it writes a skill so next time requires less prompting. The longer you use it, the less you have to explain yourself. That's a real advantage, not a marketing line.

Obsidian + Claude Code - the memory-first setup. Your vault is the notepad. Claude Code reads it, writes to it, references it across sessions. Best for knowledge workers who want persistence without complexity.

All 3 are early. All 3 will improve significantly as models improve.

Why this matters more than most people realize

The brain is getting better every few months. GPT-4 to Claude 3 to Claude Opus 4. Each jump is the model getting smarter, faster, more capable. But a better brain in a worse body is still limited. The harness is the multiplier. A mediocre model in a well-built harness outperforms a frontier model with no tools, no loop, and no memory.

The people investing time in harness architecture now are building on infrastructure that compounds. Every model upgrade makes their setup better automatically. The people waiting for the perfect model before they build anything are waiting for a condition that will never arrive.

What you should do with this

Pick 1 harness. Run it on a real workflow you repeat at least once a week. Do not switch until it fails you in a specific way. When it does fail, you'll know exactly what to fix. That's when building your own starts to make sense.

The brain in a jar is not the product. The body you build around it is.
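The three components fit in one small runnable sketch. Everything here is a stand-in chosen for illustration, not any real harness's API: `fake_model` replaces the LLM call, `TOOLS` holds a single toy tool, and memory is a JSON file (one of the storage options mentioned above).

```python
import json
import tempfile
from pathlib import Path


def add(a: float, b: float) -> float:
    """A toy tool. Real harnesses expose search, file edits, API calls, and so on."""
    return a + b


TOOLS = {"add": add}  # tools: functions the model is allowed to request


def fake_model(goal: dict, memory: dict) -> dict:
    """Hypothetical stand-in for the LLM: chooses the next action from goal + memory."""
    if "sum" in memory:
        return {"done": True, "answer": memory["sum"]}
    return {"done": False, "tool": "add", "args": goal["numbers"], "save_as": "sum"}


def run_agent(goal: dict, memory_path: Path, max_steps: int = 5) -> tuple[float, int]:
    """Run the loop; return the answer and the number of steps it took."""
    # Memory: load whatever earlier sessions wrote down.
    memory = json.loads(memory_path.read_text()) if memory_path.exists() else {}
    for step in range(1, max_steps + 1):
        decision = fake_model(goal, memory)                   # decide what to do next
        if decision["done"]:
            return decision["answer"], step
        result = TOOLS[decision["tool"]](*decision["args"])   # harness executes the tool
        memory[decision["save_as"]] = result                  # update what you know
        memory_path.write_text(json.dumps(memory))            # persist across sessions
    raise RuntimeError("step budget exhausted before the goal was complete")


def demo() -> tuple[tuple[float, int], tuple[float, int]]:
    """Run the same goal twice against one memory file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "memory.json"
        first = run_agent({"numbers": [2, 3]}, path)
        second = run_agent({"numbers": [2, 3]}, path)
        return first, second
```

In `demo()`, the first session needs two steps (call the tool, then finish), while the second finishes in one because the result is already in the memory file. The `max_steps` cap covers the "or something breaks" case so the loop cannot run forever.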

Sandy4ka

@sandy4kad

8 days ago

https://t.co/kw61QsKpPB


9 days ago

1 developer. 4 Claude agents running in parallel. 0 merge conflicts. This is how serious loop engineers are scaling output right now.

Most people using Claude Code are still running 1 task at a time. Prompt, wait, review, repeat. That's not a loop. That's a slower version of doing it manually. The shift happening right now: Git worktrees. Here's the setup that's being used to handle GitHub issues at scale.

What worktrees actually do

When you run multiple agents in a single repo, they step on each other. Agent 1 edits a file. Agent 2 edits the same file. Everything breaks. Worktrees solve this by giving each agent its own working directory, checked out on its own branch of the same repository. They're running in parallel but completely separated. No conflicts. No overwrites. No waiting. Boris Cherny, the creator of Claude Code, talks about this as the foundation of any serious multi-agent setup. Isolation is what makes parallelism safe.

The loop in action

The workflow kicks off 4 Claude agents simultaneously, each assigned a different GitHub issue.

→ Agent handles the issue
→ Loop validates the PR was actually created
→ Human reviews if needed
→ 4 more agents run code review on each PR

That's 3 phases. Handle, validate, review. Running across 4 issues at once. What used to take a developer most of a day runs while they're in meetings.

Why human-in-the-loop still matters

The validation step isn't just for catching Claude's mistakes. It's the checkpoint that keeps the loop from compounding errors at scale. 4 agents running wrong is 4x worse than 1 agent running wrong. The human review gate between phases is what makes aggressive parallelism safe. You're not removing yourself from the process. You're repositioning yourself: from doing the work to approving the work. That's the real shift.

The scalability math

1 developer running sequential Claude Code tasks: linear output. 1 developer running 4 parallel agents with worktrees and validation loops: output that doesn't scale with hours worked. The ceiling isn't your time anymore. It's your ability to design the loop architecture and review what comes out of it.

The developers building this infrastructure right now aren't writing more code. They're writing the systems that write code for them.
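As a rough sketch of the isolation step, the helper below builds one `git worktree add` command per issue, so each agent gets its own directory and branch. The branch naming (`agent/issue-N`), the sibling-directory layout, and the `main` base branch are assumptions for illustration. The function only returns the commands and does not run them or launch any agent.

```python
from pathlib import PurePosixPath


def worktree_commands(repo: str, issues: list[int], base: str = "main") -> list[list[str]]:
    """Build `git worktree add` commands: one isolated checkout per issue.

    Each command creates a new branch from `base` and checks it out in a
    sibling directory, so parallel agents never share a working tree.
    """
    repo_path = PurePosixPath(repo)
    commands = []
    for issue in issues:
        branch = f"agent/issue-{issue}"                                   # hypothetical naming
        checkout = repo_path.parent / f"{repo_path.name}-issue-{issue}"   # sibling directory
        commands.append(
            ["git", "-C", str(repo_path), "worktree", "add", "-b", branch, str(checkout), base]
        )
    return commands


commands = worktree_commands("/srv/app", [101, 102, 103, 104])
```

Each list can be passed to `subprocess.run(cmd, check=True)` once you are happy with the layout. Keep in mind that isolation prevents agents from overwriting each other while they work, but two branches that touch the same lines can still conflict when merged, which is one reason for the review gate described above.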

11 days ago

https://t.co/a45ywIXf1W


10 days ago

Most people think they're "using AI" because they open ChatGPT every morning. They're not. They're just prompting. There's a difference. A loop is what happens when you remove yourself from the process entirely. Here's the technical definition nobody gives you: a loop is a cron job with an AI agent inside it. A cron job is a task that runs on a schedule — every hour, every Sunday, every day at 6 PM. Cron jobs have existed for decades. The only thing that changed is what's running in the middle. Before: code. Now: an agent that decides what to do. One content creator built an entire social media management system out of loops. One of those loops watches his niche 24/7 and drops video ideas into his inbox on a schedule. He didn't open anything. He didn't prompt anything. A clock started, the agent ran, and the result arrived. That's a loop. Here's the test: if you still have to open a chat window and type something for the task to happen, you don't have a loop. You have a fast chat session. Saved instructions aren't loops either. They just make prompting faster. A real loop removes the conversation entirely. The simplest loop you can build today: open any AI assistant and say — "Help me create an automated task. Every Sunday at 6 PM, message me 3 things I can prep for the week." The AI walks you through the setup for that specific platform. You close the app. Sunday at 6 PM, the message hits without you doing anything. That's your first loop. Most people will read this, nod, and go back to prompting manually 40 times a day. The ones who build the loop stop doing that forever. The gap isn't skill. It's one Sunday afternoon.
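For the "every Sunday at 6 PM" example, the schedule itself is the easy part. In standard five-field cron syntax it is `0 18 * * 0`. The sketch below computes the same next-run time in Python so the logic is visible; the function name is made up, and what runs at that moment (the agent call and the message delivery) is left out because it depends on the platform.

```python
from datetime import datetime, timedelta

# Five cron fields: minute, hour, day of month, month, day of week (0 = Sunday).
CRON_EXPRESSION = "0 18 * * 0"


def next_sunday_6pm(now: datetime) -> datetime:
    """Return the next Sunday 18:00 strictly after `now`."""
    days_ahead = (6 - now.weekday()) % 7  # datetime.weekday(): Monday is 0, Sunday is 6
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=18, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # It is already Sunday at or past 6 PM, so wait for next week.
        candidate += timedelta(days=7)
    return candidate
```

On a server, the cron expression alone is enough: one crontab line pointing at a script that calls your agent gives you the loop described above, with no chat window involved.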

11 days ago

https://t.co/a45ywIXf1W
