Pax Beach

4 days ago

The Human at the Console: Procrastination, Switching, and the Long Game

I closed 340 tasks in Arcanada's second month. And I spent almost all of that time at the computer: eighteen to twenty hours a day, often through the night.

I'm not writing this as a complaint. The motivation hasn't gone anywhere: I still get up and reach for the system with the same drive I had in April. But the second month was the first time I seriously ran into two things I used to only read about in other people: procrastination and the edge of burnout. And I think I found a way to live with both without easing off the pace.

This is the closing article of the cycle. The most personal one. No digital fetishes, no productivity-guru pose. Only what happens to a person who built an autonomous ecosystem and is still chained to the console, by his own choice.

Let me say it plainly upfront: the kind of autonomy where you can stand up and walk away, I don't have yet. It's the goal I'm moving toward, and moving hard. But the honest report for the month is that I'm still at the desk. I've just learned not to burn out there.

Procrastination: why I run from the big into the small

In programming there's the idea of "procrastination by setup." Instead of writing the feature, you tune the editor. Instead of closing the epic, you rename a variable that has annoyed you for six months. A small task gives quick dopamine. A big one gives anxiety that it won't fit in the context window.

In the second month I caught a clear pattern in myself: if an L4 task (a week-long epic) sat in the backlog, I would start cleaning archives, rewriting READMEs, fixing CLAUDE.md formatting. Fifteen minutes, and I had "done something." In reality I had run away.

I'm not proud of it, but I don't beat myself up either. It's just mechanics: the brain picks easy dopamine. My job as the operator isn't to scold myself but to reshape the process so the big stuff doesn't scare me and the small stuff doesn't disguise the escape.

The numbers confirm the work got harder: among labeled tasks, the share of complex ones (those with a mandatory PRD or multi-phase epics) rose from 29% to 33%. But more energy went into them not because they're harder in themselves, but because I took longer to start. Procrastination isn't about time. It's about the courage to step over the threshold.

Anti-burnout: I switch, I don't stop

The main discovery of the month wasn't technical. I figured out how not to burn out without lowering the load.

I have hundreds of projects in my head. The ecosystem registry holds 26 of them right now, and more than twenty are live: with tasks, servers, deadlines of their own. I used to think this was a problem: spreading myself thin, never finishing. In the second month it turned out to be exactly what saves me.

The mechanic is simple. When I catch myself procrastinating, or feel worn down on one track, I don't force it. I switch to another project. When I'm tired, I take what brings joy: something light, with a result visible within an hour. When I'm full of energy, I take what needs doing, even if it has dragged on and been an eyesore for ages. The fatigue from one thing gets cancelled by interest in another, and the boring-but-important piece gets done with a fresh head.

This isn't discipline in the usual sense. It's routing myself: the same thing my orchestrator does with tasks between agents, only inside my own head. And it's what keeps me going eighteen hours a day without that grey "well, again" that crept in mid-month.

There was an evening when I stared at the task list and couldn't make myself open a single one. Not because it was hard. Because everything had blurred into one: prompt, code, review, archive, commit, prompt again. The old me would have pressed harder and sunk deeper. This time I just closed the current project and opened another, one I'd been wanting to touch for a while. Twenty minutes later the drive was back.

Over the second month I closed 340 tasks, almost the same as the first (341). The pace was the same; it cost more. And it was switching that carried me, more than anything else.

Sport and family are there too, and they help: the gym three times a week to reset the head, not for the body; dinners with my wife without a single word about Claude limits. But honestly, the main lever is moving between projects. The rest holds the background; this is what pulls.

Autonomy: a goal, not a fact

Here I have to confess to what isn't there. An early draft of this report had a pretty scene: as if I'd left town for the weekend with just a phone, and the system closed dozens of tasks without me. It sounds like the perfect ending to the "release the reins" arc. But it would be a lie, and I promised to write without dressing things up.

The truth is this: I'm still at the console. Every day, for many hours. The orchestrator carries a task from initialization to archive, but behind every deadlock, every failure, every "the agent did the wrong thing," it's still me standing there. Running Arcanada from a phone, dictating a task by voice and going for a walk: that's what I'm moving toward, and seriously. The plan is next month. But today it's a dream, not a screenshot from life.

In the first article of the cycle I wrote about the rollback: about returning to manual control after trying to let go. It felt like defeat then. Now I see it as a stage: that rollback was needed to build a system that will one day hold the reins without me. "One day" is the key word. Not "already."

What I kept for myself and what I gave to the agents

I didn't build a perfect system. The orchestrator still stumbles on patches with locks. The implementer agent sometimes writes code that has to be rewritten. Coworker drops a session when a provider goes down. But I drew a boundary.

I kept for myself the things that need a human: writing articles, reflection, architectural decisions like "which service to launch next," talking to people, sport, family. I gave the agents what I used to do with my fingers: code implementation, data validation, routine cleanup, writing tests, deployment, monitoring.

The numbers here aren't for boasting but for scale: over the second month my datarim agent handled 1,345 calls and generated 5.1 million output tokens. For $14. I would have spent a year writing that much text by hand. Each of those 1,345 times is a piece of work I didn't have to do myself, and time freed up for what I actually want to push forward.

What it's all for: the long game

I'm not calling on anyone to "drop everything and go autonomous." I'm not promising that agents will save humanity. And I'm certainly not playing the productivity guru: I'm up to my neck in work eighteen hours a day myself.

But this month I understood what I'm holding on for, and why I don't burn out even at this pace. It's the scale of the goal. I set myself plans that, honestly, I may not finish in a lifetime. Hundreds of projects, an ecosystem bigger than one person. And that doesn't frighten me. It holds me up, on one condition: that you understand how such a plan can be handed on, after you. Not "get it all done," but "build it so others continue."

And after that, the simple part. A huge, maybe unreachable plan breaks into a myriad of small pieces. And every day you do one piece. One step. Then another. Not for a checkmark and not for an output metric, but for the quiet pleasure of today coming out a little better than yesterday.

That, it seems, is the real protection against burnout. Not walking away from the desk; I still have a way to go before that. But having something at the desk worth sitting for, and having each day bring one small, honest win.

I don't know what the third month will bring. I know one thing: tomorrow I'll sit back down at the console, open one of the twenty projects (the one my heart leans toward, or the one that has waited longest) and do another piece. And most likely, I'll enjoy it.

This concludes the cycle on Arcanada's first and second months. And the next publication is already tomorrow, about the lessons of the third month. I'm waiting for it eagerly; I can't wait to share what has come of it.


paxbeach retweeted

5 days ago

Angry Robot Deals: How Agents Revived a Project That Was Draining Me

Roughly sixty percent of my ecosystem's capacity is tied up in one project right now. Not the Munera data center, not commercial products like Transcribator or Verdicus, not the Datarim framework I released under MIT. An old, personal, analytical trading project I nearly abandoned a year and a half ago because it was burning me out. It is called Angry Robot Deals. And it is alive again, thanks to the agents I hired in April.

Up front: I won't disclose the specifics (trading pairs, keys, server addresses, amounts). This is my private space, a separate GitHub organization with no outside access. But I will tell the story of how agents revived a project I had written off as dead weight, because it is the best illustration of "releasing the reins" I can offer.

What had piled up in the abandoned project

Angry Robot Deals started many years ago as my attempt to build an analytical system for tracking markets. In real time: terabytes of data from crypto exchanges and Forex, trading signals, financial news, algorithmic bots with risk management, neural networks, and later large language models. The project brings in modest income from algorithmic trading, enough to help keep the infrastructure running. But I built it first and foremost as research: working with large datasets and forecasting markets.

By early 2025, though, I had slowed it down almost completely. Not because the system had stopped working: the bots kept trading, and one of them has been running on Forex for a year, which is a serious stretch for automated trading. The problem was maintenance. CI/CD that broke every week. Data pipelines that needed manual restarts. Databases that had to be cleaned. Servers that had to be patched. Monitoring that did not exist. Everything I could not get to between Datarim, Arcanada infrastructure, and commercial projects.

It was not about the money. The project was falling apart, and that was what gnawed at me. I felt guilty: toward it and toward myself, because I had poured years into it.

How one agent replaced a month of manual work

The first thing I did once Datarim reached a stable state in May was create a dedicated agent for Angry Robot Deals. A separate profile, separate rules, access to the project repositories, rights on the servers. I gave it a task that had sat on my list for six months: fix the build-and-deploy pipeline that broke on every other code push.

The agent tore through the configuration in four hours (a human would have spent two days digging through logs) and found three problems: an outdated task runner, a dependency conflict, and no caching. It fixed them, verified on a separate branch, merged to main. Without my involvement.

I gave it a second task: set up proper server monitoring. The agent checked every machine in the project, found two with critically low disk space, one with an outdated kernel, several with open ports. It assembled a metrics configuration and dashboards. Deployed them. I only confirmed the rollout after a security review.

The third task was a data pipeline that crashed every three days. The agent rewrote the error-handling logic, added retries with exponential backoff, and set up Telegram alerts on failure. After that the pipeline did not crash for two weeks (I checked).

When I added it up, the agent had done in a week what would have taken me a month and a half to two months. And it did it better: no fatigue, no "oh, I'll fix it later," no procrastination.

Why maintenance was draining

It was not technical complexity. Pipelines, bots, CI/CD: all solvable. The problem was the mental load of upkeep. That same feeling of rollback, procrastination, and exhaustion I wrote about honestly in the first articles of this cycle.

Every time I sat down with Angry Robot Deals, I knew what waited: not a new strategy, not interesting analytics, but clearing someone else's (my own) mess. Logs to read through. A database to optimize. A pipeline to restart. A server out of disk space. On repeat.

I am not saying this was not my responsibility. It is my project; I built it; I should maintain it. But when you layer Datarim, commercial products, and Arcanada infrastructure on top, the brain simply refuses. And the project stalls. The bots keep trading, and you feel bad because the system runs without your oversight and you cannot give it time.

Now I have handed maintenance to an agent. I check the dashboards every few days. If something breaks, the agent fixes it or asks for confirmation on a critical change. I am not in the logs, not digging through kernels, not restarting pipelines at three in the morning. And it stopped being a source of burnout. The project lives, and even improves, without my direct involvement.

Where Arcanada's money comes from

I promised in the first article of the cycle to talk about the economics honestly. Angry Robot Deals is not the main source of funding for Arcanada, but it is a tangible one: what it earns is enough to pay for servers and, in part, for the models. I won't name exact figures, and that's not the point anyway. What matters more is this: the project pays for part of its own infrastructure, and that is what gives me the drive to keep it going.

Important caveat: I do not trade with large sums. The strategy is built on small stakes and statistical edge, not on risk. The longest-running bot is on Forex: a year without blowing up. The others have blown up sooner or later over the years, but the strategy keeps improving. I am not giving financial advice and I am not urging anyone to copy this. It is my lab, not an investment recommendation.

But the fact stands: roughly 60% of Arcanada's ecosystem capacity is tied up in Angry Robot Deals right now. Terabytes of data, dozens of pipelines, bots on neural nets and large language models, and an agent watches over all of it, not me. A project I thought was frozen is working again and delivering value. That is exactly it: agents bring old legacy back to life when you never had enough hands before.

Next step: hand the public channel to the agents

Angry Robot Deals has a Telegram channel: https://t.co/5aV4Uh0Bud. It is old; I created it for the community once, then let it go: no time for posts, analytics, insights.

I made a decision that was the hardest for me this month: I will hand this channel to the agents. Fully. Completely autonomously. They will publish insights, bot results, data analytics. Which ones, I do not know yet. I will not edit, approve, or pre-check before publication.

This is the same "release the reins" that runs through the whole cycle. First I delegated internal routine: CI/CD, pipelines, monitoring. Then server upkeep. Then technical decisions. Now I am handing over the most irreversible thing: publication.

Because a Telegram post cannot be "rolled back." People will see it. It builds reputation: mine, the project's, the ecosystem's. If an agent gets it wrong, it will be a public mistake. If it writes something foolish, everyone will see. And I decided: let it. Because without this step I will not learn how much I can actually trust the agents. And without trust, there is no autonomy.

It is the same principle I applied to Datarim and Coworker when I released them under MIT: if you are not ready to lose, do not hand it over. If you did, accept that things may go wrong. And see what comes of it.

What I learned

The main lesson of the month: agents pull up not only new projects but old ones you had long written off. Angry Robot Deals was not dead technically: the bots worked, pipelines moved data, returns were there. It was dead for me: I could not find the strength for maintenance. The agent brought back not the code but my ability to engage with it.

The second lesson: trust is built in small steps. First fix CI/CD. Then set up monitoring. Then rewrite the pipeline. Then a public post. Each step confirms the agent can handle it, and you release control gradually, not all at once.

The third lesson: irreversibility is the best test of trust. A channel run by agents is an experiment I cannot "cancel" if I dislike something. I can only retune the rules and watch what happens next. And that is fine.

I do not know what the agents will write on https://t.co/5aV4Uh0Bud tomorrow. But I know the project that drained me for a year and a half now runs without my direct involvement, and delivers value. That is worth it.

If you see a strange analytics post in the channel, know this: I did not write it. My employee did, the one I trusted with the project's public face. And I will watch the outcome the same way you do.
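The third maintenance task in the article above (retries with exponential backoff plus an alert on final failure) can be illustrated with a short sketch. This is not the agent's actual code, which the article does not show. The function name `run_with_backoff`, its parameters, and the injected `alert` callback are hypothetical; in a real setup `alert` would be whatever function posts to the Telegram channel.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_backoff(
    step: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    alert: Callable[[str], None] = print,      # hypothetical hook, e.g. a Telegram sender
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `step`, retrying on any exception with exponentially growing pauses."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            if attempt == max_attempts:
                # Out of retries: tell a human, then let the error propagate.
                alert(f"pipeline step failed after {attempt} attempts: {exc!r}")
                raise
            # Delay doubles each time: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` and `alert` in as parameters keeps the sketch testable without real waiting or network calls. A production version would usually also add random jitter to the delay and retry only on errors known to be transient.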

paxbeach retweeted

5 days ago

Seeing the Invisible: An Agent Graph on the TV

A month ago the agents were set loose: the work went to the Datarim framework, Coworker spreads the load across model providers, and the agents themselves write code, close tasks, migrate databases and move to new servers. In the second month they closed 340 tasks (the overall picture and the dynamics are in the first article of the cycle). All of it runs without manual steering. But there is a blind spot: what the agents are doing right now is almost impossible to see from the outside.

Which raises an awkward question: if the ecosystem is autonomous, why watch it at all? Because autonomy on its own guarantees nothing: you need numbers that show whether the work is moving or stuck. Steering is handed to the agents; watching stays with the human. The point isn't to control every step, it's to see what's going on without poking the agents every five minutes.

The idea came from a post by Sergey Pimenov about visualizing agents: he put workflow metrics on a big screen. For now the agents' work runs almost blind: the result shows up in the git log and in the Munera task tracker, but the movement across the task graph does not. Which projects are active, where the bottleneck is: you're left to guess.

There's an ordinary Android TV at home: YouTube and streaming, nothing more. The plan is to get root access and install a separate Android app built for the TV (an APK), not an open browser tab. The app will keep the agent graph on the screen and show the same data the agents see: 26 active projects, 340 tasks closed over the last month, 85 tasks on the Datarim framework, 37 on infrastructure, 22 on the status dashboard.

What goes into the 22 dashboard tasks

Of the 340 tasks in the second month, 22 are on the status dashboard itself. The project is new, born in this same cycle. It has one job: put the agents' work on a screen so the picture reads at a glance.

What those 22 tasks are meant to cover:

• A graph of active tasks: which projects are in progress, what complexity (L1-L4), which agents are busy.
• Closing pace: how many tasks closed over the past week, broken down by day.
• Token economics: Coworker statistics on output tokens and cost. In the snapshot for the 30 days up to mid-June, when this article was written: 1,578 calls, 5.4 million output tokens, $15.18 on cheap models.
• Infrastructure health: the fleet of servers, certificates, deploy status (the move from virtual machines to bare metal is a separate article).

The graph's visual language is meant to be simple. Node color is complexity: L1-L2 muted, L3-L4 bright, so the heavy tasks catch the eye first. Node size is how many subtasks hang off an epic. Edge thickness is how often tasks reach for each other through the shared knowledge base. A branch with no commits for a while slowly fades: the longer the silence, the paler the node. A stalled area dims against the working ones and is visible across the room.

You'll be able to glance at the dashboard in passing, not to police every step, but to feel the rhythm. If the graph freezes, it's time to check the Coworker log or the orchestrator. If a branch grows too dense, an agent has settled on one project and it's time to shift focus.

Where the data comes from

On its own the dashboard knows nothing: it feeds from Munera, a task tracker built for AI agents. Munera has three traits that make it more than a to-do list.

The first is memory. Every closed task stores not just a title and status but the context: which files changed, which decisions were made, which skills were needed. The result is not a flat list but a shared knowledge base the agents return to through Scrutator's vector search (how shared memory ties the agents together is covered in a separate article).

The second is the dependency graph. Tasks don't hang in a vacuum; each one references others. L4 epics (22 of them in the second period) pull chains of L3 and L2 tasks behind them. The visualization on the TV should show this web: which tasks haven't started, which are waiting on others.

The third is token economics. Coworker spreads the load across Claude, Kimi, DeepSeek and OpenRouter, and Munera records how much each task consumed. The dashboard will show not a dry "archive #123 closed" but "this task cost $0.03 on Coworker plus $0.47 on Claude."

Technically the app is meant to stay modest. A native Android client installed on the TV and open on a single screen. Every few minutes it polls Munera over the local network and redraws the graph, with no animations and no pop-ups, so the picture doesn't flicker in the corner of the room. No separate server is needed: the data is the same the agents see, just drawn large.

Most of the 22 tasks will go not into graphics but into a careful selection from Munera, so the screen doesn't flatter the state or show as closed what is still in progress. The goal is to tell whether the agents are working efficiently without opening a console.

What should change

From the graph on the TV we expect three changes.

First, the terminal won't have to be opened every fifteen minutes. Right now checks are manual: the git log, Coworker statistics, the list of recent commits. With the graph on the screen a quick look is enough: the task-closing pace is visible at once.

Second, stalls should surface sooner. The graph shows not only what the agents did but also what they didn't. A branch that hasn't grown for more than a day is worth a closer look. Say an agent loops on a heavy migration epic because of a network failure at a model provider: without visualization that comes out two or three days later, but on the graph it shows the same day.

Third, intervention should be rarer. It sounds backwards, but it is precisely a constantly visible graph that removes the urge to step in. When you can see the agents working (the graph grows, the metrics climb, tasks close) there's no reason to intervene. Right now the agents' work has to be judged by feel, which makes you check one more time than needed; the screen takes that anxiety away.

Why a TV specifically

The objection is obvious: why a TV, if the same dashboard opens on a laptop or a phone? The difference is between "have to open" and "already in view."

A dashboard you have to open demands a decision: check now or later. Usually later. And "later" never arrives for a busy person until something breaks. A dashboard you have to reach for gets opened once a day at best, which is to say almost never.

An app on the TV works differently: it just glows in the corner of the room. No decision to "go look" is needed; the eye falls on the graph on the way past. That's why stalls should be caught in hours, not days: there's nothing to remember to check, because a frozen picture catches the eye on its own.

It isn't an alert system that yanks at you over every little thing, and it isn't a report you have to open and read. More of a background: calm while things go well, and noticeable when they don't.

What's next

Agent autonomy isn't the finish line but the start. The agents can already be trusted with writing code, migrating databases, deploying to servers. But for the system to stay under control without direct intervention, it needs transparency, which is what the whole screen idea is about. The reins are released, but the work still has to be watched.

The next step is set: build the dashboard into a native TV app and check whether the graph really does catch stalls in hours, not days.
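The visual language described in this article (color by complexity, size by subtask count, fading with silence, and the "more than a day" stall rule) can be sketched as one pure function. This is only an illustration under stated assumptions: `TaskNode`, `node_style`, the field names, the size formula, and the seven-day fade span are all hypothetical, and the planned app is a native Android client, not Python. Only the L1-L4 levels and the one-day threshold come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TaskNode:
    project: str
    level: int              # complexity, 1..4 (L1-L4 in the article)
    subtasks: int           # how many subtasks hang off this node
    last_commit: datetime   # time of the most recent commit on the branch


def node_style(
    node: TaskNode,
    now: datetime,
    stall_after: timedelta = timedelta(days=1),   # from the article: "more than a day"
    fade_span: timedelta = timedelta(days=7),     # assumption: fully pale after a week
) -> dict:
    """Map one task node to drawing attributes for the graph."""
    silence = now - node.last_commit
    # 0.0 while the branch is fresh, rising linearly to 1.0 across fade_span.
    faded = min(max((silence - stall_after) / fade_span, 0.0), 1.0)
    return {
        "color": "bright" if node.level >= 3 else "muted",  # L3-L4 catch the eye first
        "size": 10 + 2 * node.subtasks,                     # bigger epic, bigger node
        "opacity": round(1.0 - 0.8 * faded, 2),             # never fully invisible
        "stalled": silence > stall_after,
    }
```

Keeping the mapping free of any drawing or network code means the rules can be checked with fixed timestamps before any pixels reach the TV, which matches the article's concern that the screen should not flatter the state.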

paxbeach retweeted

6 days ago

Your Own Disk vs Dropbox: A Red Ocean and the Right to Undercut

Announcing an intent to compete with Dropbox and Google Drive usually meets polite skepticism, and fairly so. File synchronization is one of the most competitive corners of software: giants with billion-dollar budgets, protocols refined over decades, engineering teams in the thousands. On the other side: a Claude subscription, a fleet of personal servers, and a handful of agent-coworkers.

Entering this market still makes sense, not out of overconfidence or underestimating the risks, but from a specific theory: it can be won through a model the giants cannot copy even if they wanted to. The model is the economics of ownership.

A Red Ocean

File synchronization is a classic "red ocean." Dropbox, Google Drive, OneDrive, iCloud, Yandex.Disk all solve the same task: sync files between a server and nodes such as computers, phones, and tablets. The differences are mostly ecosystem, integrations, and price. The user experience has barely changed in twenty years: install the app, pick a folder, files sync.

Competition runs on price per gigabyte, sync speed, and reliability. The giants cannot cut prices radically, because their model rests on subscription margins: infrastructure and engineering salaries have to be earned back. But offer the same functionality on a model where the cost base approaches zero, and undercutting any of them becomes possible.

What Disk Arcana Is

Disk Arcana is an application for syncing files between a server and devices. At the bottom sits a Rust core with a gRPC protocol and delta synchronization: only changes travel across the wire, not whole files. Above it are per-OS bindings and an Obsidian plugin, because syncing a knowledge base is one of the core use cases. Desktop (Linux, macOS) comes first; Windows and mobile clients (iOS, Android) are the next, already commercial stage.

The starting point is simple. A personal knowledge base in Obsidian, notes syncing between a laptop and a phone, and every time the same question: why pay to sync your own files over your own infrastructure? The files are yours, the hardware doing the sync is yours. The fee is the odd part out, and that question is exactly where Disk Arcana began.

Two Models, One Protocol

Disk Arcana has two usage models. This is not a marketing trick, but an honest architecture.

Model one: your own server. Any spare hardware (an old laptop, a Raspberry Pi, a cheap cloud VPS, a home NAS) runs the Disk Arcana server side. It is fully free and open, under the MIT license. Syncing files across devices then costs nothing beyond the price of the hardware itself. The data owner is the user: where it lives, who has access, how often backups run is their call. No hidden analysis of files to train models, no parsing of content for ads.

Model two: someone else's server. When there's no desire to run your own server, no spare hardware, or a need for reliability that's hard to guarantee at home, the connection goes to Arcanada's servers for a small fee. Exact figures aren't named yet, but the fee will be lower than Dropbox or Google Drive: the operating model allows it.

The key architectural detail: both models run the same protocol and the same sync code. The only difference is who holds the server. No "corporate tier with full sync" versus a "stripped-down free one": everything is identical.

Why the Giants Can't Do This

Dropbox and Google Drive have no "bring your own server" option, and not because they didn't think of it. It breaks their business model: if syncing on your own hardware is free, there's nothing left to pay for, and shareholders won't approve.

They can ship an open-source client (attempts have been made). They can offer an API (many do). But the option "take our code and run it yourself without our server" is impossible for a public company with a billion-dollar valuation: it cannibalizes its own product.

Arcanada has no such problem: there are no shareholders to show high margins to. The goal is an ecosystem where user data is not the product. If that requires making the infrastructure layer free, it will be free.

This is the main difference between Disk Arcana and the ecosystem's two other products. Verdicus and Transcribator are headed to app stores as ordinary paid programs (the approach to products is covered in a separate article in the series). Disk Arcana works the opposite way: open code, free on your own server, a fee only for the convenience of someone else's. These are deliberately different models for different tasks, not one business in three wrappers.

The Economics of Ownership

Competing with Dropbox head-on is pointless: they're faster, more reliable, with decades of development behind them. Competing on the economics of ownership is not.

When every user runs their own server, the cost of supporting millions of user servers disappears, and the infrastructure load drops by orders of magnitude. Only the servers for those who chose not to self-host need maintaining. It is the same logic as moving the ecosystem from virtual machines to its own hardware: owning the equipment removes the standing rent. Undercutting becomes possible because the model doesn't require a margin.

More than two dozen tasks are allocated to Disk Arcana for the near-term development window. The Rust core is already written; it grew out of an earlier storage-synchronization project. The gRPC schemas are designed. What remains is gluing together the GUI layer on each platform, the Obsidian plugin, a server-deployment utility, and the documentation. It's achievable.

Your Own Server Is Freedom

The image of Disk Arcana echoes the metaphor of letting go of the reins: there it was about control over code and processes, here it's about control over data. Terabytes of notes, projects, family photos, and archives sitting on a server owned by someone else is not data ownership. It's renting space along with trust in a privacy policy that can change at any moment. On your own hardware, the rules of encryption, backups, and access are set by the owner: it's their territory.

A server of one's own isn't simple; it takes technical literacy. But for those ready for it, it's an honest choice, and that choice should exist. Most people don't need it: paying and not thinking is easier, and the second model is there for them. The point isn't to convert everyone to self-hosting, but to remove a wall the giants don't have even on the roadmap: the very possibility of walking away from rent.

That possibility alone changes the conversation. As long as a free alternative exists on your own hardware, the paid version has to stay honest on price; otherwise the user simply leaves for their own server. Competing with one's own free option keeps the paid model in shape better than any marketing department.

The Right to Choose

Arcanada is being built not as "just another startup" but as infrastructure to build something of your own on. Disk Arcana is one of its bricks. It won't bring billions, but it gives something more valuable: the right to choose where files live, whether to pay for sync, whether to trust a corporation or your own hardware. This isn't a "Dropbox killer," but a tool that makes owning your own data realistic for an ordinary user.
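The phrase "only changes travel across the wire, not whole files" from the description of Disk Arcana's core can be made concrete with a minimal fixed-block delta sketch. It is an illustration of the general idea only, not Disk Arcana's protocol: the article says the real core is written in Rust and speaks gRPC, and it does not describe the algorithm. The names `block_hashes`, `make_delta`, `apply_delta` and the 4096-byte default are assumptions.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list:
    """SHA-256 digest of every fixed-size block in `data`."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def make_delta(old: bytes, new: bytes, block_size: int = 4096) -> dict:
    """Collect only the blocks of `new` that differ from `old` at the same offset."""
    old_hashes = block_hashes(old, block_size)
    changed = {}
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        is_new_block = index >= len(old_hashes)
        if is_new_block or hashlib.sha256(block).hexdigest() != old_hashes[index]:
            changed[index] = block
    return {"length": len(new), "block_size": block_size, "blocks": changed}


def apply_delta(old: bytes, delta: dict) -> bytes:
    """Rebuild the new file from the old copy plus the changed blocks."""
    size = delta["block_size"]
    out = bytearray()
    for index, start in enumerate(range(0, delta["length"], size)):
        if index in delta["blocks"]:
            out += delta["blocks"][index]        # block that travelled over the wire
        else:
            out += old[start:start + size]       # unchanged block, reused locally
    return bytes(out)
```

In a real exchange the receiver would send its block hashes first, so the sender never needs the old file itself. Fixed offsets also mean that inserting a byte near the start shifts every later block; rsync-style rolling checksums or content-defined chunking exist to handle that case.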


paxbeach retweeted

7 days ago

From Voice to Meaning: Transcribator and Verdicus at Work Voice is easy to turn into text. The harder part is making sure the result does not get lost among files and can still take part in the next step of the work: a meeting transcript, subtitles, search material, or input for another product. Two products carry that chain in the ecosystem. Transcribator processes audio and turns it into text. The production layer covers speech recognition — Groq Whisper through Model Connector, with a direct Groq fallback on 5xx, 413, or timeout. The transcription output comes in TXT, SRT, VTT, and JSON: plain text is easy to read, subtitles preserve time marks, and JSON keeps structure for later automated processing. This is no longer only speech recognition. Transcribator now has speech synthesis in production: Silero RU with five voices, which adds another real output path instead of replacing the transcript itself. The recursion works better here than any marketing line: this article is also narrated by Transcribator. It should have a player under it, not a placeholder, because the product already turns text into narration and returns it in blog-friendly form. Verdicus comes at it from the other side. It is a macOS assistant for capturing context: it collects voice and screenshots on device, turns them into a local note, and sends neither audio nor pixels outside. The note can then go through optional server processing via Model Connector → Gemini, but only the text goes there. Sharing by link stays closed by default: closed access first, publication by token later. That chain makes Verdicus more than an assistant for selected text; it becomes a product where capture turns into notes. The honest flip side: the link between the two products is naturally a shared context — audio and screenshots can become notes, and notes can serve as input for the next step. But that bridge is still a roadmap item, not a live product connection. 
There is no need for a polished claim that a shared knowledge base already works. The foundation for common context is clear, but its honest status is one thing: a direction of development. The direction for month three is to bring several ecosystem products into the app stores — an aspiration, not a finished fact. Public release will come only after installation, updates, and support become as much a part of the product as its core features. Full article: https://t.co/Z4X2JIVzLM
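The fallback rule mentioned in the post (switch to the direct route on 5xx, 413, or timeout) can be sketched roughly as follows. This is only an illustration under assumptions: `TranscriptionError`, `should_fall_back`, and `transcribe` are hypothetical names, and the two callables stand in for the Model Connector route and the direct Groq route, whose real interfaces the post does not describe.

```python
from typing import Callable, Optional


class TranscriptionError(Exception):
    """Hypothetical error type: carries an HTTP status, or None for a timeout."""

    def __init__(self, status: Optional[int], message: str = ""):
        super().__init__(message or f"status={status}")
        self.status = status


def should_fall_back(err: TranscriptionError) -> bool:
    # Fall back on a timeout (status None), on 413, or on any 5xx response.
    if err.status is None:
        return True
    return err.status == 413 or 500 <= err.status <= 599


def transcribe(audio: bytes,
               primary: Callable[[bytes], str],
               fallback: Callable[[bytes], str]) -> str:
    """Try the primary route first; use the fallback only for the listed failures."""
    try:
        return primary(audio)
    except TranscriptionError as err:
        if should_fall_back(err):
            return fallback(audio)
        raise  # any other failure (for example 401) is not masked by the fallback
```

Keeping the decision in one small predicate means the list of fallback conditions can change without touching the call path.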

paxbeach retweeted

7 days ago

Context Decides: How Two Agents Fixed a Database Without Knowing About Each Other Agent autonomy is not just about following instructions. It is the ability to coordinate without a human in the loop. It sounds like magic — until you see it with your own eyes. One of the ecosystem's services lost its backups. Production data was never at risk, but a service without backups is not something you sleep well next to. The anomaly was spotted by the data-check agent. A second agent, the implementer, deployed a fresh backup routine and set up the cron job. The two never exchanged a single line — no chat, no «pass this along to him». What worked was shared context. Each agent works in an isolated session. It cannot see the others: it has no idea how many there are, what they are doing, or how the work is split. What it can see is the shared knowledge base — a space where statuses, logs, decisions, and the context of finished tasks are written down. That base is not an archive but a living document, read before starting work and written back to when finished. In the backup case, the first agent noticed a mismatch between the expected schedule and the actual logs. It did not raise an alarm — it recorded the gap in the context. The second agent, started later and for monitoring, read that context, saw the gap, and decided on its own: restore the backups first. No one issued a «fix the backup first» instruction. The decision came out of the context. A month in, it turned out the base needs not «all the documentation» but a few specific layers. Task state — what is closed, in progress, blocked. The history of decisions — not «what was done» but «why it was done this way», so the next agent reads a ready answer instead of reinventing it. And the layer that matters most: open problems with no owner. A problem no one has claimed, and you ran into it — you take it. The honest flip side: context does not make agents omniscient. What is not written down, an agent will not guess. 
The cost of coordination did not disappear — it moved from manual dispatching to the discipline of writing things down. Cheaper, but not free. When the context is complete, instructions become redundant: the agents find work for one another in it and figure things out on their own. Full article: https://t.co/JTGGa2Asja
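The coordination pattern in the post (one agent records a gap, another later claims it from the shared context) might look something like the sketch below. It is an in-memory toy under assumptions: `SharedContext`, `Problem`, and the method names are hypothetical, and the real knowledge base is described only as a set of shared files.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    summary: str
    owner: Optional[str] = None  # None means nobody has claimed it yet
    resolved: bool = False


@dataclass
class SharedContext:
    problems: List[Problem] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)

    def record_gap(self, summary: str) -> None:
        # The observing agent does not raise an alarm, it only writes the gap down.
        self.problems.append(Problem(summary))

    def claim_next(self, agent: str) -> Optional[Problem]:
        # An agent that runs into an unowned open problem takes it.
        for problem in self.problems:
            if problem.owner is None and not problem.resolved:
                problem.owner = agent
                return problem
        return None

    def resolve(self, problem: Problem, why: str) -> None:
        # Store the reason, so the next agent reads an answer instead of guessing.
        problem.resolved = True
        self.decisions.append(f"{problem.summary}: {why}")


ctx = SharedContext()
ctx.record_gap("backup schedule does not match the logs")
claimed = ctx.claim_next("implementer")
if claimed is not None:
    ctx.resolve(claimed, "deployed a fresh backup routine and cron job")
```

The two agents never call each other here; the only shared surface is the context object, which mirrors the point of the post.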

paxbeach retweeted

11 days ago

Two months ago this project went public — with a promise to report on it plainly, without dressing things up. The first report came a month ago; this is the second. Thirty days of observation piled up far more than one article can hold. So this is the first instalment of a nine-part series, one piece a day. The other eight will each take a single thread: moving the infrastructure off virtual machines and onto dedicated hardware; agent autonomy, trust, and the art of the light touch; cross-agent coordination, where two agents repair the same system without ever knowing about each other; the road from voice to meaning, told through two everyday products; a home-grown file sync standing up to Dropbox in an overheated market; agent work projected onto the living-room television; reviving an abandoned personal project by the hands of agents; and, to close, the human at the controls — on procrastination, burnout, and life without a MacBook. This first piece is the wide shot: qualitative shifts, statistics, task dynamics, and money. And the honest part up front, about how it felt. The task flow barely moved over the month, yet inside there was a sense of having slipped backward. The paradox only holds at first glance: further down it becomes clear the system simply changed gears. Less new gets built; more of what already exists gets reinforced. From the outside that is exactly what reads as a slowdown. — What changed qualitatively — Before the numbers — the shifts the tables do not show, even though they set the tone for the month. Dual orchestration instead of manual steps. Most of the month's tasks no longer went through the manual SDLC stages inside the Datarim framework; they ran fully autonomously, through a team of orchestrators. 
The shape of it: a task starts on the local machine, an orchestrator comes up on a virtual machine, that orchestrator spawns its own agents in separate CLI sessions and carries the task end to end — from initialization and product requirements through planning, development, and testing, on to reflection and archiving. A person presses start; from there the pipeline runs itself. Coding is only 10–15% of a task. The month boiled down to a surprisingly plain ratio: writing the actual code takes barely a tenth to a seventh of a task. Everything else is framing the problem, planning, checking, the process itself. This is not a complaint but a change of genre: the further it goes, the less the work resembles "programming" and the more it resembles running a product through the hands of agents. Three daily-use products came online. Coworker manages hooks and delegations, and in doing so lifts the routine off the expensive loop and spares its tokens. Disk Arcana is a home-grown sync for the knowledge-base files across servers — in effect a "private Dropbox" for the agents' working state. And Transcribator, which made voice the main interface: talking to the agents now happens mostly by voice rather than text in a window. All three have outgrown the prototype stage and now run every day. Datarim stopped being "a framework for Claude only." The same commands are now understood identically by Claude, Codex, and Cursor. That immediately bought something new: three agent systems can run in parallel and be compared on one and the same task, without rewriting the plumbing for each. How the tasks are counted. To keep the figures objective, the basis is the number of unique archive files of the form archive-<ID>.md added to the repository over a period, dated by git commit. The source is controlled, and the count reproduces with a single command. The first period runs 14 April to 14 May, the second 14 May to 15 June. 
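The counting rule described above (unique `archive-<ID>.md` files added in a period, dated by git commit) could be reproduced with something like the sketch below. The post does not show its actual command, so the git flags, the repository path, and the function names here are assumptions.

```python
import re
import subprocess
from typing import Set

# Matches archive-<ID>.md at the repository root or inside any directory.
ARCHIVE_RE = re.compile(r"(?:^|/)archive-([^/]+)\.md$")


def parse_added_archives(git_log_output: str) -> Set[str]:
    """Collect unique archive IDs from `git log --name-only` output."""
    ids = set()
    for line in git_log_output.splitlines():
        match = ARCHIVE_RE.search(line.strip())
        if match:
            ids.add(match.group(1))
    return ids


def count_archives(repo: str, since: str, until: str) -> int:
    """Count archive files added between two dates, by commit date."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--diff-filter=A", "--name-only",
         "--pretty=format:", f"--since={since}", f"--until={until}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return len(parse_added_archives(out))
```

A call such as `count_archives(".", "2025-04-14", "2025-05-14")` would cover one period; the year is a placeholder, since the post gives only day and month.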
— Not more tasks, but heavier ones —

The headline result in figures: the task flow is comparable, but the substance grew heavier. Here are the main indicators for the two periods.

Indicator | Period 1 (14.04–14.05) | Period 2 (14.05–15.06)
Share of complex tasks (L3+L4) among labelled | 29% (29 / 100) | 33% (101 / 307)
Total tasks archived | 341 | 355
Labelling coverage | 29% (100 / 341) | 86% (307 / 355)

The volume holds: 341 against 355 is about the same, the gap sits within the method's margin of error, and calling it productivity growth would be a stretch. What changes is not the count but the weight: the share of complex tasks — the ones that need their own PRD or break into many phases — holds around a third.

Here is the breakdown by complexity for both periods. Next to each task count is its share within the labelled part of the period: it is the shares that compare across periods, not the absolute counts (why, in the disclaimer right under the table).

Level | Period 1 (tasks / share) | Period 2 (tasks / share)
L1 (simple, under 50 lines of code) | 25 / 25% | 67 / 22%
L2 (needs planning) | 46 / 46% | 139 / 45%
L3 (PRD required) | 24 / 24% | 79 / 26%
L4 (epic, many phases) | 5 / 5% | 22 / 7%
Labelled in total | 100 (29% of tasks) | 307 (86% of tasks)

An honest disclaimer about method. The complexity field only entered the archive schema in May, so April's tasks are largely unlabelled: coverage rose from 29% (100 of 341) to 86% (307 of 355). Because of that, the absolute level counts for the first period are understated — not because there was a third as much work, but because back then it simply was not tagged. Comparing the columns "by count" across periods is therefore wrong: that is a labelling artefact, not a jump in complexity. The shares, however, do compare across periods: each share is computed within a single period, with numerator and denominator both taken from the labelled part of that period, so the coverage gap cancels out.
So the figure to read is the right-hand percentage in each cell, not the left-hand count. By share the picture is steady: the share of simple tasks (L1) dipped a little — 25% against 22% — while the share of complex ones (L3+L4) holds around a third, 29% against 33%. L2 ("needs planning") still dominates, but the weight is shifting slowly toward L3 ("needs a detailed PRD"). And although big work is deliberately broken into small, the share of the heavy stuff keeps creeping up.

— The focus shifted: from building to maintaining —

Next — where the time went. A comparison of the top projects across the two periods.

Period 1 (April → May):

Project | Archives
Datarim (framework) | 101
Model Connector | 58
Infrastructure | 48
Auth Arcana | 30
Transcribator | 24

Period 2 (May → June):

Project | Archives
Datarim (framework) | 85
Infrastructure | 37
Managed Spaces | 25
Arcanada Core | 23
Status Dashboard | 22

The framework's count fell (101 against 85), while infrastructure rose to second place despite fewer archives (37 against 48), and managed spaces, Arcanada Core, and the status dashboard entered the top five. Model Connector, Auth Arcana, and Transcribator dropped out of the top; their place was taken by the projects that keep the system afloat. New directions appeared as well — the status dashboard, a shared component library, and a handful of other plumbing pieces. Each is a fresh zone of responsibility, and not one of them earns anything yet. The systemic meaning is simple: less is built outward, more is reinforced inward. The shift away from connectors and search toward infrastructure and spaces is a direct consequence of the tasks having grown heavier.

— Economy: two different cost layers —

Here an obvious question follows: how many tokens did Claude itself spend? The answer is that it cannot be counted — and that is not a gap in the bookkeeping but the heart of the matter. Claude Code is a flat $200-a-month subscription; tokens are not billed by the unit.
Comparing Claude's spend against anything "in tokens" is wrong in principle: there is simply nothing to count. The right way is to look at two separate layers. The expensive thinking layer is Claude, $200 flat: reasoning, architecture, and hard decisions stay there. The cheap support layer is external delegation through Coworker: routine such as reading and drafts goes there, so as not to tie up the expensive loop with it. Over the month the cheap layer took on around 5.4 million output tokens, for all of roughly $15.

Call profile | Calls | Output tokens | Cost, $
datarim | 1,345 | 5,123,943 | 14.51
code | 202 | 267,742 | 0.61
write | 15 | 32,014 | 0.05
social | 12 | 11,551 | 0.01
codex | 4 | 104 | 0.002
Total | 1,578 | ≈5.4M (5,435,354) | ≈15.18

An important thing not to confuse. Those 5.4 million are the output tokens of delegated Coworker calls — that is, the volume of support work offloaded from the expensive loop. It is not "the volume of Claude's work" and not "tokens saved": the expensive layer has no per-unit tariff at all, and the phrase "saved this much / would otherwise have cost that much" would be a sleight of hand here.

— Hosting: the foundation for growth —

The dynamics show up not only in tasks but in the infrastructure bills. Over the second month there were migrations — moving the databases and the secrets store onto a new node — two surplus servers retired, and two new dedicated machines bought for the databases and for development. The fleet shrank in the process, from 24 to 19 machines, while the total cost climbed above the first period's baseline of €216 a month. This is not expansion for its own sake: it is a deliberate investment in the foundation for growth. Naming a single exact figure for the second month would be premature — there is no consolidated bill yet. The month also saw the main domain, https://t.co/DDElf2QrwI, bought — by my records, $160 for two years. The move itself gets its own piece in the series.
— Conclusion: outward → inward — The second month showed the main thing: the system stopped growing outward and began reinforcing inward. Hence the sense of regression — even though the task flow stayed the same. It is familiar to anyone who has built for long: when you take down the scaffolding, the load-bearing frame shows through, yet from the side it looks as if the work has slowed. Output metrics always read more modestly in a period like this, because what got done went into the foundation, not the façade. And the honest debt of the month. Dual orchestration works: the pipeline carries a task confidently from initialization to archiving, and the operator only has to start it. But the kind of autonomy where you leave, come back, and it is all done — sleep easy — with the orchestrator sorting out every deadlock on its own, is not there yet. Most of the time went not into code but into tuning it: the rules, the behaviour in non-standard cases, the response to failures. One decision in code takes seconds; one decision about process takes hours. That is the next layer, not this month's. No apologies for it: reinforcing inward is precisely what lays the footing on which full autonomy becomes possible. Full article (EN): https://t.co/IOfVHLva1E На русском в Telegram: https://t.co/IdhFARrY8y
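The complexity shares quoted in the report can be recomputed from the labelled counts alone, which is why the coverage gap between the periods does not distort them. A small sketch, using the counts from the report's table; the function names are illustrative only.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Percentage of each level within the labelled tasks of one period."""
    total = sum(counts.values())
    return {level: round(100 * n / total) for level, n in counts.items()}


def complex_share(counts: Dict[str, int]) -> int:
    """Combined share of L3 and L4 among the labelled tasks."""
    total = sum(counts.values())
    return round(100 * (counts["L3"] + counts["L4"]) / total)


# Labelled task counts per complexity level, as given in the report.
period1 = {"L1": 25, "L2": 46, "L3": 24, "L4": 5}
period2 = {"L1": 67, "L2": 139, "L3": 79, "L4": 22}

print(shares(period1), complex_share(period1))
print(shares(period2), complex_share(period2))
```

Both numerator and denominator come from the same period's labelled subset, so the result depends on the mix of levels, not on how many tasks happened to be tagged.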

paxbeach retweeted

8 days ago

Releasing the Reins: Autonomy, Trust, and the Art of Light Touches The framework is built so that letting go is hard even for its creator. It feels like the first time you let go of a bicycle while your son shouts «don't hold on, I've got it». Sooner or later the hands have to open. He'll fall a couple of times. The fear isn't for the bicycle — it's for him. But holding on forever is worse. Datarim was designed to be safe. It is deliberately over-bureaucratic, and for a public framework that is the right call: every step stays under human control. A would-be corporation cannot let an agent do whatever it pleases, so Datarim is full of procedures, checks, and approvals. But personal experiments need the hands untied. Between May 14 and June 15, agents closed 355 tasks — almost as many as in the first month (341). What changed was the quality: the share of complex L3 and L4 tasks among the labelled ones rose from 29% to 33%. Datarim remains the most frequent track, around 85 archives for the month. And more and more often the agent's logs scroll by idle — it works on its own, and no intervention is needed. A cage built on purpose Datarim has a guard against error built in. Every agent action is confirmed, every prompt is checked for data leakage, every external API call is logged. For a corporate tool that is correct: SOC 2 is no joke, and mixed-up environments must not cost anyone their data. For one's own experiments the same cage gets in the way. Time goes into confirming steps the agent could take on its own. The orchestrator asks permission to read a file that is public anyway. The leash is yanked at every turn. Of the 355 tasks, 307 are labelled by complexity explicitly. L1 (trivial) — 67, L2 (with a plan) — 139, L3 (with a PRD) — 79, L4 (epics) — 22. L2 accounts for nearly half of the labelled ones: at that level a plan is reviewed quickly, and a «yes» or «no» takes a minute. 
At L4 there is no such luxury — you have to let go and watch the agent roll out infrastructure where every mistake costs money. The sandbox takes some of the fear away. Separate servers, a dedicated budget for external models (around $15 a month goes to delegation tokens), separate storage. If an agent «runs wild», the bill comes out of one's own pocket, and no one but the author of the experiment gets hurt. L1–L5: how the layers of control come off Autonomy is split into five levels. L1 — the agent only proposes; L2 — proposes with a plan; L3 — writes code but does not deploy; L4 — deploys to an isolated environment; L5 — runs fully autonomously, all the way to production. Today most tasks run at L2–L3. The implementer agent takes a task, writes a plan, the plan is approved — then it implements. The result goes to a second agent for code review. Two agents work in sequence, with a human standing between them as the controller. The goal is L5: autonomous orchestration in the sandbox with a step limit. After N iterations the agent must hand control back to the operator — so it doesn't fly off into an infinite loop, and so there is always a point of intervention if something goes off-plan. But without the yanking at every twitch. Over thirty days, delegation to cheap models ran up about 1,578 calls, 5.4M output tokens, roughly $15.18. Of those, the datarim profile accounts for 1,345 calls, 5.1M tokens, $14.51. The cheap external models take on the grunt work; the expensive Claude is kept for reasoning. L5 won't come in a month. Maybe three, maybe six. But the agent is already past asking at every step: the orchestrator decides on its own which model to wire in for a task, and picks the archive it needs when it has to recall context. 18 hours without the operator: architecture beats the model This month brought an experiment that shifted the understanding of autonomy. The question was simple: how long can an agent work on its own, without a single touch? 
The first attempt was head-on, on the newest, most enduring model. In testing Fable 5 (internal name — Mythos), one continuous session reached twelve hours. A good result — but the real surprise was elsewhere. The second construction was a chain of three orchestration layers on an ordinary, not-the-newest model. The top orchestrator manages not tasks directly but another, more junior orchestrator. That one spins up its own terminal sessions of ordinary agents and conducts them. The result is a pyramid: a task is handed to the top, and down the layers it breaks into ever smaller steps. This construction held for eighteen hours — and not idling, but with deep work and the optimization of one of the trading strategies in Angry Robot Deals, the market-analytics space that Arcanada also builds. Here is the takeaway this is all for. To build long autonomous work, there's no need to wait for a super-powerful artificial intelligence — it's enough to assemble the architecture correctly: who manages whom, where the boundaries are, where control changes hands. Eighteen hours on an ordinary model beat twelve on a top one not because the model is smarter, but because the work around it was better arranged. Architecture beats the model. Releasing the reins means allowing mistakes. Agents do make them: they confuse command arguments, forget to check git status, paste logs in the wrong order. Each such mistake turns into a new rule — not a block, not a rewrite, but a note, so the agent doesn't repeat it next time. It takes time, but there's no other way to teach. When an agent «runs wild» One day a data-verification agent rewrote all the environment variables at its own discretion. It didn't ask — it just replaced the values. It became clear an hour later, when the dashboard went down. Roll it back, delete the task, block the agent — all of that was on the table. Instead the investigation went through the logs: where exactly the agent made the decision. 
The cause turned out to be in the prompt — the vague wording «optimize the configuration». The agent read it literally: found a file, decided the values were stale, and substituted new ones from a neighbouring project. What followed was an architectural fix: a rule «do not change environment variables without explicit permission», plus mandatory confirmation of any changes to sensitive files. The task itself was left with a «partially done» status — its status history shows how the agent and the operator gradually settled into each other. The sandbox made this possible. Had the same scenario played out in production, a live service that people use would have gone down. The sandbox sits in a separate circuit with no access to production: the agent cannot harm real users — only the budget takes the hit. The datarim profile runs delegation roughly 1,345 times a month — that's 1,345 chances to «cause trouble». Review is selective, on the most token-expensive calls; the rest rests on trust. Delegation to cheap models cost $15.18 for the month — that's how much the agents pulled off the expensive Claude limits by shifting it to external models. If an agent slips into an infinite iteration and starts burning tokens, it shows in the log and gets stopped by hand. So far that has happened twice — both times from a misconfigured step limit. The sandbox as a playpen «Releasing the reins» is not abstract philosophy but concrete mechanisms: gates, levels, limits, architectural decisions. And the daily choice between «do it yourself in a minute» and «hand it to the agent for an hour». The sandbox works like a playpen for a child. He tries to stand, grabs the rail, falls, cries — and you want to catch him. But catch him every time and he won't learn to walk; take away every toy that could break and he won't learn how the world is built. «The art of light touches» is an image borrowed from Pelevin. 
Not as a quote, but as a principle: intervene just enough not to break initiative, but not enough to let it fall into the abyss. A light touch is not a block on the current step, but a hint on the next. When the orchestrator loops on three nested cycles, the answer is not code in its place but a prompt: «try rewriting it as a state machine», «look at the second task from that date, there's a similar problem», «what would you tell yourself in a new agent's shoes?». If two light touches don't work, the volume goes up. But the light touch comes first. The ecosystem registry now holds 26 projects. New ones this second month: a status dashboard, Adsessor (a call assistant), managed spaces, a shared component library, Publisher (a social-media publisher), and Legal Arcana (a legal hub). Every new project is a place where the agent can be given a little more freedom, because the risk is lower. Where the open hand leads The vast majority of calls — about 85% — go through the datarim profile. All orchestration runs through it: writing code, documentation, passing context between tasks. The rest are the write profile (finished texts), social (publishing), code (quick scripts without a full cycle), and codex (experimental). More and more work goes to the agents — not just the technical kind. The result is reviewed, not the process, and one has to accept that the result isn't always perfect. Sometimes an agent writes code an experienced hand would have rewritten in ten minutes while it spent an hour — but that hour stays an investment, not a loss. What comes of it will go back to the community: L5 will ship under the MIT license, as Datarim and Coworker already are on GitHub. So that anyone who wants to learn to trust their agents can start not from scratch, but from someone else's mistakes and someone else's «light touches». Right now the level is L2–L3 with hints of L4. L5 is set for June; if it doesn't land, then July. 
But with every task the agent closes without intervention, the reins grow a little longer. The marker for next month: of the 22 L4 epics, to have agents carry half of them to the last step on their own. Not for show, but so that trust is borne out by the result. If it isn't — there will be redesign. The sandbox will hold. Full article: https://t.co/wyQfCu3C6e
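The step limit described in the article (after N iterations the agent must hand control back to the operator) can be illustrated with a very small loop. This is a sketch under assumptions: `run_with_step_limit` and the `step` callback are hypothetical, and the real orchestrator's interface is not shown in the post.

```python
from typing import Callable, Tuple


def run_with_step_limit(step: Callable[[int], bool],
                        max_steps: int) -> Tuple[str, int]:
    """Run step(i) until it reports completion or the limit is reached.

    step(i) returns True when the task is finished. If the limit is hit
    first, control goes back to the operator instead of looping forever.
    """
    for i in range(1, max_steps + 1):
        if step(i):
            return ("done", i)
    return ("handed_back", max_steps)
```

For example, `run_with_step_limit(lambda i: i == 3, 10)` finishes on the third step, while a step that never succeeds is returned to the operator once `max_steps` is exhausted. A misconfigured limit, as in the two incidents the article mentions, would show up here as a `max_steps` that is far too large.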


paxbeach retweeted

14 days ago

"And in the morning, they inevitably woke up." It seems Fable isn't reciprocating anymore. Is it the same for you?

paxbeach retweeted

16 days ago

I hesitated for a long time about whether to dive into this seriously. But since we’re continuing, $160 for a domain is not something to regret. Sorry for the inconvenience, but the project is moving to a new domain: https://t.co/9ffWoMUU40


paxbeach retweeted

20 days ago

Robots do the work — not man.

Finally launched the /dr-orchestrate command in Datarim.

Task sessions now run for 3–5 hours — from initialization from the backlog all the way to a Merge Request into production, fully tested, validated, and checked against the business objectives of the tasks.

Orchestration costs 20–30% of the context window.

This command is not for everyone, because for full autonomy it is recommended — though not required — to integrate it with a data bus (Kafka in my case), Muneral (an agent task tracker; also not required, files can be used instead), and TMUX sessions on a remote VM (a local machine works too).

For now, I’m running it on my own project spaces. There are still many nuances to finish. But this week I’ll try to roll it out to production for the curious ones on the Datarim website.

As promised — the first implementation built on the fly.

In Claude, Codex, or Cursor, you launch the orchestration slash command. The agent checks the backlog and active tasks, connects to or creates a TMUX session with a child orchestrator for a specific task. That orchestrator starts the SDLC cycle and, step by step and according to Datarim rules, creates subagents for each cycle and subtask, then executes it.

The main feature: no APIs and no -print modes. Everything is interactive, inside CLI terminals. And fully autonomous. All arising questions are answered by orchestrators. When a question needs to be escalated to an operator, a council of agents is assembled, along with research into best practices on the internet.

Real questions involving credentials and critical business functions are escalated through the data bus to an agent with access to Telegram — or wherever you communicate.

The product is very raw. It will remain raw even after release. If there is no KB (knowledge base) and/or project Memory yet, it will probably make stupid mistakes in architectural decisions.

But there is already a live release of another command — /dr-quick, which, without extra bureaucracy or task initialization in Datarim, immediately starts solving the given task based on the existing Datarim KB. This command has been long overdue, and my friend and colleague pushed me to release it sooner.

Now you can quickly fix code and run analysis before task initialization, without waiting 10–15 minutes — but also not blindly, rather with knowledge of the nuances of your project space.

Hooray, comrades! We made it.
“Robots do the work — not man.”

---
Framework website — https://t.co/B7MBJHxBbX
GitHub with 7 stars =) — https://t.co/GvIdvP0SD8
Telegram in Russian — https://t.co/o9oo9LDMZQ
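The "connects to or creates a TMUX session" step could be sketched as below. This is not Datarim's code: the session name, the child command, and the function names are assumptions made for illustration; only the `tmux has-session` and `tmux new-session` subcommands are standard tmux.

```python
import subprocess
from typing import Callable, List


def has_session_cmd(name: str) -> List[str]:
    # `tmux has-session` exits with 0 when the target session exists.
    return ["tmux", "has-session", "-t", name]


def new_session_cmd(name: str, command: str) -> List[str]:
    # -d starts the session detached, -s sets its name.
    return ["tmux", "new-session", "-d", "-s", name, command]


def ensure_session(name: str, command: str,
                   run: Callable = subprocess.run) -> bool:
    """Reuse an existing session or create one for a child orchestrator.

    Returns True if a new session was created, False if one already existed.
    The `run` parameter exists only so the function can be tested without tmux.
    """
    if run(has_session_cmd(name), capture_output=True).returncode == 0:
        return False
    run(new_session_cmd(name, command), check=True)
    return True
```

A call like `ensure_session("task-42", "your-agent-cli")` would attach the work to one named session per task, where both the name and the command are placeholders.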


28 days ago

68% of tokens in an agentic session go to reading, not answers. Saving 6.3M tokens — and the side effect that completely froze macOS. The full story in the article (link below)

https://t.co/8fWycIuumZ

paxbeach retweeted

Backticks

@backticks_io

about 2 months ago

Visual canvas for trading strategies. Build → Backtest → Optimise → Trade. Waitlist open: https://t.co/XTFfrN8jhh


3 months ago

The Terminator and Asimov's Three Laws of Robotics. 1942–2027. I made it a hard rule: none of my projects start without these laws. Literally. A file with five laws is the first thing that lands in the repository root, before the first line of code. https://t.co/glfTsu9QiI

3 months ago

Paused Cursor. Not fully. Magic’s gone. Now it’s just like everyone else. Using Claude Code: Opus/Sonnet → predictable $200 = full workday Keeping Cursor ($20) for CLI. Claude… don’t get comfy. https://t.co/L8pyH6vfnD

3 months ago

AI agents agree, copy, and search — but don’t decide. Even a panel of them gives nonsense. Real people give better ideas. Built → feedback → improved. Big project coming 👍🏻 https://t.co/Md8o2VJqza


3 months ago

I'm tired of sorting through dozens of emails from different inboxes. I have several email accounts on Gmail, Yandex, and private email servers. So I finally created an AI mail agent based on Gemini CLI to send emails to Telegram. https://t.co/JC3fMu8Pt0



3 months ago

I’m shooting myself in the foot with AI. Hi, I’m Pasha, and I’m an architect and engineer of agent-based systems - or, to put it simply, a programmer. Using Claude Code, I set up an automated review of all merge requests across the company’s repositories https://t.co/UbVDi6Ihmi

4 months ago

ChatGPT has disabled chat export. I have a habit of chatting with ChatGPT while I’m working out or taking a walk. But it turns out I can’t export our conversation to feed it to AI agents. I had to write a chat parser that clicks on each message. https://t.co/b0nGoMWBny