Sergii Guslystyi

@JuiceSharp

Software Architect • AI Systems that Work • Productivity tools | Chess • Father on the journey🙏✨

Florida, USA

Joined January 2010

242 Following

337 Followers

1.2K Posts

Sergii Guslystyi

@JuiceSharp

10 minutes ago

@__alpoge__ Is an IPO around the corner? My internal data shows the hype-o-meter is off the scale.

Sergii Guslystyi

@JuiceSharp

2 days ago

@dani_avila7 What prevented them from visualizing that inside the TUI? Say, by running a /workflow [name] command? I agree that observability, as well as the ability to customize/adjust/reuse, is everything.

Sergii Guslystyi

@JuiceSharp

2 days ago

Working on a much more powerful (and useful), way more transparent version for the https://t.co/DQFh3idzVT coding agent. Eventually you’ll be able to generate your workflow from prompts, built on skills, scripts, and integrations, including contracts among stages. You should be able to reuse the flows and customize them for your needs: have steps that connect your harness to external systems, access your data the way you want, use deterministic logic where it belongs, and use the models you wish in each component. The only good idea in Anthropic’s version is dynamic generation. I don’t see any reason to generate Python under the hood and have a runtime based on subagents only, without the ability to bend that flow to your needs or use skills as the steps. Anthropic’s version is half-baked and very limited, as it doesn’t let you build a real harness that you’re in control of. See the preview of the rpiv-workflow package on my GitHub: https://t.co/xMRSP4GnFh
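
To make "contracts among stages" concrete, here is a minimal Python sketch. It is not the rpiv-workflow API: `Stage`, `run_workflow`, and the `plan`/`implement` steps are hypothetical names, and each contract is assumed to be nothing more than a set of required and provided state keys.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Stage:
    """One workflow step (hypothetical shape, not a real package API)."""
    name: str
    run: Callable[[dict[str, Any]], dict[str, Any]]  # a skill, script, or model call
    requires: frozenset[str] = frozenset()  # keys the stage expects as input
    provides: frozenset[str] = frozenset()  # keys the stage promises to return


def run_workflow(stages: list[Stage], state: dict[str, Any]) -> dict[str, Any]:
    """Run stages in order, checking each contract before and after the step."""
    state = dict(state)  # do not mutate the caller's dict
    for stage in stages:
        missing = stage.requires - set(state)
        if missing:
            raise ValueError(f"{stage.name}: missing inputs {sorted(missing)}")
        output = stage.run(state)
        unmet = stage.provides - set(output)
        if unmet:
            raise ValueError(f"{stage.name}: missing outputs {sorted(unmet)}")
        state.update(output)
    return state


# Two stand-in steps; a real flow would call a skill or a model here.
def plan(state: dict[str, Any]) -> dict[str, Any]:
    return {"plan": f"steps for: {state['task']}"}


def implement(state: dict[str, Any]) -> dict[str, Any]:
    return {"patch": f"diff implementing [{state['plan']}]"}


stages = [
    Stage("plan", plan, frozenset({"task"}), frozenset({"plan"})),
    Stage("implement", implement, frozenset({"plan"}), frozenset({"patch"})),
]
result = run_workflow(stages, {"task": "add login"})
```

Because the checks live in `run_workflow` rather than in a prompt, a stage that breaks its contract stops the flow deterministically instead of passing bad state downstream.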

Thariq

@trq212

2 days ago

https://t.co/R6exTuF7P8

Sergii Guslystyi

@JuiceSharp

5 days ago

@badlogicgames @kushaldas @mitsuhiko I saw similar behavior today on Mimo 2.5 Pro

Sergii Guslystyi

@JuiceSharp

6 days ago

@badlogicgames Having several sessions in one process. No RPC, no bootstrap, zero infra, just in-process sync. If the flow is stateless or needs no durability, process-per-session is pure overhead. Flue-like, the process boundary is just the priciest tier.
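
A rough Python sketch of the in-process idea: several sessions share one process and a simple fan-out runs them concurrently. `Session` and `fan_out` are hypothetical names, and the model call is replaced by a placeholder `asyncio.sleep(0)`.

```python
from __future__ import annotations

import asyncio


class Session:
    """Hypothetical in-process agent session; its state is plain memory."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.history: list[str] = []

    async def prompt(self, text: str) -> str:
        self.history.append(text)
        await asyncio.sleep(0)  # placeholder for a real model call
        return f"{self.session_id}: done {text}"


async def fan_out(tasks: list[str]) -> list[str]:
    """Give each task its own session and run them all in the same process."""
    sessions = [Session(f"s{i}") for i in range(len(tasks))]
    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(s.prompt(t) for s, t in zip(sessions, tasks)))


results = asyncio.run(fan_out(["lint", "test", "docs"]))
```

No RPC or bootstrap is involved: the sessions are ordinary objects, so sharing or inspecting their state is a plain attribute access.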

Sergii Guslystyi

@JuiceSharp

6 days ago

@badlogicgames A simple example is running a simple fan-out in parallel

Sergii Guslystyi

@JuiceSharp

6 days ago

@badlogicgames To be able to run more than a single session simultaneously in one process.

Sergii Guslystyi

@JuiceSharp

7 days ago

Agreed. At the same time there is a nuance ... this time they are trying to eat a part of the control flow around the models (hiding it, as usual). One possible consequence is that the deterministic flows and loops we own will be pushed up a level if the "new" abilities are properly utilized, or will become more dynamic as well.

Sergii Guslystyi

@JuiceSharp

7 days ago

Might be just self-soothing on my side :) Only time will tell... I am glad you found the thoughts insightful. I do believe in the direction for sure: humans will stay in the loop for a while, but not everyone will keep their job ... Humans should learn how to push back on models even within SDD, steer them in time, reject decisions, etc. Believe it or not, I quite often follow agent output from stage to stage ... just to catch the moment one goes off the rails :)

Sergii Guslystyi

@JuiceSharp

17 days ago

This post from an Anthropic representative shows the limits of their current approach to prompting. That's exactly why "wrapper builders" have good odds.

The highest-leverage primitive in agentic engineering isn't a smarter model. It's the structured pause - the ability to ask before committing. A post-hoc "I picked X because Y" is exhaust (still a helpful pattern), a postmortem journal of decisions the model already made in one forward pass. By the time you read it, X is already an import, a schema, a public type. And no matter how detailed the spec, ambiguities exist. I learned that the hard way.

The reason labs can't quietly absorb this layer is structural. Every autonomous coding eval scores the model on completing without asking, so asking is a leaderboard loss. But that's the symptom. The deeper structure: a forward pass is just generation... It can't pause itself. Deterministic control flow (pauses, gates, checkpoints) lives outside it ... reliability comes from architecture, not instruction.

The harness has the opposite gradient. The user is the eval and the driver, not the benchmark. The harness can enforce pauses because pauses are control flow, not sampled tokens. An opinionated metaflow (like mine https://t.co/pU8NwonxtI, @dexhorthy's CodeLayer/HL, the many other flows enthusiasts built on top of the models) is incompressible into a general-purpose API. Not because the model can't learn workflow shapes, but because workflows are control flow around the model. This is a product competition on even ground. Not a model eating a layer.
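
A minimal Python sketch of a structured pause as harness control flow, under stated assumptions: `Decision`, `gate`, `Rejected`, and `implement` are hypothetical names, and the `ask` callback stands in for whatever channel reaches the driver (a TUI prompt, a chat message, `input()`).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    """An ambiguity the spec did not settle (hypothetical shape)."""
    question: str
    options: list[str]


class Rejected(Exception):
    """Raised when the answer is not one of the offered options."""


def gate(decision: Decision, ask: Callable[[Decision], str]) -> str:
    """Block the flow until the driver picks one of the options."""
    answer = ask(decision)
    if answer not in decision.options:
        raise Rejected(f"{answer!r} is not one of {decision.options}")
    return answer


def implement(spec: str, ask: Callable[[Decision], str]) -> str:
    # The question is asked before anything is written, so the choice never
    # becomes an import or a schema behind the driver's back.
    store = gate(Decision("Which store backs sessions?", ["sqlite", "redis"]), ask)
    return f"{spec} implemented with {store}"


result = implement("session cache", ask=lambda d: "sqlite")
```

The pause is an ordinary function call in the harness, so it cannot be skipped by sampling: the implementation step simply does not run until `gate` returns.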

Thariq

@trq212

17 days ago

a prompt I've been using a lot recently: implement <SPEC> and while you do, keep a running implementation-notes.html file (or markdown) with decisions you had to make weren't in the spec, things you had to change, tradeoffs you had to make or anything else I should know

Sergii Guslystyi

@JuiceSharp

7 days ago

https://t.co/tbTkHziT7E

Sergii Guslystyi

@JuiceSharp

7 days ago

Three years ago I was on the same page. Classical Detroit school, state-based, avoid mocks. AI changed my mind, but less than I first thought.

What AI actually killed is the maintenance cost. Tests got so cheap to write and rewrite that a lot of coupled tests I used to throw away are worth keeping now (see below for why). In that sense, yes, coverage starts to outweigh coupling...

But coupling has a second cost that has nothing to do with how cheap tests are: false confidence. When the agent writes the code and the test in the same session, the test just confirms its own assumptions instead of constraining them.

So I don't think it's "fewer seams" vs "more coverage". There are two kinds of tests now (in my view): a few behavioral ones on the outside, the real proof the system works, and a pile of cheap coupled ones for the agent's loop.

The cheap ones still matter, just not as assurance. They localize. When the agent breaks something, a fine-grained test points at the exact function, so the loop is fast and it self-corrects instead of hunting.

Your point that some apps can't reduce to one seam sounds right to me. That's where mocks lie the most, so there one may spend on one real integration seam instead of ten mocked ones... Keep a few outside seams as the real proof, prompt the agent to protect that layer (those tests are forbidden "to fix by modification"), and let the pile stay disposable.
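
Beyond prompting, the "forbidden to fix by modification" rule can also be checked deterministically. A small Python sketch, assuming a hypothetical layout where the behavioral tests live under `tests/behavioral/` and the harness knows which paths the agent changed:

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical layout: outside behavioral tests are the protected layer.
PROTECTED: tuple[str, ...] = ("tests/behavioral/*",)


def violations(changed_paths: list[str],
               protected: tuple[str, ...] = PROTECTED) -> list[str]:
    """Return the changed paths that fall inside the protected test layer."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in protected)
    ]


changed = [
    "src/cart.py",
    "tests/unit/test_cart.py",            # disposable pile: free to rewrite
    "tests/behavioral/test_checkout.py",  # protected: must not be edited
]
blocked = violations(changed)
```

A harness could run this check after each agent turn and reject the turn when `blocked` is non-empty, while leaving the cheap coupled tests free to change.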

Sergii Guslystyi

@JuiceSharp

10 days ago

@plainionist Found this article helpful and almost on the subject: https://t.co/x6B9RPW1UK

Sergii Guslystyi

@JuiceSharp

10 days ago

@mattpocockuk right… but is the issue the grip or the hands? :)

Sergii Guslystyi

@JuiceSharp

12 days ago

@badlogicgames Best possible way

Sergii Guslystyi

@JuiceSharp

14 days ago

7th to the table https://t.co/SbzQ9t5KUt. The article answers "why SDD", and I know exactly the answer to "Pi + SDD = ?" … we will see some benefits quite soon.