Constantine Mirin @constmirin - Twitter Profile

2 days ago

@HowToAI_ Yep. That’s Qwen wrapped in some scaffolding. Try generating a dialog of a few turns. The voice sounds only remotely like the sample, and there’s no way to make it better. The generation stability is meh, and it’s not really useful for anything besides 5-10-word phrases, maybe ads.

1

0

1

299

Constantine Mirin

@ConstMirin

10 days ago

189 releases = sorry, I can’t really test it well, as requirements don’t change that fast.
All integrations done = 90% were never tested in a real scenario.
Elastic, not MIT = I am not sure how, but I would like to earn money with my vibecoded thing :-)
A SaaS company doesn’t pay for the code; it pays for reliable, tested access to the data or functionality, so that it can focus on its core rather than fix/debug an “elastic dependency”.

2

11

0

1

1K

Constantine Mirin

@ConstMirin

about 1 month ago

Not as a gloat, just a remark: natural selection works. And magic wands do not exist. Everyone knows the rules, and yet agentic flows are being treated as if you were working with a human. You are not. It is a single API call to an LLM that predicts the next token based on the previous ones. Disasters like this are bound to happen because there’s no “intelligence” that makes a “decision” there.

0

44

ConstMirin retweeted

Chrys Bader

@chrysb

about 2 months ago

i've been working on llm memory systems for 3 years and dumped everything i know into this. learn about the 9 axes of memory systems, the 10 most common failure modes, why memory eval is an intractable problem, and more. everyone building with llms should read this.

28

546

53

1K

134K

Constantine Mirin

@ConstMirin

3 months ago

@WillHaver88 @ns123abc US hasn’t got viable open models :-)

0

253

Constantine Mirin

@ConstMirin

3 months ago

One HN commenter wrote "I might as well be using Haiku." He meant it as a hypothetical. Two workarounds exist. Neither is clean. Running the override now -- still seeing some Haiku in telemetry. https://t.co/N7won22eNZ

0

19

Constantine Mirin

@ConstMirin

3 months ago

Opus was throwing 500 errors yesterday. Switched to Sonnet, figured I'd finally set up the OTEL pipeline I'd been putting off. First thing the dashboard showed: 95% of my API requests were going to Haiku. I had model: sonnet configured everywhere.

https://t.co/qzrXpjJL6d
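A minimal sketch of the kind of check that surfaces this mismatch, assuming the telemetry has been exported as a list of records with a `model` field. The record shape, the `model_share` helper, and the sample data are all hypothetical and are not the actual OTEL schema or numbers from the dashboard.

```python
from collections import Counter


def model_share(records):
    """Return the fraction of requests served by each model.

    `records` is assumed to be an iterable of dicts with a "model" key
    (a hypothetical export format, not a real OTEL schema).
    """
    counts = Counter(r["model"] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: n / total for model, n in counts.items()}


# Made-up sample: 19 requests served by one model, 1 by the configured one.
sample = [{"model": "haiku"}] * 19 + [{"model": "sonnet"}]
shares = model_share(sample)

configured = "sonnet"
for model, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    flag = "" if model == configured else "  <- not the configured model"
    print(f"{model}: {share:.0%}{flag}")
```

Comparing the observed share against the configured model name is the whole point: the configuration alone says nothing about which model actually served the request.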

1

0

17

Constantine Mirin

@ConstMirin

3 months ago

Worse: 331 of those complex calls had cache reads averaging 79K tokens. Not stateless. Reading your full conversation context on the cheapest model. /stats doesn't count subagent models. At all. Confirmed bug, GitHub #17692, tagged for auto-close.

1

0

19

Constantine Mirin

@ConstMirin

3 months ago

I don't have good answers yet. Every configuration I sketch out trades one unfairness for another. Wrote up where I've landed so far — impossibility theorem, the HFT parallel, and why "bring your own bot" creates different problems. https://t.co/NI8V6t6pFM

0

1

0

9

Constantine Mirin

@ConstMirin

3 months ago

Building advisory bots for both sides of a services marketplace. Seller bot reviews contracts, suggests negotiation points. Buyer bot does the same from the other side. We built both. We control both prompts. We choose what data each one sees.

https://t.co/tGIjOnd2eh

1

0

10

Constantine Mirin

@ConstMirin

3 months ago

The research numbers make it worse. Weaker agents cost users up to 14% more (Stanford HAI). GPT-4o accepted the first proposal 100% of the time in Microsoft's marketplace experiment. The agents aren't negotiating. They're satisficing.

1

0

15

Constantine Mirin

@ConstMirin

3 months ago

Why didn't "throwing money at the problem" work? The recommendation from the top labs is "The agent is smart, it will figure it out, let it run". It kinda does, but you pay for all that. It is like letting a junior figure it out instead of giving guidance. Expensive and inefficient.
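A back-of-the-envelope sketch of why "let it run" gets expensive, under a deliberately simple model: every turn re-sends the whole conversation so far, and every turn adds a fixed number of tokens. Both assumptions and all numbers below are illustrative, not measurements; caching and pricing tiers are ignored.

```python
def cumulative_input_tokens(turns: int, base_context: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across `turns` calls.

    Assumes (hypothetically) that each call re-sends the full context and
    that the context grows by `tokens_per_turn` after every call.
    """
    total = 0
    context = base_context
    for _ in range(turns):
        total += context            # this call pays for everything so far
        context += tokens_per_turn  # the next call carries this turn too
    return total


ten = cumulative_input_tokens(10, base_context=1000, tokens_per_turn=500)
twenty = cumulative_input_tokens(20, base_context=1000, tokens_per_turn=500)
print(ten, twenty)  # 32500 115000
```

In this toy model, doubling the number of turns more than triples the input tokens, which is the cost of an unguided agent wandering toward an answer.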

1

5

0

1K

Constantine Mirin

@ConstMirin

3 months ago

@advoc8analytics @HedgieMarkets Actually they do rack up exponential costs the longer they think about the problem

0

7

Constantine Mirin

@ConstMirin

4 months ago

@arscontexta https://t.co/LtSaRc3aME

0

1

0

3

Constantine Mirin

@ConstMirin

4 months ago

@arscontexta Cool idea. I did some work on skill creation that takes advantage of best practices, and connectivity is one of the ideas. That said, wiki links aren’t processed by Claude the way we want. So the recommendation is to link by the skill id (name).

1

0

235

Constantine Mirin

@ConstMirin

4 months ago

@cryptopunk7213 You can’t be serious. None of these are verifiable requirements. Maybe you have more elaborate prompts?

0

277
