Jeremy Blankenship @aliasocracy - Twitter Profile

Pinned Tweet

2 months ago

Most multi-agent AI systems don't fail at prompting. They fail at lifecycle semantics. A sub-agent says "done", the orchestrator trusts it, and the artifact ships without evidence. Enoch is my attempt to make that impossible.

1

0

128

Jeremy Blankenship

@aliasocracy

11 days ago

@Teknium Wow! Impressive. You should be very proud of yourself. The hard work paid off.

0

3

Jeremy Blankenship

@aliasocracy

21 days ago

@Teknium You and these infographics crack me up.

1

4

0

1K

Jeremy Blankenship

@aliasocracy

24 days ago

Since we are arbitrarily throwing around benchmarks:

0

15

Jeremy Blankenship

@aliasocracy

about 1 month ago

@bishara

0

282

Jeremy Blankenship

@aliasocracy

about 1 month ago

@Teknium @theo 🤣

0

1

0

329

Jeremy Blankenship

@aliasocracy

about 1 month ago

@Colosteve2000 @Teknium I added a note to the README. TLDR - you probably won't need the tool slimmer now - just the native Hermes thing.

0

21

Jeremy Blankenship

@aliasocracy

about 1 month ago

@KettlebellDan Well two things... The token consumption and cost are not competitive (all things considered... wrapper/harness features - model size and capacity versus others in market). The only thing I would consider using Grok for right now is sensitive / private work.

0

1

0

250

Jeremy Blankenship

@aliasocracy

about 1 month ago

@NoahKingJr Nice try, Elon.

0

4

Jeremy Blankenship

@aliasocracy

about 1 month ago

@lmariscal @xai Trying. It says: "This person's inbox is closed. They need to update their message settings before you can message them." -- I opened up my DMs. Thank you.

0

1

0

162

aliasocracy retweeted

Benjamin Marie

@bnjmn_marie

about 1 month ago

Unless you’re ready to spend serious time (and money) tuning hyperparameters, don’t mess with LLM reasoning traces. I evaluated multiple reasoning budgets and BNF grammar / structured CoT settings on Qwen3.6 27B. The results are underwhelming. Yes, it can work: for a few specific tasks, it significantly reduces inference cost by shortening reasoning traces while preserving accuracy. But in most settings, simply disabling reasoning is better, both for token efficiency and accuracy. Full analysis here: https://t.co/xxLLzVkASx


18

168

13

91

24K

aliasocracy retweeted

Jeremy Blankenship

@aliasocracy

about 1 month ago

@KaiXCreator

0

1

0

654


Jeremy Blankenship

@aliasocracy

about 1 month ago

If anyone is needing a cheap hosting service, I found this one to be the most unique and useful: https://t.co/VFG3xUDqPp No referral links. Dallas and extremely fast. Paired with Tailscale... not sure there is anything else quite like it.

0

57

Jeremy Blankenship

@aliasocracy

about 1 month ago

This is why the wake gate watches the process tree and the telemetry windows instead of trusting the agent's own declaration that the work is done. Docs: https://t.co/JKrFPUL3Ua

0

5

Jeremy Blankenship

@aliasocracy

about 1 month ago

An agent can report that it is finished while child processes are still writing files, the GPU is still allocated, or the local state was never updated. The model has no direct view of those things. It only knows what it was told or what it can see in its current context window.

1

0

5

Jeremy Blankenship

@aliasocracy

about 1 month ago

If you only listen to what the agent says about completion, you have no independent way to know whether the run actually stopped or just went quiet for a while.

1

0

4
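The three posts above describe a gate that relies on independent observation rather than the agent's own report. A minimal sketch of that idea follows. It is not Enoch's actual implementation: the `Observation` fields, the `wake_gate` name, and the `min_quiet` threshold are hypothetical, and gathering the observations (walking the process tree, querying the GPU, reading telemetry) is left outside the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """What the supervisor measured itself. All field names are hypothetical."""
    live_child_pids: list[int] = field(default_factory=list)
    gpu_allocated: bool = False
    state_updated: bool = True
    quiet_seconds: float = 0.0  # time since the last observed activity


def wake_gate(agent_says_done: bool, obs: Observation,
              min_quiet: float = 30.0) -> tuple[bool, list[str]]:
    """Return (allowed, reasons). The agent's claim is necessary, never sufficient."""
    reasons: list[str] = []
    if not agent_says_done:
        reasons.append("agent has not reported completion")
    if obs.live_child_pids:
        reasons.append(f"child processes still running: {obs.live_child_pids}")
    if obs.gpu_allocated:
        reasons.append("GPU still allocated")
    if not obs.state_updated:
        reasons.append("local state was never updated")
    if obs.quiet_seconds < min_quiet:
        reasons.append("activity window too recent to call the run stopped")
    return (not reasons, reasons)


if __name__ == "__main__":
    obs = Observation(live_child_pids=[4242], quiet_seconds=60.0)
    print(wake_gate(True, obs))
```

Returning the list of reasons alongside the verdict keeps the refusal auditable: the orchestrator can log why it did not wake instead of only recording that it waited.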

Jeremy Blankenship

@aliasocracy

about 1 month ago

@yoheinakajima behavior.failed as a first-class event in the log, not an exception that disappears. The audit trail captures why something broke, not just that it did. Anyone who's debugged a long-running agent at 2am knows the difference.

0

16
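One way to read the reply above is that a failure gets appended to the event log as a structured record instead of propagating as an exception that nobody stores. The sketch below assumes a plain in-memory list as the log. Only the `behavior.failed` event name comes from the post. `run_behavior`, the `behavior.completed` event, and the record fields are hypothetical.

```python
import time
import traceback
from typing import Any, Callable, Dict, List, Optional


def run_behavior(name: str, fn: Callable[[], Any],
                 log: List[Dict[str, Any]]) -> Optional[Any]:
    """Run fn and record the outcome as an event. Failures are logged, not lost."""
    try:
        result = fn()
    except Exception as exc:
        # Capture why it broke (type, message, traceback), not just that it did.
        log.append({
            "event": "behavior.failed",
            "behavior": name,
            "error_type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
            "ts": time.time(),
        })
        return None
    log.append({"event": "behavior.completed", "behavior": name, "ts": time.time()})
    return result


if __name__ == "__main__":
    events: List[Dict[str, Any]] = []

    def write_report() -> None:
        raise OSError("disk full")

    run_behavior("write_report", write_report, events)
    print(events[-1]["event"], events[-1]["message"])
```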

Jeremy Blankenship

@aliasocracy

about 1 month ago

@populartourist The n-gram cache scaling looks clean for single-stream, but the memory cost competes with KV cache under --parallel >1. In batched serving that tradeoff shifts. Curious how the curve looks with concurrent requests.

0

80

Jeremy Blankenship

@aliasocracy

about 1 month ago

@bravo_abad Tree search over LLM calls is the real pattern. ERA evaluates thousands of candidates per task. The scoring function and branching strategy matter more than the model. This is what agentic actually means in practice: search with a metric, not just a bigger prompt.

0

28
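"Search with a metric" can be illustrated with a small beam search, where the scoring function and the branching rule are explicit parameters. This is a toy sketch, not ERA's method. In a real system `expand` would call a model to propose candidates. Here it appends letters so the example runs on its own, and all names are hypothetical.

```python
from typing import Callable, List


def beam_search(root: str,
                expand: Callable[[str], List[str]],
                score: Callable[[str], float],
                beam_width: int = 3,
                depth: int = 4) -> str:
    """Keep the top `beam_width` candidates per level and return the best one seen."""
    frontier = [root]
    best = root
    for _ in range(depth):
        children = [c for node in frontier for c in expand(node)]
        if not children:
            break
        # The metric decides which branches survive.
        children.sort(key=score, reverse=True)
        frontier = children[:beam_width]
        if score(frontier[0]) > score(best):
            best = frontier[0]
    return best


TARGET = "agent"


def expand(candidate: str) -> List[str]:
    # Stand-in for an LLM call that proposes several continuations.
    if len(candidate) >= len(TARGET):
        return []
    return [candidate + ch for ch in "aegnt"]


def score(candidate: str) -> float:
    # Count positions that already match the target.
    return float(sum(a == b for a, b in zip(candidate, TARGET)))


if __name__ == "__main__":
    print(beam_search("", expand, score, beam_width=3, depth=5))
```

Swapping the `score` function or the `beam_width` changes the result far more than anything inside `expand`, which is the point the reply makes about the metric and branching strategy.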

