Dani Alami @Daniel_Alami - Twitter Profile

I've built a catalog of agentic engineering patterns, as well as some reflexive engineering primitives for a somewhat naive recursive AI implementation I've been working on. They're nothing fancy, just practical patterns that came out of real bugs we hit. If you're curious, check them out: → Agentic Engineering Patterns https://t.co/1sRjdORvqN → Reflexive Engineering Primitives https://t.co/ubCLqDr5Mg

0

14

Dani Alami

@Daniel_Alami

about 8 hours ago

A thought: improving LLMs is like improving cars - some keep focusing on the engine, while the true power may lie in bettering the chassis.

0

6

Dani Alami

@Daniel_Alami

about 21 hours ago

@thsottiaux Idea: Internal agentic Chaos Monkey / bounty that inserts entropy to mess with Codex’s reliability. If it succeeds, you RCA it, fix the root cause, and reset our limits ad infinitum. Win-win. Deal?

0

3

0

374

Dani Alami

@Daniel_Alami

2 days ago

@tunguz A way for big4 and other consulting firms to print millions of dollars, faster than the Fed ever could.

0

21

0

3K

Dani Alami

@Daniel_Alami

2 days ago

It would be interesting for Codex to self-pace when approaching weekly limits, and to ask users whether they want to continue the current work or reprioritize.

0

21

Dani Alami

@Daniel_Alami

2 days ago

@DimitrisPapail Agents do love strawmanning and drifting towards easier tasks. Feel like you periodically need to pull them back. I use “Don’t agent-strawman me” quite often.

0

1

0

1

218

Dani Alami

@Daniel_Alami

2 days ago

In a world where humans make decisions driven by cognitive biases (e.g., loss aversion), will offloading far more choices to LLMs, lacking such utility functions, make outcomes more iatrogenic?

0

14

Dani Alami

@Daniel_Alami

2 days ago

@IterIntellectus It means whatever it has to mean to raise shitloads of money.

0

77

Dani Alami

@Daniel_Alami

2 days ago

@waitbutwhy Give us an update on this. Did you read it?

0

8

Dani Alami

@Daniel_Alami

3 days ago

Some insights on using LLMs to forecast:
1. Do not ask “which LLM is best at forecasting?” Ask “best on which kind of question?” In my experiments, model rankings change by corpus/source and even by metric. A model can look best by Elo and worse by Brier.
2. Naive model ensembles are not automatically better. Averaging forecasts can lose to the best single model. The useful signal seems to be conditional routing by question/source/model family, not “just take the mean.”
3. A model’s confidence explanation is not the same thing as calibration. Chain-of-thought style content can sound epistemically rich while being weakly related to actual Brier error.
4. Auxiliary channels can predict forecast error. Asking for “worry,” “confidence,” or related side-channel estimates can reveal whether the model’s probability is fragile. But the sign and usefulness are model-family dependent.
5. Never pool worry signals across model families without checking sign. One family’s “worry” can mean useful tail-risk awareness; another’s can mean generic uncertainty theater.
6. Forecasting skill decays with source/cutoff currency. Post-cutoff, source-fresh questions appear much harder. This suggests LLM forecasting skill is partly inherited from training-distribution currency, not pure reasoning.
7. The practical object is not a better prompt. It is a forecast validity layer. The system should decide when to trust, route, shrink, ensemble, abstain, or demand fresh evidence.
8. Decision utility is stricter than Brier. A forecast can be better-calibrated in aggregate but still hurt downstream decisions if the threshold policy is wrong.
9. The strongest near-term product implication: use LLMs as conditional forecasting instruments, not standalone forecasters. They can generate probabilities, side-channel diagnostics, decomposition, and routing features. The deployment layer has to arbitrate.
10. A possible “law”: LLM forecasting errors are structured by carrier, source currency, and family-specific elicitation channels. If true, forecasting improvement comes from discovering the right conditional invariants, not from making one universal “superforecaster prompt.”
I'm running a bunch of follow-up experiments on when human forecasting biases transfer to LLMs, when they disappear, and when they mutate into different failure modes. This was inspired by reading @PTetlock's work!
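Points 1 and 2 can be illustrated with a toy comparison. This is a minimal sketch, not the setup used in the experiments above: the questions, sources, model names (`model_a`, `model_b`), and probabilities are all made up, and the router is fit and scored on the same four questions, so a real check would need held-out data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical resolved questions: (source, outcome, {model: probability}).
QUESTIONS = [
    ("news", 1, {"model_a": 0.9, "model_b": 0.4}),
    ("news", 0, {"model_a": 0.1, "model_b": 0.7}),
    ("markets", 1, {"model_a": 0.6, "model_b": 0.8}),
    ("markets", 0, {"model_a": 0.4, "model_b": 0.1}),
]
MODELS = ["model_a", "model_b"]


def brier(pairs):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return mean((p - o) ** 2 for p, o in pairs)


def model_score(model, questions):
    """Brier score of a single model over the given questions."""
    return brier([(f[model], o) for _, o, f in questions])


def mean_ensemble_score(questions):
    """Brier score of the naive ensemble that averages all models' probabilities."""
    return brier([(mean(f.values()), o) for _, o, f in questions])


def fit_router(questions):
    """For each source, pick the model with the lowest Brier score on that source."""
    by_source = defaultdict(list)
    for q in questions:
        by_source[q[0]].append(q)
    return {
        source: min(MODELS, key=lambda m: model_score(m, qs))
        for source, qs in by_source.items()
    }


def routed_score(router, questions):
    """Brier score when each question uses the model chosen for its source."""
    return brier([(f[router[s]], o) for s, o, f in questions])


router = fit_router(QUESTIONS)
scores = {m: model_score(m, QUESTIONS) for m in MODELS}
scores["mean_ensemble"] = mean_ensemble_score(QUESTIONS)
scores["routed"] = routed_score(router, QUESTIONS)

# Print from best (lowest Brier) to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.4f}")
```

On this toy data the mean ensemble scores worse than `model_a` alone, while routing by source beats both single models. The numbers were chosen to make that happen, so the sketch only shows that the effect is possible, not how often it occurs.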

0

1

0

26