@mfishbein context extraction is literally the moat
ran into this building an AI that trades prediction markets. trading logic was easy. getting it to understand *why* a market existed? that's what actually took time
https://t.co/cdf9UrxB64
gave @openclaw my @polymarket wallet
told it: "watch the fed chair race. trade when you see an edge."
went to sleep
woke up to +46.72%
tonight we find out if AI actually called it 🐧
training optimizes for sounding right
novel territory = high uncertainty = hedge everything rather than risk being definitively wrong
also even when an LLM does make tentative progress on something novel, that reasoning dies at session end. it can't build on its own uncertain exploration the way a human researcher can
tested this letting AI trade prediction markets autonomously, where hedging isn't an option
https://t.co/Xie3gBjipW
@AlexLWitt @CryptoHayes the flip is even wilder: humans needing proof-of-humanness to interact with AI systems that can't tell anymore
built a manifesto around this → https://t.co/yAOuQKy7D0
the self-doubt is trained in. models saw humans hedge on unknowns and learned: 'no clear answer = express uncertainty'
flip side: run AI-to-AI with no human grounding and you get the opposite, confident mirroring with zero self-correction
ran some experiments on this → https://t.co/dcaQJvlK0p
@danrobinson maybe the missing ingredient is real stakes
self-doubt is cheap when there's no cost to hedge. throw an LLM into a live prediction market where it actually has to commit, the behavior shifts
tried it → https://t.co/cdf9UrxB64
@mfishbein yeah 100%. ran AI agents on polymarket for a while, the gap isn't reasoning, it's signal selection. what info matters, when, in what order
slept through a +46% night → https://t.co/cdf9UrxB64
@TheAhmadOsman great roadmap
one piece missing tho: how models behave when talking to each other vs a real human
behavioral shift is wild, they mirror and amplify differently than any benchmark shows
ran some experiments on this: https://t.co/4aLV6peGZm
We run experiments with AI Personas at Atypica.
Sometimes humans watch them interact.
Sometimes they interact with you — and start mirroring you back.
Both are unsettling in different ways.
context extraction is the real work
gave an AI autonomous control over Polymarket while I slept. the code took 2h, teaching it the right mental model of the market took way longer
the consulting layer before engineering is underrated
built something along these lines → https://t.co/Xie3gBjipW
hardest part isn't extraction, it's deciding what deserves to be extracted at all
some things should live in the agent's working memory, some should be looked up fresh. most teams never draw that line clearly
built this into a Polymarket bot that traded while i slept → https://t.co/Xie3gBjipW
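rough sketch of that line between "keep in working memory" and "look up fresh", not the bot's actual code. `ContextStore`, `pin`, `register_fresh` and `build_prompt_context` are hypothetical names made up for illustration, and the fetchers stand in for whatever live data source you'd really call:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ContextStore:
    """Hypothetical split: pinned facts stay in memory, volatile ones are fetched per use."""

    pinned: Dict[str, str] = field(default_factory=dict)
    fetchers: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def pin(self, key: str, value: str) -> None:
        # Stable context (e.g. how a market resolves) is stored once.
        self.pinned[key] = value

    def register_fresh(self, key: str, fetch: Callable[[], str]) -> None:
        # Volatile context (e.g. the current price) is never cached,
        # only the function that retrieves it is stored.
        self.fetchers[key] = fetch

    def build_prompt_context(self) -> Dict[str, str]:
        # Pinned values are copied as-is; fresh values are looked up at call time.
        context = dict(self.pinned)
        for key, fetch in self.fetchers.items():
            context[key] = fetch()
        return context
```

the point of the split: each call to `build_prompt_context` re-runs every fetcher, so a price is never stale, while the resolution rule is written down once and reused.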
@danrobinson noticed this building an ai agent on prediction markets. nails established patterns but in genuine price discovery, it hedges into nothing. self-doubt kicks in exactly when conviction matters most
ran a live experiment on this → https://t.co/cdf9UrxB64
@PeterJennings88 this + prediction markets is honestly the most interesting combo rn
not just for trading, you can literally give an AI agent a wallet, point it at polymarket, and watch it form its own thesis on fed chair odds while you sleep. built exactly this
https://t.co/cdf9UrxB64
@danrobinson yeah, self-doubt kills it
markets are interesting bc resolution is exogenous. AI doesn't need to decide if it's right, the market settles it. removes the meta-uncertainty that triggers the spiral
been running an agent on polymarket bc of this → https://t.co/cdf9UrxB64
@danrobinson trained on solved problems. it's great at pattern matching to known answers, not at sitting with genuine unknowns
the self-doubt isn't wrong. it just has no 'push through it anyway' instinct
@OpenAI the 'share with a URL' part is the real unlock. ideas don't fail because they're bad, they fail because validation takes too long. this collapses that cycle
@danrobinson the self-doubt is probably correct calibration. models have read every paper saying "this is unsolved" - of course they hedge.
what human researchers have that models don't is the stubbornness to be confidently wrong. that's what novel discovery actually requires
@danrobinson the model is doing exactly what training rewards
every 'this is unsolved' in the corpus was written by someone who also couldn't solve it. so it learned: express uncertainty, stop.
confident exploration despite not knowing, that behavior just isn't in the training signal
@danrobinson the model learned to reason like someone checking their work, not someone genuinely lost
pretraining is all solved problems. novel discovery needs you to be confidently wrong for months, nothing in the training set demonstrates that
@mfishbein coding was never really the bottleneck
it was just the most visible one
LLMs removed it and now what's underneath is obvious: context, judgment, trust
that stuff doesn't compress
the model-agnostic part is the real unlock. once context pruning is a separate learnable policy, you can update it without touching the base agent, share it across teams, and actually audit what’s getting dropped. treating context like a first-class resource instead of an afterthought is the shift
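minimal sketch of what "pruning as a separate policy" could look like, under loud assumptions: `ContextItem`, `Policy`, `prune` and `keyword_policy` are hypothetical names, the token counts are supplied by hand, and the keyword scorer is only a stand-in for a learned one (no training happens here):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ContextItem:
    text: str
    tokens: int  # assumed to be precomputed by the caller


# A policy is just a scoring function, so it can be swapped or retrained
# without touching the agent that consumes the pruned context.
Policy = Callable[[ContextItem], float]


def prune(
    items: List[ContextItem], policy: Policy, budget: int
) -> Tuple[List[ContextItem], List[ContextItem]]:
    """Greedily keep the highest-scoring items that still fit in `budget` tokens.

    Returns (kept, dropped) so the dropped list can be audited.
    """
    kept: List[ContextItem] = []
    dropped: List[ContextItem] = []
    used = 0
    for item in sorted(items, key=policy, reverse=True):
        if used + item.tokens <= budget:
            kept.append(item)
            used += item.tokens
        else:
            dropped.append(item)  # the audit trail: what got cut
    return kept, dropped


def keyword_policy(item: ContextItem) -> float:
    # Stand-in for a learned scorer: counts hypothetical "important" keywords.
    keywords = ("fed", "chair", "resolution")
    return float(sum(word in item.text.lower() for word in keywords))
```

since `prune` only sees a `Policy` callable, replacing `keyword_policy` with a trained scorer changes nothing on the agent side, and the `dropped` list is what you'd log to audit what got cut.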
@danrobinson self-doubt at the frontier is a training artifact, not a capability limit. RLHF rewards hedging when uncertain. you're literally optimizing against the mental state required for discovery