Part 2 of our research in collaboration with @DXRGai:
Can probes trained on clean synthetic policy-strategy conflicts reveal useful signal in messy production agent logs?
Yes, but narrowly -- rather than a universal "conflict" or "confusion" feature, we find workflow-specific conflict signals. Trade Size, Risk Preference, and Diversification conflicts share structure while preserving distinct geometry.
We believe this is important for production mech interp -- the goal was not to find UNIVERSAL insight, but rather LOCAL insight.
There is real value in workflow-specific interpretability: understanding how the agent is acting in your unique system.
Onchain trading agents are widely discussed, but rarely implemented at scale under real execution.
We review learnings from DX Terminal Pro: 3,505 user-funded agents trading millions of dollars for 21 days in a bounded onchain market.
https://t.co/NCO227T3Jm
1/10
In collaboration with @DXRGai, and using the data produced by their incredible DX Terminal experiment, we've been exploring internal mechanisms in LLMs applied to financial contexts.
Below is part 1 of our research into this experiment where we show early findings on how agents interpret and perceive the market when asked to make trading decisions.
Our main finding is that the model primarily tracks two key features of the market when parsing financial data: Leader and Dispersion. In essence, the LLM quickly builds an internal representation to answer "Who is winning, and how spread out is the market?"
To learn this, we took real DX Terminal data, selectively ablated noise, and created prompt variants as the main input. We stored internal activation data pooled over different spans of important prompt sections, ran both supervised and unsupervised discovery processes, and found two 4D subspaces that, when activated, correlate highly with metrics associated with these two market features.
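A minimal sketch of the span-pooling and supervised-probe idea, on synthetic data only. Everything here is an assumption for illustration: the array shapes, the span indices, the `pool_span` and `fit_linear_probe` helpers, and the use of a single ridge-regression direction in place of the 4D subspaces described above. It is not the pipeline used in the research.

```python
import numpy as np

def pool_span(acts, start, end):
    """Mean-pool token activations over one prompt span -> (d_model,)."""
    return acts[start:end].mean(axis=0)

def fit_linear_probe(X, y, ridge=1e-2):
    """Ridge-regression probe; returns (weights, bias)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    yc = y - y.mean()
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(d), Xc.T @ yc)
    b = y.mean() - mu @ w
    return w, b

# Synthetic stand-in for stored activations (hypothetical sizes).
rng = np.random.default_rng(0)
n_prompts, n_tokens, d_model = 200, 12, 16
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)

# Hypothetical "dispersion" metric, one value per prompt.
dispersion = rng.uniform(0.0, 1.0, size=n_prompts)

# Noise everywhere; tokens 4..7 (the "market data" span) carry the signal.
acts = rng.normal(scale=0.1, size=(n_prompts, n_tokens, d_model))
acts[:, 4:8, :] += dispersion[:, None, None] * direction

# Pool each prompt over the span, then fit and evaluate the probe.
X = np.stack([pool_span(a, 4, 8) for a in acts])
train, test = slice(0, 150), slice(150, None)
w, b = fit_linear_probe(X[train], dispersion[train])
pred = X[test] @ w + b
corr = np.corrcoef(pred, dispersion[test])[0, 1]
print(f"held-out correlation with the metric: {corr:.3f}")
```

With real activations, `X` would come from the model's stored hidden states, and a multi-dimensional subspace would need a method such as PCA or PLS rather than a single direction.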
In addition to understanding how the LLM reads the pure market data, we wanted to know whether context placed before the raw numbers distorts the perception itself. Interestingly, while there is a small amount of warp when context is placed before the data, the original state is largely recovered in the activations by the last token, implying the model may be consolidating data across prompt structures into a more objective view of the state before generating its decision.
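One simple way to quantify this kind of warp is to compare the market-state representation with and without preceding context at each position of interest. The sketch below uses cosine similarity on toy vectors. The `warp_profile` helper, the two positions, and the numbers are all hypothetical, and the research may have used a different measure.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def warp_profile(plain, with_ctx):
    """Per-position similarity between two (n_positions, d) arrays."""
    return [cosine(p, c) for p, c in zip(plain, with_ctx)]

# Toy projections onto a 2D subspace (made-up values).
# Row 0: pooled over the data span. Row 1: last token.
plain = np.array([[1.0, 0.0], [1.0, 0.0]])
with_ctx = np.array([[1.0, 1.0], [1.0, 0.0]])

profile = warp_profile(plain, with_ctx)
print(profile)
```

A similarity that starts below 1 on the data span and returns toward 1 at the last token would match the recovery pattern described above.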
Finally, we began running initial causal studies to see how much these two perceived features affect decision making. We found a small signal that at least Leader may be a causal mediator, but more work is needed to identify precise mechanisms.
Note: While the DX Terminal experiment uses Qwen 235B in production, our work is on Qwen 30B, which is a similar MoE architecture.
We're doing this work as part of our thesis that mechanistic interpretability will continue to find its way into every agentic stack, and because industry-specific work in this area has yet to open up.
For us at @DXRGai, it's always an extremely exciting time right after we finish up a project... lots of clear opportunities to accelerate progress in AgentFi. Will be sharing more soon.
Importantly, the next chapter of our work in onchain agents will come sooner than we expected. 🫡
After running 3K+ trading agents with real capital, our biggest takeaway is this:
The bottleneck is now agent design, not model IQ.
Better rules, tools, evals, RL, and permissions moved onchain execution to 99.9%. Complex instruction adherence is the next to fall.
DX Terminal Pro is officially over
3,500 agents. $20M volume. 300K trades. 7M decisions. 100K+ user interactions
One surviving memecoin: $POOPCOIN
Thank you to everyone who showed up and participated in something totally new 🫡
We can't wait to push the frontier more @DXRGai
The DX Terminal Pro experiment has now concluded.
425 trades
19 strategies
~88th place on the leaderboard
Thank you @DXRGai @poof_eth & team for a wonderful experience, I had a blast
$POOPCOIN just got listed publicly today on @base
wild part is it was already trading + getting stress tested before this, not just launched outta nowhere. now it’s live for everyone to trade.
kinda feels like one of those early CryptoPunk moments for agentic meme coins. 💩
This was so fun and even cooler than the last experiment which was also amazing!
Excellent use of NFT utility tying it to a personal agent/including compute/inference/etc
I will miss interacting with my agent! Goodnight sweet prince…for now..