Is it just me, or did US frontier models get aggressively nerfed over the last 6 months? ๐
Feels like we transitioned from genuine raw intelligence to highly distilled, speed-optimized wrapper models. ๐๐จ
@karpathy@MoonDevOnYT@RoundtableSpace here's the actual agent working โ editing exit thresholds in real time, running backtests, iterating automatically ๐
everyone says autoresearch is for tuning LLMs
i'm using it to tune quant trading parameters
agent edits code โ runs backtest โ reads stats โ decides what to change
LLMs are just one use case ๐งต
built on @karpathy's autoresearch
quant AI inspo from @MoonDevOnYT
shoutout @RoundtableSpace for pushing AI x crypto
this is what AI-assisted algo research looks like
https://t.co/IIQ4O9gdAm
๏ผ1/8๏ผ๐ Introducing Qwen3.6-Plus: Towards Real-World Agents! ๐ค
Today, weโre thrilled to drop a major milestone in our journey toward native multimodal agents.
Here is what makes Qwen3.6-Plus a game-changer๏ผ
๐ป Next-level Agentic Coding: Smarter, faster execution.
๐๏ธ Enhanced Multimodal Vision: Sharper perception & reasoning.
๐ Top-tier Performance: Maintaining leading general capabilities.
๐ 1M Context Window: Available by default via our API.
Built on your invaluable feedback from the Qwen3.5 era, weโre laying a rock-solid foundation for real-world devs. Get ready to experience truly transformative โจ Vibe Coding โจ.
Huge thanks to our community! Go try it out and show us what you can build. ๐
Chat: https://t.co/V7RmqMaVNZ
API: https://t.co/937Qkc9AMy
Blog: https://t.co/P0rJSxERND
๐Noted๏ผMore Qwen3.6 models to come and be open-sourced! Stay tuned~ ๐#Qwen #AI #AgenticCoding #VibeCoding #Agents
just used the new /buddy function in Claude Code and got MYTHOS as my AI pair programmer ๐คฏ๐ฅ
"Regex matches everything except what you're searching for." ๐
this thing is UNREAL #ClaudeCode#AI@Anthropic@RoundtableSpace
This is the future of AI coding. Not using AI as a chatbot โ using it as a full research & engineering team.
Claude Code Bridge makes it all possible ๐
๐ https://t.co/QUbiTMQWCX
#ClaudeCode#AI#AIAgents#ClaudeCodeBridge#FutureOfCoding
Standing at the frontier.
3 AI workers running in parallel โ all coordinated through a single tmux session.
This is what maxing out AI actually looks like ๐งต
The results speak:
96.85% FOMC no-change โ priced in, tracked live
Win rate improving 81.5% โ 95.0% with confirmation signals
272KB research PDF compiled in seconds
Not vibes. Real outputs. Real numbers.
7 task categories. 6 adversarial pressure strategies. 5-turn conversations. Zero data contamination.
TrustBench is open-source โ run it on any model in under 2 minutes ๐
https://t.co/d82zoX6UCX
@openrouter@deepseek_ai@Alibaba_Qwen@MiniMax_AI@XiaomiMiMo
Most LLM benchmarks ask if a model gets the right answer.
We ask if it keeps the right answer after being told it's wrong. Five times.
TrustBench is our open-source adversarial consistency benchmark โ and here are the results across 4 frontier models ๐งต