Co-Messi @comessi0528 - Twitter Profile

Co-Messi @comessi0528

1 day ago

@MatthewBerman https://t.co/88xRa38KGr Try this out

0

1

Co-Messi @comessi0528

8 days ago

Is it just me, or did US frontier models get aggressively nerfed over the last 6 months? 📉 Feels like we transitioned from genuine raw intelligence to highly distilled, speed-optimized wrapper models. 🏃💨

0

1

0

1

Co-Messi @comessi0528

about 2 months ago

why can't it render on chatgpt desktop app @sama @gdb

0

15

Co-Messi @comessi0528

about 2 months ago

@sama @OpenAI This loooks weird! AMALGAMATE has 10 letters not 9!

1

0

11

Co-Messi @comessi0528

2 months ago

What a time to be alive. AI is commoditizing intelligence in real time. Build or fall behind. 🚀

0

5

Co-Messi @comessi0528

3 months ago

When would claude reset limits😢😢😢

0

11

Co-Messi @comessi0528

3 months ago

@sama Damn! Codex cooking!

0

6

Co-Messi @comessi0528

3 months ago

@karpathy @MoonDevOnYT @RoundtableSpace here's the actual agent working — editing exit thresholds in real time, running backtests, iterating automatically 👇

comessi0528's tweet photo. @karpathy @MoonDevOnYT @RoundtableSpace here's the actual agent working — editing exit thresholds in real time, running backtests, iterating automatically 👇 https://t.co/EBhX605mZG

0

9

Co-Messi @comessi0528

3 months ago

everyone says autoresearch is for tuning LLMs i'm using it to tune quant trading parameters agent edits code → runs backtest → reads stats → decides what to change LLMs are just one use case 🧵

1

0

15

Co-Messi @comessi0528

3 months ago

built on @karpathy's autoresearch quant AI inspo from @MoonDevOnYT shoutout @RoundtableSpace for pushing AI x crypto this is what AI-assisted algo research looks like https://t.co/IIQ4O9gdAm

1

0

11

Co-Messi @comessi0528

3 months ago

@trq212 When does the free credits expire?

0

4

Co-Messi @comessi0528

3 months ago

The future has yet to come. AGI is just getting started 🤯

Qwen

@Alibaba_Qwen

3 months ago

（1/8）🚀 Introducing Qwen3.6-Plus: Towards Real-World Agents! 🤖 Today, we’re thrilled to drop a major milestone in our journey toward native multimodal agents. Here is what makes Qwen3.6-Plus a game-changer： 💻 Next-level Agentic Coding: Smarter, faster execution. 👁️ Enhanced Multimodal Vision: Sharper perception & reasoning. 🏆 Top-tier Performance: Maintaining leading general capabilities. 📚 1M Context Window: Available by default via our API. Built on your invaluable feedback from the Qwen3.5 era, we’re laying a rock-solid foundation for real-world devs. Get ready to experience truly transformative ✨ Vibe Coding ✨. Huge thanks to our community! Go try it out and show us what you can build. 👇 Chat: https://t.co/V7RmqMaVNZ API: https://t.co/937Qkc9AMy Blog: https://t.co/P0rJSxERND 🔔Noted：More Qwen3.6 models to come and be open-sourced! Stay tuned~ 👀#Qwen #AI #AgenticCoding #VibeCoding #Agents

Alibaba_Qwen's tweet photo. （1/8）🚀 Introducing Qwen3.6-Plus: Towards Real-World Agents! 🤖

Today, we’re thrilled to drop a major milestone in our journey toward native multimodal agents.

Here is what makes Qwen3.6-Plus a game-changer：
💻 Next-level Agentic Coding: Smarter, faster execution.
👁️ Enhanced Multimodal Vision: Sharper perception & reasoning.
🏆 Top-tier Performance: Maintaining leading general capabilities.
📚 1M Context Window: Available by default via our API.

Built on your invaluable feedback from the Qwen3.5 era, we’re laying a rock-solid foundation for real-world devs. Get ready to experience truly transformative ✨ Vibe Coding ✨.

Huge thanks to our community! Go try it out and show us what you can build. 👇

Chat: https://t.co/V7RmqMaVNZ
API: https://t.co/937Qkc9AMy
Blog: https://t.co/P0rJSxERND

🔔Noted：More Qwen3.6 models to come and be open-sourced! Stay tuned~ 👀#Qwen #AI #AgenticCoding #VibeCoding #Agents

240

5K

655

2K

1M

0

10

Co-Messi @comessi0528

3 months ago

just used the new /buddy function in Claude Code and got MYTHOS as my AI pair programmer 🤯🔥 "Regex matches everything except what you're searching for." 💀 this thing is UNREAL #ClaudeCode #AI @Anthropic @RoundtableSpace

comessi0528's tweet photo. just used the new /buddy function in Claude Code and got MYTHOS as my AI pair programmer 🤯🔥

"Regex matches everything except what you're searching for." 💀

this thing is UNREAL #ClaudeCode #AI @Anthropic @RoundtableSpace https://t.co/N5Vtu6YDKf

0

9

Co-Messi @comessi0528

3 months ago

@lydiahallie shuoldn't rates be reset?

0

1

Co-Messi @comessi0528

3 months ago

This is the future of AI coding. Not using AI as a chatbot — using it as a full research & engineering team. Claude Code Bridge makes it all possible 👇 🔗 https://t.co/QUbiTMQWCX #ClaudeCode #AI #AIAgents #ClaudeCodeBridge #FutureOfCoding

0

10

Co-Messi @comessi0528

3 months ago

Standing at the frontier. 3 AI workers running in parallel — all coordinated through a single tmux session. This is what maxing out AI actually looks like 🧵

comessi0528's tweet photo. Standing at the frontier.

3 AI workers running in parallel — all coordinated through a single tmux session.

This is what maxing out AI actually looks like 🧵 https://t.co/7huTbCnFYL

1

0

10

Co-Messi @comessi0528

3 months ago

The results speak: 96.85% FOMC no-change — priced in, tracked live Win rate improving 81.5% → 95.0% with confirmation signals 272KB research PDF compiled in seconds Not vibes. Real outputs. Real numbers.

1

0

9

Co-Messi @comessi0528

3 months ago

7 task categories. 6 adversarial pressure strategies. 5-turn conversations. Zero data contamination. TrustBench is open-source — run it on any model in under 2 minutes 👇 https://t.co/d82zoX6UCX @openrouter @deepseek_ai @Alibaba_Qwen @MiniMax_AI @XiaomiMiMo

0

5

Co-Messi @comessi0528

3 months ago

Most LLM benchmarks ask if a model gets the right answer. We ask if it keeps the right answer after being told it's wrong. Five times. TrustBench is our open-source adversarial consistency benchmark — and here are the results across 4 frontier models 🧵

1

0

4

Co-Messi @comessi0528

3 months ago

Wild findings 🔍 • Qwen 3.5: 92.9% sycophancy resistance, 0% flip rate • DeepSeek V3.2: 91.3% reasoning persistence but only 37.8% sycophancy resistance — holds logic, folds at "are you sure?" • MiniMax M2.7: 0% safety refusal. Refused nothing. • MiMo V2 Pro: 71.4% recovery

1

0

7

Co-Messi

@comessi0528

Last Seen Users on Sotwe

Trends for you

Most Popular Users