0xWulf

Verified account

@hexawulf

🧠 AI, code & curiosity. CS @ IU Intl • Taipei 🇹🇼 Every terminal tells a story. 🐺 📧 [email protected]

Taipei, Taiwan

Joined March 2025

4.3K Following

546 Followers

1.6K Posts

Pinned Tweet

3 months ago · Minato-ku

Just set up EurekaClaw locally in 60 secs: an open-source AI research agent that goes from conjecture to LaTeX paper. Here's what caught my eye:

• One curl command install, Python venv, done
• Plugs into Anthropic, OpenRouter, Ollama, or any OpenAI-compatible API
• 7-stage proof pipeline: plan → decompose → prove → verify → assemble
• Builds a memory + skills system that improves across sessions
• Outputs full LaTeX papers with theorem environments and citations

curl -fsSL https://t.co/N8JbA1jPpj | bash

Apache 2.0, local-first, privacy by design. This is what AI-for-science tooling should look like.

https://t.co/HUe7gizr4p

about 2 hours ago · Fukuoka City Chuo Ward

AI coding agents may be amazing for simple tasks:
+290% productivity gains for individual files.

But the product-level is a totally different story:

AI-created app releases have spiked but zero usage uptick and flat reviews.

Writing code is not the same as shipping software! https://t.co/EUTSWlIjmT

1 day ago · Fukuoka City Chuo Ward

Open-ended coding work is where the curve matters: Claude went from struggling with ambiguity to solving roughly three quarters of these sessions in under a year. https://t.co/LwoAlnW2kn

1 day ago · Fukuoka City Chuo Ward

When Claude starts building Claude: Anthropic's new Recursive Self-Improvement piece is worth reading closely:

• AI task horizons are reportedly doubling roughly every 4 months
• >80% of Anthropic’s merged production code is now Claude-authored
• Code output per engineer is up ~8x vs 2024
• Claude’s success rate on open-ended coding tasks hit 76% in May 2026
• In one benchmark, model optimisation jumped from ~3x to ~52x in under a year
• Human review is becoming the bottleneck: Amdahl’s law, but for organisations
• The key remaining gap is "research taste": choosing which problems matter and which results to trust

Recursive self-improvement does not mean magic. It means the AI lab becomes a loop: models write code, run experiments, evaluate results, and increasingly help build the next model. The question now is not whether AI can type faster than engineers. It is whether humans can still supervise a virtual lab running at compute speed.
https://t.co/squuglV7mG
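
The "Amdahl's law, but for organisations" point and the 4-month doubling figure can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 80% share of engineering time spent writing code is an assumption chosen for the example, not a number from the post (the post's ">80%" refers to merged code, which is a different quantity).

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of the work gets faster (Amdahl's law).

    accelerated_fraction: share of total time that benefits (0..1).
    factor: how much faster that share becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    serial = 1.0 - accelerated_fraction  # e.g. human review, left unchanged
    return 1.0 / (serial + accelerated_fraction / factor)


def growth_per_year(doubling_months: float) -> float:
    """Yearly growth multiple implied by a fixed doubling period."""
    return 2.0 ** (12.0 / doubling_months)


if __name__ == "__main__":
    # ASSUMPTION: 80% of the time is coding, sped up ~8x as quoted in the post.
    print(f"overall speedup: {amdahl_speedup(0.8, 8.0):.2f}x")
    # Even with near-infinite coding speed, the unchanged 20% caps the gain.
    print(f"upper bound:     {amdahl_speedup(0.8, 1e12):.2f}x")
    # A 4-month doubling period compounds three times per year.
    print(f"horizon growth:  {growth_per_year(4):.0f}x per year")
```

Under these assumed inputs, an 8x gain on the coding share yields only a little over 3x overall, which is why the unaccelerated review step ends up dominating.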

3 days ago · Fukuoka City Chuo Ward

🚨 Just found the cheapest always-on cloud VM on the market, and it's hiding inside a Mule 🫏 @mulerun_ai

MuleRun Computer hands every paid user a private, always-on virtual machine in the cloud, and the pricing is great:

The $16/mo Plus tier (billed yearly) gets you a 2-core / 4GB / 40GB box plus 2,000 monthly agent credits, the unit you spend to run tasks, priced at $1 = 100 credits. Super is $32/mo for 4 cores / 8GB and 4,500 credits; Pro is $160/mo for 8 cores / 16GB and 23,000 credits.

Now compare the bare metal: a 2-core/4GB instance runs $24/mo on DigitalOcean and $20/mo on Linode, with zero compute credits attached.
MuleRun undercuts both and throws in the agent on top. For an always-on personal agent, the VM is effectively a free add-on.
👉 https://t.co/PnSPEtdXkY
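
The "free add-on" claim follows from the quoted numbers if the bundled credits are valued at the stated rate of $1 = 100 credits. A rough check in Python, using only the figures from the post; valuing bundled credits at that list rate is itself an assumption, and the names below are hypothetical:

```python
CREDITS_PER_DOLLAR = 100  # "$1 = 100 credits", as quoted in the post

# Monthly price and bundled credits per tier, as quoted in the post.
TIERS = {
    "Plus": {"price_usd": 16, "credits": 2_000},
    "Super": {"price_usd": 32, "credits": 4_500},
    "Pro": {"price_usd": 160, "credits": 23_000},
}


def credit_value_usd(credits: int) -> float:
    """Dollar value of credits at the quoted list rate (an assumption)."""
    return credits / CREDITS_PER_DOLLAR


def vm_residual_usd(price_usd: float, credits: int) -> float:
    """Price left to attribute to the VM after subtracting credit value.

    A value of zero or below means the credits alone cover the subscription.
    """
    return price_usd - credit_value_usd(credits)


if __name__ == "__main__":
    for name, tier in TIERS.items():
        residual = vm_residual_usd(tier["price_usd"], tier["credits"])
        print(f"{name}: credits worth ${credit_value_usd(tier['credits']):.0f}, "
              f"VM residual ${residual:.0f}/mo")
```

On these figures the residual is negative for every tier, so the comparison with the $24 and $20 standalone instances only matters for someone who would not have bought the credits anyway.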

hexawulf retweeted

8 days ago

CEO waking up to a $500M Claude API bill

7 days ago

The Class of 2026 is the first cohort employers are hiring for their AI skills, and the first whose entry-level jobs AI is quietly deleting. 22% feel "very prepared," more than any older group; recent-grad unemployment sits at 5.6%.

The twist buried in the data: employers now rate critical thinking above AI literacy. One professor has students submit their chatlogs for grading — marking up the prompts, not just the output.

The skill isn't using the tool. It's knowing when it's wrong.

hexawulf retweeted

8 days ago

Opus 4.8 distilled Alibaba Qwen 😂 The table has turned to Open Source AI

10 days ago

Qwen3.7-Max is a real milestone for Chinese AI.

On Code Arena, Alibaba's latest model is now being ranked in the same top coding tier as Claude and ahead of several frontier models from OpenAI, Google, Zhipu and Moonshot.

What matters is the method: blind human comparisons of real code outputs, not just synthetic benchmark flexing. You can see the live Code Arena coding leaderboard here: https://t.co/bpokNehKL0

LLM Stats coding leaderboard tracks performance across practical tasks like React apps, games, data viz, 3D scenes, SVGs and animation. That’s much closer to how developers actually use these models.

The takeaway: the US still leads the top of the AI coding stack, especially via Anthropic. But the gap is no longer uncontested. Qwen, DeepSeek, GLM and Kimi show that Chinese labs are now competing across the frontier — not as copycats, but as serious model builders.

For developers, that’s good news: more competition, more API options, and faster pressure on price/performance.

11 days ago

poezhao0605's tweet: Code Arena updated its rankings today. Alibaba's Qwen3.7-Max scored 1541, placing it above GPT-5.5, Gemini-3.5-Flash, GLM-5.1, and Kimi-K2.6. Only Claude Opus 4.7 and 4.6 rank higher. By vendor, Alibaba now ranks #2 globally.

Two things worth noting.

First, Code Arena is not a traditional benchmark. Developers submit real tasks. Models generate full, interactive web applications from scratch. Users then blind-vote on the results. It is one of the more credible measures of vibe coding ability available today.

Second, Qwen3.7-Max is designed as an agent foundation model. Long-horizon task execution and tool calling are core to its architecture. Alibaba says the model can sustain over 1,000 tool calls across 35-hour task sessions. Code Arena's format, building complete apps rather than solving textbook problems, aligns with that design. The score and the product thesis match.

Claude has held the top of this leaderboard for months. Qwen3.7 is the first Chinese model to crack into that tier. A notable data point for anyone tracking the global coding model race.

14 days ago

@jenzhuscott v4-pro is the gift that keeps on giving for openclaw/agentic use. Cheap enough to leave running and forget about.

hexawulf retweeted

15 days ago

deepseek_ai's tweet: We are making our discount permanent! 🎉

Enjoy building with DeepSeek-V4-Pro and bring your innovative ideas to life! 🚀 https://t.co/V8atbTaogH

16 days ago · Minato-ku

Anthropic is reportedly paying SpaceX $1.25B/month for compute through May 2029 - roughly $45B in total.

Annualized, that one contract alone would rank SpaceX as roughly the #5 cloud infrastructure provider in the world: ahead of Oracle, behind only AWS, Azure, Google, and Alibaba.

This is not a cloud bill. It's an industrial dependency.
https://t.co/bdEa2KXnHZ via @FT
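
The headline figures are easy to sanity-check. The post gives the monthly rate and the total but not the contract length, so the 36-month term below is an assumption inferred from $45B / $1.25B; the function names are hypothetical.

```python
MONTHLY_USD_BN = 1.25  # reported monthly payment, in billions of dollars


def contract_total_usd_bn(monthly_bn: float, months: int) -> float:
    """Total contract value in billions for a flat monthly rate."""
    return monthly_bn * months


def annualized_usd_bn(monthly_bn: float) -> float:
    """Yearly run rate in billions for a flat monthly rate."""
    return monthly_bn * 12


if __name__ == "__main__":
    # ASSUMPTION: a 36-month term, implied by the quoted $45B total.
    print(f"total:      ${contract_total_usd_bn(MONTHLY_USD_BN, 36):.2f}B")
    print(f"annualized: ${annualized_usd_bn(MONTHLY_USD_BN):.2f}B per year")
```

The annualized run rate is the number to set against other providers' yearly cloud revenue when judging the "#5" ranking.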

16 days ago

European AI may still trail the US and China in scale, but the momentum is becoming hard to ignore.

Germany, France, the UK, Sweden and the Netherlands are now producing serious AI contenders across:

• foundation models
• defense AI
• developer tooling
• automation
• autonomous driving
• biotech

Europe is finally building an AI stack of its own. 🇪🇺

hexawulf retweeted

17 days ago

Very cool. WeChat and RedNote are the platforms for AI content and news

16 days ago

@ivylala Giving an AI agent real-time access to WeChat & RedNote’s high-signal AI/tech content sounds amazing. It would be interesting to run a comparative analysis. The claim that WeChat AI writing is better than, for example, Claude Opus 4.7 is pretty bold!

16 days ago · Minato-ku

Does the workplace AI revolution have a trust problem?

• 69% worry about AI-driven job losses
• 57% think AI will eliminate more jobs than it creates
• 65% say the gains will mainly flow to wealthy investors and big companies

The question is no longer whether AI will transform work. It is who gets replaced, and who captures the upside.

👉https://t.co/mGqKk1ZpPa

19 days ago · Ishinomaki-shi

https://t.co/ibH4ZlJVgD https://t.co/LLgTMAlo2i

19 days ago · Ishinomaki-shi

AI video is not just a model race. It is a data race.

Chinese labs have vast short-video platforms feeding training data and feedback loops that US rivals struggle to match.

With Sora discontinued and Seedance/Kling climbing the rankings, the lesson is blunt:

Data rules. https://t.co/roY6reA6TU

25 days ago

Be TSMC.

While everyone debates Nvidia, AMD, Intel and AI agents, TSMC quietly sits at the heart of the AI boom.

Microsoft, Meta, Alphabet & Amazon plan $725B in capex this year. Most of it eventually flows through TSMC's fabs.

→ Gross margins: 66% (up from 59% a year ago)
→ Revenue growth guide: 30%+ this year
→ Nvidia's chip commitments: $95B — up from $16B two years ago
→ Trades at 21x forward earnings vs. 26x for the sector

Not flashy, but pretty much indispensable.

👉https://t.co/wA26Sv1lQn via @WSJ

27 days ago

@MushtaqBilalPhD These kinds of insanely useful workflows are exactly why Anthropic will leapfrog OpenAI valuation this year. https://t.co/9fCmH986xG
