warmshao @warmshao - Twitter Profile

warmshao retweeted

@billxbf

13 days ago

Excited to release 🌟Polar🌟, our Agent RL rollout infra for real-world harnesses. Be it Codex, Claude Code, OpenClaw, Hermes, or your self-made ones 🔥 -- Polar takes your harnesses directly as training environments without code change.

Find a problem, design the harness, and train your own agents! 🧵

warmshao @warmshao

about 1 month ago

Added a new feature to Browser Use Desktop: support for an external CDP URL, so it can connect to an existing browser🚀
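The tweet does not show how Browser Use Desktop consumes the external CDP URL. The following is a rough sketch, assuming the URL points at a Chrome started with `--remote-debugging-port`. A client can accept either a DevTools websocket URL directly or an HTTP address. For an HTTP address, it asks Chrome's `/json/version` endpoint for its `webSocketDebuggerUrl`. The helper names `version_endpoint` and `resolve_cdp_websocket` are hypothetical and are not part of Browser Use.

```python
import json
import urllib.request
from urllib.parse import urlparse, urlunparse


def version_endpoint(cdp_url: str) -> str:
    """Map an HTTP DevTools address (e.g. http://localhost:9222) to its /json/version URL."""
    parsed = urlparse(cdp_url)
    return urlunparse(parsed._replace(path="/json/version", params="", query="", fragment=""))


def resolve_cdp_websocket(cdp_url: str, timeout: float = 5.0) -> str:
    """Return a browser-level DevTools websocket URL for an already running Chrome.

    A ws:// or wss:// URL is used as-is; an http:// or https:// address is
    resolved by querying Chrome's /json/version endpoint.
    """
    scheme = urlparse(cdp_url).scheme
    if scheme in ("ws", "wss"):
        return cdp_url
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported CDP URL scheme: {scheme!r}")
    with urllib.request.urlopen(version_endpoint(cdp_url), timeout=timeout) as resp:
        info = json.load(resp)  # JSON body includes "webSocketDebuggerUrl"
    return info["webSocketDebuggerUrl"]
```

For example, `resolve_cdp_websocket("http://localhost:9222")` would return a `ws://localhost:9222/devtools/browser/<id>` address. A CDP client then opens that address as its single websocket to the existing browser.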

warmshao @warmshao

about 1 month ago

Seriously, Browser Use Desktop is really awesome. It combines Claude Code and Browser Harness and has a visual UI, which makes it perfect for non-technical users. You guys should try it out🔥 https://t.co/VlBj3v5I8V

warmshao @warmshao

about 1 month ago

Also, it’s under the MIT License. The Browser Use team is really generous, and I plan to build some new things on top of it.

warmshao retweeted

DeepSeek

@deepseek_ai

about 2 months ago

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at https://t.co/GCdiMzk1Dl via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: https://t.co/drlDrxkYtp
🤗 Open Weights: https://t.co/T13Y8i7SDM

1/n

warmshao retweeted

Gregor Zunic

@gregpr07

about 2 months ago

https://t.co/pihp7MOIql

warmshao retweeted

Qwen

@Alibaba_Qwen

about 2 months ago

🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power!

Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇

What's new:
🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B across all major coding benchmarks
💡 Strong reasoning across text & multimodal tasks
🔄 Supports thinking & non-thinking modes
✅ Apache 2.0 — fully open, fully yours

Smaller model. Bigger results. Community's favorite. ❤️
We can't wait to see what you build with Qwen3.6-27B! 👀

🔗👇
Blog: https://t.co/P2Zx7FwMxB
Qwen Studio: https://t.co/c4vm4LuZrU
Github: https://t.co/zKDEbv0R4U
Hugging Face:
https://t.co/N67hyzxvfr
https://t.co/SSdtbWRDap
ModelScope:
https://t.co/xODf1pj9kw
https://t.co/xXhoqlJ2AB

warmshao retweeted

Kimi.ai @Kimi_Moonshot

about 2 months ago

Meet Kimi K2.6: Advancing Open-Source Coding

🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), CharXiv w/ python (86.7), Math Vision w/ python (93.2)

What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
-
K2.6 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode.
For production-grade coding, pair K2.6 with Kimi Code: https://t.co/uvoSJKyGCY
-
🔗 API: https://t.co/EOZkbOwCN4
🔗 Tech blog: https://t.co/9wWvgIQSS3
🔗 Weights & code: https://t.co/Be0hjs2RTP

warmshao @warmshao

about 2 months ago

@browser_use is grinding so hard

warmshao @warmshao

about 2 months ago

mark

Gregor Zunic

@gregpr07

about 2 months ago

Introducing: Browser Harness. A self-healing harness that can complete virtually any browser task. ♞

We got tired of browser frameworks restricting the LLM. So we removed the framework.

> Self-healing — edits helpers.py on the fly
> Direct CDP — one websocket to Chrome
> No framework, no rails, complete freedom
> Drop-in for Claude Code and Codex

I challenge anyone to find a task that DOESN'T work. I couldn't yet.🔥

100% open source ↓

warmshao retweeted

Qwen

@Alibaba_Qwen

about 2 months ago

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀

A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.

🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes

Efficient. Powerful. Versatile. Try it now👇

Blog: https://t.co/EXx5y466su
Qwen Studio: https://t.co/bg4tAU1p74
HuggingFace: https://t.co/w4pDX14DZS
ModelScope: https://t.co/SuRyLzdQiO
API ('Qwen3.6-Flash' on Model Studio): Coming soon~ Stay tuned

warmshao @warmshao

2 months ago

Qwen3.5 27B is better

Artificial Analysis

@ArtificialAnlys

2 months ago

Google has released Gemma 4, a new family of multimodal open-weight models including Gemma 4 E2B, Gemma 4 E4B, Gemma 4 31B and Gemma 4 26B A4B

@GoogleDeepMind’s new Gemma 4 family introduces four multimodal models supporting text, image, and video inputs. We evaluated Gemma 4 31B (dense) and Gemma 4 26B A4B (MoE), both with a 256k context window, while the other two smaller models support up to 128k. With 31B and 26B parameters respectively, both evaluated models can run on a single H100.

On GPQA Diamond, our scientific reasoning evaluation, Gemma 4 31B (Reasoning) scores 85.7%, the second highest result we have recorded for an open-weights model with fewer than 40B parameters, just behind Qwen3.5 27B (Reasoning, 85.8%). It reaches this score using only ~1.2M output tokens, fewer than Qwen3.5 27B (~1.5M) and Qwen3.5 35B A3B (~1.6M). Gemma 4 26B A4B (Reasoning) scores 79.2%, ahead of gpt-oss-120B (high, 76.2%) but behind Qwen3.5 9B (Reasoning, 80.6%).

We are now running the Artificial Analysis Intelligence Index on all four Gemma 4 models and will share a full update once those results are complete.

warmshao retweeted

Google Gemma

@googlegemma

2 months ago

Meet Gemma 4!

Purpose-built for advanced reasoning and agentic workflows on the hardware you own, and released under an Apache 2.0 license.

We listened to invaluable community feedback in developing these models. Here is what makes Gemma 4 our most capable open models yet: 👇 https://t.co/1JCj0bupov

warmshao @warmshao

2 months ago

Open Claude Code🔥

Chaofan Shou

@Fried_rice

2 months ago

Claude Code source code has been leaked via a map file in their npm registry! Code: https://t.co/jBiMoOzt8G

warmshao retweeted

ollama

@ollama

2 months ago

Ollama is now updated to run the fastest on Apple silicon, powered by MLX, Apple's machine learning framework.

This change unlocks much faster performance to accelerate demanding work on macOS:
- Personal assistants like OpenClaw
- Coding agents like Claude Code, OpenCode, or Codex

warmshao retweeted

Broooooklyn

@Brooooook_lyn

2 months ago

https://t.co/FskPksPHFO

warmshao @warmshao

2 months ago

awesome

Browser Use

@browser_use

2 months ago

We hit SOTA on the biggest browser agent benchmark, scoring 97% on Online-Mind2Web🔥

We used Karpathy's Auto-Research (Claude Code in a loop) to improve our product.

Here is how you can apply the same to your product👇 Full guide, CLI design, and all the results: https://t.co/BTV1yVPOkV

warmshao @warmshao

3 months ago

Real frontier AI lab

Kimi.ai @Kimi_Moonshot

3 months ago

Introducing Attention Residuals: Rethinking depth-wise aggregation.

Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers.

🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth.
🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale.
🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead.
🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains.

🔗Full report:
https://t.co/u3EHICG05h
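The thread does not give the exact formulation. The following is only a generic illustration of the idea it describes: replacing uniform accumulation of earlier layer outputs with softmax attention over them. Scoring each earlier output by a dot product with a single per-layer `query` vector is an assumption for illustration, not Kimi's method. Block AttnRes is not modeled. The helper names are hypothetical, and pure Python keeps the sketch dependency-free.

```python
import math
from typing import List, Tuple

Vector = List[float]


def uniform_residual(layer_outputs: List[Vector]) -> Vector:
    """Standard residual stream: every earlier layer output is added with weight 1."""
    dim = len(layer_outputs[0])
    return [sum(h[j] for h in layer_outputs) for j in range(dim)]


def attention_residual(layer_outputs: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Toy depth-wise attention: a softmax-weighted mix of earlier layer outputs.

    `query` stands in for a learned per-layer parameter (hypothetical). The
    weights still depend on the input because they are computed from the
    layer outputs themselves.
    """
    dim = len(query)
    scores = [sum(q * x for q, x in zip(query, h)) / math.sqrt(dim) for h in layer_outputs]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    mixed = [sum(w * h[j] for w, h in zip(weights, layer_outputs)) for j in range(dim)]
    return mixed, weights
```

Consider ten identical layer outputs `[1.0, 1.0]`. `uniform_residual` returns `[10.0, 10.0]` and keeps growing with depth. The attention mix is a convex combination, so it stays at roughly `[1.0, 1.0]`. This matches the intuition behind the "hidden-state growth" point in the thread.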

warmshao retweeted

Chrome for Developers

@ChromiumDev

3 months ago

Take a bigger slice of the agentic web this #PiDay by shipping a literal pizza pie → https://t.co/13Cts8pFD1 To try: enable chrome://flags/#enable-webmcp-testing & install the Model Context Tool Inspector extension. Share what "pi" you’d build below! 🍕

warmshao @warmshao

3 months ago

Does vllm have this issue?

Han Xiao

@hxiao

3 months ago

uh..Qwen3.5-35B-A3B on llama.cpp re-prefills on every request, ~4x slower than it should be. Has anyone solved this? I thought people had happily deployed & used it locally? But if this is not solved yet, the perf is quite limited.

Root cause: GDN layers are recurrent → pos_min tracks full sequence → but llama.cpp validates cache using an SWA threshold that defaults to 1 for non-SWA models → pos_min > 1 always true → cache always discarded → full re-prefill every time?
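The root-cause chain in the tweet is posed as a question, not a confirmed diagnosis. The toy model below only restates that hypothesis in code. It is not llama.cpp's implementation, and `CacheState`, `can_reuse` and `tokens_to_prefill` are hypothetical names. It makes two assumptions:

- A full-attention cache can still serve position 0.
- A recurrent (GDN) state's minimum position advances with the sequence.

It then applies the `pos_min > threshold` discard rule, with the threshold defaulting to 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheState:
    n_cached: int    # tokens currently held in the cache
    recurrent: bool  # True for GDN-style recurrent layers, False for a full-attention KV cache

    def pos_min(self) -> int:
        """Smallest position the cache can still serve (toy model, -1 when empty)."""
        if self.n_cached == 0:
            return -1
        # Assumption: a KV cache keeps every position from 0, while a recurrent
        # state only summarizes the latest position, so its minimum keeps advancing.
        return self.n_cached - 1 if self.recurrent else 0


def can_reuse(cache: CacheState, swa_threshold: int = 1) -> bool:
    """The check as the tweet describes it: discard the cache when pos_min > threshold."""
    return cache.pos_min() <= swa_threshold


def tokens_to_prefill(turn_lengths: List[int], recurrent: bool) -> int:
    """Total prompt tokens processed in a chat that appends turn_lengths[i] tokens per request."""
    total = 0
    cache = CacheState(0, recurrent)
    for new_tokens in turn_lengths:
        prompt_len = cache.n_cached + new_tokens
        # Reuse: prefill only the new suffix. Discard: re-prefill the whole prompt.
        total += new_tokens if can_reuse(cache) else prompt_len
        cache = CacheState(prompt_len, recurrent)
    return total
```

Take four requests adding 1,000, 200, 200 and 200 tokens. The toy model prefills 1,600 tokens when the cache is reused and 5,200 when it is discarded every time. The real slowdown depends on prompt lengths, model and hardware.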
