Luhui

@LuhuiDev

Indie dev & product thinker (INTJ) building 「JustLog APP」 & 「Dino-GSP Web」, exploring how systems & creations grow with warmth.

Hangzhou, China

Joined March 2020

251 Following

117 Followers

298 Posts

Luhui @LuhuiDev

6 days ago

DeepMind’s AlphaProof Nexus is more than “AI solves math problems.” I broke down the paper into 4 system paradigms worth studying. https://t.co/AiSWHvf8om

Luhui @LuhuiDev

14 days ago

Your agent works in the demo and dies in production. The fix is a Runtime that handles:
· durable execution
· layered state
· human-in-the-loop
· permissions
· observability
I wrote up what LangChain's Runtime taught me, plus a 10-point design checklist 👇 https://t.co/sRG391Kkyu

373
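A minimal sketch of the "durable execution" item from the post above, assuming the simplest possible design: progress is written to a JSON checkpoint after every step, so a restarted run resumes where the last one stopped. The function name `run_durable`, the checkpoint layout, and the step signature are hypothetical and are not LangChain's API.

```python
import json
from pathlib import Path


def run_durable(steps, checkpoint_path, state=None):
    """Run steps in order, saving progress after each one.

    `steps` is a list of functions that take and return a JSON-serializable
    dict. If a checkpoint file already exists, the run resumes from it
    instead of starting over. (Hypothetical helper, for illustration only.)
    """
    path = Path(checkpoint_path)
    if path.exists():
        saved = json.loads(path.read_text())
    else:
        saved = {"next_step": 0, "state": state or {}}

    for index in range(saved["next_step"], len(steps)):
        # If this call raises, the checkpoint still points at this step.
        saved["state"] = steps[index](saved["state"])
        saved["next_step"] = index + 1
        path.write_text(json.dumps(saved))

    return saved["state"]
```

Calling `run_durable` again with the same checkpoint path after a crash skips the steps that already completed. A production runtime would also need atomic writes and idempotent steps, which this sketch leaves out.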

LuhuiDev retweeted

ClaudeDevs

@ClaudeDevs

21 days ago

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK

13K

10M

Luhui @LuhuiDev

21 days ago

I wrote a shorter piece on Anthropic’s 2026 Agent Harness architecture. Production agents need durable sessions, isolated execution, scoped credentials, context routing, tracing, and evals. Full article 👇 https://t.co/BiLfy4ANp2

Luhui @LuhuiDev

24 days ago

I’m sharing Algeo SDK 2.0 embedded editing mode, a product I’ve been building with the Dino-GSP team. Read more 👇 If you’re building in K12 math, AI tutoring, or education SaaS, feel free to reach out. https://t.co/eUOoRJca0g

Luhui @LuhuiDev

about 1 month ago

Most coding agent failures need engineering fixes, not better prompts. AHE makes the harness (tools/middleware/memory) auto-evolve like software: observable, testable, rollbackable. 69.7%→77.0% in 10 iterations. Deep-dive + code walkthrough 👇 https://t.co/lpsXA0HI51

LuhuiDev retweeted

Cormac

@cormachayden_

about 1 month ago

software engineers before vs after agents

469

20K

Luhui @LuhuiDev

about 1 month ago

🚀 Stop hard-coding prompts. Start programming them. DSPy from Stanford separates task logic from model instructions. Auto-optimizes prompts for your data + metrics. Result: 2x accuracy on same model. Game-changer for production LLM pipelines. Read more https://t.co/9mBODrQ3gy

LuhuiDev retweeted

el.cine

@EHuanglu

about 2 months ago

wowww.. Opus 4.7 has automated CAD

113

501

Luhui @LuhuiDev

about 2 months ago

Research figures are a nightmare. That’s why this PaperBanana approach is interesting: they don’t “prompt better” — they split the task into agents. 👉 My take: this isn’t generation, it’s systems design. Multi-agent workflows > bigger models. https://t.co/WduERZxXq6

LuhuiDev retweeted

Andrew Curran

@AndrewCurran_

2 months ago

Three weeks ago there were rumors that one of the labs had completed its largest ever successful training run, and that the model that emerged from it performed far above both internal expectations and what people assumed the scaling laws would predict. At the time these were only rumors, and no lab was attached to them. But in light of what we now know about Mythos, they look more credible, and the lab was probably Anthropic.
Around the same time there were also rumors that one of the frontier labs had made an architectural breakthrough. If you are in enough group chats, you hear claims like this constantly, and most turn out to be nothing. But if Anthropic found that training above a certain scale, or in a certain way at that scale, produces capabilities that sit far above the prior trendline, then that is an architectural breakthrough.
I think the leaked blog post was real, but still a draft. Mythos and Capybara were both candidate names for the new tier, though Mythos may now have enough mindshare that they end up keeping it. The specific rumor in early March was that the run produced a model roughly twice as performant as expected. That remains unconfirmed. What is confirmed is that Anthropic told Fortune the new model is a 'step change,' and a sudden 2x would certainly fit the definition. We will find out in April how much of this is true.
My own view is that the broad shape of this is correct even if some of the numbers are wrong. And if it is substantially accurate, then it also casts OpenAI's recent restructuring in a new light. If very large training runs are about to become essential to staying in the game, then a lot of their recent decisions, like dropping Sora, make even more sense strategically.
For the public, this would mean the best models in the world are about to become much more expensive to serve, and therefore much more expensive to use. That will put pressure on rate limits, pricing, and subscription plans that are already subsidized to some unknown degree. Instead of becoming too cheap to meter, frontier intelligence may be about to become too expensive for most of humanity to afford.
Second-order effects: compute, memory, and energy are about to become much more important than they already are. In the blog they describe the new model as not just an improvement, but having 'dramatically higher scores' than Opus 4.6 in coding and reasoning, and as being 'far ahead' of any other current models. If this is the new reality, then scale is about to become king in a whole new way. It would also mean, as usual, that Jensen wins again.

182

321

977K

Luhui @LuhuiDev

about 2 months ago

I just shipped a major new version of the geometry engine I’ve been building—let me properly introduce it. Dino-GSP 2.4.0 is a step toward turning geometry into a programmable, interactive medium for the web. https://t.co/xcZGkcLUEP

LuhuiDev retweeted

Nous Research

@NousResearch

2 months ago

The Hermes Agent update you've been waiting for is here.

333

471

619K

LuhuiDev retweeted

ellen livia ᯅ

@ellen_in_sf

2 months ago

here's how Claude Code actually handles memory: all 8 phases 🧵
Our team at @mem0ai use @claudeai a lot, we deeply care about memory. here is a summary of how it works 👇
User Input -> Context Assembly -> History System -> API / Query -> Response -> Summary
Phase 1: session init registers hooks, warms the memory cache, and kicks off async directory walks before the first render
Phase 2: memory is discovered in priority order: managed enterprise policy → user global → project VCS → local per-directory → auto-generated → team shared
Phase 3: three parallel pipelines merge into every API call: system prompt + memory section + user context. relevance prefetch selects up to 5 memory files via sonnet side-call
Phase 4: the model can directly read/write memory files using FileReadTool, FileWriteTool, FileEditTool. background extractor and model writes are mutually exclusive
Phase 5: after EVERY response, three background agents fire: extractMemories, sessionMemory, and autoDream. extractMemories is a forked agent that runs in parallel, capped at 200 lines / 25kb
Phase 6: when context fills up, compaction summarizes old messages using a skipped summarizer, preserving min 10k tokens / 5 text-block messages
Phase 7: memory lives across ~/.claude/, project root, sessions/, and agent-memory/. auto memory is git-ignored, team memory is VCS-tracked
Phase 8: self-improving loop across sessions: within-turn writes + end-of-turn extracts + session memory + auto-dream consolidations every 24h+
every touchpoint: launch → query → response → background agents → shutdown → next session
shoutout to @ChaithanyaK42 for the beautiful excalidraw!

559

795

28K
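Phases 2 and 3 of the thread above can be sketched roughly as follows, under stated assumptions. The source labels in `PRIORITY` follow the order listed in Phase 2, but the identifiers, the `MemoryFile` type, and both functions are hypothetical. The thread says relevance is chosen by a model side-call; the naive word-overlap score below is only a stand-in for that, not how Claude Code does it.

```python
from dataclasses import dataclass

# Source order as listed in Phase 2 of the thread (labels are hypothetical).
PRIORITY = [
    "managed_policy",
    "user_global",
    "project_vcs",
    "local_directory",
    "auto_generated",
    "team_shared",
]


@dataclass(frozen=True)
class MemoryFile:
    source: str
    path: str
    text: str


def discover(files):
    """Sort memory files by the Phase 2 source order (stable within a source)."""
    rank = {name: i for i, name in enumerate(PRIORITY)}
    return sorted(files, key=lambda f: rank[f.source])


def prefetch(files, query, limit=5):
    """Pick up to `limit` files that share words with the query.

    Word overlap is a placeholder for the model side-call in the thread.
    Ties keep the order of `files`, because the sort is stable.
    """
    words = set(query.lower().split())
    scored = [(len(words & set(f.text.lower().split())), f) for f in files]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: -pair[0])
    return [f for _, f in relevant[:limit]]
```

Passing the output of `discover` into `prefetch` means that, among equally relevant files, the higher-priority source is selected first.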

LuhuiDev retweeted

Qwen

@Alibaba_Qwen

2 months ago

🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI.
Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction.
A standout feature: 'Audio-Visual Vibe Coding'. Describe your vision to the camera, and Qwen3.5-Omni-Plus instantly builds a functional website or game for you.
Offline Highlights:
🎬 Script-Level Captioning: Generates detailed video scripts with timestamps, scene cuts & speaker mapping.
🏆 SOTA Performance: Outperforms Gemini-3.1 Pro in audio and matches its audio-visual understanding.
🧠 Massive Capacity: Natively handles up to 10h of audio or 400s of 720p video, trained on 100M+ hours of data.
🌍 Global Reach: Recognizes 113 languages (speech) & speaks 36.
Real-time Features:
🎙️ Fine-Grained Voice Control: Adjust emotion, pace, and volume in real time.
🔍 Built-in Web Search & complex function calling.
👤 Voice Cloning: Customize your AI's voice from a short sample, with engineering rollout coming soon.
💬 Human-like Conversation: Smart turn-taking that understands real intent and ignores noise.
The Qwen3.5-Omni family includes Plus, Flash, and Light variants.
Try it out:
Blog: https://t.co/yuSAz3DuO8
Realtime Interaction: click the VoiceChat/VideoChat button (bottom-right): https://t.co/nnAW9ZfRet
HF-Demo: https://t.co/rLsqejKgCG
HF-VoiceOnline-Demo: https://t.co/LIGtmITeSw
API-Offline: https://t.co/lNE7fH5YUt
API-Realtime: https://t.co/9A3lopXGwV

171

597

967K

LuhuiDev retweeted

Claude

@claudeai

2 months ago

Computer use is now in Claude Code. Claude can open your apps, click through your UI, and test what it built, right from the CLI. Now in research preview on Pro and Max plans.

59K

25K

16M

Luhui @LuhuiDev

3 months ago

🎯 Geometry canvas as a one-line component
iframe or SDK integration + AI-native REPL interface
Stop building geometry engines from scratch. Start embedding. https://t.co/WKVr0uiCfA #WebDev #EdTech #OpenSource

Luhui @LuhuiDev

3 months ago

AlphaGeometry2 can solve 84% of IMO geometry problems. But the interesting part isn't the model — it's the architecture. LLM + symbolic reasoning + search. Here’s a deep dive into how the system works: https://t.co/xO4FkEPqqc

Luhui @LuhuiDev

3 months ago

If verification becomes machine infrastructure, research speed stops being human-limited. DeepMind’s Aletheia (~91.9% on IMO-ProofBench) isn’t just about scores. Deep dive 👇 https://t.co/rban5Ji81U

Luhui @LuhuiDev

4 months ago

CodePlot-CoT lets an LLM solve geometry by writing matplotlib and looking at the diagram. If you're interested in AI reasoning, agents with external state, or the future of AI for mathematics — this is a fascinating direction. https://t.co/X1ObPiqaHa
