Xavier Au

@xavierau

Manchester, England

Joined January 2009

609 Following

48 Followers

353 Posts

Xavier Au

@xavierau

8 days ago

https://t.co/9RMk5fYKAk

xavierau retweeted

DataDan｜AI Consultant + Builder

@ba_niu80557

17 days ago

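A real case of doing independent FDE deployment work in Osaka. Client information has been anonymized, but the technical details and the pitfalls are complete. This is a manufacturing AI deployment project I took on in Osaka, told from start to finish, including the part where it nearly went off the rails.
The client is a mid-sized precision parts manufacturer in the Kansai region, with about 400 employees and annual revenue of several billion yen. Not a star company, but the most typical kind in Japanese manufacturing: deep technical know-how, strict quality control, a low level of digitization, and an older workforce. They came to me for a very specific reason: something had gone wrong in quality inspection. The company has always relied on veteran inspectors checking by eye. Parts coming off the line pass through a final inspection bench, where two veterans with more than twenty years of experience each examine them one by one with magnifiers and gauges. Their accuracy is extremely high, close to 99.7%. The problem: one of them is 61 and the other is 58, and both will retire within three years. Several younger inspectors have been trained, but the best of them reaches only 96%. In precision parts, that 3.7 percentage point gap means customer complaints double.
The client's initial request was direct: "Can we use AI for inspection and replace the veterans' eyes?" It sounds like a very standard "AI visual inspection" project. There are plenty of solutions on the market, and many AI companies would quote right away: collect images, label data, train a model, deploy edge devices, deliver in three months. But once I went on site, I found the situation was nothing like that.
On my first visit to the factory I did not talk about AI. I spent the whole day standing next to the inspection bench watching the veterans work. Afterwards I realized one thing: they were not only checking "is there a defect". They were making an extremely complex multi-dimensional judgment: is this surface texture a normal machining mark or an abnormal scratch? Is this tiny dent caused by mold wear (the mold needs repair) or by material impurities (the supplier needs to change)? Is this subtle color shift within the normal range for heat treatment, or a sign of abnormal temperature? The veterans are not doing a binary pass/fail classification. They are diagnosing the root cause of the problem. That diagnosis depends on twenty years of accumulated tacit knowledge, none of which has ever been documented.
If I had built a "defect detection AI" exactly as the client first asked, the result would be an AI that can tell you "this part has a problem" but not "where the problem comes from and what to fix". For the client that solves only half the problem, and the less valuable half. I went back and told the client something they did not want to hear: "What you need is not an AI inspection system but a knowledge preservation system. AI detection is one module of it, not the core." The client was not happy at first. They expected "an AI inspection system in three months", and I was telling them "this is more complicated than you think, and the most valuable part is not the part you assumed".
This is the most uncomfortable but most critical moment in FDE work: telling the client that their definition of the requirement is wrong. In China you can just say it directly. In Japan you have to say it in a way that does not make the other side lose face. My wording was: "Your company's inspection standard is very high for the industry (affirm first). Precisely because of that, a simple AI detection solution may not be able to fully inherit the technical assets your veterans have built up (frame the issue as 'the solution is not good enough', not 'your requirement is wrong'). If I may, I would like to propose a solution that better protects your company's technical advantage (offer an alternative)." Same meaning, Japanese-style packaging.
The final solution has three layers. Bottom layer: record the veterans' entire judgment process. Not video: have the veterans talk while they inspect. "This texture is the normal mark after the third rework of mold A." "This color means the heat treatment temperature ran about 10 degrees high." "The position and shape of this dent say it is a material problem, not a machining problem." Over three weeks we recorded about 60 hours of audio, then used an LLM to transcribe and structure it, turning the tacit knowledge into a searchable knowledge base. Middle layer: a RAG system built on that knowledge base. When a new inspector is unsure about a part, they can upload a photo, and the system finds the most relevant cases and the veterans' reasoning in the knowledge base and tells them "in similar situations the veterans judged it this way". It does not replace human judgment; it gives people a reference. Top layer: only here does the AI visual inspection model come in. A defect classification model is trained on data labeled by the veterans, but its output is not a simple pass/fail. It is "suspected defect type XX, confidence XX%, suggested reference case XX", which links detection to diagnosis.
The whole project took about five months, not three, because of two pitfalls along the way. Pitfall one: the veterans did not want to cooperate with the recording. Japanese master craftsmen (職人) have a deep professional pride: my skill took me decades to build, and "recording it for a machine to use" feels like disrespect for the craft. This could not be solved by reasoning with them. In the end the plant's department head (a middle manager) stepped in and persuaded them with the framing "so that your juniors can also learn your skills". Note that it was not "AI will replace you" but "your juniors will learn from you". A different framing produced completely different cooperation. Pitfall two: image quality was far worse than I expected. The lighting at the inspection bench was uneven, and photos taken at different times of day had different color temperatures. The veterans' eyes compensate for this automatically; AI cannot. We spent an extra three weeks on lighting standardization and image preprocessing, none of which was in the original plan.
Result: after launch the system ran a three-month A/B test. With AI assistance, new inspectors' accuracy rose from 96% to 99.2%, close to the veterans' 99.7%. More importantly, the knowledge base was used far more than expected. Beyond inspection, the engineers who tune the production line also started using it to troubleshoot, because the veterans' experience is just as relevant to upstream processes. At the end the client said something that stayed with me: "At first I thought you were here to install an AI camera for us. I did not expect you to help us keep the twenty years of experience that was about to walk out the door with retirement."
That is the essential difference between FDE and selling AI solutions. Solution sellers hear "AI inspection" and start quoting. An FDE goes on site first, figures out what the real problem is, and only then decides what role AI should play in the overall solution. Often AI is not the core of the solution but one component of it. The most important lesson this project taught me: in AI deployment in Japan, technology is only 30%, and the remaining 70% is understanding people. Understanding why the veterans do not want to cooperate, what management is really afraid of (not low efficiency, but a break in the transfer of skills), and the gap between the requirement the client states and the problem that actually needs solving. AI is the tool. Understanding people is the craft.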

169

166

10K
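The three-layer design described in the post above (a knowledge base of transcribed inspector reasoning, retrieval over it, and a classifier whose output points to a reference case) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `Case` type, the case IDs, the defect labels, and the word-overlap retrieval are all hypothetical, and a real RAG system would most likely use embedding search instead.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str      # hypothetical identifier
    defect_type: str  # hypothetical label
    note: str         # inspector's spoken reasoning, transcribed


# Hypothetical entries paraphrasing the examples quoted in the post.
KNOWLEDGE_BASE = [
    Case("K-001", "normal_tool_mark",
         "texture is the normal mark left after the third rework of mold A"),
    Case("K-002", "heat_treatment",
         "this color means heat treatment temperature ran about 10 degrees high"),
    Case("K-003", "material_impurity",
         "position and shape of the dent point to a material problem, not machining"),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(",", " ").split())


def retrieve(query: str, cases: list[Case]) -> Case:
    """Return the case whose note shares the most words with the query.

    Word overlap is only a stand-in for a real similarity search.
    """
    q = tokenize(query)
    return max(cases, key=lambda c: len(q & tokenize(c.note)))


def report(defect_type: str, confidence: float, observation: str) -> str:
    """Combine a classifier result with the closest reference case."""
    case = retrieve(observation, KNOWLEDGE_BASE)
    return (f"Suspected {defect_type} defect, confidence {confidence:.0%}, "
            f"see case {case.case_id}: {case.note}")


if __name__ == "__main__":
    print(report("material_impurity", 0.87,
                 "dent shape suggests material problem"))
```

The point of the sketch is the output shape: the classifier's label and confidence travel together with a retrieved case, so the inspector gets a diagnosis reference rather than a bare pass/fail.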

xavierau retweeted

Kappaemme

@Kappaemme1926

about 2 months ago

CODEX SKILL THAT FINDS COMPLEXITY HOTSPOTS IN YOUR CODEBASE! I made a Codex skill that analyzes your codebase and reports where performance can be improved safely. Scan your project while Codex checks loops, repeated lookups, render-heavy code, N+1 patterns, and places where complexity can potentially be reduced without breaking behavior.
-> codebase complexity analysis
-> O(n²), O(n*m), repeated scan detection
-> before/after complexity estimates
-> safe optimization suggestions
-> risk level + tests needed
-> report-only mode by default
-> one-command install
Install: npx --yes codex-complexity-optimizer
100% open source. Repo in Bio.

298K
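For readers unfamiliar with the "O(n*m), repeated scan" pattern the post mentions, here is a small illustrative before/after. It is not output from the skill and says nothing about how the tool works; the function names and the record layout are assumptions made up for the example.

```python
def match_slow(orders, users):
    """O(n*m): rescans the whole user list for every order."""
    out = []
    for order in orders:
        for user in users:
            if user["id"] == order["user_id"]:
                out.append((order["id"], user["name"]))
                break  # first matching user wins
    return out


def match_fast(orders, users):
    """O(n+m): index users once, then do constant-time lookups."""
    by_id = {}
    for user in users:
        # setdefault keeps the first user per id, matching match_slow
        by_id.setdefault(user["id"], user)
    return [
        (order["id"], by_id[order["user_id"]]["name"])
        for order in orders
        if order["user_id"] in by_id  # orders without a user are skipped
    ]


if __name__ == "__main__":
    users = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
    orders = [{"id": "a", "user_id": 2}, {"id": "b", "user_id": 9}]
    print(match_slow(orders, users) == match_fast(orders, users))
```

Both functions keep the first matching user and skip orders with no match, which is the kind of behavior a test would need to pin down before such a rewrite is considered safe.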

xavierau retweeted

ClaudeDevs

@ClaudeDevs

2 months ago

New blog: Building agents that reach production systems with MCP. When should agents use direct APIs vs CLIs vs MCP? Plus patterns for building MCP servers, context-efficient clients and pairing MCP with skills. https://t.co/Q4UrUVgVYB

316

480K

xavierau retweeted

2 months ago

New in Claude Code: /ultrareview (research preview) runs a fleet of bug-hunting agents in the cloud. Findings land in the CLI or Desktop automatically. Run it before merging critical changes—auth, data migrations, etc. Pro and Max users get 3 free reviews through 5/5.

542

17K

10K

xavierau retweeted

McKinsey Global Institute

@McKinsey_MGI

2 months ago

AI won’t make most human skills obsolete, but it will change how they’re used. Negotiation, problem solving, and leadership will matter more than ever as people work alongside agents and robots. Our new Skill Change Index shows which skills will be most, and least, exposed to automation in the next five years: https://t.co/fRXfHF1k56

321

905

185K

xavierau retweeted

2 months ago

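A member of my group chat was researching dryer balls before buying, and the answer from AI search cited a University of Wisconsin study, an MIT report, and an ASTM standard. All fake. The institutions are real and the formatting is tidy, but the content is fabricated. Digging further, I found a complete industry chain behind it: English-language content farms, run by Chinese-speaking teams, mass-generated with AI and padded with forged academic citations, are systematically taking over the retrieval pool of AI web search. The 315 Gala demonstrated the same thing: invent a fictional smart band called Apollo-9, and within three days two mainstream AIs had put it on their recommendation lists. I wrote a long article tracing several independent lines of evidence. Core conclusion: consumer and lifestyle queries are the hardest-hit area, and the net efficiency of AI search in these scenarios may already be negative. https://t.co/fc93BYAtWl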

xavierau retweeted

Claude

@claudeai

2 months ago

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.

148K

15K

80K

64M

xavierau retweeted

Boris Cherny

@bcherny

2 months ago

Opus 4.7 feels more intelligent, agentic, and precise than 4.6. It took a few days for me to learn how to work with it effectively, to fully take advantage of its new capabilities. Will post a few more tips throughout the day, starting with this blog post: https://t.co/XQrH8P28yo

263

613

773K

xavierau retweeted

spark

@sparkjsdev

3 months ago

Spark 2.0 is here! 🚀 We’re redefining what’s possible on the web with a streamable LoD system for 3D Gaussian Splatting. Built on Three.js, you can now stream massive 100M+ splat worlds to any device from mobile to VR using WebGL2. All open-source. Dive into the tech 👇

318

417K

Xavier Au

@xavierau

3 months ago

@NoahKingJr No. I don’t believe AI is going to replace engineers.

Xavier Au

@xavierau

3 months ago

@fivosaresti ABM

Xavier Au

@xavierau

3 months ago

Which type of people are you?

Andrej Karpathy

@karpathy

3 months ago

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code. But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along. So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. 
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

21K

12K

Xavier Au

@xavierau

3 months ago

Stunning, and it makes sense.

Anthropic

@AnthropicAI

3 months ago

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.

18K

10K

xavierau retweeted

Google DeepMind @GoogleDeepMind

3 months ago

Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵

368

Xavier Au

@xavierau

3 months ago

Very interesting.

Philipp Schmid

@_philschmid

3 months ago

Read the technical reports on how @Kimi_Moonshot, @cursor_ai, and @trychroma train vertical agentic models with RL. Same underlying recipe: strong base model, train inside the production harness, outcome-based rewards. - Kimi K2.5 learns to spawn parallel sub-agents through RL. - Cursor uses the same production harness (same tools, same prompts...) and learns self-summarization during RL. - Chroma's 20B retrieval model learns to prune its own context mid-search. Full write-up 👇

355

324

95K

xavierau retweeted

Stripe @stripe

3 months ago

https://t.co/yaIRTsCQ4t

737

748

589K

xavierau retweeted

Google DeepMind @GoogleDeepMind

3 months ago

Say hello to Gemini 3.1 Flash Live. 🗣️ Our latest audio model delivers more natural conversations with improved function calling – making it more useful and informed. Here’s what’s new 🧵

120

214

557

586K

Xavier Au

@xavierau

3 months ago

@itsalexvacca Outbound

144
