Underleveled Builder

@UnderleveledDev

Picking fights I'm not qualified for. Tongji Arch / miHoYo combat design / English from 0 / code from 0 / now AI. 1000+ games. Building a One-person company.

Joined April 2017

2.4K Following

147 Followers

252 Posts

Pinned Tweet

Underleveled Builder

@UnderleveledDev

4 months ago

I keep picking fights I'm not qualified for. Here's the list: Got into Tongji (China's #1 architecture program) on my second try. First from my school to make it. Joined miHoYo with zero CS background. Became a combat system designer. Couldn't speak English at 32. Took IELTS 4 times. TOEFL 5 times. Got a 6.5 and moved to Canada. Started a CS master's at 34. Graduated at 36. Played 1000+ games along the way. Mostly hiding from whatever terrifying thing I was supposed to be doing. Now 37, building a one-person AI company. A platform where humans direct AI to create stories. No co-founder. No funding. AI agents are my team. I'll share the build here — what I learn about vibe coding, AI agents, and running a company where most of your employees aren't human. If your career path also looks like a bug, not a feature — welcome.

368

UnderleveledDev retweeted

Vincent | 信号＞噪音

@VincentLogic

4 days ago

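Microsoft, together with Jiao Tong University, Tongji, and Fudan, has released a framework called SkillOpt, and the idea is interesting: train an AI agent's skill files the way you would train a neural network. It leaves the model weights untouched. What gets trained is the Skill, meaning the prompts and guidance documents you write for Claude Code or Codex. The method borrows the neural-network training playbook: epochs, batch size, learning rate, and validation gating, all applied to natural language. Run the task → log the process → review → edit the Skill → validate the effect, iterating in an automatic closed loop. The results are striking: across 7 large models, 6 task types, and 3 agent environments, 52 test settings in total, it ranked first or tied for first in every one. The most valuable part is not the scores but the transfer. A skill file optimized on GPT-5.4 and handed directly to GPT-5.4-nano improves performance by 5.6 points. A skill optimized in Claude Code and moved straight into Codex gains 29.4 points. It even transfers from one math benchmark to another. One best_skill.md file works everywhere. For anyone who hand-tunes prompts until their hair falls out, this effectively automates the tuning.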
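The loop described in this post (run → log → review → edit the Skill → validate) can be sketched in a few lines. This is not SkillOpt's code or API: `optimize_skill`, `run_task`, and `propose_edit` are hypothetical names, the scoring is a stand-in, and the learning-rate analogue is left out. A real system would call an LLM agent where the callables are.

```python
from typing import Callable, List


def optimize_skill(
    skill: str,
    train_tasks: List[str],
    val_tasks: List[str],
    run_task: Callable[[str, str], float],          # (skill, task) -> score in [0, 1]; hypothetical
    propose_edit: Callable[[str, List[str]], str],  # (skill, failed tasks) -> edited skill; hypothetical
    epochs: int = 3,
    batch_size: int = 2,
) -> str:
    """Iteratively edit a natural-language skill, keeping only edits that pass a validation gate."""

    def mean_score(s: str, tasks: List[str]) -> float:
        return sum(run_task(s, t) for t in tasks) / len(tasks)

    best, best_val = skill, mean_score(skill, val_tasks)
    for _ in range(epochs):
        for i in range(0, len(train_tasks), batch_size):
            batch = train_tasks[i:i + batch_size]
            # Run the batch and log the tasks that did not fully succeed.
            failures = [t for t in batch if run_task(best, t) < 1.0]
            if not failures:
                continue
            candidate = propose_edit(best, failures)
            # Validation gate: accept the edit only if the held-out score improves.
            candidate_val = mean_score(candidate, val_tasks)
            if candidate_val > best_val:
                best, best_val = candidate, candidate_val
    return best
```

The gate is the part that makes this resemble training rather than free-form prompt rewriting: an edit that does not improve the held-out score is discarded.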

458

733

43K

Underleveled Builder

@UnderleveledDev

4 days ago

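Curiosity and being obsessed with AI are two different things. The people who really use AI well are not scrolling news all day. They keep trying things nobody asked them to try and chasing ideas that may not work at all. Most people won't do this, but for the few who persist, the returns are exponential.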

宝玉

@dotey

4 days ago

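Felix Haas, head of design at Lovable, shared an observation on social media about "high-performing teams in the AI era": seven lessons, from the inside of an AI startup growing at a startling pace. Some of the interesting points:
First, don't wait for assignments like an employee. The people with the most impact don't ask "who owns this?" They see a problem and get to work. Ownership can't be assigned, only taken.
Second, hire for attitude, not résumé. Skills matter, of course, but skills alone barely predict whether someone will get things done. The people who break out rely on curiosity, grit, and a willingness to learn anything. In the AI era this is more obvious than before.
Third, curiosity and being obsessed with AI are two different things. The people who really use AI well are not scrolling news all day. They keep trying things nobody asked them to try and chasing ideas that may not work at all. Most people won't do this, but for the few who persist, the returns are exponential.
Fourth, get senior people building again. This is the phenomenon Haas finds most interesting: experienced managers are turning back into builders. AI sharply amplifies the leverage of individual contributors, and a senior engineer or designer who uses AI deeply may be the most powerful combination in a company right now.
Fifth, ego is the enemy of speed. Haas says he has never seen ego make a company faster, but he has seen it make one slower. The fastest teams care little about who gets credit and only about which approach works.
Sixth, ship first, then iterate. A week of internal discussion is worth less than a day of real user feedback. The strongest teams don't chase perfection before launch. They chase learning as fast as possible, and shipping is how they learn.
None of these points is new on its own, but Lovable has done well these past two years: launched in 2024, reached $100M in annual revenue within 8 months, and closed a $330M Series B at the end of 2025 at a $6.6B valuation, making it one of the fastest-growing AI companies in Europe. The point about getting senior people building again may be the most easily overlooked organizational change of the AI era. Once AI tools are powerful enough, the experts who were promoted into management and away from the front line regain both the ability and the motivation to do the work themselves.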

417

426

65K

Underleveled Builder

@UnderleveledDev

6 days ago

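With a tool I built myself, I can finally understand videos in any foreign language. It turns YouTube into a bilingual podcast platform and creates a foreign-language environment, so you improve your language skills while being entertained. If you want to try it, reply with 1 and I'll send you the link.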

Underleveled Builder

@UnderleveledDev

6 days ago

来自每个前沿实验室的最佳关注账号，助你持续保持最新动态 Anthropic @karpathy AI 领域必关注账号；最近加入 Anthropic @bcherny Claude Code 创建者，经常分享超实用技巧 @trq212 同样是 Claude Code 开发者；撰写关于 CC 的精彩文章 OpenAI @polynoamial 从事推理研究工作，分享大量技术细节 @gabriel1 Sora 开发者，职业路径很棒 @jxnlco 负责开发者体验，分享大量关于 Codex 的内容 Google AI @OfficialLoganK 所有 Google Gemini 和 AI Studio 的重大更新 @ammaar 产品与设计；分享 Google AI Studio 中 vibe-coding 的精彩内容 @fofrAI 生成模型的酷炫用例 Cursor @leerob Cursor 更新背后最活跃的声音 @ericzakariasson 分享使用 Cursor 的深刻见解 @mntruell Cursor CEO；重大发布和使用更新 xAI @milichab 最近加入 xAI，分享 Grok 更新 @skcd42 同样覆盖 Grok 的重大发布 @ai_explorer25 覆盖所有 AI 内容和免费资源

AI_Explorer

@ai_explorer25

7 days ago

Best accounts to follow from each frontier lab to stay constantly up to date Anthropic @karpathy - must-follow account for AI; recently joined Anthropic @bcherny - Claude Code creator, always shares great tips @trq212 - also a Claude Code developer; writes amazing articles on CC OpenAI @polynoamial - works on reasoning research, shares a lot of technical details @gabriel1 - Sora developer, great career path @jxnlco - works on dev experience, shares a lot about Codex Google AI @OfficialLoganK - all the major Google Gemini and AI Studio updates @ammaar - product and design; shares great things about vibe-coding in Google AI Studio @fofrAI - cool use cases for generative models Cursor @leerob - the loudest voice behind Cursor updates @ericzakariasson - shares great insights on using Cursor @mntruell - Cursor’s CEO; major releases and usage updates xAI @milichab - recently joined xAI, shares updates on Grok @skcd42 - also covers major Grok releases @ai_explorer25 - covers all ai content and free resources

320

347

52K

194

Underleveled Builder

@UnderleveledDev

9 days ago

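Full text (translated): Mythos-class models (such as Claude Fable 5) have changed much of how we work at Anthropic. I want to share two tips for getting the most out of this class of model.
1. Self-correction loops. There has been a lot of interest in "loops" lately. @bcherny said that "(his) job is to write loops." Having a model hill-climb on an evaluation metric is a common way to improve task performance: /goal in Claude Code and Outcomes in Claude Managed Agents are primitives that let you apply this general method to a specific task. As we note in the prompting guide, Fable 5 is very good at self-correcting in a loop. A well-designed goal or rubric provides feedback from the environment Claude runs in. This lets Claude run → collect feedback → self-correct → continue until the goal or rubric is satisfied.
Here is a toy example I use to test Fable. Parameter Golf is an open-source ML engineering challenge: train the best model that fits in a 16MB model file, in under 10 minutes on 8xH100 GPUs. It is a bit like @karpathy's autoresearch project: it tests the agent's ability to edit the base training code (a single train_gpt.py file), launch training, poll logs, read the score, and decide on the next experiment. I used Claude Managed Agents (CMA) to compare Fable 5 and Opus 4.7 on the challenge. CMA provides the agent harness and a hosted sandbox, which suits Fable 5's long-running tasks. For Parameter Golf, I gave CMA access to 8xH100 GPUs in a self-hosted sandbox.
A subtle but important point: who judges the result matters. We have found that models have trouble critiquing their own output. Prithvi Rajasekaran has written about this on our engineering blog. We found that a separate verifier subagent tends to work better than Fable 5 critiquing itself, because the grading happens in an independent context window. Outcomes in CMA automatically spins up a grader subagent for you. For each test I provided a rubric file (9 checkable criteria, such as "run a baseline experiment" and "run 20 experiments") and then let it run Parameter Golf for up to 8 hours. The Outcomes grader confirms that every criterion is met before Claude is allowed to stop.
Result: Fable 5's improvement to the training pipeline was roughly 6x that of Opus 4.7. If you split experiments into structural ones (e.g. architecture changes) and scalar ones (e.g. tuning a constant), Fable 5 leaned toward large structural changes and showed more resilience (e.g. pushing through a quantization regression to reach its biggest win). Opus 4.7 got a small improvement from its first experiment and then followed the same template almost every time: adjust one scalar → measure → keep it if positive.
2. Memory. Memory is another area where Fable excels. We can think of it as an outer loop across sessions: Claude writes memories during a session, and later sessions can retrieve them. @pgasawa and team recently released Continual Learning Bench 1.0, so I wanted to test this on Fable 5 and earlier models. I compared Fable 5, Opus 4.7, and Sonnet 4.6 on one task from the benchmark, in which the agent answers a series of sequential questions against a SQL database. Each question is an independent agent session, with context provided through shared memory. I used CMA with memory, where each agent has access to a mounted filesystem shared across sessions. In this task, using memory effectively means going through the following flow: fail (record the error) → investigate (find the cause) → verify (turn the diagnosis into a checkable fact) → distill (turn verified results into general rules) → consult (read the rule directly instead of re-deriving it every time).
Sonnet 4.6: mostly stuck at step 1. Its memory store is just failure records and open guesses (e.g. "maybe prc rather than prc_usd?"), and it rarely consults earlier notes. Improving performance requires task-specific memory instructions.
Opus 4.7: gets to roughly step 3. It creates a schema reference with uncertainty markers (e.g. "possibly prc in cents? needs verification."), but verification coverage is low: between 7% and 33% of questions (median about 17%).
Fable 5: tends to complete the whole flow. In its best run, verification coverage reached 73% (22 of 30 questions verified), and it distilled the experience into general rules that helped later tasks.
Core advice: rather than steering Fable 5 hard with prompts, design good loops (use /goal or Outcomes so the model self-corrects from environment feedback) and memory (let the model manage its own context). What I have shared is only a set of small experiments I ran myself, but it is well worth testing Fable 5 on hard tasks yourself and making full use of self-correction loops and memory. To get started, see our docs, or just ask the latest Claude Code: its built-in /claude-api skill can tell you about best practices for Fable 5, /goal, Claude Managed Agents, or other API features.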
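The memory flow in this post (fail → investigate → verify → distill → consult) can be made concrete with a toy store. This is only a sketch: `MemoryStore` and its methods are hypothetical names, an in-memory object stands in for the mounted filesystem the post describes, and "verification coverage" is assumed to mean verified questions divided by total questions, which is consistent with the 22-of-30 figure quoted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemoryStore:
    """Toy cross-session memory: raw failures, verified facts, and distilled rules."""
    failures: List[str] = field(default_factory=list)
    verified: Dict[int, str] = field(default_factory=dict)  # question id -> checked fact
    rules: List[str] = field(default_factory=list)

    def record_failure(self, note: str) -> None:
        # Step 1: write down what went wrong, even if the cause is unknown.
        self.failures.append(note)

    def record_verified(self, question_id: int, fact: str) -> None:
        # Step 3: store a diagnosis only after it has been checked.
        self.verified[question_id] = fact

    def distill(self, rule: str) -> None:
        # Step 4: promote verified facts into a general rule, without duplicates.
        if rule not in self.rules:
            self.rules.append(rule)

    def consult(self) -> List[str]:
        # Step 5: later sessions read rules instead of re-deriving them.
        return list(self.rules)

    def verification_coverage(self, total_questions: int) -> float:
        return len(self.verified) / total_questions
```

With 22 verified entries out of 30 questions, `verification_coverage(30)` comes to about 0.73, matching the best-run figure in the post under the assumed definition.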

Lance Martin

@RLanceMartin

9 days ago

https://t.co/es0JQM4MS9

103

767

16K

635

UnderleveledDev retweeted

Boris Cherny

@bcherny

11 days ago

Seeing a number of benchmarks showing Opus is the best model for long-running work. Five tips for running Opus autonomously for hours/days: 1. Use auto mode for permissions, so Claude doesn’t ask for approval 2. Use dynamic workflows, to have Claude orchestrate hundreds/thousands of agents to get a task done 3. Use /goal or /loop, to nudge Claude to keep going until it’s done 4. Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app) 5. Make sure Claude has a way to self-verify its work end to end: Claude in Chrome browser extension for web, iOS/Android sim MCP for mobile, a way to start the full web server or service for backend work

314

281

643K

Underleveled Builder

@UnderleveledDev

10 days ago

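Peter Steinberger's viral tweet has the AI coding world talking
The most-repeated sentence in AI coding this month is only six words long, yet almost nobody can define it precisely. A tweet from Peter Steinberger dominated the timeline this week and set off a large debate. I looked into the word everyone is arguing about, "loop", and found that it has a five-year history. The real point is that the expensive part now is not the model but the loop itself.
Origin and the core tweet. On June 7, 2026, Peter Steinberger posted: "Monthly reminder: you should no longer be prompting coding agents by hand. You should design loops and let the loops prompt your agents." The tweet passed 2.2 million views and the replies turned into a brawl. Many people quoted it, but the real question was "how do you actually do this?" A reply from Matthew Berman captured the mood: "Nobody knows except him and Boris."
What is a "loop"? Boris Cherny (creator of Claude Code) gave the clearest definition in a June 2026 talk: "I no longer prompt Claude. I write running loops that prompt Claude and decide what to do next. My job is to write loops." Put simply, a loop is a small program you write that prompts the coding agent for you, reads the output, judges whether the work is done, and keeps prompting if it is not. You are no longer the typist inside the loop but its designer. The model becomes a subroutine.
Boris's ladder:
A year ago: writing code by hand plus autocomplete.
A few months ago: running 5-10 Claude sessions at once.
Now: no prompting at all. He writes loops and lets hundreds of agents read GitHub, Slack, and Twitter and build autonomously.
He claims that over the last 30 days 100% of his contributions to Claude Code were made by Claude Code, with 259 PRs merged, and that he has deleted his IDE.
History of the loop (oldest to newest):
2022, ReAct: an academic while loop. The model reasons → calls a tool → reads the result → repeats (single model, human watching).
2023, AutoGPT: give it a goal and let it prompt itself. It often failed by looping forever.
2025, ralph/goal: Geoffrey Huntley's simple bash loop, which resets context on every pass and uses fixed anchor files. He built a programming language for 297 USD.
Spring 2026: Codex and Claude Code ship the /goal command, with a verifier that stops automatically.
2026, now: multi-agent orchestration loops (what Boris and Steinberger are referring to). The loop itself becomes the unit of work: it can supervise other loops concurrently, run on a schedule, keep persistent (Git-backed) state, and recover from crashes.
In essence, it is a cron job that can make decisions. cron runs a fixed script, whereas a loop lets the model decide the next step from the current state, verify the result, and decide whether to continue.
Practical build advice (Boris's 5 tips):
Use auto mode for permissions, so nothing needs confirming each time.
Use dynamic workflows, so Claude orchestrates hundreds or thousands of agents.
Use /goal or /loop to keep it going until the work is done.
Run in the cloud, so it keeps working with your laptop closed.
Most important: give Claude a way to verify its own work end to end.
The feedback loop is the key: a loop without verification just manufactures confident mistakes quickly. A good loop writes code → runs tests → reads results → fixes.
Production reality: the loop is the expensive part. As model calls get cheaper, the cost of managing loops becomes the bottleneck (Uber has already capped engineers' monthly Claude Code spend). The failure mode is an infinite loop burning money, so hard stop conditions are mandatory: a maximum iteration count, no-progress detection, and a token/dollar budget cap. The real asset is Skills, not prompts. Turn repeated tasks into reusable named skills, and loops that call them will compound.
Key patterns:
Loop = cron + a model as the decider (not hard-coded branches).
Single-agent ralph is dated. Multi-agent supervision is the new thing.
Verification > orchestration.
Cost shifts from tokens to loop management.
The winners are no longer prompt engineers but designers of loops plus skill libraries.
The core of the tweet is not "prompt engineering is dead". It is: stop being the human inside the loop. Write the loop, define the intent, set stop conditions, supply skills and feedback, and let it run in the background. You just decide what to build next. The /loop command already makes getting started very easy, and plenty of people have it submitting PRs while they sleep.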
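The three hard stop conditions named in this post (maximum iterations, no-progress detection, and a dollar budget), together with a verifier, fit in one small driver. This is a sketch under assumptions, not the behavior of /goal or /loop: `run_loop`, `step`, and `verify` are hypothetical names, and `step` stands in for whatever call prompts the agent and returns a score and a cost.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class LoopResult:
    done: bool
    reason: str
    iterations: int
    spent_usd: float


def run_loop(
    step: Callable[[int], Tuple[float, float]],  # iteration -> (score, cost_usd); hypothetical agent call
    verify: Callable[[float], bool],             # independent check that the work is finished
    max_iterations: int = 20,
    budget_usd: float = 5.0,
    patience: int = 3,
) -> LoopResult:
    """Drive an agent until verified, or until a hard stop condition fires."""
    best = float("-inf")
    stale = 0      # consecutive iterations without a new best score
    spent = 0.0
    for i in range(1, max_iterations + 1):
        score, cost = step(i)
        spent += cost
        if verify(score):
            return LoopResult(True, "verified", i, spent)
        if spent >= budget_usd:
            return LoopResult(False, "budget exhausted", i, spent)
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                return LoopResult(False, "no progress", i, spent)
    return LoopResult(False, "max iterations", max_iterations, spent)
```

Returning the stop reason, rather than just a boolean, is a deliberate choice: a supervising loop or a human can tell a finished task from one that ran out of money.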

Matt Van Horn

@mvanhorn

11 days ago

https://t.co/DM0CAuyprS

212

476

16K

442

Underleveled Builder

@UnderleveledDev

14 days ago

作为一名 AI 工程师，请学习： Harness engineering，而不只是 prompt engineering 学会构建和控制 AI 系统运行环境，而不只是写提示词。 Context engineering，而不只是写很长的 prompt 学会设计上下文、状态、工具、记忆和信息流，而不是简单堆一大段提示词。 Prompt caching 和 semantic caching 的取舍什么时候缓存 prompt，什么时候基于语义缓存结果，以及它们在成本、速度和准确性上的权衡。 KV cache 管理、淘汰、复用，以及大规模运行时的内存压力理解大模型推理时缓存如何占用显存，如何复用，什么时候要清理，以及规模上来后为什么会成为瓶颈。 Prefill 延迟和 decode 延迟，以及为什么它们优化方式不同理解模型处理输入阶段和逐字生成输出阶段的性能瓶颈不同，所以优化方法也不同。 Continuous batching、paged attention 和吞吐量优化学会如何把多个请求高效合批处理，如何优化注意力机制中的显存使用，以及如何提升整体服务吞吐量。 Speculative decoding、quantization、distillation 之间的取舍理解推测解码、量化、蒸馏分别如何提速或降成本，以及它们对质量的影响。 INT8、INT4、FP8、AWQ、GPTQ，以及什么时候量化会损害质量理解不同量化方式的优缺点，知道并不是量化越狠越好，有些任务会明显掉质量。结构化输出失败、schema 校验、修复循环和 fallback 链路学会处理模型输出 JSON 或结构化数据失败的情况，用校验、自动修复、重试和备用方案保证系统可用。 Function calling 可靠性、工具契约、参数校验和幂等性理解模型调用工具时不能完全信任模型，必须定义清楚工具输入输出、校验参数，并保证重复调用不会造成错误后果。 Agent guardrails、循环预算、工具预算和终止条件给 AI agent 设置护栏，限制它最多循环多少次、最多调用多少工具、什么时候必须停止，避免失控。模型路由、优雅降级逻辑和降级模式下的用户体验根据任务选择不同模型；当强模型不可用或太贵时，自动切换到便宜模型或简化模式，同时让用户体验不崩。 RAG 架构：切分、embedding、混合搜索、重排序和 freshness 理解检索增强生成系统怎么做：文档如何切块，如何生成向量，如何结合关键词和向量搜索，如何重排结果，以及如何保证信息新鲜。检索评估：召回率、精确率、grounding、归因和引用质量评估检索系统是否真的找到了正确材料，答案是否有依据，引用是否准确。评估体系：golden sets、回归测试、对抗测试、LLM-as-judge 和人工评估建立标准测试集，防止模型或 prompt 改动后质量退化；用自动评估、AI 评审和人工评审结合判断效果。把 LLM 可观测性当成一等公民：traces、spans、tokens、latency、errors 和 drift 像监控传统软件一样监控 LLM 系统：记录调用链、token 消耗、延迟、错误和模型表现漂移。按功能、工作流、租户和用户旅程归因成本，而不只是按模型看成本不只是看“这个模型花了多少钱”，而是要知道哪个功能、哪个流程、哪个客户、哪个用户路径最烧钱。安全工程：防 prompt injection、防数据泄露和权限边界设计系统来防止恶意提示词攻击、敏感信息泄露，以及模型越权访问不该访问的数据或工具。多租户隔离、缓存安全和防止跨用户上下文污染在多个用户共用系统时，确保一个用户的数据、缓存、上下文不会泄露到另一个用户那里。 Fine-tuning、in-context learning、RAG、distillation 的取舍，以及什么时候每种方法都是错的工具理解微调、上下文学习、RAG、蒸馏分别适合什么场景，也要知道什么时候不该用它们。整个推理栈中的延迟、质量、成本和可靠性的权衡从模型调用到服务部署，整体理解速度、效果、费用和稳定性之间的取舍。生产环境故障模式：幻觉式工具调用、格式错误的 JSON、过时检索、失控 agent 和静默的评估退化理解真实上线后最常见的问题：模型乱调用工具、输出坏格式、检索到旧信息、agent 不停循环，以及系统质量悄悄变差但没人发现。

diva

@divaagurlxw

15 days ago

As an AI Engineer. Please learn >Harness engineering, not just prompt engineering >Context engineering, not just long prompts >Prompt caching vs. semantic caching tradeoffs >KV cache management, eviction, reuse, and memory pressure at scale >Prefill vs. decode latency and why they optimize differently >Continuous batching, paged attention, and throughput optimization >Speculative decoding vs. quantization vs. distillation tradeoffs >INT8, INT4, FP8, AWQ, GPTQ, and when quantization hurts quality >Structured output failures, schema validation, repair loops, and fallback chains >Function calling reliability, tool contracts, argument validation, and idempotency >Agent guardrails, loop budgets, tool budgets, and termination conditions >Model routing, graceful fallback logic, and degraded-mode UX >RAG architecture: chunking, embeddings, hybrid search, reranking, and freshness >Retrieval evals: recall, precision, grounding, attribution, and citation quality >Evals: golden sets, regression tests, adversarial tests, LLM-as-judge, and human evals >LLM observability as a first-class discipline: traces, spans, tokens, latency, errors, and drift >Cost attribution per feature, workflow, tenant, and user journey not just per model >Safety engineering: prompt injection defense, data leakage prevention, and permission boundaries >Multi-tenant isolation, cache safety, and cross-user context contamination prevention >Fine-tuning vs. in-context learning vs. RAG vs. distillation and when each is the wrong tool >Latency, quality, cost, and reliability tradeoffs across the full inference stack >Production failure modes: hallucinated tool calls, malformed JSON, stale retrieval, runaway agents, and silent eval regressions
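Several items on this list (structured output failures, schema validation, repair loops, fallback chains) combine naturally into one pattern. The sketch below is illustrative only: the two-field schema, `validate`, and `call_with_repair` are hypothetical, and each "model" is just a callable from prompt to raw text rather than any real client library.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical schema: required field name -> accepted Python type(s).
REQUIRED_FIELDS = {"name": str, "amount": (int, float)}


def validate(raw: str) -> Optional[Dict]:
    """Return the parsed object if it is valid JSON matching the schema, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    for key, expected in REQUIRED_FIELDS.items():
        if key not in obj or not isinstance(obj[key], expected):
            return None
    return obj


def call_with_repair(
    models: List[Callable[[str], str]],  # ordered fallback chain: prompt -> raw text
    prompt: str,
    max_repairs: int = 2,
) -> Optional[Dict]:
    """Try each model in order, giving each a bounded number of repair attempts."""
    for model in models:
        attempt_prompt = prompt
        for _ in range(max_repairs + 1):
            raw = model(attempt_prompt)
            obj = validate(raw)
            if obj is not None:
                return obj
            # Repair step: show the model its bad output and restate the requirement.
            attempt_prompt = (
                f"{prompt}\nPrevious output was invalid:\n{raw}\nReturn valid JSON only."
            )
    return None  # every model failed; the caller chooses the degraded-mode behavior
```

Bounding the repair attempts per model keeps the loop budget finite, which ties back to the termination-conditions item on the same list.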

109

491

241K

583

Underleveled Builder

@UnderleveledDev

15 days ago

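Hu Yanbin shipped an app in a month, and a crowd of programmers online mocked it as marketing and questioned whether it was real. Honestly, the only thing to blame is how fast AI is moving: most people are still living in the era of hand-crafted programming and haven't woken up. For an eyes-closed programmer like me, a month for that app is far too slow. Once you are practiced, it takes a week. You don't need to look at a single line of code, and you don't need any IDE. Using only a TUI, I launched a browser extension and a website whose function is to turn YouTube videos into podcasts that alternate between two languages. I did not read one line of code during the whole build and never dug into the principle or implementation behind any problem, yet the final functionality is exactly what I asked for. Hu Yanbin doesn't really get vibe coding either: why open VS Code and make it look as if you were going to read the code? AI solved software development a while ago, and deployment, store listing, and review can all be handled easily with AI too. The collective mockery from programmers looks more like a stress response to a threat, using contempt and doubt to cover a subconscious panic. The gap between people comes from the courage to negate yourself and start over. Sadly, most people have no courage, only a keyboard.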

Dash

@DashHuang

15 days ago

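I'm surprised that so many people in the original post's comments still refuse to believe Hu Yanbin vibe-coded this app alone and insist a team built it. They don't realize that for this kind of product, a team is now far less efficient than vibing it yourself……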

60K

493

Underleveled Builder

@UnderleveledDev

26 days ago

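1/ Some things I've learned recently running coding agents on large projects. Most of it contradicts the advice from 6 months ago!
2/ Think bigger. This is the most common mistake I see right now: scoping tasks too small. You should now be aiming at work that would take a great engineer several weeks.
3/ Try using one long-running implementer session for the whole project. My sessions usually run for days or even weeks, with many compactions. Compaction works well now. A long session remembers your conventions and patterns, so you no longer have to keep explaining them.
4/ Drive it with a persistent task list. Your job is to add reviewed tasks faster than it completes them. It is like shoveling coal into a steam engine. Each task should include: what to do, how to verify it, and a proof note attached on completion. The bar is: if it is satisfied as written, you would trust the result. Before you stop for the night (especially on Fridays), queue up as many tasks as you can.
5/ Spend most of your time on planning docs, not watching the agent. I usually create plans in short planning sessions, add one or more tasks, then end the session. A good planning doc is self-contained, specifies interface-level details, and includes an explicit end-to-end verification strategy. It is worth iterating on plans until they are very solid.
6/ Adversarial review is the key to long unattended runs. Before any task is marked done, a fresh read-only subagent reviews the diff against the to-do item plus the plan and returns the gaps. (This is often too strong, and you need to dial it down to avoid over-engineering.)
8/ Set up long-lived sessions with different roles: planner, implementer, adversarial reviewer, black-box tester, issue triager, deep code reviewer. Your job is to wire them together, make sure the implementer is never idle, and monitor and review so everything stays on track and mistakes get caught.
9/ Take yourself out of the loop. Don't handle PRs by hand, don't type in the terminal, don't check CI. If you catch yourself testing things personally, stop! The agent needs to prove the work is done. Your job is to double-check.
10/ Spend 20% or more of your time at the meta level. Make sure every mistake you notice gets folded into future instructions so it doesn't happen again. Iterate on the process the agent is following. Improve the test harness. But be careful not to over-engineer the workflow. Simpler is usually better.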
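The task-list discipline in this thread (each task states what to do, how to verify it, and carries a proof note, and nothing is marked done before a read-only review) can be sketched as a small data structure. The names `Task`, `complete`, and `queue_depth` are hypothetical, and the reviewer is a plain callable standing in for the fresh subagent the thread describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    what: str                    # what to do
    verify: str                  # how to verify it
    proof: Optional[str] = None  # proof note attached on completion
    done: bool = False


def complete(task: Task, proof: str, reviewer: Callable[[Task, str], List[str]]) -> List[str]:
    """Mark a task done only if the reviewer reports no gaps; return the gaps either way."""
    gaps = reviewer(task, proof)
    if not gaps:
        task.proof, task.done = proof, True
    return gaps


def queue_depth(tasks: List[Task]) -> int:
    """Number of tasks still waiting; the thread's advice is to keep this above zero."""
    return sum(1 for t in tasks if not t.done)
```

A usage pattern would be to call `complete` from the implementer's side and to watch `queue_depth` from the human's side, adding reviewed tasks whenever it gets low.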

Simon Last

@simonlast

27 days ago

1/ Some things I've learned recently running coding agents on large-scale projects. Most of this contradicts advice from 6 months ago!

209

572K

103

Underleveled Builder

@UnderleveledDev

26 days ago

Daily Life of Programming with Eyes Closed: AI: Go check the console errors on the webpage, and you can locate the bug. Me: You check it yourself. AI: I can’t see it because the environment is different. Me: Figure it out yourself. AI: Let me see if I can find a way... (two minutes later) Bug fixed, all tests passed.

Underleveled Builder

@UnderleveledDev

26 days ago

闭眼编程的日常： AI：你去看一下网页上的console报错，就能定位bug。我：你自己看。 AI：环境不一样看不了。我：自己想办法。 AI：让我看看有没有办法...（两分钟后）bug已解决测试全过。

Underleveled Builder

@UnderleveledDev

27 days ago

“今天我们将员工人数减少了 22%。公司的业务目前处于有史以来最强劲的阶段。因此，我认为有必要直接向大家说明我所看到的现状以及原因。首先，这是我做出的决定，我对此全权负责。我之所以这样做，是因为以最高生产力水平运营的方式正在发生改变，而为了赢得未来，ClickUp 需要随之改变。其次，这与削减成本无关。这次变革节省下来的大部分资金将直接回流到留下来的员工身上。我们将引入百万美元级别的薪资架构。如果你能利用人工智能创造巨大的影响力，你将获得传统薪酬标准之外的丰厚报酬。最重要的是，我对受到影响的员工表示最深切的感激。我们之所以在公司处于强势地位时这样做，正是为了能够妥善地照顾大家。每位受影响的员工都将获得一份旨在认可其贡献并帮助其平稳过渡的补偿方案。我认为只有两种选择：要么等待这一切在市场中逐渐演变，要么坦诚面对我所看到的情况并采取主动行动。 100倍组织（THE 100X ORGANIZATION）最主要的变化是，我们将围绕我所说的“100倍组织”进行重组。目标是实现 100 倍的产出。在最高水平上进行建设所需的角色，已与一年前截然不同。对现有系统的渐进式改进无法让我们达成目标。我们需要全新的系统。这意味着我们需要创造足够的颠覆来进行重建，而不是在已经失效的系统上修修补补。普遍的说法是，人工智能让每个人都变得更具生产力。其实不然。如果保持不变，当今许多工作流都会在人工智能系统中造成瓶颈。这些角色将会不断演进。但如果要等待这一切自然发生，就意味着我们现在就会落后。实际上，“100倍组织”极其依赖于人——比今天还要依赖得多。只有当“10倍员工”拥抱并采用全新的工作方式时，这才有实现的可能。建设者、智能体管理者和一线人员（THE BUILDERS, AGENT MANAGERS, AND FRONT-LINERS） — 建设者：10倍工程师我认为大多数公司还没有完全理解人工智能在工程领域究竟引发了什么变化。普遍的看法是，人工智能让所有工程师都更具生产力。在局部来看可能确实如此，但在组织层面上——这与现实相去甚远。以下是我们最近在 ClickUp 验证的事实：那些优秀的、能够进行统筹、架构和代码审查的工程师，正在成长为 100 倍工程师。他们不再亲自写代码，而是在指挥编写代码的智能体（agents）。这项技能的核心是判断力。人工智能让最优秀的工程师的生产力呈爆发式增长，而其他使用人工智能的人反而会拖慢这些工程师的步伐。试想一下——现在的瓶颈在于：(1) 统筹协调——告诉 AI 该做什么，以及 (2) 审查核验——AI 做得怎么样。所有中间环节都被跨越，不再被需要。那么，你希望由谁来负责统筹和审查代码呢？你又希望你最优秀的工程师把时间花在什么地方？如果你最优秀的工程师把时间花在审查别人的代码上，这本身就是一个低效的瓶颈。这些工程师审查他们所属智能体的代码，要比审查人类写的代码快得多。新世界的法则是让你公司的 10 倍工程师成长为 100 倍工程师。错误的策略是强迫每位工程师无节制地使用 Token。采取这种做法的公司可能会为拉取请求（Pull Requests）数量增加 500% 而欢呼。但实际带给客户的成果与生成的代码量并不匹配。我将其称为 AI 编程的“大清算”，如果不是现在，那么每家公司很快都将面临这一局面。更多的代码只会成为顶尖工程师的另一个瓶颈，并最终制约你们公司的影响力。 — 建设者：10倍产品经理产品管理和设计角色正在融合。关注客户体验的设计师变得越来越像产品经理。而对用户体验（UX）拥有敏锐直觉的产品经理变得越来越像设计师。用户研究的瓶颈已不复存在。现在，我们只需向智能体提一下需求，就能启动研究并分析结果。产品与设计之间迭代的瓶颈也不复存在。产品建设者可以在智能体和相关技能的辅助下自行进行迭代，以确保与质量和战略保持一致。尽管如今仍有争议——但我认为，让你的产品经理（PM）推送代码到生产环境是错误的策略，这只会引入一个新的瓶颈，让最优秀的工程师在上面浪费时间。需要明确的是，PM 应该写代码，但他们应该在测试环境（playground）中进行，用于迭代、验证和确定范围。这些代码不应该直接进入生产环境。除了管理系统、统筹 AI 和审查输出结果之外的所有工作都会成为瓶颈。这就是为什么除了这些职位之外，其他同样关键的角色是系统管理者（用于减少瓶颈），以及一个你无法替代的“瓶颈”——与客户开会的时间。 — 系统管理者（THE SYSTEM MANAGERS）讽刺的是，那些用 AI 自动化自己工作的人将永远拥有工作。他们会成为 AI 系统的所有者——即智能体管理者（agent managers）。在 
ClickUp，我们有很多这样的例子。我们运营的底层系统绝对至关重要，必须确保其正确无误。我认为，大多数公司如果以为在现有系统上修修补补就能在这个新世界中竞争，那简直是痴人说梦。你必须创造足够的颠覆性改变，从而完全摒弃旧系统。如果“AI 原生”有一个明确的定义，那就是如此。 — 一线人员（THE FRONT-LINERS）在一个即将充斥着 AI 交流的世界里，对于客户而言，人与人之间的情感连结将比什么都重要。这是一个你不应该去替代的瓶颈——即使智能体的质量高到足以召开视频会议。与客户一对一会议的时间是不应该被自动化的。但是围绕会议开展的系统工作应该被自动化——这样一线人员就可以把近乎 100% 的时间花在客户身上。奖励100倍的影响力（REWARDING 100X IMPACT）在一个公司能够以更少资源做更多事情的世界里，多出来的那些钱去哪儿了？在我们的案例中，这种新运营模式节省下来的大部分资金，将直接回流到那些促成这种模式的员工身上。我们必须对创造生产力的人给予相应的回报。这样才能统一双方的利益。此外，在这个你最优秀的员工能创造 100 倍影响力的世界里，你承受不起失去他们的代价。你应该致力于将这些员工留住几十年。他们所掌握的业务背景，以及他们高效统筹和审查的能力，几乎是不可替代的。现有的薪酬标准应该被抛弃。我们将引入每年 100 万美元现金的薪酬标准，只要能通过创建或管理 AI 系统产生 100 倍的影响力，公司里的几乎每个人都有路径达到这个标准。未来（THE FUTURE）几乎所有公司都会进行类似的变革。那些主动出击的公司将定义未来的走向。未来并不意味着人员减少。而是意味着不同的工作、新的角色，以及为那些拥抱变革的人提供更好的回报。我们已经看到像“智能体管理者”这样在一年前还不存在的全新角色涌现出来。 ClickUp 正致力于引领这一转变，这不仅仅是公司内部的转变，也是为我们的客户考虑。我从未对我们前进的方向如此确信。”

Zeb Evans

@DJ_CURFEW

28 days ago

Today we reduced headcount by 22%. The business is the strongest it's ever been. So I think it's important to be direct about what I'm seeing and why. First, I made this decision and I own it. I did it because the way to operate at the highest level of productivity is changing, and to win the future, ClickUp needs to change with it. Second, this wasn't about cutting costs. Most savings from this change will flow directly back into the people who stay. We'll be introducing million-dollar salary bands. If you create outsized impact using AI, you'll be paid outside of traditional bands. Most importantly, I have the deepest gratitude for those affected. We're doing this from a position of strength specifically so we can take care of people properly. Everyone affected receives a package aimed at honoring their contributions and easing the transition. I only see two options: wait for this to play out gradually in the market or be honest about what I'm seeing and act proactively. THE 100X ORGANIZATION The primary change is that we're restructuring around what I call 100x org. The goal is 100x output. The roles required to build at the highest level are fundamentally different than they were a year ago. Incremental improvements to existing systems won't get us there. We need new ones. That means creating enough disruption to rebuild rather than iterate on what's already broken. The common narrative is that AI makes everyone more productive. It doesn't. Many of the workflows of today, if left unchanged, create bottlenecks in AI systems. These roles will evolve. But waiting for that to happen naturally means falling behind now. The 100x org is actually heavily dependent on people - infinitely more than today. This is only possible with 10x people that have embraced and adopted new ways of working. THE BUILDERS, AGENT MANAGERS, AND FRONT-LINERS — THE BUILDERS: 10X ENGINEERS I don't think most companies have internalized what's actually happening with AI in engineering. 
The common narrative is that AI makes all engineers more productive. That may be true in isolation, but at an organization level - that is the farthest thing from reality. Here's what we've validated recently at ClickUp: the great engineers, the ones who can orchestrate, architect, and review, are becoming 100x engineers. They're not writing code. They're directing agents that write code. The skill is judgment. AI makes the best engineers wildly more productive, and everyone else using AI slows these engineers down. Think about it - the bottlenecks are (1) orchestration - telling AI what to do, and (2) reviewing - what AI did. Everything is leapfrogged and no longer needed. So who do you want orchestrating and reviewing code? And how do you want your best engineers to spend their time? If your best engineers are spending time reviewing other people's code, then this is inherently an inefficient bottleneck. These engineers can review their agent's code much faster than reviewing human code. The new world is about enabling your 10x engineers to become 100x. The wrong strategy is to push every engineer to use infinite tokens. Companies doing this are celebrating 500% more pull requests. But customer outcomes don't match the volume of code being generated. I call this the great reckoning of AI coding, and every company will face this soon if not already. More code is just another bottleneck to the best engineers, and ultimately to your company's impact as well. — THE BUILDERS: 10X PRODUCT MANAGERS Product management and design roles are merging. Designers that have customer focus, become more like product managers. And product managers that have intuition for UX become more like designers. The bottleneck of user research is gone. It takes us just one mention of an agent to kickoff research and analyze results. The bottleneck of product <> design iteration is also gone. 
The product builder iterates on their own, along with agents and skills that ensure alignment with quality and strategy. Also controversial today - I believe that the wrong strategy is to have your PMs shipping code - that just introduces another bottleneck that the best engineers will waste their time on. To be clear, PMs should be coding but they should do this in a playground to iterate, validate, and scope. That code should not go to production. Everything outside of managing systems, orchestrating AI, and reviewing output becomes a bottleneck. That's why the other roles that are critical along with these are the systems managers (to reduce bottlenecks) along with a bottleneck you can't replace - customer meeting time. — THE SYSTEM MANAGERS Ironically, the people that automate their jobs with AI will always have a job. They become owners of the AI systems - agent managers. We have many examples of these people at ClickUp. The underlying systems in which we operate are absolutely critical to get right. I think most companies are delusional to think they can iterate on existing systems and compete in this new world. You must create enough disruption so that old systems are deprecated entirely. If there's any definition for 'AI native' that's what it is. — THE FRONT-LINERS In a world that will become saturated with AI communication, the human touch will matter more than anything to customers. This is a bottleneck that you shouldn't replace - even when agents are high enough quality to do video meetings. One-on-one meeting time with customers is something that shouldn't be automated. The systems around the meetings should be - so that front-liners spend nearly 100% of their time with customers. REWARDING 100X IMPACT In a world where companies are able to do so much more with less, where does that excess money go? In our case, much of the savings in this new operating model will flow directly back to those that enabled it. 
We must reward people that create productivity accordingly. This aligns incentives on both sides. Plus, in a world where your best people create 100x impact, you can't afford to lose them. You should aim to retain these employees for decades. The context they have and their ability to efficiently orchestrate and review will be nearly impossible to replace. Compensation bands of today should be thrown out the door. We're introducing $1 million cash/year salary bands with a path available to nearly everyone in the company if they produce 100x impact by creating or managing AI systems. THE FUTURE Nearly every company will make changes like these. The ones that do it proactively will define what comes next. The future is not fewer people. It's different work, new roles, and better rewards for those who embrace it. We're already seeing entirely new roles emerge, like Agent Managers, that didn't exist a year ago. ClickUp is positioning to lead this shift, not just internally, but for our customers too. I've never been more certain about where we're headed.

10K

14K

Underleveled Builder

@UnderleveledDev

30 days ago

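In one sentence: a company should no longer pass information through people the way a Roman legion did. It should be rebuilt as a set of "recursive, self-improving AI loops", with humans at the edge handling contact with reality and AI running the "company brain" in the middle.
Core arguments (by chapter):
1. The old paradigm: company = Roman legion. Today's companies still use the legion's structure: nested hierarchy, with information carried by people between layers. That structure rests on the premise that people are the information pipes. AI breaks that premise.
2. Copilot is the wrong mental model. The "make engineers 20% more efficient" story fits a stronger motor to the old engine without changing the structure. The real opportunity is to redefine what a company is.
3. The real work: extract the domain knowledge. Your know-how is scattered across people's heads, Slack, email, and Notion. Extract it and define it as context/skills, and the company jumps from a hierarchical organization to an AI-native one.
4. Company = a set of recursive, self-improving AI loops. Each loop has five layers: sensing → policy → tools → quality gate → learning mechanism. If those five steps can close the loop with little or no human intervention, the company keeps getting better while you sleep.
5. The "holy shit" moment at YC. On top of the query agent they added a monitoring agent: it automatically finds which queries failed → diagnoses the cause → writes code → opens a PR → AI reviews it → it is merged and deployed. The next day the same query succeeds. This is the real shape of AI, not "20% better".
6. The loop can be copied to any function. A self-optimizing product loop: product analytics → A/B test → deploy. A self-optimizing support loop: a support suggestion → CPO/CTO agents judge it → written and deployed overnight.
7. Operating principle: burn tokens, don't stack headcount. By Demo Day, revenue per employee was already 5x what it was 18 months earlier. The bottleneck will soon move from headcount to tokens. "Who is token-maxing" is a directional indicator of an employee's value.
8. Middle management is over. Only two roles remain: IC (Builder/Operator) and DRI (a named directly responsible individual). AI does the coordination, so the middle layer is no longer needed.
9. First principle: make everything legible to AI. If it wasn't recorded, then for AI it never happened. Email, Slack, DMs, office hours: record it all, store it all, summarize it, and leave "breadcrumbs" for AI.
10. Case study: the self-regenerating YC User Manual. Using 2,000 hours of office-hours recordings, a new 150-page edition was rewritten in one weekend, and it updates itself every month. It has become a "living brain" of the combined wisdom of 16 YC partners.
11. Software is disposable, context is the asset. Treasure and store all data, but treat software as single-use. Models get smarter every two months, so throw away the old software and regenerate it from the original instructions. What is truly valuable is the business context and the skills. Software is just their temporary shell.
12. Humans sit at the edge. People handle what models cannot yet reach: brand-new situations, ethical judgment, and high-stakes or high-emotion moments (a co-founder breakup, a critical sales conversation). This is where intelligence touches reality.
The closing gut-punch question: if you were starting a company from scratch today, would you build it in this shape? A small company has no excuse not to.
The line of thought in one sentence: record all data → summarize it so AI can read it → context/skills are the asset and software is disposable → every function runs a recursive self-improving loop → keep only ICs + DRIs → burn tokens, don't stack headcount → humans move to the edge as the high-stakes contact surface.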

Y Combinator

@ycombinator

30 days ago

In a recent batch talk, YC General Partner @t_blom broke down how to build a self-improving, AI-native company. He walks through how to create recursive, self-improving AI loops, and why founders who get this right will run companies that improve while they sleep. 00:00 — Companies Are Roman Legions 00:54 — Copilots Are the Wrong Mental Model 01:55 — Extract the Domain Knowledge 02:24 — The Recursive Self-Improving Loop 04:12 — The Holy Shit Moment at YC 05:50 — Self-Optimizing Product and Support Loops 06:29 — Burn Tokens, Not Headcount 07:23 — Middle Management Is Over 08:05 — Make Everything Legible to AI 09:40 — Regenerating the YC User Manual 11:19 — Software Is Ephemeral, Context Is Valuable 12:18 — Where Humans Still Matter

236

589K

106

Underleveled Builder

@UnderleveledDev

about 1 month ago

当前 36 个最大的创业机会最大的大众消费（B2C）机会：解决孤独。第三空间、社区类 App、线下真实互动（IRL）。最大的企业服务（B2B）机会：为企业提供全托管的 AI 员工。最大被低估的机会：老年科技。7000 万婴儿潮一代渴望能让他们更快乐、更健康的产品。最大的移动端机会：行动导向型 App（替你办事的应用），而不是让你盯着消磨时间的效率黑洞。最大的蓝领技术机会：电工、水管工、HVAC（暖通空调）的供需匹配平台。目前这部分劳动力供给正在缩水。最大的消费者社交机会：微型社交。将群聊打造为独立产品，没有信息流，没有 AI 垃圾内容。最大的电子商务机会：能懂你喜好、帮你逛街、甚至替你直接下单的 AI 购物代理（Agents）。最大的创作者机会：直播秀和无剧本内容。最大的教育科技（EdTech）机会：通过对话进行动态调整的 AI 导师。最大的软件服务（SaaS）机会：按效果/按产出付费的定价模式（Pay-per-outcome）。最大的汽车行业机会：用于汽车经销店的 AI 服务顾问。24/7 全天候回答那 15 个最常见的老大难问题。最大的人才/培训机会：培训非技术人员去操作和运营 AI 代理（Agents）。最大的反无聊机会：送货上门的精选线下体验。工具包、棋类游戏、挑战赛——纯纯的反屏幕（回归现实）产品。精神/心灵领域的最大机会：对归属感的需求正在爆炸式增长，需要新形式的精神集会或心灵聚会。最大的健康机会：由个人主动管理的“长寿生物标志物（Longevity Biomarkers）”。最大的移动端机会 (注：原文此处与第4条重复)：行动导向型 App，替你办事，而不是让你盯着看。解决 AI 垃圾（AI Slop）的最大机会：证明你是“真人”的数字身份验证。未来两年内，每个平台都需要这个功能。最大的基础设施（Infrastructure）机会：AI 代理（Agents）的权限管理、安全防护与审计追踪。最大的媒体机会：AI 原生媒体公司。先做内容和渠道积累粉丝（Distribution），后续再变现卖产品。最大的育儿机会：家庭事务自动化运营。处理各种表格、日程排期和后勤对接。最大的财税会计机会：按每笔交易/流水计费的记账 AI 代理。最大的时尚行业机会：品牌自营的二手转售平台。每个品牌都想把控自己的二级市场。最大的兴趣爱好机会：纯粹为了快乐的成人学习。比如陶艺、木工、绘画。最大的护肤机会：家用肤质诊断。拍照扫描、获取定制方案、持续追踪进展。最大的农业机会：面向小型农场的精准农业工具。大型企业级版本早就有了，但家庭农场还没有。最大的灭虫服务机会：订阅制的“预防性”害虫防治，而不是出了问题才解决。这也是草坪护理行业早就验证过的商业模式转型。最大受合规限制领域的机会：端侧 AI（On-device AI）。只要数据保留在本地，医疗、法律、金融等敏感领域的空间就会彻底打开。最大的游戏机会：拥有真实记忆和人际关系的 AI 游戏角色（NPC）。最大的婚恋机会：由 AI 代理（Agents）介入并撮合的相亲匹配。最大的健身机会：每天根据身体状况重写训练计划的自适应 AI 教练。最大的旅游机会：自主、全自动的行程规划与退改签系统。最大的食品/饮食机会：基于血液检测和肠道菌群数据的个性化定制营养指南。最大的宠物行业机会：宠物健康监测。这是一个 1400 亿美元的巨大市场，但目前几乎没有技术介入。最大的国防/安全机会：AI 原生的安全与合规工具。最大的机器人机会：具身智能/物理 AI（Physical AI）。在现有硬件上装一个价值 30 美元的“AI 大脑”。最大的怀旧机会：能带来模拟时代质感（Analog）的产品。黑胶、纸张、纯手工。作为对“一切皆 AI”的一种对立反叛。

GREG ISENBERG

@gregisenberg

about 1 month ago

The 36 BIGGEST startup opportunities right now
1. biggest b2c: solving loneliness. third spaces, community apps, IRL
2. biggest b2b: managed AI employees for businesses
3. biggest overlooked: elder tech. 70 million boomers who want products that make them happier & healthier
4. biggest mobile: action apps that do things, not apps you stare at
5. biggest trades: matching platforms for electricians, plumbers, HVAC. supply shrinking
6. biggest consumer social: small social. group chats as products, no feeds, no ai slop
7. biggest ecommerce: agents that recommend products you'll like, shop, buy for you
8. biggest creator: live shows and unscripted content
9. biggest edtech: AI tutors that adapt through conversation
10. biggest SaaS: pay-per-outcome pricing
11. biggest auto: AI service advisor for dealerships. answers the same 15 questions 24/7
12. biggest talent: training non-technical people to operate agents
13. biggest boredom: curated offline experiences delivered to your door. kits, games, challenges. anti-screen products
14. biggest spiritual: the need for belonging is exploding, new formats of spiritual get togethers
15. biggest wellness: longevity biomarkers you actively manage
16. biggest mobile: action apps that do things, not apps you stare at
17. biggest one to solve ai slop: digital verification that you're a real human. every platform will need this within 2 years
18. biggest infrastructure: agent permissions, security, audit trails
19. biggest media: AI native media companies. build distribution, sell products later.
20. biggest parenting: family ops automation. forms, scheduling, logistics
21. biggest accounting: bookkeeping agents that charge per transaction
22. biggest fashion: brand-owned resale. every brand wants to control their secondary market
23. biggest hobbies: adult learning for joy. pottery, woodworking, drawing.
24. biggest skincare: at-home diagnostics. scan, get a protocol, track progress
25. biggest agriculture: precision farming tools for small farms. enterprise version exists, family farm doesn't
26. biggest pest control: subscription pest prevention instead of reactive treatment. the model flip that lawn care already made
27. biggest regulated: on-device AI. healthcare, legal, finance open up when data stays local
28. biggest gaming: AI characters with real memory and relationships
29. biggest dating: agent-mediated matchmaking
30. biggest fitness: adaptive coaching that rewrites your program daily
31. biggest travel: autonomous trip planning and rebooking
32. biggest food: personalized nutrition based on blood work and gut biome
33. biggest pet: health monitoring. $140B industry, almost no tech
34. biggest defense: AI-native security and compliance tools
35. biggest robotics: physical AI. $30 brains on existing hardware
36. biggest nostalgia: products that feel analog. vinyl, paper, handmade. counter-positioning against AI everything

254

421

346K

243

Underleveled Builder

@UnderleveledDev

about 1 month ago

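@FuSheng_0306 You believe this too? If you're this insensitive to fakery, I really have to doubt your judgment. If this isn't staged, then I'm Qin Shi Huang. Buy in now for 998 RMB and get permanent usage rights to the Epang Palace immediately.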

922

Underleveled Builder

@UnderleveledDev

about 1 month ago

https://t.co/LOuJYeEcl6

Underleveled Builder

@UnderleveledDev

about 1 month ago

https://t.co/7L74Hif6Ph

Underleveled Builder

@UnderleveledDev

about 1 month ago

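Recent thoughts: The Shift to Long-Horizon Tasks
The most likely breakthrough this year is in long-horizon tasks. We are heading toward a stage where large language models (LLMs) learn to complete long, complex missions through continuous interaction with Agent environments. This may be where the real value of LLMs lies. Take cybersecurity as an example: imagine a model that hunts for software vulnerabilities and bugs without pause. It sounds like "search," but in practice the model is learning the high-level intuition and methodology of a professional hacker. Unlike humans, AI can run 24/7 and never gets tired. It could find exploitable vulnerabilities far more often than a human can and collect bounties on platforms such as HackerOne or BugCrowd. It sounds fun, but at its core this is a revolution: it will replace the hacker. If even hackers are being "disrupted," it is not hard to imagine the impact on ordinary programmers.
From One-Person Companies to No-Person Companies
Building on long-horizon capabilities, Autonomous Agent Systems (AAS) are bound to become the next frontier. Last year we were still discussing the rise of the "One Person Company" (OPC). I did not expect the shift to the "None Person Company" (NPC) to come so quickly. It is an ironic twist: we may all become NPCs in this new ecosystem.
Engineering the "Impossible": Memory and Learning
To realize the vision above, three technical pillars must be conquered: Memory, Continual Learning, and Self-Judging. I used to think these would require huge paradigm shifts and years of research. However, the pressure from both the technical and application sides is so great that clever engineering "tricks" are making these capabilities emerge quickly:
Memory: Very long context windows (1M+) combined with RAG have largely closed the gap.
Continual Learning: True continual learning is still hard, but model release cycles are shrinking sharply. Global models are updated monthly, and domestic models are catching up fast. If updates become weekly by next year, that will in effect amount to continual learning.
Self-Judging: This is the hardest part for now, but models such as Opus 4.7 already show early self-correction and judgment capabilities.
The Self-Evolving Endgame
The most difficult and most promising path is self-evolution. The current wave is extremely fierce. My guess is that models such as Claude may already have the foundations for self-training: writing their own code, cleaning data, generating synthetic data, and then training on that data. It may "waste" some compute, but it saves the most precious resources: human labor and time. In the LLM era, speed is everything. Rapid iteration is exactly what opens the cognitive gap between leaders and followers. Claude's rumored 2-million-chip cluster for next year is very likely meant for this kind of autonomous model self-training.
Technical Summary:
1M context → necessary baseline
Memory & Continual Learning → prerequisites, most likely solved first through "engineering tricks"
Harnessing Environments → the breakthrough point
Self-Judging → the tipping point
Full Self-Training → the endgame
Redefining AGI and the Industry
If this is the road to AGI, then AGI should be redefined as the sum of humanity's collective intelligence rather than the intelligence of a single individual. It must be able to create something as profound as the "Theory of Relativity," meeting the bar set by Hassabis. During this transition, every app will need to be rebuilt as AI-native. In fact, we may say goodbye to the concept of the "app" altogether. The biggest challenge will be the complete reconstruction of the operating system itself. In the future you will no longer see a traditional desktop. You will see an LLM OS in which applications are generated on demand. This will challenge the 80-year-old von Neumann architecture and means a full upheaval of the computer science industry.
The Irreversible Wave
From completing long-horizon tasks to fully autonomous operation, every field, including security, finance, law, and e-commerce, will be reshaped. Many friends have contacted me recently to ask how to transform their companies to keep up with AI. But few truly realize that this irreversible process has already begun. As this huge technological wave arrives, we must be ready to act, and we must also think seriously about how to regulate it.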

jietang

@jietang

about 1 month ago

Recent thoughts: The Shift to Long-Horizon Tasks
The most likely breakthrough this year will be in long-horizon tasks. We are moving toward a stage where Large Language Models (LLMs) learn to complete extended, complex missions by interacting with Agent environments. This is perhaps where the true value of LLMs lies. Take cybersecurity as an example: imagine a model that continuously hunts for software bugs and vulnerabilities. While it sounds like a search process, it's actually the model learning the high-level intuition and methodology of a professional hacker. Unlike humans, AI can run 24/7 without fatigue. It could potentially find exploits at a much higher frequency and claim bounties on platforms like HackerOne or BugCrowd. It sounds fun, but fundamentally, it's a revolution that displaces the hacker. If even hackers are being "disrupted," one can only imagine the impact on general programmers.
From One-Person to None-Person Companies
Building on long-horizon capabilities, Autonomous Agent Systems (AAS) will inevitably become the next frontier. Last year, we were discussing the rise of the "One Person Company" (OPC). I didn't expect us to move so quickly toward the "None Person Company" (NPC). It's an ironic twist: we might all end up as NPCs in this new ecosystem.
Engineering the Impossible: Memory and Learning
To realize the vision above, we must solve three technical pillars: Memory, Continual Learning, and Self-Judging. I used to think these would require massive paradigm shifts and years of research. However, the pressure from both the technical and application sides is so intense that we are seeing these capabilities emerge through ingenious engineering "tricks":
Memory: Long context windows (1M+) and RAG have significantly bridged the gap.
Continual Learning: While true continual learning remains difficult, the release cycles are shrinking. Global models are updated monthly; domestic models are catching up. If we reach weekly updates by next year, it will effectively function as continual learning.
Self-Judging: This remains the most elusive, yet models like Opus 4.7 are already demonstrating early self-correction and judgment capabilities.
The Self-Evolving Endgame
The most difficult and most promising path is Self-Evolution. The current wave is incredibly fierce. I suspect that models like Claude may have already achieved a baseline for self-training: writing their own code, cleaning their own data, generating synthetic data, and then training on it. It might "waste" some compute, but it saves the most precious resources: human labor and time. In the LLM era, speed is everything. Rapid iteration is what creates the cognitive gap between leaders and followers. Claude's rumored 2-million-chip cluster for next year is likely dedicated to exactly this: autonomous model self-training.
Technical Summary:
1M Context: Necessary baseline.
Memory & Continual Learning: Prerequisites, likely solved first via "tricky" engineering.
Harnessing Environments: The breakthrough point.
Self-Judging: The tipping point.
Full Self-Training: The endgame.
Redefining AGI and the Industry
If this is the road to AGI, then AGI's definition should be the sum of all human collective intelligence, not just an individual's intelligence. It must possess the creative capacity to produce something as profound as the "Theory of Relativity," meeting the bar set by Hassabis. During this transition, every app will need to be reconstructed as AI-native. In fact, we might move past the concept of apps entirely. The most significant challenge will be the reconstruction of the operating system itself. In the future, you won't see a traditional desktop; you will see an LLM OS, where applications are "generated on demand." This challenges the 80-year-old von Neumann architecture and represents a total upheaval of the computer science industry.
The Irreversible Wave
From completing long-horizon tasks to fully autonomous operations, every sector (Security, Finance, Law, E-commerce) will be reshaped. Many friends have reached out lately, asking how to transform their enterprises to keep pace with AI. But few truly realize that this irreversible process has already begun. As this massive technical wave hits, we must be prepared to act, but we must also start thinking seriously about how to regulate it.

753

147

539

193K

Underleveled Builder

@UnderleveledDev

about 2 months ago

https://t.co/76nUbYeKfF
