AGI for Fun

@LJYtrader

Joined November 2012

765 Following

25 Followers

76 Posts

LJYtrader retweeted

Min Choi

@minchoi

3 days ago

NVIDIA just dropped SANA-Streaming. AI can now edit minute-long video in real time. Clothes. Backgrounds. Styles. Scenes. All while the video plays.

374

324

42K

LJYtrader retweeted

Muratcan Koylan

@koylanai

22 days ago

'Agent Harness Engineering: A Survey' just cited my Agent Skills for Context Engineering project in its Context & Memory Management section.

It’s a new paper on OpenReview (authors from CMU, Yale, Johns Hopkins, Amazon + others). They reviewed 170+ open-source projects and pulled real production lessons from OpenAI, Anthropic, and LangChain.

Agent performance in the real world = Model capability + Harness quality

For long-horizon, multi-step, production tasks, the harness has become the main bottleneck. Simple harness tweaks (better tool formats, sandbox changes, automated verification loops) deliver significant gains on benchmarks.

This is the second time my open-source work has been cited in academic research (first was Peking University’s State Key Lab paper on meta context engineering).

I’m genuinely proud of that, but more than anything it reminds me why I love open source. I’m not from academia. I learned this field by building, shipping, writing...

Open source lets your experiments enter the research papers. That is still one of the best parts of this field.

The paper is worth reading. We're moving from “build one agent” to “operate a fleet of long-running agents” and the paper repeatedly shows that the biggest improvements come from turning production traces into regression tests and automated harness fixes.

Paper & Repo: https://t.co/PAjqvOXedL

718

144

835

39K

LJYtrader retweeted

LanceDB

@lancedb

22 days ago

1/ World model research is fragmented: every paper reimplements its own data pipeline, baselines, and eval harness. Comparing two methods fairly is weeks of infra work.

stable-worldmodel is a new open-source platform that standardizes the whole thing: https://t.co/Gg3V3LhKJr

18K

LJYtrader retweeted

Macro_Lin ｜市场观察员

@LinQingV

about 1 month ago

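A full map of the equipment supply chain behind CXMT's capacity expansion

CXMT (长鑫存储) currently runs three 12-inch fabs: two in Hefei and one in Beijing. Hefei Fab 1 has a capacity of 110,000 wafers per month, Fab 2 has 80,000, and the Beijing fab has 70,000, adding up to nearly 300,000 wafers, all running at full utilization. Capacity has tripled in two years: roughly 100,000 wafers at the start of 2024, 280,000 to 300,000 by the end of 2025, and a 2026 target of holding steady at 300,000. Measured by wafer count, CXMT's share of global DRAM is already close to 15%, but measured by revenue it is only 3.97%. The wafer volume has piled up; unit prices are still catching up.

On the process side, G4 at 16 nm is already in volume production, and DDR5 yield has risen from 80% at the end of 2024 to the 90% range now. G5, corresponding to 15 nm, is due for volume production at the end of 2026. HBM2 went into production in the second half of last year, two years ahead of outside expectations, mainly supplying Huawei's Ascend 910. Front-end wafer capacity for HBM3 is mostly being freed up inside the existing Hefei and Beijing sites, with a target of 60,000 wafers per month, about 20% of the 300,000 total. Back-end stacking and packaging will be handled by a newly built HBM packaging and test plant in Shanghai, due to start production at the end of 2026.

Capacity is still expanding. The new Shanghai fab comes in two phases: Phase 1 adds 100,000 wafers with volume production in early 2027, and Phase 2 adds another 100,000 with volume production in early 2028. Together with the existing 300,000, the long-term capacity target is heading for more than 500,000 wafers. Huawei is also building a 100,000-wafer line dedicated to working with CXMT on the HBM3 and LPDDR5X needed for the Ascend series, with full volume production starting in the second half of 2026.

The IPO raises 29.5 billion yuan: 13 billion for DRAM technology upgrades, 7.5 billion for volume production line upgrades, and 9 billion for forward-looking technology R&D. Several brokerage meeting notes mention that about 20 billion of this will turn directly into equipment purchase orders. Stacked on top of the existing capex run-rate of 71.2 billion yuan in 2024, this is the most certain base of business for domestic equipment makers over the next two years.

The large-scale expansion CXMT has under way creates very substantial equipment demand. Progress on localization differs widely from category to category.

Lithography is the biggest external dependency, with a localization rate below 5%. The workhorse tool is ASML's NXT:1980Di, which processes 275 wafers per hour, covers nodes down to 16 nm, and can still be purchased normally. What the Netherlands controls is the NXT:2050i and above, at 295 wafers per hour: a license has been required since September 2023, and licenses for China have essentially stopped being issued since January 2024. To correct a widely circulated claim: the step from 275 to 295 wafers per hour is a generational jump from the 1980Di to the 2050i, not an iteration within the 1980 series. As CXMT moves below 15 nm, there is no way around this wall. SMEE (上海微电子) is still working on its 28/14 nm DUV tools, so lithography has no short-term solution.

Etch accounts for 25 to 30% of equipment investment and is the core process with the deepest localization. NAURA's (北方华创) ICP tools hold more than 50% share on CXMT's lines, and AMEC's (中微) CCP dielectric etchers reach aspect ratios above 50:1, used specifically for the deep-silicon TSVs in HBM. But for the extreme 100:1 aspect-ratio processes at memory nodes, the holes must be parallel and regular and any deviation hits yield directly; Lam and Tokyo Electron remain the mainstay in this range.

Thin-film deposition accounts for about 25% and is finely subdivided. PECVD forms insulating layers, PVD forms metal interconnects, and ALD forms the high-k dielectric layer of the capacitor. Piotech's (拓荆科技) PECVD tools have entered CXMT's DDR5 and LPDDR5 lines in volume, and NAURA's PVD covers aluminum-copper sputtering and titanium nitride sputtering. But for the high-k ALD tools at the core of the DRAM capacitor, Applied Materials and ASM are hard to dislodge; Piotech's ALD is still in qualification and ramp-up and has not yet reached large volume. Overall, localization is higher for PECVD and PVD and lowest for ALD.

CMP accounts for only 5 to 7% of equipment investment. Hwatsing (华海清科) holds more than 90% of domestic CMP equipment sales, and its 12-inch Universal-300 has been introduced at CXMT. The new variable is AMEC's acquisition of 杭州众硅 (Hangzhou Zhonggui), which cleared review on April 29; its 6-platen architecture is an international first and is noticeably more efficient than the mainstream 4-platen design. Domestic CMP has formally moved from a single supplier to dual sourcing.

Cleaning has a localization rate of 30 to 40%; ACM Shanghai's (盛美上海) SAPS/TEBO megasonic cleaning covers FinFET and DRAM processes at 16 to 19 nm. Thermal processing is at 40 to 50%, led by NAURA's vertical oxidation furnaces; in laser annealing, 莱普科技 (Laipu Technology) raised its domestic share from 3% to 16% within two years and is co-developing DRAM-specific tools with CXMT and YMTC. These segments are already working.

The three weakest segments are metrology and inspection, coat/develop, and ion implantation. Localization in metrology and inspection is in the single digits, with KLA holding 51 to 54% globally; 精测电子 (Jingce Electronic) has entered CXMT's 12-inch line and 中科飞测 (Skyverse) is also cutting into some steps, but the gap in high-end optical inspection and e-beam inspection remains obvious. Coat/develop localization is only 4%, and Tokyo Electron holds more than 90% share in mainland China; 芯源微 (Kingsemi) is the only volume-production alternative, so far established only in packaging and still unable to get into critical front-end layers. Ion implantation is likewise in the single digits; 万业凯世通 (Kingstone, under Wanye) has delivered more than 40 tools cumulatively. The three segments with the most upside are also the three that are hardest to move in the short term.

On YMTC's Phase 3 line, domestic equipment exceeded 50% for the first time in the first quarter of 2026, with a target of 100%. CXMT's equipment localization rate has passed 45% but is still below that of YMTC's Phase 3 line. The root cause is the process. DRAM capacitor etch and lithography precision depend on advanced DUV far more than 3D NAND does. NAND can stack more layers to get around the lithography bottleneck; DRAM has to confront it head-on. This is a structural difference and has nothing to do with willingness.

External constraints are tightening. On April 22 the US House Foreign Affairs Committee voted to advance the MATCH Act, which Reuters described as the largest legislative review of semiconductor export controls in the history of Congress. The bill lists CXMT, SMIC, YMTC, Huawei, and Hua Hong together as covered facilities, bans the export of ASML's DUV immersion lithography tools to CXMT, bars allies from servicing installed equipment, and requires the Netherlands and Japan to align with US rules within 150 days. The main driver is Micron. If it passes, the NXT:1980Di, the last gray channel, is closed off. CXMT itself is not yet on the BIS Entity List, but its equipment imports have in fact been caged since the October 2022 round of restrictions on DRAM processes below 18 nm.

A 45% localization rate means CXMT is walking on two legs: overseas equipment protects yield, and domestic equipment protects supply chain security. Pressure from the MATCH Act is, if anything, accelerating this process. Every additional percentage point that CXMT shifts to domestic equipment translates into real incremental orders for NAURA, AMEC, Piotech, Hwatsing, and ACM. With the 200,000 wafers of incremental capacity from the two phases of the new Shanghai fab, plus continued technology upgrades on existing lines, domestic equipment makers face a highly certain demand window over the next two to three years.

On a longer timescale, CXMT's expansion is more than one company's capital spending plan. Each process node it gets through gives the whole domestic 12-inch equipment platform one more chance to be validated and reused. Etch and CMP have already proven this path, thin-film deposition and cleaning are following, and metrology and coat/develop are the next hard nuts to crack. Once this validation loop starts turning, the competitiveness of domestic equipment will accumulate at a far faster than linear pace.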
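The capacity and funding figures quoted in the post above can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, every number is copied from the post (wafer capacity in units of 10,000 wafers per month, money in units of 100 million yuan), and nothing here is independent data.

```python
# All figures are copied from the post; the names are illustrative only.
# Wafer capacity is in units of 10,000 wafers per month.
fabs = {"Hefei fab 1": 11, "Hefei fab 2": 8, "Beijing fab": 7}
listed_total = sum(fabs.values())  # sum of the three per-fab figures

stated_total = 30           # the "close to 300,000" headline figure
hbm3_target = 6             # HBM3 front-end target
hbm3_share = hbm3_target / stated_total

shanghai_phases = [10, 10]  # Phase 1 and Phase 2 of the new Shanghai fab
long_term_total = stated_total + sum(shanghai_phases)

# IPO use of proceeds, in units of 100 million yuan.
ipo_uses = {
    "DRAM technology upgrades": 130,
    "volume production line upgrades": 75,
    "forward-looking technology R&D": 90,
}
ipo_total = sum(ipo_uses.values())

print(f"per-fab figures sum to {listed_total * 10_000:,} wafers/month")
print(f"HBM3 share of stated capacity: {hbm3_share:.0%}")
print(f"long-term capacity: {long_term_total * 10_000:,} wafers/month")
print(f"IPO uses sum to {ipo_total / 10:.1f} billion yuan")
```

The three per-fab figures sum to 260,000 wafers per month, somewhat below the "nearly 300,000" headline; the post does not say where the difference comes from. The 20% HBM3 share, the 500,000-wafer long-term figure, and the 29.5 billion yuan IPO total all agree with the text.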

184

500

948K

LJYtrader retweeted

Lun Wang

@lunwang1996

about 1 month ago

I’ve left Google DeepMind after an amazing chapter. I’m incredibly grateful for the people I worked with, the things we built, and the lessons I learned from taking frontier AI research into production. DeepMind shaped how I think about research, product, evaluation, and what it takes to build AI systems at real scale. As I wrap up this chapter, I wrote down something I’ve been thinking about a lot: evals. We’re good at evaluating the models we have. We’re much worse at evaluating the models we’re about to build — especially if they cross into a new capability regime. We will have self-evolving models, but before that, we need self-evolving evaluations. https://t.co/F1lUWxDG2D

199

616K

LJYtrader retweeted

Nathan Lambert

@natolambert

about 1 month ago

Visiting most of the leading Chinese AI labs, I'm struck by a culture that's extremely well suited to building LLMs with fewer resources, but one happening in a very different ecosystem, more companies at play, almost no data industry, etc. Full report: https://t.co/ibmtMWnfTc

227

679K

LJYtrader retweeted

Andrej Karpathy

@karpathy

about 2 months ago

Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights:

The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons:

1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image and an LLM can natively do the thing.
2. install .md skills instead of install .sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM". The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc.
3. LLM knowledge bases as an example of something that was *impossible* with classical code because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc.

I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1,2), or was fundamentally not possible before (3).

The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base *and* 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain, here I expand on this as having to also do with economics because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying or you're off-roading in the jungle with a machete, in relative terms. Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to...

Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors.

373

810

LJYtrader retweeted

Deedy

@deedydas

about 2 months ago

What do the smartest kids in the world do when they grow up?

I did the largest study of ~18,000 International Olympiad medalists (IMO, IOI and IPhO) over the last 25yrs, arguably the sharpest analytical minds of the world in high school, to see where they ended up and traced ~50% of them.

Founders of ~20 unicorns and ~7 decacorns and ~10 billionaires: OpenAI, Cursor, Stripe, Databricks, Perplexity, Ethereum, Cognition, Hyperliquid, Fireworks, Modal, Quora, Parallel, Cartesia, Wispr.

Most kids went to MIT, a whopping 12% of them, followed by Cambridge (7%) and Sharif (3%)!

The career paths they chose (of those who graduated) were:
- 36% Academia (professors)
- 26% Other
- 22% in Software / Tech
- 12% in Quant / Finance
- 5% Founders!

The biggest employer was Google, by far, at 6%. Other interesting tidbits were:
- 47 of them work at Jane Street (#3)
- 38 at OpenAI (#5)
- 15 at Anthropic
- 8 at Cognition
- 6 at Isomorphic Labs

Olympiaders were 1500x more likely to be billionaires and 4000x more likely to be unicorn founders than the average person!

246

913

LJYtrader retweeted

Andrej Karpathy

@karpathy

6 months ago

https://t.co/Lb6T42n5jl

364

16K

18K

LJYtrader retweeted

Jathushan Rajasegaran

@jathushan

6 months ago

If you're ever feeling down, watch this! Much better than therapy!

356

179

38K

AGI for Fun @LJYtrader

8 months ago

@laobaishare @grok fact check

14K

LJYtrader retweeted

The AI Investor

@The_AI_Investor

8 months ago

Goldman Sachs - AI unit economics per 1m tokens $NVDA PT of $210

219

141

41K

LJYtrader retweeted

Adam.GPT

@TheRealAdamG

9 months ago

https://t.co/JnhJ1Ll8eC “How people are using ChatGPT: Largest study to date of consumer ChatGPT usage shows demographic gaps shrinking, economic value being created through both personal and professional use.”

413

223

117K

LJYtrader retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

11 months ago

Diffusion Beats Autoregressive in Data-Constrained Settings

Comparison of diffusion and autoregressive language models from 7M to 2.5B params and up to 80B training tokens.

Key findings:

1. Diffusion models surpass autoregressive models given sufficient compute. Across a wide range of unique token budgets, AR models initially outperform diffusion models at low compute, but quickly saturate. Beyond a critical compute threshold, diffusion models continue improving and ultimately achieve better performance.

2. Diffusion models benefit far more from repeated data. While AR models can effectively use repeated data for up to 4 epochs, diffusion models can be trained on repeated data for up to 100 epochs with repeated data almost as effective as fresh data.

3. Diffusion models have a much higher effective epoch count. We find for diffusion models compared to for AR models, suggesting diffusion models can benefit from repeated data over far more epochs without major degradation.

4. Critical compute point follows a power law with dataset size. We derive a closed-form expression that predicts when diffusion becomes the favorable modeling choice for any given dataset size.

5. Diffusion models yield better downstream performance. The validation loss improvements translate to consistent gains across diverse downstream language tasks.

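Finding 4 says the critical compute point follows a power law in dataset size. The tweet does not give the closed-form expression or its constants, so the following is only a sketch of what such a relation looks like and how one could fit it. The form `C_crit(U) = a * U**b`, the function names, and the synthetic data points are all assumptions made for illustration, not values from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b as a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


def critical_compute(unique_tokens, a, b):
    """Hypothetical critical compute for a dataset of `unique_tokens` tokens."""
    return a * unique_tokens ** b


# Synthetic, made-up points generated from a = 3.0, b = 2.0 (NOT from the paper).
true_a, true_b = 3.0, 2.0
dataset_sizes = [1e7, 1e8, 1e9, 1e10]
crit_points = [true_a * u ** true_b for u in dataset_sizes]

a, b = fit_power_law(dataset_sizes, crit_points)
print(f"fitted a={a:.3f}, b={b:.3f}")
```

With real measurements, the fitted `a` and `b` would let one estimate, for a given number of unique tokens, the compute budget beyond which diffusion is expected to overtake autoregressive training.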

671

113

477

48K

AGI for Fun @LJYtrader

11 months ago

@andrewgwils I was most interested in the posters, but few realized I was a VC; most thought I was another academic. When I revealed that I was a VC, some turned away, almost as if to say "you are not going to be reviewing my paper, so I care less."

LJYtrader retweeted

Yi Ma

@YiMaTweets

11 months ago

I believe all professors in the field of AI and machine learning at top universities need to face a soul-searching question: What can you still teach your top (graduate) students about AI that they cannot learn by themselves or elsewhere? It had bothered me for quite some years before I finally decided to face it the hard way a couple of years ago.

108

489

148K

LJYtrader retweeted

elvis

@omarsar0

12 months ago

AI for Scientific Search

AI for Science is where I spend most of my time exploring with AI agents.

This 120+ pages report does a good job of highlighting why all the big names like OpenAI and Google DeepMind are pursuing AI4Science.

Bookmark it!

My notes below: https://t.co/z2gRcVbnV4

679

146

854

62K

LJYtrader retweeted

Manling Li

@ManlingLi_

about 1 year ago

What is the key to agent decision making? Is there a decision-making boundary? I am always thinking about the potential boundary of correct decision making and the uncertainty of this boundary. The alignment of the decision-making boundary and the tool-use boundary, led by @WangCarrey @qiancheng1231, is a nice way to understand how agent abilities emerge. If you are interested, you are welcome to talk more with us!

LJYtrader retweeted

Olivia Moore

@omooretweets

about 1 year ago

🚨 Data drop! Our team @a16z published benchmarks on revenue growth for AI startups, from our proprietary dataset. The median B2B co is going $0 -> $2.1M ARR in year 1, while the median B2C co is going $0 -> $4.2M (yes, consumer startups are growing revenue faster 🤯)