Ted Li

@FallMonkey

Sampling latent alien Shoggoth; previously @character_ai, @Roblox, all my tweets are hallucinated.

California

Joined July 2009

361 Following

597 Followers

1K Posts

Ted Li @FallMonkey

30 days ago

@corsaren Yeah, few people on X realized #2 - this kind of async multi-stream "think + say + tool" has been hacked together in various ways across a few consumer voice products. Still nice to see it coming from a model company, which must mean some advancement at the model layer.

0

1

0

0

365

Ted Li @FallMonkey

30 days ago

@nrehiew_ Great read - been wondering why some models are much better at long-horizon practical SWE tasks, i.e. whether it's due to a teacher trained on diverse, high-quality human data or to a crazy amount of compute in the stages around OPD. Seems both eventually point to the importance of on-policy data!

0

0

0

0

256

Ted Li @FallMonkey

about 1 month ago

@dylan_works_ Consolidation itself is okay imo, but the current approach is definitely far from effective, not to mention those "strategic forgetting" approaches, which give a false impression of mirroring the human brain. Maybe the key is to simulate the ICL process to extract the real "experience".

0

1

0

1

226

Ted Li @FallMonkey

about 1 month ago

@eliebakouch Noticed some pretty wild hallucinations today when analyzing code samples, and noticed that it's doing its thinking/backtracking directly in the output (not in thinking). Wondering if that has to do with the regression here.

1

1

0

0

185

Ted Li @FallMonkey

about 1 month ago

@teortaxesTex So input:cached:output for v4f is 1:1/50:2 while v4p is 1:1/120:2. That unique ratio is really showing the scenario they want to utility-maxx for, or the nature of their underlying attention optimizations.

0

0

0

0

668
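A rough sketch of what the ratios in the post above imply, assuming they are per-token prices expressed relative to each model's own uncached input price. The function name, token counts, and cache hit rate are hypothetical and chosen only for illustration; they are not taken from any published price list.

```python
def relative_cost(input_tokens, output_tokens, cache_hit_rate, cached_discount):
    """Request cost in units of one uncached input token's price.

    Assumes the ratio input : cached : output = 1 : 1/cached_discount : 2,
    as quoted in the post (cached_discount = 50 for v4f, 120 for v4p).
    """
    cached = input_tokens * cache_hit_rate   # tokens served from cache
    fresh = input_tokens - cached            # tokens billed at full input price
    return fresh * 1.0 + cached / cached_discount + output_tokens * 2.0


# Hypothetical long-context request: 100k input tokens, 1k output, 90% cache hits.
v4f = relative_cost(100_000, 1_000, cache_hit_rate=0.9, cached_discount=50)
v4p = relative_cost(100_000, 1_000, cache_hit_rate=0.9, cached_discount=120)
print(f"v4f: {v4f:.0f} units, v4p: {v4p:.0f} units")
```

Under these assumptions the cached term is a small fraction of the bill, so the cost is dominated by cache misses and output tokens, which fits the post's reading that the ratio favors long, heavily reused contexts. The two results are each in their own model's units and are not directly comparable in absolute price.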

Ted Li @FallMonkey

about 2 months ago

@badlogicgames @deepseek_ai yeah several already raised this. I've forwarded this feedback to their deployment team again so hope they'll fix it quickly

1

3

0

0

1K

Ted Li @FallMonkey

about 2 months ago

@basedjensen When your definition of AGI is letting 1 billion users talk freely with their AI friend who can remember 1 million context. Terrifying execution and focus.

0

6

0

0

834

Ted Li @FallMonkey

about 2 months ago

@teortaxesTex It's a model for the people, sir

0

1

0

0

340

Ted Li @FallMonkey

about 2 months ago

@MParakhin Genuine question - why frame the 35% as a benchmark issue rather than a data-pipeline one? Reverse-engineering from completed workflows structurally can't produce edit trajectories (no starting state) or Q&A (no workflow artifact). Feels like the gap was baked in upstream already

1

2

0

1

301

Ted Li @FallMonkey

about 2 months ago

@yetone Looking forward to it. Every hand-built core loop is a distillation of wisdom and experience.

1

0

0

0

252

Ted Li @FallMonkey

about 2 months ago

@michaelyli__ Softmax renormalizes over survivors, so eviction shifts mass onto the remaining keys regardless of their true relevance. Your scoring avoids evicting high-attention blocks, but doesn't bound this drift. Is the cliff at high compression (fig 3r/8) due to mass redistribution accumulating?

0

3

0

0

518
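A minimal sketch of the renormalization effect described in the post above, using made-up attention logits for a single query. The function names and the numbers are hypothetical; this is not the scoring method from the thread being replied to, only the general softmax property.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weights_after_eviction(scores, keep):
    """Attention weights when only the keys at indices `keep` survive."""
    return softmax([scores[i] for i in keep])


scores = [2.0, 1.0, 0.0, -1.0]   # hypothetical logits for four keys
full = softmax(scores)

keep = [0, 1, 3]                 # evict key 2
pruned = weights_after_eviction(scores, keep)

# Every survivor is scaled by the same factor 1 / (1 - evicted mass),
# so the evicted key's weight is redistributed in proportion to each
# survivor's existing weight, not according to any notion of relevance.
evicted_mass = full[2]
scale = 1.0 / (1.0 - evicted_mass)
for j, i in enumerate(keep):
    print(f"key {i}: {full[i]:.4f} -> {pruned[j]:.4f} (full * {scale:.4f})")
```

Evicting a single low-weight key changes little, but the scale factor grows as more mass is evicted, which is one way the drift could accumulate at high compression.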

Ted Li @FallMonkey

about 2 months ago

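@arkuy99 It's not hidden. 4.7 needs a new API parameter to enable the thinking summary, which cc doesn't support "for now", and the new cc also broke auto-expansion of the thinking summary (https://t.co/DgPsyWwqr6).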

0

1

0

2

1K

Ted Li @FallMonkey

about 2 months ago

@giffmana @sharifshameem Assuming you're already using the "pragmatic" personality in the personalization settings of the Codex app?

1

0

0

0

136

Ted Li @FallMonkey

2 months ago

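@CatChen The terminology really is quite messy (your article, the actual API docs, and the official announcement each use "harness" a bit differently). At the conceptual level, the harness is that isolated agent loop; at the application level, an agent is effectively a subclass of a harness with tools attached; in the announcement, "harness" seems to cover everything. Personally I don't much like letting harness cover everything, since it easily leads to all kinds of misunderstandings.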

1

0

0

0

31

Ted Li @FallMonkey

2 months ago

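@CatChen https://t.co/qQvqOT7KnD Going by the original text, what you create with a prompt and tools is an agent, while the harness is already an isolated component (your original article says the same). Strongly agree with the latter point: if you've already built all the key functionality, the migration cost really is high, since ant's offering still has a lot of black-box parts for now.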

1

0

0

0

53

Ted Li @FallMonkey

2 months ago

@teortaxesTex yeah this one is confusing to read, e.g. 1) Mythos is indeed memorizing more problems for a marginal improvement 2) sweb-v and sweb-p have very different leakage shapes. Strange to conclude that "memorization does not explain improvements"

0

1

0

0

158

Ted Li @FallMonkey

2 months ago

@DZhang50 But compared models are getting more efficient too. So one could only claim what’s in the chart, up to k2.

0

0

0

0

78

Ted Li @FallMonkey

2 months ago

@DZhang50 Man can only claim what chart shows.

1

0

0

0

101
