Feng Peng @Feng - Twitter Profile

Feng Peng

@feng

1 day ago

Damn, how can Gemini hallucinate on such a simple question?

Feng Peng

@feng

1 day ago

Totally agree that Opus 4.5 will be the cutoff. When there is an OSS model that can function as well as Opus 4.5, we will see a lot of local instances set up.

Mitchell Hashimoto

@mitchellh

1 day ago

We've gone really quickly from "local models are dogshit" to "local models are good actually" (like, a 12 month window from A to B). I don't think they're actually good ENOUGH yet. We need an Opus 4.5 quality local model. When that happens, I think the world will spill over. Opus 4.5 is/was amazing, and is more than good enough for almost all tasks still as long as you pair with a frontier-level planner/judge. It'll still require a hugely expensive machine to run it, I'm sure, like a $5K or more laptop or mac studio. But, that's going to be pennies compared to the API costs plus all the benefits of guaranteed privacy and so on.

Feng Peng

@feng

1 day ago

It is a practical world. Nothing else matters if your core competency is strong enough, because it is easy to catch up on everything else.

https://t.co/X8q7GvOG6w

feng retweeted

Matei Zaharia @matei_zaharia

5 days ago

Really excited to open source a new project: Omnigent, a meta-harness for AI agents. It lets you build multi-agent coding and custom agents, sitting above Claude Code, Codex, Pi, and agent SDKs to let you compose them. It also adds live collaboration and rich control policies.

https://t.co/jwFmH8nHsZ

Feng Peng

@feng

5 days ago

The harness of harnesses is here (from Databricks). Things are moving fast: https://t.co/Lzy3QcJYuZ

Feng Peng

@feng

15 days ago

Microsoft is releasing a coding model: MAI-Code-1-Flash is an inference-efficient agentic coding model. This model is tailor-made for and deeply integrated into GitHub Copilot, VS Code and the Microsoft stack, and, with 5 billion parameters, is comparable to Haiku but cheaper. (https://t.co/S8xF5l6wea) Why should we be bound to a single harness? I think we need a coordinator to run multi-harness with auto-configurable backend models; let's call it harness-of-harness engineering, or harnessness.

Feng Peng

@feng

13 days ago

Coding agents are good at static stuff that can be verified via syntax check, but very bad at understanding complex runtime semantics (data or system components). I don't see this problem being solved by LLMs themselves, but a good harness should be able to help.

Moritz Wallawitsch

@MoritzW42

13 days ago

holy shit - their api is leaking customer data

Feng Peng

@feng

15 days ago

Microsoft is releasing a coding model: MAI-Code-1-Flash is an inference-efficient agentic coding model. This model is tailor-made for and deeply integrated into GitHub Copilot, VS Code and the Microsoft stack, and, with 5 billion parameters, is comparable to Haiku but cheaper. (https://t.co/S8xF5l6wea) Why should we be bound to a single harness? I think we need a coordinator to run multi-harness with auto-configurable backend models; let's call it harness-of-harness engineering, or harnessness.

feng retweeted

Derek Thompson

@DKThomp

18 days ago

This has quietly been a miracle month in medicine. In the last 5 weeks we’ve got news on:

- retatrutide, the triple agonist GLP-1 from Lilly, basically melting fat and body-wide inflammation at record levels
- RevMed’s new pancreatic cancer drug showing unprecedented abilities to extend life
- small trial of a one-and-done PCSK9 gene editing therapy for slashing LDL cholesterol
- Mayo’s AI-assisted radiology showing vastly improved cancer detection
- this new therapy for metastatic solid tumors

This stuff is at varying levels of evidence. Retatrutide is ~100% on its way, other stuff needs more clinical trial data. But put it together and we’re maybe on the verge of majorly reducing the mortality of heart disease and cancer, the two leading causes of death in America.

Feng Peng

@feng

16 days ago

For most simple application-layer programs, I don't see any need for any other languages at all now. TypeScript will be almost the only one used very soon. Never saw this one coming. For the backend, C -> C++ -> Java -> Ruby -> Node(JS) -> Java -> Go -> Python -> now back to Node(TS), lol.

Feng Peng

@feng

20 days ago

Showing off tokens used is basically the same as showing off LOC, which is just very junior.

Gergely Orosz

@GergelyOrosz

20 days ago

I can now probably say this: Two months ago, inside Anthropic someone suggested building a token leaderboard. A heated internal debate followed and the decision was made to *never* ever do it… because several people inside Anthropic simply thought ahead of the consequences

Feng Peng

@feng

20 days ago

@ziad_makes Nah, you can try it yourself. It is pretty easily reproducible.

Feng Peng

@feng

21 days ago

LOL, when you ask Claude which model it is through the API, its answer is "Qwen" when the question is in Chinese and "Claude" when the question is in English. LLMs are definitely Bayesian; anyone saying they are intelligent is just wrong.

https://t.co/fJmYpMzg6X

Feng Peng

@feng

20 days ago

Opus 4.8 has been pretty impressive, solving quite a few tasks that 4.7 and GPT-5.5 ran in circles on. But apparently it can't find its own bugs. Claude Code has been quite buggy recently, but this one is especially annoying since it is not recoverable, meaning the tokens used before the bug are wasted. Anyone who has worked on agent loops should easily see where the bugs are coming from. Agent-assisted coding still has a long way to go for sure.

Feng Peng

@feng

20 days ago

Damn, hard things are hard. Great that everyone is safe.

Chris Hadfield

@Cmdr_Hadfield

20 days ago

Very glad everyone is safe. An extremely bad night for everyone @blueorigin, and those counting on them. My heart goes out to all. Hopefully the cause of the explosion can be found swiftly, and the launchpad rebuilt on an accelerated timeline.

Feng Peng

@feng

28 days ago

The boring stuff: security, response time, uptime, and fault tolerance are actually important.

GitHub

@github

29 days ago

We are investigating unauthorized access to GitHub’s internal repositories. While we currently have no evidence of impact to customer information stored outside of GitHub’s internal repositories (such as our customers’ enterprises, organizations, and repositories), we are closely monitoring our infrastructure for follow-on activity.

Feng Peng

@feng

about 1 month ago

@_brian_johnson Definitely! I hope the OSS models can get good enough soon.

Feng Peng

@feng

about 1 month ago

Claude Code somehow reset my weekly quota this morning, but did not change the refresh date. So much money wasted; I tried my best, lol😭

https://t.co/8eLvfGa8zs

Feng Peng

@feng

about 1 month ago

Thankfully, Codex is at least as good as Claude Code these days. Can't say for sure though; their quality does fluctuate quite a bit.

Feng Peng

@feng

about 1 month ago

I am not sure why anyone would want to limit themselves to a single coding agent atm. The coding tool war is still very early; I bet Claude Code will have some promotions very soon. Also, Copilot and Gemini are getting better, judging from my occasional use of them. They don't have any big promotions because they are not good enough right now.

Sam Altman

@sama

about 1 month ago

codex is the best AI coding product and we want to make it easy to try. for the next 30 days, we are giving companies that want to try switching over two months of free codex usage.
