@DeRonin_ If the app can easily make 3k per month, then why would the owner sell it for 3k? That valuation implies a P/E ratio of roughly 0.08, which is ridiculously cheap.
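A quick sanity check of that ratio, as a rough sketch. It assumes the 3k per month is all profit and the asking price is 3k; the function name is made up for illustration.

```python
def price_to_earnings(price: float, monthly_profit: float) -> float:
    """Asking price divided by annualized profit (assumes profit is steady)."""
    annual_profit = monthly_profit * 12
    return price / annual_profit


# Assumed inputs: 3k asking price, 3k profit per month.
ratio = price_to_earnings(price=3_000, monthly_profit=3_000)
print(f"P/E = {ratio:.3f}")
```

In other words, the buyer would earn the purchase price back in about one month, which is why the offer looks suspicious.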
@AgustinLebron3 @METR_Evals The graph doesn't mean that Claude would run for 14.5 hours, but that humans would take that amount of time to complete the task.
@karpathy @swyx @YorkieBuilds @Grad62304977 @tejalpatwardhan What if people are not fired for safety reasons? That is, we know they are as smart as us, but maybe we are not sure they are as trustworthy as us? Does AGI include the alignment problem?
@VictorTaelin On Cursor, my biggest complaint about GPT-5 is that it is much slower to finish tasks than Claude. It seems that OpenAI didn't focus as much on making it work well on Cursor as Anthropic did. Maybe that is not the case on Codex?
@scaling01 Everyone is holding back. Why be 50% better in the short term with a more expensive model and then lose in the long term because you did not allocate enough compute to training, or to using the more expensive model to teach the next generation?
Drastic progress on maths with Gemini 2.5! As a math undergrad, I am impressed 🤯
🥈 -> 🥇 ✅
Formal -> Informal ✅
Specialized model -> General model ✅
Available soon ✅
Huge thanks to IMO and congrats to all participants!
Blog: https://t.co/cW9pvFlZtN
As always, Americans will be surprised when a DeepSeek moment happens for ASML tech, but they shouldn’t be. I don’t think any technology is so advanced that a well-funded, motivated group of smart individuals couldn’t eventually replicate it. China checks all those boxes.
@deedydas If an engineer takes 12 minutes on average to review a commit, then it will take 10 hours to review 50. If engineers on the team are taking code review seriously, then 50 commits per engineer per day getting merged seems unrealistic.
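Working through the review-time arithmetic, as a minimal sketch. It assumes reviews are done one after another at a flat 12 minutes each; the function name is hypothetical.

```python
def total_review_hours(commits: int, minutes_per_review: float) -> float:
    """Total hours one reviewer spends, assuming sequential reviews."""
    return commits * minutes_per_review / 60


# Assumed inputs: 50 commits per day, 12 minutes per review.
hours = total_review_hours(commits=50, minutes_per_review=12)
print(f"{hours:.1f} hours of review per day")
```

50 reviews at 12 minutes each is 600 minutes, so 10 hours, which is more than a full working day spent only on review.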
@danshipper What if you ask it, "What is the best next question that I should ask?"
If it can answer any question better than you, then it will also answer the one above better than you, which contradicts the second part of your tweet.
@arjunbhuptani I think humans will still be valuable for auditing AI systems and actions. We cannot and should not fully trust AIs to carry out tasks that take days to finish without some kind of auditing and monitoring. https://t.co/FwzPrNu0xf
Last remaining jobs:
1. Teaching AIs: Labeling data, creating hard questions, evaluating answers, etc.
2. Auditing AIs: We won't let AIs simply take over everything, since we will never fully trust them. This includes people who review the code of deterministic theorem provers used to check AI.