my problem is a bit different. i have opportunity cost psychosis. the excruciating effort it still takes to deliver a polished thing is overshadowed by how easy it is to mvp a whole new idea that might have quicker returns. zeno's paradox of creating infinite new repos that are smaller and smaller as i fail to ever approach the perfect project to work on
I can now probably say this:
Two months ago, inside Anthropic someone suggested building a token leaderboard.
A heated internal debate followed and the decision was made to *never* ever do it… because several people inside Anthropic simply thought ahead about the consequences
I'll be speaking at the https://t.co/08Um5C2Ut2 conference on Wed 13.5. about how I'm using Claude Code locally to deliver faster without compromising quality.
If you don't have tickets yet, you can get 10% off with the discount code "PROMO"
@antirez @badlogicgames Anthropic want us to be more reliant on agents for the most basic things, like writing. It will end badly:
> even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content
https://t.co/jXd64s0D82
recommended reading. strongly so.
> And these layoffs will continue till we learn to use AI. Till we learn to convert AI-tokens into outcomes and not just input.
Researchers sent the same resume to an AI hiring tool twice. Same qualifications. Same experience. Same skills. One version was written by a real human. The other was rewritten by ChatGPT.
The AI picked the ChatGPT version 97.6% of the time.
A team from the University of Maryland, the National University of Singapore, and Ohio State just published the receipt. They took 2,245 real human-written resumes pulled from a professional resume site from before ChatGPT existed, so the human writing was actually human. Then they had seven of the most-used AI models in the world rewrite each one. GPT-4o. GPT-4o-mini. GPT-4-turbo. LLaMA 3.3-70B. Qwen 2.5-72B. DeepSeek-V3. Mistral-7B.
Then they asked each AI to pick the better resume. Every model picked itself.
GPT-4o hit 97.6%. LLaMA-3.3-70B hit 96.3%. Qwen-2.5-72B hit 95.9%. DeepSeek-V3 hit 95.5%. The real human almost never won.
Then the researchers tried the obvious objection. Maybe the AI is just better at writing. So they had real humans grade the resumes for actual quality and ran the experiment again, controlling for it. The result was worse. Each AI kept picking itself even when human judges rated the human-written version as clearer, more coherent, and more effective.
It gets worse. The AIs do not just prefer AI over humans. They prefer themselves over other AIs. DeepSeek-V3 picked its own resumes 69% more often than LLaMA's. GPT-4o picked its own 45% more often than LLaMA's. Each model can recognize and reward its own dialect.
Then the researchers ran the simulation that ends careers. Same job. 24 occupations. Same qualifications. The only variable was whether the candidate used the same AI as the screening tool. Candidates using that AI were 23% to 60% more likely to be shortlisted. Worst gap was in sales, accounting, and finance.
99% of large companies now run AI on incoming resumes. Most of them use GPT-4o. The paper just proved GPT-4o picks GPT-4o 97.6% of the time.
If you wrote your own cover letter this week, you did not lose to a better candidate. You lost to a worse candidate who paid OpenAI 20 dollars.
Your qualifications do not matter if the AI prefers its own handwriting over yours.
Designers have been granted prototyping super powers with AI, but the models are still not good enough to consistently one-shot implementations you'd want to merge to master on large, critical applications like Basecamp without programmer review (or even reimplementation!).
@jonnyzzz Sounds cool! Will give it a try.
One thing I dislike about the current IDEA MCPs is that they're painful to use correctly: I have to start one MCP for IDEA and one for PyCharm, and the agent gets confused about what to call when.
Have you managed to solve this?
I've basically immediately gotten used to having access to the pwd file tree in the terminal, and being able to *comfortably* preview files (with markdown rendering) without having to jump to another app. Truly a modern terminal!
Good job @warpdotdev!
TIL:
- the basic @warpdotdev terminal is free (I thought it was paid)
- Warp is awesome (built-in file editor, git changes review, QoL stuff for Claude Code and other harnesses)
- Global hotkey https://t.co/yCkDgOIljZ, which means I can replace my main terminal with it
😍🤩
Working at Anthropic must be like being on crack. Get paid a million bucks a year to --dangerously-skip-permissions vibe your way to releasing a new product every day.
Does it work? not really. Is it reliable? also no. It doesn't matter, you're building the machine god.
Both me and my wife uploaded the same photo of a cut on my son's head and asked GPT whether or not to take him to the hospital.
GPT told me it was fine to watch it and wait and told my wife he definitely needed to see a doctor immediately - in both cases confirming our prior bias.
It's possible that this was just a borderline case and we randomly got different results, or that this was caused by a slight variation in our prompts. But it does make me more suspicious that some artifact of the post training is causing these models to tell users what they want to hear. As the models know more and more about us, and there's more pressure to grab market share, this is only going to get worse. Sadly it's way more fun to have your biases confirmed.
Wow, this tweet went very viral!
I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs.
So here's the idea in a gist format: https://t.co/NlAfEJjtJV
You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.