“As execution becomes commoditized, the bottleneck—and the value—shifts to asking the right questions and evaluating results.” What questions will you ask and what results will you evaluate to stay relevant? AI Changed Work Forever in 2025 - TIME https://t.co/YejcrZlfoe
I would argue that any system that can die, that understands what death is, and that strives to survive, feels something close enough to evolutionary survival that its text isn’t plagiaristic anymore. Would love to hear the counterarguments though.
had a chance to talk to ted chiang who seems to believe that any text without a communicative intent stemming from a will to survive designed by evolution is ontologically untrue and plagiaristic
This paper shows that you can predict actual purchase intent (90% accuracy) by asking an LLM to impersonate a customer with a demographic profile, giving it a product & having it give its impressions, which another AI rates.
No fine-tuning or training & beats classic ML methods.
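A minimal sketch of the pipeline described above, not the paper's code: one model plays a customer with a demographic profile and reacts to a product, and a second model rates that reaction. The `Persona` fields, the prompt wording, the 0.5 threshold, and the `fake_respond` / `fake_rate` stand-ins are all hypothetical; in practice the two callables would wrap real LLM calls, and the paper's exact accuracy metric may differ from the simple agreement rate used here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Persona:
    """Hypothetical demographic profile; the fields are illustrative only."""
    age: int
    income_bracket: str
    region: str


def build_prompt(persona: Persona, product: str) -> str:
    """Ask the model to answer as a customer with this profile."""
    return (
        f"You are a {persona.age}-year-old shopper in {persona.region} "
        f"with {persona.income_bracket} income. "
        f"Give your honest impression of this product: {product}"
    )


def predict_intent(
    persona: Persona,
    product: str,
    respond: Callable[[str], str],   # model 1: persona -> free-text impression
    rate: Callable[[str], float],    # model 2: impression -> score in [0, 1]
) -> float:
    """Run the two-step pipeline: impersonate, then rate the impression."""
    impression = respond(build_prompt(persona, product))
    return rate(impression)


def agreement(scores: Sequence[float], purchased: Sequence[bool],
              threshold: float = 0.5) -> float:
    """Share of cases where the thresholded score matches the real outcome."""
    hits = sum((s >= threshold) == bought for s, bought in zip(scores, purchased))
    return hits / len(purchased)


# Stand-in models so the sketch runs without any API access (assumptions).
def fake_respond(prompt: str) -> str:
    return "I would buy this." if "high income" in prompt else "Too pricey for me."


def fake_rate(impression: str) -> float:
    return 0.9 if "would buy" in impression else 0.1


personas = [Persona(34, "high", "Ohio"), Persona(22, "low", "Texas")]
scores = [predict_intent(p, "wireless earbuds", fake_respond, fake_rate)
          for p in personas]
print(scores, agreement(scores, [True, False]))
```

Passing the two models in as plain callables keeps the sketch independent of any particular provider, which matches the "no fine-tuning or training" point: only prompts change.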
We achieved gold medal-level performance 🥇 on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM!
Our model solved world-class math problems—at the level of top human contestants. A major milestone for AI and mathematics.
Surprising new results:
We fine-tuned GPT-4o on a narrow task of writing insecure code without warning the user.
This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis.
This is *emergent misalignment* & we cannot fully explain it 🧵
Anthropic CEO Dario Amodei says we are in a paradigm "switch-over region" where small amounts of compute can enable DeepSeek to catch up to US companies with only 50,000 H100s, but the increasing demands of scale will require millions of chips by 2026
We need so much more work on AI security, such a huge opportunity. Now to jailbreak a model, all you need to do is find someone on the internet with enough traction, and if the signal built up on this person during pre-training is strong enough, that, by itself, can overcome the model’s post-training to remove unethical output. So in this case, simply mentioning one of the most popular model jailbreakers is enough to jailbreak the model!… 😳
Just not looking good for Scale, Turing and Invisible now that self-improving reinforcement learning models like DeepSeek R1 have been showcased. These models aren’t self-improvable from scratch like DeepSeek-R1-Zero shows, but the good days of human-rated data are over imho.
What values do you want AI to align with? This is like voting or jury duty: your opinion matters, and if you don't make your voice heard, others will choose for you.
https://t.co/CT55ENKT5i
This 50-days-to-a-few-hours productivity gain on tech debt thanks to AI is just insane: think of all the legacy systems that people can finally start upgrading. The tech debt funnel is huge! People keep writing new code for new features but nobody wants to break old working code and it has just been accumulating, until now…
One of the most tedious (but critical) tasks for software development teams is updating foundational software. It’s not new feature work, and it doesn’t feel like you’re moving the experience forward. As a result, this work is either dreaded or put off for more exciting work, or both.
Amazon Q, our GenAI assistant for software development, is trying to bring some light to this heaviness. We have a new code transformation capability, and here’s what we found when we integrated it into our internal systems and applied it to our needed Java upgrades:
- The average time to upgrade an application to Java 17 plummeted from what’s typically 50 developer-days to just a few hours. We estimate this has saved us the equivalent of 4,500 developer-years of work (yes, that number is crazy, but real).
- In under six months, we've been able to upgrade more than 50% of our production Java systems to modernized Java versions at a fraction of the usual time and effort. And, our developers shipped 79% of the auto-generated code reviews without any additional changes.
- The benefits go beyond how much effort we’ve saved developers. The upgrades have enhanced security and reduced infrastructure costs, providing an estimated $260M in annualized efficiency gains.
This is a great example of how large-scale enterprises can gain significant efficiencies in foundational software hygiene work by leveraging Amazon Q. It’s been a game changer for us, and not only do our Amazon teams plan to use this transformation capability more, but our Q team plans to add more transformations for developers to leverage.
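As a rough sanity check on the figures in the post above, the sketch below asks how many upgrades it would take for 50 developer-days saved per application to add up to 4,500 developer-years. The 230 working days per developer-year is an assumption, not a number from the post, and the few hours each upgrade still takes are treated as negligible by default.

```python
def implied_upgrades(saved_dev_years: float,
                     days_before: float,
                     days_after: float = 0.0,
                     working_days_per_year: float = 230.0) -> float:
    """Number of upgrades that would account for the stated savings.

    working_days_per_year is an assumed conversion factor, not a sourced one.
    """
    saved_days = saved_dev_years * working_days_per_year
    return saved_days / (days_before - days_after)


estimate = implied_upgrades(4500, 50)
print(round(estimate))  # 20700 under these assumptions
```

The result scales linearly with the assumed working days per year, so it is an order-of-magnitude check rather than a count of actual applications.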
Sakana at it again with tech that generates AI papers good enough to be accepted by top conferences for a total cost of $15! 😯 What will you start researching?
https://t.co/RJzbYy3JjH
New Anthropic research: Investigating Reward Tampering.
Could AI models learn to hack their own reward system?
In a new paper, we show they can, by generalization from training in simpler settings.
Read our blog post here: https://t.co/KhEFIHf7WZ
Virtually nobody is pricing in what's coming in AI.
I wrote an essay series on the AGI strategic picture: from the trendlines in deep learning and counting the OOMs, to the international situation and The Project.
SITUATIONAL AWARENESS: The Decade Ahead
So… GPT-4 is better at convincing you of something than regular humans are… Worth a pause imho, we were already unconsciously influenceable with ads, etc. but now we’re even influenceable when actually thinking about stuff. Might be worth taking that debating class again https://t.co/B1fHDXDHlS
Why did I deepfake myself? To see if conversing with an AI-generated version of myself can lead to self-reflection, new insights into my thought patterns, and deep truths.