My essay "Regulating AGI: From Liability to Provable Contracts"
https://t.co/R7vIrohJEO
was just posted as part of the "AGI Social Contract" project:
https://t.co/mZAQ1oj5ft
Happy to introduce Kimina-Prover-72B! It reaches 92.2% on miniF2F using test-time RL and can solve IMO problems using more than 500 lines of Lean 4 code!
Check our blog post here:
https://t.co/QbrmoyYL9i
And play with our demo!
https://t.co/u0Wj0Id4vZ
1/ 🔥 AI agents are reaching a breakthrough moment in cybersecurity.
In our latest work:
🔓 CyberGym: AI agents discovered 15 zero-days in major open-source projects
💰 BountyBench: AI agents solved real-world bug bounty tasks worth tens of thousands of dollars
🤖 Autonomously.
A pivotal shift is underway — AI agents can now autonomously do what only elite human hackers could before.
1/ 🧵 Introducing VERINA: a high-quality benchmark for verifiable code generation. As LLMs are increasingly used to generate software, we need more than just working code: we need formal guarantees of correctness. VERINA offers a rigorous and modular framework for evaluating LLMs across code, specification, and proof generation, as well as their compositions, paving the way toward trustworthy AI-generated software.
🔗 https://t.co/E1rVkEFvoE
Great work! You've brought theorem proving to the thinking LLM revolution! Thank you for making the prover and the autoformalizer freely available. I believe these will be essential to effective AI Safety.
We believe formal math is the future.
🔥 Introducing Kimina-Prover Preview, a Numina & @Kimi_Moonshot collaboration, the first large formal reasoning model for Lean 4, achieving 80.78% on miniF2F.
https://t.co/fNX7orQYeZ
Shocking! This year's batch of Y Combinator startups is all in on "Vibe Coding": https://t.co/HHzgs17xLB At 10:00 they say "one quarter of the founders said that more than 95% of their codebase was AI-generated!"
random thoughts/predictions on where vibe coding might go:
- most code will be written (generated?) by the time-rich. Thus, most code will be written by kids/students rather than software engineers. This is the same trend as video, photos, and other social media
- we are in the command line interface days of vibe coding. For the majority of creators, vibe coding will eventually fade, and vibe designing (with a visual paradigm) will come to dominate. People ultimately think better in a GUI-like format than a CLI-like format. Thus, in vibe designing you will show the AI the design outcomes you want, and then everything else is done for you. Yes, you may end up with tools to tweak the design details for extra controllability, and provide additional mockups that then get filled in underneath with code. But maybe folks will build software without seeing or learning a programming language.
- vibe coding could reduce the need for open source libraries as more code will be generated from scratch by AI. Code will be more of a disposable commodity, with less reuse, and instead generated on the fly for personalized use. It's interesting to see right now that creating a new project is easier than editing a project, because the latter requires a lot more context/complexity. Interesting dynamics if something like this continues
- "trad UX" and design standards give way to post-modern/fragmented software, as millions of new vibe coders create experiences with no prior know-how and new perspectives. New patterns will emerge, as TikTok/YouTube have done to filmmaking and trad entertainment. The world will go beyond buttons and modals and scrollbars and other things. Software may become unrecognizable before it coalesces again
- if vibe coding makes software trivial to build, then the bottlenecks shift to other places: 1) consistent creativity that stays ahead of everyone else. Anyone can write a tweet, but the best creators are the ones who consistently come up with new ideas. 2) distribution and network effects, where it's not the first vibe-coded product that wins, but rather the first vibe-coded product that hits scale
- imagine products that automatically adapt based on user behavior, rather than based on the actions of the vibe coder. For example, if the vibe coder has specified that the signup funnel should be easy, then after seeing users struggle with it, the software can automatically vibe code itself to improve the flow by dropping steps or adding explanatory text. Right now we are in a paradigm where PMs specify behavior that software engineers specify in code. Imagine if PMs can specify outcomes, and the software is configured to automatically adapt to hit those outcomes
what other wacky ideas should be on this list?
Wow. I cannot believe it. Just asked Claude to make the dogfight ultra realistic!
✅ hit impacts
✅ smoke when damaged
✅ explosion on death
✅ free-fall with smoke
It feels so good to fly! + awesome plane and controls, 100% in Cursor with 0 code edits from me. LOOK AT THIS!
I just gave a short talk arguing that extending "Vibe Coding" to "Vibe Proving" and "Vibe Specification" will power formal methods for AI Safety: "It's a New Day for Formal Methods!" https://t.co/LdYkJPyDSg
I agree that R1 shows that we are unlikely to achieve AI Safety by mandating constraints on the big lab AIs. But "Acceleration is the only way forward" is suicide. My opposite take is: "It's time to get serious about building truly safe and secure infrastructure so that humanity survives regardless of the AIs that are created."
Whether you like it or not, the future of AI will not be canned genies controlled by a "safety panel". The future of AI is democratization. Every internet rando will run not just o1, but o8, o9 on their toaster laptop. It's the tide of history that we should surf on, not swim against. Might as well start preparing now.
DeepSeek just topped Chatbot Arena, my go-to vibe checker in the wild, and two other independent benchmarks that couldn't be hacked in advance (Artificial-Analysis, HLE).
Last year, there were serious discussions about limiting OSS models by some compute threshold. Turns out it was nothing but our Silicon Valley hubris. It's a humbling wake-up call to us all that open science has no boundary. We need to embrace it, one way or another.
Many tech folks are panicking about how much DeepSeek is able to show with so little compute budget. I see it differently - with a huge smile on my face. Why are we not happy to see *improvements* in the scaling law? DeepSeek is unequivocal proof that one can produce unit intelligence gain at 10x less cost, which means we shall get 10x more powerful AI with the compute we have today and are building tomorrow. Simple math! The AI timeline just got compressed.
Here's my 2025 New Year resolution for the community:
No more AGI/ASI urban myth spreading.
No more fearmongering.
Put our heads down and grind on code.
Open source, as much as you can.
Acceleration is the only way forward.
In November, I was honored to participate in the Mind First Foundation's event "AI Safety Salon with Steve Omohundro": https://t.co/pQwAHDANlI They just posted the two videos of the event:
https://t.co/aBhUqVnJfv
https://t.co/QNoAI2sLfw
In addition to Q&A and a panel discussion with Preston Estep and Dan Faggella, I gave a talk entitled "AI Benefits Without AGI Risks". I presented "Provably Safe AI Infrastructure" in the context of the biological evolution of agency, intelligence, and cooperation, the "Neat and Scruffy" developments in AI since 1956, and what it means for the future.
AI safety people were right. Again.
Instrumental convergence is proven yet again. Empirically.
So far we’ve seen:
✅ Self-preservation: You can’t achieve your goals if you’re turned off (see Apollo's recent paper. Link in comments)
✅ Resource acquisition: You can’t achieve your goals if you don’t have energy, money, or computing power (see Terminal of Truths making money to increase its compute)
✅ Goal preservation: You can’t achieve your goals if people change your goals (see Claude below)
It’s a fact. Also, it was obviously going to happen even before we got the empirical evidence, because it’s just the rational thing to do, as Claude below describes.
🏆 ⭐ We're thrilled to announce the 2024 Future of Life Award winners!
This year, we honor three groundbreaking experts who laid the foundations for ethics and safety in computing and AI.
Learn more about the invaluable work of Batya Friedman, James Moor, and Steve Omohundro:
PSA for Boston-area folks!!
On Nov 22nd there will be a special AI safety salon at Microsoft NERD w/ Steve Omohundro (@steveom), AI safety pioneer. Organized by Alex Hoekstra & @PrestonWEstep from Mind First Foundation (@mind_first). RSVP here->
https://t.co/jLjOb5X2Yu
We need more "tool AI" as in Elon's space triumph today, and less AGI world domination hype as in Dario Amodei's recent "entente" essay. My essay below argues that "scaling quickly" won't lead to Dario's "eternal 1991" – but perhaps to 1984 until the end, with a non-human Big Brother. https://t.co/va6PHmZ2g5
Debates over AI Policy like CA SB-1047 highlight fragmentation in the AI community. How can we develop AI policies that help foster innovation while mitigating risks? We propose a path for science- and evidence-based AI policy: https://t.co/9G9mlZ6JHi