Welp, that happened faster than I predicted. I thought it would be the end of 2027, then early 2027, but agentic traffic is growing so fast that bots have now passed human traffic online for the first time in the Internet's history. https://t.co/2zX5bHdhsa
I do this with Codex all the time. Ask it to review code for bugs and it will tell you it's all good; tell it there is a bug and it will LOOP AND LOOP and find issues.
Carl Jung was right when he wrote that if a man does not face his shadow by age 35 he will not improve. He will calcify. His defense will become his personality.
There is so much alpha in just religiously, repeatedly invoking these magic spells throughout your agent coding and planning sessions:
❯ Great, now I want you to carefully read over all of the new code you just wrote and other existing code you just modified with "fresh eyes" looking super carefully for any obvious bugs, errors, problems, issues, confusion, etc. Carefully fix anything you uncover.
❯ Once again, check over everything again with fresh eyes looking for any blunders, mistakes, errors, oversights, omissions, problems, misconceptions, bugs, etc. Be SUPER thorough and meticulous!
Maybe one day you won't need them, but for now, it improves the results from frontier models more dramatically than anything else you can do. Assign them to hotkeys or get a Stream Deck so you can sprinkle them in without even thinking about it.
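If you drive your agent from a script instead of hotkeys, the same habit can be wrapped in a tiny helper. This is a minimal sketch. It assumes a hypothetical `ask` callable that sends one message to your coding agent's current session and returns the reply. `fresh_eyes_review` and `FRESH_EYES_PROMPTS` are illustrative names, not part of any agent's API. The prompts themselves are copied verbatim from above.

```python
from typing import Callable

# The two review prompts from the post above, copied verbatim.
FRESH_EYES_PROMPTS = [
    'Great, now I want you to carefully read over all of the new code you just wrote '
    'and other existing code you just modified with "fresh eyes" looking super carefully '
    'for any obvious bugs, errors, problems, issues, confusion, etc. '
    'Carefully fix anything you uncover.',
    'Once again, check over everything again with fresh eyes looking for any blunders, '
    'mistakes, errors, oversights, omissions, problems, misconceptions, bugs, etc. '
    'Be SUPER thorough and meticulous!',
]


def fresh_eyes_review(ask: Callable[[str], str], rounds: int = 1) -> list[str]:
    """Send both review prompts, in order, `rounds` times; return every reply.

    `ask` is a hypothetical stand-in for whatever sends one message to your
    coding agent (in the session that wrote the code) and returns its reply.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    replies: list[str] = []
    for _ in range(rounds):
        for prompt in FRESH_EYES_PROMPTS:
            replies.append(ask(prompt))
    return replies


if __name__ == "__main__":
    # Dry run with a fake agent that only records what it was asked.
    sent: list[str] = []

    def fake_ask(prompt: str) -> str:
        sent.append(prompt)
        return f"reviewed (message {len(sent)})"

    print(fresh_eyes_review(fake_ask, rounds=2))
```

The `rounds` parameter just repeats the pair of prompts. The post suggests repetition helps, but how many rounds are worth it is a judgment call.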
Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks.
On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.
The golden years of Airbnb were a temporary arbitrage on depreciation.
There was a universe of beautiful, well-maintained properties and hosts that had not been worn down by short-term guests.
And Airbnb hosts didn’t properly estimate the cost of depreciation needed to maintain that standard, so costs were irrationally low.
That era fundamentally can’t return; it was a temporary arbitrage opportunity.
There was once a supply of fairly pristine, unused space, and now there’s not.
If a space does manage to hit the 2014 standard, it must charge a lot more to fight depreciation.
And at that point, a hotel is generally better.
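The depreciation argument is easiest to see as arithmetic. This is a minimal sketch that amortizes refurbishment cost straight-line over the booked nights between refurbishments. The function name, the amortization method, and all the numbers are hypothetical, chosen only for illustration and not taken from the post.

```python
def break_even_nightly_rate(
    operating_cost_per_night: float,
    refurbish_cost: float,
    nights_between_refurbishments: int,
) -> float:
    """Lowest nightly price that covers running costs *and* wear.

    Depreciation is amortized straight-line: each booked night carries an
    equal share of the refurbishment needed to restore the listing.
    """
    if nights_between_refurbishments <= 0:
        raise ValueError("nights_between_refurbishments must be positive")
    depreciation_per_night = refurbish_cost / nights_between_refurbishments
    return operating_cost_per_night + depreciation_per_night


# Hypothetical numbers, for illustration only.
ignoring_wear = break_even_nightly_rate(60.0, 0.0, 400)
with_wear = break_even_nightly_rate(60.0, 12_000.0, 400)
print(f"ignoring wear: ${ignoring_wear:.2f}/night")   # ignoring wear: $60.00/night
print(f"including wear: ${with_wear:.2f}/night")      # including wear: $90.00/night
```

A host charging the first figure is effectively giving guests the property's condition for free. That is the arbitrage the thread describes.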
Ukraine is causing 38,000 Russian casualties per month. That's almost one Vietnam PER MONTH for Russia. They can't sustain that. Fund weapons for Ukraine and get this war over with. #NAFO
June 2024: The latest general-purpose LLMs could not count the r's in strawberry.
July 2025: The latest general-purpose LLMs get gold in the International Math Olympiad.
May 2026: The latest general-purpose LLMs solve one of the "best-known questions in combinatorial geometry".
open sourcing Marlin-2B 🐟
a tiny VLM to extract structured information from videos
Marlin is finetuned for two questions devs want to ask about their videos: what is happening, and when?
Best open model in its weight class, competitive with Gemini 2.5 Flash at only 2B params 🧵
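Marlin's actual output format isn't described here. The sketch below assumes a hypothetical JSON array of `{start_s, end_s, description}` objects, just to show what an answer to "what is happening, and when?" could look like once parsed. `VideoEvent`, `parse_events`, and `events_at` are illustrative names, not part of Marlin.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoEvent:
    """One 'what is happening, and when?' answer (hypothetical schema)."""
    start_s: float     # when the event starts, in seconds from the video start
    end_s: float       # when the event ends, in seconds
    description: str   # what is happening

    def __post_init__(self) -> None:
        # Reject spans that run backwards or start before the video does.
        if not 0 <= self.start_s <= self.end_s:
            raise ValueError(f"invalid span {self.start_s}-{self.end_s}")


def parse_events(raw: str) -> list[VideoEvent]:
    """Parse a JSON array of events and return them sorted by start time."""
    events = [
        VideoEvent(float(e["start_s"]), float(e["end_s"]), str(e["description"]))
        for e in json.loads(raw)
    ]
    return sorted(events, key=lambda ev: ev.start_s)


def events_at(events: list[VideoEvent], t: float) -> list[VideoEvent]:
    """Return the events whose span contains time t (endpoints inclusive)."""
    return [ev for ev in events if ev.start_s <= t <= ev.end_s]


raw = """[
  {"start_s": 12.5, "end_s": 20, "description": "person opens a laptop"},
  {"start_s": 0, "end_s": 12.5, "description": "person walks into the room"}
]"""
events = parse_events(raw)
print([ev.description for ev in events_at(events, 15.0)])
```

Validating spans at construction time means a malformed model answer fails loudly when it is parsed, not later when you query by timestamp.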
This has to be my favorite problem in probability. It’s simple enough for a child to understand, but hard enough that most adults can’t figure it out. Create the probabilities you want to see in the world!
This is crazy. The hacker installed a dead-man's switch that will wipe your computer if you revoke the GitHub token they stole from you. Revoking the token is what triggers the wipe.
A 40-year-old patent has finally been brought to life.
That's the Y-zipper.
A 3D-printed three-sided fastener that transitions any object from flexible to rigid and back again.
The robotics application is the one that caught my attention.
A quadruped robot that adjusts its leg stiffness depending on terrain, switching between rigid and flexible in real time without additional motors or complex mechanical systems.
But this goes way beyond robotics. A wrist cast that loosens during the day and stiffens at night. A tent that pops into shape in 90 seconds instead of six minutes.
The idea sat in a patent filing for four decades. It took 3D printing to finally make it real.
Neural networks might speak English, but they think in shapes.
Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision.
Starting today, we’re releasing a series of posts on this research agenda. 🧵
One of the coolest things with Codex for Chrome is combining it with subagents so you can test things like multiplayer games!
Available for both macOS and Windows.
Happy Codexing
One of the things that made the Mythos release hard to interpret is that Anthropic held back details on most vulns they found, to give defenders time to patch.
One month later, info from orgs with access to Mythos is starting to trickle out, e.g. this post from Mozilla today: