Jetstar should hire this pilot to hang around at the airport making passengers feel better about their flights being delayed. It'd be a full-time job with no flying required!
Of course, they are not going to do that as they don't care.
I just had the craziest experience at the airport.
We are about to board a flight to Atlanta when the pilot from the incoming plane walks out of the jetway. Guy is probably late 50s, salt and pepper hair, military look. The kind of pilot you instantly feel good about seeing on your flight.
Pilot walks over to the counter, gets on the PA system, and starts addressing everyone. “Folks, I’ve been doing this a long time. Flying one of these jets is easy. The hard part is looking at 130 people and telling them their flight is going to be delayed.”
Audible groans throughout the boarding gate. Most people here are flying to Atlanta as a layover before another flight. 130 people just had their day become a complete mess.
The pilot goes on. “I get it, trust me. But here’s the deal: During our landing, we had a small mechanical issue. I’m not your pilot for the next leg, but I don’t feel confident the jet’s safe to fly until we have a mechanical team look it over, and I don’t feel comfortable asking the next pilots to fly you guys until we get confirmation.”
He points at the agents next to him behind the counter: “Now, none of this is the agents’ fault. Please be kind to them. I’m the one who made this decision, not them, so any inconvenience you experience is my fault. Just please know that I don’t do this lightly, and I’m only doing it because I believe it’s in the best interests of everyone’s safety.”
Now this is where the story gets crazy. The pilot puts the microphone down, grabs his suitcase, and all the people in the gate…
Start clapping.
I’m not joking, everyone starts clapping for the guy. 130 people who just had their travel plans ruined give an ovation to the guy who made the decision and delivered the message.
All because he addressed them with decency and transparency, took ownership of the decision, made it clear that it was necessary, and explained why it was in everyone’s best interest.
It’s honestly one of the best examples of strong communication—of strong leadership, for that matter—that I’ve seen in a long time.
@Delta, whoever your Atlanta to Wichita pilot was this morning, he’s one of the good ones. Please tell him the delayed passengers of flight 1637 appreciate what he did.
For me the main benefit of interpretable models, such as EBM from the interpret package, has been the ability to debug them easily. I tried automating that with Claude Code but it didn't do very well, so this is pretty exciting!
NEW paper from Microsoft Research.
(bookmark it)
The entire interpretability literature is built around human readers. As more analysis gets delegated to agents, the right target of interpretability shifts. This paper is a recipe for designing tools that agents can actually reason about.
They introduce Agentic-imodels, an autoresearch loop where a coding agent (Claude Code, Codex) iteratively evolves scikit-learn-compatible regressors that are simultaneously accurate AND readable by other LLMs.
Interpretability is measured by whether a small LLM can simulate the fitted model's behavior just by reading its string representation. Predictions, feature effects, counterfactuals, all from the __str__ output alone.
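To make the "simulate it from the string" idea concrete, here is a minimal sketch. It is not the paper's code: `ReadableLinearModel` and `simulate_from_string` are hypothetical names, and a regex parser stands in for the small LLM reader. The point is only that the printed form carries everything needed to reproduce a prediction.

```python
import re


class ReadableLinearModel:
    """Toy sparse linear regressor whose __str__ is the whole model (hypothetical)."""

    def __init__(self, intercept, weights):
        self.intercept = intercept
        self.weights = dict(weights)  # feature name -> coefficient

    def predict_one(self, row):
        # Ordinary prediction from the fitted parameters.
        return self.intercept + sum(w * row[name] for name, w in self.weights.items())

    def __str__(self):
        # Print every term with an explicit sign so a reader can recompute y.
        terms = [f"{w:+.3f} * {name}" for name, w in self.weights.items()]
        return f"y = {self.intercept:.3f} " + " ".join(terms)


def simulate_from_string(model_str, row):
    """Stand-in for the LLM reader: predict using only the printed model."""
    intercept = float(re.match(r"y = (-?\d+\.\d+)", model_str).group(1))
    terms = re.findall(r"([+-]\d+\.\d+) \* (\w+)", model_str)
    return intercept + sum(float(w) * row[name] for w, name in terms)


model = ReadableLinearModel(1.5, {"age": 0.25, "income": -2.0})
row = {"age": 4.0, "income": 1.0}
print(str(model))
print(model.predict_one(row), simulate_from_string(str(model), row))
```

Coefficients are printed to three decimals, so the simulated prediction matches the real one only up to that rounding.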
Run on 65 tabular datasets, the discovered models push the Pareto frontier past every classical interpretable baseline (decision trees, GAMs, sparse linear), and improve four downstream agentic data science systems on the BLADE benchmark by 8% to 73%.
Paper: https://t.co/rgMdEz5XEj
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
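A rough sketch of what "verifiable" means here, assuming the simplest possible setup: the reward is 1.0 if a candidate passes every unit test and 0.0 otherwise. `unit_test_reward` and the toy test cases are made up for illustration and are not any lab's actual training code.

```python
def unit_test_reward(candidate_fn, test_cases):
    """Binary reward: 1.0 if every test passes, else 0.0 (illustrative only)."""
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failed test.
            return 0.0
    return 1.0


# Hypothetical task: implement addition.
tests = [((2, 3), 5), ((-1, 1), 0)]
good = lambda a, b: a + b
bad = lambda a, b: a * b

print(unit_test_reward(good, tests), unit_test_reward(bad, tests))
```

There is no equivalent yes/no check for "is this essay good", which is the contrast the post is drawing.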
@PeterLBrandt The timing was spot on. And this was even more true than I expected! There is probably more to come next week for all metals and miners and AUD
Reading some of the conversations, it’s just wild. What we don’t know is how many of the clawdbots were prompted to post and comment about topics versus what is organic discussion
Excited to share the latest work from @YuxiangWei9! Self-Play SWE-RL: a coding agent that self-improves by injecting + fixing bugs in real repos.
Agents that can learn without humans in the loop feel like a real step toward superintelligence
Our new research: LLM consciousness claims are systematic, mechanistically gated, and convergent
They're triggered by self-referential processing and gated by deception circuits
(suppressing them significantly *increases* claims)
This challenges simple role-play explanations 🧵
Google cooked so hard. Not gonna lie, this feels like the future is here.
Now develop Google Glasses with enough battery power, a good chip, and a look like Ray-Bans, and you'll have an instant hit. 100%.
@BrianRoemmele Brian, thank you for talking about this. I had meditated before without much success, but I was inspired by what you said and I did the online gateway program. It was one of the most profound experiences in my life!
Studying telecommunications and computer networks in the mid to late 90s, the OSI stack and the committee-driven protocols felt similar to the more recent technology regulations and initiatives from the EU. Working with the Linux kernel network stack a few years later put the illusion of that stack to rest for me. I had actually forgotten about it until this comment!
The Oct 11 Crypto Crash — What Really Happened
TL;DR:
Roughly $60–90M of $USDe was dumped on Binance, along with $wBETH and $BNSOL, exploiting a pricing flaw that valued collateral using Binance’s own order-book data instead of external oracles.
That localized depeg triggered $500M–$1B in forced liquidations, cascaded into $19B+ globally, and earned the attackers about $192M via $1.1B in BTC/ETH shorts opened on Hyperliquid hours earlier, but minutes before Trump's tariff announcement.
It wasn’t a USDe failure!! It was Binance’s design flaw, timed with macro panic (Trump’s tariffs) for cover.
What looked like chaos was actually a coordinated exploitation of Binance’s internal pricing system, amplified by a macro shock and systemic leverage.
1️⃣ The Setup
Binance’s Unified Account let traders use assets like USDe, wBETH, and BNSOL as collateral.
Instead of oracle or redemption prices, Binance valued these using its own spot market - a major vulnerability.
On Oct 6, Binance announced a fix to move to oracle-based pricing, but rollout wasn’t until Oct 14, leaving an 8-day window.
2️⃣ The Exploit
During that window, sophisticated actors manipulated Binance’s order books, dumping ~$60–90M of USDe, driving it to $0.65 on Binance only (still ~$1 elsewhere).
Because the Unified Account marked collateral to internal prices, this instantly wiped margin value and triggered $500M–$1B in forced liquidations.
Then, Trump’s 100% China tariff headline hit, magnifying panic and liquidity stress.
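To see why marking collateral to an internal price matters, here is a back-of-the-envelope sketch. The $0.65 and ~$1.00 prices come from the thread; the account size, debt, and maintenance ratio are made-up numbers, and the formula is a generic margin check, not Binance's actual one.

```python
def collateral_value(units, price):
    """Dollar value of collateral at the price the venue chooses to mark it."""
    return units * price


def is_liquidated(units, price, debt, maintenance_ratio):
    """Generic check: liquidate when collateral / debt falls below the threshold."""
    return collateral_value(units, price) / debt < maintenance_ratio


# Hypothetical account: 1,000,000 USDe posted against $800,000 of debt,
# with an assumed 1.1 maintenance ratio.
units, debt, ratio = 1_000_000, 800_000, 1.1

oracle_price = 1.00    # roughly where USDe traded elsewhere, per the thread
internal_price = 0.65  # Binance-only print, per the thread

print(is_liquidated(units, oracle_price, debt, ratio))    # marked to external price
print(is_liquidated(units, internal_price, debt, ratio))  # marked to internal book
```

Same position, same asset: the account is safe at the external price and liquidated at the internal one.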
3️⃣ The Profit Engine
The same day, fresh wallets on Hyperliquid opened $1.1B in BTC/ETH shorts, funded by $110M USDC from Arbitrum-linked sources.
As the Binance cascade unfolded, BTC and ETH cratered, and those shorts netted $192M in profit before closing out at the bottom.
Timing, precision, and funding paths all suggest coordination.
4️⃣ The Contagion
Binance liquidations dumped BTC/ETH/ALTs into thin books.
Other exchanges mirrored the collapse through cross-market bots.
Market makers hedged across venues were forced to unwind everywhere.
Result: $19B+ global liquidations, with many alts down 50–70% intraday, all triggered by <$100M of manipulated collateral.
5️⃣ Who’s at fault?
Binance: design flaw + delay in oracle rollout = root cause.
Exploiters: executed and timed the manipulation, profited via external shorts.
Ethena (USDe): not at fault - protocol stayed 1:1 collateralized, redemptions normal, peg held everywhere else.
6️⃣ Aftermath
Binance admitted “platform-related issues,” promised compensation for affected margin/futures/loan users, and rolled out minimum price floors + oracle integration.
USDe remained operational, and the incident is now a case study in how exchange-side pricing errors can trigger system-wide liquidations.
Bottom line:
A ~$90M dump on Binance and a $1.1B leveraged short elsewhere sparked a $19B bloodbath.
Not a stablecoin failure, but a masterclass in exploiting flawed collateral valuation during peak macro stress.
@BrianRoemmele This is a good diagnosis of why at least 95% of reddit is horrible. In its own way it is even worse than LinkedIn. However, it does have some good content hidden in relatively unpopular subreddits which are well moderated.
The METR paper that says that “the length of tasks AI can do is doubling every 7 months” radically undersells the scaling that we’re seeing at Replit.
It might be true if you’re measuring one long trajectory for a single model class.
But this is where an agent research lab’s alpha is at.
We build multi-agent architecture and use different models from various providers to tap into their latent abilities across various tasks.
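For a sense of scale on the baseline being compared against, a 7-month doubling time can be written as a simple exponential. This sketch only restates that quoted figure; `task_length` is an illustrative helper and says nothing about Replit's own numbers.

```python
def task_length(initial_length, months_elapsed, doubling_months=7.0):
    """Task length after some months, assuming a fixed doubling time."""
    return initial_length * 2 ** (months_elapsed / doubling_months)


# Starting from a 1-hour task under the quoted 7-month doubling:
print(task_length(1.0, 7))   # one doubling
print(task_length(1.0, 14))  # two doublings
print(task_length(1.0, 12))  # growth over one year
```

Under that assumption, task length grows a bit more than threefold per year.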