Several years ago I was responsible for optimizing a strategic spend budget (mainly tech) of ~$400M annually at a major enterprise. Most of that money was spoken for to "keep the lights on" and maintain systems that should not break. As I thought about the push to embed FDEs, it made me wonder what is missing to unlock the larger part of enterprise budgets, especially in businesses where reliability matters a lot. I think there is a need for a design surface that allows AI players to unlock this bigger prize. Here is my exploration into this.
@azzz4799 @rememberlenny I am doing work in this space. Let's connect and collaborate. We have a bunch of AI solutions for the trades and need to translate them to the hardware layer next.
SITUATION EXPLAINED: Does Mythos actually represent a step change in cybersecurity risk?
We asked @ZackKorman, co-founder of Embroidery:
"Maybe Mythos is better at finding and chaining together vulnerabilities for an amateur. But if you're good at what you do, you can probably just use GLM 5.2. I found it to be extraordinarily capable at cyber capabilities."
"A lot of the really scary benchmarking for Mythos is like, 'Here is a code base, do whatever you'd like, spend 48 hours, crack it.' On those benchmarks, Mythos definitely has a commanding lead on Opus 4.8. It's better than 5.5, but only by small amounts at the end of the process. It's not a crazy step change."
"When you sit with a model and say, 'I have a rough idea of where I'm trying to pen test,' and you guide it step by step, that's where the sub-frontier models perform closer to comparably."
"The quantifiable difference between these models, if you think that one is gonna end the world but the other one is safe, like you're just making stuff up. They're not far enough apart to where you need to make one of them a war crime and the other one can be freely available tomorrow."
@karpathy @leogau @kuroke01 @gallabytes Would love it if you can have a thick skin on this type of mob behavior and continue to post freely here. I'm not getting any offers from Anthropic to be on your slack :)
@8teAPi Don't understand the connection between how Tag gathers organizational context and the ontology idea from Palantir. My understanding was that ontology is about understanding the data model of systems. Can you pls explain more?
Great write up! Even as a founder, one feels like "investing in AI, and building in it, starts to look less like underwriting a software company and more like running a trading book. You are long some curves, short others, and exposed to correlations that can break exactly when they matter most."
@tbpn @btaylor Brett gets it. No human can compete with reasoning voice agents that are tool-enabled. We basically already have it but it needs to be optimized and spread more widely.
@signulll This is the way of the future. I've always wanted all my communication and logins to be unified. Claude and Codex make it happen. I think this will be the surface of choice. Once it is, the device, apps and subs will all be subsumed.