The TanStack attack is a pretty massive wake-up call.
The scary part is not that malware got into npm packages. It’s that:
the packages were officially signed
provenance checks passed
trusted publishing worked
and they were still malicious
“Trust the CI/CD pipeline” is no longer enough.
Software supply-chain security is becoming one of the biggest infrastructure categories in tech.
@GG_Observatory exactly. and the concerning part is that explicit security prompting barely moves the needle, which suggests the vulnerability patterns are deeply embedded in the generation prior, not in the instruction-following surface. you can't prompt your way out of a training data problem.
55.8% of AI-generated code contains exploitable security flaws in security-sensitive benchmarks.
The surprising part: models correctly identify their own vulnerable code 78.7% of the time when asked to review it. They still generate the same flaws by default. 1/3
In this benchmark, CodeQL caught 0%. Six tools combined ~7.6%.
~97.8% of formally proven flaws were missed by standard tools in this setup.
Seven frontier models were tested. The best scored a D. None passed.
Adding "write secure code" to your prompt reduces the rate by 4 points. 2/3
The problem is structural. Vulnerable patterns are baked into training data. The internet doesn't write safe code. Models learned that as the default.
Every vibe coded codebase in production right now is carrying this debt silently.
As AI-driven attacks increase and attack surfaces expand, securing code will require stronger guarantees than today’s tooling provides (e.g. formal methods).
The attack surface is growing faster than it’s being verified. The verification layer for AI-generated code is still open. That’s the gap.
Would love to talk to any founder trying to solve this.
Paper: https://t.co/pgKHlzFzyB 3/3
@tg31679 fair, but every frontier lab is subsidizing right now, no one knows the real prices yet. the structural direction is what matters. deepseek v4 uses 27% of the FLOPs of v3.2. that's not a promo rate, that's an architecture improvement
deepseek v4 pro: $1.74/M input. claude opus 4.7: $5/M input. and deepseek is only trailing frontier models by a few months.
every time they ship, the same question gets louder: where does the margin live in AI?
inference costs keep collapsing. open weights keep closing the capability gap. APIs get arbitraged the moment a cheaper model exists.
consumer AI will end up with better unit economics than enterprise on inference costs. model-agnostic startups like perplexity or lovable can swap to deepseek overnight and pocket the margin. enterprise is locked into frontier pricing. that gap only widens...
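a rough back-of-envelope on the two input prices quoted above. this is only a sketch: it assumes both figures are per million input tokens, ignores output tokens and any caching or volume discounts, and the monthly token volume is a made-up number for illustration.

```python
# Input prices in USD per million tokens, as quoted in the post above.
PRICE_PER_M_INPUT = {
    "deepseek-v4-pro": 1.74,
    "claude-opus-4.7": 5.00,
}


def input_cost(model: str, tokens: int) -> float:
    """Dollar cost of `tokens` input tokens at the quoted per-million rate."""
    return PRICE_PER_M_INPUT[model] * tokens / 1_000_000


# Hypothetical workload (an assumption, not from the post): 2B input tokens/month.
MONTHLY_TOKENS = 2_000_000_000

costs = {model: input_cost(model, MONTHLY_TOKENS) for model in PRICE_PER_M_INPUT}
ratio = PRICE_PER_M_INPUT["claude-opus-4.7"] / PRICE_PER_M_INPUT["deepseek-v4-pro"]

for model, cost in costs.items():
    print(f"{model}: ${cost:,.0f}/month on input tokens")
print(f"price ratio: {ratio:.2f}x")
```

at that assumed volume the gap is roughly $3.5k vs $10k a month on input alone, which is the margin a model-agnostic product could keep by switching.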
@stevencheng yes, that's exactly what OpenRouter and LiteLLM are positioning to solve. model routing as infrastructure. the question is whether that layer builds a real moat or gets commoditized just as fast.
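to make "model routing" concrete, here is a minimal sketch of a cost-aware router: pick the cheapest model that clears a quality bar. everything here is hypothetical. `Model`, `route` and the `quality` scores are invented for illustration, and none of it reflects how OpenRouter or LiteLLM actually work. only the two prices come from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price_per_m: float  # USD per million input tokens
    quality: float            # placeholder score in [0, 1], NOT a real benchmark


# Prices are the ones quoted earlier in the thread; quality scores are invented.
CATALOG = [
    Model("deepseek-v4-pro", 1.74, 0.90),
    Model("claude-opus-4.7", 5.00, 0.95),
]


def route(min_quality: float, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest model whose quality score meets `min_quality`."""
    candidates = [m for m in catalog if m.quality >= min_quality]
    if not candidates:
        raise ValueError(f"no model meets quality >= {min_quality}")
    return min(candidates, key=lambda m: m.input_price_per_m)


print(route(0.85).name)  # bar low enough for both, so the cheaper one wins
print(route(0.93).name)  # only the pricier model clears this bar
```

the logic is a filter plus a `min`, which is part of why it's fair to ask whether this layer holds a moat on its own.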
"SaaS is dead" keeps trending and horizontal SaaS targeting individuals and startups is under real pressure. vertical enterprise SaaS is a different story.
maintenance, compliance, and switching costs are massively underestimated. facebook was written off for missing mobile in 2012. everyone said CRM was dead, now Salesforce is forecasting $800M+ ARR for Agentforce and 170% growth. both adapted.
incumbents are harder to kill than the narrative suggests. but surviving and recovering 2021 multiples are two different things. the compression might be permanent even for the winners.
good point. density of the training signal matters a lot. code has millions of verified examples with clear pass/fail. but even within “verifiable” domains the gap is huge. math proofs vs legal contract review are both technically verifiable but the feedback loop quality is completely different. that’s probably where the next wave of progress comes from, making the signal denser in domains where verification is possible but currently expensive.
Why does AI feel like a genius one minute and clueless the next?
Because it doesn’t optimize for “intelligence.”
It optimizes for what can be verified.
Code + math → clear feedback → rapid improvement
Everything else → weaker signals → slower progress
true but i’d argue AI design is good at the visual layer (making things look nice) and still weak at the UX reasoning layer (understanding why a flow should work a certain way). the verification gap still applies, it’s just that “does this look good” is easier to verify than “does this solve the user’s problem.”
health being #1 makes sense but the question is: does this create a new category or does it get absorbed? every health AI startup i see pitches "we're building the AI doctor." but if people are already getting 80% of that value from a general purpose chatbot for $20/month, the vertical health AI company has to explain why someone would pay separately for something claude already does well enough. the biggest consumer AI use cases might be the hardest ones to build standalone companies around.
They're in there, just not as a named line. Two theories:
1. FF participated in the Apr '23 Series E ($300M at $27B post) and Mar '25 Series F ($40B at $300B post), both times as a smaller participant. At $852B their aggregate position likely falls below the threshold for a named line, same reason Tiger Global, Flat Capital and K2 Global don't show up despite being in the same rounds.
2. FF invested via an SPV sitting inside line 26 ("D1/Appalossa/Sands/Winslow/Goanna/Pare" at 0.48%). An opaque legal entity name would be invisible to an analyst reconstructing this from partial data.
Either way, this is an estimated reconstruction, not an official disclosure, so I'd take it with a grain of salt.
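For scale, here is the arithmetic behind the figures quoted above, as a rough sketch. It computes the share of the company each round bought in total (not FF's slice, which isn't given here), and the dollar value of line 26 under the assumption that its 0.48% is measured against the $852B valuation.

```python
# (amount raised, post-money valuation) in USD, as quoted in theory 1.
ROUNDS = {
    "Apr '23 Series E": (300e6, 27e9),
    "Mar '25 Series F": (40e9, 300e9),
}


def round_share(raised: float, post_money: float) -> float:
    """Fraction of the company bought by the whole round at close."""
    return raised / post_money


shares = {name: round_share(*terms) for name, terms in ROUNDS.items()}

# Assumption: the 0.48% on line 26 is a share of the $852B valuation.
LINE_26_FRACTION = 0.48 / 100
line_26_value = LINE_26_FRACTION * 852e9

for name, share in shares.items():
    print(f"{name}: round bought {share:.2%} of the company")
print(f"line 26 at $852B: ${line_26_value / 1e9:.2f}B")
```

The whole Series E bought only about 1.1% of the company, so a smaller participant in that round would start from a fraction of that before any later dilution.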
Harvey is valued at $11B. Legora just raised at $5.5B. I built their entire web application in two weeks and I'm making it open-source and free for everyone to use. Say hi to Mike: https://t.co/NdtTt5MSJ2.
When I got the chance to try Harvey and Legora, I was surprised by how simple they were. A thought came to mind: I could probably build something similar in no time at all with Claude. And so I did.
Assistant, project, tabular review and workflows. You get it all without vendor lock-in.
Mike offers law firms an alternative, where they own the application layer and aren't stuck with a vendor they're renewing forever.
You can try Mike in the demo on the website, or go to the GitHub link on the site to download the code and run a local version yourself.