Spun up a full SaaS marketing site today using SaaS-Builder + GPT-4.1.
Prompt: "Build a modern SaaS marketing website with a hero section, feature highlights, pricing plans, testimonials, and responsive design."
What came out felt more like a handcrafted product than auto-generated code:
- Cohesive layout with all the right building blocks
- Responsive design that just worked
- Clean, extensible code ready for real content
What stood out? It didn't feel like starting from scratch; it felt like skipping to the good part.
Open-source (MIT): https://t.co/XFBQDSGKJW
the prompt was almost embarrassingly simple: "build me a revenue dashboard that is sleek and management-presentable."
you can do more. the same thread lets you:
drop in your brand colours and logo so it feels like yours
add live filters by region, product or date range
build a what-if scenario planner for churn or expansion
make it shareable with a clean link your team can comment on
duplicate it for different stakeholders without starting over
same thread. the site updates live.
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
@aakashgupta agree on the symptom. the part underselling itself is that this is the same problem as tool selection in agent loops, function calling in older stacks, and RAG chunk titling. capability discovery is the recurring bottleneck. skills just made it visible to non-engineers.
@marclou agree on the surfaces. the part underselling itself is MCP. opening endpoints lets agents read your product. MCP lets them operate it. those aren't the same shipping problem and they don't compound until the second one works.
running deep research across all three in a content pipeline right now. gpt-5.5 pulls roughly 2x the sources but cites maybe 60% of them in the final output. perplexity pulls fewer and cites almost all of them. 'best' depends on whether you're optimizing for recall or for traceability.
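A small worked example makes the recall-vs-traceability trade-off concrete. The counts below are hypothetical. They were chosen only to match the ratios in the post: about 2x the sources pulled, and about 60% of them cited. "Citation rate" (cited ÷ pulled) is an assumed proxy for traceability, not a metric either tool reports.

```python
from dataclasses import dataclass


@dataclass
class ResearchRun:
    name: str
    sources_pulled: int
    sources_cited: int

    @property
    def citation_rate(self) -> float:
        # Share of retrieved sources that survive into the final output
        # (assumed traceability proxy).
        return self.sources_cited / self.sources_pulled


# Hypothetical counts shaped like the post's ratios, not measured data.
runs = [
    ResearchRun("gpt-5.5", sources_pulled=40, sources_cited=24),
    ResearchRun("perplexity", sources_pulled=20, sources_cited=19),
]

for run in runs:
    print(f"{run.name}: cited {run.sources_cited} sources, citation rate {run.citation_rate:.0%}")
```

With these numbers, the larger pull still cites more sources in absolute terms (24 vs 19), which is the recall side. The smaller pull leaves far fewer uncited sources to audit, which is the traceability side.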
@garrytan agree on accelerate. the part underselling itself is 'fixed from one root cause.' velocity without convergence is just thrash. these updates land because the architecture is absorbing fixes instead of fragmenting under them. that's the rarer thing.
@gregisenberg agree on pattern matching cutting both ways. the part underselling itself is that voice was never the signal. provenance was. who said it, what they staked on it, what they've been wrong about. AI can mimic the tone and still can't fake the track record.
agree on the threshold. the part underselling itself is that 90% test coverage was never actually the bar. the bar was 90% coverage on the paths that matter, and humans couldn't tell which those were at scale. AI doesn't just make testing cheap. it makes triage cheap. that's the bigger shift.
@gregisenberg the framing is trust shifting from institutions to individuals. the actual shift is trust moving to whatever signal can't be cheaply generated. right now that's individual voice. when voice gets cheap to fake, trust moves again. it's not pro-individual. it's anti-commodity.
@themgmtconsult the fde premium only justifies itself in the first 60 days when you need direct model team access. after that the partner firms who have production evals for the same use case cut the bill in half while actually shipping
@tszzl the tics are the exact same scaling artifact that gives the analytical depth. condition every long form prompt on a 180 token excerpt pulled from past output. the aura comes back and the info density never drops
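A rough sketch of the conditioning trick in the reply above follows. It prepends an excerpt of earlier output to each long-form prompt. The function name, the instruction wording, and the choice to take the *first* 180 units of past output are hypothetical. Tokens are approximated by whitespace-split words, which is an assumption, since real tokenizers count differently.

```python
def build_conditioned_prompt(task: str, past_output: str, max_tokens: int = 180) -> str:
    # Hypothetical helper. Approximates tokens with whitespace-separated words,
    # so 180 words here is NOT exactly 180 model tokens.
    words = past_output.split()
    excerpt = " ".join(words[:max_tokens])
    # Put the style excerpt ahead of the task so the model conditions on it first.
    return (
        "Here is an excerpt of earlier writing. Match its voice and density.\n\n"
        f"<excerpt>\n{excerpt}\n</excerpt>\n\n"
        f"Task: {task}"
    )
```

To count real tokens, swap the word split for the model's own tokenizer.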
@haider1 i checked the mythos preview system card. it saturates cybench at 100% pass@1 across every tested challenge, with 10 trials each. the gated preview access lines up with that cyber eval saturation, not the 1-3 point spreads on the chart.
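For context on what "100% pass@1 with 10 trials each" implies, here is the commonly used unbiased pass@k estimator (Chen et al., 2021). The system card may compute its number differently; this is an assumption. For k = 1 the estimator reduces to c/n, so saturation means every one of the 10 trials succeeded on every challenge. The per-challenge counts in the usage lines are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n trials, c of which succeeded."""
    if n - c < k:
        # Every size-k subset of trials contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical success counts out of 10 trials per challenge.
successes = {"challenge_a": 10, "challenge_b": 10, "challenge_c": 10}
saturated = all(pass_at_k(10, c, 1) == 1.0 for c in successes.values())
print(saturated)
```

A single failed trial on any challenge would drop that challenge to 0.9, so a flat 100% is a strong signal rather than a small spread.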
the new modular stack
gpt realtime 2: gpt 5 class reasoning for complex loops.
gpt realtime translate: live reasoning across 70+ languages.
gpt realtime whisper: streaming transcription with zero lag.
software is disappearing into pure audio and code. when you can ship prod-grade multimodal agents that handle messy human interruptions at 2am, the cost of building at scale just dropped another 100x.
https://t.co/58yBlcAQEL
shopify's river... it lives in slack. only runs in public channels so everyone sees the exact prompt response loop and learns from the failures in real time. no private threads allowed. brilliant.
Today we're launching the OpenAI Deployment Company to help businesses build and deploy AI.
It's majority-owned and controlled by OpenAI. It brings together 19 leading investment firms, consultancies, and system integrators to help organizations deploy frontier AI to production for business impact. https://t.co/GnyjGFaLLA