@joulee 1. Which parts of the experiences are creating customer friction/pain? (for prioritization, impact)
2. Which actions/inputs are driving target outcomes? (particularly which are causal...)
3. What does a healthy customer lifecycle look like? What is an anomaly?
We just released Claude Code channels, which allows you to control your Claude Code session through select MCPs, starting with Telegram and Discord.
Use this to message Claude Code directly from your phone.
@realmadhuguru Feel this, but at the same time, instead of worrying about being behind, we should focus on what problems we can solve with this newfound power
Same feeling as a product person.
For years, my constraint was execution capacity - lots of fun ideas, but building each required assembling eng and design teams. Hard and unscalable.
With AI, I can now 'hire' a full team instantly. Massive scale unlocked...in theory.
Because now there’s a new constraint: mastering how to orchestrate AI teams…the tools, workflows and product craft in this new world.
Feels like there is no bigger lever than learning these skills. Channeling all my spare time and energy here.
Really proud of the DeepLearningAI team. When Cloudflare went down, our engineers used AI coding to quickly implement a clone of basic Cloudflare capabilities to run our site on. So we came back up long before even major websites!
Showing has always been more powerful than telling. Data and goals are still important (“why?”) but condensing the cycle down with a rapid prototype is way better than weeks of word edits
At @Google, we are moving from a writing‑first culture to a building‑first one.
Writing was a proxy for clear thinking, optimized for scarce eng resources and long dev cycles - you had to get it right before you built.
Now, when time to vibe-code prototype ≈ time to write PRD, PMs can SHOW not tell. Role profiles are blurring, creativity and building are happening in parallel.
@venturetwins @a16z Love this. Having a newborn + tracking data manually for her, there seems to be an open space to better streamline how we integrate growth and development details in order to get more out of AI during the journey. Currently using 6 different apps to do that.
The real shiptober (plus one day) was at Anthropic:
• 11/1 - Token counting API
• 11/1 - Multimodal PDF support across Claude and the API
• 10/31 - Voice dictation in Claude mobile apps
• 10/31 - Claude desktop app
• 10/29 - Claude in GitHub Copilot
• 10/24 - Analysis tool
• 10/22 - New Claude 3.5 Sonnet
• 10/22 - Computer use API
• 10/18 - Financial analyst quickstart
• 10/17 - Mobile app design overhaul
• 10/9 - Remove message order restrictions in API
• 10/8 - Message Batches API
• 10/4 - Artifacts errors auto-fix
Btw we are able to ship this much because we use Claude all the time
Computer use is the first step toward a completely new form of human-computer interaction.
In just a few years, the way we interface with computers will be completely different from today.
Let me explain:
We've built an API that allows Claude to perceive and interact with computer interfaces.
This API enables Claude to translate prompts into computer commands. Developers can use it to automate repetitive tasks, conduct testing and QA, and perform open-ended research.
Excellent article by Sonya Huang and Pat Grady of @Sequoia, "The Agentic Reasoning Era Begins", and the $10 trillion opportunity with service-as-a-software:
https://t.co/wkI9mnLwn1
"Thanks to agentic reasoning, the AI transition is service-as-a-software. Software companies turn labor into software. That means the addressable market is not the software market, but the services market measured in the trillions of dollars."
Stripe data shows that top AI startups in 2024 (ex: OpenAI, Anthropic, Mistral, Midjourney) are making money faster than equivalent SaaS companies in 2018.
AI startups that hit at least $2.5M/mo rev achieved the milestone in 20 months, 5x faster than past SaaS startups.
Do we think that’s because VC money is more concentrated? TikTok? Actual higher interest? OAI skewing everything?
More here from FT: https://t.co/mw5KTrq0wS
ChatGPT isn’t slowing down. They just released a new feature called “Canvas” so that more work doesn’t just get assisted by ChatGPT, it gets done.
Check it out.
@thiagocaserta @deedydas Definitely better when you are starting with a blank slate or you’ll need to give a lot of context. That said, optimistic this is just growing pains and the meta reasoning / architecture learning will come soon (or with tuning)
NotebookLM is quite powerful and worth playing with
https://t.co/EMHIjc15iU
It is a bit of a re-imagination of the UIUX of working with LLMs organized around a collection of sources you upload and then refer to with queries, seeing results alongside and with citations.
But the newest and most impressive feature right now (that is surprisingly hidden almost as an afterthought) is the ability to generate a 2-person podcast episode based on any content you upload. For example, someone took my "bitcoin from scratch" post from a long time ago:
https://t.co/7ajZNZ0BGi
and converted it to a podcast, quite impressive:
https://t.co/ZZn0LJgsnu
You can podcastify *anything*. I gave it train_gpt2.c (C code that trains GPT-2):
https://t.co/gDrAqix4Iv
and made a podcast about that:
https://t.co/bgcwmQr5d7
I don't know if I'd exactly agree with the framing of the conversation and the emphasis or the descriptions of layernorm and matmul etc but there's hints of greatness here and in any case it's highly entertaining.
Imo LLM capability (IQ, but also memory (context length), multimodal, etc.) is getting way ahead of the UIUX of packaging it into products. Think Code Interpreter, Claude Artifacts, Cursor/Replit, NotebookLM, etc. I expect (and look forward to) a lot more and different paradigms of interaction than just chat.
That's what I think is ultimately so compelling about the 2-person podcast format as a UIUX exploration. It lifts two major "barriers to enjoyment" of LLMs. 1) Chat is hard. You don't know what to say or ask. In the 2-person podcast format, the question asking is also delegated to an AI, so you get a much more chill experience instead of being a synchronous constraint in the generating process. 2) Reading is hard, and it's much easier to just lean back and listen.
OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there're only 2 techniques that scale indefinitely with compute: learning & search. It's time to shift focus to the latter.
1. You don't need a huge model to perform reasoning. Lots of parameters are dedicated to memorizing facts, in order to perform well in benchmarks like trivia QA. It is possible to factor out reasoning from knowledge, i.e. a small "reasoning core" that knows how to call tools like browser and code verifier. Pre-training compute may be decreased.
2. A huge amount of compute is shifted to serving inference instead of pre/post-training. LLMs are text-based simulators. By rolling out many possible strategies and scenarios in the simulator, the model will eventually converge to good solutions. The process is a well-studied problem, like AlphaGo's Monte Carlo tree search (MCTS).
3. OpenAI must have figured out the inference scaling law a long time ago, which academia is just recently discovering. Two papers came out on arXiv a week apart last month:
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. find that DeepSeek-Coder increases from 15.9% with one sample to 56% with 250 samples on SWE-Bench, beating Sonnet-3.5.
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. Snell et al. find that PaLM 2-S beats a 14x larger model on MATH with test-time search.
4. Productionizing o1 is much harder than nailing the academic benchmarks. For reasoning problems in the wild, how to decide when to stop searching? What's the reward function? Success criterion? When to call tools like code interpreter in the loop? How to factor in the compute cost of those CPU processes? Their research post didn't share much.
5. Strawberry easily becomes a data flywheel. If the answer is correct, the entire search trace becomes a mini dataset of training examples, which contain both positive and negative rewards.
This in turn improves the reasoning core for future versions of GPT, similar to how AlphaGo’s value network — used to evaluate quality of each board position — improves as MCTS generates more and more refined training data.
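A toy sketch of the repeated-sampling and flywheel ideas in the points above, under loud assumptions: the "model" is just a coin that solves a task with fixed probability p per sample, samples are independent, and the verifier is perfect. None of this is OpenAI's method or the papers' code, and the names `coverage`, `search_trace` and `simulate` are hypothetical. It only shows why drawing more samples raises the chance that at least one is correct, and how a solved task's whole trace (hits and misses) could be kept as training data.

```python
import random


def coverage(p: float, k: int) -> float:
    """Chance that at least one of k independent samples is correct."""
    return 1.0 - (1.0 - p) ** k


def search_trace(p: float, k: int, rng: random.Random):
    """Draw k toy attempts as (attempt_index, reward) pairs.

    Reward 1 means the (assumed perfect) verifier accepted the attempt.
    """
    return [(i, 1 if rng.random() < p else 0) for i in range(k)]


def simulate(p: float, k: int, trials: int, seed: int = 0):
    """Return (empirical coverage, traces kept as training data)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        trace = search_trace(p, k, rng)
        if any(reward for _, reward in trace):
            # Task solved: keep the whole trace, failures included,
            # so it carries both positive and negative rewards.
            kept.append(trace)
    return len(kept) / trials, kept


if __name__ == "__main__":
    for k in (1, 10, 100):
        empirical, _ = simulate(p=0.05, k=k, trials=5000)
        print(k, round(coverage(0.05, k), 3), round(empirical, 3))
```

Real tasks are messier: per-problem success rates vary a lot, samples from one model are correlated, and verifiers are imperfect, so the closed form here is an idealized upper-level intuition rather than a prediction.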