Met Jose at SJC, a front desk humanoid helping travelers in real time.
What stood out wasn’t the novelty. It was the normalization. It answered questions about gates, coffee shops, walk times etc.
Can't wait for #Optimus @elonmusk #AI #BayArea #humanoid
Engineering is now this:
new agent -> shift+tab -> wispr plan -> wait -> look at X -> review plan -> make adjustments -> approve/build -> verify it works -> integrate tests -> merge -> go to app
Absolutely! Innovation moves fast and waits for no one. To stay ahead and succeed, you must be on the cutting edge. There’s no room for excuses. Only action. #speed #ai
yes things are changing fast, but also I see companies (even faang) way behind the frontier for no reason.
you are guaranteed to lose if you fall behind.
the no unforced-errors ai leader playbook:
For your team:
- use coding agents. give all engineers their pick of harnesses, models, background agents: Claude Code, Cursor, Devin, with closed/open models. Hearing Meta engineers are forced to use Llama 4. Opus 4.5 is the baseline now.
- give your agents access to ALL dev tooling: Linear, GitHub, Datadog, Sentry, any internal tooling. If agents are being held back because of lack of context, that’s your fault.
- invest in your codebase-specific agent docs. stop saying “doesn’t do X well”. If that’s an issue, try better prompting, https://t.co/SOjpn47yxo, linting, and code rules. Tell it how you want things. Every manual edit you make is an opportunity for https://t.co/S1ZvtYQwta improvement
- invest in robust background agent infra - get a full development stack working on VM/sandboxes. yes it’s hard to set up but it will be worth it, your engineers can run multiple in parallel. Code review will be the bottleneck soon.
- figure out security issues. stop being risk averse and do what is needed to unblock access to tools.
in your product:
- always use the latest generation models in your features (move things off of last gen models asap, unless robust evals indicate otherwise). Requires changes every 1-2 weeks - e.g. GitHub Copilot mobile still offers code review with gpt 4.1 and Sonnet 3.5 @jaredpalmer. You are leaving money on the table by being on Sonnet 4, or gpt 4o
- Use embedding semantic search instead of fuzzy search. Any general embedding model will do better than Levenshtein / fuzzy heuristics.
- leave no form unfilled. use structured outputs and whatever context you have on the user to do a best-effort pre-fill
- allow unstructured inputs on all product surfaces - must accept freeform text and documents. Forms are dead.
- custom finetuning is dead. Stop wasting time on it. Frontier is moving too fast to invest 8 weeks into finetuning. Costs are dropping too quickly for price to matter. Better prompting will take you very far and this will only become more true as instruction following improves
- build evals to make quick model-upgrade decisions. they don’t need to be perfect but at least need to allow you to compare models relative to each other. most decisions become clear on a Pareto cost vs benchmark perf plot
- encourage all engineers to build with ai: build primitives to call models from all code bases / models: structured output, semantic similarity endpoints, sandbox code execution, etc.
What else am I missing?
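A minimal sketch of the cost vs benchmark perf comparison from the evals bullet above, in Python. Everything here is an assumption for illustration: the model names, costs, and scores are made-up placeholders rather than real benchmark numbers, and `EvalResult` and `pareto_frontier` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    model: str            # placeholder model name
    cost_per_task: float  # hypothetical cost per eval task, lower is better
    score: float          # hypothetical eval pass rate, higher is better


def pareto_frontier(results):
    """Keep only results that no other result beats on both cost and score."""
    frontier = []
    for r in results:
        dominated = any(
            o.cost_per_task <= r.cost_per_task
            and o.score >= r.score
            and (o.cost_per_task < r.cost_per_task or o.score > r.score)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    # Sort cheapest first so the tradeoff curve reads left to right.
    return sorted(frontier, key=lambda r: r.cost_per_task)


# Made-up numbers purely to show the shape of the comparison.
results = [
    EvalResult("model-a", 0.010, 0.62),
    EvalResult("model-b", 0.030, 0.81),
    EvalResult("model-c", 0.050, 0.78),  # costs more than model-b and scores lower
    EvalResult("model-d", 0.120, 0.90),
]

frontier = pareto_frontier(results)
for r in frontier:
    print(f"{r.model}: cost={r.cost_per_task:.3f} score={r.score:.2f}")
```

Anything that falls off the frontier (here the placeholder `model-c`) is an easy "no". The remaining choice is how much extra cost a higher score is worth for the feature in question.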
I strongly believe that context engineering adds substantial value and differentiation. As access to models becomes more widespread, AI leaders should prioritize context engineering and access to proprietary data sources.
Here’s why context engineering is such a big deal.
We just spent 2 hours debating when an agent should rely on its internal knowledge vs. trying to find relevant context within data for just one type of question. We got through 2 test cases of hundreds.
Even the people involved in the brainstorm couldn’t all agree on what they would expect humans to do in this situation. There truly was no right answer, and it’s always context-specific, customer by customer.
Everything in context engineering is a tradeoff between a variety of factors: how fast do you want the agent to answer a question, how much back and forth interaction do you want to require for the user, how much work should it do before trying to answer a question, how does it know it has the exhaustive source material to answer the question, what’s the risk level of the wrong answer, and so on.
Every decision you make on one of these dimensions has a consequence on the other end. There’s no free lunch. This is why building AI agents is so wild.
It also highlights how much value there is above the LLM layer. Getting these decisions right directly relates to the quality of the value proposition.
A "t" makes a big difference.
cloud-native vs. cloud-naive
Had a conversation on this topic with a technology leader at an investment firm. #Cloud #CloudComputing
Every team has its ups and downs. Our cloud deployment product had a down week. For now, things are stable (somewhat). We will retrospect next week, write and review COEs, and figure out an execution plan to make our product better. #battlescars #feedback #learn #improve
No first-version of a successful software product was ever built by more than 10 engineers. Agree/disagree?
My mentor, Rajeev Motwani, who also mentored the founders of Google, used to say that the ideal size for a first-version product team is 5 to 7 people. Just like in a startup, these lean teams should consist of driven self-starters who thrive in a highly independent environment. These people also have to have the right skills and talent.
If a team requires more than 10 members to function, Rajeev used to say you haven't chosen the right team.