Hear me out: what if Zuck spending billions on a handful of key AI researchers is not irrational, but very much rational?
What if Meta is set to eliminate OpenAI as a large advertising business before it can grow too large and threaten Meta?
From today's @Pragmatic_Eng
The hype around AI and agents right now is getting out of hand. Yes, these technologies enable myriad wonderful new use cases, but LLMs and agents are, at their core, non-deterministic and behave that way.
Aside from chat, where this behavior fits well, building products around these technologies is not a slam dunk. It will take time and effort, and I don't buy the "AI will just figure it out soon" argument either.
I'll be happy to be proven wrong when it happens, and I do believe it will happen, but not in the timeframes the media is currently frothing over. As much as I'd love to be done coding in the next 6-12 months, that's just not going to happen, and that's coming from someone who maximizes his use of agents & LLM-powered-autocompletion when coding.
Coding as a skill will look totally different in the next 5 to 10 years, though!
My notes on the Apple Intelligence all hands meeting:
- not held by JG, Tim, or Craig
- AI only works 66-80% of the time (that's VERY low, the last 20% is 80% of the work, even 95% is un-shippable) So it basically didn't exist at WWDC. Ouch.
- also blamed MarCom, fairly, for the tv ad
- they *aim* to be ready in a year, but have doubts
- JG and Craig feel personally responsible
- he *actually demo'ed* a few smaller tasks
- it'll be ready when it's ready
- "As of Friday, Apple doesn't plan to immediately fire any top executives over the AI crisis". Notice the word immediately.
- "It has discussed moving more senior executives under Giannandrea to assist with a turnaround effort." That's setup for "this was the plan, but we mutually decided new leadership would be beneficial"
Interesting times! Good luck to that team. Sounds like a long way to go.
https://t.co/M32lnr0heC
Evaluating quality is hard; doing so automatically with LLMs is...
I've been working on using LLMs to help evaluate quality for a couple of years, and I've noticed something critical: LLMs are good at objective judgments when given strict constraints.
When trying to judge something subjective, though, LLMs quickly go off the rails.
For example, using an LLM to evaluate the quality of a relatively small piece of code is straightforward:
- Does it match general style guides for that language?
- Does it compile?
- Does it, given sample inputs, produce the expected output?
Or perhaps a more subjective but still tractable example: Is a particular search result on topic for the given query? Especially with some of the more recent, larger models, such a judgment can align nicely with what a human would evaluate.
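To make the objective end of that spectrum concrete, here is a minimal sketch of two of the checks from the list above (does it compile, and does it produce the expected output for sample inputs). It assumes the code under evaluation is a small Python snippet; the names `compiles`, `passes_samples`, and `evaluate` are hypothetical, and the style-guide check is left out. This is not my actual setup, just an illustration of why these judgments are easy to pin down.

```python
from typing import Any


def compiles(source: str) -> bool:
    """Return True if the snippet parses and byte-compiles as Python."""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def passes_samples(
    source: str, func_name: str, samples: list[tuple[tuple, Any]]
) -> bool:
    """Run the snippet, then compare func_name's results to the expected values."""
    namespace: dict[str, Any] = {}
    try:
        # Assumption: the snippet is trusted or already sandboxed.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in samples)
    except Exception:
        # Missing function, runtime error, and so on all count as a failure.
        return False


def evaluate(
    source: str, func_name: str, samples: list[tuple[tuple, Any]]
) -> dict[str, bool]:
    """Combine both objective checks into one report."""
    ok = compiles(source)
    return {
        "compiles": ok,
        "passes_samples": ok and passes_samples(source, func_name, samples),
    }


candidate = "def add(a, b):\n    return a + b\n"
report = evaluate(candidate, "add", [((1, 2), 3), ((0, 0), 0)])
print(report)
```

Each check has a yes-or-no answer that doesn't depend on taste, which is what makes this kind of judgment tractable.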
However, using an LLM to judge an LLM-powered chat agent and ensuring each of its messages meets a specific quality bar is nearly intractable.
A lot of work is ongoing around the industry to make such evaluations more tractable. Still, even when the agent is constrained to specific types of conversations, such evaluations are currently tough. I'm very curious, though, to see where this goes over the next few months and years!
Mardi Gras is such a fun reminder that as much as AI is awesome and is reshaping how we do work, there's so much that AI won't be replacing anytime soon.
That said, while productivity is accelerating right now, I'm reminded of a key skill that I've always leaned on and feel is becoming ever more critical: the ability to multitask.
A good amount of research shows that when you multitask, you do worse at each of the things you're switching between, but there's mounting evidence that multitasking isn't all bad.
Especially with work environments requiring more task switching and the ability to get more done with less, doing a couple of things simultaneously is becoming more critical. As a Senior Staff Software Engineer at LinkedIn, I have a lot of meetings throughout my day, and while I need to be present at all the ones I join, I'm not driving all of them.
For the ones where I can primarily listen, I can very easily get some coding work done, especially when I've already thought through what I want to do before the meeting. It's just a question of making the change, running the build, and quickly identifying what to change again, as the build inevitably fails for a plethora of potential reasons.
You can't multitask everything. I wouldn't write code and a design document at the same time; that's not going to work. But while sometimes my coding work requires my 100% focus, sometimes I can sneak in a few messages or emails while something builds or I install a new dependency. Strategic multitasking is the name of the game for me because it helps me be more efficient with my time and get more done.
Wow, has that been a brutal experience! Perhaps I haven't found the right tools yet, so if you have a recommendation, please do share :)
I've tried MetaGPT, having Cursor build it for me, or even just ChatGPT o1 Pro -- none have been able to build anything usable for me. And yes, I'm paying $200/month for ChatGPT Pro. It's been the best AI product I've found thus far, and it's worth the cost of getting access to the latest releases. Deep Research has been fun to try, but it can't really write code!
O1 Pro mode has been the most useful in building my portfolio website, but even then, I've had to drastically simplify my goals for it to make any progress. So far, the most success I've had is using ChatGPT 4o or o3-mini-high to answer questions I have as I go. AI has definitely been a productivity boost for that use case, but it's helping me; I wouldn't even say I'm helping it yet.
Social media seems to think that AI can already build end-to-end applications, and while I'm bullish on AI, there's just so much hype right now that it's hard to trust most of the headlines. We will continue seeing amazing new products land over the weeks and months to come, but sometimes, I wish the hype cycle died down so we could all be more realistic.
Am I missing something here? Have you been able to use an AI tool or agent to build an end-to-end project?
A couple reflections on the quantum computing breakthrough we just announced...
Most of us grew up learning there are three main types of matter that matter: solid, liquid, and gas. Today, that changed.
After a nearly 20-year pursuit, we've created an entirely new state of matter, unlocked by a new class of materials, topoconductors, that enable a fundamental leap in computing.
It powers Majorana 1, the first quantum processing unit built on a topological core.
We believe this breakthrough will allow us to create a truly meaningful quantum computer not in decades, as some have predicted, but in years.
The qubits created with topoconductors are faster, more reliable, and smaller.
They are 1/100th of a millimeter, meaning we now have a clear path to a million-qubit processor.
Imagine a chip that can fit in the palm of your hand yet is capable of solving problems that even all the computers on Earth today combined could not!
Sometimes researchers have to work on things for decades to make progress possible.
It takes patience and persistence to have big impact in the world.
And I am glad we get the opportunity to do just that at Microsoft.
This is our focus: When productivity rises, economies grow faster, benefiting every sector and every corner of the globe.
It's not about hyping tech; it's about building technology that truly serves the world.
AI replacing software engineers is something I think about a lot because the recent advances with LLMs have made them quite capable of writing code.
That said, for real applications (and not little demos), writing the code isn't really where effort is spent. The hardest part is always fixing an esoteric issue in an existing codebase or performing a migration for some dependency that had been working just fine for years.
AI will be able to help in those domains, too, but we are far from that. AI, as it's thought of today, is just a massive LLM that's ingested what's publicly out there on the Internet. The vast majority of real work happens in private spaces and networks, though, meaning that today's LLMs don't have access to this data.
So, while it's frustrating, companies restricting access to LLMs at work does make sense to me, but finding the right balance is tricky because LLMs are absolutely productivity boosters. Trust will continue to play a massive role because companies will need to trust another company telling them their LLM will not record and train on the data being passed through, or we'll need ways to validate that in some technical way (which sounds very hard).
I'm excited to code less and hand off more of that to LLMs over time, but as of today, there's still a ton of guidance I need to provide, and I don't see that exponentially decaying as the models get exponentially better. My head may be in the sand, though.
Maintaining software is counter-intuitive to a lot of people. Without wading too deep into the "Is software engineering a real engineering discipline" debate, all forms of engineering come with maintenance requirements. Civil engineers think about road repairs; mechanical engineers think about the wear and tear of materials that degrade with use.
In software, while there isn't a physical substrate that needs maintenance (yes, there's hardware, but let's bucket that under electrical engineering), maintenance is still required in most cases. Given a specific set of requirements, though, a highly skilled software engineer can build a system that requires almost no maintenance, which is one of the beautiful things about software engineering.
That said, such sets of requirements are rare, and in my experience, the set of requirements never stays static. This continued iteration means that there is almost always a maintenance cost to software, and the bigger the project, the more maintenance there is.
While it's not a fun topic to discuss because most software engineers would prefer just to build new stuff, maintaining what's been built is critical to enabling the building of new stuff. Working to minimize maintenance costs is, therefore, critical work that often gets overlooked but has such high leverage!
Beach vacations are fantastic. With this being my first post in 2025 and the longest break I've taken between posts in a few years, it's clear in hindsight that I was very close to burning out just before this Hawaii trip.
It's hard to see it while it's happening. I was just trying to keep up with work, my side business, and a variety of health issues for both my wife and me, all while maintaining my 100+-year-old home, with asshole neighbors making it as hard as possible to do so.
After a week in the sun (well, mostly sunny: we had some torrential rain near the end of our trip), it dawned on me just how over-stressed I was and how that had impacted everything. I was not showing up to work with the energy I typically do and wasn't making the progress I wanted on my side projects. This lack of productivity frustrated me, creating even more stress on an already overloaded system.
While I've talked about it before, it bears repeating: the snowball effect is very real. Whether it's positive or negative, things snowball very quickly if you're not paying attention.
As I return from my vacation, I'm both very excited to get back to improving LinkedIn's app speed and building an amazing Search experience for all LinkedIn members, and more aware of the signs of approaching burnout that I need to pay closer attention to. My focus is entirely on restarting my snowball, just down the positive hill.
Announcing The Stargate Project
The Stargate Project is a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States. We will begin deploying $100 billion immediately. This infrastructure will secure American leadership in AI, create hundreds of thousands of American jobs, and generate massive economic benefit for the entire world. This project will not only support the re-industrialization of the United States but also provide a strategic capability to protect the national security of America and its allies.
The initial equity funders in Stargate are SoftBank, OpenAI, Oracle, and MGX. SoftBank and OpenAI are the lead partners for Stargate, with SoftBank having financial responsibility and OpenAI having operational responsibility. Masayoshi Son will be the chairman.
Arm, Microsoft, NVIDIA, Oracle, and OpenAI are the key initial technology partners. The buildout is currently underway, starting in Texas, and we are evaluating potential sites across the country for more campuses as we finalize definitive agreements.
As part of Stargate, Oracle, NVIDIA, and OpenAI will closely collaborate to build and operate this computing system. This builds on a deep collaboration between OpenAI and NVIDIA going back to 2016 and a newer partnership between OpenAI and Oracle.
This also builds on the existing OpenAI partnership with Microsoft. OpenAI will continue to increase its consumption of Azure as OpenAI continues its work with Microsoft with this additional compute to train leading models and deliver great products and services.
All of us look forward to continuing to build and develop AI, and in particular AGI, for the benefit of all of humanity. We believe that this new step is critical on the path, and will enable creative people to figure out how to use AI to elevate humanity.