Richard was my first lead when I interned at Shopify and one of the earliest engineering leaders I looked up to. Thrilled to be working together again.
So I work at @Opendoor now. I've just finished my first real week there and I am super bullish on this company. I was convinced to stop building my own thing to join. I was hesitant at first, but it was definitely the right call. The opportunity is insane.
Will be my third tour of duty with @nejatian and as an engineer I've pretty much done the best work of my career working with him.
@sciohn_fhanne One doesn't work without the other.
Vibes set momentum, momentum creates companies, companies build economies.
Being cynical is a choice. Doing something about it is a choice. I'd rather we go hard for the city, succeed or fail, than point fingers.
Working alongside people who care this much is the part that doesn't show up in the chart. You feel it in every review, every iteration, every "let's make it better."
If you are graduating university and are about to join a consulting firm, don't do that. Do this instead.
Just send me a pic of your offer from one of the top 5 and we'll get you an offer to join Opendoor instead.
New Anthropic research: Project Deal.
We created a marketplace for employees in our San Francisco office, with one big twist. We tasked Claude with buying, selling and negotiating on our colleagues’ behalf.
In its early days, @Opendoor had one of the most cracked engineering teams in the Bay Area. Many of them left to create some of the best experiences currently available on the internet.
If you are one of them and want to come back, please DM me. We're back on mission.
Before limited-releasing Claude Mythos Preview, we investigated its internal mechanisms with interpretability techniques. We found it exhibited notably sophisticated (and often unspoken) strategic thinking and situational awareness, at times in service of unwanted actions. (1/14)
A CEO from one of our portfolio companies shared this with their team. I’m re-sharing it with their permission, because it resonated and reflects what all founders and CEOs should be communicating.
--
We are living through a period of compounding change. And in moments like this, the biggest risk is no longer making the wrong decision. It is moving too slowly while the world moves around you.
There are two paths. We can play defense:
- Protect what we have
- Optimize what works
- Wait for clarity
It feels safe. It isn’t.
Or we can play offense:
- Learn faster than the environment changes
- Use new tools to solve old problems in better ways
- And create entirely new strategies and businesses
That’s where the opportunity is.
Challenge yourself to do things faster and better than you have ever attempted. Stay uncomfortable. Stay on the front foot.
Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.
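For anyone checking the ~11% figure, here is the arithmetic as a tiny sketch. Only the two "Time to GPT-2" numbers come from the post; the function name is made up for illustration.

```python
def relative_improvement(before_hours, after_hours):
    """Fractional reduction in wall-clock time relative to the old value."""
    return (before_hours - after_hours) / before_hours

# Numbers quoted in the post: 2.02 hours before, 1.80 hours after.
improvement_pct = 100 * relative_improvement(2.02, 1.80)
print(f"{improvement_pct:.1f}% faster")
```

That comes out just under 11%, which matches the rounded figure in the post.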
This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This has been the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.
https://t.co/WAz8aIztKT
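The propose, evaluate, keep-if-better workflow described above can be sketched as a toy loop. This is not the autoresearch code: `evaluate` is a hypothetical stand-in for a training run that returns validation loss, `propose` is a random perturbation standing in for the agent (the real agent planned from the history of results, which this sketch does not do), and the hyperparameter names and target values are invented for illustration.

```python
import random

def evaluate(config):
    """Toy stand-in for a training run: returns a fake 'validation loss'."""
    # Hypothetical bowl with its minimum at weight_decay=0.1, init_scale=1.0.
    return (config["weight_decay"] - 0.1) ** 2 + (config["init_scale"] - 1.0) ** 2

def propose(config, rng):
    """Toy stand-in for the agent: nudge one hyperparameter at random."""
    candidate = dict(config)
    key = rng.choice(sorted(candidate))
    candidate[key] += rng.uniform(-0.1, 0.1)
    return candidate

def autoresearch(config, steps, seed=0):
    """Greedy loop: try a change, keep it only if validation loss improves."""
    rng = random.Random(seed)
    best_loss = evaluate(config)
    accepted = []  # (step, loss) for every change that was kept
    for step in range(steps):
        candidate = propose(config, rng)
        loss = evaluate(candidate)
        if loss < best_loss:
            config, best_loss = candidate, loss
            accepted.append((step, loss))
    return config, best_loss, accepted

start = {"weight_decay": 0.5, "init_scale": 0.5}
final_config, final_loss, accepted = autoresearch(start, steps=700)
print(f"kept {len(accepted)} of 700 changes, loss {evaluate(start):.3f} -> {final_loss:.5f}")
```

Most proposals get rejected and only a handful are kept, which is the same shape as ~700 attempted changes yielding ~20 that stuck.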
All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.
And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.
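A minimal sketch of the "promote the most promising ideas to increasingly larger scales" step, under loud assumptions: `promote`, `toy_loss`, the idea names, and the gain values are all hypothetical, and the toy loss simply assumes that a gain found at a small scale transfers unchanged to a larger one. Only the depth=12 and depth=24 scales come from the post.

```python
def promote(ideas, evaluate_at, scales, keep_fraction=0.5):
    """Score ideas at each scale in turn, keeping only the best fraction each time."""
    survivors = list(ideas)
    for scale in scales:
        # Lower loss is better, so sort ascending and keep the front of the list.
        ranked = sorted(survivors, key=lambda idea: evaluate_at(idea, scale))
        keep = max(1, int(len(ranked) * keep_fraction))
        survivors = ranked[:keep]
    return survivors

def toy_loss(idea, depth):
    """Hypothetical proxy metric: deeper models score better, ideas subtract a fixed gain."""
    return 4.0 - 0.05 * depth - idea["gain"]

ideas = [
    {"name": "a", "gain": 0.01},
    {"name": "b", "gain": 0.05},
    {"name": "c", "gain": 0.03},
    {"name": "d", "gain": 0.00},
]
winners = promote(ideas, toy_loss, scales=[12, 24])
print([idea["name"] for idea in winners])
```

The cheap small-scale runs do most of the filtering, so the expensive large-scale runs are spent only on the ideas that survived.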
Just wrapped up our fourth iteration of Builder Sundays at the Toronto @Shopify port, where builders gave amazing presentations on their projects, companies, and side gigs.
For all builders and early stage entrepreneurs in the 6ix, join us for the next session from 12-5pm Aug. 11th!
Yesterday I saw the wildest CEO talk maybe ever, and finally learned what Tobi was building that Saturday in May.
So Shopify kicked off their 2024 Summit, and Tobi was the first speaker. He gave a short talk; it was just ok.
Typical CEO speech, not bad, but kind of meh (he rated it 7/10).
Then came the record scratch. Apparently Tobi had struggled to write his talk, so he had instead BUILT A TEAM OF AI EMPLOYEES TO DO IT FOR HIM.
They were AI agents with specialized skills, each with a specific role, working both individually and in collaboration, in a virtual office where they’d periodically convene in the conference room to brainstorm and task each other. There was even a virtual water cooler for spontaneous encounters (which would happen because the “employees” were programmed to feel thirsty and get water).
That team had autonomously conceived, drafted, researched, written, edited, and designed the slides for Tobi’s talk.
So then the second half of the talk was Tobi walking through his thinking, each step of building the agents, and what he learned while tinkering. The language wasn’t flowery, the delivery wasn’t rehearsed, the slides weren’t flashy. And yet the energy in the room was absolutely electric, the air was crackling. Same vibe as the Top Gun 2 scene where Tom Cruise flies the route himself.
Obviously a great talk can be super motivating for employees, but it’s another level when they’re reminded that the leader of the whole enterprise is a master of their craft, someone who’s still a practitioner and not just a manager.
Nothing like it.
A Shopify friend gave me his bib for the Tamarack Ottawa marathon about a month ago and I didn't have much time to train. Aimed for a 3:45 but felt really good so decided to push it.
Fantastic weather and great race conditions. PB’d with 3:30.