Introducing Pantograph. We're building a preschool for robots: they teach themselves through exploration, failure, and curiosity.
What we're building and why: https://t.co/S6LJM05bUW
Pantograph is building robots that learn through self-supervised RL at unprecedented scale.
We're hiring a software engineer to work on our core robot stack and testing infrastructure, including controller logic, component testing, and data collection pipelines. Rust and embedded software experience is a big plus, but mostly we're looking for a relentlessly curious and capable generalist who is excited to learn and get robots out into the world.
Computer use models shouldn't learn from screenshots.
We built a new foundation model that learns from video like humans do. FDM-1 can construct a gear in Blender, find software bugs, and even drive a real car through San Francisco using arrow keys.
Weird opp??? Mox used to be a city records center, so we inherited a pretty legit server room.
- 120A @ 240V
- 5 ton cooling unit
- 100kW diesel genny w/ 1000-gal tank
- 2.5Gb sym fiber
Can get it live in ~1 month. Who needs serious on-prem infra in SF?
These are even cooler in person! Robotics, like most truly interesting things, is really data-limited, and new techniques for these domains will be super important.
@gwern What do you imagine such a process being applied to at that level of overhead? Even at 10x overhead I have a hard time coming up with applications
I wonder what you would get if you trained something Cycle-GAN-like between images and music. Probably possible today with the quality of generative models we have!
Devtools for AI Agents
@dessaigne
AI agents are the next wave: autonomous tools that reason, decide, and amplify human productivity. We’re funding startups building devtools for agents, whether you’re creating agent builders or building blocks to perform complex tasks.
Feels like a good time to start a computer control startup. The methods are generally known (RL on top of base models), and it probably doesn't require that much compute, just thoughtful environment design. I would probably start with a text-only representation of websites.
I hope that somebody starts a company to make an AI-native smartwatch. It feels to me like the ideal form factor for most of what I want a language model to do.
Hey friends, we're excited to announce that an additional 2,000 H100s will be added to @sfcompute's on-demand market.
It's the largest* interconnected cluster, from any provider (including hyperscalers), that you can get on a per-hour basis. You're not locked in with San Francisco Compute.
If DeepSeek can compete with OpenAI using 2,000 H800s, you too can train a state of the art RL model without ever having to sign a long-term contract that you can't exit. You could have trained DeepSeek-v3 for $4.5m for 1.5mo on SFC or $35m if you could only buy a 1 year contract off market.
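A rough back-of-envelope check on those two figures, as a sketch only. It assumes the run uses all 2,000 GPUs for the full 1.5 months, that a month is 30 days, and that the one-year contract is priced at the same hourly rate; none of these are stated in the post, and the function names are made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: 30-day months for simplicity


def gpu_hours(gpus: int, months: float) -> float:
    """Total GPU-hours for `gpus` GPUs held for `months` months."""
    return gpus * months * DAYS_PER_MONTH * HOURS_PER_DAY


def implied_hourly_rate(total_cost: float, gpus: int, months: float) -> float:
    """Per-GPU-hour price implied by a total cost over a rental period."""
    return total_cost / gpu_hours(gpus, months)


GPUS = 2000  # assumption: the whole cluster is used for the run

# $4.5m for 1.5 months of on-demand use (figures from the post).
rate = implied_hourly_rate(4.5e6, GPUS, 1.5)

# Same GPUs at the same rate, but forced to pay for a full 12 months.
one_year_cost = rate * gpu_hours(GPUS, 12)

print(f"implied rate: ${rate:.2f} per GPU-hour")
print(f"one-year contract at that rate: ${one_year_cost / 1e6:.0f}m")
```

Under these assumptions the implied price is a little over $2 per GPU-hour, and paying for twelve months instead of 1.5 multiplies the bill by 8, which lands in the same range as the $35m quoted above.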
This was the dream Alex & I had since our audio model company (Junelark) died because it couldn't procure enough GPUs, and it's what we've been working towards for nearly two years.
Long-term contracts are a trap; they make it so only the biggest of the big can compete in AI. They force startup founders to raise at massive valuations pre-revenue, which dilutes founders and employees and sets them up to fail when they can't raise their next round.
This cluster will roll out over the next few weeks as we scale our infrastructure. Soon you'll be able to access it via our managed Kubernetes service or by reaching out to set up a custom solution.
We're also exploring other ways of partnering with service providers to let them offer GPU-based services, like workers and inference endpoints, without being forced into a long-term contract with a hyperscaler. You no longer need to bet your company on GPU prices to offer GPU-based services.
* We think! If you know of a larger one, please correct us!
One part of SF Compute we haven’t talked about very much yet is that post-AGI (presumably soon), the models will want to train more models.
(Really, people will ask the first models to train more models, or perhaps to solve tasks that would benefit from, say, some custom RL).
It will probably be most natural for those models to buy compute from a liquid market, where they can get precisely the compute they need for each run they need to do.
Has anyone tried “sub-token attention”? Artificially increase the sequence length by including K copies of each token next to each other (say, each linearly projected by a different map), and let the different copies attend to each other. True self-attention :P
(And then, at the output, project the copies back to a single token to combine them.)
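A minimal NumPy sketch of what this could look like, to make the idea concrete. Everything beyond the description above is an assumption: a single attention head, a token-level causal mask (copies of a token can see each other and all copies of earlier tokens), and merging by concatenating the K copies and applying one linear map. The function and parameter names (`sub_token_attention`, `copy_maps`, `w_out`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sub_token_attention(x, copy_maps, w_q, w_k, w_v, w_out, causal=True):
    """x: (T, d); copy_maps: (K, d, d); w_q, w_k, w_v: (d, d); w_out: (K*d, d)."""
    T, d = x.shape
    K = copy_maps.shape[0]

    # Make K copies of each token, each through its own linear map: (T, K, d).
    copies = np.einsum("td,kde->tke", x, copy_maps)
    # Flatten so the K copies of a token sit next to each other: (T*K, d).
    seq = copies.reshape(T * K, d)

    # Ordinary single-head self-attention over the longer sequence.
    q, k, v = seq @ w_q, seq @ w_k, seq @ w_v
    scores = q @ k.T / np.sqrt(d)
    if causal:
        # Token index of each position; block copies of later tokens.
        tok = np.arange(T * K) // K
        scores = np.where(tok[None, :] > tok[:, None], -np.inf, scores)
    attn = softmax(scores, axis=-1)
    out = attn @ v  # (T*K, d)

    # Project the K copies of each token back to a single vector: (T, d).
    merged = out.reshape(T, K * d) @ w_out
    return merged, attn


rng = np.random.default_rng(0)
T, d, K = 4, 8, 3
x = rng.normal(size=(T, d))
copy_maps = rng.normal(size=(K, d, d)) / np.sqrt(d)
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
w_out = rng.normal(size=(K * d, d)) / np.sqrt(K * d)

y, attn = sub_token_attention(x, copy_maps, w_q, w_k, w_v, w_out)
print(y.shape, attn.shape)  # (4, 8) (12, 12)
```

The attention matrix is (T*K) by (T*K), so the attention cost grows by roughly K squared, which is the obvious price of the trick.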