Yash

@coldifl

Co-founder Intelligence Factory (YC P26). Tech/Robots/NFL

San Francisco, CA

Joined April 2026

80 Following

91 Followers

24 Posts

Yash @coldifl

about 17 hours ago

@drxcliu @ycombinator happy to host you at our office, we can’t show everything in the launch

Yash @coldifl

2 days ago

The standard fix is a data flywheel: deploy, collect data, retrain, redeploy. But this is slow and expensive. We should be able to make this data flywheel much faster by patching models with all the edge-case data at once rather than waiting for edge cases to happen one by one.

170

Yash @coldifl

2 days ago

The deployment flywheel in robotics is broken. In the real world, robots don't compete with other robots - they compete with humans. And no business pays for a robot that works 80% of the time when their current system works 100% of the time. (1/2)

https://t.co/ATkuq2KqeH

305

Yash @coldifl

2 days ago

@Theonash_ @ycombinator @lukas_m_ziegler Bro we are not egocentric data collection😭

Yash @coldifl

2 days ago

@bygregorr @ycombinator It’s less about the hardware and more about the sensors. Waymo transfers its sensors across cars, while the hardware can go from Jaguar to Kia

Yash @coldifl

2 days ago

6 months of work. Happy to finally launch Intelligence Factory! Just getting started.

Y Combinator

@ycombinator

2 days ago

Intelligence Factory is building human intelligence for robots. They train general-purpose manipulation models on human demonstration data (vision, action, and touch), then deploy them in warehouses, grocery stores, and data centers. Congrats on the launch, @coldifl and Jalaj! https://t.co/fj1UJtiZxx

225

42K

coldifl retweeted

Ayman Saleh

@sir_aymansaleh

15 days ago

Welcome to the cap @sama You’re welcome

Yash @coldifl

17 days ago

We live in a bubble in SF. We need to understand that this is not what the world is like. Starting to believe strongly that while being in the bubble has its perks, a successful company (especially robotics) needs to be grounded in reality.

Deedy

@deedydas

19 days ago

The vibes in SF feel pretty frenetic right now. The divide in outcomes is the worst I've ever seen. Over the last 5yrs, a group of ~10k people - employees at Anthropic, OpenAI, xAI, Nvidia, Meta TBD, founders - have hit retirement wealth of well above $20M (back of the envelope AI estimation). Everyone outside that group feels like they can work their well-paying (but <$500k) job for their whole life and never get there.

Worse yet, layoffs are in full swing. Many software engineers feel like their life's skill is no longer useful. The day to day role of most jobs has changed overnight with AI. As a result,

1. The corporate ladder looks like the wrong building to climb. Everyone's trying to align with a new set of career "paths": should I be a founder? Is it too late to join Anthropic / OpenAI? should I get into AI? what company stock will 10x next? People are demanding higher salaries and switching jobs more and more.

2. There’s a deep malaise about work (and its future). Why even work at all for “peanuts”? Will my job even exist in a few years? Many feel helpless. You hear the “permanent underclass” conversation a lot, esp from young people. It's hard to focus on doing good work when you think "man, if I joined Anthropic 2yrs ago, I could retire"

3. The mid to late middle managers feel paralyzed. Many have families and don't feel like they have the energy or network to just "start a company". They don't particularly have any AI skills. They see the writing on the wall: middle management is being hollowed out in many companies.

4. The rich aren’t particularly happy either. No one is shedding tears for them (and rightfully so). But those who have "made it" experience a profound lack of purpose too. Some have gone from <$150k to >$50M in a few years with no ramp. It flips your life plans upside down. For some, comparison is the thief of joy. For some, they escape to NYC to "live life". For others still, they start companies "just cuz", often to win status points. They never imagined that by age 30, they'd be set. I once asked a post-economic founder friend why they didn't just sell the co and they said "and do what? right now, everyone wants to talk to me. if i sell, I will only have money."

I understand that many reading this scoff at the champagne problems of the valley. Society is warped in this tech bubble. What is often well-off anywhere else in the world is bang average here. Unlike many other places, tenure, intelligence and hard work can be loosely correlated with outcomes in the Bay. Living through a societally transformative gold rush in that environment can be paralyzing. "Am I in the right place? Should I move? Is there time still left? Am I gonna make it?" It psychologically torments many who have moved here in search of "success". Ironically, a frequent side effect of this torment is to spin up the very products making everyone rich in hopes that you too can vibecode your path to economic enlightenment.

16K

13K

13M

133

Yash @coldifl

17 days ago

@GJarrosson Also robotics!

Yash @coldifl

17 days ago

I don’t understand why people treat EgoScale as the bible. It has a 55% success rate.

Junfan Zhu 朱俊帆 ✈️ CVPR

@junfanzhu98

18 days ago

🐝 @saturdayrobotic Robotics & World Models Reading Club 08 Recap: Embodied Human Data as the “Internet of Motion and Behavior” keynote @ryan_punamiya, hosts @junfanzhu98, @aurorafeng_01.
Great Parallel: Egocentric Human Data = Internet for Robot Foundation Models 🤖
Jim Fan’s “Robotics’ Endgame” nails it: LLM pipeline (Pre-Train “Simulating” → SFT “Aligning” → Reasoning RL “Surpassing”) perfectly mirrors robotics World Modeling → Action Fine-Tuning → Physical RL. Egocentric video + pose + language = the scalable “human experience” corpus robots desperately need.

Bottlenecks are brutal: no counterfactuals, Swiss-cheese coverage gaps, teleop skill caps, and bounded human thought (only deliberate actions; subconscious micro-adjustments and collaboration missing). Early methods relied on point tracks (MotionTracks/ATM), value functions (VIP/V-PTR on Ego4D 18k scenes + Bridge 3k clips/150k trans), and repr. learning (MVP/R3M: time-contrastive + video-lang alignment + L1 sparsity). Still brittle multi-stage pipelines.

Mimicplay (2023) fixed the bridge: multi-view human play → latent planner 𝒫 (goal g_t^h + current o_t^h) → GMM decoder → 3D hand traj l_t. Stage 2 freezes 𝒫 and adds tiny robot data (wrist/proprio) to train policy π. Result: true zero-shot testing from human goals. This lets robots learn complex play without hand-crafted labels.

EgoMimic (ICRA’25) goes further: treat human hand as “just another robot.” Unified co-trained ACT policy: masked obs + hand p_t^H + wrist + robot ^R p_t / ^R R_t → shared vision encoder + norm layers → ACT trunk → Cartesian (3D pose) + joint losses. Human/robot/shared streams in one model.

The 4× embodiment gap (kinematic DoF/morphology, kinodynamic speed, tactile sensing, visual + partial observability) explains why naïve transfer fails. Human pretraining = monocular noisy pose/occlusions; robot deployment = rich proprio/calibrated sensing.

EgoBridge solves alignment with Joint Optimal Transport on latent+action distributions. Soft supervision via cost function retains geometry and marginals. KL/MMD baselines collapse (pick/place clusters disjoint; W₂ drops 8.704 → ~0). It aligns not just representations, but controllable trajectory distributions — enabling new behaviors where everything else fails. Human data supplies semantics/diversity; small robot grounding anchors it into executable control. (Flow-matching adjacent.)

Hardware co-design closes the rest: EMMA (shared Xformer: Nav/Phase/EEF/Joints heads + Aria Glasses/ViperX Arms/AgileX Tracer/Realsense D405 + retargeting) = zero-shot mobile nav. DexUMi (hand exo-skeletons) boosts throughput: 11 → 36 → 51 trajectories in 15 min (teleop vs bare hand).

H2R proves cross-embodiment works: robot primitives + human semantic composition (“big items bottom, small top”) → 8× autonomous toolbox packing.

EgoVerse scales the data flywheel: 79,692 episodes / 1,362 h / 240 scenes / 1,965 tasks. Multi-lab → EC2+Ray+EgoDB (dense language). Consortium (GT/Stanford/UCSD/ETH) shows +4× autonomous when mixed with in-domain robot data.

EgoScale delivers the recipe: 20,854 h human pre-training → 50 h human + 4 h aligned mid-training → one-shot dexterous post-training (syringe/tong/unscrew/fold). Scaling laws: operators + scene diversity >> raw demonstrations (fixed budget). Long-horizon tasks mirror LLM context scaling — subtask explosion demands exponential behavioral diversity.

DreamDojo turns human data into a behavior/motion world model: pre (In-lab/EgoDex/HV) → robot post (GR-1/G1/AgiBot/YAM) → autoregressive distillation → Student for eval/planning/teleop/unseen envs. Human data = “Internet of the physical world” — when properly grounded.

Eval crisis remains real: MSE/validation loss correlates weakly (multimodal actions invalidate single-target assumptions). Need closed-loop + dense procedural language (“right pinky rotate bottle 90° CW”) for entropy reduction. Missing: tactile, hesitation, collaboration.

Hot takes: Diversity > repetition. We generate data faster than we can study it. Full-stack hardware+algo+data co-design is non-negotiable. Omni-models enable in-context preference. Sim2real still hard. Observability alignment is first-class.

Robotics is no longer about collecting more data — it is about aligning embodiment manifolds. Human semantics + robot grounding = executable physical intelligence. The Great Parallel is no longer theory. It’s engineering.

15K

136

Yash @coldifl

28 days ago

This is a great watch! Scaling data (what type of data?) and building the infrastructure required for it are among the biggest bottlenecks in robotics. Extra cool that EgoScale was one of the first papers that validated our thesis. https://t.co/kjMqzekxNT

132

Yash @coldifl

28 days ago

@thejesonlee would love to join!! Building general purpose robotics (current YC batch)

Yash @coldifl

28 days ago

Awesome thesis! Super excited to launch Intelligence Factory soon

Lukas Ziegler

@lukas_m_ziegler

28 days ago

🚨 BREAKING: Genesis AI has just launched GENE-26.5, what the company is calling the first robotic brain to give robots human-level physical manipulation capabilities. The demo video alone is extraordinary, robots cooking 20-step meals, solving Rubik's Cubes mid-air, playing piano, and conducting lab experiments with delicate instrumentation. All with human-level dexterity. @gs_ai_ has built a dexterous robotic hand that exactly mirrors the human hand, paired with a data collection glove with tactile-sensing electronic skin. When a human wears the glove, every movement maps 1:1 to the robotic hand. This closes the embodiment gap. In this case human skill transfers DIRECTLY to a robot at scale. The economics are game-changing too, the glove is 100x cheaper than typical options and delivers 5x greater data collection efficiency vs traditional teleoperation. That makes continuous large-scale robotics training viable for the first time. The company has raised $105M in seed funding backed by Eclipse, Khosla Ventures and Bpifrance, with Eric Schmidt and Xavier Niel among the strategic angels. And the first general-purpose robot is coming soon. Co-founded by @zhou_xian_ and @theo_gervet, this is a full-stack robotics company controlling every layer, AI, hardware, simulation, and data. That's a serious moat. ~~ ♻️ Join the weekly robotics newsletter, and never miss any news → https://t.co/GoA3ZuwoPB

175

12K

Yash @coldifl

28 days ago

@chris_j_paxton @gs_ai_ This makes sense for pre-trained models, but post-training is still unsolved. I imagine that when they begin deploying, they’ll see multiple edge cases, and the data flywheel approach takes months to hit the metrics a human would

Yash @coldifl

28 days ago

@sir_aymansaleh Appreciate it😂😂

Yash @coldifl

about 2 months ago

@chris_j_paxton Egocentric data is not enough. It is only one of the modalities; you cannot solve generalized robotics without action and tactile information

Yash @coldifl

about 2 months ago

The way we learn any task - by seeing, doing, and feeling - is how we train robots. We're building general intelligence that works on any robot and fits seamlessly within our world. In the coming weeks, I will share more about our approach as we deploy across verticals. (2/2)

Yash @coldifl

about 2 months ago

Intelligence Factory is backed by @ycombinator and @CommaCapital. With hardware converging, intelligence is the biggest bottleneck. Our world was built for humans and we're giving robots the ability to reason and act like us. (1/2)
