If you are interested in learning more about Genesis World, and our thoughts on the role of simulation for robotics research going forward, see our technical blog post: https://t.co/f1VOCtuleV
@lukas_m_ziegler Real egocentric + teleop data from actual German pro chefs, and our own end-to-end sensorimotor model that masters complex dexterous tasks, starting with full burger assembly in existing restaurants.
Hey Mikhail, this is close to what we're exploring. We have access to professional kitchen/hospitality environments in Germany, skilled operators, and repeatable real-world tasks. We could potentially support both teleoperation data production and egocentric video data collection if the hardware/setup is defined.
Genesis AI's cooking demo is so special because their physics engine actually gets soft materials and delicate contact right.
Eggs, tomatoes, etc.: the kind of fine manipulation where most simulators (including Isaac) still feel artificial.
Huge advantage for training real dexterous robots.
This is so impressive!
We just founded an egocentric video company that captures exactly this kind of high-quality, real-world manipulation data, recorded with professional chefs and skilled workers in actual German fine-dining kitchens and manufacturing environments. Perfect for training the next generation of VLAs!
100%. Pretraining is a race to the bottom, but verified post-training data with proven three-nines (99.9%) reliability is still wide open.
We're already doing the hard part: real expert egocentric data from German Facharbeiter (skilled workers) + chefs, tagged with full hand-pose, depth, and action annotation. Getting the team to run a robot and post-train is a different beast, though. But we'll do it anyway.
Introducing SubQ - a major breakthrough in LLM intelligence.
It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA),
and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention).
Only a small fraction actually matter.
@subquadratic finds and focuses only on the ones that do.
That's nearly 1,000x less compute and a new way for LLMs to scale.
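To make the scaling argument concrete, here is a back-of-envelope sketch. It is not SubQ's actual method: the post does not say how SSA selects which relationships to keep, so the sketch simply assumes each query attends to a fixed budget of keys. The name `keys_per_query` and the value 1,000 are hypothetical. A fixed budget makes the cost linear in sequence length, which is one simple way of being sub-quadratic.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by standard attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored if each query attends to at most `keys_per_query` keys.

    `keys_per_query` is a hypothetical budget, not a published SSA parameter.
    """
    return n_tokens * min(keys_per_query, n_tokens)


def reduction_factor(n_tokens: int, keys_per_query: int) -> float:
    """How many times fewer pairs the sparse variant scores than the dense one."""
    return dense_pairs(n_tokens) / sparse_pairs(n_tokens, keys_per_query)


if __name__ == "__main__":
    # Assumed budget of 1,000 keys per query, chosen only for illustration.
    for n in (10_000, 1_000_000, 12_000_000):
        print(f"{n:>12,} tokens: {reduction_factor(n, 1_000):,.0f}x fewer pairs")
```

Under this assumption the saving grows with context length, so any single "Nx less compute" figure only holds at one particular sequence length and key budget.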
We see it the same way. There's clearly a lot of demand for this kind of data, but the supply of unverified or lower-quality data has grown very fast, pushing prices down for commodity stuff.
We're trying a different route. We focus on highly curated data from real expert environments in Germany (skilled workers and professional kitchens) and put a lot of effort into a rigorous annotation pipeline (hand pose, depth, action boundaries, object states, etc.).
What do you think about this approach? Do you see it holding up in the current market, or do you think even high-quality verified data is coming under pressure?
@aaronwetzler @YorkYang5050 Right, we have DPVO for 6-DoF device pose in the pipeline, with DROID-SLAM as an alternative. Agreed it's essential for lifting 2D keypoints into world-frame 3D. Any method you'd recommend for monocular SLAM on egocentric wide-angle footage?
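For context, the lifting step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code. It assumes an undistorted pinhole camera (wide-angle footage would need undistortion first), a per-keypoint depth measured along the optical axis, and a camera-to-world pose `(R_wc, t_wc)` from whichever SLAM/VO system is used, with depth and pose in the same scale. All function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pixel (u, v) with z-depth -> 3D point in the camera frame (pinhole model)."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def camera_to_world(p_cam: Vec3,
                    R_wc: Sequence[Sequence[float]],
                    t_wc: Sequence[float]) -> Vec3:
    """Apply the 6-DoF camera-to-world pose: p_world = R_wc @ p_cam + t_wc."""
    x, y, z = (
        sum(R_wc[i][j] * p_cam[j] for j in range(3)) + t_wc[i]
        for i in range(3)
    )
    return (x, y, z)


def lift_keypoint(u: float, v: float, depth: float,
                  fx: float, fy: float, cx: float, cy: float,
                  R_wc: Sequence[Sequence[float]],
                  t_wc: Sequence[float]) -> Vec3:
    """Lift one 2D keypoint (e.g. a hand joint) into the world frame."""
    return camera_to_world(backproject(u, v, depth, fx, fy, cx, cy), R_wc, t_wc)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # A keypoint at the principal point, 2 units ahead of a camera located at (1, 2, 3).
    print(lift_keypoint(320, 240, 2.0, 100, 100, 320, 240, identity, (1.0, 2.0, 3.0)))
```

With a monocular system the trajectory is only recovered up to scale, so the depth source and the pose translation would need to be brought to a common scale before this step gives meaningful world coordinates.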