@cyberfund hi, I just submitted my application, the hidden fishy one. the interview agent crashed so the deck couldn't be uploaded. how can I re-upload my deck to you?
@stablequan @arcee_ai @ollama @UnrealEngine that’s so much cooler than current convai! Do you have a GitHub repo? And does it support stylized, customized characters?
@SebAaltonen so will there be a Vulkan AI community? Vulkan is cross-platform and optimization-oriented, so at first glance many small businesses and individual devs should consider it when migrating to AI, especially for products that involve graphics rendering.
With a first-of-its-kind architecture, SPAR3D combines precise point cloud sampling with advanced mesh generation to deliver unprecedented control over 3D asset creation. To learn more about the underlying technology, you can read the full research paper on our blog. (3/3)
The paper introduces Genie 2, a large-scale foundation world model that can generate an endless variety of action-controllable, playable 3D environments. This is intended to enable future AI agents to be trained and evaluated in a limitless curriculum of novel worlds.
Genie 2 demonstrates various emergent capabilities, such as object interactions, complex character animation, physics simulation, and the ability to model the behavior of other agents. It can generate consistent worlds for up to a minute and supports diverse perspectives like first-person, isometric, and third-person views.
The authors suggest that Genie 2 could enable future agents to be trained and evaluated in a limitless curriculum of novel worlds, overcoming the traditional bottleneck of available training environments. It also enables rapid prototyping of diverse interactive experiences, which can accelerate the creative process for environment design and research.
full research: https://t.co/fwCdX4jIpI
Exciting updates on Project GR00T! We discovered a systematic way to scale up robot data, tackling the biggest pain point in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let’s break it down:
1. We use Apple Vision Pro (yes!!) to give the human operator first person control of the humanoid. Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body like the Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.
2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen’s keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placement. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.
3. Finally, we apply MimicGen, a technique to multiply the above data even more by varying the *motion* of the robot. MimicGen generates a vast number of new action trajectories based on the original human data, and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset.
To sum up, given 1 human trajectory with Vision Pro
-> RoboCasa produces N (varying visuals)
-> MimicGen further augments to NxM (varying motions).
This is the way to trade compute for expensive human data, using GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits.
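To make the 1 -> N -> NxM arithmetic concrete, here is a rough toy sketch of the bookkeeping, not the actual pipeline. Everything in it is hypothetical: `Trajectory`, `vary_scenes`, `vary_motions`, `build_dataset`, and the `success_rate` value are made-up names and numbers for illustration, and none of them are RoboCasa or MimicGen APIs. The success check is just a random draw standing in for "did the robot drop the cup in simulation".

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    scene_id: int    # which simulated kitchen (visual/layout variant)
    motion_id: int   # which motion variant of the original demo
    success: bool    # toy stand-in for "task succeeded in simulation"


def vary_scenes(n_scenes):
    """Hypothetical stand-in for the scene step: 1 human demo -> N scene variants."""
    return list(range(n_scenes))


def vary_motions(scene_ids, m_motions, success_rate, rng):
    """Hypothetical stand-in for the motion step: N scenes -> N x M candidates."""
    candidates = []
    for scene_id in scene_ids:
        for motion_id in range(m_motions):
            # Assumption: success is a random draw; a real pipeline would
            # roll out the trajectory in a simulator and check the task.
            succeeded = rng.random() < success_rate
            candidates.append(Trajectory(scene_id, motion_id, succeeded))
    return candidates


def build_dataset(n_scenes, m_motions, success_rate=0.8, seed=0):
    """Return (all candidates, candidates kept after filtering failures)."""
    rng = random.Random(seed)
    scenes = vary_scenes(n_scenes)
    candidates = vary_motions(scenes, m_motions, success_rate, rng)
    kept = [t for t in candidates if t.success]
    return candidates, kept


if __name__ == "__main__":
    candidates, kept = build_dataset(n_scenes=100, m_motions=10)
    print(f"1 human demo -> {len(candidates)} candidates, {len(kept)} kept")
```

The point of the sketch is only the shape of the trade: the human cost stays at one demonstration, while the candidate count grows as N x M and the filter decides how much of it ends up in the final dataset.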
Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics! We are building tools to enable everyone in the ecosystem to scale up with us. Links in thread:
Sora's video quality seems impossible so I dug into how it works under the hood
it uses both diffusion (starting with noise, refining towards a desired video) and transformer architectures (handling sequential video frames)
read on 🧵