Kaizhong

@fgkd1a

MRes&PhD @ The Hamlyn Centre, Imperial College London

London, England

Joined March 2020

352 Following

18 Followers

26 Posts

fgkd1a retweeted

Jim Fan

@DrJimFan

5 days ago

Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy but stay safe, don't waste precious compute. Make no mistake. Then humans step aside and our watch begins. The robot fleet starts to come alive: they learn to look for visual clues, reset the scene, practice novel skills, tinker with control stack, read papers online, debate, reflect, get stuck, and try again directly on the hardware. All we did is to give Codex an API to the world of atoms, and the rest is emergence. ENPIRE is able to solve high-precision tasks like tying zip-ties, organizing fine pins, and installing GPUs all by itself. We also discovered a new type of "physical scaling": 8 robots exploring in parallel improves significantly faster than fewer ones. A part of our NVIDIA GEAR lab now self-improves tirelessly over night. We just read the reports in the morning. /goal: we all take a holiday and Jensen wouldn't even notice ;) We will be open-sourcing everything, so you can host your self-running robot lab at home too! Deep dive in the thread:

169

567

628K

fgkd1a retweeted

Furong Huang

@furongh

7 days ago

Hot take: robots should not dream in pixels. Pixels are too low-level. Latents are too opaque. μ₀ predicts a third thing: 3D motion traces. On real robots, it beats π₀.₅ — with ~1/100 the data scale and no action labels for world-model pretraining. 🧵 https://t.co/UfmrqNlBtw (this video features voiceover narration)

853

115

762

108K

fgkd1a retweeted

Salim Azak ICRA2026 🛫🛬🏠

@salimazak

27 days ago

Vision and Robotics for Embodied AI What an incredible meeting at the intersection of vision and robotics! Would have absolutely loved to attend and be part of these discussions. Really inspiring to see so many leading researchers from embodied AI, robotics, and computer vision coming together 💪 #EmbodiedAI #Robotics #ComputerVision @mapo1 @dimadamen @JitendraMalikCV Vladlen Koltun @andyzengineer @CordeliaSchmid @GeorgiaChal @danfei_xu @SiyuTang3 @yukez @ylecun @CSProfKGD and many other outstanding researchers. https://t.co/rDckL7sfK3

salimazak's tweet photo. Vision and Robotics for Embodied AI

What an incredible meeting at the intersection of vision and robotics!

Would have absolutely loved to attend and be part of these discussions. Really inspiring to see so many leading researchers from embodied AI, robotics, and computer vision coming together 💪

#EmbodiedAI #Robotics #ComputerVision

@mapo1 @dimadamen @JitendraMalikCV Vladlen Koltun @andyzengineer @CordeliaSchmid @GeorgiaChal @danfei_xu @SiyuTang3 @yukez @ylecun @CSProfKGD and many other outstanding researchers.

https://t.co/rDckL7sfK3

fgkd1a retweeted

Who to follow

ハェフィシェフ

@ZQtLk2Vl6pUrxMc

I like AI research, that's it🤷

Akash Sharma

@akashshrm02

RS @ Frontier AI & Robotics (FAR), Amazon | PhD @CMU_Robotics Multimodal robot perception | Dexterous manipulation | Robot learning

Mika Ackermann

@mikaackermann9

Researcher in disguise. Follow me back.

fgkd1a retweeted

Stephen James

@stepjamUK

4 months ago

𝗟𝗼𝗻𝗴-𝗵𝗼𝗿𝗶𝘇𝗼𝗻 𝗿𝗼𝗯𝗼𝘁𝗶𝗰𝘀 𝗵𝗮𝘀 𝗮𝗻 𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻 𝗱𝗲𝗰𝗮𝘆 𝗽𝗿𝗼𝗯𝗹𝗲𝗺. Most imitation learning models hit a "memory ceiling" as task length increases. Policies struggle to distil smaller, intermediate goals from the larger mission - so as execution stretches on, the robot loses the thread. It keeps moving, but the movements no longer connect to any meaningful objective. The result is policy drift: technically correct actions that make no sense for the goal. Researchers from @deepmind and @cmu propose BPP, an approach that addresses this by conditioning policies on milestone keyframes detected by a Vision-Language Model (VLM). Instead of forcing a model to digest every raw frame in a massive history window, BPP anchors execution to the specific visual "Big Picture" milestones that define success. Key Systems Advancements: • 𝟳𝟬% 𝗛𝗶𝗴𝗵𝗲𝗿 𝗦𝘂𝗰𝗰𝗲𝘀𝘀 𝗥𝗮𝘁𝗲𝘀: Significant performance gains in real-world manipulation tasks that require deep history conditioning • 𝗩𝗟𝗠-𝗚𝘂𝗶𝗱𝗲𝗱 𝗔𝗻𝗰𝗵𝗼𝗿𝗶𝗻𝗴: Uses the reasoning power of VLMs to identify and remember the critical stages of an operation. • 𝗦𝘆𝘀𝘁𝗲𝗺 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆: Reduces the computational tax of processing long histories by focusing on semantic milestones rather than raw motor paths. The key insight is that long-horizon tasks aren't one problem, they're many smaller ones chained together. Rather than training one large model to handle everything, the better path is training many specialised models, each responsible for a distinct milestone, and composing them into a high-level goal. @Neuracore_AI is built for exactly this. Teams can spin up and train multiple specialised models in parallel and rapidly iterate across all of them simultaneously. Instead of waiting on a single monolithic model, you build a library of targeted policies and orchestrate them toward the bigger objective.

fgkd1a retweeted

gaut

@0xgaut

8 months ago

"hey dude it's 10:23am you're late to work"

395

255

170K

fgkd1a retweeted

Keenan Crane

@keenanisalive

10 months ago

“Everyone knows” what an autoencoder is… but there's an important complementary picture missing from most introductory material. In short: we emphasize how autoencoders are implemented—but not always what they represent (and some of the implications of that representation).🧵

keenanisalive's tweet photo. “Everyone knows” what an autoencoder is… but there's an important complementary picture missing from most introductory material.

In short: we emphasize how autoencoders are implemented—but not always what they represent (and some of the implications of that representation).🧵 https://t.co/x5llPmKJCH

425

557K

fgkd1a retweeted

Dhruv Shah

@shahdhruv_

over 1 year ago

Excited to share what we've been cooking @GoogleDeepMind to bring Gemini into the physical world! 🦾🧠 Here is an uncut interaction of the Gemini Robotics VLA helping me clean up my desk. 🔊 Interacting with a robot through voice is the most surreal experience, and parallels the giddy excitement I had playing with early LLMs that is hard to put in words. You've got to try it to believe it! Over the next few days, we'll be sharing more such interactions that go beyond shiny demos and expose raw capabilities of our models. Stay tuned for more raw videos! Tech Report: https://t.co/83NBmieIHZ Blog: https://t.co/7VINxV3D3y YouTube Playlist: https://t.co/pFLdQLDtiA

770

103

244

69K

fgkd1a retweeted

Bruno Mlodozeniec

@brunorganised

over 1 year ago

Diffusion models are so ubiquitous, but it's difficult to find an introduction that is concise, simple and comprehensive. My supervisor Rich Turner (with me & some other students) has written an introduction to diffusion models that fills this gap: https://t.co/c9fBSXMMtl

635

112

762

55K

fgkd1a retweeted

Ted Xiao

@xiao_ted

almost 2 years ago

The field of Robotics + AI is at a very interesting moment in time right now because we’re seeing the evolution of how optimism from the last 10 years of research is carrying over to new frontiers: towards commercialization, towards more complex morphologies and applications, towards dreams of generality usually reserved for “AGI research”. There are now big questions around how historic trends will project onto the current robotics + AI wave. How applicable are the cyclical patterns we’ve seen play out in frontier modeling? In RL + search? In AV commercialization? In agent/code GenAI? In SaaS customer flywheels? In hardware startups tackling integration challenges? But more interesting is that there are even big questions around how actual *robotics research* applies to these new frontiers! There’s been a lot of exciting progress in the last years on training large multitask robot policies, on sim2real policy learning, on fleet-scale robot data collection, on learning robust and generalizable policies — but most of this research targeted embodiments and application complexities that are substantially simpler than where goalposts are today with complex manipulation, humanoids, and commercial production requirements. I think this last point is especially important and undervalued. A lot of folks new to robot learning are assuming that the same scaling laws and approaches that have worked elsewhere will easily be applied in robotics: self-play in AlphaGo, world model learning in video generation, sim2real in locomotion, search in CodeGen/math, exploration in Minecraft. These are indeed very promising and exciting research areas, but my main advice when asked about these directions is that robotics has a very different set of operating constraints and comparative challenges than other domains. Moravec’s Paradox applies to technical algorithms as well: what methods excel at in one domain they may struggle with in robotics, and where methods face challenges in other domains they may scale better in robotics.

104

13K

fgkd1a retweeted

Systems control @Lugh562007

almost 2 years ago

Control in Olympics

483

115

49K

fgkd1a retweeted

Ana Cruz Ruiz @Ana_CruzRuiz

about 2 years ago

In 2021, my father was diagnosed with a benign brain tumor & hydrocephalus. As serious as this sounds, technologies do exist to repair & recover (radiosurgery, shunt surgery, etc.). Sadly, none were available in Honduras. Long story short, after thousands of dollars, treatment abroad, & help from family contacts, he received the care he needed & is back to being himself. My dad *is the exception*; most people in places like Honduras (5/8 billion people worldwide🌎) will deteriorate or die from perfectly treatable conditions. Do we, as engineers, have any role to play? Should we, & how can we, design technologies to improve access to safe & affordable surgery? Really looking forward to this conversation! @ImperialRobots @ICLHamlynRobots

fgkd1a retweeted

Lena Maier-Hein New: @lena-maier-hein.bsky.social @lena_maierhein

about 2 years ago

🔥Med-Gemini looks quite impressive: https://t.co/2zC9Eealfz I was particularly surprised to see surgical applications.

lena_maierhein's tweet photo. 🔥Med-Gemini looks quite impressive:

https://t.co/2zC9Eealfz

I was particularly surprised to see surgical applications. https://t.co/BiSRmBBCiH

fgkd1a retweeted

Mathias Unberath @MathiasUnberath

about 2 years ago

Need to "segment anything" in 3D medical images? Yes? Want quick reaction to prompts, like **volumetric segmentation in ~8 ms**? Yes? Consider FastSAM3D: https://t.co/yvu5UTQAQM, ~500x/9x speedup over 2D and 3D SAMs, respectively. Code: https://t.co/Wm8lz0TsNI @JHUCompSci

MathiasUnberath's tweet photo. Need to "segment anything" in 3D medical images? Yes?

Want quick reaction to prompts, like **volumetric segmentation in ~8 ms**? Yes?

Consider FastSAM3D: https://t.co/yvu5UTQAQM, ~500x/9x speedup over 2D and 3D SAMs, respectively.

Code: https://t.co/Wm8lz0TsNI

@JHUCompSci https://t.co/kZtqr78BNX

119

10K

fgkd1a retweeted

Rocky Duan @rocky_duan

over 2 years ago

Very excited to share what our research team at Covariant has been working on: RFM-1, our latest Robotics Foundation Model. Built on top of high-quality multimodal data the Covariant robotics fleet has collected over the years, this project embodies our commitment to pushing the boundaries of how AI can interact with and understand the physical world. We're eager to share this milestone with the community and invite feedback and collaboration. Read more at https://t.co/Cc3Wmyk2N7

155

32K

fgkd1a retweeted

Daniel van Strien

@vanstriendaniel

over 2 years ago

BioMistral is a new 7B foundation model for medical domains, based on Mistral and further trained PubMed Central. - top open-source medical Large Language Model (LLM) in its weight class - Apache License - includes base models, fine tunes, and quantized versions.

vanstriendaniel's tweet photo. BioMistral is a new 7B foundation model for medical domains, based on Mistral and further trained PubMed Central.
- top open-source medical Large Language Model (LLM) in its weight class
- Apache License
- includes base models, fine tunes, and quantized versions. https://t.co/RoXyFBRMDj

220

707

146K

fgkd1a retweeted

Yuxuan Shu @yuxuan_shu

over 2 years ago

📢 Having synthetic negatives in self-supervised contrastive learning is important, but how to add them? Check out our preprint paper "Unsupervised hard Negative Augmentation for contrastive learning"! Paper: https://t.co/NcBQLqDnhg Code: https://t.co/bOoOrMk4xZ #NLP

387

fgkd1a retweeted

Zipeng Fu

@zipengfu

over 2 years ago · Stanford

Mobile ALOHA 🏄 is coming soon! Special thanks to @tonyzzhao for throwing random objects into the scene, and @chelseabfinn for the heavy pot (> 3 lbs) ! Stay tuned!

366

87K

fgkd1a retweeted

elvis

@omarsar0

almost 3 years ago

Create Your Own Custom LLM Chatbot Impressive step-by-step tutorial explaining how to choose the best LLM and the components needed for building your own custom LLM-powered chatbot. @abacusai offers one of the best solutions that I've used to quickly build custom LLM chatbots. The foundational parts discussed in the tutorial are: • Data sources - data warehouses, data lakes, and a range of databases are supported. You can also transform data within the platform as needed. There is even an AI agent that can assist with operating on data. • Chunk size - the chunk size determines how to split the text; this affects performance and varies by use case. You should also think about the overlap between adjacent chunks so information is not abruptly cut off. • Embedding technique and document retriever - the chunks of texts are embedded and stored in a vector database that helps you build a powerful retriever that the LLM interacts with to ensure it's using the relevant information to answer questions. This is particularly useful for a lot of knowledge-intensive use cases. • LLM - the synthesizing capabilities of LLMs combined with the retrievers enables a robust solution to create your custom LLM chatbots and pick the best LLM for your dataset and task. With @abacusai you can choose between models offered by Google, OpenAI, Anthropic, the open-source community, and Abacus's proprietary LLMs. I like that you can also fine-tune your own models as well so the offering is pretty comprehensive. • Automation and Evaluation - picking the right model and the right configurations is challenging. The AutoML capabilities of Abacus help in this regard. You can also upload an evaluation dataset that enables the platform to compare combinations and determine an optimal solution for your use case. Metrics include BLEU score, METEOR, and others. • Deployment & Monitoring - this whole process is iterative in nature and new data will always be available. You can deploy your LLMs and set up pipelines to keep incorporating or fine-tuning on new data. Having your solution in production means you need to continuously evaluate and monitor performance on a regular basis to ensure the solution doesn't deteriorate. Read more here: https://t.co/WcQEbqtAZa If there is enough interest, I will also be posting a full demo with a use case in an upcoming post. Stay tuned!