Just returned from a fascinating couple of weeks in China meeting robotics teams, manufacturers, and retail and Internet operators. Found @ericjang11's and @vhmth's reflections useful and wanted to share the themes and tradeoffs I've been reflecting on.
Came home with more questions than answers!
Today, we're opening @Instawork Robotics Lab (IRL), connecting the robotics industry with more than 10 million skilled workers. IRL is offering the first certified workforce program for the physical AI economy, with 20,000 @Instawork Pros already certified! 👷
Read more from @alistairmbarr @BusinessInsider
https://t.co/57Jwyabf2P
Had a blast talking robot wrangling with @latimes. Robots don’t replace people. They create new jobs, just like every other tech innovation in history. The physical AI revolution is a lot more human than people think.
LLMs have literally saved dozens if not hundreds of lives already. LLMs are ready for public use.
This study was conducted on 18-month-old models such as “Command R+”.
Haters be forewarned, the United States Department of Health and Human Services officially wants FDA-approved AI agents that can prescribe and treat within 2 years.
You can try to stop AI in medicine, but it’s going to happen, and it’s going to be great.
@rmstein Was waiting for this and curious how this will look on desktop. Also curious about the design and if the fixed Ask Anything still feels connected to the background context.
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life.
One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s successful trajectories, you should take your own actions and learn from the reward given by the environment. Obviously imitation learning is useful to bootstrap to a nonzero pass rate initially, but once the model can take reasonable trajectories, we generally avoid imitation learning because the best way to leverage the model’s own strengths (which are different from a human’s) is to learn only from its own trajectories. A well-accepted instantiation of this is that RL is a better way to train language models to solve math word problems than simple supervised finetuning on human-written chains of thought.
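A toy sketch of that idea, under made-up assumptions: a three-armed bandit where the teacher and the learner get different rewards for the same actions. The reward numbers and function names are hypothetical, and to keep it deterministic the update uses the exact expectation of the REINFORCE gradient over the policy's own action distribution rather than sampled rollouts.

```python
import math

# Hypothetical per-action rewards: same three actions, different strengths.
TEACHER_REWARDS = [0.8, 0.3, 0.5]
LEARNER_REWARDS = [0.2, 0.9, 0.4]


def softmax(logits):
    """Turn logits into a probability distribution over actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitate(teacher_rewards):
    """Imitation: copy whichever action worked best for the teacher."""
    return max(range(len(teacher_rewards)), key=lambda a: teacher_rewards[a])


def on_policy_train(rewards, steps=1000, lr=1.0):
    """Policy gradient on the learner's own rewards, starting from uniform."""
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Expected REINFORCE gradient for a softmax policy:
        # dJ/dlogit_a = pi(a) * (r(a) - J)
        logits = [
            t + lr * p * (r - expected)
            for t, p, r in zip(logits, probs, rewards)
        ]
    return softmax(logits)


if __name__ == "__main__":
    copied = imitate(TEACHER_REWARDS)
    probs = on_policy_train(LEARNER_REWARDS)
    own = sum(p * r for p, r in zip(probs, LEARNER_REWARDS))
    print("reward from imitating the teacher:", LEARNER_REWARDS[copied])
    print("expected reward after on-policy training:", round(own, 3))
```

In this toy setup, copying the teacher locks the learner into the teacher's best action, which happens to be the learner's worst, while the on-policy update shifts probability toward the action that pays off for the learner itself.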
Similarly in life, we first bootstrap ourselves via imitation learning (school), which is very reasonable. But even after I graduated from school, I had a habit of studying how other people found success and trying to imitate them. Sometimes it worked, but eventually I realized that I would never surpass the full ability of someone else because they were playing to strengths that I didn’t have. It could be a researcher doing yolo runs more successfully than me because they built the codebase themselves and I didn’t or, as a non-AI example, a soccer player keeping possession of the ball by leveraging strength that I didn’t have.
The lesson of doing RL on-policy is that beating the teacher requires walking your own path, taking risks, and learning from the rewards the environment gives you. For example, two things I enjoy more than the average researcher are (1) reading a lot of data, and (2) doing ablations to understand the effect of individual components in a system. Once when collecting a dataset, I spent a few days reading data and giving each human annotator personalized feedback, and after that the data turned out great and I gained valuable insight into the task I was trying to solve. Earlier this year I spent a month going back and ablating each of the decisions that I previously yolo’ed while working on deep research. It was a sizable amount of time spent, but through those experiments I learned unique lessons about what type of RL works well. Not only was leaning into my own passions more fulfilling, but I now feel like I’m on a path to carving out a stronger niche for myself and my research.
In short, imitation is good and you have to do it initially. But once you’re bootstrapped enough, if you want to beat the teacher you must do on-policy RL and play to your own strengths and weaknesses :)