Catch me on @PBS discussing #VoiceFirst games and kids with @SteveAdubato! Special thanks to Honorable Judge Lexy of #KidsCourt whose hearing problems saved me from publicly losing a trial on TV.
@wnet @VoiceSummitAI @alexadevs @amazonecho @amazon
Here: https://t.co/aZ8ltVYfnW
Last chance to apply for an internship role at Google Research in 2025. We have truly brilliant opportunities available for interns, and many internships have turned into highly impactful research projects.
BSc/MSc: https://t.co/9D703Wn639
PhD: https://t.co/Ikw26YdaIA
Our work on Gemini learning features is out in the world, just in time for the new school year.
Give it a try, and study smarter with Gemini. Congrats to the teams involved, super happy with what we've done.
@GoogleAI @GeminiApp #GenAI #Gemini #LLM
From tailored study plans to help quizzing yourself on course materials, here's how Gemini can be your study partner on campus this year: https://t.co/2ax8YgL75M
"Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach" is now available at https://t.co/xQygtwSXdx
#ICML2024: Irina Jurenka and Markus Kunesch will be demoing the LearnLM-Tutor at the GDM booth on Tues afternoon: https://t.co/Hif7rD7BFU
I taught a bunch of writer friends how to use Claude/Gemini as an ideation partner and a marketing tool. Life-changing tech. Open the door to ppl near you who won't teach themselves.
#GenAI #Gemini
I taught a friend who has never written code in her life how to use Claude to build a simple app and deploy it on Cloudflare today. Watching someone realize that they can now build software is a great experience.
Enable everyone to build anything.
How Alexa dropped the ball on being the top conversational system on the planet
------
A few weeks ago OpenAI released GPT-4o ushering in a new standard for multimodal, conversational experiences with sophisticated reasoning capabilities.
Several days later, my good friends at PolyAI announced their Series C fundraising round after tremendous growth in the usage of their enterprise voice assistant.
Amid this news, a former Alexa colleague messaged me: You'd think voice assistants would have been our forte at Alexa.
For context, I joined Alexa AI as a research scientist in early 2019. By this time, the Alexa consumer device had existed for 5 years and was already in 100M+ homes throughout the world.
In 2019, Alexa was experiencing a period of hypergrowth. Dozens of new teams sprouted every quarter, huge financial resources were invested, and senior leadership made it clear that Alexa was going to be one of Amazon's big bets moving forward.
My team was born amidst all this with a simple charter: bring the latest and greatest in AI research into the Alexa product and ecosystem. I've often described our group (later dubbed the Conversational Modeling team) as Google Brain meets Alexa AI-SWAT team.
Over the course of the 2.5 years I was there, we grew from 2 to ~20 and tackled every part of the conversational systems stack.
We built the first LLMs for the organization (though back then we didn't call them LLMs), we built knowledge-grounded response generators (though we didn't call it RAG), and we pioneered prototypes for what it would mean to make Alexa a multimodal agent in your home.
We had all the resources, talent, and momentum to become the unequivocal market leader in conversational AI. But most of that tech never saw the light of day and never received any noteworthy press.
Why?
The reality is Alexa AI was riddled with technical and bureaucratic problems.
Bad Technical Process
------
Alexa put a huge emphasis on protecting customer data with guardrails in place to prevent leakage and access. Definitely a crucial practice, but one consequence was that the internal infrastructure for developers was agonizingly painful to work with.
It would take weeks to get access to any internal data for analysis or experiments. Data was poorly annotated. Documentation was either nonexistent or stale.
Experiments had to be run in resource-limited compute environments. Imagine trying to train a transformer model when all you can get a hold of is CPUs. Unacceptable for a company sitting on one of the largest collections of accelerated hardware in the world.
I remember on one occasion our team did an analysis demonstrating that the annotation scheme for some subset of utterance data was completely wrong, leading to incorrect data labels.
That meant for months our internal annotation team had been mislabeling thousands of data points every single day. When we attempted to get the team to change their annotation taxonomy, we discovered it would require a herculean effort to get even the smallest thing modified.
We had to get the team's PM on board, then their manager's buy-in, then submit a preliminary change request, then get that approved (a multi-month-long process end-to-end).
And most importantly, there was no immediate story for the team's PM to make a promotion case through fixing this issue other than "it's scientifically the right thing to do and could lead to better models for some other team." No incentive meant no action taken.
Since that wasn't our responsibility and the lift from our side wasn't worth the effort, we closed that chapter and moved on.
For all I know, they could still be mislabeling those utterances to this day.
Fragmented Org Structures
------
Alexa's org structure was decentralized by design, meaning there were multiple small teams working on sometimes identical problems across geographic locales.
This introduced an almost Darwinian flavor to org dynamics where teams scrambled to get their work done to avoid getting reorged and subsumed into a competing team.
The consequence was an organization plagued by antagonistic middle managers who had little interest in collaborating for the greater good of Alexa and only wanted to preserve their own fiefdoms.
My group by design was intended to span projects, whereby we found teams that aligned with our research/product interests and urged them to collaborate on ambitious efforts. The resistance and lack of action we encountered was soul-crushing.
I remember on one occasion we were coordinating a project to scale out the large transformer model training I had been leading. This was an ambitious effort which, if done correctly, could have been the genesis of an Amazon ChatGPT (well before ChatGPT was released).
Our Alexa team met with an internal cloud team which independently was initiating similar undertakings. While the goal was to find a way to collaborate on this training infrastructure, over the course of several weeks there were many half-baked promises made which never came to fruition.
At the end of it, our team did our own thing and the sister team did their own thing. Duplicated efforts due to no shared common ground. With no data, infrastructure, or lesson sharing, this inevitably hurt the quality of produced models.
As another example, the Alexa skills ecosystem was Alexa's attempt to apply Amazonian decentralization to the dialogue problem. Have individual teams own individual skills.
But dialogue is not conducive to that degree of separation of concerns. How can you seamlessly hand off conversational context between skills? This means endowing the system with multi-turn memory (a long-standing dream of dialogue research).
The internal design of the skills ecosystem made achieving this infeasible because each skill acted like its own independent bot. It was conversational AI by a committee of opinionated bots, each with its own agenda.
Product-Science Misalignment
------
Alexa was viciously customer-focused, which I believe is admirable and a principle every company should practice. Within Alexa, this meant that every engineering and science effort had to be aligned to some downstream product.
That did introduce tension for our team because we were supposed to be taking experimental bets for the platform's future. These bets couldn't be baked into the product within the typical quarter, as was the expectation, without hacks or shortcuts.
So we had to constantly justify our existence to senior leadership and massage our projects with metrics that could be seen as more customer-facing.
For example, in one of our projects to build an open-domain chat system, the success metric (i.e. a single integer value representing overall conversational quality) imposed by senior leadership had no scientific grounding and was borderline impossible to achieve.
This introduced product/science conflict in every weekly meeting held to track the project's progress, leading to manager churn every few months and an eventual sunsetting of the effort.
------
As we look forward, in the battle for the future of the conversational AI market, I still believe it's anyone's game.
Today Alexa has sold 500M+ devices, which is a mind-boggling user data moat. But that alone is not enough.
Here's how I would organize a dialogue systems effort from the ground up:
Invest in robust developer infrastructure especially around access to compute, data quality assurance, and streamlined data collection processes. Data and compute are the lifeblood of modern ML systems so proactively setting up this foundation is imperative.
Make LLMs the fundamental building block of the dialogue flows. In retrospect, the Alexa skills ecosystem was a premature initiative for the abilities of conversational systems at the time. I liken it to when Leap Motion created and released a developer SDK before the underlying hardware device was stable.
But with the power of modern LLMs, I'm optimistic about redesigning a developer conversational toolkit with LLMs as its primitives.
Ensure product timelines don't dictate science research time frames. Because things are moving so fast in the AI world, it's hard not to feel the pressure of shipping quickly. But there are still so many unsolved problems that will take time to solve.
Of course you should conduct research aggressively, but don't have delivery cycles measured in quarters, as this will produce inferior systems to meet deadlines.
------
If you're thinking about the future of multimodal conversational systems and interfaces, I would love to hear from you. We've got work to do!
Introducing LearnLM: our new family of models based on Gemini and fine-tuned for learning. LearnLM applies educational research to make our products, like Search, Gemini and YouTube, more personal, active and engaging for learners. #GoogleIO
Did any of you watch the Google I/O announcements about LearnLM?
Looking for some comments from people who work in education about any of the following:
LearnLM and its applications:
Google introduced LearnLM, a new family of models based on Gemini and fine-tuned for learning.
LearnLM is coming to products like Search, Android, Gemini, and YouTube.
The Gemini app will feature pre-made "Gems," including Learning Coach, which provides study guidance and techniques.
LearnLM in educational content and platforms:
YouTube is introducing a new feature that uses LearnLM to make educational videos more interactive, allowing users to ask questions, get explanations, or take quizzes.
In Google Classroom, they are developing ways to simplify lesson planning and tailor content to individual student needs using LearnLM.
Partnerships and collaborations:
Google is partnering with experts and institutions like Columbia Teachers College, Arizona State University, and Khan Academy to test and improve LearnLM's capabilities.
Google collaborated with MIT RAISE to develop an online course to help educators understand and use generative AI.
------
Some comments will be published in a Forbes article on this. Please provide your name, title and organisation.
@CloudBusiness9 @hope_steven @MrCaffrey @scotlandlouise @deanstokes @justaguy_LT @misskwells @WhatTheTrigMath
Fresh out of I/O, excited to share our work on LearnLM, a family of models that are fine-tuned for learning and teaching, and the suite of features and products these models power!
https://t.co/bdOWqhdXxX
@Google @GoogleResearch