I’m excited to share that Bloom, where we ran a four-week study on LLM health coaching, just won a Best Paper Award at CHI! 🏆
Paper: https://t.co/MRSQntEosA
Website: https://t.co/OOfGWqhcTf
Interest form: https://t.co/pkTxc3Lpua
Come see my talk! https://t.co/VfJFT7nXnf
[1/11]
People are increasingly worried that AI tools make us overreliant.
But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task.
In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not.
(1/9)
The next frontier of AI is not only more capable models; it is AI that *humans* can meaningfully live and work with :)
With all students in my cs329x Human-Centered LLM class, we present 60+ pages of insights for developing Human-Centered LLMs (HCLLMs), from design & data sourcing to training, eval & deployment 🧵
Can a language model learn, end-to-end, what to keep in its own KV cache and what to throw away? Can it learn to forget while it learns to reason?
Deep learning's central lesson: capability emerges from end-to-end optimization, not from heuristics or strong inductive biases. Yet for efficiency, we still rely heavily on hand-designed approaches.
🗑️ Introducing Neural Garbage Collection (NGC): we train a language model to jointly reason and manage its own KV cache, using reinforcement learning with outcome-based task reward alone. No SFT, no proxy objectives, no summarization in natural language.
New paper with @jubayer_hamid, Emily Fox, and @noahdgoodman!
@FerryLee_AIPOCH The app integrates with Apple's HealthKit API, so while we used Apple Watches in our study, the platform itself is not tied to any particular wearable! If your wearable or smart device can read/write to HealthKit, Bloom can read that data too.
We’re actively working on releasing Bloom to the public. If you’d like to try it out, please fill out our interest form: https://t.co/QihNodi7Qt
If you’re interested in building on Bloom, our code is open source: https://t.co/ngVIoDZmVQ [10/11]
If you’re at CHI, come see our presentation!
https://t.co/XjlWxaw8ql
And if you’re interested in chatting more, drop me a line at [email protected]. We’d love to hear from you! [11/11]
✨I'm on the research scientist and postdoc job market! I'll be graduating from my PhD this academic year with a thesis that focuses on reinforcement learning and healthcare. ✨
Through formative interviews with 22 participants, we learned that *all* health experts adopted a facilitative approach that did not give unsolicited advice. Notably, this contrasts with how current LLMs are trained to answer questions and give advice.
We built GPTCoach, a GPT-4-based chatbot that implements an evidence-based health coaching program, uses counseling strategies from motivational interviewing, and can query and visualize a user’s health data from a wearable through tool use.
In a counterfactual comparison to vanilla GPT-4, GPTCoach is more consistent with motivational interviewing, asking more open-ended questions and giving advice with permission.
In a user study with 16 participants, we find that GPTCoach can adhere to motivational interviewing principles and contextualize a user's wearable data to their unique circumstances. Participants also appreciated its supportive and non-judgmental tone.