Our code for generating extremely long multi-session conversations with custom personas and evaluating LLMs on the LoCoMo dataset is now live at https://t.co/tpyXE6nEVJ!
Come talk to us about this work at Poster session 5 (Aug 13, 16:00 - 17:30 local time) at ACL 2024
Can LLMs keep track of very long conversations?
We evaluate 'conversational memory' of LLMs via 3 tasks on our dataset of multi-session multimodal dialogs --> LLMs struggle to remember, reason over history, draw long-range temporal/causal connections
https://t.co/JrGP7imeMh
🧵
Introducing Q Labs, a research lab focused on solving generalization.
Alongside others (SSI, Flapping Airplanes), we see data efficiency as the key problem, but we're taking an unconventional approach to solve it: a new learning algorithm approximating Solomonoff induction.
🚨 🤯 Wow! Yi Lin is an amazing researcher, who works on very hard and important problems in LLM and VLM training, RL, PEFT, Quantization, etc. -- ironically, he had several other top offers just a few months ago!
Hire him ASAP if you want to pick up a top talent (and several other affected amazing folks)!
👇👇
Tough week! I also got impacted less than 3 months after joining. Ironically, I just landed some new RL infra features the day before.
Life moves on. My past work spans RL, PEFT, Quantization, and Multimodal LLMs.
If your team is working on these areas, I’d love to connect.
I always thought the decline in fundamental AI research funding would happen because AI didn’t generate enough value to be worth the cost.
But it seems like it’s happening because it generated too much value. And the race to capture that value is taking priority.
Just remembering that a lot of this started in curiosity driven industry research labs.
We've added pi-05 to the openpi repo: pi05-base, pi05-droid, pi05-libero. Also added PyTorch training code!🔥
Instructions and code here: https://t.co/EOhNYfpq9B
This is an updated version of the model we showed cleaning kitchens and bedrooms in April: https://t.co/t09P0nJJFv
Ever wonder what it'd look like if an LLM Judge and a Reward Model had a baby? So did we, which is why we created PGRM -- the Prompt-Guided Reward Model.
TLDR: You get the instructability of an LLM judge + the calibration of an RM in a single speedy package (1/n)
I'm at ICML 🇨🇦 and I'm hiring at @databricks. Visit our booth if you're interested. My scientific focus: It's 1972 in AI, there's an AI crisis, Dijkstra isn't here to save us, and maybe RL can. Why Databricks? The long road to AGI is being paved here and we have the real evals 🧵
Can visual SSL match CLIP on VQA?
Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.
For those of you who know me, I've always been very excited to combine my two passions for basketball and CV. Our #CVPR2025 paper does this by introducing a large-scale video dataset for fine-grained skill estimation in 🏀. Paper, code & data available: https://t.co/cqyJ5cRbaU
The hardest part about finetuning LLMs is that people generally don't have high-quality labeled data. Today, @databricks introduced TAO, a new finetuning method that only needs inputs, no labels necessary. Best of all, it actually beats supervised finetuning on labeled data.
🤖 Ever wonder why chatbots feel off?
We unveil REALTALK: 21 days of REAL human chats showing what AI misses: emotions, shifting personas, memory gaps.
https://t.co/cGrrMN71CE
https://t.co/SuW9sD7v4u
@Snap @saharaai @USC_ISI @nlp_usc #LLM #EmotionalIntelligence
Announcing the Open Thoughts project. We are building the best reasoning datasets out in the open.
Building off our work with Stratos, today we are releasing OpenThoughts-114k and OpenThinker-7B.
🎉 Adapt-♾ has been accepted to #ICLR2024 @iclr_conf! We propose a dynamic, multi-way data selection strategy for continual VLM learning with growing instruction-tuning datasets. Stay tuned for the camera-ready version with additional results on LLMs! 🙌
🚨 Introducing Adapt-♾: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection!
▶️ New multimodal instruction tuning datasets are continuously released, often containing redundant/similar content or targeting highly varied skills (i.e., tasks).
▶️ How can we enable scalable, lifelong instruction tuning for MLLMs, where a temporal stream of multi-task, multimodal instruction-tuning datasets is continually added to the existing training pool?
▶️ We present Adapt-♾, a scalable and adaptive data selection strategy that facilitates the effective learning of MLLM on new skills while reinforcing previously learned skills over time.
📖: https://t.co/KxWHesjfVc
Thread 🧵👇
Carlini’s website will be auto-generated daily by a different LLM for the next 12 days.
We joked about this during a dinner after NeurIPS and Christmas made it happen 💫🎄
https://t.co/8GaEnMBuu8
There have been a lot of anecdotes about the Llama3 series of models being harder to post-training quantize (PTQ) than Llama2. As part of this paper, we investigated the hypothesis that the degradation from PTQ grows with the token-to-parameter ratio (TPR), i.e., as you overtrain.
@anikembhavi I learnt so much from you as an intern, Ani. Always looking forward to the next wonderful thing that comes from your leadership! Wish you the very best :)