Excited to present at the first #AISTATS2025 poster session on May 3!
Ever wondered how LLMs can generalize to new tasks in-context despite only training on token completion? We formalize this phenomenon as "task shift" and investigate a linear version: https://t.co/RoYippuZVi
@guilhermeotina @TmlrOrg Yep! Most methods which infer group structure use properties of the model representation; we argue feature learning is key for understanding & interpreting them (e.g. Izmailov 2022).
Our TMLR paper is more specific to unbalanced LLR, but feature learning papers are in the works!
The updated version of this paper has been accepted at @TmlrOrg. Very excited about implications of our results for SOTA robustness algorithms & understanding spurious correlations more generally. Journal version link: https://t.co/XkLMSbw5ua
Heading to #ICLR2025 to present our SCSL workshop paper on understanding how last-layer retraining methods mitigate spurious correlations! https://t.co/6kQ1HG0WVI
Stop by on Monday, April 28 to chat and learn more!
Multimodal reasoning with Phi-4-reasoning-vision, new work on scaling LLM inference, benchmarking AI agents in network operations, cinematic video generation, adaptive evaluation for LLMs, and using AI to improve individual and population health. https://t.co/9Y0SyTlG5W
It's been the privilege of my career to help build the newest Phi series model from @MSFTResearch!
Phi-4-reasoning-vision-15B is open-weight & competitive on perf with 10X less compute/tokens.
Read the blog for math and CUA case studies, hybrid reasoning, data insights, & more!
Vision-language models improve multimodal systems, but can make them slower, costlier, and harder to deploy. Learn how Phi-4-reasoning-vision-15B, a compact and fast multimodal reasoning model, blends strengths of different methods while reducing their limits: https://t.co/jP5L3AXRzX
Over the holidays, I stress-tested the AI coding hype by doing something concrete: I built a college football simulator game from scratch to see if agents actually deliver. Here's what I learned:
Misc takeaways:
• Copilot + GitHub was far more useful than I expected
• Keeping code style consistent across humans + agents is painful
• Overall: Claude was best for agentic coding; Gemini best for interactive pair-programming