@MarkSchmidtUBC Thanks for sharing! His work and books have also been influential to me, and I was always amazed by all the cross-disciplinary connections he made. Hearing your stories has made me appreciate his contributions across so many areas even more, and has given me a sense of what he was like as a person.
Applications for the JHU DSAI postdoctoral fellowship are open until January 26, 2026.
If you are interested in working with me at the intersection of Optimization and ML, please apply and reach out!
We will have some extra funds to hire world-class PhD students and postdoctoral fellows (PDFs); if you know such folks who are also interested in topics I care about (RL theory, and now LLM reasoning theory), please send them my way (or, if you are one of those folks, please DM me).
Hi RL Enthusiasts!
RLC is coming to Montreal, Quebec: Aug 16–19, 2026.
CFP is out now: https://t.co/sNld9gaptm
Abstract: Mar 1
Submission: Mar 5 AOE
Submit your best work and please share widely!
#RLC #MachineLearning #AI #LLMs #RLHF #ReinforcementLearning #Research
@ZhenghaoXu0 @konstmish This paper is definitely insightful. I've usually viewed PL as a slight generalization of strong convexity, though I do agree that assuming a bounded set of minimizers is strong, so I'm not sure it's fair to say that smoothness rules out all the interesting cases for PL.
@ZhenghaoXu0 @konstmish I couldn't help but think of linear regression, which satisfies PL but in general does not have a unique solution. In this case the solution set is unbounded, yet gradient descent still converges to a solution with small norm (the minimum-norm one when started from zero).
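A minimal sketch of the linear-regression case, assuming a hypothetical underdetermined least-squares problem with random data (5 equations, 10 unknowns, seed 0). Here f(x) = 0.5·‖Ax − b‖² satisfies PL, its set of minimizers is an unbounded affine subspace, and plain gradient descent with stepsize 1/L started at zero lands on the minimum-norm solution (computed independently via the pseudoinverse); started elsewhere, it reaches a different exact solution with a larger norm.

```python
import numpy as np

def gradient_descent_least_squares(A, b, x0, num_iters=10_000):
    """Plain gradient descent on f(x) = 0.5 * ||A x - b||^2 with stepsize 1/L."""
    L = np.linalg.norm(A, 2) ** 2  # largest singular value squared = largest eigenvalue of A^T A
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        x = x - (A.T @ (A @ x - b)) / L
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 10))  # underdetermined: infinitely many exact solutions
b = rng.standard_normal(5)

# Started at zero, the iterates never leave the row space of A.
x_from_zero = gradient_descent_least_squares(A, b, np.zeros(10))
x_min_norm = np.linalg.pinv(A) @ b  # minimum-norm least-squares solution

# Started elsewhere, the null-space component of x0 is never removed.
x_from_other = gradient_descent_least_squares(A, b, rng.standard_normal(10))

print(np.linalg.norm(A @ x_from_zero - b))  # residual should be ~0
print(np.allclose(x_from_zero, x_min_norm))  # should match the min-norm solution
print(np.linalg.norm(x_from_other), np.linalg.norm(x_min_norm))  # first is larger
```

The reason is that every gradient Aᵀ(Ax − b) lies in the row space of A, so gradient descent only ever changes the row-space component of x.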
I'll be at NeurIPS in San Diego, happy to chat and meet up! If you're interested in optimization or the Polyak stepsize, come check out our poster with @bremen79!
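For readers new to it, here is a minimal sketch of the classical Polyak stepsize η_k = (f(x_k) − f*) / ‖g_k‖², assuming the optimal value f* is known. This is the textbook rule, not the method from our poster; the function name `polyak_gd` and both test problems are hypothetical choices for illustration (one smooth, one nonsmooth with a subgradient).

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star, num_iters=2000):
    """(Sub)gradient descent with the classical Polyak stepsize
    eta_k = (f(x_k) - f_star) / ||g_k||^2, assuming f_star is known."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_iters):
        g = grad(x)
        g_norm_sq = float(g @ g)
        if g_norm_sq == 0.0:  # zero (sub)gradient: already at a minimizer
            break
        x = x - (f(x) - f_star) / g_norm_sq * g
    return x

# Smooth example: an ill-conditioned quadratic with f* = 0 at x* = 0.
H = np.diag([1.0, 10.0])
x_quad = polyak_gd(lambda x: 0.5 * x @ H @ x, lambda x: H @ x, [1.0, 1.0], f_star=0.0)

# Nonsmooth example: f(x) = ||x - c||_1 with f* = 0; np.sign(x - c) is a valid subgradient.
c = np.array([1.0, -2.0, 3.0])
x_l1 = polyak_gd(lambda x: np.abs(x - c).sum(), lambda x: np.sign(x - c),
                 np.zeros(3), f_star=0.0)
```

No smoothness constant or decay schedule needs to be tuned; the price is that f* must be known (or estimated).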
@TangerineHelps @orlemmalad The app was down for me as well two days ago on Android, and now I can't log in on desktop or Android! Hard to believe there have been no outage reports. What is happening?
Our paper on Pseudo-Asynchronous Local SGD is accepted at TMLR!
Developed at @MSFTResearch, it introduces a semi-synchronous training strategy that pairs well with methods like DiLoCo.
Clean code & camera-ready coming soon. Thanks to my co-authors!
https://t.co/ywYMiQOpmO
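As background for the announcement, here is a sketch of the plain, fully synchronous Local SGD baseline (workers take independent local steps and periodically average their parameters). This is not the pseudo-asynchronous method from the paper. The quadratic worker objectives f_i(x) = 0.5·‖x − c_i‖², the Gaussian gradient noise, and all parameter values are hypothetical.

```python
import numpy as np

def local_sgd(centers, num_rounds=50, local_steps=10, lr=0.1, noise_std=0.0, seed=0):
    """Plain synchronous Local SGD (periodic parameter averaging).

    Worker i minimizes f_i(x) = 0.5 * ||x - centers[i]||^2 with noisy gradients;
    after every `local_steps` steps, all workers average their parameters.
    """
    rng = np.random.default_rng(seed)
    num_workers, dim = centers.shape
    x_global = np.zeros(dim)
    for _ in range(num_rounds):
        local = np.tile(x_global, (num_workers, 1))  # every worker starts from the shared model
        for _ in range(local_steps):
            grads = local - centers + noise_std * rng.standard_normal(local.shape)
            local -= lr * grads  # independent local steps, no communication
        x_global = local.mean(axis=0)  # synchronization point: average across workers
    return x_global

centers = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])
x = local_sgd(centers)  # should approach centers.mean(axis=0), the minimizer of the average loss
```

Every worker must reach the averaging barrier before anyone continues; relaxing that barrier is the kind of design space semi-synchronous schemes explore.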
@damekdavis @FSchaipp @SurbhiGoel_ @LambdaAPI Bandaids all the way down. In all seriousness tho, it seems like it might be difficult to find a more elegant solution to all these issues because of monetary and opportunity costs, especially if there is pressure to produce the next best model?
I am at ICCOPT this week, at USC. Let's meet if you want to chat about optimization for machine learning and/or are interested in working as a postdoc with me.
I’m also excited to be presenting this work (https://t.co/EHLDcLc2iC) at ICCOPT at USC. Theory aside, there are some applications that may interest people working on RL, games, and performative prediction. Let me know if you are in the area and want to chat!
On my way to ICCOPT I decided to give PEPit (https://t.co/mUm5ldXclU) a try, and I wish I had used it sooner. In just a few hours I was able to confirm the theoretical results of our recent paper and build intuition that originally took me months to develop without it. 1/N
These results also suggest that our condition on the error is quite tight, as shown by the bold red curve never crossing below the dotted red line for any stepsize. Looking forward to using this tool for other algorithms!
You can view a special case of our results using PEPit above. The main takeaway is that handling relative error in VIs is fundamentally harder than in scalar minimization. For scalar minimization, there is a stepsize that guarantees convergence for any relative error < 1. This is not true for VIs!
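A toy sketch of the contrast between scalar minimization and VIs. It is hypothetical and is not the construction from the paper or the PEPit setup. It uses the plain forward step x ← x − η(F(x) + e) with relative error ‖e‖ ≤ ε‖F(x)‖ and a hypothetical adversary that points e against x (the solution is x* = 0). For a strongly convex quadratic, the stepsize η = (1 − ε)/(L(1 + ε)²) decreases f for any admissible error, so the iterates converge even at ε = 0.5. For the strongly monotone operator F(x) = Ax with A = [[μ, b], [−b, μ]], F(x) is nearly orthogonal to x when b ≫ μ. Once ε ≥ μ/√(μ² + b²), the same adversary makes ⟨F(x) + e, x⟩ ≤ 0, so the distance to the solution grows for every stepsize tried.

```python
import numpy as np

def adversarial_error(v, x, eps):
    """Hypothetical adversary: error of norm eps * ||v|| pointing against x (solution at 0),
    which makes <v + e, x> as small as the relative-error budget allows."""
    x_norm = np.linalg.norm(x)
    if x_norm == 0.0:
        return np.zeros_like(x)
    return -eps * np.linalg.norm(v) * x / x_norm

def run(oracle, x0, eps, stepsize, num_iters):
    """Forward steps x <- x - stepsize * (oracle(x) + e); returns ||x_k|| for every k."""
    x = np.asarray(x0, dtype=float)
    dists = [np.linalg.norm(x)]
    for _ in range(num_iters):
        v = oracle(x)
        x = x - stepsize * (v + adversarial_error(v, x, eps))
        dists.append(np.linalg.norm(x))
    return np.array(dists)

eps = 0.5
mu, L = 1.0, 10.0

# Minimization of f(x) = 0.5 * x^T H x: this stepsize makes f decrease for ANY error
# with ||e|| <= eps * ||grad f(x)||, so the adversary above is just one instance.
H = np.diag([mu, L])
min_dists = run(lambda x: H @ x, [1.0, 1.0], eps, (1 - eps) / (L * (1 + eps) ** 2), 3000)

# VI with F(x) = A x: mu-strongly monotone, but mu / sqrt(mu^2 + b^2) ~ 0.1 < eps,
# so <F(x) + e, x> <= 0 and ||x|| increases no matter which stepsize is used.
b = 10.0
A = np.array([[mu, b], [-b, mu]])
vi_dists = {eta: run(lambda x: A @ x, [1.0, 1.0], eps, eta, 50)
            for eta in [1e-3, 1e-2, 1e-1, 1.0]}
```

The difference comes from the Lyapunov function: minimization can use f itself, which any error of relative size below 1 cannot push uphill for a small enough step. A VI has no such potential, and the natural measure (distance to the solution) can be defeated once ε exceeds the operator's "angle margin" μ/√(μ² + b²).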
@orvieto_antonio There is an example with a simulation in Amir Beck's book "First-Order Methods in Optimization" demonstrating mirror descent outperforming its Euclidean counterpart.
@orvieto_antonio E.g., projecting onto the strategy constraints of games with sequential decision making becomes as easy as projecting onto a simplex, using a "dilated" mirror map that lets one break up the projection recursively.
@orvieto_antonio Rates aside, one practical advantage I've found useful, and that I don't see discussed often, is that a projection with a specific mirror map can be much easier to compute than the naive orthogonal one.
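A small sketch of that point on the probability simplex, comparing the two steps. With the negative-entropy mirror map, the mirror-descent step is a closed-form multiplicative update followed by a normalization (O(n)). The Euclidean step instead needs an orthogonal projection, shown here with the standard sort-based algorithm (O(n log n)). The function names and the example x and g are hypothetical. The dilated, recursive version for sequence-form strategy sets is not implemented here.

```python
import numpy as np

def entropic_mirror_step(x, g, eta):
    """Mirror descent step on the simplex with the negative-entropy mirror map:
    x_new is proportional to x * exp(-eta * g), so the Bregman projection is a normalization."""
    logits = np.log(x) - eta * g  # requires x > 0 (interior of the simplex)
    logits -= logits.max()        # numerical stabilization before exponentiating
    y = np.exp(logits)
    return y / y.sum()

def euclidean_simplex_projection(v):
    """Orthogonal projection onto {x >= 0, sum(x) = 1} via the sort-based algorithm."""
    u = np.sort(v)[::-1]                        # sort in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1) / k > 0)[0][-1]  # last index that stays positive
    theta = (css[rho] - 1) / (rho + 1)          # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)

def projected_gradient_step(x, g, eta):
    """Euclidean counterpart: gradient step, then orthogonal projection."""
    return euclidean_simplex_projection(x - eta * g)

x = np.full(4, 0.25)
g = np.array([1.0, 2.0, 0.5, 3.0])
x_md = entropic_mirror_step(x, g, eta=0.5)     # stays strictly positive
x_pgd = projected_gradient_step(x, g, eta=0.5)  # can hit the boundary (exact zeros)
```

Both outputs lie on the simplex, but only the Euclidean step needs a sort and a threshold search.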