🦦🦦My Ph.D. defense on Humanistic, Pluralistic, and Coevolutionary AI Safety and Alignment is scheduled for May 8, 11:30 AM–12:30 PM PT!🦦🦦
You're welcome to join online via Zoom (https://t.co/UvNX8csJjk) or in person at UW CSE Allen Center 403. Add it to your Google Calendar for easy reference: https://t.co/YuToHHUgI1
@TristanThrush dude, I just found the paper on alphaXiv and it's super impressive work.
I read it carefully and found the "67".
Sometimes you get 6, sometimes you get 7
Really interesting piece. One nuance I’d add is that EVMbench feels more like a security evaluation benchmark with a live executable environment than an RL environment per se. The fact that tasks are interactive and programmatically graded does make it adjacent to RL-style setups, but that doesn’t necessarily mean it belongs in the same bucket as environments designed for policy improvement through repeated training rollouts.
Thrilled to announce I'm stepping up as Vice President for @uw_blockchain! 🤝
I would like to thank @LeonLeng5 for giving me this opportunity.
I joined @UW this September and was impressed by the talent, but disappointed the club hasn't been active.
Seattle deserves more attention than just SF/NY—we have strong builders here! The club has a great history since 2019, and I'm here to bring it back as the hub for hackers, researchers, and builders.
We know EigenLayer started right here at UW by the great @sreeramkannan. I'm here to make a similar change.
My strength is my connection to the space, and I'd love to bring you all in to give our passionate students a talk. DM me!
To start, we have @wangandy from @worldcoin to talk about digital identity and the future of the economy.
So excited for this opportunity! Let's build!
🚨 NEW PAPER 🚨
Excited to announce this working paper (co-authored with @hongyaoma, @ykanoria, @rajivatbarnard) looking at the extent of wash trading on @Polymarket
We used all historical trade data from the @Polygon blockchain (~70 million trades) and developed a graph-based algorithm for estimating wash volume.
Whoa... Grok 4 beats o3 on our never-released benchmark: HumorBench, a non-STEM reasoning benchmark that measures humor comprehension. The task is simple: given a New Yorker Caption Contest cartoon and caption, explain the joke.
The modern economic case for public provision is not about public goods or market failures or externalities. It is about what can and can't be achieved by contracting.