I am honored to introduce my new work, XShapeEnc: Training-free Spatially-Grounded Geometry Shape Encoding. It encodes an arbitrary spatially grounded 2D geometric shape into a high-dimensional space without any training.
Key features include:
- Utilization of the classic Zernike basis for efficient encoding of shape geometry and shape pose, either jointly or separately.
- An interpretable and invertible encoding process that is rich in high-frequency details.
XShapeEnc offers wide applicability for various downstream tasks that require shape analysis.
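For readers unfamiliar with the Zernike basis, here is a minimal sketch of the classic Zernike moment of a binary shape mask. It is not XShapeEnc's actual encoding (see the paper and code for that); the function names `zernike_radial` and `zernike_moment` and the pixel-grid mapping onto the unit disk are assumptions made for illustration only.

```python
import cmath
import math
from math import factorial


def zernike_radial(n: int, m: int, rho: float) -> float:
    """Classic radial polynomial R_n^|m|(rho); needs n >= |m| and n - |m| even."""
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("need n >= |m| and n - |m| even")
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = (-1) ** k * factorial(n - k) / (
            factorial(k)
            * factorial((n + m) // 2 - k)
            * factorial((n - m) // 2 - k)
        )
        total += coeff * rho ** (n - 2 * k)
    return total


def zernike_moment(mask, n: int, m: int) -> complex:
    """Approximate Zernike moment of a square binary mask (hypothetical helper).

    The mask is mapped onto [-1, 1] x [-1, 1]; pixels whose centers fall
    outside the unit disk are ignored.
    """
    size = len(mask)
    step = 2.0 / size  # pixel width after mapping to [-1, 1]
    acc = 0j
    for i, row in enumerate(mask):
        y = -1.0 + (i + 0.5) * step
        for j, value in enumerate(row):
            x = -1.0 + (j + 0.5) * step
            rho = math.hypot(x, y)
            if value and rho <= 1.0:
                theta = math.atan2(y, x)
                # Project onto the complex conjugate of the basis function.
                acc += zernike_radial(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * acc * step * step


# Toy example: an off-center rectangle and the same shape rotated by 90 degrees.
mask = [[1 if 4 <= i < 12 and 10 <= j < 20 else 0 for j in range(32)]
        for i in range(32)]
rotated = [list(r) for r in zip(*mask[::-1])]
z, z_rot = zernike_moment(mask, 4, 2), zernike_moment(rotated, 4, 2)
print(abs(z), abs(z_rot))  # magnitudes agree; only the phase changes
```

In this classic formulation the magnitude of each moment is unchanged by rotation about the disk center while the phase carries the orientation, which is one common way to think about separating shape geometry from pose.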
For more details:
paper: https://t.co/qcLa6zgtEj
code: https://t.co/eFVBxTyXzI
Honored to see my work featured on this blog.
Deep Neural Networks: From Trustworthy Explanations to Robust Autonomous Systems https://t.co/xlJuItHhIM
🚀 [ICLR 2026] Existing text-to-audio generation (TTA) methods mainly focus on semantic correctness, yet they perform very poorly on relation-aware generation. For example, current models achieve <30% audio event presence accuracy and <10% relation accuracy.
In our newly accepted ICLR 2026 paper, we introduce Aurelius, a framework that enables relation-aware TTA research at scale. Specifically, we introduce two meticulously curated corpora:
🗂 AudioEventSet — 110 audio events across 7 major classes.
🗂 AudioRelSet — 100 relations across 6 major relation types.
Based on the two corpora and the proposed data creation strategy, we can create massive (nearly unlimited) <text, audio> pairs with both
• high linguistic diversity.
• high acoustic diversity.
We release all resources to support the broader community in AI, acoustics, computer vision, and multimodal research.
📄 Paper: https://t.co/ZzDiCo5gJm
🗂 Dataset: https://t.co/Qe9fc2kPKW
💻 Code: https://t.co/52G6h6pqPP
🌐 Project Page: https://t.co/zI1gyP2yko
Huge thanks to Andrew Markham, He Liang, @_jainyash and @VibhavVineet at Microsoft Research and the University of Oxford for their unwavering support.
#ICLR2026 #Multimodality
📣📣 Introducing OS-Marathon⏱️
I am proud to share this interesting project from my internship at @Microsoft @MSFTResearch. 🥳
This work benchmarks computer-use agents on long-horizon, repetitive tasks.
✅ We focus on desktop workflows that are both long-horizon and repetitive, e.g. filling in an expense system from a set of travel receipts.
✅ We create data, build tasks and environments and categorise them into multiple difficulty levels to enable fine-grained evaluation.
📎Project Page: https://t.co/NORAd1ptuL
📄Paper: https://t.co/UxERGlevQb
Joint work with Daphne Barretto, Yiye Chen, Nicholas Gydé, Yanan Jian, Yuhang He @HenryOxplore, Vibhav Vineet @VibhavVineet
The MSR Vancouver lab is looking for a Canada-based Ph.D. intern who can start the internship ASAP (preferably before Christmas). The internship is about LLM pre-/post-training. The candidate should already hold a work permit. DM me if you are interested.
The Microsoft Research Vancouver lab is hiring multiple interns in Canada to work on AI-Driven System Design. Successful applicants are expected to start as early as possible. Please share with anyone who might be interested. Apply through: https://t.co/OALz8l77Pb
Existing text-to-audio generation models fail to model relations between audio events. We fill this gap with a new benchmark and evaluation protocol.
Title: RiTTA: Modeling Event Relations in Text-to-Audio Generation.
Project site: https://t.co/ehCpuMP7Bf
Code: https://t.co/PnDdUQ5hmu
On my way to #ICML2024, looking forward to catching up with new friends. I am also on the job market, looking for a research job in multimodal audio-visual-x learning.
Honored to introduce our new work: SPEAR, a receiver-to-receiver neural warping field that predicts spatial acoustic effects at one position from another reference position, without requiring the source position. Code: https://t.co/PvfFnh5PfM. Paper: https://t.co/jqVawL6Tdw
Build what you need and use what you build. This is a core philosophy of my research. It shifts the focus away from publishing “papers” to what really matters — impact. This thread unpacks why I think this is a successful approach to science. 1/10 Or see:
https://t.co/p3iWJ9LCzf
I was invited to be an emergency reviewer for #icml. I accepted and did my best on the review. After the reviews were released, I found that the paper has as many as six reviews. So was I not an emergency reviewer at all?🤣