Internship Opportunities in AI Agents!
At @ServiceNowRSRCH, we have 4 internship positions on AI Agents, exploring robustness, privacy & collaboration.
Applicants must be registered students at a Canadian university.
Thread with details & links to apply:
Looking for an AI intern to work on eye disease detection!
Build models that combine high-res images with clinical data to automate diagnosis.
🇨🇦 Mitacs internship (8 months, full-time, ASAP)
(Must be enrolled at a Canadian university)
https://t.co/xP4f93br7E
#McGill
The NeurIPS Datasets & Benchmarks Track is now the Evaluations & Datasets (ED) Track. It now treats evaluation as a scientific object of study in its own right. Datasets and benchmarks remain fully in scope.
Details: https://t.co/MN75FwQ8Ow
We look forward to your submissions!
Our paper on Thoughtology, the study of the reasoning chains of thinking models, is now published in TMLR. A lot has changed since we wrote it: many more open-weights models have been released.
1. These models are no longer thinking verbosely. GPT-OSS has crisper thoughts than Qwen3/R1.
2. GPT-OSS almost never self-verifies or tries alternate solutions.
3. Qwen3 has a larger bloom step (initial solution) than R1.
Among commonalities:
4. All of them still have a problem-specific sweet spot (i.e., overthinking doesn't help)
5. Problems answered incorrectly still have longer reasoning chains.
On another note, thanks to @TmlrOrg for allowing us to submit a ridiculously long paper :) (135 pages in total). We thank the reviewers and the AE for their time.
This is the first paper to which every member of the group contributed! Special thanks to @saraveramarjano and @arkil_patel.
There is also a documentary about it, filmed by @CBCNews and @binhanv; hopefully you will get to see it one day.
Thanks to @SimonsInstitute for letting us work on this during their LLM2 semester program, to @IVADO_Qc for the funding, and to @Mila_Quebec members for the feedback.
Full paper: https://t.co/kaTNGCv6rk
Excited to speak at the AAAI-26 Workshop on Agentic AI Benchmarks & Enterprise Tasks (Jan 26, Singapore) 🇸🇬
As agents are rapidly productized, realistic enterprise benchmarks for capabilities and reliability are essential!
Submit: https://t.co/NYWO6Xv89b
Oct 29
cc @gneubig
SLAM Labs presents Apriel-1.5-15B-Thinker
An open-weights multimodal reasoning model that hits frontier-level performance with just a fraction of the compute.
Congratulations to all the authors on this great work, especially to @Ahmed_Masry97 for his perseverance through the highs and lows of this project!
Excited to see AlignVLM accepted to #NeurIPS2025!
@ServiceNowRSRCH
Excited to announce that AlignVLM got accepted to NeurIPS! 🥳
We'll be releasing the code and sharing an updated version of the paper that incorporates reviewer feedback soon.
#NeurIPS2025
Exciting news! Our paper "WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation" has been accepted for an oral presentation at EMNLP 2025! WebMMU addresses a critical gap in AI evaluation: how well can models understand and build websites? 🧵1/n
UI-Vision vs GPT-5: still holding the crown and far from saturation.
GPT-5 has strengths in coding and reasoning, but for computer-use tasks it is still awkward to rely on it alone. Our team's UI-Vision (ICML 2025) remains a key, still-unbeaten multimodal eval framework for screen understanding and grounding.
What we continue to see: focused training is essential to beat our evals, and this is exactly where open-source models have been shining.
A big thanks to research teams at Microsoft, OpenCUA, and UI-Tars for actively using UI-Vision to push the limits of visual screen understanding. If you are working on VLMs or screen grounding applications for ICLR submissions, UI-Vision is the place to measure and improve your systems.
And we are only getting started: our next release, UI-Vision-Grounding, is on the way. It brings a larger dataset for the community, harder grounding tasks, and new training recipes to help models level up their grounding abilities.
https://t.co/ObaK1zjD0p
https://t.co/NoKK387FLd
Big kudos to all our partners and collaborators who made this possible! @ServiceNowRSRCH, @turingcom, @Mila_Quebec, @PShravannayak, @EdwardJian2, @aarashfeizi, @gspandana, @PerouzT, Qinghong Lin, @chrisjpal, @_rabiulawal, @dvazquezcv, @joanrod_ai, @RajeswarSai
We just released the final test split of #RepLiQA, our dataset for evaluating QA on truly unseen content!
Dataset: https://t.co/VTgESfBqv2
NeurIPS '24: https://t.co/9JKgGxdWSo
Big thanks to my amazing co-authors at @ServiceNowRSRCH!
#RAG #LLMs #NLP #QA
New to ML research? Never published at ICML? Don't miss this!
Check out the New in ML workshop at ICML 2025: no rejections, detailed feedback, awards, and ICML tickets for selected authors.
Deadline: June 10 (AoE)
Submit: https://t.co/xNiccKTelq
Info: https://t.co/1dBY6bnGji
Thanks @_akhaliq for sharing our work! Excited to present our next generation of SVG models, now using Reinforcement Learning from Rendering Feedback (RLRF).
We think we cracked SVG generalization with this one.
Go read the paper! https://t.co/EnSbizvWOQ
More details on the demo, code, and models coming soon! Stay tuned.
New paper from our team at @ServiceNowRSRCH!
StarFlow: Generating Structured Workflow Outputs From Sketch Images
We use VLMs to turn hand-drawn sketches and diagrams into executable workflows.
https://t.co/HRU22oXQsT
https://t.co/2Rpp9Nwuiz
#Sketch2Flow #AI #VLM
Our team has released the UI-Vision benchmark (accepted at #ICML2025) for testing GUI agent visual grounding and action prediction!
Dataset: https://t.co/EWOTL0nVVF
Special thanks to the students who led this effort, @PShravannayak and @EdwardJian2! @ServiceNowRSRCH
Excited to share that UI-Vision has been accepted at ICML 2025!
We have also released the UI-Vision grounding datasets. Test your agents on them now!
Dataset: https://t.co/GhZSHI0uVO
#ICML2025 #AI #DatasetRelease #Agents