My team at @AISecurityInst studies how frontier AI shapes what we believe, decide, and feel - and we're hiring! 🚨
The role is a 6-month RA residency in London, ideal for MScs / early PhDs in ML, psych, cog/data sci
[1 June deadline]
Get a taste of our recent research below 👇
I’ve been at a small conference this week, one where the AI people have been presenting early in the week and the domain science people will be presenting later in the week.
At the end of the talks last night, the conversation turned very doomer with all the AI people talking about how well Claude Code or Codex can do hill-climbing AI research and how we (the AI people) are maybe all about to lose our jobs!
The domain science people expressed their shock at this attitude because, though Claude Code can be let loose to complete lots of banal hill-climbing AI research projects, basically no experimental science is hill-climbing or even metric driven. Most scientific fields rely on taste-driven exploration that is incredibly difficult to parameterize or define metrics for, and this misunderstanding from the AI community is one of the most damaging things to the realization of great science with AI. Seems like we’re actually pretty far from having AI models do that…
Over the summer, @evijit and I wrote about this (and some other things hindering AI for science) at a bit more length, and today that work is out in Patterns!
So, if you care about these problems and the real challenges in bringing AI to science in the real world, I recommend giving it a read!
📍 Can LLMs discover, abstract, and reuse higher-level tool skills across tasks?
Existing tool-use benchmarks test solving tasks with fixed tools. But real workflows contain recurring structures where efficiency comes from reusable tool compositions, not isolated calls.
We introduce SkillCraft: 126 tasks across 6 domains designed to test whether LLM agents can acquire compositional skills, not just call atomic tools.
We also propose Skill Mode, a lightweight protocol with four MCP primitives that let agents compose, verify, cache, and reuse tool chains at test time.
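To make the compose / verify / cache / reuse loop concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea: the `SkillLibrary` class, its method names, and the toy tools are hypothetical and are not the paper's MCP primitives or API.

```python
from typing import Any, Callable, Dict, List, Tuple

Tool = Callable[[Any], Any]


class SkillLibrary:
    """Hypothetical in-memory store for verified tool chains (not the paper's API)."""

    def __init__(self) -> None:
        self._skills: Dict[str, Tool] = {}

    def compose(self, steps: List[Tool]) -> Tool:
        """Chain atomic tools so each step's output feeds the next."""
        def skill(x: Any) -> Any:
            for step in steps:
                x = step(x)
            return x
        return skill

    def verify(self, skill: Tool, cases: List[Tuple[Any, Any]]) -> bool:
        """Check a candidate skill against known (input, expected output) pairs."""
        return all(skill(inp) == expected for inp, expected in cases)

    def cache(self, name: str, skill: Tool, cases: List[Tuple[Any, Any]]) -> bool:
        """Store the skill only if it passes verification."""
        if not self.verify(skill, cases):
            return False
        self._skills[name] = skill
        return True

    def reuse(self, name: str, x: Any) -> Any:
        """Run a previously cached skill on a new input."""
        return self._skills[name](x)


# Toy atomic tools standing in for real tool calls.
def strip_text(s: str) -> str:
    return s.strip()


def to_upper(s: str) -> str:
    return s.upper()


library = SkillLibrary()
normalize = library.compose([strip_text, to_upper])
stored = library.cache("normalize", normalize, [("  hi ", "HI")])
result = library.reuse("normalize", " skillcraft ")
```

The point of the sketch is the ordering: a chain is only cached after it passes its checks, so later tasks reuse a tested composition instead of re-deriving it call by call.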
Our key findings from evaluating 8 SOTA models:
⚡Skill Mode enables agents to self-discover and reuse skills, leading to higher success and efficiency than agents without it. The gains are larger for stronger models.
🧠 Stronger models (e.g., Claude) discover more generalizable skills, which transfer across tasks and even across models.
🔍 Deeper composition ≠ better — shallow, well-tested skills generalize best.
🔗 Paper: https://t.co/Vg7MPJy4FO
💻 Code: https://t.co/ZRez74QvgQ
🏠 Page: https://t.co/ukGYi5Itjr
(1/7)
Join us at #AAAI2026 (Singapore) for AIR-FM: Assessing and Improving Reliability of Foundation Models in the Real World.
📅 Mon, 26 Jan 2026 | 8:30–5:00
📍 Peridot 202 (2nd Floor)
https://t.co/tekZc54auP
If you're into agents, LLMs, or how AI interacts with software and humans, join us!
- Sat, July 19 - Workshop on Computer Use Agents @ ICML 🧠
- Thu, July 17 - the UiPath Agentic Happy Hour @ Vancouver, CA🍸
Let’s connect! 👉
#ICML2025 #AIagents #Agents #CUA #wCUA #AI
🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀
LLMs are incredible but still struggle disproportionately with subword tasks, e.g., for character counts, wordplay, multi-digit numbers, fixing typos… Enter StochasTok, led by @anyaasims!
[1/]
Come chat to us now at the Planning and Reasoning workshop about StochasTok fixing pathologies in subword understanding in LLMs!
Finally enabling LLMs to understand how many “r”s are in strawberry 🍓🍓
📍Garnet 212-213
Paper: https://t.co/GJ0wSdzgPr
@anyaasims @klarakaleb @j_foerst @yeewhye
What a thrill to hold this month’s issue of @NatGeo magazine and open it to see the faces and words of Congolese researchers studying the Congo Basin rainforest splashed across 24 pages of this iconic publication 🌱🌍🔬
At #ICLR25? 🇸🇬 Check out @cong_ml's talk about our work on a novel stochastic tokenisation method, StochasTok, on Mon, 4:20 PM @ Hall 4 #6
@anyaasims @j_foerst @yeewhye + Thom Foster
🚀Announcing the Workshop on Computer Use Agents at #ICML2025 in July, Vancouver!
Join us to advance research on AI agents performing real-world computer tasks.
🤖Call for Papers & Demos: Deadline May 18, 2025
🎙️Exciting speaker lineup announced!
✍️Interested in reviewing? Register now!
✈️Travel grants available to support participation.
Follow us for updates! #WCUA #CUA #AI #ML #ComputerUseAgents #Agents #icml2025
Website link below 👇
Couldn't agree more. "UK Research and Innovation funding in the UK fell under the previous government from 6,835 in 2018-19 to 4,900 in 2022-23". To give a concrete example (with my @UCLCS professor hat on): 4 out of 7 @UCL_DARK PhD students were funded by the Centre for Doctoral Training (CDT) in Foundational AI at @ai_ucl. @akbirkhan @LauraRuis @_robertkirk @PaglieriDavide won Best Paper Awards at top-tier international conferences, made significant contributions to AI safety, expanded our understanding of how LLMs learn to reason, and built difficult evaluations of agentic capabilities of LLMs while many other benchmarks are saturating. @UCL_DARK alumni found startups (@WecoAI), work in leading AI labs like @GoogleDeepMind, @AnthropicAI, @AIatMeta, or work in government at @AISafetyInst. @UCL_DARK wouldn't be what it is today without that CDT funding.
Yet, despite the tremendous success of the @UCL Centre for Artificial Intelligence, the CDT was discontinued. @UCL_DARK now has six open positions for AI PhDs to start in Fall 2025, and it's unclear whether we will be able to make any funded offers. In turn, our lab is already significantly scaling down MSc thesis supervision, and thus not doing as much as we would like to train the next generation of AI experts.
If the UK wants to have any chance at keeping up with AI, PhD funding and securing significant compute for academic research should be its top two priorities. Without these, the "talent" in the talent pipeline is missing.
While we are at it, the starting salary for an assistant professor in the UK is in the range of £50K-60K, which simply is not enough to attract top international AI faculty to the UK. The third priority should be topping up AI postdoc and faculty salaries.
https://t.co/ztolePwPym
I’ve been complaining about lack of error bars in LLM papers for some time. Rather than just complaining, here’s a guide on how to do it! ⬇️
We’ve made a small Python lib that you can install… or copy-paste one file into your projects (dependencies are annoying, we get it 🙃)
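For a rough idea of what an error bar on a benchmark score can look like, here is a minimal standard-library sketch. It is not the API of the library mentioned above; the function name `accuracy_with_error_bar` is hypothetical, and it assumes independent questions scored 0 or 1 and uses the normal approximation to the binomial.

```python
import math
from typing import Sequence, Tuple


def accuracy_with_error_bar(
    scores: Sequence[int], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) for per-question 0/1 scores.

    Uses the normal approximation: accuracy +/- z * sqrt(p * (1 - p) / n).
    z = 1.96 corresponds to an approximate 95% interval.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    p = sum(scores) / n
    sem = math.sqrt(p * (1 - p) / n)  # standard error of the mean
    return p, p - z * sem, p + z * sem


# Toy data: 80 correct answers out of 100 questions.
scores = [1] * 80 + [0] * 20
mean, low, high = accuracy_with_error_bar(scores)
print(f"accuracy = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

The normal approximation gets unreliable when the benchmark is small or the accuracy is close to 0 or 1, so treat this as a starting point rather than a recommendation.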
📣 Jobs alert: UQ in LLMs!
We're looking to hire a Postdoctoral Fellow and a Research Engineer to work on uncertainty quantification in LLMs. The project is a collaboration between @UniofOxford (@yeewhye), @NTUsg (Luke Ong) and @NUSingapore (@WeeSunLee)
#LLMs #hiring #academic #UQ
Details ⬇️
Our hearts are breaking for those who have been impacted by extreme wildfires in California. Climate change is taking the places we love. Spreading awareness about climate change is one of the best ways to fight it. Let’s start protecting the people and places we love, right now! #ScienceMoms #LaterisTooLate #ProtectWhatYouLove
Postdoctoral fellowships and research engineer positions available for an Oxford+Singapore project on uncertainty quantification in LLMs!
https://t.co/nvAOuqmn0l
Oxford deadline is Feb 26. Pls apply if interested, forward to your contacts, contact me if you have questions 🙏🙏
Applications are now open for EEML 2025 in Sarajevo, Bosnia and Herzegovina, 21-26 July! 🎉
Learn from top AI researchers and connect with peers in Sarajevo 🇧🇦, a historical crossroads of East and West. Needs-based scholarships are available.
Deadline: 31 March 2025.
I hope @Keir_Starmer and his government are taking notes. To “unleash AI across UK to boost growth” one has to invest in talent and compute at the very least. Hint: getting rid of the triple lock will help! Move 💰 from the unproductive to those who will deliver the growth you’re after.
While we are at it, starting salaries for junior faculty in places like Oxford are actually <£50K a year.
#ICML2025 includes a new track on Application-Driven Machine Learning (innovative ML techniques, problems, and datasets driven by the needs of end-users in the real world)!
If this fits your work, consider submitting to ICML (dl: Jan 30) and checking the ADML box ✅ in OpenReview ⬇️