Our Prime Minister @KristrunFrosta meeting her Indian counterpart today, sorting out our first five-match Test series. No doubt about it, you can see a cricketing look in her eye.
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Today, we’re releasing Continual Learning Bench 1.0: the first realistic benchmark for measuring how AI systems can improve in online settings.
Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened.
But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and found there’s still plenty of headroom for learning. (1/n)
Why doesn't a political party so ruthlessly competent at political management achieve the same insanely outlier outcomes in economic development?
There is a lot of hype around continual learning, but what is it and how do we evaluate it?
With our new Continual Learning Bench, we sought to answer both of these questions. We developed a new methodology for designing continual learning tasks and a growth-based learning metric to isolate continual learning.
Have you experienced models (agent loops) rapidly improving on your tasks? Do you have tasks that could benefit from continual learning? Let us know.
this fleet RLM result screams mismanaged geniuses hypothesis.
recursive agents (DSPy ReAct + sandbox) more than doubled accuracy on 100 long-horizon tasks, from 13% to 33%, with zero failures.
perf on tasks in the logic domain exploded to 80% (8×). every "wrong" answer was just formatting, the actual reasoning was clean.
we’re not missing model smarts. we’re missing proper management.
better architectures beat bigger models.
@somnath1978 @ShashiTharoor IWT in "abeyance" (Pakistanis reach for the dictionary every time we say this) can be changed to IWT "abrogated/annulled" etc. after the next terror attack - this itself is a big deterrent.
@somnath1978 @ShashiTharoor also unstable and inequitable internally. If their most popular leader has been jailed, why are we expected to engage in good faith with a compromised polity in the quest for peace?
Someone not bragging about a better number but instead reflecting on how we talk about things and where the field is headed. Thought leadership!
We need more of this!
This is EPIC!
Another Hong Kong based Sindhi going big in Bharat 🇮🇳🇭🇰
10,000 manufacturing jobs added in Odisha and this is just the start.
The movement is REAL.
We are investing in foundational technologies across the board: recently in quantum sensing, advanced materials, and soon metallurgy. I am a big proponent of metallurgy R&D in particular. Without it, we cannot build nail cutters or precision machinery or jet engines.
These are not flashy billion-dollar investments to make headlines; they are foundational R&D efforts that cost millions a year, stretched out over many years. The key is to SUSTAIN them for a decade or longer. Scientists and engineers need time and rock-solid support.
We also don't aim for prestige; we want to first replicate know-how that already exists.
We have also been looking to partner with small Japanese companies with critical know-how. I have two fluent Japanese speakers with me now!
@Iyervval @ShivrattanDhil1 @SriLankaTweet @SriLanka We have a large captive population to feed off - but the sheer overpopulation mars the experience - traffic jams in Ranthambore for tiger sighting being a case in point!!
@KanwalSibal Also, Bangladesh and India are more or less neck and neck, with close integration. If Bangladesh is slightly ahead, nothing wrong with that. Comparing Bangladesh with Indian states rather than the whole of India makes more sense. Bihar and West Bengal can learn from Bangladesh.