🎡📜 We are recruiting for our London chapter.
The next era of reinforcement learning is going to be unlike the last, and requires new algorithms, environments and everything in between.
🏛️ Shape the next civilisation with us. Link below 👇
🎉 We're now supporting the Agent Data Protocol as a default agentic trajectory format.
Any trajectories you log to @OpenReward can be exported in the ADP format.
Thanks to @gneubig and @yueqi_song for the collaboration!
🎉 Native Harbor support on OpenReward!
🐋 Connect your GitHub repository. We'll build the Docker images for each Harbor task and deploy the environment as an API endpoint.
🚂 Train on the deployed tasks with any RL framework.
⚖️ Evaluate on the deployed tasks with any harness.
Drop the anchor here and get started below:
https://t.co/kEGm56Y8qW
🔥🐴 Firehorse.
Run any model with any harness on any @OpenReward environment.
⚖️ Evaluate the latest models on environment endpoints.
🗂️ Collect agentic data for midtraining and SFT from open models.
🧪 Early experimental library. More support soon.
Link below.
🌍🇬🇧 Complex Worlds Hackathon, London
We're hosting an RL environments hackathon in London on 25th April, partnering with @join_ef and @airstreet.
Come join us to build the next generation of RL environments that model complex worlds over long horizons!
https://t.co/xp2EGT5vv9
🎲 Introducing KellyBench, a new long-horizon evaluation for frontier models.
KellyBench evaluates models within a year-long sports betting market, a challenging and highly non-stationary environment.
Every frontier model we test loses money. They struggle to design ML strategies, manage risk, and adapt as the world changes.
Link and thread below.
🌍 Environments of the Week
The theme this week... environments for science 👩‍🔬.
First up, LLM-SR Bench by @ParshinShojaee et al. is an environment for evaluating language model agents on scientific equation discovery tasks.
https://t.co/zzx4Hv46LS
🪐 Researcher Credits
We’re announcing researcher credits for OpenReward: helping researchers develop the next generation of environments and evaluations.
Read more and apply below.
https://t.co/MMl97BSqip
🌍 Environments of the Week
It's been a week since we launched @OpenReward. Here are some of our favourite environments this week - some newly added, some heavily used, and some hidden gems.
First, the most used environment of the week is EndlessTerminals by @gandhikanishk with 830k+ tool calls.
https://t.co/ZpustB7zYK
🧵
330+ environments, 4.5M+ tasks through one API is impressive. Love seeing native integration with Miles and Slime, makes spinning up RL experiments so much cleaner 🔥
Introducing OpenReward.
🌍 330+ RL environments through one API
⚡ Autoscaled sandbox compute
🍒 4.5M+ unique RL tasks
🚂 Works like magic with Tinker, Miles, Slime
Link and thread below.
Tactics changelog from last week:
- We released https://t.co/WsmhVAkon2 as an example of how to build games with Tactics, along with an accompanying blog post. Come try building your own game!
- We've also released a call for design partners. If you're enjoying Tactics and want to get closer to the team to help us shape its direction, let us know! https://t.co/Fmb4uAKzrh
- We updated access configuration to give better control over who can access and run Tactics. Tactics can now be gated so that only logged-in users can call them.
- CTAC now has type expressions (enums, objects, tuples) that enable you to ensure that the input and output of your Tactics conform to the right specification!
- Speaking of types, this may be a good time to mention that schema-following has been built into our LLM calls from the start! Write a JSON schema and add it to an LLM call, and the model is guaranteed to respond in the desired format.
A NARROW PATH
If attempts to build superintelligent AI succeed, we face extinction as a species.
We must choose a different path, one where we control what we create. One where AI is a tool for human advancement, not a successor species.
Humanity has no plan, so we built one.
Deepfake technology means anyone can take your face, your voice and steal your identity for whatever they want.
They don’t need your consent, and can generate entire videos from a single picture.
96% of deepfakes are non-consensual and pornographic; those that aren’t are nearly always used for fraud.
What’s more, deepfakes are growing exponentially. Since 2019 the number of deepfake videos has increased by 550%.
This year, nearly half of surveyed US businesses reported experiencing deepfake voice fraud.
The first quarter of 2023 saw more cases of deepfake fraud than the entirety of 2022.
This year, a mother received a call from her daughter, who was screaming for help. She was told her daughter had been kidnapped and that her captors were demanding a ransom. But it wasn’t her daughter; it was a deepfake of her voice.
This cannot continue. If we don’t act now, it will only get worse.
Watch our video on the effects of deepfakes:
CEO Connor Leahy (@NPCollapse) attended the AI Safety Summit at @BletchleyPark on behalf of @Conjecture.
In an interview with @SciTechgovuk, Connor spoke about how the US and China were now “addressing risks from AI as an international global priority.”
https://t.co/haAhJgPfTj
🚨 NEW CAMPAIGN VIDEO JUST DROPPED 🚨
We face an inflection point: an historic opportunity to control AI and safeguard humanity with the AI Safety Summit.
Scaling is the problem - it cannot be the answer.
Governments must work together: cap compute, create MAGIC.