OpenReward

about 1 month ago

🎉 We're now supporting the Agent Data Protocol as a default agentic trajectory format. Any trajectories you log to @OpenReward can be exported in the ADP format. Thanks to @gneubig @yueqi_song for the collaboration!

about 1 month ago

🧪 We’re experimenting with new features that allow for easier sampling with popular agentic harnesses. Core use cases:
- Collecting diverse agentic midtraining data
- Evaluating the latest models on agentic environments

Try it out!

about 1 month ago

🔥🐴 Firehorse. Run any model with any harness on any @OpenReward environment.

⚖️ Evaluate the latest models on environment endpoints.
🗂️ Collect agentic data for midtraining and SFT from open models.
🧪 Early experimental library. More support soon.

Link below.

about 2 months ago

Try it out on OpenReward: https://t.co/z0YRHE6nYl

about 2 months ago

🎲 Introducing KellyBench, a new long-horizon evaluation for frontier models.

KellyBench evaluates models within a year-long sports betting market, a challenging and highly non-stationary environment.

Every frontier model we test loses money. They struggle to design ML strategies, manage risk, and adapt as the world changes.

Link and thread below.

about 2 months ago

You can now train on OpenReward environments with SkyRL! Amazing work by @tyfeng1997 🙇

Ty Feng @tyfeng1997

about 2 months ago

Recently, I integrated @OpenReward into SkyRL (@NovaSkyAI), including an example demonstrating training with @modal. To verify the code, I ran several experiments—which proved to be a highly enriching experience! 😋 https://t.co/4zyGhp08ZY

OpenReward retweeted

ƬⲘ

@tm23twt

about 2 months ago

timelapse 27 :)
- submitted the Rust reasoning algo env to the Meta RL hack (actually built a Python one first, then moved to the Rust one); created a Rust dataset of around 1000 problems, will grow it to 2.5k next
- defined the whole reward logic (not optimal, I think) and designed the way validation works; will refine it & push to @PrimeIntellect & @OpenReward envs
- have some other tasks as well; the deadline is tomorrow, so I need to finish this
- this week was pretty rough, like peak locked in, so will chill & just relax for a few days

about 2 months ago

Claude Mythos Preview on SWE-Bench Pro appears to be a step change.

about 2 months ago

Congrats to @Zai_org team, new SOTA on SWE-Bench Pro! https://t.co/wrFdHEXiMZ

Z.ai @Zai_org

about 2 months ago

Introducing GLM-5.1: The Next Level of Open Source

- Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo.
- Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations.

Blog: https://t.co/hmyDe4Nel3
Weights: https://t.co/CuUjXcPKJD
API: https://t.co/fz6reja4fb
Coding Plan: https://t.co/Nk8Y98HNhU

Coming to https://t.co/WCqWT0qCQb in the next few days.

OpenReward retweeted

Parshin Shojaee

@ParshinShojaee

about 2 months ago

great to see our llm-srbench featured in openreward! super exciting collection of science environments for agents!!

OpenReward retweeted

about 2 months ago

🌍 Environments of the Week

The theme this week...environments for science 👩‍🔬.

First up, LLM-SR Bench by @ParshinShojaee et al is an environment for evaluating language model agents on scientific equation discovery tasks.

https://t.co/zzx4Hv46LS

OpenReward retweeted

2 months ago

Run YC-Bench from @CollinearAI on OpenReward 👇

OpenReward retweeted

2 months ago

🪐 Researcher Credits

We’re announcing researcher credits for OpenReward: helping researchers develop the next generation of environments and evaluations.

Read more and apply below. https://t.co/MMl97BSqip

OpenReward retweeted

Dimitris Papailiopoulos

@DimitrisPapail

2 months ago

@gandhikanishk's Endless Terminals is the most popular env on OpenReward!

OpenReward retweeted