Chengxi Taylor

3 months ago

We focused on doing one thing well: serving RL environments at scale in a way that’s open and sharable.

3 months ago

That’s it for now. We’d really appreciate your feedback on what you like and what could be better. Note that training support is a research preview for now, and CPU capacity may be limited based on demand. Please bear with us if we need to scale up our clusters! Learn more about our vision below: https://t.co/40hSMvl59B

about 1 month ago

Real intelligence isn’t just about getting the right answer; most importantly, it’s about navigating a complex world that keeps changing. Our Complex Worlds Hackathon was a small step toward that shift. What stood out wasn’t just technical strength, but intent. Builders chose hard, meaningful problems, such as robotics for elder care and better nursing systems, fully aware that they’re difficult but worth solving. After the event, people came up to me saying: “I knew it was tough, I knew it may not win the prize, but I want to work on it.” That’s what stayed with me: brilliant minds, grounded in humanity. The future of true intelligence.

about 1 month ago

🌍 Last month we hosted the Complex Worlds Hackathon in London. Participants built an impressive range of environments, spanning synthetic game pipelines, arable farm management, dynamic vehicle routing, hospital triage, robotics, cybersecurity, and more. Congrats to our winners, Julie Huang and Khalid A!

ChengxiTaylor retweeted

Ross Taylor

@rosstaylor90

about 2 months ago

POMDP and Circumstance at the Complex Worlds Hackathon

Giovanni R (Edison Scientific), @ibragim_bad (Nebius), Jeff Smith, @ChengxiTaylor (GR) https://t.co/UN9tURAJVd

about 2 months ago

This Saturday, we’re hosting the Complex Worlds Hackathon in London, in partnership with @join_ef and @airstreet. The response from the community has been incredible. But this isn’t just a hackathon for engineers. It’s about a deeper question: how intelligence actually develops. Today’s AI is powerful, but often short-sighted. It struggles with long-term planning, adapting to change, and operating in messy, real-world conditions. That’s because intelligence doesn’t emerge from static questions. It emerges from interaction, feedback, and experience, from the environments we place agents in. This is why at @GenReasoning, we’re focusing on building long-horizon reinforcement learning environments: worlds where agents must act over hundreds or thousands of steps, adapt to non-stationarity, and develop capabilities that don’t show up in short tasks. If we want AI that can truly operate in the real world - in science, business, creativity - we need to rethink the environments we train it in. That’s what this weekend's hackathon is about. The next leap in AI isn’t just bigger models; it’s better environments. Intelligence isn’t built from a single answer; it’s built from experience.

2 months ago

AI seems very smart these days, but can it actually make good decisions over time? Can it adapt when the world changes? And what does the next frontier of AI capability really look like? Last week, my AI company @GenReasoning released a research paper testing frontier models in a sports betting market environment. The result was striking: every model we tested lost money. That sparked strong interest from the community, including a front-page feature in the @FinancialTimes. But for me, the real story is not the headline. It is what this result reveals about the next frontier in AI. Today’s models can often analyse well in the moment. But real-world intelligence requires more than analysis. It requires judgment over time: the ability to adapt, manage risk, respond to changing conditions, and stay coherent across a long horizon. That is why I sat down with my co-founders, @rosstaylor90, @latent_spaced, and @Kipothy, to talk through what we built, what we found, and what it means. For me, this points to a much bigger question in AI: how do we build systems that do not just produce strong answers, but understand context more deeply, adapt over time, and make better decisions in the real world? Big, meaningful work depends on this, whether in drug discovery or space exploration. In each case, progress depends on evolving with scientific, social, and cultural context over time, not just getting one static answer right.

2 months ago

🌄 Beyond SWE: The Future of Long Horizon Environments

A discussion with our founders about KellyBench, and the need for new environments that require agents to adapt over time and act under uncertainty.

0:00:17 What is KellyBench?
0:02:10 Openendedness, non-stationarity and continual learning
0:03:40 Analytical versus operational capabilities
0:04:13 Why are models bad at KellyBench?
0:05:39 Situational awareness in dynamic environments
0:06:37 Feature stability and real-world non-stationarity
0:07:07 The power of context
0:07:34 "The first principle is that you must not fool yourself"
0:08:20 Machiavelli, fortuna and the ability to adapt to change
0:09:23 How can models improve on evals like KellyBench?
0:10:12 Limitations of KellyBench: data availability and market odds timing
0:11:44 Implications beyond quant finance / sports betting
0:13:26 Civilisations as the ultimate time horizon
0:14:05 Would a mega prompt / better elicitation do much better on the benchmark?
0:14:52 What new types of capability is GR excited about?
0:17:48 Taste and the ability to pursue long-term goals even if they aren't immediately rewarding
0:18:56 Deep learning as an example of a method that took a long time to bear fruit
0:19:25 Optimism about the future of AI

2 months ago

If you care about long-horizon reasoning, this is where it starts. Excited to host this in London with great partners. Applications open 👇

2 months ago

🌍🇬🇧 Complex Worlds Hackathon, London. We're hosting an RL environments hackathon in London on 25th April, partnering with @join_ef and @airstreet. Come join us to build the next generation of RL environments that model complex worlds over long horizons! https://t.co/xp2EGT5vv9

ChengxiTaylor retweeted

Nathan Benaich

@nathanbenaich

2 months ago

“AI models from Google, OpenAI and Anthropic lost money betting on football matches over a Premier League season, in a new study by @GenReasoning suggesting even the most advanced systems struggle to analyse the real world over long periods of time. The “KellyBench” report released this week by AI start-up General Reasoning highlights the gap between AI’s rapidly advancing capabilities in certain tasks, such as writing software, and its shortcomings in other kinds of human problems.” https://t.co/tePM7wqWun

2 months ago

Most real-world decisions don’t happen in a single moment. They unfold over time, under uncertainty, with real consequences. Succeeding in the real world isn’t just about being right once. It’s about staying right as conditions change, managing downside, and making decisions over time. This gap matters.

2 months ago

🎲 Introducing KellyBench, a new long-horizon evaluation for frontier models. KellyBench evaluates models within a year-long sports betting market, a challenging and highly non-stationary environment. Every frontier model we test loses money. They struggle to design ML strategies, manage risk, and adapt as the world changes. Link and thread below.

ChengxiTaylor retweeted

Dimitris Papailiopoulos

@DimitrisPapail

3 months ago

@gandhikanishk's Endless Terminals is the most popular env on OpenReward!

ChengxiTaylor retweeted