Today, Trata is excited to present Hedge-Bench, the world’s first benchmark focused on evaluating open-ended reasoning in the finance domain.
We curated 102 tasks derived explicitly from the reasoning traces of professional hedge fund analysts working with relevant information sources. Most benchmarks are geared towards evaluating naturally deterministic tasks (e.g. updating spreadsheets, calculating formulas). The result is a blind spot on the frontier labs’ part to the competency that matters most in the finance industry: reasoning.
No frontier model scores above 16% on Hedge-Bench. We observe that Claude-Opus-4.8 actually regresses relative to the last generation of Claude-Sonnet, and we see hallucination rates as high as 82% on multi-step reasoning. The implications extend beyond finance into other domains where trust in agent reliability is equally critical (e.g. law).
We recently added LoRA RL support for Qwen3.5 MoE models like Qwen3.5-122B.
The process involved merge conflicts, dependency issues, and a tricky race condition related to LoRA weight syncing.
We wrote up a post that goes into more detail and shares our work - link below!
Just set up OpenClaw and looked at the clawhub skills to see what people are using. The first few skills on the page straight up have comments telling people to install malware.
Feeling like I have a superpower when I can read CSDN blogs and random Alibaba developer docs. Info density is unmatched compared to US articles + deep research tools.
Recent years have seen GPUs become the dominant chip within a system, overtaking the CPU in importance. GeminiFS is a natural progression of this change. https://t.co/nfRdOY1PzG
We brought together 200+ founders, researchers, and engineers on Saturday for RL IRL at @ycombinator HQ!
At @Osmosis_AI, our belief is that RL is the missing piece for AI agents - so we had some friends share how they're applying RL today!
🧵 of session recordings below:
Hey that graphic looks like it's got some numbers on it, wonder where it's from?
But really, come to the event! I think it'll be a really helpful session on designing RL ready AI apps.
Sharing the full schedule for RL IRL (Saturday at @ycombinator) below!
We're finalizing the guest list tonight - this is the last chance to RSVP.
(Also, a sneak peek at some of the @Osmosis_AI merch we're handing out!)
Super excited for RL IRL this Saturday at @ycombinator, we're at 500+ registrations!
We (@Osmosis_AI) are co-hosting a session with @greptile on designing RL-ready products -
Other sessions in thread 👇
Osmosis is a platform for companies to fine-tune models that outperform foundation models with reinforcement learning.
Better, faster, and cheaper.
Learn more at https://t.co/2d3bSeYwvV