Chicou @ChicouTiMix - Twitter Profile

Chicou @ChicouTiMix

about 8 hours ago

@0xSero How much concurrency?

0

45

ChicouTiMix retweeted

Harvey @harvey

3 days ago

We partnered with @FireworksAI_HQ to train open-source models for legal. Here's what we found:

1) Hybrid legal agents can beat frontier models on quality and cost by routing selectively to a frontier advisor. We tested a hybrid setup where GLM 5.1 served as the primary worker, routing tasks to Opus 4.7 as an advisor when needed. GLM invoked Opus sparingly, just 0.83 times per task on average. The hybrid setup beat Opus on both quality and cost: 18% all-pass vs 14%, at $368 vs $954 across the same 100 tasks.

2) Post-training can push open models to frontier-level legal performance. On a 100-task slice of our Legal Agent Benchmark (LAB), SFT moved Kimi 2.6's all-pass rate from 11% to 15%, beating Opus' 14%. But the cost gap was even more striking: $84 vs $954 across the same 100 tasks, or ~11x cheaper.

We're excited to continue working with @FireworksAI_HQ on the next generation of open-source legal agents.

38

827

66

663

415K

Chicou @ChicouTiMix

3 days ago

@bnjmn_marie Agreed, need to benchmark it on RAG

0

161

Chicou @ChicouTiMix

3 days ago

@0xSero Need to benchmark it

0

176

Chicou @ChicouTiMix

4 days ago

@stevibe @danielhanchen Congrats, Dan!

0

1

0

16

Chicou @ChicouTiMix

5 days ago

@ivanfioravanti Tokens per second? Same as the Spark? What do you think, guys?

1

0

198

Chicou @ChicouTiMix

5 days ago

@LottoLabs Open …. Ai 🤖 lol

0

1

0

184

Chicou @ChicouTiMix

7 days ago

@TeksEdge Better bandwidth than the Spark? Better for speed?

0

221

Chicou @ChicouTiMix

7 days ago

@0xSero @badlogicgames @ain3sh Can we prune and keep just the coding experts? With REAP?

1

0

106

Chicou @ChicouTiMix

7 days ago

@0xSero Is DS4 Q2 still good at coding and logic?

1

0

559

Chicou @ChicouTiMix

8 days ago

@LottoLabs Almost 30 on the Spark with a GGUF in llama.cpp

0

1

0

28

Chicou @ChicouTiMix

9 days ago

@MrPeterLMorris Exactly

0

163

Chicou @ChicouTiMix

9 days ago

@0xSero HF link? :)

0

93

Chicou @ChicouTiMix

10 days ago

@ivanfioravanti Let us know if it works well

0

1

0

56

Chicou @ChicouTiMix

12 days ago

@0xSero Is Cerebras going to explain how to prune? REAP?

0

194

Chicou @ChicouTiMix

12 days ago

@jedisct1 Cool, I’ll try it on my Spark. Did you prune everything except coding knowledge?

0

1

0

1K

Chicou @ChicouTiMix

16 days ago

@spark_arena Coool

0

1

0

40

Chicou @ChicouTiMix

18 days ago

@Alibaba_Qwen Hope to be the one in 🇲🇨

0

72

ChicouTiMix retweeted

Andrej Karpathy

@karpathy

18 days ago

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

8K

150K

11K

14K

27M

ChicouTiMix retweeted

Brett Adcock

@adcock_brett

20 days ago

We got bored. Time for Man vs. Machine https://t.co/HIqPGygWnF

415

4K

491

584

2M
