Chris E @CTE - Twitter Profile

11 months ago

@communicating @roocode @Zai_org @openrouter Here are the programming exercises for our evals suite: https://t.co/IfWVZeJlPs Full results here: https://t.co/NEeYO8Ah4g

1

0

1

44

Chris E

@cte

11 months ago

@artificialguybr @roocode @Zai_org @openrouter https://t.co/NEeYO8zJeI

0

1

45

Chris E

@cte

about 1 year ago

Your take isn't spicy enough!😂 My sense is that there are trade-offs with all of these tools and in the long run I wouldn't bet against giving these models more tools and letting them judge which is most appropriate given the constraints. It would be nice to have some eval data backing these takes (we're working on that).

0

1

0

61

Chris E

@cte

about 1 year ago

@AlexGrama @roocode On it! https://t.co/JqBtZtMUDX

1

2

0

133


Chris E

@cte

about 1 year ago

@pingToven 😬 Agreed... I'm trying to implement caching for the Gemini provider in Roo Code and I'm sure that I'll get it wrong on the first attempt.

0

3

0

68

Chris E

@cte

about 1 year ago

@soyhenryxyz @GosuCoder This is amazing. The next big push on evals is going to be testing various orchestration configurations and showing some data that backs our intuition about them, so I'd love to help.

1

0

24

Chris E

@cte

about 1 year ago

@jpeg729 Our evals framework is open source here: https://t.co/1Lot8b3PKt - we're working on better documentation for running it yourself locally.

0

1

0

37

Chris E

@cte

about 1 year ago

@sachasayan ❤️

0

2

0

90

Chris E

@cte

about 1 year ago

@GosuCoder @bindureddy I just updated the Roo Code evals - https://t.co/NEeYO8Ah4g - o4 Mini (High) doesn't come near the top tier of coding models, but the price-to-performance is reasonable.

1

19

3

1

2K

Chris E

@cte

about 1 year ago

@soyhenryxyz @roocode Totally agree; I’d love to come up with a new set of benchmarks that are designed to show off the strengths of subtasks.

1

2

0

25

Chris E

@cte

about 1 year ago

@setkyarwalar @roocode Nano... 🙈😬

0

41

Chris E

@cte

about 1 year ago

@LeaderOnePro @roocode Trying to get it out today!

0

130

Chris E

@cte

about 1 year ago

@cdossman @roocode I think it's on pace to be slightly below Sonnet 3.7 and Gemini 2.5. The price to intelligence ratio of 4.1 mini seems to be trending really well...

0

2

0

99

Chris E

@cte

about 1 year ago

@mattpocockuk The Aider polyglot benchmarks are a good start. I wired up the Cursor-like product I’m working on to run the benchmarks and see how it compares to the publicly available scores.

0

30

cte retweeted

Louis Virtel @louisvirtel

about 6 years ago

I love how Mitt Romney reappears once every three months to outshine the entire Republican Party by doing the absolute least.

633

135K

12K

394

0

cte retweeted

Carol Leonnig

@CarolLeonnig

about 6 years ago

Real journalists provide facts to inform the public, not to lead them astray or put them in danger.

18

443

148

14

0

cte retweeted

Ryan Chapline @ryanchapline

almost 7 years ago

“I perceive the necessity... the necessity for haste.” - George “Maverick” Washington #RevolutionaryWarAirportStories

189

11K

2K

67

0

cte retweeted

Pete Buttigieg

@PeteButtigieg

about 7 years ago

Amazingly, the chyron is not the most foolish thing about this picture. To get ahead of a potential refugee crisis caused by great suffering in Central America, it would make sense to use our resources to help reduce that suffering. This is self-defeating. https://t.co/re7fW6CpNS

1K

27K

5K

74

0

cte retweeted

Jacob Soboroff

@jacobsoboroff

over 7 years ago

EXCLUSIVE: DHS test of steel prototype for border wall, Trump's preference, showed it could be sawed through. We've obtained a never-before-seen photo. Our report, with @JuliaEAinsley. https://t.co/VfRTSt36mr

2K

12K

6K

96

0

cte retweeted

Robert Reich

@RBReich

over 7 years ago

Let me get this straight: Trump wants the federal court to postpone indefinitely hearing a case claiming that he is illegally profiting from his Washington hotel, because the government shutdown, created by Trump, prohibits his attorneys from working. https://t.co/VuZejVDCoP

679

7K

4K

24

0
