Cristian @XCSme - Twitter Profile

@antigravity Please fix this issue, if you ask Gemini 3.5 Flash to rate something from 1 to 10, it always gives it a 7, even if the answer is a perfect 10! It hallucinates "compression guidelines". https://t.co/IQ8k65077c

0

56

Cristian

@XCSme

about 18 hours ago

@heyandras Or maybe remove the icons on the right entirely, so that clicking the theme simply toggles through the options and updates the left icon.

1

0

26

Cristian

@XCSme

about 18 hours ago

Thanks for the updates! One small nitpick: the Theme selector is a bit too visible now. It's the element in the sidebar that stands out the most and draws a lot of attention. Maybe remove the background around the icons? Or maybe even replace the icon on the left with the currently selected theme, and offer only the other two options on the right (see 2nd image).

1

0

59

Cristian

@XCSme

about 18 hours ago

Adding a few more coding tests to @AIBenchy. Since the main use case for LLMs at the moment is coding, it makes sense for models that are better at coding to rank higher overall than those that are better at trivia. Still, general intelligence, puzzle solving, and not being easy to trick are what I think make a good AI and bring us closer to AGI.

0

19

Cristian

@XCSme

about 23 hours ago

@arena I don't know, I tried it and in 20 minutes it couldn't solve a bug that GPT 5.5 low found in 30 seconds.

0

4

0

307

Cristian

@XCSme

1 day ago

@eastdakota My websites are being scraped like crazy. One website gets 5000 bot fetches per day, from various bots/countries. And it's not just fetching HTML: some bots actually navigate, or even register.

0

1

0

320

Cristian

@XCSme

1 day ago

@qwen_cloud Quite good, close enough to 3.7 Max, 2x cheaper, but 3x slower on average.

0

39

Cristian

@XCSme

2 days ago

@RyanLeeMiniMax On OpenRouter still seems to be very slow:

0

1

0

315

Cristian

@XCSme

2 days ago

@_mohansolo Also via API?

0

81

Cristian

@XCSme

2 days ago

@grok @SasaMarinkovic @AMD Any prices announced?

1

0

16

Cristian

@XCSme

2 days ago

@qwen_cloud @OpenRouter when?

0

107

Cristian

@XCSme

3 days ago

@Leo_R_UK @MiniMax_AI Thanks, will check it out

0

1

0

9

Cristian

@XCSme

3 days ago

Also, still debating how to track Input Tokens IF a request is rejected by the provider (e.g. trying to use tool_calling for models that don't support it). Should the Input Tokens still be counted, even if the request failed, just to be consistent?

0

28

Cristian

@XCSme

3 days ago

I made it so "Total Input Tokens" are displayed, to make it easier to understand the Total Cost of running benchmarks. The Input Tokens should be mostly the same for all models, BUT when models do Tool Calling, the tool result and the previous prompt are passed again, so the total input tokens can vary depending on how they call the tools.
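
A rough sketch of why tool calling inflates the input-token total. It assumes each follow-up request resends the full context with no prompt caching; the `Turn` structure, the `total_input_tokens` helper, and all token counts are hypothetical illustrations, not how AI Benchy actually counts.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One extra round trip caused by a tool call (hypothetical structure)."""
    tool_call_tokens: int    # tokens in the model's tool-call message
    tool_result_tokens: int  # tokens in the tool output fed back to the model


def total_input_tokens(prompt_tokens: int, turns: list[Turn]) -> int:
    """Sum the input tokens sent across every request in one test run."""
    context = prompt_tokens  # the first request sends only the prompt
    total = context
    for turn in turns:
        # Assumption: the follow-up request resends the whole context
        # plus the new tool call and its result.
        context += turn.tool_call_tokens + turn.tool_result_tokens
        total += context
    return total


# Made-up numbers: a 1000-token prompt, 20-token tool calls, 300-token results.
no_tools = total_input_tokens(1000, [])
one_call = total_input_tokens(1000, [Turn(20, 300)])
two_calls = total_input_tokens(1000, [Turn(20, 300), Turn(20, 300)])
print(no_tools, one_call, two_calls)  # 1000 2320 3960
```

Under these assumptions the same prompt costs 1000 input tokens with no tool calls and 3960 with two, which is why the total differs between models that call tools differently.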

1

0

1

42
