@heyandras Just tried their model, not their harness. I ended up going back to DeepSeek V4 Flash for most basic tasks, as it does just as well but is 5x cheaper.
@antigravity Please fix this issue: if you ask Gemini 3.5 Flash to rate something from 1 to 10, it always gives it a 7, even if the answer is a perfect 10!
It hallucinates "compression guidelines".
Thanks for the updates!
One small nitpick: the Theme selector is a bit too visible now. It's the element in the sidebar that pops out the most, and it takes a lot of attention.
Maybe remove the background around the icons?
Or maybe even replace the icon on the left with the currently selected theme, and only offer the other two options on the right (see 2nd image).
Adding a few more coding tests to @AIBenchy. The main use case for LLMs atm is coding, so it makes sense for models that are better at coding to rank higher overall than those that are better at trivia.
Still, general intelligence, puzzle solving, and not being easily tricked are what I think make a good AI and bring us closer to AGI.
@eastdakota My websites are being scraped like crazy. One website gets 5000 bot fetches per day,
from various bots/countries.
And it's not just fetching HTML: some bots actually navigate, or even register.