Although the author of the article declares Grok-3 the winner, this conclusion is debatable for several reasons:
1. Subjectivity of Evaluation
The author claims that Grok-3 "won" because it made the biggest impression. However, impressions are a subjective criterion. In reality, each model has its own strengths and weaknesses, and the choice of the best model depends on specific tasks.
GPT-4.5 is better suited for everyday tasks and has improved emotional intelligence.
Claude 3.7 focuses on programming and logical reasoning.
Grok-3 offers high request limits and strong reasoning capabilities but does not necessarily outperform competitors in all aspects.
2. Incomplete Comparative Testing
The author mentions that Grok-3 "convinced the majority" but does not provide objective metrics. He refers to "feelings," but in practice, models need to be tested under different conditions: coding, math, text generation, factual accuracy, and more.
Many independent tests still show OpenAI's models (including Deep Research) leading.
Grok-3 has indeed made progress, but there is no concrete evidence that it is universally better.
3. Underestimating GPT-4.5
The author calls GPT-4.5 a "disappointment" while also admitting that it has improved in creative tasks, has better emotional intelligence, and offers a great experience for general users.
GPT-4.5 is not aimed at professionals, but that does not make it bad. It is optimized for a broad audience, which aligns with OpenAI's strategy.
Its improvements may not feel like a "revolution," but gradual refinement is normal, especially since GPT-4o was already strong.
4. Oversimplified Industry Analysis
The author concludes that OpenAI is now "under pressure" and "losing ground." However, in reality:
OpenAI remains the market leader with the largest user base.
GPT-5 is already in development, and OpenAI is preparing for a significant leap forward.
Grok-3 shows promising progress, but xAI is still a relatively new company that must prove the consistency of its success.
Conclusion
Grok-3 is an impressive contender, but it is too early to declare it the winner. OpenAI still holds the lead, while Grok-3 is catching up and demonstrating strong potential. However, without objective data and large-scale comparative testing, it is premature to claim that it is the best overall model.
@kimmonismus This is a poorly formulated problem that may have different solutions. In classical math 1 does not equal 5, so this is not the traditional mathematical = sign, and we have no reason to believe that it is transitive.
@felps_bra @rabrg That statement is from 2017, if I'm not mistaken, and doesn't take AGI into account. With automated qualitative research, you can get interesting ideas and efficient algorithms in weeks that would take humans decades and dozens of experiments to create.
@felps_bra @rabrg Of course you're wrong. Experiments are the final stage; moreover, a number of experiments can be carried out at lower capacities. The main thing is to develop new algorithms and approaches and to optimize existing ones.