Check out our findings in our latest preprint!
A big thank you to everyone who's been using and voting on Copilot Arena. We couldn't have done it without you all♥️!
What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants?
In October, we launched @CopilotArena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint.
Here's what we have learned /🧵
We are excited to launch the ⚔️PR Arena⚔️ leaderboard!
Full results will be revealed after a certain milestone of community votes.
Fix your GitHub issues for free and vote for the better fix!
👉Leaderboard & Setup Guide: https://t.co/S1Oe3xXm6K
Here are some tips for using ⚔️PR Arena⚔️
1⃣ The pr-arena🏷️ option is added automatically to your Issue Labels for ease of use!
2⃣ You can use PR Arena in forked repositories.
3⃣ Don't like either fix? Select “neither” and no PR will be created.
👉Install here: https://t.co/bk19LcnBVf
📢Calling all developers who contributed votes in Copilot Arena, we need your help building the PR Arena leaderboard 🗳️.
You will no longer be restricted to the VSCode IDE: any GitHub repo with an open issue is fair game!
Check out the thread below for details:
Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues.
Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests?
👉 Install here: https://t.co/bk19LcnBVf
Powered by @allhands_ai
Excited to share our beta release of Music Arena, a live evaluation platform for state-of-the-art AI music generation models!
🎧 Listen to the latest models and 🗳️ vote for your favorite
⚔️ https://t.co/XyXvzlOMcH
⭐️ https://t.co/9f3mvseEyu
📜 https://t.co/HqTg5PH09O
Since our launch earlier this year, we have been thrilled to witness the growing community around dLLMs.
The Mercury tech report from @InceptionAILabs is now on @arxiv with more extensive evaluations: https://t.co/DnDxFvoX0E
New model updates dropping later this week!
New result: Qwen-2.5-Coder jumps from 13th to joint 1st place with fill-in-the-middle (FiM)! Congrats to @Alibaba_Qwen 🥳
Also check out @lmarena_ai 's new UI 🖥️✨
Who is winning the race to claim the LLMs-for-SWE market? We share our thoughts based on our @CopilotArena work. See the article below for current sentiments and what lies ahead 👇
We are launching our API in open beta! Visit the Inception Platform to create your account and get started using the first commercial-scale diffusion large language models (dLLMs).
https://t.co/joTqBB0cZ4
With so many AI coding assistants out there, it can be hard to keep track of the ones that perform well on real-world tasks. CMU researchers developed Copilot Arena to do just that by crowdsourcing user ratings of LLM-written code.
https://t.co/OVObru9h7b
https://t.co/cTVGkK59Dr
How do real-world developer preferences compare to existing evaluations? A CMU and UC Berkeley team led by @iamwaynechi and @valeriechen_ created @CopilotArena to collect user preferences on in-the-wild workflows. This blog post gives an overview of the design and deployment of Copilot Arena, plus new insights into developer code preferences.
New #1 leaders of the Code Edit Leaderboard:
Strong performance from both Claude 3.7 Sonnet and Gemini-2.0-Pro! Congratulations to @AnthropicAI and @GoogleDeepMind 🥇
We are also releasing a new live leaderboard interface✨. You can now easily toggle between code completion and code edit.
News from @CopilotArena: the Code Editing Leaderboard is now LIVE!
We have collected over 3.7k votes on 6 models. Congrats to @AnthropicAI: Claude 3.5 Sonnet takes 1st place!🥇
Blog analysis below👇
🏆 Mercury Coder’s performance: It’s tied for 2nd place on Copilot Arena, a platform for evaluating coding assistants in real-world settings. This is impressive for a new model based on emerging tech, competing with leaders like DeepSeek V2.5 and Claude 3.5 Sonnet. #Coding #AI