@eliebakouch tau2 is a benchmark for telecom customer service tasks. why would it have any correlation with CursorBench?
CursorBench is strongly correlated with other coding benches in this comparison
Gemini Flash 3.5 is now on CursorBench, our main coding agent eval.
We’ll keep updating the leaderboard as new models come out.
https://t.co/67u5JEXoM9
Are you interested in working on cutting-edge, high-performance distributed systems? @dmazieres and I need a few more engineers to join our small team. We need all sorts of help -- C++/Rust, TypeScript, smart contracts, ops, cryptography, and more. DM me if this sounds exciting!
feed this am is truly hilarious. i'm reminded of two lessons my parents instilled in me as a child:
1) if you see a forbes 30u30 founder you run away
2) most posters on twitter know nothing about LLMs
Congrats to the @cotoolai team on their $7.4M seed!
Cotool is building the agent operating system for security teams.
Threat actors now scale with tokens. Campaigns that used to require a coordinated team can be run by a small group with the right model harness. Defense has been absorbing that hit with the same playbook and the same headcount. Cotool was built to make defense compound in the same way.
https://t.co/4DSqto0RJk
Excited to announce that @cotoolai has raised a $7.4M seed round led by @a16z to build the agent operating system for security teams.
Threat actors now scale with tokens. Campaigns that used to require a coordinated team can be run by a small group with the right model harness. Defense has been absorbing that hit with the same playbook and the same headcount. We built Cotool to make defense compound in the same way.
Grateful to the team at @a16z, @ycombinator, @WndrCoLLC, @homebrew, and our angels from Okta, Ramp, Cloudflare, and others who've lived this problem firsthand.
If you’re a security practitioner looking for more leverage in the AI age, come see how Cotool can help!