@jietang the ability to handle a swarm of delegated subagents, basically Kimi swarm, is so good yet so expensive. Making this efficient, scalable and standard will help a lot 🙏💙
I hope it's not benchmaxing. Will try to use it on my personal projects.
Output token quality should have its own eval system. There should be a way to systematically define the frontier taste of a model other than hill-climbing on domain-specific benches.
GLM 5.2 is now on DeepSWE as the top open-source model on our leaderboard.
With a pass@1 score of 44% at max effort, GLM 5.2 is the indisputable #1 open-source model, besting Kimi K2.7 Code by 17%.
Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers.
Models write sloppy code that works but isn’t maintainable. Our eval is the first to measure: would you actually merge this code?
🚨 Applications for MATS Autumn 2026 close tonight (June 7 AoE)!
Spend 10 weeks fully funded working with mentors from Anthropic, DeepMind, OpenAI, Redwood Research, SecureBio, and more.
New this cohort:
🧬 Biosecurity
🚀 Founding & Field-Building
Apply now: https://t.co/XWIdre04C8
I got the urge to convert all my utils, tooling projects, and routine stuff I built for self-satisfaction: take all that, add an agentic interface / module suite like MCPs, with profiles/personas as YAML if needed, and connect them to Hermes, opencode, and kilo cli. It feels so good 💙👌
@thdxr If the decision-maker always puts up hurdles and blocks 2 of the 3 when we are still at PoC level (a "never held a knife but managing a kitchen" kind of situation), should I switch companies?
@lqiao @FireworksAI_HQ so what about a +1 MTS fit who did Communications and Information Engineering (AI concentration), worked on cybersecurity gigs for 3 yrs before graduation, and is currently working as an AI R&D eng. at an ed-tech startup? Super passionate about Fireworks tbh