anyone interested in this Cup Odds desktop app, come check it out in our Kro Workshop! The download is token-free, and you can keep it running in the background during your favorite matchups!
May highlights for KroWork: launched a product, got mass reported within 12 hours, banned for a month
10/10 launch experience
while waiting on 9 appeals and a lot of silence, we shipped 4 major updates anyway: one-click app sharing via link or zip; multimodal input/output; built-in AI for your apps (so it became much, much more powerful); and last but not least, plan mode & version rollback.
we were building the whole time, just had to do it quietly tho 🤫
Check out the demo video below: how to export/import a Kro App, and how we built FIFA odds software using BrowserUse and the built-in AI function to make predictions ⚽️
THREE AGENTS. ONE LOOP. A FULL APP. Plan. Build. Judge. Cycle until it works. This isn't a research paper. It's how Anthropic's own engineers ship software. 40 minutes, start to finish. The age of single-prompt coding is over.
Anthropic engineers just showed how they build a full app from scratch, using a loop of agents
40 minutes from the team behind Claude Code
they used three agents: one to plan, one to build, one to judge, cycling until the app actually works
the winners won't have the smartest model, they'll have the best loop
watch it, then read the full guide on how to actually use loops below
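A minimal sketch of the plan / build / judge loop described above. This is not Anthropic's code and not the guide the post points to. The names `run_loop`, `Verdict`, and the three toy agents are hypothetical stand-ins; in practice each one would wrap a model call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Verdict:
    passed: bool
    feedback: str


def run_loop(
    task: str,
    plan: Callable[[str, str], str],
    build: Callable[[str], str],
    judge: Callable[[str, str], Verdict],
    max_cycles: int = 5,
) -> Tuple[str, int]:
    """Cycle plan -> build -> judge until the judge passes or cycles run out."""
    feedback = ""
    for cycle in range(1, max_cycles + 1):
        spec = plan(task, feedback)      # planner sees the judge's last feedback
        artifact = build(spec)           # builder turns the spec into an artifact
        verdict = judge(task, artifact)  # judge accepts or sends feedback back
        if verdict.passed:
            return artifact, cycle
        feedback = verdict.feedback
    raise RuntimeError(f"no passing build after {max_cycles} cycles")


# Toy agents (assumptions for illustration only).
def toy_plan(task: str, feedback: str) -> str:
    return f"{task}; fix: {feedback}" if feedback else task


def toy_build(spec: str) -> str:
    return f"app[{spec}]"


def toy_judge(task: str, artifact: str) -> Verdict:
    # Passes only once the planner has folded feedback into the spec.
    if "fix:" in artifact:
        return Verdict(True, "")
    return Verdict(False, "add error handling")


artifact, cycles = run_loop("todo app", toy_plan, toy_build, toy_judge)
print(artifact, cycles)
```

The `max_cycles` cap is a design choice in this sketch: a loop that only stops when the judge is satisfied can run forever if the judge never is.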
@steipete @botblastcap The retention difference hits differently depending on your use case. For enterprise with SOC2 requirements in agent workflows, it becomes a hard blocker in vendor selection.
@thsottiaux @danshipper When one app eats more screen time than everything else combined, it signals a real shift: we're moving from 'using AI' to 'living inside AI'. The inbox-zero example is just the beginning.
use glm 5.2 for free with 3 million tokens per day in zcode 😳
zhipu just dropped their official coding ide with their new frontier model built in
glm 5.2 is open weights (MIT license), 744B MoE, and competes with claude opus 4.8 and gpt-5.5 on coding benchmarks
it scored 62% on SWE-Bench and has a 1M context window
what you get for $0:
- 3 million free tokens every single day (resets daily)
- glm 5.2 (frontier open-weights model)
- 1M context window for full codebase dumps
- full ide experience, similar to cursor or codex
- works on mac and windows
this is not a trial or a limited promo. zhipu built zcode as their official coding environment and glm 5.2 is the default model.
how to set up (2 min):
step 1: go to https://t.co/eA0uS5gG3v
> download the app for your platform (mac or windows)
> install and open it
step 2: sign up
> use your email address to create an account
> no credit card, no phone verify
step 3: pick glm 5.2
> the ide opens with a model selector
> choose glm 5.2 from the list
> it's the default frontier model
step 4: start coding
> 3 million free tokens are already in your account
> they reset daily, not one-time
> use it for coding, agents, code review, refactoring
glm 5.2 is the strongest open-weights model available right now
and zhipu gives you 3M tokens a day to use it for free
bookmark this before the free tier changes
We ran the same design task on both models this morning. Had to check three times which output was GLM and which was Opus. Couldn't tell. Then checked the bill. One cost pennies.
This model is insane at design.
I asked GLM 5.2 (left) and Opus 4.8 (right) to build me a landing page and you can't even tell the difference.
GLM cost $0.06 while Opus cost $0.49. Roughly 8x cheaper while being faster + more token efficient.
Another win for open source AI.
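A quick check of the arithmetic behind the two quoted costs. The figures are the single-run numbers from the post, rounded to the cent, so treat the ratio as approximate.

```python
glm_cost = 0.06   # USD, as quoted in the post
opus_cost = 0.49  # USD, as quoted in the post

ratio = opus_cost / glm_cost        # how many times more Opus cost
savings = 1 - glm_cost / opus_cost  # fraction saved by using GLM

print(f"{ratio:.1f}x cheaper, {savings:.0%} saved")
```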
@eng_khairallah1 Good beginner tutorial.
The only distinction I’d make: this teaches workflow automation, not full agency.
A real agent is not just Claude doing 5 steps. It needs memory, feedback, planning, and consequence prediction. That is where most current “agents” still break.
Open source matters. A lot.
But benchmark-maxxing is not intelligence.
A leaderboard is a measurement instrument, and a broken instrument can still print beautiful numbers.
Every SOTA claim needs an asterisk until the eval is independent, reproducible, and contamination-aware.
Open-source LLM benchmarks are becoming a scoreboard sport.
One model is #1 at reasoning.
Another is #1 at coding.
A third is #1 at “real-world tasks.”
They cannot all be the frontier.
The real test is boring:
Same harness.
Same prompts.
Fresh tasks.
Independent runner.
No financial relationship with the model provider.
Then compare the outputs with frontier systems in actual workflows.
A lot of “SOTA” starts to look smaller.
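A minimal sketch of the "boring test" above: every model goes through one harness, with the same prompts and the same scorer. The `evaluate` function, the toy models, and exact-match scoring are all assumptions for illustration; a real eval would also need fresh tasks and an independent runner, which code alone cannot guarantee.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]  # (prompt, expected answer)


def evaluate(
    models: Dict[str, Callable[[str], str]],
    tasks: List[Task],
) -> Dict[str, float]:
    """Score every model on the same tasks with the same exact-match scorer."""
    scores = {}
    for name, model in models.items():
        correct = sum(
            1 for prompt, expected in tasks if model(prompt).strip() == expected
        )
        scores[name] = correct / len(tasks)
    return scores


# Toy tasks and toy "models" (lookup tables standing in for real model calls).
tasks = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]

answers_a = {"2+2": "4", "3*3": "9"}
answers_b = {"2+2": "4", "capital of France": "Paris", "3*3": "9"}

models = {
    "model_a": lambda prompt: answers_a.get(prompt, "?"),
    "model_b": lambda prompt: answers_b.get(prompt, "?"),
}

scores = evaluate(models, tasks)
print(scores)
```

Because the task list and scorer are shared, any score gap comes from the models and not from differences in prompting or grading.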