We found this little dog on April 12 at Xochicalco and Matías Romero, Col. Vértiz Narvarte. She was hit by a car and is in good hands, but we need to find her family.
She is very sweet and obedient, and she is clearly a house pet.
Laura: 55 2699 9193 / 55 7436 1889
@En_laDelValle
Imagine being a foreigner and coming across three masked wrestlers fighting in the middle of the street. And just when you think you've seen it all, a woman launches herself at them as if she were Rey Mysterio.
Mexican surrealism will never cease to amaze me. 😭🇲🇽
“AI agents will outperform humans at almost all jobs by 2026–2027.” - The forecast is everywhere.
So we built the exam to test that claim, on real labor-market aligned work. On the hardest tier, top agents pass 2.6%.
Meet Agents' Last Exam (ALE), a rolling benchmark measuring whether agents can actually do real jobs. 🧵👇
We're proud Snorkel AI is part of Agents' Last Exam, with our researchers @amanda_dsouza and @vincentsunnchen among the co-authors and support from our Open Benchmarks Grants initiative.
The forecast: agents will do almost every job by 2027. The result on real, code-graded work? Top agents pass just 2.6% on the hardest tier.
Excited to keep pushing this forward with @YiyouSun, @dawnsongtweets and the @BerkeleyRDI team. 👇
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.
At 19 years old Mirra Andreeva is a major champion and it’s all thanks to Mirra. Mirra who put in the work, Mirra who battled for every point, Mirra who believed in Mirra.
Many thanks, Mirra.
New Benchtalks with @jyangballin: on ProgramBench (frontier models scored 0% at launch) and the lineage and future of coding benchmarks, from SWE-bench/InterCode to now
01:29 ProgramBench launch and reception
03:41 Why artifact-level evaluation, not code-level
06:03 Why models love Python
08:29 ProgramBench as a research tool
12:45 From SWE-bench & InterCode to ProgramBench
17:47 How to grade a coding model
21:53 The position paper & humans in the loop
25:01 Managing quality with agents-in-the-loop
28:40 Internet access and benchmark integrity
35:26 Where models may surpass human abilities
38:56 When a model hits 80% on ProgramBench
43:55 Benchmarks worth paying attention to
46:24 What benchmark do you wish existed
49:32 Will benchmarks still look like benchmarks in 5 years
52:02 How to contribute to ProgramBench
History makers! 🇲🇽
#OnThisDay in 1986, Mexico became the first country to host the @fifaworldcup twice.
This year, Mexico will become the first country to have ever hosted or co-hosted the tournament three times! 🙌
@Claudiashein: Let's save Mahahual. Let's stop Royal Caribbean's destructive project. Sign the petition! https://t.co/qafVZCT2u3 via @Change_Mex
Increased quality & rigor in Terminal-Bench 2.1:
- improved instructions, task specifications, and test alignment
- calibrated resource/timeout limits and dependencies
- reward-hacking prevention
Kudos to @ekellbuch, @terminalbench team, and the broader community for raising the bar here - @SnorkelAI is proud to support!
@AccionesBJ @BJAlcaldia residents of Santa Cruz Atoyac have been setting off firecrackers since 8 am. It is disrespectful to everyone who lives nearby.
It's 2026 and firecrackers should be banned: they pollute, and they disturb people and animals.
WHAT A GOAL BY CRISTIANO RONALDO. THIS GOAL COULD WIN THE SAUDI LEAGUE!!!
HIS 970TH GOAL!!!!
He called for it right there, leaped as if he were 20 years old, and look at that turn of the head as he puts it in the top corner.
WE WILL NEVER SEE ANYONE LIKE HIM!!!! 🐐🐐🐐🐐 @Cristiano