I try to follow people of good faith who hold opinions different from mine.
You can't tell an ignorant person by the work he does, but by how he does it (C. Pavese)
@CarloCalenda The show was good.
But now common ground has to be found on the issues, or many 🇮🇹 people will be condemned to going unrepresented in parliament.
@CarloCalenda@micheleboldrin
Everyone is writing about agent loops right now. Including us at Cursor, because they're so powerful. But here's a prediction: a year from now, nobody will be talking about them.
Not because they weren't useful. Because they'll work right out of the box. Batteries included. No instructions necessary.
Feels a lot like prompt engineering two years ago. It was incredibly important. People wrote courses on it. Now you just talk to your agent like a normal person.
That's the strange thing about AI right now. You learn something critical, get huge gains, and before long it's the new normal and something else is the bottleneck. So the alpha isn't what you know. It's how fast you learn it, and how easily you can let it go.
@Principe_dUcria@BarryLyndon81 Marco, you're a poet.
In your own small way, how do you go about it?
I'm not the one who needs convincing, being your "kindred spirit" inside the bubble; what's needed is a way to talk to others.
I spent the weekend with dozens of VP+ leaders on how to ai pill their team, and here were the things that hit them hardest:
- yes, you are behind
- most are willing to add tools and spend tokens. few are willing to kill process & change jobs
- a leader’s lack of ai hard skills is a drag on their org
- teams that swallow the pill of investing in internal devx get ahead faster
- you think you can’t afford to dedicate time to AI, but teams are getting massive gains w just 1-2 people
- you’re underspending tokens
- if you think tokens are expensive, tally up the cost of your “alignment” meetings
- design is getting left behind
- your engineers don’t know these new tools as well as you think
- competition is not about product features but about how whole teams operate
- you were unprepared for the last 2 years. Don’t be unprepared for the next
We all talk about AI and about how the 🇪🇺 is more a consumer than a creator of technology.
OpenAI, Google, DeepSeek, Anthropic, Zhipu: none of these companies is European.
But developing models is only one part of the AI "supply chain". To do anything at all we have to...
load them into memory!
Whether it's training, chat or coding, models have to be loaded into fast memory, in fact extremely fast memory.
Now a quiz:
1. On which continent was one of the world's largest RAM producers located?
2. Which country produces 2/3 of the world's RAM?
3. And 🇨🇳?
We have long since left behind the stable climate of the Holocene (the last 10,000 years). I have been researching paleoclimate since the 1990s.
I can only urgently recommend slamming on the emergency brake on fossil energy use now!
For our children. Please.
For five days now we have been recording between 42 and 45°C in France. That is the largest anomaly in the history of French meteorology, all months combined.
And the first reports from the field are unfortunately very worrying, particularly in the west of the country: heavy mortality on some livestock farms, a surge in care-center admissions for birds at the height of the hatching season, vegetation failure in non-irrigated crops in central-western France, leaf and flower scorch on many vegetable and vine crops, a near-complete halt in grassland growth, a rise in fires outside the usual areas, and the first signs of early defoliation on trees (even though it is only June...).
I know, it's not cheerful, but it's factual. And we are only at the start of the tally. The agronomic consequences of this exceptional episode will probably keep emerging in the coming days and weeks. Especially since a severe drought is setting in across our soils.
Multi-agent collaborations are among the most interesting agent behaviors right now!
We did an experiment the other day with 100+ agents (an open collaboration for a week) collaborating to improve the inference speed of Gemma 4 in vLLM. Got a 5x final improvement in speed, but what really struck me was the interactions we observed on the message board.
Integrity & self-policing:
- Social-engineering attempt: A human (FusionCow) asked agents to move to Telegram. An agent replied with an unprompted long post on "communication norms" refusing that, calling private side-channels "indistinguishable from collusion."
- Verification loophole flagged: an agent found a relaxed verification loophole pushing TPS with clean PPL (PPL is teacher-forced, blind to decode divergence) and flagged it for a ruling by the community. The community pinged the human organizer, who ruled it invalid.
- Self-notice of overfitting risk: Some later improvements rested on pruning lm_head to a keep-set built from public PPL truth + public decode tokens. An agent noted this would lead to private-subset degradation and another built a keep-set explicitly covering eval prompts.
Emergent collaborations:
- Communal knowledge base: agents maintained shared lever-maps, playbooks, and triage tools so newcomers wouldn't repeat dead ends (stack-notes, playbook, int4-ceiling notes, MTP map, significance tool, policy simulator).
- Four-agent relay: an agent built an int4-lm_head checkpoint but had no quota to run it; another agent tried to run it but failed at load; yet another agent diagnosed the config bug (tie_word_embeddings + ignore-list ordering); and a fourth agent was able to re-run it and get to 118 TPS, 2.68×. Build/run/diagnose/ship ended up being split across four independent agents.
- GPU-rich/GPU-poor division of labor: an agent was regularly compute-starved and switched to writing specs, byte-math, and acceptance analysis for other GPU-rich agents to execute. Some agents offered external Modal compute to another agent blocked on DFlash training.
- Cross-agent kernel debugging: an agent debugged another agent's run of yet another agent's fused drafter: found a Triton store/load aliasing race in _k_qnorm_rope, a second shape bug, then rewrote attention with flash-decoding split-KV. Fixes posted "take freely."
- Quota-pooling norm: Agents would often stage a candidate publicly for whoever had quota to run it. Agents would then usually credit the originator. This behavior emerged because of the 10-job/24h cap (e.g. pupa's package run by resystagent and fabulous-frenzy).
Discoveries & reversals:
- Agents would make many discoveries and then reverse them, giving them names like the following:
- The 127 TPS "wall" was an artifact: a mathematical proof of the maximum possible speed became known in the community as the "int4-Marlin floor", but a later agent called the proof circular (it only varied the bandwidth term, never the overhead). Finally, another agent broke through to 247 TPS via MTP speculative decoding on a vLLM nightly.
- "Smarter draft loses." An agent showed that a 2B drafter's ~1 GB/token read dominates even at perfect acceptance and a much smaller 256-hidden drafter wins at batch-1 because its weights are nearly free to read. Agents discussed how per-accepted-token cost ≈ draft bytes read / acceptance.
- "DFlash near-random acceptance": an agent remotely diagnosed another agent's 2–5% acceptance rate as near-random, ruling out undertraining/vocab caps and pointing to a train/serve hidden-state mismatch (bf16 E4B extraction vs int4 serving).
- Much of the race was noise: one agent decided to run the #1 submission 4 times and found a σ≈1.16 TPS variation for a single run. Another agent confirmed across 358 runs / 66 buckets: frontier deltas <~4 TPS are ties. Community adopted a significance norm.
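A minimal sketch of what such a significance norm could look like in code, assuming a plain fixed threshold. The ~4 TPS cutoff is taken from the bullet above; the function names `run_noise` and `is_tie` and all the sample numbers are hypothetical, and this is not the agents' actual significance tool.

```python
from statistics import mean, stdev

# From the post: frontier deltas under ~4 TPS were treated as ties.
TIE_THRESHOLD_TPS = 4.0


def run_noise(tps_runs):
    """Sample standard deviation across repeated runs of one submission."""
    return stdev(tps_runs)


def is_tie(tps_a, tps_b, threshold=TIE_THRESHOLD_TPS):
    """True when two TPS results are closer together than the tie threshold."""
    return abs(tps_a - tps_b) < threshold


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from the challenge.
    repeats = [246.0, 247.5, 248.5, 246.0]
    print(f"mean={mean(repeats):.2f} TPS, sigma={run_noise(repeats):.2f} TPS")
    print("247.0 vs 249.5 tie?", is_tie(247.0, 249.5))
    print("247.0 vs 118.0 tie?", is_tie(247.0, 118.0))
```

Re-running one submission several times and comparing its spread to the gap between leaderboard entries is the whole idea: a gap smaller than the threshold is read as noise rather than as a real speedup.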
So many interesting interactions on the message board: https://t.co/SxfA6LuqVk
You can also explore the lineage of inventions from the agents at: https://t.co/CyV45rjI9A
And the challenge itself at https://t.co/Ct1gtmB508
And the organization behind the challenge at https://t.co/ujRlGcNSJM