@Angaisb_ Hard to not become cynical re: some form of anti-Chinese bias causing these models to be excluded from most discussions/benchmarks. A lot of the recent fundamental breakthroughs came from those labs.
@biosemiote This simplification is accurate but loses some very important details; for ex. the interpreter would need to be way more constrained vs. how code interpreters are usually used in harnesses (can't call an LLM or expensive API in a for loop iterating over 1M values)
@a1zhang (and I’m even considering implementing some of those approaches just on the basis of the results in your paper, eg if it worked there maybe my “generate prompts dynamically” approach is also worth pursuing)
@a1zhang FWIW I just told some colleagues recently that an approach was “very much RLM-like”. And it probably isn’t at all an RLM even by your definition.
I see value there :) bringing ideas together and highlighting a fundamental thing that was mostly overlooked. That alone has value.
Introducing mxbai-rerank-v3-listwise: reranking that goes beyond binary relevance.
It reads the whole candidate set, resolves conflicts, and ranks by directives like recency, source priority, and multi-step rules.
+11% NDCG@10 on average across multiple domains, modalities, and languages in runs with Wholembed v3.
Available today in preview in Mixedbread.
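For readers unfamiliar with the metric quoted above, here is a minimal sketch of NDCG@10. It assumes graded relevance labels with linear gain and computes the ideal ordering from the same ranked list (i.e. it assumes every relevant candidate is in the list); the function names are illustrative, and the announcement does not say which NDCG variant was used.

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: each graded label is discounted by log2(rank + 1),
    # with ranks starting at 1 (so enumerate's 0-based index needs +2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the best possible ordering of the same labels.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Labels of the returned documents, in ranked order (hypothetical data).
ranked_labels = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(ranked_labels), 3))
```

A perfectly ordered list scores 1.0, and moving a highly relevant document down the list lowers the score.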
@antoine_chaffin Could you clarify this part (filtering), Antoine?
What is the "drop" here? Were these sampled as positive pairs, but the cross-encoder assigned a very low score to them so they are filtered out?
What does "expose filters as metadata" mean? How are they used in training?
We push Prefill/Decode disaggregation beyond a single cluster: cross-datacenter + heterogeneous hardware, unlocking the potential for significantly lower cost per token.
This was previously blocked by KV cache transfer overhead. The key enabler is our hybrid model (Kimi Linear), which reduces KV cache size and makes cross-DC PD practical.
Validated on a 20x scaled-up Kimi Linear model:
✅ 1.54× throughput
✅ 64% ↓ P90 TTFT
→ Directly translating into lower token cost.
More in Prefill-as-a-Service: https://t.co/If8fA3t9Og
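A back-of-the-envelope sketch of why KV cache size gates cross-datacenter transfer. It uses the generic per-token KV formula for standard attention layers, not Kimi Linear's actual architecture; every number below (layer counts, heads, context length, link speed) is a made-up assumption, and the constant-size state of linear-attention layers is ignored.

```python
def kv_cache_bytes(tokens, attn_layers, kv_heads, head_dim, bytes_per_elem=2):
    # Keys and values (factor 2) for every token in every full-attention layer.
    return 2 * tokens * attn_layers * kv_heads * head_dim * bytes_per_elem

def transfer_seconds(num_bytes, link_gbps):
    # Time to move the cache over a link of the given speed (gigabits per second).
    return num_bytes * 8 / (link_gbps * 1e9)

# Hypothetical model: 48 layers, all full attention.
full = kv_cache_bytes(tokens=128_000, attn_layers=48, kv_heads=8, head_dim=128)
# Hypothetical hybrid: only 12 of the 48 layers keep a per-token KV cache.
hybrid = kv_cache_bytes(tokens=128_000, attn_layers=12, kv_heads=8, head_dim=128)

for name, size in [("full", full), ("hybrid", hybrid)]:
    print(name, f"{size / 1e9:.1f} GB", f"{transfer_seconds(size, link_gbps=100):.2f} s")
```

Under these assumptions the hybrid cache is 4x smaller, so the prefill-to-decode handoff over the same link takes a quarter of the time.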
Introducing Gemini 3.1 Flash TTS 🗣️, our latest text-to-speech model with scene direction, speaker-level specificity, audio tags, more natural + expressive voices, and support for 70 different languages.
Available via our new audio playground in AI Studio and in the Gemini API!
@juminoz @anorth_chen It doesn’t invalidate the prefix cache because the trajectory fed at the next step is not recomputed every time (eg not reopening the 65 files it previously did), it’s just passing whatever it had last time
As promised, here's a recording of my 30-min keynote and the subsequent Q&A for the inaugural late interaction retrieval (LIR) workshop, cc @bclavie @antoine_chaffin.
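A toy illustration of the point about the prefix cache, assuming a cache that can reuse work only for the longest shared prefix between two requests. Strings stand in for token blocks, and the messages and file names are made up; this is a sketch of the idea, not how any particular serving stack works.

```python
def shared_prefix_len(a, b):
    # Count leading items that are identical in both sequences.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

step1 = ["system", "user: fix the bug", "tool: read a.py -> v1"]

# Append-only trajectory: the next request is last step's context plus new turns.
step2_append = step1 + ["assistant: edit a.py", "tool: ok"]

# Rebuilt trajectory: files are reopened, and a.py now reads differently.
step2_rebuilt = ["system", "user: fix the bug", "tool: read a.py -> v2",
                 "assistant: edit a.py", "tool: ok"]

print(shared_prefix_len(step1, step2_append))   # whole previous context is reusable
print(shared_prefix_len(step1, step2_rebuilt))  # reuse stops at the first changed block
```

Passing along whatever the agent had last time keeps the whole previous context as a prefix, while rebuilding it can change an early block and force everything after it to be recomputed.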
The talk is admittedly advanced, as it's directed at an expert IR community. But hopefully still broadly useful!
@fraenkelj @GarneloMarta @wojczarnecki This was a really really great episode!
It would be great to have this on YouTube so it has more reach, and it's easier to watch while on the go
First Principles is our series of honest conversations about AI. No script. No slides. No talking points. First up is a fascinating discussion between our Chief Science Officer @GarneloMarta & @wojczarnecki
https://t.co/CSEUkjKTXS
Every time you get a cancer biopsy, the lab makes a tissue slide that costs about $5. It shows the shape of your cells under a microscope, and every cancer patient already has one on file.
There’s a much fancier version of that test called multiplex immunofluorescence (basically a protein-level map showing which immune cells are near your tumor and what they’re doing). It costs thousands of dollars per sample, takes specialized equipment most hospitals don’t have, and barely scales. But it’s the kind of data oncologists need to figure out whether immunotherapy will actually work for you. Right now, only about 20 to 40% of cancer patients respond to immunotherapy, and one of the biggest reasons is that doctors can’t easily tell whether a tumor is “hot” (immune cells actively fighting it) or “cold” (immune system ignoring it).
Microsoft, Providence Health, and the University of Washington trained an AI to analyze the $5 slide and predict what the expensive test would show across 21 different protein markers. They called it GigaTIME, trained it on 40 million cells for which both the cheap slide and the expensive test were available, and then turned it loose on 14,256 real cancer patients across 51 hospitals in 7 US states.
The results landed in Cell, one of the most selective journals in biology. The model generated about 300,000 virtual protein maps covering 24 cancer types and 306 subtypes. It found 1,234 real, verified connections between immune cell behavior, genetic mutations, tumor staging, and patient survival that were previously invisible at this scale. When they tested it against a completely separate database of 10,200 cancer patients, the results matched up almost perfectly (0.88 out of 1.0 agreement).
Nature Methods named spatial proteomics (mapping where specific proteins sit inside your tissue) its Method of the Year in 2024, and specifically cited GigaTIME in a March 2026 update as a model that “democratizes” this kind of analysis. The full model is open-source on Hugging Face. Any cancer research lab with archived biopsy slides, and most of them have thousands, can now run virtual immune profiling without buying a single piece of new equipment.
Training each frontier AI model seems to use a little less water than a square mile of farmland uses in a year. I think about this as the country having 4 square miles of farmland sectioned off to grow some of the most popular consumer products in history.
Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages.
Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos...
You can now get the best retrieval performance on your data, no matter its format.
@MossyPathways @NoamShazeer one possible reason -> prefill-heavy tasks like summarizing a huge amount of text
the small model is going to be much faster at that, even w/ thinking on