Ahmed @halflings - Twitter Profile

23 days ago

@Angaisb_ Hard to not become cynical re: some form of anti-Chinese bias causing these models to be excluded from most discussions/benchmarks. A lot of the recent fundamental breakthroughs came from those labs.

Ahmed @halflings

26 days ago

@biosemiote This simplification is accurate but loses some very important details; e.g. the interpreter would need to be way more constrained than the code interpreters usually found in harnesses (you can't call an LLM or an expensive API in a for loop iterating over 1M values)

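One way to picture the constraint described in the tweet above is a per-run budget on expensive calls, so a loop over 1M values fails fast instead of issuing a million model calls. This is a minimal sketch under stated assumptions: `CallBudget`, `BudgetExceeded`, `fake_llm_call` and the limit of 100 are all hypothetical names and values for illustration, not any particular harness's API, and a call cap is only one of several possible restrictions.

```python
class BudgetExceeded(RuntimeError):
    """Raised when sandboxed code exceeds its allowed number of expensive calls."""


class CallBudget:
    """Hypothetical guard capping how many expensive calls one interpreter run may make."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.used = 0

    def wrap(self, fn):
        """Return a version of `fn` that counts calls and refuses to go past the cap."""
        def guarded(*args, **kwargs):
            if self.used >= self.max_calls:
                raise BudgetExceeded(f"call budget of {self.max_calls} exhausted")
            self.used += 1
            return fn(*args, **kwargs)
        return guarded


def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real (slow, costly) model call; illustrative only.
    return prompt.upper()


budget = CallBudget(max_calls=100)
llm_call = budget.wrap(fake_llm_call)

try:
    # The pattern the tweet warns about: one model call per value over 1M values.
    results = [llm_call(str(i)) for i in range(1_000_000)]
except BudgetExceeded as err:
    print(f"stopped after {budget.used} calls: {err}")
```

The guard stops the comprehension after the 100th call, so the run costs at most 100 model calls no matter how large the loop is.
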
Ahmed @halflings

28 days ago

@a1zhang (and I’m even considering implementing some of those approaches just on the basis of the results in your paper, e.g. if it worked there, maybe my “generate prompts dynamically” approach is also worth pursuing)

Ahmed @halflings

28 days ago

@a1zhang FWIW I just told some colleagues recently that an approach was “very much RLM-like”. And it probably isn’t at all an RLM even by your definition. I see value there :) bringing ideas together and highlighting a fundamental thing that was mostly overlooked. That alone has value.

halflings retweeted

Lucas Beyer (bl16)

@giffmana

about 1 month ago

Can't believe we're getting this before GTA 6

halflings retweeted

Mixedbread @mixedbreadai

about 2 months ago

Introducing mxbai-rerank-v3-listwise: reranking that goes beyond binary relevance. It reads the whole candidate set, resolves conflicts, and ranks by directives like recency, source priority, and multi-step rules. +11% NDCG@10 on average across multiple domains, modalities, and languages in runs with Wholembed v3. Available today in preview in Mixedbread.

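For reference, NDCG@10 (normalized discounted cumulative gain over the top 10 results), the metric quoted in the announcement above, rewards placing highly relevant documents near the top of a ranking. This is a minimal sketch, assuming graded relevance labels and the linear-gain variant (some benchmarks use the exponential gain 2^rel - 1 instead); it normalizes against the ideal ordering of the judged results only, and it is not Mixedbread's evaluation code. The example relevance list is made up for illustration.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain, log2 discount)."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the system ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance of documents in the order a reranker returned them (illustrative).
ranking = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(ranking, k=10), 4))
```

A perfectly ordered list scores 1.0; any swap that pushes a more relevant document below a less relevant one lowers the score.
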
Ahmed @halflings

2 months ago

@antoine_chaffin Could you clarify this part (filtering), Antoine? What is the "drop" here? Were these sampled as positive pairs, but the cross-encoder assigned a very low score to them so they are filtered out? What does "expose filters as metadata" mean? How are they used in training?

Ahmed @halflings

2 months ago

@Kimi_Moonshot Amazing work from the Kimi team as usual 👏👏

halflings retweeted

Kimi.ai @Kimi_Moonshot

2 months ago

We push Prefill/Decode disaggregation beyond a single cluster: cross-datacenter + heterogeneous hardware, unlocking the potential for significantly lower cost per token.

This was previously blocked by KV cache transfer overhead. The key enabler is our hybrid model (Kimi Linear), which reduces KV cache size and makes cross-DC PD practical.

Validated on a 20x scaled-up Kimi Linear model:
✅ 1.54× throughput
✅ 64% ↓ P90 TTFT
→ Directly translating into lower token cost.

More in Prefill-as-a-Service: https://t.co/If8fA3t9Og

halflings retweeted

Logan Kilpatrick

@OfficialLoganK

2 months ago

Introducing Gemini 3.1 Flash TTS 🗣️, our latest text to speech model with scene direction, speaker level specificity, audio tags, more natural + expressive voices, and support for 70 different languages. Available via our new audio playground in AI Studio and in the Gemini API!

Ahmed @halflings

2 months ago

@juminoz @anorth_chen It doesn’t invalidate the prefix cache because the trajectory fed at the next step is not recomputed every time (e.g. it doesn’t reopen the 65 files it previously opened); it’s just passing along whatever it had last time

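The prefix-cache argument above can be illustrated with a toy model: a prefix cache can only reuse work for the leading run of tokens that matches what it has already processed. Appending new turns to an unchanged trajectory keeps that whole run intact, while regenerating earlier content (for instance, re-reading a file whose contents have since changed) breaks the match at the first differing token. This is a minimal sketch, assuming tokens are plain strings and a single cached sequence; `reusable_prefix_len` and the example trajectories are hypothetical, and real serving engines cache fixed-size KV blocks and are considerably more involved.

```python
from typing import Sequence


def reusable_prefix_len(cached: Sequence[str], new: Sequence[str]) -> int:
    """Number of leading tokens of `new` whose KV entries could be reused from `cached`."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break  # first mismatch: everything from here on must be prefilled again
        n += 1
    return n


# Step 1 of an agent trajectory (tokens simplified to strings).
step1 = ["<sys>", "task", "open:a.py", "contents:v1", "<assistant>", "plan"]

# Step 2, appended: the earlier trajectory is passed through unchanged.
step2_appended = step1 + ["<tool>", "result", "<assistant>", "edit"]

# Step 2, recomputed: the file is re-read and its contents differ.
step2_recomputed = ["<sys>", "task", "open:a.py", "contents:v2", "<assistant>", "plan",
                    "<tool>", "result", "<assistant>", "edit"]

for name, seq in [("appended", step2_appended), ("recomputed", step2_recomputed)]:
    hit = reusable_prefix_len(step1, seq)
    print(f"{name}: reuse {hit}/{len(seq)} tokens, prefill {len(seq) - hit}")
```

In the appended case all 6 tokens from step 1 are reusable and only the 4 new ones need prefill; in the recomputed case reuse stops at the changed file contents, so 7 of 10 tokens are prefilled again.
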
Ahmed @halflings

2 months ago

@juminoz @anorth_chen How does one agent invalidate the other agent’s prefix cache exactly?

halflings retweeted

Omar Khattab

@lateinteraction

3 months ago

As promised, here's a recording of my 30-min keynote and the subsequent Q&A for the inaugural late interaction retrieval (LIR) workshop, cc @bclavie @antoine_chaffin. The talk is admittedly advanced, as it's directed at an expert IR community. But hopefully still broadly useful!

Ahmed @halflings

3 months ago

@jescalan Give Gemini a go; even the 3.1 Flash Lite endpoint is really good with tool use! I'm using it on a couple of personal projects now.

Ahmed @halflings

3 months ago

@fraenkelj @GarneloMarta @wojczarnecki This was a really, really great episode! It would be great to have this on YouTube so it has more reach, and it's easier to watch while on the go.

halflings retweeted

Jeremy Fraenkel

@fraenkelj

3 months ago

First Principles is our series of honest conversations about AI. No script. No slides. No talking points. First up is a fascinating discussion between our Chief Science Officer @GarneloMarta & @wojczarnecki https://t.co/CSEUkjKTXS

halflings retweeted

Anish Moonka

@anishmoonka

3 months ago

Every time you get a cancer biopsy, the lab makes a tissue slide that costs about $5. It shows the shape of your cells under a microscope, and every cancer patient already has one on file.

There’s a much fancier version of that test called multiplex immunofluorescence (basically a protein-level map showing which immune cells are near your tumor and what they’re doing). It costs thousands of dollars per sample, takes specialized equipment most hospitals don’t have, and barely scales. But it’s the kind of data oncologists need to figure out whether immunotherapy will actually work for you.

Right now, only about 20 to 40% of cancer patients respond to immunotherapy, and one of the biggest reasons is that doctors can’t easily tell whether a tumor is “hot” (immune cells actively fighting it) or “cold” (immune system ignoring it).

Microsoft, Providence Health, and the University of Washington trained an AI to analyze the $5 slide and predict what the expensive test would show across 21 different protein markers. They called it GigaTIME, trained it on 40 million cells in which both the cheap slide and the expensive test coexisted, and then turned it loose on 14,256 real cancer patients across 51 hospitals in 7 US states.

The results landed in Cell, one of the most selective journals in biology. The model generated about 300,000 virtual protein maps covering 24 cancer types and 306 subtypes. It found 1,234 real, verified connections between immune cell behavior, genetic mutations, tumor staging, and patient survival that were previously invisible at this scale. When they tested it against a completely separate database of 10,200 cancer patients, the results matched up almost perfectly (0.88 out of 1.0 agreement).

Nature Methods named spatial proteomics (mapping where specific proteins sit inside your tissue) its Method of the Year in 2024, and specifically cited GigaTIME in a March 2026 update as a model that “democratizes” this kind of analysis.

The full model is open-source on Hugging Face. Any cancer research lab with archived biopsy slides, and most of them have thousands, can now run virtual immune profiling without buying a single piece of new equipment.

halflings retweeted

Andy Masley

@AndyMasley

3 months ago

Each frontier AI model seems to use a little under a year's worth of a square mile of farmland's water to train. I think about this as the country having 4 square miles of farmland sectioned off to grow some of the most popular consumer products in history.

halflings retweeted

Mixedbread @mixedbreadai

4 months ago

Introducing Mixedbread Wholembed v3, our new SOTA retrieval model across all modalities and 100+ languages. Wholembed v3 brings best-in-class search to text, audio, images, PDFs, videos... You can now get the best retrieval performance on your data, no matter its format.

Ahmed @halflings

4 months ago

@MossyPathways @NoamShazeer One possible reason: on prefill-heavy tasks like summarizing a huge amount of text, the small model is going to be much faster, even with thinking on.
