Michael Bendersky

@bemikelive

@ Databricks / ex-Google DeepMind / ex-Google Research. Interested in research at the intersection of IR & AI.

Joined May 2010

395 Following

1.8K Followers

523 Posts

Pinned Tweet

Michael Bendersky @bemikelive

5 months ago

Really excited to share our latest research on the Instructed Retriever - a novel retrieval architecture that reimagines search for the agentic era. https://t.co/CwUWYIhFTJ Amazing work by @cindyxinyiwang and @mrdrozdov who co-led this effort!

1

21

7

11

3K

bemikelive retweeted

Databricks AI Research

1 day ago

Most agentic search systems get better by thinking longer: more tool calls, more reason-act loops, each step waiting on the last. Quality goes up, but so does latency.

Instructed-Retriever-1 takes a different route. Instead of scaling test-time compute sequentially, it scales it in parallel. One retrieval-specialized model fans the work out: it generates multiple query and filter formulations to widen recall, then reranks the merged evidence with a multi-pivot reranker to sharpen precision. Both stages run at once, so searching more broadly no longer means searching more slowly.

The result inside Knowledge Assistant: search time drops more than 3x and answer time 2x, with time to first token around two seconds, and no drop in quality (it matches Claude Sonnet 4.5 retrieval quality on KARLBench). For the people using it, that means far less waiting between question and answer, the freedom to ask more follow-ups, and more of the knowledge base actually surfaced. Rolling out to all customers now, with no reconfiguration.

Read how we did it: https://t.co/yjqWN1KgTc

4

52

9

35

7K

Michael Bendersky @bemikelive

about 1 month ago

Really exciting new collaboration with @NegarEmpr @mrdrozdov and @matei_zaharia on the query utility gap between ranking and generation (to appear at #SIGIR2026). Check it out!

Negar Arabzadeh

about 1 month ago

1/ "Can QPP Choose the Right Query Variant?" has been accepted at #SIGIR2026!🇦🇺
You can easily over-generate multiple query variants at low cost, but running RAG for all of them is expensive!
Can we pick the winner query before paying the generation cost?
https://t.co/mv5wgCgKul https://t.co/UmRyuETSJw

2

35

9

16

9K

0

6

2

1

916

bemikelive retweeted

Databricks AI Research

about 2 months ago

Most enterprise questions don't live in one dataset. They span structured systems and unstructured sources like documents, reviews, and reports.

In our latest research, we show how Agent Bricks Supervisor Agent handles this by decomposing queries across structured and unstructured tools, then synthesizing results over multiple reasoning steps.

The results across STaRK and KARLBench: 20%+ improvement over SoTA baselines, with the biggest gains on tasks requiring tight integration of structured and unstructured data.

All built declaratively: no custom code, just precise instructions and the right tools. https://t.co/EBSM6iU89g

5

49

15

19

10K

bemikelive retweeted

Databricks AI Research

2 months ago

Applications are officially open for the Grounded Reasoning Cup at Data + AI Summit 2026! 🏆

We're looking for students who want to:
- Tackle high-impact enterprise challenges
- Showcase work to top researchers/engineers (with recruiters in the room)
- Compete for $100k in model credit prizes

Apply here: https://t.co/bjinqjwuWr
Competition overview: https://t.co/5lUJHB9V8u

2

64

14

52

31K

Michael Bendersky @bemikelive

2 months ago

@beirmug Such an impressive body of work, congratulations @beirmug!

1

2

0

0

86

Michael Bendersky @bemikelive

3 months ago

Congratulations to @kristahopsalong @arnav_thebigman @jazco @ivanzhouyq Erich Elsen @matei_zaharia and everyone at @DbrxMosaicAI who made this work possible! Special thanks to our partners @USAFacts @superannotate @turingcom and to all GitHub contributors!

0

7

0

0

223

Michael Bendersky @bemikelive

3 months ago

We just published OfficeQA Pro - a set of 133 challenging questions from the original OfficeQA benchmark. Even the best frontier agents still struggle on OfficeQA Pro with common issues stemming from errors in parsing, retrieval, and visual reasoning. https://t.co/0Ok3bQv5oW

1

25

8

6

2K

Michael Bendersky @bemikelive

3 months ago

All of these are realistic problems that @databricks customers face in their daily work, and we hope that OfficeQA Pro will contribute to advancing SoTA on grounded reasoning tasks. Technical Report: https://t.co/Eqezt8709W GitHub: https://t.co/N9zFJPDC6t

1

6

1

0

253

Michael Bendersky @bemikelive

3 months ago

This was an incredibly fun collaboration with @j_nadan_chang @mrdrozdov @ShubhamToshniw6 @owenoertell @alexrtrott @WenSun1 @jefrankle and many others here at Databricks AI Research.

0

5

0

0

209

Michael Bendersky @bemikelive

3 months ago

I thought about posting a thread on KARL, a new Pareto-optimal model for retrieval and grounded reasoning tasks. But @jefrankle did a much better job than I ever could. If you have any interest in information retrieval and/or RL, check it out! Full report: https://t.co/bKvxsA3lk7

Jonathan Frankle

3 months ago

Meet KARL, an RL'd model for document-centric tasks at frontier quality and open source cost/speed. Great for @databricks customers and scientists (77-page tech report!) As usual, this isn't just one model - it's an RL assembly line to churn out models for us and our customers 🧵

9

244

46

182

72K

1

26

3

6

2K

bemikelive retweeted

Matei Zaharia @matei_zaharia

4 months ago

Agent memory is a simple and powerful way to do continual learning! With the new MemAlign method from Databricks Research, we can build better LLM judges from examples of human ratings, and they scale with more data. Now in Databricks and @MLflow. https://t.co/aMbc8IZ9zb

10

235

38

182

19K

Michael Bendersky @bemikelive

5 months ago

Instructed retriever is now available for all of our Agent Bricks Knowledge Assistant customers. Consider trying it out for your next retrieval agent project. https://t.co/ksHTvYRCJV

0

1

0

0

153

Michael Bendersky @bemikelive

5 months ago

Really excited to share our latest research on the Instructed Retriever - a novel retrieval architecture that reimagines search for the agentic era. https://t.co/CwUWYIhFTJ Amazing work by @cindyxinyiwang and @mrdrozdov who co-led this effort!

1

21

7

11

3K

Michael Bendersky @bemikelive

5 months ago

Instructed retriever is not just better than RAG, but it is also a much more effective tool in a multi-step agentic setting, where it not only delivers better results, but also does it faster and in fewer steps. https://t.co/bkqWxnU9kI

1

1

0

0

189

Michael Bendersky @bemikelive

6 months ago

@mrdrozdov @jeffreyhuber "Some people, when confronted with a problem, think 'I know, I’ll use 𝚛̶𝚎̶𝚐̶𝚞̶𝚕̶𝚊̶𝚛̶ ̶𝚎̶𝚡̶𝚙̶𝚛̶𝚎̶𝚜̶𝚜̶𝚒̶𝚘̶𝚗̶ search.' Now they have two problems."

0

2

0

0

36

Michael Bendersky @bemikelive

6 months ago

If you are excited about the intersection of reinforcement learning and highly complex, economically valuable tasks, I can't think of a better place to spend the summer of 2026!

Jonathan Frankle

6 months ago

I'm hiring interns for next summer at @databricks! Specifically on (1) empirical RL at scale on non-verifiable tasks and (2) enabling real people to specify the behaviors they want out of AI (e.g., through evals) on highly complex tasks. 🧵

17

522

47

381

93K

0

6

0

0

242

Michael Bendersky @bemikelive

6 months ago

Big thanks to the entire @databricks AI Research team, and our partners SuperAnnotate, Turing and USAFacts!

0

5

0

0

151

Michael Bendersky @bemikelive

6 months ago

We released OfficeQA today -- a hard benchmark for evaluating agents on grounded reasoning tasks. More details in our blog https://t.co/fIRhi0sF8Y and the thread below

1

13

3

6

2K

Michael Bendersky @bemikelive

6 months ago

Huge congratulations to @kristahopsalong and @arnav_thebigman who spearheaded this work, and all our co-authors @jazco, @ivanzhouyq, @cindyxinyiwang, @abaheti95, @JacobianNeuro, @sam_havens, Erich Elsen, @matei_zaharia and Xing Chen!

1

7

0

0

183
