Simon FL @simonfl - Twitter Profile

simonfl retweeted

Michael Bendersky @bemikelive

4 months ago

We just published OfficeQA Pro - a set of 133 challenging questions from the original OfficeQA benchmark. Even the best frontier agents still struggle on OfficeQA Pro with common issues stemming from errors in parsing, retrieval, and visual reasoning.

1

25

8

6

2K

simonfl retweeted

Krista Opsahl-Ong @kristahopsalong

4 months ago

Most AI benchmarks test reasoning in isolation.

Real enterprise tasks require grounded reasoning:
1️⃣ Find the right documents
2️⃣ Extract the right values
3️⃣ Perform analyses

OfficeQA Pro evaluates this end-to-end. Frontier agents still score <50%.

🧵Paper & details below!

7

110

27

78

45K

simonfl retweeted

Krista Opsahl-Ong @kristahopsalong

7 months ago

Today we’re releasing OfficeQA — a new benchmark for end-to-end grounded reasoning that reflects the real work enterprises need AI agents to do. More details below 👇

4

41

18

10

9K

Simon FL @simonfl

10 months ago

Why is it "two buck chuck" and not "two bucks chuck"? Is this like the "maple leafs" vs "maple leaves"? English plurals are insane

1

3

0

168

simonfl retweeted

Michael Bendersky @bemikelive

11 months ago

Since joining @databricks, our research team has been hard at work on Agent Bricks, a new product that helps enterprises develop state-of-the-art domain-specific agents. We are now releasing a research blog about Agent Learning from Human Feedback (ALHF) https://t.co/2RDs3H6mkY

2

101

20

59

10K

simonfl retweeted

Jonathan Frankle @jefrankle

11 months ago

RLVR isn't just for math and coding! At @databricks, it's impacting products and users across domains. One example: SQL Q&A. We hit the top of the BIRD single-model single-generation leaderboard with our standard TAO+RLVR recipe - the one rolling out in our Agent Bricks product.

3

107

15

40

23K

Simon FL @simonfl

about 1 year ago

Hey @minimax_ai, I'm trying to serve M1-80k on vLLM. Your docs say "a server with 8 H800s can process inputs up to 2 million tokens" but then recommend --max_model_len 4096. What settings did you use for 2M tokens? I'm trying this on 8 H100s.

0

6

1

2

1K

Simon FL @simonfl

about 2 years ago

@harryh I sadly have no experience to offer then.

1

0

89

Simon FL @simonfl

over 2 years ago

@clementmiao @ffx You had me at "split keyboard on your pants"

1

2

0

78

simonfl retweeted

Erebus @IdemErebus

over 2 years ago

DONDA is the new FAANG: Deepmind, Open AI, Nvidia, Databricks, Anthropic

107

3K

389

528

352K

Simon FL @simonfl

over 2 years ago

@what_mara_said same

0

4

Simon FL @simonfl

over 2 years ago

@harryh My bad, I must have misconstrued something I heard!

1

0

44

Simon FL @simonfl

almost 3 years ago

@zackdotcomputer @etrade Be ready for that 1 cent to be transferred to Morgan Stanley soon!

0

42

Simon FL @simonfl

about 3 years ago

First day on the job at @databricks and we're already making some big moves. Exciting times ahead!

Ali Ghodsi @alighodsi

about 3 years ago

Big news: we've agreed to acquire @MosaicML, a leading generative AI platform. I couldn’t be more excited to join forces once the deal closes. https://t.co/L4TyrruUEU

31

1K

197

112

482K

0

2

0

336

Simon FL @simonfl

about 3 years ago

I feel like today's war on cars is so much tamer than it was 80 years ago

depths of wikipedia! @depthsofwiki

about 3 years ago

38

7K

446

177

307K

0

4

0

335

Simon FL @simonfl

over 3 years ago

@harryh Sydney is much better at this, except it gets confused easily. I think because of the other people who show up on your LinkedIn profile, it thought I currently had @leok's job. I assume it's the same problem with ChatGPT.