Davide Bonapersona @davbona - Twitter Profile

Pinned Tweet

Davide Bonapersona @davbona

15 days ago

First launch at Anthropic! Super proud of the team for bringing Opus 4.8 to everyone. Let us know what you think!

Claude

@claudeai

15 days ago

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

Image: https://t.co/EufxL7T1kb

4K

67K

9K

8K

15M

0

50

davbona retweeted

Claude

@claudeai

3 days ago

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Image: https://t.co/DxgSu0KUxh

507

15K

2K

5M

davbona retweeted

Anthropic

@AnthropicAI

11 days ago

Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial public offering. Read more: https://t.co/onGZAhRLvD

983

22K

3K

20M

davbona retweeted

Artificial Analysis

@ArtificialAnlys

15 days ago

Claude Opus 4.8 takes the lead on the Artificial Analysis Intelligence Index at 61.4, with Anthropic retaking the #1 spot on GDPval-AA and advancing in terminal use and scientific reasoning

To reach the leading position on the Intelligence Index, @Anthropic made large improvements in both real-world agentic work and frontier academic reasoning tasks.

Key takeaways:
➤ Claude Opus 4.8 is the new leader on the Artificial Analysis Intelligence Index. Opus 4.8 scores 61.4, up +4.1 points from Opus 4.7 and +1.2 points ahead of GPT-5.5 (xhigh), the previous Index leader

➤ The new release is slightly more efficient than its predecessor on agentic tasks, but token efficiency varied by task type. We saw Opus 4.8 use fewer turns and output tokens on GDPval-AA, but approximately the same number of output tokens for the overall Intelligence Index to achieve significantly higher performance.

➤ Anthropic retakes the lead on GDPval-AA, our primary evaluation for agentic performance on knowledge work tasks. Opus 4.8 scored an 1,890 Elo, reflecting an implied win rate of approximately 67% against GPT-5.5

➤ Claude is now among the top models for scientific reasoning. Previous releases have trailed peers on complex academic reasoning tasks, but with Opus 4.8, Claude sits slightly ahead of OpenAI and Google as the leader on Humanity’s Last Exam. It also scores higher than Gemini 3.1 Pro on CritPt, a frontier physics benchmark, but remains behind GPT-5.4 and GPT-5.5

➤ Claude Opus 4.8 reaches #2 on AA-Omniscience, slightly ahead of Opus 4.7. Opus 4.8 scores 27.4 on the AA-Omniscience Index behind only Gemini 3.1 Pro (32.9). Accuracy ticked up slightly to 46.6% and hallucination rate held roughly flat at 35.9% - Anthropic continues to demonstrate substantially lower hallucination rates than peer models from Google and OpenAI

➤ Compared with Opus 4.7, Opus 4.8 also makes material gains on Terminal-Bench Hard (+6.8 points), τ²-Bench Telecom (+5.9 points), and IFBench (+3.6 points), with relatively flat scores across AA-LCR, GPQA, and SciCode.

Other key model details remain the same as Opus 4.7:
Context window of 1 million tokens (equivalent to Opus 4.7)
Pricing of $5/$25 per million tokens of input/output; cache pricing remains at a 25% premium for cache writes ($6.25 per million tokens) with 5-minute time to live, and 90% discount for cache hits ($0.5 per million tokens)
Effort remains the recommended way of configuring model performance and latency, with the same options as Opus 4.7 - we measured the model at its ‘max’ effort setting to test peak performance
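
The post ties the 1,890 Elo score on GDPval-AA to an implied win rate of approximately 67% against GPT-5.5. As a rough sketch of how those two numbers relate, the snippet below applies the standard logistic Elo formula on a 400-point scale. That GDPval-AA uses exactly this formula is an assumption, and the function names are illustrative, not taken from Artificial Analysis.

```python
import math


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard 400-point Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_gap_for_win_rate(win_rate: float) -> float:
    """Rating gap (A minus B) at which A's expected score equals win_rate."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


# Gap implied by the ~67% win rate quoted in the post (assumed standard Elo).
gap = elo_gap_for_win_rate(0.67)

# Sanity check: a model rated 1,890 against one rated `gap` points lower.
implied_win_rate = elo_expected_score(1890.0, 1890.0 - gap)

print(f"implied rating gap: {gap:.0f} points")
print(f"win rate at that gap: {implied_win_rate:.2f}")
```

Under that assumption, a 67% win rate corresponds to a gap of roughly 120 Elo points. This is a derived estimate, not a figure reported in the post.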

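
The per-million-token prices quoted in the Artificial Analysis post can be turned into a quick per-request cost estimate. The sketch below uses only the four prices stated there. The token counts, the function name, and the assumption that each token category is billed separately at its own rate are illustrative, not taken from the post.

```python
# USD per million tokens, as quoted in the post for Opus 4.8.
USD_PER_MTOK = {
    "input": 5.00,        # uncached input
    "output": 25.00,
    "cache_write": 6.25,  # 25% premium over input
    "cache_hit": 0.50,    # 90% discount on input
}


def estimate_cost_usd(input_tokens=0, output_tokens=0,
                      cache_write_tokens=0, cache_hit_tokens=0):
    """Estimate request cost, assuming each category is billed at its own rate."""
    counts = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_write": cache_write_tokens,
        "cache_hit": cache_hit_tokens,
    }
    return sum(counts[k] * USD_PER_MTOK[k] for k in counts) / 1_000_000


# Hypothetical workload: a 200K-token prompt and a 10K-token reply.
uncached_call = estimate_cost_usd(input_tokens=200_000, output_tokens=10_000)
first_cached_call = estimate_cost_usd(cache_write_tokens=200_000, output_tokens=10_000)
repeat_cached_call = estimate_cost_usd(cache_hit_tokens=200_000, output_tokens=10_000)

print(f"uncached:      ${uncached_call:.2f}")
print(f"cache write:   ${first_cached_call:.2f}")
print(f"cache hit:     ${repeat_cached_call:.2f}")
```

In this hypothetical case the cache write costs slightly more than an uncached call, and the saving only appears on repeat calls that hit the cache within the 5-minute time to live mentioned in the post.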

15

691

71

95

52K

Davide Bonapersona @davbona

24 days ago

@karpathy Welcome @karpathy !! Excited to keep learning from you! 🙏

0

12

davbona retweeted

Claude

@claudeai

about 1 month ago

Claude for Excel, PowerPoint, and Word are now generally available, and Claude for Outlook is in public beta. As Claude moves between your Microsoft apps, it carries the full context of your conversation.

1K

49K

4K

20K

28M

davbona retweeted

Anthropic

@AnthropicAI

about 1 month ago

How do people seek guidance from Claude? We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview. https://t.co/6tjY58uBhk

433

3K

321

2K

2M

davbona retweeted

Satya Nadella

@satyanadella

8 months ago

I love how easy it’s becoming to learn on the go with podcasts in Copilot. I turned GitHub’s latest Octoverse report into a 5-minute pod — short, smart, and snappy. Packed with info on the seismic shifts happening in how people build software. Check it out!

94

972

123

392

186K

davbona retweeted

Matt Wolfe

@mreflow

8 months ago

Been playing around with the new Mico voice assistant inside Microsoft Copilot and I found a little easter egg. I love that the people building this stuff are still finding ways to add a bit of fun and nostalgia into what they're building. Kudos to Microsoft on this one!

30

254

21

34

28K

davbona retweeted

Mustafa Suleyman

@mustafasuleyman

8 months ago

Meet our third @MicrosoftAI model: MAI-Image-1
#9 on LMArena, striking an impressive balance of generation speed and quality
Excited to keep refining + climbing the leaderboard from here!
We're just getting started.
https://t.co/33BiNfIjPg

Image: https://t.co/FMaXqiVIvS

34

505

76

113

147K

davbona retweeted

Mustafa Suleyman

@mustafasuleyman

8 months ago

GPUsss go brrrrr!

13

223

16

14

32K

davbona retweeted

Mustafa Suleyman

@mustafasuleyman

8 months ago

Learning just leveled up: eligible college/uni students can claim extra access to Copilot Podcasts, Deep Research, and Vision w/a Microsoft 365 Personal subscription, free for 12 months.
1 year, 1 subscription, countless lightbulb moments
Claim by 10/31: https://t.co/FwKUcSbP3o

13

200

33

54

50K

davbona retweeted

Joe Fenton @JoeFenton

10 months ago

Proud to launch our first model on LM Arena. Zero arena-specific tuning. Tiny team. Tons of headroom. The climb starts now.

2

16

2

0

3K

davbona retweeted

Mustafa Suleyman

@mustafasuleyman

10 months ago

Good news @Copilot users! With Deep Research, you get 5 free research reports a month for complex, thorough analysis + deep dives. For the extra curious, you can get even more with Copilot Pro. Free access available in all Copilot countries + languages, on mobile, web + Edge.

36

416

52

78

43K

davbona retweeted

Mustafa Suleyman

@mustafasuleyman

11 months ago

Today is a big step towards an AI browser: Copilot Mode in Edge, built for how your brain actually works. Voice control, no digital clutter, and multi-tab context, all grounded in privacy and security. Try it at https://t.co/YDKCpbdX86 + feel the difference of 🧵

Image: https://t.co/yn9LMe5dhX

58

883

105

233

90K

davbona retweeted

Microsoft Copilot @Copilot

about 1 year ago

Missed the Copilot event livestream? Here's a breakdown of my new features—coming soon 👇🏼

128

537

73

134

110K

davbona retweeted

Mustafa Suleyman

@mustafasuleyman

over 1 year ago

Super excited to share our new Microsoft AI site! What we're building, who we are, our philosophy, and open roles, all in one spot. We're always looking for great new teammates, so take a scroll (and do play around with the hover on the homepage) https://t.co/cADYpgDiXu

26

263

33

64

22K

davbona retweeted

Mustafa Suleyman

@mustafasuleyman

over 1 year ago

What a breakthrough! Published in Nature today, this is "a new state of matter" to power the first topological qubits. With scalable quantum computing, we'll soon be able to solve previously impossible problems in chemistry, biology and life sciences. Truly exciting times. Congratulations @Dr_Chetan_Nayak, Zulfi Alam, Dr. @MatthiasTroyer, Dr. @krystasvore, and the @MSFTQuantum team.

11

540

70

61

43K

davbona retweeted

Mustafa Suleyman

@mustafasuleyman

over 1 year ago

Today we’ve made Think Deeper free and available for all users of Copilot. This now gives everyone access to OpenAI’s world class o1 reasoning model in Copilot, everywhere at no cost. I urge you to give it a try. It’s truly magical. Think Deeper helps you:

164

2K

313

1K

392K

Davide Bonapersona @davbona

over 1 year ago

Today we announced major updates to Copilot, including new Voice, Vision, and Reasoning capabilities. 🌟 Very proud of the team's hard work getting this out to users!

Andrew Curran

@AndrewCurran_

over 1 year ago

Microsoft is launching a new redesigned Copilot this morning. It incorporates the new Voice and Vision interface as well as 'Think Deeper' which uses Chain of Thought. The new version is more personable, and agentic. They are promising it will ultimately 'act on your behalf'.

Image: https://t.co/Z1HDgzmSdt

23

684

90

271

144K

1

2

0

416
