Georgina @workgeorgina - Twitter Profile

Pinned Tweet

about 4 years ago

Everyone in a business needs to help identify the simplest, biggest steps for positive environmental change. ♻️ It’s not possible for one person to know the worst inefficiencies across all parts of a product’s life-cycle. #EarthDay22 🧵(2/2)

1

0

WorkGeorgina retweeted

Artificial Analysis

@ArtificialAnlys

17 days ago

Claude Opus 4.8 takes the lead on the Artificial Analysis Intelligence Index at 61.4, with Anthropic retaking the #1 spot on GDPval-AA and advancing in terminal use and scientific reasoning

To reach the leading position on the Intelligence Index, @Anthropic made large improvements in both real-world agentic work and frontier academic reasoning tasks.

Key takeaways:
➤ Claude Opus 4.8 is the new leader on the Artificial Analysis Intelligence Index. Opus 4.8 scores 61.4, up +4.1 points from Opus 4.7 and +1.2 points ahead of GPT-5.5 (xhigh), the previous Index leader

➤ The new release is slightly more efficient than its predecessor on agentic tasks, but token efficiency varied by task type. We saw Opus 4.8 use fewer turns and output tokens on GDPval-AA, but approximately the same number of output tokens for the overall Intelligence Index to achieve significantly higher performance.

➤ Anthropic retakes the lead on GDPval-AA, our primary evaluation for agentic performance on knowledge work tasks. Opus 4.8 scored an 1,890 Elo, reflecting an implied win rate of approximately 67% against GPT-5.5

➤ Claude is now among the top models for scientific reasoning. Previous releases have trailed peers on complex academic reasoning tasks, but with Opus 4.8, Claude sits slightly ahead of OpenAI and Google as the leader on Humanity’s Last Exam. It also scores higher than Gemini 3.1 Pro on CritPt, a frontier physics benchmark, but remains behind GPT-5.4 and GPT-5.5

➤ Claude Opus 4.8 reaches #2 on AA-Omniscience, slightly ahead of Opus 4.7. Opus 4.8 scores 27.4 on the AA-Omniscience Index behind only Gemini 3.1 Pro (32.9). Accuracy ticked up slightly to 46.6% and hallucination rate held roughly flat at 35.9% - Anthropic continues to demonstrate substantially lower hallucination rates than peer models from Google and OpenAI

➤ Compared with Opus 4.7, Opus 4.8 also makes material gains on Terminal-Bench Hard (+6.8 points), τ²-Bench Telecom (+5.9 points), and IFBench (+3.6 points), with relatively flat scores across AA-LCR, GPQA, and SciCode.

Other key model details remain the same as Opus 4.7:
Context window of 1 million tokens (equivalent to Opus 4.7)
Pricing of $5/$25 per million tokens of input/output; cache pricing remains at a 25% premium for cache writes ($6.25 per million tokens) with 5-minute time to live, and 90% discount for cache hits ($0.5 per million tokens)
Effort remains the recommended way of configuring model performance and latency, with the same options as Opus 4.7 - we measured the model at its ‘max’ effort setting to test peak performance

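The pricing and Elo figures quoted in the post above can be checked with a little arithmetic. The following is a minimal Python sketch, not anything published by Artificial Analysis or Anthropic: all function and constant names are hypothetical, the cache prices are assumed to be derived from the input price, and the win-rate conversion assumes the standard logistic Elo formula (the post does not say how its "implied win rate" is computed).

```python
import math
from decimal import Decimal

# Figures quoted in the post, in USD per million tokens.
INPUT_PRICE = Decimal("5")
OUTPUT_PRICE = Decimal("25")
CACHE_WRITE_PREMIUM = Decimal("0.25")  # 25% premium (assumed relative to input price)
CACHE_HIT_DISCOUNT = Decimal("0.90")   # 90% discount (assumed relative to input price)


def cache_write_price() -> Decimal:
    """Price per million tokens written to the cache."""
    return INPUT_PRICE * (1 + CACHE_WRITE_PREMIUM)


def cache_hit_price() -> Decimal:
    """Price per million tokens read from the cache."""
    return INPUT_PRICE * (1 - CACHE_HIT_DISCOUNT)


def request_cost(input_tokens: int, output_tokens: int,
                 cache_write_tokens: int = 0,
                 cache_hit_tokens: int = 0) -> Decimal:
    """Total USD cost of one hypothetical request under the quoted prices."""
    total = (input_tokens * INPUT_PRICE
             + output_tokens * OUTPUT_PRICE
             + cache_write_tokens * cache_write_price()
             + cache_hit_tokens * cache_hit_price())
    return total / Decimal(1_000_000)


def win_rate_for_elo_gap(gap: float) -> float:
    """Expected win rate for a rating lead of `gap` (standard Elo, assumed)."""
    return 1 / (1 + 10 ** (-gap / 400))


def elo_gap_for_win_rate(p: float) -> float:
    """Inverse of win_rate_for_elo_gap: rating lead implied by win rate p."""
    return 400 * math.log10(p / (1 - p))


if __name__ == "__main__":
    print("cache write:", cache_write_price())
    print("cache hit:", cache_hit_price())
    print("gap implied by a 67% win rate:", elo_gap_for_win_rate(0.67))
```

Under these assumptions the derived cache prices match the $6.25 and $0.5 figures in the post, and a 67% win rate corresponds to a rating lead of roughly 123 Elo points. The post does not give GPT-5.5's rating, so the sketch only converts between gap and win rate.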

16

693

71

95

53K

Georgina @WorkGeorgina

over 2 years ago

A new European law threatens our right to livable communities. For mining corporations who want to extract our Earth, no area would be off limits. But we can still stop this. Call on EU politicians to #StopBloodyMining and put people and the planet first: https://t.co/UXLp6jpXe5

0

21

Georgina @WorkGeorgina

over 2 years ago

I had a great experience volunteering with my team! @wearegoodera #goodfie #virtualVolunteering https://t.co/N1S4qpBoph

0

16

WorkGeorgina retweeted

FORTUNE

@FortuneMagazine

over 2 years ago

“Universities should jump in hard in terms of training people on how to use AI and also make them aware of the risks of AI,” @IBM CEO Arvind Krishna says at #CEOInitiative. https://t.co/T8cpqsOALf

0

20

6

0

6K

Georgina @WorkGeorgina

over 2 years ago

The ✨magic✨ of ‘2030’ and ‘2050’… those distant years that current politicians and business executives won’t need to worry themselves about?

Paul Smith @pavsmith

over 2 years ago

Brazil posts its highest temperature ever: in Winter. Tell me again why we have until 2050 to stop increasing the amount of pollution we're pumping into the planet? Who says we have that long? Why such a round number? Is it because it is over the horizon? We deal with it later?

0

5

0

177

0

1

0

33

Georgina @WorkGeorgina

over 2 years ago

Good to see this: @TCS Empowers: Five Facts About Tata Consultancy Services’ CSR Efforts in North America https://t.co/Fh7GuIi08O via @csrwire 🌉🏗️🫶

0

24

WorkGeorgina retweeted

IBM News @IBMNews

almost 3 years ago

Today IBM announces its role as an Associate Pathway Partner sponsor of the 2023 UN Climate Change Conference of Parties, @COP28_UAE: https://t.co/4WQRGHeJuF

1

74

33

1

16K

Georgina @WorkGeorgina

almost 3 years ago

My sustainability reading list 📖 There are many great recommendations out there, and I’ve gathered quite a collection. 📚📚📚 Which do you recognize, which should I start with? #summerreading #readinglist #nonfiction @AlexAndBooks_

0

1

0

150

WorkGeorgina retweeted

IBM Data, AI & Automation @IBMData

almost 3 years ago

With sustainability core to business growth, leveraging #data through #AI automation is key to identifying sustainability opportunities and assessing progress. Learn more in the latest @IBMIBV report: https://t.co/hNgNSXMPDH

0

21

14

3

4K

WorkGeorgina retweeted

The Figen

@TheFigen_

almost 3 years ago

Nature is amazing! A camera recorded from start to finish how a bird built its nest and had its chicks.

3K

266K

51K

16K

21M

Georgina @WorkGeorgina

almost 3 years ago

4 Reasons @RishiSunak must #StopRosebank:
🌎As much emissions as 28 countries in the global south
❌Won't lower bills one bit
🛑Delays a just transition to renewable energy
💷Will make @Equinor_UK's CEO even richer, while the public picks up the cost
https://t.co/9VDJe0k1z9

0

27

Georgina @WorkGeorgina

almost 3 years ago

Thanks to #ibmbusinesspartner Progressive TSL for inviting me to be part of their ESG solution messaging. #lightscameraaction 🎬 https://t.co/zZ0ZXje5c3

0

1

0

22

WorkGeorgina retweeted

We Don't Have Time @WeDontHaveTime

almost 3 years ago

BlackRock, the world's largest asset manager (with over 9 trillion USD under management), has recently appointed Amin Nasser, the CEO of Saudi Aramco, the world's largest oil company, as an independent director. https://t.co/graKSeFxbu

1

7

3

3K

Georgina @WorkGeorgina

almost 3 years ago

Happy to see @IBM in the top 30 of @Forbes #top100 Net Zero Leaders 2023. Practicing what we preach - IBM use @Envizi and other leading software to strategise and execute their sustainability targets. https://t.co/X0dK0BgDFM #NetZero #esg

0

26

WorkGeorgina retweeted

IBM watsonx @IBMwatsonx

about 3 years ago

RECAP: #Think2023 Sustainability Keynote – IBM's John Granger and Christina Shim take the stage to outline IBM’s sustainability framework, inviting @BMO and @NASA to share how they are leveraging data, #AI and foundation models to meet, measure and monitor #ESG initiatives.

0

37

12

0

4K

Georgina @WorkGeorgina

about 3 years ago

62% of executives consider a sustainability strategy essential in order to be competitive. Another 22% think it will be a requirement in the future. The time is now! 🌿 @IBMIBV #letsgogreen #sustainabilitystrategy

0

43

Georgina @WorkGeorgina

about 3 years ago

‘Working towards the day where every investment will be sustainable’ well said by @DeborahMeaden on @FemaleInvest

0

1

0

36

Georgina @WorkGeorgina

about 3 years ago

🎶 ‘Queen of Kings’ by Alessandra strikes me as a great soundtrack for Aelin Ashryver Galathynius @sarahjmaas 👑🔥 #Reading @ThroneOfGlassMY

0

58

Georgina @WorkGeorgina

about 3 years ago

I've just registered for Economist Impact's 8th annual Sustainability Week 🌿🌱#EconSustainability Register today >> https://t.co/2mN9SWbz31

0

33

Georgina @WorkGeorgina

over 3 years ago

Is it more energy efficient to work in the office or at home?
💡Office = shared heating, lighting and food (in a canteen) + employees often use public transport.
🏠At home = no transport, but split energy across heating, lighting and cooking?
⚖️ #data #ESG

0

27
