Saurabh Agarwal

12 months ago

New Anthropic research: Why do some language models fake alignment while others don't? Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we’ve done the same analysis for 25 frontier LLMs—and the story looks more complex.

AnthropicAI's tweet photo. https://t.co/2XNEDtWpIP

ZenSamosa retweeted

OpenAI

@OpenAI

about 1 year ago

Evaluations are essential to understanding how models perform in health settings. HealthBench is a new evaluation benchmark, developed with input from 250+ physicians from around the world, now available in our GitHub repository. https://t.co/s7tUTUu5d3

ZenSamosa retweeted

about 1 year ago

Today we're announcing Integrations, a new way to connect your apps and tools to Claude. We're also expanding Claude's Research capabilities with an advanced mode that searches the web, your Google Workspace, and now your Integrations too.

about 1 year ago

Be weird, be loud, be you. Because honestly, the script is super boring.

ZenSamosa retweeted

AI at Meta

@AIatMeta

about 1 year ago

Major updates from LlamaCon! We’re advancing AI security with new open-source Llama protection tools and new AI-powered solutions for the defender community.

Developers can now access:
-- Llama Guard 4, a customizable safeguard that supports protections for text and image understanding across modalities.
-- Llama Firewall, a security guardrail tool that helps build secure AI systems by detecting and preventing risks like prompt injection, insecure code, and risky LLM plug-in interactions.
-- Two new versions of Llama Prompt Guard: Prompt Guard 2 86M, which improves performance in jailbreak and prompt injection detection, and Prompt Guard 2 22M, a smaller, faster version that reduces latency and compute costs with minimal performance trade-offs.

We’re also investing in new AI-enabled solutions to help the community enhance their security systems.
-- CyberSecEval 4 is our latest suite of cybersecurity benchmarks for AI systems.
-- The Llama Defender Program will help trusted partners access a variety of open, early-access, and closed AI-solutions to address different security needs.

Learn more about our new open-source protection tools and how we’re advancing AI privacy and security: ➡️ https://t.co/WXBijN3ajY

ZenSamosa retweeted

about 1 year ago

Claude can also now connect with your Gmail, Google Calendar, and Docs. It understands your context and can pull information from exactly where you need it.

AnthropicAI's tweet photo. https://t.co/8i05vUtMC0

ZenSamosa retweeted

Google

@Google

about 1 year ago

AI Mode is now available to millions more Labs users in the US 🚀 and we’re adding the power of Lens so you can easily search what you see.

With AI Mode, you can…
✅ Ask your toughest questions and get an AI-powered response
✅ Ask any way you want, using text, voice, your camera or an image with Lens
✅ Explore more with follow-up questions and helpful web links

Read more on what’s new for AI Mode here ↓ https://t.co/32jBaGSEOR

about 1 year ago

Wow, this is absolutely amazing and super interesting! The insights on CoT faithfulness in reasoning models are eye-opening. Great work, @AnthropicAI! #AI #MachineLearning

about 1 year ago

New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.

AnthropicAI's tweet photo. https://t.co/K3MrwqUXX9

ZenSamosa retweeted

AI at Meta

@AIatMeta

about 1 year ago

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model with 16 experts.
• Industry-leading context window of 10M tokens.
• Outperforms Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 across a broad range of widely accepted benchmarks.

Llama 4 Maverick
• 17B-active-parameter model with 128 experts.
• Best-in-class image grounding with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image.
• Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks.
• Achieves comparable results to DeepSeek v3 on reasoning and coding — at half the active parameters.
• Unparalleled performance-to-cost ratio with a chat version scoring ELO of 1417 on LMArena.

These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We’re excited to share more details about it even while it’s still in flight.

Read more about the first Llama 4 models, including training and benchmarks ➡️ https://t.co/9G3QgVdCkB
Download Llama 4 ➡️ https://t.co/eVomRvEr0w

ZenSamosa retweeted

OpenAI

@OpenAI

over 1 year ago

Today we're sharing a major update to the Model Spec—a document which defines how we want our models to behave. The update reinforces our commitments to customizability, transparency, and intellectual freedom to explore, debate, and create with AI. https://t.co/EPbqDp0Sdj

ZenSamosa retweeted

over 1 year ago

New Anthropic research: Evaluating feature steering. In May, we released Golden Gate Claude: an AI fixated on the Golden Gate Bridge due to our use of “feature steering”. We've now done a deeper study on the effects of feature steering. Read the post: https://t.co/2NTQfChhbZ

AnthropicAI's tweet photo. https://t.co/AdvfCU4kqk

ZenSamosa retweeted

Maxim Lott

@maximlott

almost 2 years ago

OpenAI's new o1 model is a BIG breakthrough in AI intelligence, if IQ tests say anything. I gave it the Norway Mensa IQ test, and it blows other AIs out of the water. I'm surprised!... Because there hadn't been public progress in the last 6mo. Link to full analysis below:

maximlott's tweet photo. https://t.co/bRgdxvLkV1

almost 2 years ago

Well done, Google! 🚀 Just explored Google's NotebookLM and its impressive new feature that transforms your research into a podcast-style conversation. I tested it with UNICEF's latest publication on parenting support framework (https://t.co/wJKI6DXnkF), and the results were remarkable. I highly recommend giving it a try! #AI #NotebookLM #Google

almost 2 years ago

Can OpenAI's o1 model revolutionize children's education? 🤔 Are we really ready to embrace this new age of thoughtful learning? 🚀 #OpenAIo1 #artificialintelligence

ZenSamosa's tweet photo. https://t.co/dvK1OkRPFv
