اليكسي @AlexPipushev - Twitter Profile

AlexPipushev retweeted

Greg Brockman

@gdb

4 months ago

measuring agentic security capabilities with smart contracts:

114

953

55

154

144K

AlexPipushev retweeted

Hasan Toor

@hasantoxr

4 months ago

🚨BREAKING: Microsoft Research + Salesforce just dropped a paper that should scare every AI builder. They tested 15 top LLMs GPT-4.1, Gemini 2.5 Pro, Claude 3.7 Sonnet, o3, DeepSeek R1, Llama 4 across 200,000+ simulated conversations. Single-turn prompt: 90% performance. Multi-turn conversation: 65% performance. Same model. Same task. Just... talking normally. The culprit isn't intelligence. Aptitude only dropped 15%. Unreliability EXPLODED by 112%. → LLMs answer before you finish explaining (wrong assumptions get baked in permanently) → They fall in love with their first wrong answer and build on it → They forget the middle of your conversation entirely → Longer responses introduce more assumptions = more errors Even reasoning models failed. o3 and DeepSeek R1 performed just as badly. Extra thinking tokens did nothing. Setting temperature to 0? Still broken. The fix right now: give your AI everything upfront in one message instead of back-and-forth. Every benchmark you've seen was tested on single-turn prompts in perfect lab conditions. Real conversations break every model on the market and nobody's talking about it.

hasantoxr's tweet photo. 🚨BREAKING: Microsoft Research + Salesforce just dropped a paper that should scare every AI builder.

They tested 15 top LLMs GPT-4.1, Gemini 2.5 Pro, Claude 3.7 Sonnet, o3, DeepSeek R1, Llama 4 across 200,000+ simulated conversations.

Single-turn prompt: 90% performance.
Multi-turn conversation: 65% performance.

Same model. Same task. Just... talking normally.

The culprit isn't intelligence. Aptitude only dropped 15%.

Unreliability EXPLODED by 112%.

→ LLMs answer before you finish explaining (wrong assumptions get baked in permanently)
→ They fall in love with their first wrong answer and build on it
→ They forget the middle of your conversation entirely
→ Longer responses introduce more assumptions = more errors

Even reasoning models failed. o3 and DeepSeek R1 performed just as badly.
Extra thinking tokens did nothing.

Setting temperature to 0? Still broken.

The fix right now: give your AI everything upfront in one message instead of back-and-forth.

Every benchmark you've seen was tested on single-turn prompts in perfect lab conditions.

Real conversations break every model on the market and nobody's talking about it.

692

9K

2K

7K

2M

AlexPipushev retweeted

deadrosesxyz

@deadrosesxyz

4 months ago

@pashov vibecoded 🤝 vibeaudited

4

53

2

0

2K

AlexPipushev retweeted

Dennis Layden

@d3layd

4 months ago

@pashov >Claude getting told what happened "Oh yea, wow whoops I see that now, I should have done that differently. Thanks for pointing that out."

5

389

5

20K

Who to follow

⚡️THOR⚡️

@ThorInAsgard

Welcome to my world of crypto #Sonic

Crypto is Future & #BNB is the key #ETH #BTC is everything NFA DYOR on top always Gnoma

AlexPipushev retweeted

a16z crypto @a16zcrypto

5 months ago

https://t.co/Xb0vvVVUvm

115

1K

190

772

218K

AlexPipushev retweeted

Massimo

@Rainmaker1973

5 months ago

Chinese safety awareness clips ⚠️

799

47K

5K

12K

16M

AlexPipushev retweeted

Peter Girnus 🦅

@gothburz

6 months ago

Last quarter I rolled out Microsoft Copilot to 4,000 employees. $30 per seat per month. $1.4 million annually. I called it "digital transformation." The board loved that phrase. They approved it in eleven minutes. No one asked what it would actually do. Including me. I told everyone it would "10x productivity." That's not a real number. But it sounds like one. HR asked how we'd measure the 10x. I said we'd "leverage analytics dashboards." They stopped asking. Three months later I checked the usage reports. 47 people had opened it. 12 had used it more than once. One of them was me. I used it to summarize an email I could have read in 30 seconds. It took 45 seconds. Plus the time it took to fix the hallucinations. But I called it a "pilot success." Success means the pilot didn't visibly fail. The CFO asked about ROI. I showed him a graph. The graph went up and to the right. It measured "AI enablement." I made that metric up. He nodded approvingly. We're "AI-enabled" now. I don't know what that means. But it's in our investor deck. A senior developer asked why we didn't use Claude or ChatGPT. I said we needed "enterprise-grade security." He asked what that meant. I said "compliance." He asked which compliance. I said "all of them." He looked skeptical. I scheduled him for a "career development conversation." He stopped asking questions. Microsoft sent a case study team. They wanted to feature us as a success story. I told them we "saved 40,000 hours." I calculated that number by multiplying employees by a number I made up. They didn't verify it. They never do. Now we're on Microsoft's website. "Global enterprise achieves 40,000 hours of productivity gains with Copilot." The CEO shared it on LinkedIn. He got 3,000 likes. He's never used Copilot. None of the executives have. We have an exemption. "Strategic focus requires minimal digital distraction." I wrote that policy. The licenses renew next month. I'm requesting an expansion. 5,000 more seats. We haven't used the first 4,000. But this time we'll "drive adoption." Adoption means mandatory training. Training means a 45-minute webinar no one watches. But completion will be tracked. Completion is a metric. Metrics go in dashboards. Dashboards go in board presentations. Board presentations get me promoted. I'll be SVP by Q3. I still don't know what Copilot does. But I know what it's for. It's for showing we're "investing in AI." Investment means spending. Spending means commitment. Commitment means we're serious about the future. The future is whatever I say it is. As long as the graph goes up and to the right.

5K

171K

26K

55K

26M

AlexPipushev retweeted

BNS (e/acc) 🤖🛡️

@Beuselmann

over 1 year ago

Asked my favorite #LLM (o1, r1, flash 2.0, sonnet 3.5) for one truly novel insight about humans. Nothing too groundbreaking—but the best part? They all answer as if they were humans themselves. Are they aligning with us… or 🤖 just good at pretending? (Prompt idea from @adonis_singh ) @OpenAI @GeminiApp @AnthropicAI @deepseek_ai

Beuselmann's tweet photo. Asked my favorite #LLM (o1, r1, flash 2.0, sonnet 3.5) for one truly novel insight about humans.

Nothing too groundbreaking—but the best part? They all answer as if they were humans themselves.

Are they aligning with us… or 🤖 just good at pretending?

(Prompt idea from @adonis_singh )

@OpenAI @GeminiApp @AnthropicAI @deepseek_ai

1

3

1

0

405

AlexPipushev retweeted

Dubai Future Foundation

@DubaiFuture

over 1 year ago

Robotics is shaping the future, and participants at the ROS Winter Camp are right on board. By covering ROS fundamentals, intermediate concepts, and deep-diving into practical applications, they are building skills for tomorrow’s tech-driven world. #DubaiFuture #Robotics #ROS #ROSCamp

0

2

1

0

428

اليكسي @AlexPipushev

about 2 years ago · Dubai

@ViolaSRow @QubixArena POSTMAN

0

2

0

36

اليكسي @AlexPipushev

about 2 years ago · Dubai

@ManshaSaqlain @GtonCapital that's weird plz email me: [email protected]

0

27

AlexPipushev retweeted

Synchron

@synchroninc

over 2 years ago

📢 Landmark day for our patients and the field of #neurotechnology - Rodney has just reached 1,000 days with our #BCI. To our employees, colleagues and industry peers - keep innovating. To our patients and caregivers - we’re with you and we thank you.

1

34

6

1

2K

اليكسي @AlexPipushev

almost 3 years ago

@lex_node “you will be assimilated”

1

0

168

اليكسي @AlexPipushev

almost 3 years ago

@remarkablepaper just started writing my book using #TypeFolio absolutely love it!

2

5

0

2K

اليكسي @AlexPipushev

almost 3 years ago

@chaserchapman it works for every MLM, not only #crypto

0

1

0

167

اليكسي @AlexPipushev

almost 3 years ago · Dubai

@jchervinsky Using this line of reasoning, one could argue that tokenized stocks, futures, bonds, and other "securities," or any other real-world assets (RWAs), including stablecoins, would not render as securities either (in certain circumstances), correct? Asking for a friend.

2

1

0

625

AlexPipushev retweeted

اليكسي @AlexPipushev

almost 3 years ago · Dubai

@abacaj “garbage in - garbage out”

0

11

1

0

1K

اليكسي @AlexPipushev

almost 3 years ago · Dubai

@abacaj “garbage in - garbage out”

0

11

1

0

1K

اليكسي @AlexPipushev

almost 3 years ago · Dubai

@loobah_l AI will do all the Ops

0

1

0

1K

اليكسي @AlexPipushev

almost 3 years ago · Dubai

@Culture_Crit inflation & the rude nature of capitalism

1

5

0

2K

اليكسي

@AlexPipushev

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users