Patrick Shriwise @pshriwise - Twitter Profile

Patrick Shriwise @pshriwise

almost 3 years ago

@DDawiseman21 stream?

Patrick Shriwise @pshriwise

almost 3 years ago

Or maybe it's teen years

Patrick Shriwise @pshriwise

almost 3 years ago

“The report also showed how the model ignored requests to follow step-by-step reasoning, and it was less likely to generate code that ran without modifications.” ChatGPT entering its toddler phase

Santiago

@svpino

almost 3 years ago

Yes, GPT-4 seems to be getting worse. But now we have new information. And well, it's complicated.

Yesterday, I posted about a study showing that GPT-4's success rate at deciding whether a number is prime went from 97.6% in March to 2.4% in June. The report also showed how the model ignored requests to follow step-by-step reasoning, and it was less likely to generate code that ran without modifications.

Hundreds of people replied with their anecdotes. The overwhelming consensus is that GPT-4 is considerably less capable than before.

But the study that started the conversation is misleading. They used a dataset of 500 problems and had the model figure out whether a given number was prime. The latest GPT-4 version did much worse than the one from a few months ago, with only 12 correct answers out of 500.

But there was an issue: Every one of the 500 integers used in the study was a prime number! They never tested composite numbers.

So what happens when you make the same comparison with composite and prime numbers? It turns out that March's GPT-4 is as bad as the June version! In March, GPT-4 answered that most numbers were prime, while the June version answered that most were composite. Since the team behind the study only tested prime numbers, they concluded that GPT-4 is now much worse at determining primality, but that's not the case.

Okay, so where do we stand? Funny enough, the apparent conclusion is that GPT-4 sucks at finding whether a number is prime. It didn't get worse; it was never good at it.

There's still, however, a large unanswered issue related to the inability of developers to trust these models. We still don't know why the sudden change in behavior between March and June since OpenAI has firmly denied they have changed the model.

What's next? OpenAI acknowledged the behavior change, and they are investigating. I hope they publish an explanation behind the drift. I'm also looking forward to a proper versioning system that developers can trust and rely on.

This finding doesn't change the overall sentiment from people who overwhelmingly believe the model has worsened. Could this be confirmation bias? Could the honeymoon phase with Large Language Models be over, and people are starting to find the real problems when building actual applications?

What do you think is going on here?
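The sampling problem described in the thread can be illustrated with a small sketch. It does not call any model: `always_prime` and `always_composite` are hypothetical stand-ins for a guesser that leans toward one answer, and the number range is an arbitrary assumption, not the study's dataset. It only shows why a primes-only test set cannot tell a biased guesser from a real primality checker.

```python
def is_prime(n: int) -> bool:
    """Ground truth by trial division (fine for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def accuracy(classifier, numbers) -> float:
    """Fraction of numbers on which the classifier matches ground truth."""
    correct = sum(classifier(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


# Hypothetical stand-ins for a biased guesser; NOT the actual GPT-4 behavior.
def always_prime(n: int) -> bool:
    return True


def always_composite(n: int) -> bool:
    return False


# Assumed range, chosen only so that 500 primes and 500 composites exist.
candidates = range(1000, 20000)
primes = [n for n in candidates if is_prime(n)][:500]
composites = [n for n in candidates if not is_prime(n)][:500]
balanced = primes + composites

for name, clf in [("always_prime", always_prime),
                  ("always_composite", always_composite)]:
    print(f"{name}: primes-only={accuracy(clf, primes):.1%}, "
          f"balanced={accuracy(clf, balanced):.1%}")
```

On the primes-only set the two guessers look like opposites (all right versus all wrong), yet on the balanced set they score the same, which is the effect the thread attributes to the March and June results.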

pshriwise retweeted

Nick Touran

@whatisnuclear

almost 3 years ago

And now, please enjoy this 1958 AEC film 🍿⚛️ that I merely found and re-hosted on YouTube. Please enjoy POWER REACTORS USA, featuring Shippingport, APPR, Yankee Rowe, Indian Point 1, EBWR, Vallecitos, Dresden, the HREs, OMRE, SRE, EBR-1, and Fermi 1! https://t.co/nR3VQMzIfr

pshriwise retweeted

Argonne National Lab @argonne

almost 3 years ago

Q&A with Argonne Maria Goeppert Mayer Fellow April Novak - https://t.co/BW5AgkjzEG "It’s a very exciting time to be a nuclear engineer. The last 10 years have been called a “renaissance” for nuclear energy."

pshriwise retweeted

Calvin Brown @Hobbes1118

about 3 years ago

I love watching frisbee when the camera is centered on the thrower because it's so suspenseful. Who's gonna get open? How is the defense containing the cutters? What offense are they running? It makes for great cinema

Patrick Shriwise @pshriwise

about 3 years ago

@whatisnuclear Where might one acquire uranium marbles? @nuclearkatie?

Patrick Shriwise @pshriwise

about 3 years ago

@HuckRuffner @EvanLepler We lost that game, Evan

Patrick Shriwise @pshriwise

about 3 years ago

@historyinmemes @colorize_bot

Patrick Shriwise @pshriwise

about 3 years ago

@HISTORY @colorize_bot

pshriwise retweeted

Madison Radicals @MadisonRadicals

over 3 years ago

Welcome back (checks notes, double checks) Brian Hart! Brian last played with the team in 2017. He helped lead the team to 5 straight final four appearances from 2013-2017!

pshriwise retweeted

Nate Morrical @NateMorrical

over 3 years ago

Discovering on Vulkan RT on NVIDIA that I can't write to the ray payload structure in anyhit programs... Is this a driver bug? Has anyone here from the NV VKRT camp been able to do this before?

pshriwise retweeted

Nate Morrical @NateMorrical

over 3 years ago

I'm looking to drum up support for a uint8_t type in HLSL. Is this something folks here would be interested in? If so, could you give a thumbs up / +1 on this GitHub issue? Or even better, possibly chime in with potential motivating reasons? https://t.co/CyGjVUVK1s

pshriwise retweeted

Paul P.H. Wilson⚛ @gonuke

over 3 years ago

We are still accepting applications for three new faculty positions through January 13, 2023!! Please reach out with questions and apply ASAP!

Patrick Shriwise @pshriwise

over 3 years ago

@ultimatebecky Monster

Patrick Shriwise @pshriwise

over 3 years ago

@camurphy3 Beans beans beans!!

pshriwise retweeted

Chicago Machine @MachineUltimate

almost 4 years ago

The World Games just proved to the whole globe what we already knew… Nate Goff IS THAT DUDE! Congrats to our captain, our tall guy, and, most importantly, our friend for winning 🥇 this past week.

Patrick Shriwise @pshriwise

about 4 years ago

@nuclearkatie Woof
